Subreddit Voice Mining

Scenario

Your brand wants to sound like the community it serves. This job pulls a subreddit’s hot posts and their top comment threads, concatenates that organic language into a corpus, then sends it to Mavera’s Brand Voice endpoint as extracted_content. The output is a “community voice” that captures vocabulary, rhythm, humor, and register.

Architecture

Code

# Python 3; requires the requests package
import os
import time

import requests

auth = requests.auth.HTTPBasicAuth(os.environ["REDDIT_CLIENT_ID"], os.environ["REDDIT_CLIENT_SECRET"])
tk = requests.post("https://www.reddit.com/api/v1/access_token", auth=auth,
    data={"grant_type": "client_credentials"},
    headers={"User-Agent": os.environ["REDDIT_USER_AGENT"]}).json()["access_token"]
RD = "https://oauth.reddit.com"
RD_H = {"Authorization": f"Bearer {tk}", "User-Agent": os.environ["REDDIT_USER_AGENT"]}
MV, MV_BASE = os.environ["MAVERA_API_KEY"], "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

SUB = "skincare"

# 1. Pull hot posts + top comments
r = requests.get(f"{RD}/r/{SUB}/hot", headers=RD_H, params={"limit": 25, "raw_json": 1})
r.raise_for_status()
posts = [p["data"] for p in r.json()["data"]["children"] if p["data"].get("selftext") or p["data"].get("title")]

corpus = []
for post in posts:
    corpus.append(f"POST: {post['title']}\n{post.get('selftext','')[:500]}")
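    # Fetch the post's top comments; on a 429, wait for the window to reset and retry once
    url, params = f"{RD}/comments/{post['id']}", {"limit": 15, "depth": 2, "sort": "top", "raw_json": 1}
    cr = requests.get(url, headers=RD_H, params=params)
    if cr.status_code == 429:
        time.sleep(float(cr.headers.get("X-Ratelimit-Reset", 60)))
        cr = requests.get(url, headers=RD_H, params=params)
    if not cr.ok:
        continue
    listing = cr.json()  # [post listing, comment listing]
    for c in (listing[1]["data"]["children"] if len(listing) > 1 else []):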
        body = c.get("data", {}).get("body", "")
        if body and body not in ("[removed]", "[deleted]"):
            corpus.append(f"COMMENT: {body[:400]}")
    time.sleep(0.7)

text = "\n\n---\n\n".join(corpus)
print(f"Corpus: {len(corpus)} fragments, {len(text):,} chars from /r/{SUB}")

# 2. Create brand voice
bv_resp = requests.post(f"{MV_BASE}/brand-voices", headers=MV_H, json={
    "name": f"Community Voice: /r/{SUB}",
    "extracted_content": text,
    "description": f"Voice from {len(posts)} hot posts and comments on /r/{SUB}.",
})
bv_resp.raise_for_status()  # surface API errors here instead of a KeyError on bv['id']
bv = bv_resp.json()
print(f"Brand Voice: {bv['id']} — Traits: {bv.get('traits', bv.get('voice_summary', 'N/A'))}")

# 3. Test generation
test = requests.post(f"{MV_BASE}/generations", headers=MV_H, json={
    "brand_voice_id": bv["id"],
    "prompt": f"Write a short product announcement for a new moisturizer, using /r/{SUB} community voice. Under 150 words.",
}).json()
print(f"\n{test.get('output', test.get('content', ''))[:500]}")

// JavaScript: run as an ES module on Node 18+ (global fetch, top-level await)
const creds = btoa(`${process.env.REDDIT_CLIENT_ID}:${process.env.REDDIT_CLIENT_SECRET}`);
const tk = await fetch("https://www.reddit.com/api/v1/access_token", {
  method: "POST", headers: { Authorization: `Basic ${creds}`, "User-Agent": process.env.REDDIT_USER_AGENT,
    "Content-Type": "application/x-www-form-urlencoded" }, body: "grant_type=client_credentials",
}).then(r => r.json());
const RD = "https://oauth.reddit.com";
const RD_H = { Authorization: `Bearer ${tk.access_token}`, "User-Agent": process.env.REDDIT_USER_AGENT };
const MV = process.env.MAVERA_API_KEY, MV_BASE = "https://app.mavera.io/api/v1";
const MV_H = { Authorization: `Bearer ${MV}`, "Content-Type": "application/json" };

const SUB = "skincare";

// 1. Hot posts + comments
const posts = (await (await fetch(`${RD}/r/${SUB}/hot?limit=25&raw_json=1`, { headers: RD_H })).json())
  .data.children.map(p => p.data).filter(d => d.selftext || d.title);

const corpus = [];
for (const post of posts) {
  corpus.push(`POST: ${post.title}\n${(post.selftext || "").slice(0, 500)}`);
  const cr = await fetch(`${RD}/comments/${post.id}?limit=15&depth=2&sort=top&raw_json=1`, { headers: RD_H });
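  if (cr.status === 429) {
    // Honor Reddit's reset header (seconds), falling back to 60 s, then skip this post's comments
    const wait = Number(cr.headers.get("x-ratelimit-reset")) || 60;
    await new Promise(r => setTimeout(r, wait * 1000));
    continue;
  }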
  for (const c of ((await cr.json())[1]?.data?.children || [])) {
    const body = c.data?.body || "";
    if (body && body !== "[removed]" && body !== "[deleted]") corpus.push(`COMMENT: ${body.slice(0, 400)}`);
  }
  await new Promise(r => setTimeout(r, 700));
}

const text = corpus.join("\n\n---\n\n");
console.log(`Corpus: ${corpus.length} fragments, ${text.length.toLocaleString()} chars from /r/${SUB}`);

// 2. Brand voice
const bv = await fetch(`${MV_BASE}/brand-voices`, { method: "POST", headers: MV_H,
  body: JSON.stringify({ name: `Community Voice: /r/${SUB}`, extracted_content: text,
    description: `Voice from ${posts.length} hot posts and comments on /r/${SUB}.` }),
}).then(r => r.json());
console.log(`Brand Voice: ${bv.id} — Traits: ${bv.traits || bv.voice_summary || "N/A"}`);

// 3. Test
const test = await fetch(`${MV_BASE}/generations`, { method: "POST", headers: MV_H,
  body: JSON.stringify({ brand_voice_id: bv.id,
    prompt: `Write a short product announcement for a new moisturizer, /r/${SUB} voice. Under 150 words.` }),
}).then(r => r.json());
console.log(`\n${(test.output || test.content || "").slice(0, 500)}`);

Example Output

Corpus: 312 fragments, 89,421 chars from /r/skincare
Brand Voice: bv_reddit_skincare_7k2m — Traits: Conversational, ingredient-aware,
  self-deprecating humor, supportive. Uses abbreviations (HG, YMMV).

Ok so I've been testing this new moisturizer for about 3 weeks and I think
we might have a new HG situation? Texture is gel-cream hybrid — absorbs fast,
doesn't pill under sunscreen. Ingredient list solid: ceramides, centella,
niacinamide. No fragrance. YMMV but at this price point it's worth a shot.

Error Handling

Deleted content

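Reddit replaces the body of a removed or deleted comment with the placeholder [removed] or [deleted]. The code filters these out. If the resulting corpus is too small, switch the listing from /hot to /top?t=month.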
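The filter and the fallback can be sketched as two small helpers. This is a minimal sketch rather than part of the job above: the names usable_bodies and pick_listing, and the min_fragments threshold of 100, are assumptions chosen for illustration.

```python
PLACEHOLDERS = {"[removed]", "[deleted]"}


def usable_bodies(children, max_chars=400):
    """Return comment bodies from a comment listing, skipping placeholders.

    `children` is the list found at listing[1]["data"]["children"]. Entries
    without a body (for example "more" stubs) are skipped as well.
    """
    bodies = []
    for child in children:
        body = (child.get("data") or {}).get("body", "")
        if body and body.strip() not in PLACEHOLDERS:
            bodies.append(body[:max_chars])
    return bodies


def pick_listing(sub, fragment_count, min_fragments=100):
    """Pick the listing path and query params for the next pull.

    Assumption: a corpus below `min_fragments` counts as "small". In that
    case the hot listing is swapped for the top posts of the past month.
    """
    if fragment_count < min_fragments:
        return f"/r/{sub}/top", {"t": "month", "limit": 25, "raw_json": 1}
    return f"/r/{sub}/hot", {"limit": 25, "raw_json": 1}
```

A first pass over /hot that yields only a few dozen fragments would then be followed by a second pass using the path and params returned by pick_listing.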

Rate limits

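Monitor the X-Ratelimit-Remaining and X-Ratelimit-Reset response headers; X-Ratelimit-Reset is the number of seconds until the current window resets. On a 429 the code sleeps before continuing.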
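One way to act on those headers before a 429 occurs is to compute a wait time after every response. The helper below is a hedged sketch: seconds_to_wait is a hypothetical name, and pausing proactively once X-Ratelimit-Remaining drops below 1 is an assumption, not something the job above does.

```python
def seconds_to_wait(headers, status_code, default=60.0):
    """Return how many seconds to sleep before the next Reddit request.

    `headers` is any mapping of response headers. A requests response's
    headers are case-insensitive; a plain dict must use these exact names.
    Returns 0.0 when no wait is needed.
    """
    def as_float(name, fallback):
        # Header values arrive as strings; fall back if missing or malformed.
        try:
            return float(headers.get(name, fallback))
        except (TypeError, ValueError):
            return fallback

    reset = as_float("X-Ratelimit-Reset", default)
    remaining = as_float("X-Ratelimit-Remaining", 1.0)
    if status_code == 429 or remaining < 1:
        return max(reset, 0.0)
    return 0.0
```

Inside the comment loop this could be used as time.sleep(seconds_to_wait(cr.headers, cr.status_code)) right after each request.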

Private subreddits

Requests to private subreddits return 403. Quarantined subs require a quarantine: true header.
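The job above calls raise_for_status() on the listing request, so a 403 surfaces as a generic HTTP error. A clearer failure can be sketched as follows; SubredditUnavailable and ensure_listing_ok are hypothetical names introduced only for this example.

```python
class SubredditUnavailable(Exception):
    """Raised when a subreddit listing cannot be read with the current credentials."""


def ensure_listing_ok(sub, status_code):
    """Raise a descriptive error for a failed listing request; return None on success."""
    if status_code == 403:
        raise SubredditUnavailable(
            f"/r/{sub} returned 403: the subreddit is private or access is restricted"
        )
    if status_code >= 400:
        raise SubredditUnavailable(f"/r/{sub} returned HTTP {status_code}")
```

Calling ensure_listing_ok(SUB, r.status_code) right after the hot-posts request lets a batch run over several subreddits catch SubredditUnavailable and move on to the next one.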
