Skip to main content

Scenario

Your brand wants to sound like the community it serves. This job pulls hot posts and comment threads, concatenates organic language into a corpus, then feeds it to Mavera’s Brand Voice endpoint as extracted_content. The output is a “community voice” that captures vocabulary, rhythm, humor, and register.

Architecture

Code

import os, requests, time

auth = requests.auth.HTTPBasicAuth(os.environ["REDDIT_CLIENT_ID"], os.environ["REDDIT_CLIENT_SECRET"])
tk = requests.post("https://www.reddit.com/api/v1/access_token", auth=auth,
    data={"grant_type": "client_credentials"},
    headers={"User-Agent": os.environ["REDDIT_USER_AGENT"]}).json()["access_token"]
RD = "https://oauth.reddit.com"
RD_H = {"Authorization": f"Bearer {tk}", "User-Agent": os.environ["REDDIT_USER_AGENT"]}
MV, MV_BASE = os.environ["MAVERA_API_KEY"], "https://app.mavera.io/api/v1"
MV_H = {"Authorization": f"Bearer {MV}", "Content-Type": "application/json"}

SUB = "skincare"

# 1. Pull hot posts + top comments
r = requests.get(f"{RD}/r/{SUB}/hot", headers=RD_H, params={"limit": 25, "raw_json": 1})
r.raise_for_status()
posts = [p["data"] for p in r.json()["data"]["children"] if p["data"].get("selftext") or p["data"].get("title")]

corpus = []
for post in posts:
    corpus.append(f"POST: {post['title']}\n{post.get('selftext','')[:500]}")
    cr = requests.get(f"{RD}/comments/{post['id']}", headers=RD_H,
        params={"limit": 15, "depth": 2, "sort": "top", "raw_json": 1})
    if cr.status_code == 429:
        time.sleep(int(cr.headers.get("X-Ratelimit-Reset", 60))); continue
    for c in (cr.json()[1]["data"]["children"] if len(cr.json()) > 1 else []):
        body = c.get("data", {}).get("body", "")
        if body and body not in ("[removed]", "[deleted]"):
            corpus.append(f"COMMENT: {body[:400]}")
    time.sleep(0.7)

text = "\n\n---\n\n".join(corpus)
print(f"Corpus: {len(corpus)} fragments, {len(text):,} chars from /r/{SUB}")

# 2. Create brand voice
bv = requests.post(f"{MV_BASE}/brand-voices", headers=MV_H, json={
    "name": f"Community Voice: /r/{SUB}",
    "extracted_content": text,
    "description": f"Voice from {len(posts)} hot posts and comments on /r/{SUB}.",
}).json()
print(f"Brand Voice: {bv['id']} — Traits: {bv.get('traits', bv.get('voice_summary', 'N/A'))}")

# 3. Test generation
test = requests.post(f"{MV_BASE}/generations", headers=MV_H, json={
    "brand_voice_id": bv["id"],
    "prompt": f"Write a short product announcement for a new moisturizer, using /r/{SUB} community voice. Under 150 words.",
}).json()
print(f"\n{test.get('output', test.get('content', ''))[:500]}")

Example Output

Corpus: 312 fragments, 89,421 chars from /r/skincare
Brand Voice: bv_reddit_skincare_7k2m — Traits: Conversational, ingredient-aware,
  self-deprecating humor, supportive. Uses abbreviations (HG, YMMV).

Ok so I've been testing this new moisturizer for about 3 weeks and I think
we might have a new HG situation? Texture is gel-cream hybrid — absorbs fast,
doesn't pill under sunscreen. Ingredient list solid: ceramides, centella,
niacinamide. No fragrance. YMMV but at this price point it's worth a shot.

Error Handling

Reddit replaces removed/deleted comments. The code filters these. If corpus is small, switch to /top?t=month.
Monitor X-Ratelimit-Remaining and X-Ratelimit-Reset. Code sleeps on 429s.
Return 403. Quarantined subs require quarantine: true header.

Reddit Integration

Brand Voice