march 4, 2026 · 2 min read

redis before hero architecture

docgen was burning tokens on identical questions. cache-aside fixed it: ~35% lower inference cost, sub-second repeats, zero heroics.

docgen generates documentation from your codebase using an llm. first version: every request hit the model. perfect for demos. expensive for production. same repo, same branch, same question — we paid for the same answer twice, then a third time, then wondered why the bill hurt.

the fix wasn't a smarter model. it was redis and a pattern so boring it has a name: cache-aside. check cache first. on miss, compute. store with a ttl. return. next identical request — redis answers in milliseconds and the model sleeps.

cache-aside in four lines

  1. read from cache
  2. on miss, compute
  3. write to cache with ttl
  4. return
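the four steps map almost one-to-one onto code. here is a minimal sketch that assumes nothing about docgen's real setup. a tiny in-memory store (`TtlStore`, hypothetical) stands in for redis so the example runs without a server, and `cacheAside` is an illustrative helper name. with a real client, every store call would be awaited.

```typescript
// hypothetical in-memory stand-in for redis: values expire after ttlSeconds.
// `now` is injectable so expiry can be exercised without waiting.
class TtlStore {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): string | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries behave like misses
      return null;
    }
    return entry.value;
  }

  set(key: string, value: string, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// cache-aside: 1. read  2. compute on miss  3. write with ttl  4. return
function cacheAside<T>(store: TtlStore, key: string, ttlSeconds: number, compute: () => T): T {
  const cached = store.get(key); // 1. read
  if (cached !== null) return JSON.parse(cached) as T;

  const fresh = compute(); // 2. compute on miss
  store.set(key, JSON.stringify(fresh), ttlSeconds); // 3. write with ttl
  return fresh; // 4. return
}
```

the injectable clock exists only for testing. redis tracks expiry on its own.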

everything else is key design and expiry policy.

what goes in the key

not just user id. for docgen:

  • repo hash — code changed? cache miss. fresh docs.
  • file path — scope the answer
  • prompt version — change the prompt? don't serve yesterday's tone

forget prompt version and you serve stale answers that look confident. that's worse than slow.
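one way to keep the key honest is to build it in exactly one place. the sketch below uses hypothetical names (`DocsKeyParts`, `docsKey`). escaping the path is an assumption on our part, not something the post says docgen does. it keeps a `:` inside a file path from blurring the segment boundaries.

```typescript
// hypothetical shape of everything that can change one docgen answer
interface DocsKeyParts {
  repoHash: string;      // e.g. a commit sha or content hash; changes when code changes
  path: string;          // file path the docs are scoped to
  promptVersion: string; // bumped whenever the prompt changes
}

// build the key from every input that affects the output.
// encodeURIComponent escapes ':' (and '/') inside paths so segments stay distinct.
function docsKey({ repoHash, path, promptVersion }: DocsKeyParts): string {
  return ['docs', repoHash, encodeURIComponent(path), promptVersion].join(':');
}
```

if any of the three parts changes, the key changes, and the next read is a miss by construction.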

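```typescript
import Redis from 'ioredis'; // client assumed: its set() takes 'EX', seconds

// docgen's model call, defined elsewhere; this signature is an assumption
declare function generateDocs(repo: string, path: string): Promise<unknown>;

const redis = new Redis();

async function getDocs(repo: string, repoHash: string, path: string, promptVersion: string) {
  const key = `docs:${repoHash}:${path}:${promptVersion}`;

  const cached = await redis.get(key); // null on a miss
  if (cached !== null) return JSON.parse(cached);

  const docs = await generateDocs(repo, path); // miss: pay for inference once
  await redis.set(key, JSON.stringify(docs), 'EX', 60 * 60 * 24); // expire after 24h

  return docs;
}
```
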
tip

version your prompts in the key

a PROMPT_VERSION = 'v2' constant, bumped on every prompt change, beats debugging why tuesday's answers differ from wednesday's.
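if the worry is forgetting to bump the constant, one alternative (not what the post describes) is to derive the version from the prompt text itself. this is a sketch using node's built-in `crypto` module. `promptVersionOf` and `DOCS_PROMPT` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

// derive a short, stable version tag from the prompt text itself:
// any edit to the prompt yields a new tag, and therefore new cache keys.
function promptVersionOf(template: string): string {
  const digest = createHash('sha256').update(template, 'utf8').digest('hex');
  return 'p' + digest.slice(0, 12); // 12 hex chars is plenty to tell revisions apart
}

// hypothetical prompt text; docgen's real prompt is not shown in the post
const DOCS_PROMPT = 'write concise reference docs for this file:\n{{file}}';
const PROMPT_VERSION = promptVersionOf(DOCS_PROMPT);
```

the trade-off: a hash can't be forgotten, but `'v2'` is much easier to read in logs and key listings.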

when to cache — and when not to

good fit

  • repeated expensive reads
  • derived summaries and aggregations
  • rate-limit counters
  • session-ish state with a ttl

bad fit

  • one-off admin exports
  • user-specific secrets
  • anything where a stale answer carries legal or safety risk
  • balances that must be exact every millisecond
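rate-limit counters belong on the good-fit list because a counter that expires on its own maps directly onto redis `INCR` plus a ttl. this is a minimal fixed-window sketch. the in-memory `CounterStore` (hypothetical) mimics `INCR` and `EXPIRE` so it runs standalone. `allowRequest`, the `ratelimit:` key prefix, and the limits are illustrative.

```typescript
// hypothetical in-memory stand-in for the two redis commands a
// fixed-window limiter needs: INCR (count) and EXPIRE (end the window).
class CounterStore {
  private readonly counts = new Map<string, { n: number; expiresAt: number }>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  incr(key: string): number {
    const entry = this.counts.get(key);
    if (entry === undefined || this.now() >= entry.expiresAt) {
      // like INCR on a missing key: start at 1 with no expiry yet
      this.counts.set(key, { n: 1, expiresAt: Infinity });
      return 1;
    }
    entry.n += 1;
    return entry.n;
  }

  expire(key: string, seconds: number): void {
    const entry = this.counts.get(key);
    if (entry !== undefined) entry.expiresAt = this.now() + seconds * 1000;
  }
}

// fixed window: the first hit sets the ttl, and the counter vanishes when the window ends
function allowRequest(store: CounterStore, userId: string, limit: number, windowSeconds: number): boolean {
  const key = `ratelimit:${userId}`;
  const count = store.incr(key);
  if (count === 1) store.expire(key, windowSeconds);
  return count <= limit;
}
```

against real redis, `INCR` followed by `EXPIRE` is two round trips. a crash between them leaves a counter with no ttl. one common fix is to send both inside `MULTI`, using `EXPIRE`'s `NX` option (redis 7+) so only the first hit sets the ttl.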

we cut repeated inference cost by ~35% on real usage. not because redis is magic — because most users don't change their repo every five minutes. they re-run docs while iterating on the same branch. cache hits are a product feature disguised as infrastructure.
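here is a back-of-envelope check on what that number means. it makes three simplifying assumptions: every miss costs the same inference spend $c$, cache reads cost nothing, and $h$ is the fraction of $N$ requests served from cache.

```latex
\begin{aligned}
C_{\text{before}} &= N c \\
C_{\text{after}}  &= (1 - h)\, N c \\
\text{saving}     &= \frac{C_{\text{before}} - C_{\text{after}}}{C_{\text{before}}} = h
\end{aligned}
```

under those assumptions, a ~35% saving corresponds to a hit rate of roughly 0.35, so about one request in three is answered by redis. uneven request sizes would skew the estimate in either direction.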

ttl is a product decision

expiry isn't just infra. it's how wrong you're willing to be, for how long.

  • short (minutes) — near-live data, higher cost
  • medium (hours) — docgen default for completed runs
  • long (days) — stable public content, manual invalidation on deploy

shorten ttl too aggressively → cost goes up. lengthen without prompt versioning → quality complaints go up. tune both together.
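the tiers can live in code as one lookup table, so the policy is written down instead of scattered across call sites. this is a sketch. the exact numbers are illustrative assumptions, and the medium tier reuses the 24h from the snippet earlier. for the long tier it swaps in namespace versioning instead of manual invalidation. a hypothetical `deployId` goes into the key, so after a deploy the code stops reading old entries, and those entries age out on their own ttl.

```typescript
// illustrative ttl tiers in seconds; the exact values are assumptions
const TTL_SECONDS = {
  short: 5 * 60,          // near-live data
  medium: 60 * 60 * 24,   // completed docgen runs
  long: 60 * 60 * 24 * 7, // stable public content
} as const;

type TtlTier = keyof typeof TTL_SECONDS;

// long-lived keys carry a hypothetical deploy id: a new deploy reads new keys,
// and the old ones expire without any explicit DEL.
function cacheEntry(tier: TtlTier, baseKey: string, deployId: string) {
  const key = tier === 'long' ? `${deployId}:${baseKey}` : baseKey;
  return { key, ttlSeconds: TTL_SECONDS[tier] };
}
```

short and medium entries skip the deploy id because they expire soon enough on their own.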


my order now:

  1. make it correct
  2. measure what's actually slow
  3. cache the hot path
  4. document how entries expire
  5. hero architecture waits until you know what hurts
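for step 2, and for checking that step 3 paid off, the number worth watching on a cache is hit rate. this is a minimal sketch with hypothetical names (`HitRateMeter`, `meteredGet`). in production, these counts would go to whatever metrics system is already in place.

```typescript
// counts hits and misses so "is the cache worth it?" gets a number, not a guess
class HitRateMeter {
  hits = 0;
  misses = 0;

  record(hit: boolean): void {
    if (hit) this.hits += 1;
    else this.misses += 1;
  }

  // fraction of lookups served from cache; 0 before anything is recorded
  get hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}

// wrap a cache read: `lookup` returns null on a miss, like redis GET
function meteredGet(meter: HitRateMeter, lookup: () => string | null): string | null {
  const value = lookup();
  meter.record(value !== null);
  return value;
}
```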
· · ·