redis before hero architecture

docgen generates documentation from your codebase using an llm. first version: every request hit the model. perfect for demos. expensive for production. same repo, same branch, same question — we paid for the same answer twice, then a third time, then wondered why the bill hurt.

the fix wasn't a smarter model. it was redis and a pattern so boring it has a name: cache-aside. check cache first. on miss, compute. store with a ttl. return. next identical request — redis answers in milliseconds and the model sleeps.

cache-aside in four lines

read from cache
on miss, compute
write to cache with ttl
return

everything else is key design and expiry policy.
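the four steps fit in one generic helper. this is a minimal sketch, not docgen's actual code: `CacheStore`, `MemoryStore`, and `cacheAside` are hypothetical names, and the in-memory store only stands in for redis so the example runs without a server.

```typescript
// Hypothetical minimal store interface; a thin wrapper around a Redis client could implement it.
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for Redis (assumption, for illustration only).
class MemoryStore implements CacheStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily drop expired entries
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

async function cacheAside<T>(
  store: CacheStore,
  key: string,
  ttlSeconds: number,
  compute: () => Promise<T>,
): Promise<T> {
  // 1. read from cache
  const cached = await store.get(key);
  if (cached !== null) return JSON.parse(cached) as T;

  // 2. on miss, compute
  const value = await compute();

  // 3. write to cache with ttl
  await store.set(key, JSON.stringify(value), ttlSeconds);

  // 4. return
  return value;
}
```

the caller supplies the key and the ttl, which is the point: the helper stays boring and the decisions stay visible.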

what goes in the key

not just user id. for docgen:

repo hash — code changed? cache miss. fresh docs.
file path — scope the answer
prompt version — change the prompt? don't serve yesterday's tone

forget prompt version and you serve stale answers that look confident. that's worse than slow.

typescript

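import Redis from 'ioredis'; // assumed client: the 'EX' argument form below matches ioredis

const redis = new Redis();

// generateDocs is docgen's own llm call, defined elsewhere
async function getDocs(repo: string, repoHash: string, path: string, promptVersion: string) {
  const key = `docs:${repoHash}:${path}:${promptVersion}`;

  // hit: redis answers, the model sleeps
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached);

  // miss: compute, then store for 24 hours
  const docs = await generateDocs(repo, path);
  await redis.set(key, JSON.stringify(docs), 'EX', 60 * 60 * 24);

  return docs;
}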
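building the key in one place makes it harder to forget a part. a sketch, assuming a hypothetical `buildDocsKey` helper: it rejects empty parts, so a missing prompt version fails loudly instead of silently sharing entries, and it encodes the path so a `:` inside a file name can't blur the boundaries between parts.

```typescript
// Hypothetical helper; the part names mirror the key parts listed above.
interface DocsKeyParts {
  repoHash: string; // changes whenever the code changes
  path: string; // scopes the answer to one file
  promptVersion: string; // bump when the prompt changes
}

function buildDocsKey(parts: DocsKeyParts): string {
  const named: Array<[string, string]> = [
    ["repoHash", parts.repoHash],
    ["path", parts.path],
    ["promptVersion", parts.promptVersion],
  ];
  for (const [name, value] of named) {
    if (value.trim() === "") {
      throw new Error(`cache key part "${name}" is empty`);
    }
  }
  // encodeURIComponent escapes ":" and "/", so the path cannot collide with the separators.
  return ["docs", parts.repoHash, encodeURIComponent(parts.path), parts.promptVersion].join(":");
}

const key = buildDocsKey({ repoHash: "abc123", path: "src/index.ts", promptVersion: "v3" });
console.log(key); // docs:abc123:src%2Findex.ts:v3
```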

when to cache — and when not to

good fit

repeated expensive reads
derived summaries and aggregations
rate-limit counters
session-ish state with a ttl

bad fit

one-off admin exports
user-specific secrets
anything where a stale answer carries legal or safety risk
balances that must be exact every millisecond

we cut repeated inference cost by ~35% on real usage. not because redis is magic — because most users don't change their repo every five minutes. they re-run docs while iterating on the same branch. cache hits are a product feature disguised as infrastructure.
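a number like that is only worth quoting if you count hits and misses yourself. a small sketch of one way to do it; `HitRateCounter` is a hypothetical name, and it assumes each hit replaces exactly one model call.

```typescript
// Hypothetical counter; call recordHit/recordMiss from the cache read path.
class HitRateCounter {
  private hits = 0;
  private misses = 0;

  recordHit(): void {
    this.hits += 1;
  }

  recordMiss(): void {
    this.misses += 1;
  }

  // Fraction of lookups served from cache; 0 when nothing has been recorded yet.
  hitRate(): number {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }

  // Model calls avoided, under the one-hit-equals-one-call assumption.
  callsAvoided(): number {
    return this.hits;
  }
}
```

if every model call costs roughly the same, the hit rate is also the fraction of inference spend avoided, minus whatever redis itself costs.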

ttl is a product decision

expiry isn't just infra. it's how wrong you're willing to be, for how long.

short (minutes) — near-live data, higher cost
medium (hours) — docgen default for completed runs
long (days) — stable public content, manual invalidation on deploy

shorten ttl too aggressively → cost goes up. lengthen without prompt versioning → quality complaints go up. tune both together.
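writing the tiers down as data keeps the decision reviewable. a sketch with illustrative numbers only: the three tiers come from the list above, the 24-hour medium value matches the snippet earlier, and the short and long values, plus the names `ContentKind` and `ttlSecondsFor`, are assumptions.

```typescript
type TtlTier = "short" | "medium" | "long";

// Illustrative values (assumptions); pick yours by how wrong each answer may be, and for how long.
const TTL_SECONDS: Record<TtlTier, number> = {
  short: 5 * 60, // minutes: near-live data, higher cost
  medium: 24 * 60 * 60, // hours: completed docgen runs
  long: 7 * 24 * 60 * 60, // days: stable public content
};

// Hypothetical content kinds, mapped onto the tiers described above.
type ContentKind = "nearLive" | "completedRun" | "stablePublic";

const TIER_FOR_KIND: Record<ContentKind, TtlTier> = {
  nearLive: "short",
  completedRun: "medium",
  stablePublic: "long",
};

function ttlSecondsFor(kind: ContentKind): number {
  return TTL_SECONDS[TIER_FOR_KIND[kind]];
}

console.log(ttlSecondsFor("completedRun")); // 86400
```

the ttl covers time-based staleness only. a prompt change is still handled by the prompt version in the key, not by waiting for entries to expire.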

my order now:

make it correct
measure what's actually slow
cache the hot path
document how entries expire
hero architecture waits until you know what hurts
