We had a cache in front of a paid API. It couldn't hit — not once, for weeks — and nothing told us. We were quietly paying twice for the same work.
The cache sat in our logo pipeline, in front of an image-generation API. Each call costs a few cents, and the same business often regenerates with the same inputs — a retry, a re-run, a tweak elsewhere in the flow. So we cached it: a small decorator that memoises a function's result to disk, keyed on its arguments. Same inputs in, cached image out, no second API call. The docstring even spelled out the intent: "the cache key does not include the job ID — results are shared across jobs."
The bug: one argument that's unique every call
The decorator did the obvious, general thing: hash every argument to build the key. That's the right default — a cache that silently ignores an argument is far scarier than one that keys on too much.
But the function's first parameter was job_id — a fresh UUID for every run. The docstring said the ID wasn't part of the key; the decorator had no way to know that. So every call hashed to a brand-new key:
key = sha256({
    "job_id": "4f2c81…",                 # unique every call, poisons the key
    "mode": "new",
    "name": "Acme Bakery",
    "colors": ["#1b5e20", "#aaff00"],
    "source": "9c41e2…",                 # content hash, what SHOULD key it
})
Each run got its own bucket, wrote its result, and never read anyone else's. A cross-run hit was impossible by construction, and the cache directory steadily filled with entries that would never be read again.
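To make the failure concrete, here is a minimal sketch of the effect, not the pipeline's actual decorator. The helper names `make_key`, `inputs_for` and `without_job_id` are hypothetical, the JSON-then-SHA-256 keying is an assumption about how such a key might be built, and the `source` value is a placeholder rather than a real content hash.

```python
import hashlib
import json
import uuid


def make_key(arguments: dict) -> str:
    """Hash a dict of arguments into a hex cache key (hypothetical helper)."""
    canonical = json.dumps(arguments, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def inputs_for(job_id: str) -> dict:
    # The same logical request every time; only the job ID changes.
    return {
        "job_id": job_id,
        "mode": "new",
        "name": "Acme Bakery",
        "colors": ["#1b5e20", "#aaff00"],
        "source": "placeholder-content-hash",  # placeholder, not a real hash
    }


def without_job_id(arguments: dict) -> dict:
    # Keep only the arguments that shape the output.
    return {k: v for k, v in arguments.items() if k != "job_id"}


first = inputs_for(str(uuid.uuid4()))
second = inputs_for(str(uuid.uuid4()))

# With the job ID in the key, two identical requests never share an entry.
keys_differ_with_job_id = make_key(first) != make_key(second)

# With the job ID dropped, the same two requests map to one entry.
keys_match_without_job_id = (
    make_key(without_job_id(first)) == make_key(without_job_id(second))
)
```

Both flags come out true: the per-call ID is the only thing separating the two keys.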
Why nothing caught it
A broken cache is a strange kind of bug because a cache fails open: defeated, it still returns correct results — just at full price. Almost everything else in a system fails loudly — a wrong query returns wrong data, a broken send throws. Here, every output was right. Every test passed. The clearest symptom was cost: each run a few cents more than it should have been, a number far too small to stand out on a bill.
It surfaced in a code review, not in monitoring — someone read the decorator, read the docstring's claim, and noticed the wrapper couldn't possibly honour it. The docstring described the design; the code shipped something else; and no test pinned the difference.
There was a second bug hiding behind the first, and it's the part that stuck with us: the cached value included raw image bytes, which the JSON serializer was quietly mangling into a string. If the cache had ever hit, the read would have returned corrupted data. Because no read ever happened, that path was never exercised — the 0% hit rate was concealing a 100% corruption rate.
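One way this kind of mangling can happen is sketched below. It assumes the serializer had a permissive fallback such as `default=str`, which is a guess for illustration and not something confirmed about the original code. The base64 helpers `encode_image` and `decode_image` are hypothetical names showing one common way to make bytes survive a JSON round-trip, not necessarily the fix that shipped.

```python
import base64
import json

image_bytes = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, as sample data

# Assumption: a fallback like default=str lets bytes through without an error.
stored = json.dumps({"image": image_bytes}, default=str)
loaded = json.loads(stored)["image"]

# The value read back is the textual repr of the bytes, not the bytes.
came_back_as_text = isinstance(loaded, str)


def encode_image(data: bytes) -> str:
    """Encode raw bytes as ASCII text that JSON can hold losslessly."""
    return base64.b64encode(data).decode("ascii")


def decode_image(text: str) -> bytes:
    """Reverse encode_image."""
    return base64.b64decode(text)


safe = json.dumps({"image": encode_image(image_bytes)})
restored = decode_image(json.loads(safe)["image"])
round_trip_ok = restored == image_bytes
```

Without the fallback, `json.dumps` raises `TypeError` on bytes, which would have been the louder and more useful failure.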
The fix: key on what shapes the output, carry the rest
The repair was a split. The public function keeps its signature and owns the per-call side effects — writing the image into this job's folder, recording metadata. The cached inner function does only the expensive API round-trip, and its arguments are exactly the things that shape the output: mode, business name, style hints, colours, and a content hash of the source image. The job ID never reaches it.
One wrinkle: the inner function still needs the source image bytes to make the API call, but bytes in the key would bloat it — and their identity is already captured by the content hash sitting next to them. So the decorator learned one convention: an argument whose name starts with an underscore is passed through to the function but excluded from the key.
@cacheable
def _generate_cached(mode, name, style, colors, source_hash, *, _source_bytes=None):
    ...  # the API call, keyed on everything except _source_bytes
And because the original bug was invisible by nature, the regression test asserts the one thing that matters: call the function twice with two different job IDs and identical inputs, and the API client mock must record exactly one call — while both jobs still get their own file on disk.
What we took away
Two rules of thumb came out of this:
- A cache key should contain what shapes the output: nothing more, nothing less. Anything per-call (IDs, timestamps, destination paths) splits identical work into separate entries; anything missing lets different inputs share one entry, so the cache serves a result computed for something else. If a large input is already represented by a hash in the key, pass the input itself alongside but keep it out of the key.
- Test the hit, not just the result. A cache is one of the few components where "all outputs correct" proves nothing. The only honest test counts calls to the expensive thing underneath.
And one habit: when a docstring makes a claim about behaviour ("X is not part of the key"), that claim is a test waiting to be written. Ours sat in prose for weeks, being false the whole time.