Our product runs an AI assistant that talks to small-business owners over WhatsApp. Each inbound message triggers a model call (sometimes two) that reads the conversation, answers, and decides what to do next. It works. But running an AI agent in production raises a question most software never has to ask: what does one conversation actually cost?
Not the monthly model bill — that's easy to read off an invoice and useless for decisions. The number we wanted was smaller and sharper: this conversation, with this customer, about this request — what did the model cost us to handle it?
The number was already in the response
Here's the slightly embarrassing part: we were throwing the answer away.
Every call to the model comes back with usage metadata: how many tokens went in, how many came out. We were reading the reply text off the response and discarding the rest. For our setup, pricing it is straightforward arithmetic: input tokens times the input rate, output tokens times the output rate. (For the model we use, that's roughly $0.30 per million tokens in and $2.50 per million out; the exact rates matter less than capturing the token counts at all.)
So the first move wasn't to measure something new. It was to stop discarding a measurement we were already being handed on every call. One small function: read the token counts, multiply each by its rate, return a {tokens, cost} dict alongside the reply.
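A minimal sketch of that function, assuming the rough rates quoted above. The function name, the constant names, and the shape of the usage metadata are illustrative assumptions, not our production code; real field names depend on the model provider.

```python
# Assumed rates in USD per million tokens (the rough figures quoted above).
INPUT_RATE_PER_M = 0.30
OUTPUT_RATE_PER_M = 2.50


def price_call(input_tokens: int, output_tokens: int) -> dict:
    """Turn the token counts from one model response into a {tokens, cost} dict."""
    cost = (
        input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    ) / 1_000_000
    return {
        "tokens": {"input": input_tokens, "output": output_tokens},
        "cost": cost,
    }


# Hypothetical usage metadata; the key names are assumptions.
usage = {"input_tokens": 812, "output_tokens": 96}
priced = price_call(usage["input_tokens"], usage["output_tokens"])
print(f"${priced['cost']:.4f}")  # prints $0.0005
```

Keeping the rates as named constants means a price change is a one-line edit, and the raw token counts travel with the cost so the figure can be recomputed later if needed.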
Put the cost on the work, not in a dashboard
The easy thing would be to log each call's cost and sum it in a metrics tool. But we didn't want a chart — we wanted the number attached to the thing it was spent on.
In our system the unit of work isn't the user, and it isn't the chat thread — it's the ticket: one intent, one piece of work. "Qualify this lead." "Make this change." "Set up a domain." A single customer can open several over time. So we accrue each call's cost onto the active ticket — a running total, incremented in the database as the conversation proceeds.
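One way the running total could be kept, sketched with SQLite from the standard library. The table layout, column names, and the `accrue_cost` helper are hypothetical; the point is only that the increment happens inside the UPDATE statement itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets ("
    " id INTEGER PRIMARY KEY,"
    " intent TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'open',"
    " llm_cost_usd REAL NOT NULL DEFAULT 0,"  # every new ticket starts at zero
    " llm_calls INTEGER NOT NULL DEFAULT 0)"
)


def accrue_cost(conn: sqlite3.Connection, ticket_id: int, cost_usd: float) -> None:
    """Add one model call's cost to the ticket's running total."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE tickets"
            " SET llm_cost_usd = llm_cost_usd + ?, llm_calls = llm_calls + 1"
            " WHERE id = ?",
            (cost_usd, ticket_id),
        )


# Open a ticket for one intent, then accrue two example call costs onto it.
ticket_id = conn.execute(
    "INSERT INTO tickets (intent) VALUES (?)", ("qualify_lead",)
).lastrowid
for call_cost in (0.0004836, 0.000897):
    accrue_cost(conn, ticket_id, call_cost)

total, calls = conn.execute(
    "SELECT llm_cost_usd, llm_calls FROM tickets WHERE id = ?", (ticket_id,)
).fetchone()
print(f"{calls} calls, ${total:.4f}")
```

Incrementing in SQL rather than reading the total, adding in application code, and writing it back avoids lost updates if two calls for the same ticket finish close together.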
That choice — cost per intent, not per person — is the one we'd defend. "This customer has cost us $X over their lifetime" is too broad to act on. "This kind of request costs about this much to handle" is an operating number — it tells you which flows are expensive, which are cheap, and whether a feature pays for itself. Tracking on the ticket gives you that, and it means every new ticket starts at zero. The cost becomes a property of the work, not something you have to go assemble later.
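To illustrate how per-ticket costs turn into a per-intent operating number, here is a small sketch. The ticket list is made-up sample data and `cost_by_intent` is a hypothetical helper, not a figure or function from our system.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample data: (intent, final LLM cost in USD) for resolved tickets.
resolved_tickets = [
    ("qualify_lead", 0.0018),
    ("qualify_lead", 0.0022),
    ("set_up_domain", 0.0090),
]


def cost_by_intent(tickets):
    """Average the final cost of resolved tickets, grouped by intent."""
    costs = defaultdict(list)
    for intent, cost in tickets:
        costs[intent].append(cost)
    return {intent: mean(values) for intent, values in costs.items()}


averages = cost_by_intent(resolved_tickets)
for intent, avg in sorted(averages.items()):
    print(f"{intent}: ${avg:.4f} per ticket")
```

Because each ticket already carries its own total, this is a plain group-and-average over tickets, with no need to join back to per-call logs.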
Freeze it when the work is done
The last piece is the one we like most. When a ticket resolves — the lead's qualified, the change is logged, the bot or a human marks it done — we fold the final cost into the ticket's audit trail. The same event that records "resolved" also records "LLM cost $0.0018, 3 calls."
It doesn't matter who closes it — the bot, an operator, a cleanup job. The cost rides along automatically, because it's stamped at the moment of the state change rather than computed after the fact. Months later, a resolved ticket still carries exactly what it cost to handle, sitting right next to what happened — no dashboard to cross-reference, no number that's drifted out of date.
turn 1 → 812 in / 96 out → ticket total $0.0005
turn 2 → 1,240 in / 210 out → ticket total $0.0016
turn 3 → 540 in / 60 out → ticket total $0.0018
resolved · LLM cost $0.0018 (3 calls) ← written into the audit trail
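The freeze step might look something like the sketch below. The `Ticket` class, the `resolve` function, and the audit event's field names are assumptions for illustration; what matters is that the cost is written by the same code path that records the state change, whoever triggers it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Ticket:
    intent: str
    status: str = "open"
    llm_cost_usd: float = 0.0
    llm_calls: int = 0
    audit_trail: list = field(default_factory=list)


def resolve(ticket: Ticket, closed_by: str) -> dict:
    """Mark the ticket resolved and stamp its final cost into the same audit event."""
    if ticket.status == "resolved":
        raise ValueError("ticket is already resolved")
    ticket.status = "resolved"
    event = {
        "event": "resolved",
        "closed_by": closed_by,  # bot, operator, or cleanup job: same path
        "at": datetime.now(timezone.utc).isoformat(),
        "llm_cost_usd": ticket.llm_cost_usd,  # copied at the moment of the state change
        "llm_calls": ticket.llm_calls,
    }
    ticket.audit_trail.append(event)
    return event


# A ticket whose running total matches the trace above.
ticket = Ticket(intent="qualify_lead", llm_cost_usd=0.0018, llm_calls=3)
event = resolve(ticket, closed_by="cleanup_job")
print(f"{event['event']} · LLM cost ${event['llm_cost_usd']:.4f} ({event['llm_calls']} calls)")
# prints: resolved · LLM cost $0.0018 (3 calls)
```

Copying the value into the event, rather than pointing back at the ticket's live total, is what keeps the recorded figure from drifting if anything touches the ticket afterwards.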
What we took away
The instinct with cost is to reach for a billing dashboard — a big aggregate number, far from any decision. The more useful version is small and local: capture the cost the model already reports, attach it to the unit of work, and freeze it when the work ends. Then "what did that cost?" stops being a query you run. It's already written down, on the thing itself.