// dev.blog  —  jltdpxl

The Token Ledger: Why Efficiency Is the Next Engineering Metric

Using AI tokens is a lot like ordering tapas. Each plate is so reasonable. A few of these, a couple of those — it barely registers as a decision. You're not splurging; you're just working. Prompt, response, iterate. Another context window, another agent run, another pass at the problem. Each one feels like a small thing.

Then the bill comes.

I think the implications for how we organize engineering work (who should be doing what kind of work, with which tools) are bigger than most people are treating them.

// Token Maxing and the Reckoning

There is a term gaining traction in certain corners of the developer community: token maxing. It describes what happens when individuals or teams run large, complex, deeply iterative sessions with frontier models — the kind of multi-hour, multi-agent, context-heavy work that agentic coding tools make possible. The per-session costs in these workflows aren't pocket change. They can run into hundreds of dollars for a single sustained session. Multiply that across a team, across a week, and the number stops being an infrastructure expense and starts being a budget line.

The data points are starting to surface. Uber burned through their entire 2026 AI budget in four months after rolling out Claude Code to roughly 5,000 engineers in December 2025. By April, monthly API costs were running $500 to $2,000 per engineer, 70% of code commits were AI-driven, and the COO was publicly asking whether any of it was actually worth it. Not because engineers were being reckless — because that's what serious AI adoption at scale genuinely costs. The budget was gone before summer.
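To put those per-engineer figures in fleet terms, here is a back-of-the-envelope sketch. It takes the numbers quoted above at face value and assumes every one of the roughly 5,000 engineers lands somewhere in the quoted range, which is a simplification. The function name is mine, purely for illustration.

```python
# Figures quoted in the paragraph above, taken at face value (not independently verified).
ENGINEERS = 5_000
MONTHLY_COST_PER_ENGINEER = (500, 2_000)  # USD, low and high end of the quoted range


def monthly_fleet_cost(engineers, per_engineer_range):
    """Return the (low, high) monthly spend if every engineer falls in the range."""
    low, high = per_engineer_range
    return engineers * low, engineers * high


low, high = monthly_fleet_cost(ENGINEERS, MONTHLY_COST_PER_ENGINEER)
print(f"Monthly:    ${low:,} to ${high:,}")
print(f"Annualized: ${low * 12:,} to ${high * 12:,}")
# Monthly:    $2,500,000 to $10,000,000
# Annualized: $30,000,000 to $120,000,000
```

Even the low end of that range is a number someone has to sign off on.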

Then there's Microsoft, which made news for canceling internal Claude Code licenses after token-based billing made the service too expensive to justify at scale. The irony is almost too much. Microsoft is the company that popularized the term "dogfooding": the practice of using your own products internally to make them better. They built Copilot. They are Copilot's most obvious enterprise advocate. And yet their own teams had adopted a competing tool widely enough that the bill forced a pullback. If a company with Microsoft's resources, one that sells its own assistant, won't absorb the bill for the other guy's product, it's a signal worth taking seriously.

Most businesses are not built to absorb these numbers. Most projects don't justify them. And yet the default posture right now is to hand engineers access to the most capable frontier models, encourage them to use AI aggressively, and think about the cost later. The cost later is arriving.

This isn't an argument against using powerful models. It's an argument for being deliberate about when and why you do.

// Two Engineers, Two Very Different Bills

Consider a scenario that I think is coming to a lot of engineering teams within the next year. You have two engineers on the same team. One is your top committer — shipping features, architecting systems, closing the hard tickets. Their monthly AI spend: five thousand dollars. The other engineer spends most of their time on bugs. Routine maintenance. The kind of work that has historically been handled by junior engineers who are still getting their footing. Their monthly AI spend: ten thousand dollars.

In the old model, you would have looked at this and said: hire a junior engineer to handle the bugs, free up the senior engineer's time, keep costs down. That calculus was simple because the cost inputs were simple. The senior engineer's salary was the variable; the junior work was cheap to staff for. Now there's a new variable in the equation, and it doesn't distribute evenly across the kinds of work people do.

When AI spend becomes visible and attributable — and it will — someone in leadership is going to start asking what they're getting for that ten thousand dollars a month. And the answer "bug fixes that a junior engineer with a smaller model could have handled at a tenth of the cost" is going to produce a very uncomfortable meeting.

// The Mismatch Between Capability and Task

Part of the issue is that we haven't really developed the discipline to match model capability to task requirement. Right now, most organizations hand everyone access to the same frontier model and let them run. It's the path of least resistance. It avoids the complexity of tiering, the friction of provisioning different tools for different roles, the headache of explaining to a product manager why they have to use a different interface than the engineers.

But consider what a product manager is actually doing with an LLM most of the time. Writing PRDs. Drafting specs. Summarizing research. Cleaning up stakeholder emails. These are language tasks — real, valuable, worth doing well — but they do not require the same capabilities that an engineer needs when they're asking a model to reason through a distributed systems problem or debug a non-obvious memory issue across a complex codebase.

The marketer writing product copy is doing something different still. And the support analyst classifying tickets. And the recruiter drafting job descriptions. All of them are using tokens. None of them necessarily need the same model.

Right now, frontier-model budgets are subsidizing all of this work. In the near future, the question of whether that subsidy is justified for each class of work is going to get a much harder look.

// The Coming Stratification

What I expect to see, and am already seeing early signals of, is a tiering of AI tooling by job function and by task class. Not because companies want to create a two-tier system, but because the economics will push them there.

Open source models have gotten remarkably capable. Running a well-tuned local model on company hardware for a use case that doesn't require frontier reasoning isn't a compromise anymore; for a lot of tasks, it's the right engineering decision. The marginal cost of each token drops to near zero. The model runs on infrastructure you already own. The data doesn't leave your environment. For copy generation, document summarization, ticket routing, and HR drafts, the case for running locally is getting hard to argue against.

Frontier models still win on the hard problems. The ambiguous architectural decision. The novel debugging scenario. The research synthesis that requires holding ten competing ideas in tension and finding the thread. These are the tasks where the capability premium is justified because the output quality difference actually changes outcomes.

The organizations that figure out how to route work to the right level of the stack — and charge the cost appropriately — are going to have a structural advantage over the ones that just spray frontier tokens at everything.
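As a rough illustration of what "routing work to the right level of the stack" could look like, here is a minimal sketch. Everything in it is an assumption for the sake of the example: the tier names, the per-token prices (placeholders, not real vendor pricing), the task-to-tier mapping, and the `route` function itself. Local inference is treated as zero marginal cost for simplicity, which ignores hardware and power.

```python
from dataclasses import dataclass

# Hypothetical tiers and prices in USD per million tokens. Placeholder values only.
TIER_PRICE_PER_MTOK = {"local": 0.0, "mid": 1.0, "frontier": 15.0}

# Hypothetical mapping from task class to the cheapest tier assumed to be adequate.
TASK_TIER = {
    "copy_generation": "local",
    "document_summary": "local",
    "ticket_routing": "local",
    "hr_draft": "local",
    "bug_fix": "mid",
    "architecture_decision": "frontier",
    "novel_debugging": "frontier",
    "research_synthesis": "frontier",
}


@dataclass(frozen=True)
class Route:
    task_class: str
    tier: str
    estimated_cost_usd: float


def route(task_class: str, estimated_tokens: int) -> Route:
    """Pick a tier for a task class and estimate the cost to charge back."""
    # Unknown task classes fall back to the frontier tier: safe on quality, expensive.
    tier = TASK_TIER.get(task_class, "frontier")
    cost = estimated_tokens / 1_000_000 * TIER_PRICE_PER_MTOK[tier]
    return Route(task_class, tier, round(cost, 4))


print(route("ticket_routing", 2_000_000))
print(route("novel_debugging", 2_000_000))
```

The interesting design decision is the fallback. Defaulting unknown work to the frontier tier protects quality but quietly recreates the "spray tokens at everything" problem if the mapping is never maintained.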

// Why Engineers Are Positioned Well Here

I have written before about why I think engineers aren't going away. I want to add a layer to that argument that I think the token economy makes sharper.

Engineers think about efficiency. Not as a performance, not because they were told to — because it's baked into how good engineering is done. You don't write a function that re-queries the database five times if once will do. You don't pull a full data set when you need a row count. You instrument, you measure, you look for where the cost is going and whether the output justifies it. This is not a new instinct. It is one of the core cognitive patterns of the discipline.

That instinct is about to become extremely relevant to a problem that most of the rest of the organization is not equipped to think about carefully. Token spend is an optimization problem. It has inputs, outputs, cost functions, and tradeoffs. Engineers already know how to hold that kind of problem in their heads.

The engineers who start applying that lens to their own AI usage — measuring what they're spending, understanding what's actually producing value, developing intuition for when a cheaper model is sufficient and when it isn't — are building a skill that is going to matter a lot as token costs become a first-class metric.
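Measuring this doesn't need much machinery to start. The sketch below is one way a personal or team ledger might aggregate spend by task class and relate it to a crude value signal. The `Usage` record, the `accepted` flag (did the output actually ship or get merged), and the `summarize` function are all hypothetical; a real setup would pull cost data from whatever billing export the tool provides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    engineer: str
    task_class: str
    model: str
    cost_usd: float
    accepted: bool  # assumed value signal: did the result ship or get merged?


def summarize(entries):
    """Aggregate spend per (engineer, task_class) and compute cost per accepted result."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "runs": 0, "accepted": 0})
    for e in entries:
        bucket = totals[(e.engineer, e.task_class)]
        bucket["cost_usd"] += e.cost_usd
        bucket["runs"] += 1
        bucket["accepted"] += int(e.accepted)

    report = {}
    for key, b in totals.items():
        # None means money was spent and nothing was accepted.
        per_accepted = b["cost_usd"] / b["accepted"] if b["accepted"] else None
        report[key] = {**b, "cost_per_accepted_usd": per_accepted}
    return report


# Made-up entries, for illustration only.
entries = [
    Usage("ana", "bug_fix", "frontier", 4.0, False),
    Usage("ana", "bug_fix", "frontier", 6.0, True),
    Usage("ben", "architecture", "frontier", 12.0, True),
]
report = summarize(entries)
for key, row in report.items():
    print(key, row)
```

Cost per accepted result is a blunt instrument, but it moves the conversation from "how much did you spend" to "what did the spend produce".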

// Token Efficiency as a Professional Skill

I think we are somewhere between six and twelve months away from token efficiency showing up in engineering performance conversations. Not as a punishment metric ("you spent too much, you're flagged") but as a genuine signal of professional craft. The same way we talk about clean code, or test coverage, or system design: as evidence that someone has internalized the full cost of the thing they're building and is making deliberate decisions about it.

The engineer who can do in five thousand dollars what their peer is doing in ten thousand is demonstrating something real. Not just frugality — judgment. They know what level of capability the task requires. They know how to structure a prompt to get the needed output without burning context on noise. They know when to break a large problem into smaller ones that run cheaper. They know when the quick answer from a smaller model is enough and when it isn't.
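One concrete form of that judgment is trying the cheap option first and escalating only when it falls short. Here is a minimal sketch of that pattern, under stated assumptions: a "model" is any callable that takes a prompt and returns text (the two below are stubs, not real clients), and `good_enough` is a caller-supplied check. The function name is hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Stand-in type for a model client: takes a prompt, returns text.
Model = Callable[[str], str]


def answer_with_escalation(
    prompt: str,
    tiers: Sequence[Tuple[str, Model]],
    good_enough: Callable[[str], bool],
) -> Tuple[str, str]:
    """Try tiers from cheapest to most capable; return (tier_name, answer).

    If no tier passes the check, the last tier's answer is returned as-is.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    name, answer = "", ""
    for name, model in tiers:
        answer = model(prompt)
        if good_enough(answer):
            break
    return name, answer


# Stub models for illustration only.
def small_model(prompt: str) -> str:
    return "not sure"


def large_model(prompt: str) -> str:
    return "root cause: stale cache key"


tier, text = answer_with_escalation(
    "Why does the session lookup intermittently fail?",
    [("small", small_model), ("large", large_model)],
    good_enough=lambda a: a != "not sure",
)
print(tier, "->", text)
```

Two caveats. An escalated request pays for the failed cheap attempt as well, so this only saves money when the small model succeeds often enough. And the `good_enough` check is where the real judgment lives; the loop is the easy part.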

These are not trivial skills. They're also not particularly mysterious ones for someone who already thinks carefully about resource utilization and system efficiency. They are, in a real sense, just the engineering discipline applied to a new resource.

// What Comes Next

The organizations that handle this well will build classification frameworks for their AI usage. Not rigid bureaucracies, but practical shared understanding: here are the task types that warrant frontier models, here are the ones that can run on something cheaper, here is how we think about the tradeoff when it's ambiguous. This is the same kind of conversation mature engineering organizations have had about cloud infrastructure tiers, about caching strategies, about make-versus-buy. The vocabulary is new. The shape of the problem is not.

For individual engineers, the implication is: start paying attention now. Not to the absolute cost — that varies too much by tool and model — but to the pattern. What kinds of work are you reaching for the heaviest tool to do? Which of those actually required it? Where did a lighter touch produce the same result? The instinct for this doesn't come from reading about it; it comes from noticing as you go.

The token ledger is coming. It isn't a threat to the engineers who already think in terms of efficiency. For them, it's just another metric to optimize — and one where their instincts happen to be exactly the right shape for the problem.