The token party is over

At comparable capability, completing the same task can cost twenty times more on one model than another. I treat model choice as a financial control, and I think your CFO should too.

Published: 2 July 2026
Updated: 3 July 2026
Reading time: 3 min

For two years, enterprise AI spend was small enough to hide inside software budgets. That period is ending: agent workloads are billed by the token, token consumption scales with use, and organisations that scaled before instrumenting are discovering the bill the hard way. By mid-2026, per-user spend caps had reached even AI-forward companies: Uber was reported to have capped AI spend at roughly US$1,500 per employee.

The underlying economics are easy to state and widely missed. On published prices as at mid-2026, models of comparable capability differ in what they charge to complete the same task by a factor of roughly twenty, and the extremes run wider. Capability and cost have decoupled: paying more no longer reliably buys better output for a given task.
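
The arithmetic behind that spread is short enough to sketch. The prices and token counts below are made up for illustration and are not any vendor's real rates; ModelPrice and cost_per_task are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card, in US dollars per million tokens."""
    name: str
    usd_per_m_input: float
    usd_per_m_output: float


def cost_per_task(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task, given the tokens it reads and writes."""
    return (input_tokens * price.usd_per_m_input
            + output_tokens * price.usd_per_m_output) / 1_000_000


# Illustrative numbers only: two models assumed to clear the same quality bar.
model_a = ModelPrice("model-a", usd_per_m_input=10.0, usd_per_m_output=40.0)
model_b = ModelPrice("model-b", usd_per_m_input=0.5, usd_per_m_output=2.0)

# One task: 20,000 tokens read, 2,000 tokens written.
cost_a = cost_per_task(model_a, 20_000, 2_000)
cost_b = cost_per_task(model_b, 20_000, 2_000)
spread = cost_a / cost_b

print(f"{cost_a:.3f} vs {cost_b:.3f} dollars per task, spread {spread:.0f}x")
```

The unit that matters is dollars per task, not dollars per token: a model with a cheaper price card can still cost more per task if it needs more tokens to finish the work.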

Seats and tokens are different bills

The first budgeting error I see is treating an AI subscription like software. A seat fee is a licence to access; tokens are the metered fuel the work actually burns. A flat plan is a throughput bet that pays off only above a usage threshold; metered pricing scales with use in both directions. Most organisations hold both without knowing which workloads sit on which, and the reconciliation between what you pay per person and what the work costs per task is exactly the number nobody tracks.
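
The threshold in that throughput bet can be worked out per user. This is a minimal sketch with hypothetical figures (a 100-dollar seat, work that meters at 0.25 dollars a task); it assumes the flat plan has no usage ceiling of its own, which real plans often do.

```python
def break_even_tasks(seat_fee_usd: float, metered_cost_per_task_usd: float) -> float:
    """Tasks per month above which a flat seat is cheaper than metered use."""
    if metered_cost_per_task_usd <= 0:
        raise ValueError("metered cost per task must be positive")
    return seat_fee_usd / metered_cost_per_task_usd


def cheaper_plan(tasks_per_month: int, seat_fee_usd: float,
                 metered_cost_per_task_usd: float) -> str:
    """Name the cheaper billing mode for one user's monthly volume."""
    metered_bill = tasks_per_month * metered_cost_per_task_usd
    return "seat" if seat_fee_usd < metered_bill else "metered"


# Hypothetical figures, chosen only to make the threshold visible.
threshold = break_even_tasks(100.0, 0.25)     # 400 tasks a month
light_user = cheaper_plan(120, 100.0, 0.25)   # metered: 30 dollars beats the seat
heavy_user = cheaper_plan(900, 100.0, 0.25)   # seat: 225 dollars metered loses

print(threshold, light_user, heavy_user)
```

Run against real volumes, the same comparison shows which workloads sit on the wrong side of the threshold.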

The second error is assuming usage is uniform. It never is. Token consumption across users spans orders of magnitude, and one heavy user, usually doing something entirely legitimate, can consume a department's monthly budget in days. Without per-user visibility, the first sign is the invoice.
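
Per-user visibility does not need much machinery. The sketch below assumes a usage log of (user, cost) records is available from your provider or gateway; the log, the budget, and the 25 per cent alert share are all hypothetical.

```python
from collections import defaultdict


def spend_by_user(usage_log):
    """Sum metered spend per user from (user, cost_usd) records."""
    totals = defaultdict(float)
    for user, cost_usd in usage_log:
        totals[user] += cost_usd
    return dict(totals)


def users_over_share(totals, budget_usd, share=0.25):
    """Users whose spend so far exceeds a given share of the shared budget."""
    limit = budget_usd * share
    return sorted(user for user, spent in totals.items() if spent > limit)


# Hypothetical log, one week into the month, against a 1,000-dollar budget.
log = [("ana", 4.0), ("ben", 6.5), ("chi", 310.0), ("ana", 3.5), ("chi", 295.0)]
totals = spend_by_user(log)
flagged = users_over_share(totals, budget_usd=1_000.0)

print(totals, flagged)
```

Here one user has spent 605 dollars of a 1,000-dollar budget in a week while the other two are under 10 dollars each. A flag like this is a prompt for a conversation, not a verdict: the heavy user is usually doing legitimate work.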

Cost is a governance column

I run cost as one of the four columns every AI deployment stands on, alongside model, security, and guardrails. The discipline is unglamorous:

  1. Profile the workload before choosing the plan: tasks per day, tokens per task, latency tolerance.
  2. Pin the default model per task type. Route everyday work to the model that clears the quality bar at the lowest cost per task; reserve frontier capacity for work that needs it.
  3. Set per-user caps before scale, not after the first bill.
  4. Instrument cost per task, so every workflow's unit economics stay visible as models and prices move underneath it.
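
Steps 2 and 3 reduce to two small rules. What follows is a sketch under stated assumptions, not a reference implementation: the quality scores are assumed to come from your own evaluation set, and the candidate names, costs, and cap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    quality: float        # score on your own evaluation set, 0 to 1 (assumed)
    cost_per_task: float  # measured dollars per task (assumed)


def pin_default(candidates, quality_bar):
    """Cheapest candidate that clears the quality bar; None if nothing does."""
    passing = [c for c in candidates if c.quality >= quality_bar]
    return min(passing, key=lambda c: c.cost_per_task, default=None)


def within_cap(spent_usd, next_task_usd, cap_usd):
    """True if the user can run the next task without breaching the cap."""
    return spent_usd + next_task_usd <= cap_usd


# Hypothetical evaluation results for one task type.
summarise = [
    Candidate("frontier", quality=0.95, cost_per_task=0.30),
    Candidate("mid", quality=0.91, cost_per_task=0.04),
    Candidate("small", quality=0.78, cost_per_task=0.01),
]
default = pin_default(summarise, quality_bar=0.90)
allowed = within_cap(spent_usd=49.98, next_task_usd=default.cost_per_task, cap_usd=50.0)

print(default.model, allowed)
```

The routing rule picks "mid": it clears the bar at about an eighth of the frontier model's cost per task, while "small" is cheaper but fails the bar. The cap check runs before the task rather than after the invoice.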

The fourth step is the one that compounds. Prices change monthly; a workflow instrumented for cost per task re-optimises with a routing change. A workflow priced by assumption re-optimises with a crisis.
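
One way to make that re-optimisation cheap is to log tokens per task rather than dollars, so the same history can be re-priced whenever the price table moves. A minimal sketch, assuming both models have already cleared the quality bar; the records and both price tables are invented for illustration.

```python
from statistics import mean


def unit_cost_by_model(records, prices):
    """Average dollars per task for each model under a given price table.

    records: (model, input_tokens, output_tokens) per completed task.
    prices:  model -> (usd per million input tokens, usd per million output tokens).
    """
    costs = {}
    for model, tokens_in, tokens_out in records:
        p_in, p_out = prices[model]
        usd = (tokens_in * p_in + tokens_out * p_out) / 1_000_000
        costs.setdefault(model, []).append(usd)
    return {model: mean(values) for model, values in costs.items()}


def cheapest(unit_costs):
    """Model with the lowest average cost per task."""
    return min(unit_costs, key=unit_costs.get)


# Hypothetical token logs from running the same workflow on two approved models.
records = [("a", 10_000, 1_000), ("a", 12_000, 1_000), ("b", 15_000, 2_000)]
june = {"a": (2.0, 8.0), "b": (3.0, 12.0)}
july = {"a": (2.0, 8.0), "b": (1.0, 4.0)}  # model b cuts its prices

route_june = cheapest(unit_cost_by_model(records, june))
route_july = cheapest(unit_cost_by_model(records, july))

print(route_june, route_july)
```

With the June table the workflow routes to model a; after model b's price cut the same logs route it to model b. Nothing about the workflow changed except the price table.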

The question for the board

Ask for one comparison: what does our most common AI-carried task cost, and what would it cost on the cheapest model that clears our quality bar? If the answer exists, cost is governed. If it does not, the gap between those two numbers is being paid monthly, and nobody has decided to pay it.


Sources

  • Published model pricing, June 2026 (Anthropic, OpenAI, Google, DeepSeek published rates): cost-per-task spreads of roughly 20× at comparable capability tiers; wider at the extremes.
  • TechCrunch, June 2026: Uber introduces per-employee AI spend caps (~US$1,500/user), reported as part of a broader enterprise shift to metered AI cost controls.

Book a scoping conversation.

A free working conversation on where you stand, and the first move that fits.
