AI everywhere? The cost of the token has already been chosen for you


Generative AI enters the token economy: costs, ROI, carbon and governance become the real decision criteria.

On June 1, 2026, GitHub Copilot is switching all of its plans to pay-per-use billing. For IT departments, this is not just another price update: it is the end of a choice. The model of “unlimited access to AI for a flat-rate subscription” has come to an end. Each token consumed will now be measured, billed back and capped by a monthly credit. With this shift, the illusion of free AI, deployable everywhere without trade-offs, collapses. The token economy is taking over.

This shift coincides with a series of converging signals that CIOs can no longer downplay. On May 22, 2026, Fortune revealed, citing internal Microsoft sources, that the agentic use of generative AI was, in certain workflows, more expensive than a human developer. Amazon has dubbed the phenomenon “toxenmaxx”, Meta “Claudeonomics”. Uber’s CTO publicly admitted that the company had exhausted its entire 2026 “AI tools for developers” budget in four months. Goldman Sachs, for its part, anticipates a 24-fold increase in the volume of tokens consumed by 2030, reaching around 120 quadrillion per month. On the supply side, Microsoft is now running AI capex at an annualized rate of around $150 billion, including $25 billion directly linked to the rising cost of memory and chips. The idea of an AI whose marginal cost would mechanically tend towards zero has been buried by its own operators.

The wall is not technological, it is economic

A study published in spring 2026 by Microsoft Research quantifies precisely what is changing. An agentic coding task (an agent that reads a codebase, modifies multiple files, runs tests, iterates) consumes up to a thousand times more tokens than a simple code chat. And, above all, this cost is intrinsically unpredictable: two executions of the same task can diverge by a factor of 30.

As for the rate cards themselves, they are less stable than they appear. Anthropic has maintained its nominal prices with the release of Opus 4.7, but the model’s new tokenizer can produce up to 35% more tokens for identical input text. The bill goes up without any price increasing. The result is mechanical: “putting AI everywhere” is no longer a matter of ambition but of financial exposure. For an executive committee, this is a new type of budget-drift risk: opaque to financial controllers, hard for procurement to contain, and dangerously absent from the classic financial models of an IT department.
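The tokenizer effect described above is simple arithmetic, and a minimal Python sketch makes it concrete. The workload size and the price per million tokens are hypothetical values chosen for illustration; only the 35% upper bound comes from the text.

```python
def monthly_bill(tokens_before: int, price_per_million: float,
                 token_inflation: float = 0.0) -> float:
    """Bill for the same text when the tokenizer emits
    (1 + token_inflation) times as many tokens at an unchanged price."""
    billed_tokens = tokens_before * (1 + token_inflation)
    return billed_tokens * price_per_million / 1_000_000


# Hypothetical workload and price, for illustration only.
TOKENS = 10_000_000   # tokens per month under the old tokenizer (assumed)
PRICE = 5.0           # currency units per million tokens (assumed)

before = monthly_bill(TOKENS, PRICE)
after = monthly_bill(TOKENS, PRICE, token_inflation=0.35)  # upper bound cited above
print(f"before: {before:.2f}, after: {after:.2f}, "
      f"increase: {after / before - 1:.0%}")
```

The nominal price never moves in this sketch; only the token count does, which is why such a drift does not show up in a rate-card comparison.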

The illusion of productivity

Added to this economic pressure is a truth that software vendors have long underestimated. The DORA 2025 report published by Google Cloud, based on nearly 5,000 responses from software professionals, confirms that 90% of developers now use AI on a daily basis, and that 80% of them report an individual productivity gain. But organizational indicators (deployment frequency, lead time, production failure rate, recovery time) remain broadly stable. Worse: delivery stability is negatively correlated with AI adoption in organizations whose processes were not mature before the tool was introduced. Gartner, for its part, anticipates that more than 40% of agentic AI projects will be abandoned before the end of 2027 for lack of clear ROI, governance and integration.

DORA probably offers the most accurate formula: AI is an amplifier. It multiplies the strengths of well-equipped organizations and accelerates the dysfunctions of the others. In other words, the “accelerator” effect that vendors sell does not exist in absolute terms: it exists only if the environment is ready to capture it. Deploying AI in a degraded value chain means paying more to produce faster what was already not working.

The new equation: tokens, carbon, team

This is where the conversation about AI needs to shift. An AI project is no longer measured by its licensing cost, or even by its token cost alone. It is measured by the sum of three variables that are now inseparable.

First, the cost of inference. It is a variable to be budgeted like industrial consumption, not like a SaaS subscription. This requires detailed observability of the tokens consumed by use case, by team and by project, and a FinOps discipline specific to AI.
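As a rough illustration of what observability by use case, by team and by project can mean in practice, here is a minimal Python sketch of a usage ledger. The `UsageEvent` structure, the field names and the two prices are assumptions made for this example, not the format of any particular provider.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices, in currency units per million tokens.
PRICE_INPUT = 3.0
PRICE_OUTPUT = 15.0


@dataclass(frozen=True)
class UsageEvent:
    """One model call, tagged with the dimensions to report on."""
    team: str
    project: str
    use_case: str
    input_tokens: int
    output_tokens: int


def event_cost(event: UsageEvent) -> float:
    """Cost of a single call at the assumed prices."""
    return (event.input_tokens * PRICE_INPUT
            + event.output_tokens * PRICE_OUTPUT) / 1_000_000


def cost_by(events: list[UsageEvent], dimension: str) -> dict[str, float]:
    """Total cost grouped by 'team', 'project' or 'use_case'."""
    totals: dict[str, float] = defaultdict(float)
    for event in events:
        totals[getattr(event, dimension)] += event_cost(event)
    return dict(totals)


# Illustrative events only; real data would come from API usage logs.
events = [
    UsageEvent("platform", "billing", "test-generation", 2_000_000, 500_000),
    UsageEvent("support", "itsm", "ticket-triage", 1_000_000, 100_000),
    UsageEvent("platform", "billing", "code-chat", 100_000, 20_000),
]

for dimension in ("team", "use_case"):
    print(dimension, cost_by(events, dimension))
```

The point of the design is that every call carries its attribution tags at the moment it is made; reconstructing them afterwards from an aggregate invoice is usually not possible.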

Next, the carbon footprint. The International Energy Agency estimates that data centers will consume 945 TWh in 2030, compared to 415 TWh in 2024, or more than 1.7% of global electricity production, a share that is growing fast. ADEME observes that data centers already account for 46% of the carbon footprint of France’s digital sector in 2025, compared to 16% in 2020. AFNOR has published a reference framework for frugal AI that is no longer optional: it is becoming the minimum tool for framing a corporate AI project. Carbone 4 points out that a generative query consumes ten to fifteen times more than a traditional web search, an order of magnitude that makes any “all-AI, no trade-offs” policy obsolete.

Finally, the cost and time of the human team. The DORA report is unambiguous: the time it takes for an organization to actually learn to exploit AI should be budgeted as an investment in its own right, not absorbed by reported individual productivity. Believing that an agentic assistant compensates for an immature organization is magical thinking.

These three variables add up. They do not offset one another.

Industrialize: moving from “everywhere” to “in the right place”

The key point is to abandon the reflex of uniform rollout, AI everywhere, and move to an allocation logic. In practice, five questions make it possible to arbitrate each use case before signing a contract or scaling up a POC.

Is the volume sufficient to amortize the investment? Generating unit tests on a legacy codebase of several million lines, classifying 50,000 ITSM tickets per month, continuously translating a multi-country backlog: yes, AI pays for itself quickly. Brainstorming a strategy for a niche market that the team revisits twice a year: no, the cost/value ratio is not tenable and the risk of hallucination is not offset.

Is the necessary model flagship or frugal? Sorting tickets into five categories, extracting the clauses from a standard contract, summarizing a meeting report: a small model hosted internally is enough, for a few cents per session. Summarizing a 300-page call for tenders in four languages to produce a qualification file: a flagship model is justified, but for the file itself, not for each intermediate review.

Does the task tolerate asynchrony? Reconciling a product repository every night, generating a thousand descriptive sheets for a catalog, scoring CVs during off-peak hours: batch processing halves the cost at Anthropic as at OpenAI. Serving a recommendation in an e-commerce checkout or an assistant in a call center: real time is required, at full price, and should therefore be reserved for cases where the value really justifies it.
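The batch versus real-time trade-off can be put into numbers with a short Python sketch. The 50% discount reflects the halving mentioned above; the request volume, tokens per request and price are hypothetical.

```python
def workload_cost(requests: int, tokens_per_request: int,
                  price_per_million: float, batch: bool = False,
                  batch_discount: float = 0.5) -> float:
    """Cost of a workload, optionally run through a discounted batch mode.

    batch_discount=0.5 models the halving described in the text;
    all other figures passed in below are assumptions.
    """
    full_price = requests * tokens_per_request * price_per_million / 1_000_000
    return full_price * (1 - batch_discount) if batch else full_price


# Hypothetical nightly catalog job: 1,000 product sheets, 2,000 tokens each.
realtime = workload_cost(1_000, 2_000, price_per_million=10.0)
batched = workload_cost(1_000, 2_000, price_per_million=10.0, batch=True)
print(f"real time: {realtime:.2f}, batch: {batched:.2f}")
```

The saving is only available to workloads that can wait for their results, which is why the question has to be asked per use case rather than per platform.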

Is agentic orchestration really necessary? An autonomous agent that reads the codebase, modifies multiple files, runs tests and iterates: that is powerful, and it is what consumes up to a thousand times more tokens than a code chat, according to Microsoft Research. Yet line-by-line completion or a conversational exchange with a senior developer often delivers 80% of the value for 1% of the cost. Agentic mode should be reserved for tasks where the return on iterative effort has been demonstrated, not applied by default.

What is the cost per session, compared to the business value actually created? Not a marketing average but an observed cost, case by case. How much does the assisted writing of a sales email cost, and how much does it really bring in? How much does an agent that runs 50 rounds on a bug cost, and how much would a developer have cost on the same task, with the same probability of success? Without this measurement, AI ROI remains a belief, not a management practice.
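One way to make that comparison fair is to normalize each option by its probability of success, so that both are expressed as an expected cost per solved task. The Python sketch below does this; every figure in it (tokens per round, price, hourly rate, success probabilities) is a hypothetical placeholder to be replaced by observed values.

```python
def agent_attempt_cost(rounds: int, tokens_per_round: int,
                       price_per_million: float) -> float:
    """Token cost of one agentic attempt."""
    return rounds * tokens_per_round * price_per_million / 1_000_000


def cost_per_success(cost_per_attempt: float,
                     success_probability: float) -> float:
    """Expected cost of one successful outcome, assuming independent retries."""
    if not 0 < success_probability <= 1:
        raise ValueError("success_probability must be in (0, 1]")
    return cost_per_attempt / success_probability


# Hypothetical inputs; replace with measured values per use case.
agent = cost_per_success(
    agent_attempt_cost(rounds=50, tokens_per_round=40_000,
                       price_per_million=10.0),
    success_probability=0.5,
)
developer = cost_per_success(cost_per_attempt=2 * 60.0,  # 2 hours at 60/hour
                             success_probability=0.9)
print(f"agent: {agent:.2f} per solved bug, "
      f"developer: {developer:.2f} per solved bug")
```

The result depends entirely on the inputs, which is the point: the comparison only becomes meaningful once the number of rounds, the tokens per round and the success rates are measured rather than assumed.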

These five questions, asked systematically, outline four areas in an AI portfolio: profitable uses to industrialize (high value, controlled cost), uses to make frugal (the value is there but the model is oversized), uses to monitor (marginal ROI, real carbon footprint), and uses to stop without qualms. Until this mapping is done, talking about an “AI strategy” is a misnomer.
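The four areas can be sketched as a simple classification rule. The following Python example is one possible reading, not a reference method: the ROI thresholds, the `oversized_model` flag and the order of the tests are all assumptions to be adapted to each organization.

```python
from enum import Enum


class Area(Enum):
    INDUSTRIALIZE = "industrialize"
    FRUGALIZE = "make frugal"
    MONITOR = "monitor"
    STOP = "stop"


def classify(value: float, cost: float, oversized_model: bool,
             roi_target: float = 3.0, roi_floor: float = 1.0) -> Area:
    """Place a use case in one of the four areas.

    value and cost are observed amounts over the same period;
    roi_target and roi_floor are hypothetical thresholds.
    """
    roi = value / cost if cost > 0 else float("inf")
    if roi < roi_floor:
        return Area.STOP            # costs more than it brings in
    if oversized_model:
        return Area.FRUGALIZE       # value is there, model is too large
    if roi >= roi_target:
        return Area.INDUSTRIALIZE   # high value, controlled cost
    return Area.MONITOR             # marginal ROI


# Illustrative use cases with made-up figures.
print(classify(value=90_000, cost=10_000, oversized_model=False))
print(classify(value=40_000, cost=10_000, oversized_model=True))
print(classify(value=15_000, cost=10_000, oversized_model=False))
print(classify(value=4_000, cost=10_000, oversized_model=False))
```

Carbon footprint is deliberately left out of this sketch to keep it short; it could enter as a third input alongside value and cost.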

Industrializing AI therefore does not mean deploying it everywhere. It means deploying it where the cost/value/carbon asymmetry is demonstrable, and instrumenting it to prove it. The organizations able to formalize this arbitration (token observability, AI FinOps, governance, a clear RACI between Data, IT and business units) will emerge from the fog. The others will discover, like Uber, that it is possible to consume in four months what was budgeted for twelve.

Rigor as a new competitive advantage

The token economy is not bad news. It puts AI back in its rightful place: a powerful, expensive, constrained lever that must be managed with as much rigor as industrial capex. It also puts the debate on digital sovereignty on a tangible footing. A frugal European model, hosted in Europe, can now be compared to an American flagship on measurable criteria (cost per task, latency, carbon footprint) and no longer just on promises of raw performance.

The debate is no longer “do we have the right to use AI?”. It is: “do we have the discipline to deploy it where it really pays off?”, financially, ecologically and organizationally. This discipline is probably what will distinguish mature IT departments and executive teams from the rest over the next eighteen months.
