80% of AI projects fail to scale. Resistance to change, data, skills… what if the real cause was simpler: the wrong tool?
80% of AI projects do not scale. This figure circulates in every article and at every conference, as if it were inevitable. It is attributed to resistance to change, data quality, lack of skills. These reasons are real. But they mask a simpler, more structural cause that is rarely discussed: we have been using the wrong tool on a massive scale.
The LLM misunderstanding
The same scenario comes up again and again at insurance companies: a POC launched with a large language model (ChatGPT, Mistral, Gemini, etc.), months of work, teams mobilized, and in the end a project that does not survive in production. Too many errors, spiraling costs, and data that, for security policy reasons, ultimately cannot be sent to the infrastructure of these cloud giants.
LLMs are remarkable tools. For producing content, synthesizing documents, or supporting consulting and marketing work, they are unbeatable. But they are designed to be universal. A large model has billions of parameters and is built to serve millions of different users. It cannot, structurally, be trained on the data of a specific organization.
However, automating claims management, invoice processing or customer relations in insurance is not a general problem. It’s a problem of precision, repeatability and compliance. These are exactly the use cases where LLMs fail, and where Small Language Models (SLMs), specialized models trained on the company’s own data and processes, can achieve up to 99% accuracy.
Let’s take a concrete example: filing a claim. In a manual process, the insured calls, and an advisor explains that they will receive an email, that they will have to send back their documents, and that processing will take two to three weeks. With an SLM trained on the insurer’s data and business rules, the same claim can be processed in real time: the agent responds to the insured, immediately sends the appropriate form and checks the compliance of the documents the second they arrive. A generalist LLM simply cannot achieve this level of precision and automation, because it has not been trained for it.
Sovereignty is not an option
In regulated sectors, a CIO’s first question is not “does it work?”. It’s “does our data stay with us?”.
A cloud LLM cannot answer yes. An SLM deployed on-premises can, because its frugality makes it economically possible. These specialized models are several hundred times less resource-intensive than a large model. They can run on the company’s own servers, on hardware at a reasonable cost. No data leaves the premises: the model lives in the insurer’s infrastructure.
For a company subject to the GDPR, this is not a sales argument. It is a prerequisite. The major accounts in the sector have understood this well: before even talking about use cases or return on investment, security teams require certifications, such as ISO 27001 or HDS (the French health data hosting certification) for health data, as assurance that no sensitive data can leak to the outside world. This is the entry ticket. And this is precisely what has been missing until now: a technology light enough to be sovereign without sacrificing performance.
Controlled budget: the silent advantage of SLMs
There is a third advantage, regularly raised by insurers’ finance departments yet conspicuously absent from debates on AI adoption: cost predictability.
LLMs accessed via API are billed per token, a unit whose volume depends directly on the size of the input text, the model’s intermediate reasoning and the generated output. This usage-based billing is difficult to anticipate: a project’s budget can explode between the testing phase and production, simply because actual volumes exceed initial estimates.
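To make the mechanism concrete, here is a minimal sketch of how a usage-based bill scales with volume. Every figure in it is hypothetical: the prices per million tokens, the request counts and the token sizes are illustrative assumptions, not any provider’s actual rates, and the sketch assumes reasoning tokens are billed at the output rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices, in euros per million tokens (not real rates)."""
    input_per_million: float
    output_per_million: float


def monthly_api_cost(requests_per_month: int,
                     input_tokens: int,
                     reasoning_tokens: int,
                     output_tokens: int,
                     pricing: TokenPricing) -> float:
    """Estimate a monthly bill from per-request token counts."""
    billed_input = requests_per_month * input_tokens
    # Assumption: reasoning tokens are charged like generated output.
    billed_output = requests_per_month * (reasoning_tokens + output_tokens)
    total = (billed_input * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


pricing = TokenPricing(input_per_million=2.0, output_per_million=8.0)

# Test phase: few requests, short documents.
poc = monthly_api_cost(5_000, 3_000, 1_000, 500, pricing)
# Production: 80x the requests, and documents twice as long.
production = monthly_api_cost(400_000, 6_000, 1_000, 500, pricing)

print(f"POC: {poc:.2f} EUR/month")
print(f"Production: {production:.2f} EUR/month")
print(f"Ratio: {production / poc:.1f}x")
```

Under these assumed figures the bill does not simply follow the number of requests: longer documents multiply the input tokens as well, which is why an estimate made during testing can be far off once real volumes arrive.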
An SLM deployed on-premises, on the other hand, entirely escapes this logic. The cost of the service depends on the hardware infrastructure installed at the insurer. Once the volumes passing through are known and invoicing is based on actual volume, that cost becomes predictable, depreciable and defensible before the executive committee (COMEX). In a sector where budgetary control is a discipline as rigorous as risk management, this predictability is a decisive argument, and yet almost invisible in comparisons.
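As a rough illustration of why an infrastructure-based cost is easier to defend, the sketch below applies straight-line depreciation to a hardware purchase. The purchase price, depreciation period and operating costs are hypothetical, and it assumes the request volume stays within the capacity of the installed servers.

```python
def monthly_onprem_cost(hardware_cost: float,
                        depreciation_months: int,
                        monthly_operating_cost: float) -> float:
    """Straight-line depreciation plus fixed running costs (power, upkeep)."""
    return hardware_cost / depreciation_months + monthly_operating_cost


def cost_per_request(monthly_cost: float, requests_per_month: int) -> float:
    """Unit cost for a given volume, valid only within hardware capacity."""
    return monthly_cost / requests_per_month


# Hypothetical figures: 108,000 EUR of servers depreciated over 36 months,
# plus 1,500 EUR per month of operating costs.
monthly = monthly_onprem_cost(108_000, 36, 1_500)

for volume in (50_000, 400_000):
    unit = cost_per_request(monthly, volume)
    print(f"{volume} requests: {monthly:.2f} EUR/month, {unit:.5f} EUR/request")
```

In this simplified model the monthly amount is the same at both volumes, and only the unit cost changes. A real deployment would also need to account for capacity upgrades once volumes outgrow the hardware.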
Many AI projects have not yet reached critical scale, not because they took the wrong direction but because they chose the wrong tool. Universal solutions have been proposed for problems that require specialization, sovereignty and predictability. These are exactly the three things that a large language model cannot offer, and that regulated sectors cannot do without.
The insurers who are starting to scale up today are not the ones who chose the most powerful models. They are the ones who chose the models best suited to their real constraints. It’s a simple lesson, but it took a few years of failed experimentation to learn it.