Spotify has developed an in-house code agent built on top of Anthropic's Agent SDK. This has allowed the company to industrialize its code production.
At Spotify, code migrations that once took months and hundreds of developers are now completed in three days by a single engineer. By introducing code agents based on artificial intelligence, the group has managed to industrialize its code production on a large scale. That is a feat for a company of this size, with more than 3,000 engineers around the world. To achieve this, the Swedish group developed its own code agent, adapted to its needs. The success is also the result of several years of good practices, which Spotify shared at Code with Claude in London, where JDN was present.
A code base growing up to 7x faster than the engineering staff
For years at Spotify, the code base grew faster than the engineering workforce (up to 7 times faster). Inevitably, the time spent maintaining existing code ended up eating into the time spent creating new features. Rather than endure this, Spotify developed an operational framework: Fleet Shift. The principle is simple. An engineer would write a transformation rule once, and Fleet Shift would then apply it automatically, via scripts, to the thousands of repositories involved.
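To make the "write once, apply everywhere" principle concrete, here is a minimal sketch in Python. It is not Fleet Shift's code: the repository names, the in-memory file model, and the `rename_call` rule are all hypothetical, chosen only to illustrate a deterministic rule applied across many repositories.

```python
import re
from typing import Callable, Dict

# Hypothetical in-memory model: file path -> file contents.
Repo = Dict[str, str]


def rename_call(source: str) -> str:
    """One transformation rule, written once: rename a deprecated call."""
    return re.sub(r"\bold_client\(", "new_client(", source)


def apply_rule(repos: Dict[str, Repo], rule: Callable[[str], str]) -> Dict[str, int]:
    """Apply the rule to every file of every repo; return changed-file counts."""
    changed: Dict[str, int] = {}
    for name, files in repos.items():
        count = 0
        for path, source in files.items():
            updated = rule(source)
            if updated != source:
                files[path] = updated
                count += 1
        changed[name] = count
    return changed


# Hypothetical repositories used only for this illustration.
repos = {
    "playlist-service": {"main.py": "c = old_client()\n"},
    "search-service": {"app.py": "print('no change')\n"},
}
report = apply_rule(repos, rename_call)
```

A rule like this is purely textual: it has no notion of context, which is why this kind of approach reaches its limits on changes that differ from one repository to the next.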
The approach worked perfectly for simple changes. But when complex or context-dependent changes were needed, Fleet Shift simply could not make them. It is precisely this limit that led Spotify towards another approach, one capable of adapting case by case: the code agent.
Honk: a code agent built on the Anthropic Agent SDK
The idea of a code agent germinated early, even before the arrival of Claude Code. As the first LLMs appeared, the Spotify teams asked themselves: rather than writing deterministic scripts to modify the code, why not entrust the task to a language model? The beginnings were difficult. “The models were just too stupid, and so was the way we did it,” admits Niklas Gustavsson, chief architect and VP of engineering at Spotify. Through successive iterations, the engineers eventually identified the right patterns, while the models themselves became more capable. Honk was born from this work.
Honk relies on Claude via the Anthropic Agent SDK, which Spotify wraps in its own infrastructure. Why not simply use Claude Code? Because Spotify's challenge was quite different: running hundreds of agents in parallel, autonomously and on a schedule, across thousands of repositories, without human intervention. This is precisely what the in-house infrastructure around the Agent SDK allows. Each agent runs in its own Kubernetes pod, which makes it possible to launch a large number of agents in parallel. Honk can only use a set of tools validated upstream by Spotify. Among them are verification tools that allow it to test its own work. Once the code has been modified, Honk launches a build in the group's continuous integration (CI) environment to ensure that nothing is broken.
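The pattern described above (many agents in parallel, a fixed tool allowlist, a verification step before a change is accepted) can be sketched as follows. This is an illustration under stated assumptions, not Honk's implementation: `run_agent` is a hypothetical stand-in that calls no model and no SDK, the tool names are invented, and a local thread pool replaces the Kubernetes pods mentioned in the article.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical allowlist; the real set of validated tools is not public here.
ALLOWED_TOOLS = frozenset({"read_file", "edit_file", "run_build"})


@dataclass
class Result:
    repo: str
    ok: bool
    reason: str


def run_agent(repo: str, requested_tools: list[str]) -> Result:
    """Stand-in for one agent run; a real agent would call a model here."""
    blocked = [t for t in requested_tools if t not in ALLOWED_TOOLS]
    if blocked:
        # Refuse any tool that was not validated upstream.
        return Result(repo, False, f"blocked tools: {blocked}")
    # Placeholder for the verification step (a CI build in the article).
    verified = "run_build" in requested_tools
    return Result(repo, verified, "build passed" if verified else "not verified")


# Hypothetical jobs: repository name -> tools the agent asks to use.
jobs = {
    "playlist-service": ["read_file", "edit_file", "run_build"],
    "search-service": ["read_file", "shell"],
    "ads-service": ["read_file", "edit_file"],
}

# Threads stand in for isolated pods; pool.map keeps the input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: run_agent(*job), jobs.items()))
```

In this sketch, a run counts as successful only if every requested tool is on the allowlist and the verification step has run, which mirrors the idea that the agent checks its own work before anything is proposed.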
Initially designed for mass migrations, Honk was quickly repurposed by the developers themselves. Some soon realized that they could invoke it directly, in the manner of Claude Code. The movement was such that Spotify ended up packaging the tool for this interactive use. The group then announced Honk V2, a version integrated into its agent orchestrator. With this new version, several developers can share the same agent session, a kind of Google Docs applied to Claude, and work together towards a common objective.
Rapid migrations, prototyping for every role
With Honk V2, the latest major Java upgrade on the group's backend, historically a project lasting several months and involving hundreds of teams, was completed in three days. But the most direct effect lies elsewhere: code is no longer the bottleneck for teams. Developing an app prototype previously meant mobilizing developers for days or even weeks. “Anyone now starts building these prototypes for the ideas they have in mind, including, as we eventually found out, one of our CEOs,” says Niklas Gustavsson.
But this acceleration had a direct side effect: an avalanche of pull requests to review. Spotify saw its PR frequency jump by 76%. Its response today rests on one principle: reserve human judgment for where it really counts. Concretely, changes deemed sufficiently safe are validated and merged automatically, without human review. Human review is then focused on high-stakes modifications. This logic allows the group to maintain a rate of around 4,500 production deployments per day.
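The routing principle above can be illustrated with a small sketch. The article does not say which criteria Spotify uses to judge a change "sufficiently safe", so everything here is an assumption: the `PullRequest` fields, the line threshold, and the sensitive path prefixes are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold and paths, chosen only for illustration.
MAX_AUTO_MERGE_LINES = 50
SENSITIVE_PREFIXES = ("payments/", "auth/")


@dataclass
class PullRequest:
    ci_passed: bool
    lines_changed: int
    paths: list[str]


def route(pr: PullRequest) -> str:
    """Return 'auto_merge' for low-risk changes, else 'human_review'."""
    touches_sensitive = any(p.startswith(SENSITIVE_PREFIXES) for p in pr.paths)
    low_risk = (
        pr.ci_passed
        and pr.lines_changed <= MAX_AUTO_MERGE_LINES
        and not touches_sensitive
    )
    return "auto_merge" if low_risk else "human_review"


# Two hypothetical pull requests of the same size but different risk.
small_bump = PullRequest(ci_passed=True, lines_changed=4, paths=["build/deps.txt"])
auth_change = PullRequest(ci_passed=True, lines_changed=4, paths=["auth/session.py"])
```

Under these assumed rules, `route(small_bump)` is merged automatically while `route(auth_change)` goes to a human reviewer, even though both diffs are the same size: what the change touches matters as much as how large it is.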
Code standardization, a prerequisite for AI?
Tools, auto-merge, safeguards… one might believe that everything depends on the agent's tooling. But Niklas Gustavsson emphasizes an even more important success factor with AI: the consistency of the code base itself. “If Claude has a lot of code around it and that code is generally consistent, Claude will do a better job. That's what we observe,” he explains. Consistency matters at every level: the same technology stack and the same design patterns from one component to the next. In the most fragmented areas of the code, teams see the agent's performance drop.
What should we take away from the Spotify case? Three factors stand out:
- a logical and consistent code base, which gives the agent reliable reference points,
- the right tools in the right place to provide the right context,
- a shift of human judgment towards the tasks that really matter.
With determination and good practices, any company of comparable size can also scale its code production using AI agents. Only one thing is needed, but it is decisive: a framework.