What if AI could write code, correct its own mistakes, and work for days without supervision? That is where Anthropic is heading. On the sidelines of Code with Claude, its developer conference in London, JDN interviewed Angela Jiang and Katelyn Lesse, respectively head of product and head of engineering for the Claude Platform.
JDN. Claude Code, Cowork, the desktop app: you keep launching products that seem to overlap. What is the logic behind this segmentation rather than a single unified tool?
Angela Jiang. All the products you mention use Claude. In many ways, you can think of them like this: I wouldn't exactly call them wrappers, but they each rely on slightly different harnesses and slightly different interfaces, because they are designed for distinct use cases. What they share, beyond the model, is the platform they are built on: they run on a common set of tool APIs, harnesses and so on. What changes from one product to another are the system prompts, optimized for different uses and paired with different tools.
The intention behind many of these products is to offer the right form factor to each type of user that matters to us. Claude Code is obviously aimed primarily at engineers. Cowork is for non-engineers, although engineers can of course use it too; not everyone is comfortable working in a terminal. Each of these products is therefore a different form factor on top of Claude, designed to meet the needs of its users.
What is the difference in usage between Claude Code in the CLI and Claude Code in the desktop application?
Katelyn Lesse. Claude Code in the CLI runs locally on your computer: it manipulates files directly on your machine. The desktop application does the same, but it also lets you dispatch remote agents to work for you. In that case, we spin up sandboxes in the background that execute the work outside your local machine, and you can take over later. That's really the difference: the desktop application lets you do both, but the core distinction is between local execution and remote execution.
You release many new tools and features every week, if not every day. How does your team manage to deliver at such a pace?
Katelyn Lesse. Our team is very AI-pilled (so convinced by AI that it has become the lens through which they see everything, editor's note). We're all really into building solutions for ourselves, using our own tools, and pushing the limits of what these technologies can actually accomplish. And there is a whole spectrum.
At one end, you have engineers who are perhaps just starting out: they prompt Claude to fix a bug, it makes a mistake, and they go back and forth. At the other end, you have people who launch dozens or even hundreds of agents at the same time. If you get there, it's because you've invested a lot of time and energy in the right setup: really good tests, and a genuine ability for the agents to enter a self-verification loop, because they can check their own work in the right environment. So much of our pace comes from the fact that our teams have pushed all the way to that end of the spectrum, automating as much as possible.
What frontier capabilities are you working on at the moment? Many people talk about infinite context, continuously autonomous agents… What is your roadmap?
Angela Jiang. We're going to keep pushing to get really good at managing context. That is a permanent work in progress. We also continue to make progress on autonomy: you've heard us talk about it with every model, and we want to bring Claude to the point where it is, quite literally, perpetually autonomous.
Then there are a few other capabilities we pay attention to. Computer use, for example. To go back to the basic principle, we want Claude to be useful, productive and helpful. Computer use is very valuable, especially in companies with complex, sometimes crude interfaces that you simply cannot automate otherwise. If Claude is smart enough to operate them, that's great. I particularly care about these obscure interfaces: think of Bloomberg terminals, which are usually very sophisticated and unwieldy, or EHR (electronic health record) systems. These are very complex environments, and we are making Claude more and more effective in them, while also making it faster.
The last area is Claude's ability to get itself out of trouble: learning from its mistakes and becoming increasingly capable of handling them. This can rely on different primitives, such as memory. There are several mechanisms to achieve it, but more broadly, Claude's ability to learn and improve on its own is also really key.
To achieve these new capabilities, do you need more compute?
Angela Jiang. We almost always need more compute, yes. But part of the gains can be obtained at the harness level, without additional infrastructure costs. Over our last two model generations, we have found that a high-performing model paired with a well-designed harness extracts much more value from this software layer than people generally imagine. Historically, the trade-off was unfavorable: for a given model, you had to invest heavily in the harness to get only a marginal benefit. That dynamic has reversed. As models become more powerful, a moderately sized harness is enough: as long as it is well built, the performance gain is significantly greater. This is a particularly interesting property that we have observed.
Beyond the cybersecurity concerns that justify restricting its access, Mythos is a general-purpose model that is much more powerful than Opus. What new capabilities will it bring to businesses?
Angela Jiang. Mythos is a continuation of our work on Claude. Two points stand out: its endurance, with our best result to date on the METR benchmark, which confirms our focus on autonomy; and its excellent coding performance.