Building an Agent is not easy.
A year ago, we had to manage context and memory ourselves, think about RAG and vector databases, and painstakingly tune tool-use accuracy. The arrival of Claude Code and the Claude Agent SDK changed all of that. More and more teams now build agents directly on Claude Agent SDK + VM. So has the problem become simpler? No.
Running Claude Code in VMs is still genuinely hard.
- Every user session needs its own VM. Persistence is useful in some scenarios, but it also makes VMs hard to upgrade and orchestrate: from the user's perspective, software installed on the VM should still be there months later, yet the part of the VM that talks to the orchestration layer still needs to roll out upgrades (the sketch after this list frames this lifecycle).
- Creating and orchestrating VMs at large scale. A single k8s cluster handles orchestration for a few thousand pods or so, which is plenty for a cluster of application services but not for a world where every session spins up its own Claude Code pod.
- Kubernetes-based solutions inherit k8s scheduling latency, which makes sub-second startup hard to achieve.
- E2B-based solutions leave persistence up to you, and file uploads and downloads are both very slow.
- E2B also cannot satisfy specific network-environment requirements, and private deployment is cumbersome.
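Taken together, these problems describe a lifecycle that any orchestration layer has to cover. The sketch below is purely a framing device: AgentSandbox, its methods, and the snapshot shape are names I made up for illustration, not the API of any existing product.

// Hypothetical framing only -- none of these names come from a real SDK.
interface SandboxSnapshot {
  sessionId: string
  workspaceImage: string   // persisted working directory, e.g. an object-storage key
  agentState: string       // serialized session context
}

interface AgentSandbox {
  readonly sessionId: string            // one sandbox per user session

  start(): Promise<void>                // the problems above call for well under a second here

  // Persistence: installed tools and files must survive for months,
  // so the workspace has to be snapshotted somewhere durable.
  snapshot(): Promise<SandboxSnapshot>
  resume(snapshot: SandboxSnapshot): Promise<void>

  // Upgradability: the piece that talks to the orchestration layer must be
  // replaceable without destroying the user's persisted environment.
  upgradeRuntime(version: string): Promise<void>

  stop(): Promise<void>
}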
How do we solve these problems? Let's learn from history. Turn the calendar back more than a decade, to 2012: there was no serverless, no k8s, no Docker. Backend engineers kept service processes alive on physical machines by hand. ssh and ansible were the standard tools of the trade, and service processes depended heavily on whatever tool environment happened to be on the machine. Keeping that environment consistent across physical machines was very hard, and scaling out was a major undertaking a team had to prepare for far in advance. I still remember an engineer changing one machine's environment and forgetting to sync the others, producing a ghost bug in production that was brutally hard to track down. Those were painful days.
In 2013, Docker appeared. Docker packaged the entire tool environment into a single image, solving the inconsistency problem. The community gradually realized that a service process running in a container on a physical machine could be managed much like an ordinary process on that machine. But orchestrating Docker still required engineering teams to build their own orchestration systems, and once the process inside the container became coupled to the host's network and file system, orchestration turned painful.
In 2014, stateless services started to become the mainstream trend. That year saw the birth of Kubernetes and AWS Lambda, each describing the future of stateless services from a different angle. Stateless meant that service logic avoided coupling to network addresses and no longer depended on a local persistent file system.
Then came ten years of cloud native. The wave of stateless services gave birth to a whole generation of cloud-native infrastructure, such as:
- Service meshes and gateways: Istio / Linkerd / Envoy / Traefik
- CI/CD platforms: Argo CD / Flux
- Observability platforms: Prometheus / Grafana / ELK Stack
- Distributed tracing: Jaeger / Zipkin / OpenTelemetry
- Storage: Rook / MinIO / JuiceFS
And we are now at a new beginning. Ten years ago, the unit of compute was a Java/Python API service process. Starting this year, that unit is becoming a coding agent.
ACFS is like the era of hand-maintained service processes on physical machines. E2B is like Docker. Without task-orchestration platforms, observability platforms, and distributed persistent storage for state, scaling coding agents remains hard. The Serverless Agent Infra I want has these capabilities:
- Persists the coding agent's sessions and working directories appropriately, so no context is lost.
- Is extremely fast: a coding agent should be up and working within 100ms.
- Provides Logging / Metrics / Tracing of agent behavior in a form humans can read, so that people can understand and adjust how the agent behaves (see the tracing sketch after this list).
- Lets developers stop caring about virtual machines, containers, and other ops details, so large-scale orchestration becomes easy.
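To make the third point concrete, here is a minimal sketch of what human-readable tracing of agent behavior could look like, using the OpenTelemetry JavaScript API. The span names, attributes, and the runTool callback are assumptions for illustration; only the @opentelemetry/api calls themselves are real.

import { trace, SpanStatusCode } from '@opentelemetry/api'

// Sketch: wrap every agent tool call in a span, so a human can later read the
// trace as "what did the agent do, in what order, how long did each step take".
// runTool is a stand-in for whatever actually executes the tool.
const tracer = trace.getTracer('coding-agent')

async function tracedToolCall(
  toolName: string,
  input: unknown,
  runTool: (name: string, input: unknown) => Promise<string>
) {
  return tracer.startActiveSpan(`tool:${toolName}`, async (span) => {
    span.setAttribute('agent.tool.name', toolName)
    span.setAttribute('agent.tool.input', JSON.stringify(input))
    try {
      const output = await runTool(toolName, input)
      span.setAttribute('agent.tool.output_bytes', output.length)
      return output
    } catch (err) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: String(err) })
      throw err
    } finally {
      span.end()
    }
  })
}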
I think every piece of infrastructure needed for these capabilities already exists: Firecracker has proven the performance and stability of microVMs, and there are plenty of distributed observability platforms and file storage systems to choose from. But as a developer who has wrapped Claude Code with E2B, I can say that stitching all of this together is still very hard. When Claude Code inside E2B gets killed by an OOM, for example, it is difficult for engineers even to notice, let alone recover the scene and the logs from that moment.
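I can't show what this looks like against E2B's own API, but the shape of the missing piece is easy to sketch: a supervisor next to the agent process that records how it died. The example below uses only Node's standard child_process and fs APIs; the command being supervised and the log destination are placeholders.

import { spawn } from 'node:child_process'
import { appendFileSync } from 'node:fs'

// Sketch of a supervisor: launch the agent process and record how it exits.
// An OOM kill usually shows up as SIGKILL (exit code 137 in a shell), and
// without something like this it simply vanishes inside the sandbox.
function superviseAgent(command: string, args: string[]) {
  const child = spawn(command, args)   // stdio defaults to 'pipe'
  let recentOutput = ''

  // Keep a rolling tail of output so there is a "scene" to look at after a crash.
  const keepTail = (chunk: Buffer) => {
    recentOutput = (recentOutput + chunk.toString()).slice(-16_384)
  }
  child.stdout.on('data', keepTail)
  child.stderr.on('data', keepTail)

  child.on('exit', (code, signal) => {
    if (code !== 0) {
      // In a real system this would be shipped to the observability platform,
      // not appended to a local file.
      appendFileSync('/tmp/agent-crashes.log', JSON.stringify({
        at: new Date().toISOString(),
        code,
        signal,          // signal === 'SIGKILL' is the usual OOM fingerprint
        tail: recentOutput,
      }) + '\n')
    }
  })

  return child
}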
I hope developers after 2026 won't have to go through that kind of pain, just as most engineers in 2026 no longer ssh into physical machines to set them up by hand.
This is VM0's dream. I hope one day I can drive a coding agent through an ergonomic API, like:
const agent = vm0.run({
  framework: 'claude-code',
  prompt: 'buy me a coffee'
})
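
And, continuing the same daydream (every name below is invented to echo the persistence and observability wishes above, not a real vm0 API), picking that agent back up days later could be just as short:

const sameAgent = vm0.resume({ session: agent.session })
const steps = sameAgent.trace()   // what did it do, step by step?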


