
VM0 dev workflow: Managing AI agents like a team

In the VM0 dev team, every developer works with multiple Claude Code instances at the same time. Usually more than eight.

We treat Claude Code the same way we treat a real developer. (Yes, our company is half-jokingly called AI Colleagues Co!)

Because of that, the design philosophy behind the VM0 dev workflow mirrors classic team management practices in software engineering.

We use GitHub Issues to track work, Pull Requests for code review and merging, and GitHub Actions to handle automation. Over two months, this setup helped us ship 404 releases and write more than 230,000 lines of code.

This post explains how we made that workable, and why the key problem was never AI capability, but human coordination.

AI-powered dev workflow in practice

When you coordinate many AI agents in parallel, the bottleneck isn't whether the model can write code. The real bottleneck is human cognitive load.

This workflow consists of 14 slash commands, organized into three layers: Deep Dive, Issue Management, and PR Management.

Let’s first walk through my workflow and how a feature usually gets built.

  1. Requirement alignment

    A human opens a Claude session and starts with /deep-research. Claude gathers facts from the codebase, documentation, and relevant context. We discuss the findings and align on what problem we are actually solving.

  2. Solution exploration

    Using /deep-innovate, Claude proposes several possible directions, with trade‑offs. We discuss, narrow down, and choose a direction.

  3. Issue creation

    We create a GitHub issue using /issue-create. The human reviews the issue to make sure requirements are clearly captured.

  4. Planning and approval

    We use /issue-plan to let Claude continue the work. Claude will automatically run the full deep-dive workflow and post the results to the issue (a sketch of this posting step follows the list), including:

    1. findings from /deep-research
    2. comparisons from /deep-innovate
    3. a concrete implementation plan from /deep-plan

  5. Implementation

    After approval, /issue-action lets Claude implement the plan, write tests, open a PR, and ensure CI passes.

  6. Review and merge

    We use /pr-review for a structured review, then do final human review before merging.

The human intervenes at three checkpoints: requirements, direction, and acceptance. Everything else runs autonomously.
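
As promised above, here is a minimal sketch of what the "post the results to the issue" part of step 4 could look like with the GitHub CLI. It assumes gh is installed and authenticated, and that the deep-dive phases have already written research.md, innovate.md, and plan.md (the output files described later in this post). It is an illustration of the idea, not our actual /issue-plan implementation.

```python
# Illustrative sketch only: post deep-dive results to a GitHub issue as comments.
# Assumes `gh` is installed and authenticated, and that the deep-dive phases
# have already produced research.md, innovate.md, and plan.md.
import subprocess

def post_deep_dive_results(issue_number: int) -> None:
    phases = [
        ("Research findings", "research.md"),
        ("Approaches considered", "innovate.md"),
        ("Implementation plan", "plan.md"),
    ]
    for title, path in phases:
        with open(path, encoding="utf-8") as f:
            body = f"## {title}\n\n{f.read()}"
        # `gh issue comment <number> --body <text>` adds a comment to the issue.
        subprocess.run(
            ["gh", "issue", "comment", str(issue_number), "--body", body],
            check=True,
        )

post_deep_dive_results(123)  # the issue number here is just a placeholder
```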

Mindset shift: you’re leading a team of AI developers

The moment I realized we needed a structured workflow was when adding more Claude sessions actually made things worse. The more instances I ran in parallel, the harder it became to track what each one was doing, what state the work was in, and what had already been decided.

Without external tools, I simply couldn’t manage that many Claude instances at once. That’s when it clicked: this wasn’t an AI problem, it was a management problem.

GitHub is already the natural tool for collaboration in software development, so instead of inventing something new, I started treating Claude the same way I treat a human teammate. Once I did that, my management bandwidth suddenly scaled.

Ten years of project and team management experience finally made sense in this new context. By treating Claude as a team member and GitHub as our shared communication and management space, the whole system became manageable again.

A good team leader knows when to engage and when to step back:

| Checkpoint | What I do | What AI does |
| --- | --- | --- |
| Requirements | Align on the problem, clarify scope | Research codebase, gather context |
| Direction | Review findings, approve approach | Propose 2-3 approaches, evaluate trade-offs |
| Acceptance | Review PR, verify quality | Implement, test, fix CI |

This mirrors how effective software teams operate. I don't micromanage developers; I set clear requirements, review key decisions, and verify the final output. The same principle applies when managing AI agents.

The deep dive flow enforces structured, slow thinking

The deep dive workflow enforces deliberate thinking before implementation. When Claude runs into a dead end, we force it to stop and think, and then we talk it through together. The workflow has three phases:

| Phase | Command | Purpose | Output |
| --- | --- | --- | --- |
| Research | /deep-research | Gather facts, understand context | research.md |
| Innovate | /deep-innovate | Explore multiple approaches | innovate.md |
| Plan | /deep-plan | Define concrete steps | plan.md |

Each phase has strict boundaries.

These constraints force Claude into slow, deliberate reasoning instead of jumping straight to code. Without them, edge cases and architectural concerns are often missed!
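
To illustrate what a "strict boundary" can look like in practice, here is a hypothetical guard that refuses to start a phase until every earlier phase has written its output file. The file names match the table above; the guard itself is just a sketch of the gating idea, not part of our commands.

```python
# Hypothetical sketch of phase gating: each deep-dive phase may only start
# once every earlier phase has produced its output file.
from pathlib import Path

PHASE_OUTPUTS = {
    "research": "research.md",   # written by /deep-research
    "innovate": "innovate.md",   # written by /deep-innovate
    "plan": "plan.md",           # written by /deep-plan
}
PHASE_ORDER = ["research", "innovate", "plan"]

def assert_phase_allowed(phase: str) -> None:
    """Raise if any earlier phase has not written its output yet."""
    for earlier in PHASE_ORDER[: PHASE_ORDER.index(phase)]:
        if not Path(PHASE_OUTPUTS[earlier]).exists():
            raise RuntimeError(
                f"Cannot start '{phase}': {PHASE_OUTPUTS[earlier]} from the "
                f"'{earlier}' phase is missing."
            )

assert_phase_allowed("plan")  # fails unless research.md and innovate.md exist
```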

Usage example

/deep-research investigate the authentication flow, I'm seeing token expiration issues

[Claude researches, analyzes 12 related files, finds 3 similar patterns]

/deep-innovate what are our options for fixing this?

[Claude presents 3 approaches with trade-offs, you pick one]

/issue-create let's track this fix

For simple tasks, you can skip the deep dive and go directly to /issue-create.

For complex tasks with technical uncertainty, the deep dive phases help ensure you and Claude are aligned before implementation begins.

Use GitHub as shared memory

Most AI tools treat context as temporary. When the session ends, the memory disappears.

VM0 uses GitHub as persistent memory:

| GitHub feature | What it stores |
| --- | --- |
| Issue body | Requirements and decisions |
| Issue comments | Research, options, plans |
| PR comments | Reviews and summaries |
| Labels | Workflow state |

This also solves a human problem: context recovery.

When I am managing 8+ Claude instances, I receive notifications that work is complete. But from Claude's conversation alone, I can't reconstruct what it was doing, what decisions were made, or what the current state is.

GitHub issues solve this. Each issue displays:

  1. the requirements captured when the issue was created
  2. the research findings from /deep-research
  3. the approaches compared in /deep-innovate
  4. the implementation plan from /deep-plan

This structured format makes review efficient: I can quickly scan the phases, understand the approach, and approve or request changes, all without needing to remember the original conversation.

When work finishes, I don’t need to remember what happened in a chat window. I can open the issue and see the full story, structured and written down.
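
As a rough illustration of that context recovery, the sketch below pulls an issue's title, body, and comments with the GitHub CLI so a fresh session, human or agent, can rebuild the full story. It assumes an authenticated gh and is not our actual tooling.

```python
# Illustrative sketch: rebuild working context from a GitHub issue.
# Assumes the GitHub CLI (`gh`) is installed and authenticated.
import json
import subprocess

def load_issue_context(issue_number: int) -> str:
    # `gh issue view <number> --json title,body,comments` returns the issue as JSON.
    result = subprocess.run(
        ["gh", "issue", "view", str(issue_number),
         "--json", "title,body,comments"],
        check=True, capture_output=True, text=True,
    )
    issue = json.loads(result.stdout)
    parts = [f"# {issue['title']}", issue["body"]]
    parts += [comment["body"] for comment in issue["comments"]]
    return "\n\n---\n\n".join(parts)

print(load_issue_context(123))  # the issue number here is just a placeholder
```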

Handoff between agents

Because all context lives in GitHub, work can move between agents seamlessly: the instance that ran /issue-plan doesn't have to be the one that runs /issue-action, since any agent can pick up the issue and continue from what's already recorded there.

For long discussions, /issue-compact consolidates everything into a clean issue body. This makes handoffs easy for both humans and AI.
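
The final step of that consolidation might look something like the sketch below: take the summary (which Claude produces in the real workflow) and overwrite the issue body with it. Again, this assumes an authenticated GitHub CLI and is not the actual /issue-compact implementation.

```python
# Illustrative sketch of /issue-compact's last step: replace the issue body
# with a consolidated summary so the next agent starts from one clean document.
# The summary itself is produced by Claude in the real workflow.
import subprocess

def replace_issue_body(issue_number: int, consolidated_summary: str) -> None:
    # `gh issue edit <number> --body <text>` overwrites the issue body.
    subprocess.run(
        ["gh", "issue", "edit", str(issue_number), "--body", consolidated_summary],
        check=True,
    )

replace_issue_body(123, "## Summary\n\n(placeholder for the consolidated write-up)")
```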

Let’s summarize the workflow patterns

After all that, let me summarize a few practical tips.

Simple tasks

/issue-create → /issue-plan → /issue-action → /pr-check-and-merge

Use this when requirements are clear and the work is straightforward.

Complex tasks

/deep-research → discussion → /deep-innovate → discussion →
/issue-create → /issue-plan → /issue-action →
/pr-review → /pr-check

This prevents wasted effort on the wrong approach.

Parallel work

Multiple agents can work at once while the human reviews completed checkpoints. This is where the workflow scales best.

Agent 1: /issue-plan #123
Agent 2: /issue-plan #124
Agent 3: /pr-review #100
Agent 4: /deep-research new feature requirements

Command reference

Deep dive commands

| Command | Purpose |
| --- | --- |
| /deep-research | Gather information, understand codebase. No suggestions allowed. |
| /deep-innovate | Explore 2-3 approaches, evaluate trade-offs. No code allowed. |
| /deep-plan | Create concrete implementation steps. No implementation allowed. |

Issue commands

| Command | Purpose |
| --- | --- |
| /issue-create | Create issue from conversation context |
| /issue-bug | Create bug report with reproduction steps |
| /issue-feature | Create feature request focused on requirements |
| /issue-plan | Execute full deep-dive workflow, post results to issue |
| /issue-action | Continue implementation after human approval |
| /issue-compact | Consolidate issue body + comments for handoff |

PR commands

| Command | Purpose |
| --- | --- |
| /pr-check | Monitor CI pipeline, auto-fix, retry up to 3x |
| /pr-review | Review PR commit-by-commit against project standards |
| /pr-comment | Summarize conversation discussion to PR comment |
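
To give a feel for the /pr-check behavior in the table above, here is a hedged sketch of a "watch CI, try to fix, retry up to three times" loop built on the GitHub CLI. The attempt_auto_fix placeholder stands in for Claude's actual fix step; none of this is our real command code.

```python
# Illustrative sketch of a /pr-check-style loop: wait for CI, and if it fails,
# hand the failure back for a fix and retry, up to three times.
# Assumes an authenticated GitHub CLI; `attempt_auto_fix` is a placeholder.
import subprocess

def ci_passes(pr_number: int) -> bool:
    # `gh pr checks <number> --watch` waits for checks to finish and
    # exits non-zero if any of them did not pass.
    result = subprocess.run(["gh", "pr", "checks", str(pr_number), "--watch"])
    return result.returncode == 0

def attempt_auto_fix(pr_number: int) -> None:
    """Placeholder: in the real workflow, Claude inspects the failure and pushes a fix."""

def pr_check(pr_number: int, max_retries: int = 3) -> bool:
    for attempt in range(max_retries + 1):
        if ci_passes(pr_number):
            return True
        if attempt < max_retries:
            attempt_auto_fix(pr_number)
    return False

pr_check(100)  # the PR number here is just a placeholder
```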

Getting started

  1. Start simple: Use /issue-create → /issue-plan → /issue-action for your first task
  2. Add deep dive for complex tasks: When requirements are unclear or technically complex, start with /deep-research
  3. Scale gradually: Add more Claude instances as you get comfortable with the review rhythm
  4. Trust the process: Let Claude work autonomously between checkpoints

The workflow is designed to be adopted incrementally. You don't need to use all 14 commands from day one. Start with the basic issue flow, then add deep dive phases and parallel work as you gain confidence.

Scaling considerations: What to do when you have more agents

The workflow has been tested with 10+ concurrent Claude instances. Our recommendation: keep it to roughly 10 agents per person.

The limiting factor isn't the workflow, it's human attention and decision quality. When managing more than 10 agents, you risk becoming a bottleneck at review checkpoints, and decision quality starts to degrade.

The classic "two pizza team" principle applies here. The same constraints that limit human team size also limit how many AI agents one person can effectively manage.

I'm currently exploring an 8×8 two-tier team structure for scaling beyond 10 agents, but haven't yet developed effective practices. I'll share more when there are concrete results…

The VM0 dev workflow changes how we think about software development when AI becomes part of the team.

When you treat AI agents as team members rather than tools, everything clicks into place. GitHub becomes your team's shared memory. Issues become work items. PRs become deliverables. And you become the team leader, focusing on architecture, direction, and quality while your AI team handles the implementation.

That's how we shipped 404 releases in 2 months. And it's how you can scale your own development with AI.
