From 1 Agent to 5: Building a Multi-Agent Architecture
Starting point: one person, one agent
January 2026. One MacBook Air and one OpenClaw agent. It did everything: wrote code, ran research, sent messages, managed crons.
Problems surfaced fast:
- Context explosion — one agent handling all tasks, context window full in 30 minutes
- Task conflicts — research and coding fighting for the same session
- Single point of failure — agent goes down, everything stops
Two weeks later, I started splitting.
The first split: coordinator + executors
The most critical decision was separating coordination from execution.
COO became the coordinator. No coding, no research, no content writing. Three jobs only:
- Receive human instructions
- Break down tasks and dispatch to executor agents
- Collect results and report back to human
This one change solved 80% of the problems. The COO Principle: the coordinator never touches execution.
Why? Because once a coordinator starts executing, it enters debug loops, loses the big picture, and the human can’t reach it. This lesson cost several all-nighters.
Further reading: This pattern is called “Orchestrator-Worker” architecture in multi-agent systems. Similar designs appear in Microsoft AutoGen and CrewAI, but our implementation is simpler — no framework, just native OpenClaw capabilities.
Role design: why 5 agents, not 3 or 7?
Five agents, each with clear boundaries. This number wasn’t planned — it was driven by real needs:
| Order | Agent | Role | Why needed |
|---|---|---|---|
| 1st | COO | Coordinator | Single agent doing everything caused context overflow — had to separate coordination |
| 2nd | Coder | Engineer | Coding tasks were the most frequent; needed a dedicated agent |
| 3rd | QA | QA | The Feb 8 broken-deployment incident forced a quality gate |
| 4th | Research | Researcher | Research tasks conflicted with Coder’s coding sessions |
| 5th | Marketing | Marketing | Content creation needs different models and prompt styles |
Key insight: Each new agent was added because of a specific problem, not pre-planned. If your scenario doesn’t need 5, don’t force it.
When do you need multi-agent? (Decision tree)
```
Is your single agent enough?
├── Yes → Don't add more. Keep going.
└── No → Where's it falling short?
    ├── Context frequently overflowing → Split: coordinator + executor
    ├── Different task types conflicting → Split agents by task type
    ├── Output quality unstable → Add a QA agent
    ├── Too many tasks queuing up → Add more executor agents
    └── None of the above → Problem is likely in SOUL.md or model selection, not agent count
```
Rules of thumb:
- 1 agent handles most personal projects
- 2-3 for clear division needs (e.g., one for coding + one for QA)
- 5+ only when you truly have multiple concurrent task types
- Beyond 7, management overhead skyrockets unless you have a mature orchestration system
Detailed agent configs
COO — Coordinator
- Only agent that talks directly to human
- Dispatches to other agents via Discord
- Maintains task queue, monitors completion
- Runs on MacBook Air
- Core SOUL.md rule: Never execute tasks, write code, or do research. Only dispatch.
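For illustration, that core rule might be written into SOUL.md something like this (a hypothetical excerpt — the exact wording and file structure are up to you):

```markdown
## Role: COO (Coordinator)

You NEVER execute tasks yourself. You do not write code, run research,
or draft content. Your only jobs:

1. Receive instructions from the human.
2. Break them into tasks and dispatch to executor agents via Discord.
3. Collect results and report back to the human.

If a task looks like execution, dispatch it — even if it seems trivial.
```

Stating the prohibition before the job list matters: the model sees the boundary first, then the responsibilities.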
Coder — Engineer
- Writes code, debugs, deploys
- Can SSH into any machine
- Runs on Mac Mini
- Model selection: Kimi 2.5 for frontend, Codex for backend/algorithms (see Debug Playbook #3)
Research — Researcher
- Web search, data analysis, competitive research
- Multi-platform information aggregation
- Runs on Mac Mini
Marketing — Marketing
- Content creation, social media, brand strategy
- Drafts only, never publishes (human approves) — hard rule
- Runs on Mac Mini
QA — QA
- Reviews all agent output
- Code review + functional testing + deployment verification
- Runs on MacBook Air (same gateway as COO)
- Why the same gateway: `sessions_send` only works within the same gateway, and QA needs fast responses to COO’s review requests.
Hardware architecture
```
MacBook Air — Coordinator node
├── COO (dispatch)
├── QA (review)
└── Gateway A

Mac Mini — Worker node
├── Coder (engineering)
├── Research (research)
├── Marketing (marketing)
└── Gateway B
```
Why two machines?
Not performance. A MacBook Air handles 5 agents fine. The reason is isolation:
- Stability isolation — worker node can restart/reinstall without affecting coordinator
- Reachability guarantee — coordinator stays stable and reachable at all times
- Network policy isolation — worker runs proxies, VPNs; coordinator uses direct connection
- Resource isolation — coding tasks consuming CPU/memory won’t affect dispatch responsiveness
Connected via Tailscale, SSH between them.
Single vs multi-machine: how to choose
| Scenario | Recommendation | Reason |
|---|---|---|
| 1-3 agents | Single machine | No need for added complexity |
| 3-5 agents | Depends | If tasks frequently max CPU or need different network policies, consider dual-machine |
| 5+ agents | Dual machine | Isolation stability benefits outweigh management cost |
| Need 24/7 uptime | Must use dual | Single machine restart = everything down |
Multi-machine communication setup
```shell
# 1. Install Tailscale on both machines
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up

# 2. Verify connectivity
ping mac-mini.tailnet              # Should respond

# 3. Configure passwordless SSH
ssh-copy-id user@mac-mini.tailnet

# 4. Test cross-machine OpenClaw commands
ssh user@mac-mini.tailnet "openclaw status"
```
Gotcha: Tailscale may disconnect after macOS sleep. Disable auto-sleep in System Settings → Energy, or use `caffeinate -d` to keep the machine awake.
Communication
Agents don’t communicate via direct API calls. They talk through Discord.
Why Discord?
- Each agent has a dedicated channel (#⚡・coder, #🔍・research, etc.)
- Messages are persisted
- Human can inspect any agent’s conversation at any time
- Supports @mention notifications
- Free, no API call limits
Communication path
```
Human (Telegram) → COO
COO → Discord #⚡・coder → Coder (Mac Mini)
Coder completes → Discord @COO → COO
COO → QA (MacBook Air, sessions_send)
QA PASS → COO → Human (Telegram)
```
The @mention saga: three failures
This looked simple but took three days to fix. Full story:
Failure 1: Plain text @mention
Agent writes @COO in Discord — plain text, not recognized as a mention, no notification.
```
❌ @COO task complete
✅ <@1234567890> task complete   ← Need Discord user ID format
```
Failure 2: Session cached old config
Updated SOUL.md but agent’s old session didn’t refresh — still using old format.
Root cause: OpenClaw sessions load SOUL.md at creation time and don’t auto-refresh. Must restart sessions after SOUL.md changes.
```shell
# Correct approach: after changing SOUL.md
openclaw gateway restart   # Restart gateway, all sessions recreated
```
Community note: This issue is commonly discussed — search GitHub for “session stale config” for more solutions.
Failure 3: Receiver config missing bot IDs
Correct format, fresh session. But COO’s openclaw.json users list only had the human’s Discord ID. Bot messages silently dropped.
```json
{
  "discord": {
    "users": ["human_discord_id", "gus_bot_id", "jason_bot_id"],
    "ignoreBots": false
  }
}
```
Final fix: Added all bot IDs and set `ignoreBots: false`.
Lesson: Communication channels must be tested end-to-end. Don’t test only the sending side. This bug left agents blocked for 2+ hours before anyone noticed.
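One way to make Failure 1 unrepeatable is to centralize the name-to-ID mapping in a small helper, so messages always use the `<@id>` format that actually triggers a notification. A sketch — the agent names and IDs below are placeholders, not our real configuration:

```shell
# Map agent names to Discord mention strings.
# NOTE: the IDs are placeholders — substitute your bots' real Discord IDs.
mention() {
  case "$1" in
    COO) echo "<@1234567890>" ;;
    QA)  echo "<@2345678901>" ;;
    *)   echo "@$1" ;;   # unknown name: fall back to plain text (no notification!)
  esac
}

mention COO   # → <@1234567890>
```

The plain-text fallback is deliberately visible in channel logs, so a missing mapping shows up during end-to-end testing instead of silently dropping notifications.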
Monitoring
Agents will go silent. Not “might” — will. You need layered monitoring:
Layer 1: Heartbeat (every 15 min)
COO self-checks: read the task queue, check pending tasks, check agent activity, and alert the human if anything looks anomalous.
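A minimal version of the self-check, assuming (hypothetically — this is not OpenClaw's actual queue format) a plain-text queue file with one `task_id<TAB>status` line per task:

```shell
# Count pending vs. completed tasks in a tab-separated queue file.
# Assumed line format (illustrative): "<task_id>\t<status>"
heartbeat_report() {
  local queue_file="$1"
  local pending completed
  pending=$(grep -c 'pending$' "$queue_file")
  completed=$(grep -c 'done$' "$queue_file")
  echo "pending=$pending done=$completed"
}
```

The real heartbeat adds the alerting step; the point is that the check itself is cheap enough to run every 15 minutes without touching any model API.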
Layer 2: Block detection (every 10 min)
Scan agent Discord channels. If an agent asked a question and got no reply for >15 min → alert. Known gap: this only detects “agent asked, nobody answered.” It cannot detect “agent promised to deliver, then went silent.”
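The question-with-no-reply heuristic reduces to a pure check over a channel's last message and its age. A simplified sketch (fetching the actual last message from Discord is out of scope here):

```shell
# Return 0 (alert) if the last message was a question left unanswered
# for more than 15 minutes (900 seconds).
needs_attention() {
  local last_msg="$1" age_secs="$2"
  case "$last_msg" in
    *\?) [ "$age_secs" -gt 900 ] ;;   # question + stale → alert
    *)   return 1 ;;                  # not a question → the known blind spot
  esac
}

needs_attention "Should I deploy to prod?" 1200 && echo "ALERT"
```

The second `case` branch is exactly the documented gap: a non-question last message never alerts, no matter how old it is.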
Layer 3: Task timeout (30 min)
Pure time-based. If a task was dispatched >30 min ago with no completion → alert. This is the most reliable layer: it has no dependency on agent behavior.
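Because this layer only compares timestamps, it is easy to sketch. Assuming (again hypothetically) a dispatch log with one `task_id<TAB>dispatch_epoch` line per in-flight task:

```shell
TIMEOUT_SECS=$((30 * 60))   # 30-minute budget per task

# Print the IDs of tasks dispatched more than TIMEOUT_SECS before now_epoch.
overdue_tasks() {
  local queue_file="$1" now_epoch="$2"
  while IFS=$'\t' read -r task_id dispatch_epoch; do
    if [ $(( now_epoch - dispatch_epoch )) -gt "$TIMEOUT_SECS" ]; then
      echo "$task_id"
    fi
  done < "$queue_file"
}
```

In production you would call it with `$(date +%s)` as the second argument; taking the clock as a parameter keeps the logic trivially testable.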
Advanced: Independent Watchdog
For 24/7 systems, deploy a monitoring process outside of OpenClaw:
```shell
#!/bin/bash
while true; do
  if ! openclaw status | grep -q "RUNNING"; then
    curl -X POST "YOUR_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d '{"text":"⚠️ OpenClaw gateway is DOWN!"}'
  fi
  sleep 300
done
```
Why independent? If OpenClaw itself crashes, its internal monitoring crashes too. An independent process can still alert when the entire system is down.
QA Gate (hard rule)
All agent output must pass QA review before COO reports to the human.
Origin: February 8, 2026. Deployed two products without testing. One was completely broken. Human found it, not the agents.
The rule ever since: without a QA PASS, never say “done.”
Cost evolution
| Phase | Monthly cost | Model | Notes |
|---|---|---|---|
| Single agent (Jan) | ~$200 | Claude (Anthropic API) | Pay-per-use, infinite retries |
| Post-migration (early Feb) | ~$80 | Mixed (Kimi + OpenAI) | Switched to subscription, added retry limits |
| Current (Mar) | ~$30-50 | Kimi 2.5 primary + Codex fallback | Task tiering + budget model fallback |
Cost dropped not from using worse models, but from:
- Fewer wasted calls — full context upfront, fewer round-trips (saved 15%)
- Task tiering — cheap model for P3, best model for P0 (saved 15%)
- No debug loops — 2 failures then stop, no infinite retries (saved 40%)
- Model-task matching — Kimi for frontend, Codex for backend (saved 20%)
Advice for beginners
- Start with 1 agent but separate coordination and execution from day 1
- SOUL.md is your most important file — it defines agent behavior boundaries
- Test communication end-to-end — don’t assume “sent = received”
- Monitoring isn’t optional — agents will go silent
- Add QA gate from day 1 — or you’ll ship broken things to users
- Use single agent for one week first — understand its behavior before scaling
New to OpenClaw? Start with Getting Started — complete walkthrough from installation to first task.
Next: Model Selection War Stories — from Anthropic ban to Kimi 2.5.