
From 1 Agent to 5: Building a Multi-Agent Architecture

January 2026. One MacBook Air and one OpenClaw agent. It did everything: wrote code, ran research, sent messages, managed crons.

Problems surfaced fast:

  • Context explosion — one agent handling all tasks, context window full in 30 minutes
  • Task conflicts — research and coding fighting for the same session
  • Single point of failure — agent goes down, everything stops

Two weeks later, I started splitting.


The most critical decision was separating coordination from execution.

COO became the coordinator. No coding, no research, no content writing. Three jobs only:

  1. Receive human instructions
  2. Break down tasks and dispatch to executor agents
  3. Collect results and report back to human

This solved 80% of problems. The COO Principle: the coordinator never touches execution.

Why? Because once a coordinator starts executing, it enters debug loops, loses the big picture, and the human can’t reach it. This lesson cost several all-nighters.

Further reading: This pattern is called “Orchestrator-Worker” architecture in multi-agent systems. Similar designs appear in Microsoft AutoGen and CrewAI, but our implementation is simpler — no framework, just native OpenClaw capabilities.


Five agents, each with clear boundaries. This number wasn’t planned — it was driven by real needs:

| Order | Agent | Role | Why needed |
| --- | --- | --- | --- |
| 1st | COO | Coordinator | Single agent doing everything caused context overflow — had to separate coordination |
| 2nd | Coder | Engineer | Coding tasks were most frequent, needed a dedicated agent |
| 3rd | QA | QA | Feb 8 broken deployment incident forced a quality gate |
| 4th | Research | Researcher | Research tasks conflicting with Coder's coding sessions |
| 5th | Marketing | Marketing | Content creation needs different models and prompt styles |

Key insight: Each new agent was added because of a specific problem, not pre-planned. If your scenario doesn’t need 5, don’t force it.

When do you need multi-agent? (Decision tree)

```
Is your single agent enough?
├── Yes → Don't add more. Keep going.
└── No → Where's it falling short?
    ├── Context frequently overflowing → Split: coordinator + executor
    ├── Different task types conflicting → Split agents by task type
    ├── Output quality unstable → Add a QA agent
    ├── Too many tasks queuing up → Add more executor agents
    └── None of the above → Problem is likely in SOUL.md or model selection, not agent count
```

Rules of thumb:

  • 1 agent handles most personal projects
  • 2-3 for clear division needs (e.g., one for coding + one for QA)
  • 5+ only when you truly have multiple concurrent task types
  • Beyond 7, management overhead skyrockets unless you have a mature orchestration system

COO — Coordinator

  • Only agent that talks directly to human
  • Dispatches to other agents via Discord
  • Maintains task queue, monitors completion
  • Runs on MacBook Air
  • Core SOUL.md rule: Never execute tasks, write code, or do research. Only dispatch.

Coder — Engineer

  • Writes code, debugs, deploys
  • Can SSH into any machine
  • Runs on Mac Mini
  • Model selection: Kimi 2.5 for frontend, Codex for backend/algorithms (see Debug Playbook #3)

Research — Researcher

  • Web search, data analysis, competitive research
  • Multi-platform information aggregation
  • Runs on Mac Mini

Marketing — Marketing

  • Content creation, social media, brand strategy
  • Drafts only, never publishes (human approves) — hard rule
  • Runs on Mac Mini

QA — QA

  • Reviews all agent output
  • Code review + functional testing + deployment verification
  • Runs on MacBook Air (same gateway as COO)
  • Why same gateway: sessions_send only works within the same gateway. QA needs fast response to COO’s QA requests.
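With the roster above, COO's dispatch step is essentially routing by task type. A minimal sketch in shell; the category names are my own illustration, since the post doesn't specify how COO classifies incoming tasks:

```shell
#!/bin/sh
# Route a task category to the agent that owns it.
# Category names are illustrative, not from the actual system.
route_task() {
  case "$1" in
    code|debug|deploy)    echo "Coder" ;;
    research|analysis)    echo "Research" ;;
    content|social|brand) echo "Marketing" ;;
    review|test|verify)   echo "QA" ;;
    *)                    echo "COO: unclear task, ask human" ;;
  esac
}

route_task deploy     # prints: Coder
route_task research   # prints: Research
```

The catch-all branch matters: anything the coordinator can't classify goes back to the human rather than being guessed at.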

```
MacBook Air — Coordinator node
├── COO (dispatch)
├── QA (QA)
└── Gateway A

Mac Mini — Worker node
├── Coder (engineering)
├── Research (research)
├── Marketing (marketing)
└── Gateway B
```

Why two machines? Not performance; the MacBook Air handles 5 agents fine. The reason is isolation:

  1. Stability isolation — worker node can restart/reinstall without affecting coordinator
  2. Reachability guarantee — coordinator stays stable and reachable at all times
  3. Network policy isolation — worker runs proxies, VPNs; coordinator uses direct connection
  4. Resource isolation — coding tasks consuming CPU/memory won’t affect dispatch responsiveness

Connected via Tailscale, SSH between them.

| Scenario | Recommendation | Reason |
| --- | --- | --- |
| 1-3 agents | Single machine | No need for added complexity |
| 3-5 agents | Depends | If tasks frequently max CPU or need different network policies, consider dual-machine |
| 5+ agents | Dual machine | Isolation and stability benefits outweigh management cost |
| Need 24/7 uptime | Must use dual | Single machine restart = everything down |

```shell
# 1. Install Tailscale on both machines
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up

# 2. Verify connectivity
ping mac-mini.tailnet   # should respond

# 3. Configure passwordless SSH
ssh-copy-id user@mac-mini.tailnet

# 4. Test cross-machine OpenClaw commands
ssh user@mac-mini.tailnet "openclaw status"
```

Gotcha: Tailscale may disconnect after macOS sleep. Disable auto-sleep in System Settings → Energy, or use caffeinate -d to keep the machine awake.


Agents don’t communicate via direct API calls. They talk through Discord.

Why Discord?

  • Each agent has a dedicated channel (#⚡・coder, #🔍・research, etc.)
  • Messages are persisted
  • Human can inspect any agent’s conversation at any time
  • Supports @mention notifications
  • Free, no API call limits

```
Human (Telegram) → COO
COO → Discord #⚡・coder → Coder (Mac Mini)
Coder completes → Discord @COO → COO
COO → QA (MacBook Air, sessions_send)
QA returns PASS → COO → Human (Telegram)
```

This looked simple but took three days to fix. Full story:

Failure 1: Plain text @mention

Agent writes @COO in Discord — plain text, not recognized as a mention, no notification.

❌ @COO task complete
✅ <@1234567890> task complete ← Need Discord user ID format
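A one-line helper makes the correct form hard to get wrong; a sketch only, with the placeholder ID from the example above:

```shell
#!/bin/sh
# Format a Discord mention. Plain "@Name" is just text to Discord;
# only the <@USER_ID> form is parsed as a mention and triggers a notification.
mention() {
  printf '<@%s>' "$1"
}

mention 1234567890   # prints: <@1234567890>
```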

Failure 2: Session cached old config

Updated SOUL.md but agent’s old session didn’t refresh — still using old format.

Root cause: OpenClaw sessions load SOUL.md at creation time and don’t auto-refresh. Must restart sessions after SOUL.md changes.

```shell
# Correct approach: after changing SOUL.md
openclaw gateway restart   # restart the gateway; all sessions are recreated
```

Community note: This issue is commonly discussed — search GitHub: “session stale config” for more solutions.

Failure 3: Receiver config missing bot IDs

Correct format, fresh session. But COO’s openclaw.json users list only had the human’s Discord ID. Bot messages silently dropped.

```json
{
  "discord": {
    "users": ["human_discord_id", "gus_bot_id", "jason_bot_id"],
    "ignoreBots": false
  }
}
```

Final fix: Added all bot IDs + set ignoreBots: false.
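A grep-based sanity check can catch this class of misconfiguration before agents go silent. A sketch assuming the JSON layout shown above; the IDs and the `/tmp` path are placeholders:

```shell
#!/bin/sh
# Create a sample config matching the layout above, then verify that
# every expected bot ID is present and that ignoreBots is disabled.
cat > /tmp/openclaw-sample.json <<'EOF'
{
  "discord": {
    "users": ["human_discord_id", "gus_bot_id", "jason_bot_id"],
    "ignoreBots": false
  }
}
EOF

check_receiver_config() {
  for id in gus_bot_id jason_bot_id; do
    grep -q "\"$id\"" /tmp/openclaw-sample.json || { echo "MISSING: $id"; return 1; }
  done
  grep -q '"ignoreBots": false' /tmp/openclaw-sample.json || { echo "BOTS IGNORED"; return 1; }
  echo "receiver config OK"
}

check_receiver_config   # prints: receiver config OK
```

Run it against the real config on every agent that is supposed to receive bot messages, not just the sender.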

Lesson: Communication channels must be tested end-to-end. Don’t test only the sending side. This bug caused agents to be blocked 2+ hours with nobody knowing.


Agents will go silent. Not “might” — will. You need layered monitoring:

Layer 1 — COO self-checks: read task queue, check pending tasks, check agent activity, alert human if anomaly.

Layer 2 — Channel scans: scan agent Discord channels. If an agent asked a question with no reply for >15 min → alert. Known gap: only detects "agent asked, nobody answered." Cannot detect "agent promised to deliver, then went silent."

Layer 3 — Timeout watchdog: pure time-based. If a task was dispatched >30 min ago with no completion → alert. Most reliable layer. No dependency on agent behavior.
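The timeout layer is plain clock arithmetic; a sketch with timestamps as epoch seconds and the 30-minute threshold from above:

```shell
#!/bin/sh
# Flag any task dispatched more than 30 minutes ago with no completion.
# Depends only on the clock, not on what the agent says or does.
task_overdue() {
  dispatched_at="$1"   # epoch seconds at dispatch
  now="$2"             # current epoch seconds
  if [ $(( now - dispatched_at )) -gt 1800 ]; then
    echo "OVERDUE"
  else
    echo "OK"
  fi
}

task_overdue 1000 4000   # 50 min elapsed, prints: OVERDUE
task_overdue 1000 2000   # ~17 min elapsed, prints: OK
```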

For 24/7 systems, deploy a monitoring process outside of OpenClaw:

```shell
#!/bin/bash
# Independent watchdog: runs outside OpenClaw so it survives a full crash.
while true; do
  if ! openclaw status | grep -q "RUNNING"; then
    curl -X POST "YOUR_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d '{"text":"⚠️ OpenClaw gateway is DOWN!"}'
  fi
  sleep 300   # check every 5 minutes
done
```

Why independent? If OpenClaw itself crashes, its internal monitoring crashes too. An independent process can still alert when the entire system is down.

All agent output must pass QA’s review before COO reports to human.

Origin: February 8, 2026. Deployed two products without testing. One was completely broken. Human found it, not the agents.

After that, the rule became: no QA PASS, no "done."


| Phase | Monthly cost | Model | Notes |
| --- | --- | --- | --- |
| Single agent (Jan) | ~$200 | Claude (Anthropic API) | Pay-per-use, infinite retries |
| Post-migration (early Feb) | ~$80 | Mixed (Kimi + OpenAI) | Switched to subscription, added retry limits |
| Current (Mar) | ~$30-50 | Kimi 2.5 primary + Codex fallback | Task tiering + budget model fallback |

Cost dropped not from using worse models, but from:

  1. Fewer wasted calls — full context upfront, fewer round-trips (saved 15%)
  2. Task tiering — cheap model for P3, best model for P0 (saved 15%)
  3. No debug loops — 2 failures then stop, no infinite retries (saved 40%)
  4. Model-task matching — Kimi for frontend, Codex for backend (saved 20%)
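The post doesn't say how the four savings combine; reading them as independent multiplicative reductions (my assumption), they compound to roughly a 65% cut from the $200 baseline, with the subscription switch noted in the table covering the rest of the drop:

```shell
#!/bin/sh
# Compose the four savings multiplicatively from the $200/month baseline.
# This is an interpretation, not the post's own math.
awk 'BEGIN {
  cost = 200 * 0.85 * 0.85 * 0.60 * 0.80
  printf "%.0f\n", cost   # prints: 69
}'
```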

  1. Start with 1 agent but separate coordination and execution from day 1
  2. SOUL.md is your most important file — it defines agent behavior boundaries
  3. Test communication end-to-end — don’t assume “sent = received”
  4. Monitoring isn’t optional — agents will go silent
  5. Add QA gate from day 1 — or you’ll ship broken things to users
  6. Use single agent for one week first — understand its behavior before scaling

New to OpenClaw? Start with Getting Started — complete walkthrough from installation to first task.


Next: Model Selection War Stories — from Anthropic ban to Kimi 2.5.