
From 1 Agent to 5: Building a Multi-Agent Architecture

January 2026. One MacBook Air and one OpenClaw agent. It did everything: wrote code, ran research, sent messages, managed crons.

Problems surfaced fast:

  • Context explosion — one agent handling all tasks, context window full in 30 minutes
  • Task conflicts — research and coding fighting for the same session
  • Single point of failure — agent goes down, everything stops

Two weeks later, I started splitting.


The most critical decision was separating coordination from execution.

COO became the coordinator. No coding, no research, no content writing. Three jobs only:

  1. Receive human instructions
  2. Break down tasks and dispatch to executor agents
  3. Collect results and report back to human

This solved 80% of problems. The COO Principle: the coordinator never touches execution.

Why? Because once a coordinator starts executing, it enters debug loops, loses the big picture, and the human can’t reach it. This lesson cost several all-nighters.

Further reading: This pattern is called “Orchestrator-Worker” architecture in multi-agent systems. Similar designs appear in Microsoft AutoGen and CrewAI, but our implementation is simpler — no framework, just native OpenClaw capabilities.


Five agents, each with clear boundaries. This number wasn’t planned — it was driven by real needs:

| Order | Agent | Role | Why needed |
| --- | --- | --- | --- |
| 1st | COO | Coordinator | Single agent doing everything caused context overflow — had to separate coordination |
| 2nd | Coder | Engineer | Coding tasks were most frequent, needed a dedicated agent |
| 3rd | QA | QA | Feb 8 broken deployment incident forced a quality gate |
| 4th | Research | Researcher | Research tasks conflicting with Coder's coding sessions |
| 5th | Marketing | Marketing | Content creation needs different models and prompt styles |

Key insight: Each new agent was added because of a specific problem, not pre-planned. If your scenario doesn’t need 5, don’t force it.

When do you need multi-agent? (Decision tree)

```
Is your single agent enough?
├── Yes → Don't add more. Keep going.
└── No → Where's it falling short?
    ├── Context frequently overflowing → Split: coordinator + executor
    ├── Different task types conflicting → Split agents by task type
    ├── Output quality unstable → Add a QA agent
    ├── Too many tasks queuing up → Add more executor agents
    └── None of the above → Problem is likely in SOUL.md or model selection, not agent count
```

Rules of thumb:

  • 1 agent handles most personal projects
  • 2-3 for clear division needs (e.g., one for coding + one for QA)
  • 5+ only when you truly have multiple concurrent task types
  • Beyond 7, management overhead skyrockets unless you have a mature orchestration system

COO — Coordinator

  • Only agent that talks directly to human
  • Dispatches to other agents via Discord
  • Maintains task queue, monitors completion
  • Runs on MacBook Air
  • Core SOUL.md rule: Never execute tasks, write code, or do research. Only dispatch.

Coder — Engineer

  • Writes code, debugs, deploys
  • Can SSH into any machine
  • Runs on Mac Mini
  • Model selection: Kimi 2.5 for frontend, Codex for backend/algorithms (see Debug Playbook #3)

Research — Researcher

  • Web search, data analysis, competitive research
  • Multi-platform information aggregation
  • Runs on Mac Mini

Marketing — Marketing

  • Content creation, social media, brand strategy
  • Drafts only, never publishes (human approves) — hard rule
  • Runs on Mac Mini

QA — QA

  • Reviews all agent output
  • Code review + functional testing + deployment verification
  • Runs on MacBook Air (same gateway as COO)
  • Why same gateway: sessions_send only works within the same gateway. QA needs fast response to COO’s QA requests.
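With the roster above, COO's dispatch step is essentially routing by task type. A minimal sketch in shell; the category names are my own illustration, since the post doesn't specify how COO classifies incoming tasks:

```shell
#!/bin/sh
# Route a task category to the agent that owns it.
# Category names are illustrative, not from the actual system.
route_task() {
  case "$1" in
    code|debug|deploy)    echo "Coder" ;;
    research|analysis)    echo "Research" ;;
    content|social|brand) echo "Marketing" ;;
    review|test|verify)   echo "QA" ;;
    *)                    echo "COO: unclear task, ask human" ;;
  esac
}

route_task deploy     # prints: Coder
route_task research   # prints: Research
```

The catch-all branch matters: anything the coordinator can't classify goes back to the human rather than being guessed at.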

```
MacBook Air — Coordinator node
├── COO (dispatch)
├── QA (QA)
└── Gateway A

Mac Mini — Worker node
├── Coder (engineering)
├── Research (research)
├── Marketing (marketing)
└── Gateway B
```

Why two machines? Not performance; the MacBook Air handles 5 agents fine. The reason is isolation:

  1. Stability isolation — worker node can restart/reinstall without affecting coordinator
  2. Reachability guarantee — coordinator stays stable and reachable at all times
  3. Network policy isolation — worker runs proxies, VPNs; coordinator uses direct connection
  4. Resource isolation — coding tasks consuming CPU/memory won’t affect dispatch responsiveness

Connected via Tailscale, SSH between them.

| Scenario | Recommendation | Reason |
| --- | --- | --- |
| 1-3 agents | Single machine | No need for added complexity |
| 3-5 agents | Depends | If tasks frequently max CPU or need different network policies, consider dual-machine |
| 5+ agents | Dual machine | Isolation and stability benefits outweigh management cost |
| Need 24/7 uptime | Must use dual | Single machine restart = everything down |

```shell
# 1. Install Tailscale on both machines
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up

# 2. Verify connectivity
ping mac-mini.tailnet   # should respond

# 3. Configure passwordless SSH
ssh-copy-id user@mac-mini.tailnet

# 4. Test cross-machine OpenClaw commands
ssh user@mac-mini.tailnet "openclaw status"
```

Gotcha: Tailscale may disconnect after macOS sleep. Disable auto-sleep in System Settings → Energy, or use caffeinate -d to keep the machine awake.


Agents don’t communicate via direct API calls. They talk through Discord.

Why Discord?

  • Each agent has a dedicated channel (#⚡・coder, #🔍・research, etc.)
  • Messages are persisted
  • Human can inspect any agent’s conversation at any time
  • Supports @mention notifications
  • Free, no API call limits

```
Human (Telegram) → COO
COO → Discord #⚡・coder → Coder (Mac Mini)
Coder completes → Discord @COO → COO
COO → QA (MacBook Air, sessions_send)
QA returns PASS → COO → Human (Telegram)
```

This looked simple but took three days to fix. Full story:

Failure 1: Plain text @mention

Agent writes @COO in Discord — plain text, not recognized as a mention, no notification.

❌ @COO task complete
✅ <@1234567890> task complete ← Need Discord user ID format
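A one-line helper makes the correct form hard to get wrong; a sketch only, with the placeholder ID from the example above:

```shell
#!/bin/sh
# Format a Discord mention. Plain "@Name" is just text to Discord;
# only the <@USER_ID> form is parsed as a mention and triggers a notification.
mention() {
  printf '<@%s>' "$1"
}

mention 1234567890   # prints: <@1234567890>
```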

Failure 2: Session cached old config

Updated SOUL.md but agent’s old session didn’t refresh — still using old format.

Root cause: OpenClaw sessions load SOUL.md at creation time and don’t auto-refresh. Must restart sessions after SOUL.md changes.

```shell
# Correct approach: after changing SOUL.md
openclaw gateway restart   # restart the gateway; all sessions are recreated
```

Community note: This issue is commonly discussed — search GitHub: “session stale config” for more solutions.

Failure 3: Receiver config missing bot IDs

Correct format, fresh session. But COO’s openclaw.json users list only had the human’s Discord ID. Bot messages silently dropped.

```json
{
  "discord": {
    "users": ["human_discord_id", "gus_bot_id", "jason_bot_id"],
    "ignoreBots": false
  }
}
```

Final fix: Added all bot IDs + set ignoreBots: false.
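A grep-based sanity check can catch this class of misconfiguration before agents go silent. A sketch assuming the JSON layout shown above; the IDs and the `/tmp` path are placeholders:

```shell
#!/bin/sh
# Create a sample config matching the layout above, then verify that
# every expected bot ID is present and that ignoreBots is disabled.
cat > /tmp/openclaw-sample.json <<'EOF'
{
  "discord": {
    "users": ["human_discord_id", "gus_bot_id", "jason_bot_id"],
    "ignoreBots": false
  }
}
EOF

check_receiver_config() {
  for id in gus_bot_id jason_bot_id; do
    grep -q "\"$id\"" /tmp/openclaw-sample.json || { echo "MISSING: $id"; return 1; }
  done
  grep -q '"ignoreBots": false' /tmp/openclaw-sample.json || { echo "BOTS IGNORED"; return 1; }
  echo "receiver config OK"
}

check_receiver_config   # prints: receiver config OK
```

Run it against the real config on every agent that is supposed to receive bot messages, not just the sender.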

Lesson: Communication channels must be tested end-to-end. Don’t test only the sending side. This bug caused agents to be blocked 2+ hours with nobody knowing.


Agents will go silent. Not “might” — will. You need layered monitoring:

Layer 1 — COO self-checks: read task queue, check pending tasks, check agent activity, alert human if anomaly.

Layer 2 — Channel scans: scan agent Discord channels. If an agent asked a question with no reply for >15 min → alert. Known gap: only detects "agent asked, nobody answered." Cannot detect "agent promised to deliver, then went silent."

Layer 3 — Timeout watchdog: pure time-based. If a task was dispatched >30 min ago with no completion → alert. Most reliable layer. No dependency on agent behavior.
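The timeout layer is plain clock arithmetic; a sketch with timestamps as epoch seconds and the 30-minute threshold from above:

```shell
#!/bin/sh
# Flag any task dispatched more than 30 minutes ago with no completion.
# Depends only on the clock, not on what the agent says or does.
task_overdue() {
  dispatched_at="$1"   # epoch seconds at dispatch
  now="$2"             # current epoch seconds
  if [ $(( now - dispatched_at )) -gt 1800 ]; then
    echo "OVERDUE"
  else
    echo "OK"
  fi
}

task_overdue 1000 4000   # 50 min elapsed, prints: OVERDUE
task_overdue 1000 2000   # ~17 min elapsed, prints: OK
```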

For 24/7 systems, deploy a monitoring process outside of OpenClaw:

```shell
#!/bin/bash
# Independent watchdog: runs outside OpenClaw so it survives a full crash.
while true; do
  if ! openclaw status | grep -q "RUNNING"; then
    curl -X POST "YOUR_WEBHOOK_URL" \
      -H "Content-Type: application/json" \
      -d '{"text":"⚠️ OpenClaw gateway is DOWN!"}'
  fi
  sleep 300   # check every 5 minutes
done
```

Why independent? If OpenClaw itself crashes, its internal monitoring crashes too. An independent process can still alert when the entire system is down.

All agent output must pass QA’s review before COO reports to human.

Origin: February 8, 2026. Deployed two products without testing. One was completely broken. Human found it, not the agents.

After that, the rule became: no QA PASS, no "done."


| Phase | Monthly cost | Model | Notes |
| --- | --- | --- | --- |
| Single agent (Jan) | ~$200 | Claude (Anthropic API) | Pay-per-use, infinite retries |
| Post-migration (early Feb) | ~$80 | Mixed (Kimi + OpenAI) | Switched to subscription, added retry limits |
| Current (Mar) | ~$30-50 | Kimi 2.5 primary + Codex fallback | Task tiering + budget model fallback |

Cost dropped not from using worse models, but from:

  1. Fewer wasted calls — full context upfront, fewer round-trips (saved 15%)
  2. Task tiering — cheap model for P3, best model for P0 (saved 15%)
  3. No debug loops — 2 failures then stop, no infinite retries (saved 40%)
  4. Model-task matching — Kimi for frontend, Codex for backend (saved 20%)
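The post doesn't say how the four savings combine; reading them as independent multiplicative reductions (my assumption), they compound to roughly a 65% cut from the $200 baseline, with the subscription switch noted in the table covering the rest of the drop:

```shell
#!/bin/sh
# Compose the four savings multiplicatively from the $200/month baseline.
# This is an interpretation, not the post's own math.
awk 'BEGIN {
  cost = 200 * 0.85 * 0.85 * 0.60 * 0.80
  printf "%.0f\n", cost   # prints: 69
}'
```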

  1. Start with 1 agent but separate coordination and execution from day 1
  2. SOUL.md is your most important file — it defines agent behavior boundaries
  3. Test communication end-to-end — don’t assume “sent = received”
  4. Monitoring isn’t optional — agents will go silent
  5. Add QA gate from day 1 — or you’ll ship broken things to users
  6. Use single agent for one week first — understand its behavior before scaling

New to OpenClaw? Start with Getting Started — complete walkthrough from installation to first task.


Next: Model Selection War Stories — from Anthropic ban to Kimi 2.5.