
Workflow Recipes: Core Multi-Agent Processes

Every task’s complete path from human to delivery:

```
Human (Telegram) → COO receives
COO writes Task Brief (Goal / Context / Definition of Done / Priority)
COO dispatches to agent Discord channel
Agent executes → @mentions COO on completion
COO receives result → dispatches to QA
QA reviews → PASS / FAIL
PASS → COO reports to human
FAIL → feedback to original agent → fix → QA re-reviews → loop until PASS
```

Key constraints:

  • COO never touches execution (COO Principle)
  • No QA PASS = never say “done”
  • Every step logged to memory/YYYY-MM-DD.md

QA isn’t optional. It’s hard-coded into the system.

Code deliveries:

  1. Code logic correct
  2. No security vulnerabilities (hardcoded API keys, unvalidated input)
  3. Build passes
  4. Deployment succeeds (HTTP 200, not 404)
  5. Core features work (not just homepage loads — this is a direct result of the Feb 8 incident)
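Items 4 and 5 can be automated. A minimal sketch of a deployment smoke check, assuming hypothetical core paths (`/login`, `/api/health` are placeholders for your app's actual routes):

```python
import urllib.request, urllib.error

def smoke_check(base_url, core_paths=("/", "/login", "/api/health")):
    """Hit each core route; returns a list of (path, status)."""
    results = []
    for path in core_paths:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=10) as r:
                results.append((path, r.status))
        except urllib.error.HTTPError as e:
            results.append((path, e.code))   # e.g. 404: deployed but broken
        except (urllib.error.URLError, TimeoutError):
            results.append((path, None))     # unreachable

    return results

def qa_verdict(results):
    """Anything that is not HTTP 200 is a QA FAIL."""
    failures = [(p, s) for p, s in results if s != 200]
    return ("PASS", []) if not failures else ("FAIL", failures)
```

Checking more than the homepage is the point: a deploy where `/` loads but `/login` 404s should fail QA.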

Content deliveries:

  1. Facts accurate
  2. Grammar/spelling
  3. Format matches target platform
  4. Sensitive info check (no leaked API keys, personal info)
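Check 4 is mechanical enough to script. A sketch of a leak scanner; the patterns below are illustrative examples, not a complete list, and should be extended for the providers you actually use:

```python
import re

# Illustrative patterns only; add patterns for your own providers.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),           # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key IDs
    re.compile(r"(?i)api[_-]?key\s*[:=]\s*\S+"),  # generic key assignments
]

def find_leaks(text):
    """Return every substring that looks like a leaked credential."""
    hits = []
    for pat in SECRET_PATTERNS:
        hits.extend(pat.findall(text))
    return hits
```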
```
QA FAIL → COO extracts failure reason
COO creates fix Brief (with QA's specific feedback)
Routes back to original agent
Agent fixes → resubmits
QA second review
If fails again → escalate to human
```

Hard rule: Same task fails QA twice consecutively → escalate to human, stop cycling. Infinite QA loops are as harmful as infinite API retries.
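The fix loop with the two-strikes cap can be sketched as follows. `dispatch`, `run_qa`, and `report` are stand-ins for the COO's actual Discord routing, not real OpenClaw APIs:

```python
def run_task(brief, dispatch, run_qa, report, max_fails=2):
    """Drive one task through dispatch → QA → fix loop → escalation."""
    result = dispatch(brief)
    fails = 0
    while True:
        verdict, feedback = run_qa(result)
        if verdict == "PASS":
            report(result)            # only now may the COO say "done"
            return "DONE"
        fails += 1
        if fails >= max_fails:        # two consecutive fails: stop cycling
            return "ESCALATE"
        result = dispatch(f"{brief}\nQA feedback: {feedback}")
```

The counter is the whole point: without `max_fails`, a task that can never pass QA burns tokens forever.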


All tasks tracked in memory/task-queue.md:

```
- [ ] Task description | agent: xxx | priority: P1 | added: ISO-time
- [~] In-progress task | dispatched: ISO-time
- [x] Completed task | completed: ISO-time | qa_status: PASS
- [-] Recycled task | reason: timeout/no-response
```
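The format is deliberately line-oriented, so a few lines of code can parse it. A sketch (field names follow the example entries above):

```python
import re

LINE_RE = re.compile(r"^- \[([ ~x-])\] (.+)$")
STATUS = {" ": "pending", "~": "in_progress", "x": "done", "-": "recycled"}

def parse_task_line(line):
    """Parse one task-queue.md entry into a dict of its fields."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None                            # not a task line
    marker, rest = m.groups()
    parts = [p.strip() for p in rest.split("|")]
    task = {"status": STATUS[marker], "description": parts[0]}
    for field in parts[1:]:
        key, _, value = field.partition(":")   # ISO times keep their colons
        task[key.strip()] = value.strip()
    return task
```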
| Level | Meaning | Behavior |
|---|---|---|
| P0 | Human explicitly requested | Needs human approval before execution |
| P1 | Important | Notify human, then execute |
| P2 | Automated | Execute directly |
| P3 | Low-risk | Execute directly, use budget model |
  1. Every human instruction → immediately becomes a [ ] task
  2. COO checks queue every heartbeat (15 min)
  3. If any [ ] tasks exist → dispatch immediately
  4. Dispatched → [~]
  5. Completed → [x]
  6. Timeout (30 min no response) → [-] + alert
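Rules 3-6 make up one heartbeat pass. A minimal sketch over parsed queue entries; `dispatch` and `alert` are stand-ins for the COO's actual messaging calls:

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)

def heartbeat(tasks, now, dispatch, alert):
    """One COO heartbeat pass; each task is a dict with at least 'status'."""
    for task in tasks:
        if task["status"] == "pending":            # rule 3: dispatch immediately
            dispatch(task)
            task["status"] = "in_progress"         # rule 4: [ ] → [~]
            task["dispatched"] = now.isoformat()
        elif task["status"] == "in_progress":
            started = datetime.fromisoformat(task["dispatched"])
            if now - started > TIMEOUT:            # rule 6: 30 min, no response
                task["status"] = "recycled"        # [~] → [-]
                alert(task)
```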

Never allowed: Writing “waiting for human confirmation” unless genuinely needing human decision (payment, publishing, irreversible actions).

Origin story: February 11, 2026. Sub-agent finished at 7:14 AM. Remaining tasks written on a “Still TODO” list. Not executed. COO idle for 4 hours. Human woke up to find nothing done.


Layer 1: COO self-check (every 15 min)

1. Read memory/task-queue.md
2. Check [~] tasks for timeout
3. Check today's log for anomalies
4. Issue found → Telegram human
5. All clear → HEARTBEAT_OK

Layer 2: Agent block detection (every 10 min). Scan agent Discord channels; if an agent has posted a question with no reply for 15+ minutes → alert.

Known limitation: only detects “agent asked, nobody answered.” Cannot detect “agent promised delivery then disappeared.” That needs Layer 3.

Layer 3: Task-level timeout (30 min)

```
Each [~] task records a dispatched timestamp
If now - dispatched > 30 min with no completion
→ Telegram alert: "Task XXX dispatched to Coder 30+ min ago, no completion"
```

Most reliable layer. Pure time-based. No dependency on agent behavior.

When timeout detected:

```
Step 1: Check agent logs
├── Error logs → Identify error type
│   ├── API error (429/403/500) → Switch model or wait
│   ├── Tool call timeout → Likely SSH/network issue
│   └── Context overflow → Needs new session
└── Logs normal → Agent may just be slower than expected

Step 2: Telegram alert to human
    Include: who, what task, how long, log summary

Step 3: Human decides
├── Retry → New session, re-dispatch
├── Switch agent → Dispatch to different capable agent
├── Switch model → Same agent, more suitable model
└── Manual → Human handles it
```
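Step 1's triage can be sketched as a simple heuristic classifier. The string matches are assumptions about what typically shows up in the logs, not a spec of OpenClaw's log format:

```python
def classify_failure(log_text):
    """Heuristic mapping from agent log contents to the branches above."""
    text = log_text.lower()
    if any(code in log_text for code in ("429", "403", "500")):
        return "api_error: switch model or wait"
    if "timeout" in text:
        return "tool_timeout: check SSH/network"
    if "context" in text and "overflow" in text:
        return "context_overflow: needs new session"
    return "logs_normal: agent may just be slow"
```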

MacBook Air (COO + QA) and Mac Mini (Coder + Research + Marketing) on different gateways. sessions_send only works within the same gateway.

  • COO → Coder/Research/Marketing: Discord messages
  • Coder/Research/Marketing → COO: Discord @mention
  • COO → QA: sessions_send (same gateway, direct)
```sh
~/.openclaw/bin/openclaw message send \
  --channel discord \
  --target "⚡・coder" \
  --message "[Task brief] — COO (COO)"
```
| Problem | Cause | Fix |
|---|---|---|
| sessions_send not working | Only works within same gateway | Use Discord messages |
| @mention not triggering | Plain text @name, not `<@BOT_ID>` | Use correct Discord ID format |
| Messages silently dropped | Receiver users list missing bot ID | Add all bot IDs + `ignoreBots: false` |
| Config changes not taking effect | Session cached old config | Must `openclaw gateway restart` after config changes |
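The second row is worth spelling out: Discord only fires a mention event when the message contains the raw ID form, not the display name. A tiny sketch (the ID below is a placeholder):

```python
BOT_ID = "123456789012345678"    # placeholder; substitute the real bot's ID

def mention(user_id):
    """Discord markup that actually triggers a ping, vs. inert plain text."""
    return f"<@{user_id}>"

# "@coder please review" renders as plain text and notifies nobody;
# the form below fires the mention event the listener is waiting for.
message = f"{mention(BOT_ID)} please review the deploy logs"
```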

Recipe 6: Memory management & optimization

```
memory/
├── MEMORY.md              # Long-term decision rules (manual, <200 lines)
├── task-queue.md          # Current task queue
├── improvements-log.md    # Improvement log (updated after each failure)
├── 2026-01-25.md          # Daily logs
├── ...
└── 2026-03-09.md
```
| Layer | File | Content | Loading |
|---|---|---|---|
| L1: Hard rules | SOUL.md | Role definition, behavior constraints, hard rules | Auto-loaded every session start |
| L2: Working memory | memory/YYYY-MM-DD.md | Today's tasks, decisions, progress | Auto-read every heartbeat |
| L3: Long-term rules | MEMORY.md | Cross-day decisions, lessons, config rules | Auto-loaded every session start |
```
# SOUL.md rules:
# ⛔ HARD RULE: When context usage > 60%:
#   1. Save current state to memory/YYYY-MM-DD.md
#   2. Execute /compact
#   3. Continue working
# ⛔ HARD RULE: Never let context exceed 70%
```
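As pure decision logic, the two hard rules reduce to a threshold check. A sketch (how you obtain the token counts depends on your setup and is not shown):

```python
COMPACT_AT = 0.60    # save state, then /compact
HARD_CAP   = 0.70    # never exceed

def context_action(used_tokens, window_tokens):
    """What the SOUL.md context rules require right now."""
    usage = used_tokens / window_tokens
    if usage > HARD_CAP:
        return "compact_now"         # hard cap breached
    if usage > COMPACT_AT:
        return "save_and_compact"    # 1) write daily log  2) /compact
    return "continue"
```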

When to trigger compaction:

  • Context usage hits 60% → auto-trigger
  • After task completion → archive results to daily log, then compact
  • Session running >2 hours → proactive compact

Compaction isn’t a built-in OpenClaw feature — it’s implemented via the /compact command or auto-trigger rules in SOUL.md:

  1. /compact tells the model to summarize the conversation, discarding early details
  2. You can also trigger similar behavior via cron
  3. Critical info should be saved to memory files before compacting

We tried embedding-based memory retrieval but found:

  1. Unreliable recall — semantic search recall rate isn’t high enough, critical info may not be retrieved
  2. Noisy results — “relevant” content mixed with irrelevant info, diluting useful context
  3. Maintenance overhead — requires additional infrastructure (vector DB, embedding API)
  4. Non-deterministic — you don’t know what the agent will retrieve each time

Our alternative: SOUL.md + MEMORY.md = deterministic documents loaded every session. 100% reliable.


Provide all context upfront. Target: each sub-agent completes in 3-5 tool calls.

```md
## Task
[One sentence describing the goal]

## Context
- Relevant file paths: [specific paths]
- Background: [what the agent needs to know]
- References: [links or files]

## Done Criteria
- [ ] Specific completion standard 1
- [ ] Specific completion standard 2
- [ ] Specific completion standard 3

## Constraints
- Priority: P[0-3]
- Deadline: [time]
- Model: [recommended model]
```
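A filled-in brief might look like this (hypothetical task; the file paths and times are placeholders):

```md
## Task
Add a /health endpoint to the API server.

## Context
- Relevant file paths: src/server.ts, src/routes/
- Background: QA needs an endpoint for deployment smoke checks
- References: the existing /status route in src/routes/

## Done Criteria
- [ ] GET /health returns HTTP 200 with {"ok": true}
- [ ] Unit test added and passing
- [ ] No hardcoded secrets introduced

## Constraints
- Priority: P2
- Deadline: today 18:00
- Model: budget model is fine
```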

If a task involves API calls, write a script first then run it. Don’t let agents make API calls one by one.

  • Infrastructure failure (API timeout, 429) → max 2 retries
  • After 2 → stop 30 minutes or change approach
  • Never retry infinitely
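The retry cap as code, a minimal sketch (`InfraError` is a stand-in for whatever exception your API client raises on timeouts and 429s):

```python
import time

class InfraError(Exception):
    """Stand-in for infrastructure failures: API timeouts, HTTP 429, etc."""

def call_with_retry(call, max_retries=2, wait_seconds=1):
    """At most max_retries retries; never retry infinitely."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except InfraError:
            if attempt == max_retries:
                raise        # retries exhausted: stop 30 min or change approach
            time.sleep(wait_seconds)
```

The bounded `range` is what makes an infinite loop structurally impossible.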

The incident behind this rule: one sub-agent spent 30 minutes retrying an exhausted API. Thirty minutes, zero output.

Further reading: See Anthropic’s agent design best practices and OpenAI’s agent building guide.


All configs and commands tested on OpenClaw 2026.3.2.

| Your version | Notes |
|---|---|
| 2026.2.x | Most configs compatible; `openclaw message send` params may differ slightly |
| 2026.3.x | This handbook fully applies |
| 2026.4.x+ | Check the OpenClaw Changelog for breaking changes |

Next: Opinions — contrarian views on AI agents.