# Model Selection War Stories: From Anthropic Ban to Kimi 2.5
## Emergency triage: quick diagnosis

Agent broken? Follow this flowchart:
```
openclaw status
├── Gateway: STOPPED
│   ├── Run: openclaw gateway start
│   ├── If fails → Check port: lsof -i :3000
│   │   ├── Port busy → kill -9 $(lsof -t -i :3000) → retry
│   │   └── Port free → Check logs: openclaw logs --level error
│   │       ├── "permission denied" → Permission issue, use sudo or fix file perms
│   │       ├── "EADDRINUSE" → Port conflict, change port or kill process
│   │       └── Other error → Search GitHub Issues
│   └── If succeeds → Continue below
│
├── Gateway: RUNNING, Agents: 0
│   ├── Agent not started → openclaw agent start --workspace [path]
│   ├── Agent start fails → Check SOUL.md syntax and model config
│   └── Config file errors → openclaw config check
│
├── Gateway: RUNNING, Agents: N (active)
│   ├── Agent running but not working
│   │   ├── Run: openclaw logs --follow
│   │   ├── See "429" → API quota exhausted, switch to fallback model
│   │   ├── See "403" → API key banned or expired, regenerate
│   │   ├── See "ECONNREFUSED" → Network problem
│   │   │   ├── Check proxy: echo $HTTPS_PROXY
│   │   │   ├── Proxy OK → API service itself is down, wait or switch
│   │   │   └── No proxy → export HTTPS_PROXY=http://127.0.0.1:7890
│   │   ├── See "context_length_exceeded" → Context full
│   │   │   └── Run /compact or restart session
│   │   └── Logs normal but no output → Agent may be waiting for input, check Discord
│   │
│   └── Agent output quality poor
│       ├── SOUL.md too long (>200 lines) → Trim to essential rules only
│       ├── Model doesn't match task → See model selection matrix below
│       └── Context polluted with irrelevant info → Start new session
│
└── Command itself errors
    ├── "command not found" → OpenClaw not installed or not in PATH
    ├── "EACCES" → Permission issue
    └── Other → openclaw --version to confirm version, search GitHub Issues
```

GitHub Issues search tip: Go to github.com/anthropics/claude-code/issues and search for the error message keywords. Most problems have been encountered and solved by someone.
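If you automate triage, the log-reading branch of the tree boils down to a substring lookup. A minimal sketch in Python (the error signatures and actions are the ones from the flowchart; the function itself is hypothetical, not part of OpenClaw):

```python
# Map error signatures from `openclaw logs` output to the triage actions above.
TRIAGE = {
    "429": "API quota exhausted: switch to fallback model",
    "403": "API key banned or expired: regenerate",
    "ECONNREFUSED": "Network problem: check $HTTPS_PROXY, then API status",
    "context_length_exceeded": "Context full: run /compact or restart session",
}

def triage(log_line: str) -> str:
    """Return the suggested action for a log line, or the default hint."""
    for signature, action in TRIAGE.items():
        if signature in log_line:
            return action
    return "No known signature, agent may be waiting for input: check Discord"

print(triage("POST /v1/messages -> 429 Too Many Requests"))
```

Pipe `openclaw logs` through something like this and the first two flowchart levels become a one-liner.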
## Pit #1: Anthropic OAuth ban — full post-mortem

### What happened

Running a Claude Max subscription ($200/month) for OpenClaw agents. Everything worked for three weeks.
Then one morning, all agents stopped simultaneously. Logs showed 403 Forbidden everywhere.
### Why it happened

Root cause: OAuth risk control, not a policy violation. Specifically:
- OpenClaw calls Anthropic API via OAuth token
- Claude Max subscription is a consumer product, designed for human use in web/app
- Anthropic’s risk control detected OAuth token being used for automated call patterns (high frequency, no intervals, no human interaction signatures)
- Triggered automatic ban — no warning, no email, no degradation. Straight 403.
Key distinction:
- ❌ Not because of too many calls
- ❌ Not because of overdue payment
- ❌ Not because of ToS violation
- ✅ Because consumer OAuth token was used for automated calls, triggering risk control
### Impact

All agents instantly down. No degradation, no warning. Token invalidated. System halted for ~4 hours until we switched to Kimi 2.5.
### Fix

- Immediately switched to Kimi 2.5 (`kimi-coding/kimi-for-coding`)
- Kept the Claude Max subscription for the VS Code plugin (human manual use)
- Removed all Anthropic models from OpenClaw agent configs
### Prevention

- MEMORY.md rule: "⛔ Don't use Anthropic models via OpenClaw's OAuth channel"
- Every agent’s SOUL.md has this reminder
- Removed all Anthropic fallbacks from model config
- Configure multi-model fallback from day 1
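A multi-model fallback chain is small enough to sketch in full. Python, assuming a generic `call_model(model, prompt)` client (hypothetical; substitute whatever SDK you actually use, and note a real client would catch specific error types, not bare `Exception`):

```python
FALLBACK_CHAIN = ["kimi-2.5", "openai-codex", "minimax-m2.5"]

class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""

def call_with_fallback(prompt, call_model):
    """Try each model in order; return (model, response) from the first success."""
    errors = {}
    for model in FALLBACK_CHAIN:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:
            errors[model] = exc   # remember why this model failed, then move on
    raise AllModelsFailed(errors)

# Usage: simulate the primary model being banned (403).
def fake_call(model, prompt):
    if model == "kimi-2.5":
        raise RuntimeError("403 Forbidden")
    return f"{model}: ok"

model, reply = call_with_fallback("hello", fake_call)
print(model, reply)   # falls through to the second model in the chain
```

The point of configuring this on day 1 is that the 403 morning becomes a log line, not a four-hour outage.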
## The more accurate conclusion

Original claim: "Never use Anthropic" — this was too absolute.
More accurately: Don’t use Anthropic via OpenClaw’s OAuth channel. The problem isn’t Claude (it’s one of the best models), but consumer subscription OAuth tokens aren’t designed for automated calls.
If you need Claude in your agents, the correct approaches are:
- Use Anthropic API Keys (not OAuth tokens)
- Use enterprise API subscriptions (with explicit automation support)
- Use third-party aggregators (e.g., OpenRouter) for indirect access
Community experience: this ban pattern is very common among OpenClaw users. Search GitHub for "anthropic ban" OR "OAuth 403" for more discussions and solutions.
## Pit #2: OpenAI quota burn

### What happened

The GPT-4 API has rate limits plus a monthly quota. After one week of agent operation, the quota hit zero. Constant 429 errors.
### Why it happened

Root cause: a sub-agent retrying failed API calls infinitely.
- An API endpoint was temporarily unavailable (server maintenance)
- Sub-agent’s default behavior: “if it fails, retry”
- No retry limit was set
- One sub-agent retried dozens of times in 30 minutes
- Each retry consumed API quota
- Monthly quota burned overnight by useless retries
Core problem: OpenClaw has no default retry limit. If you don’t set one, agents will retry until quota is gone.
### Fix

- Three-tier fallback: Kimi 2.5 → OpenAI Codex → MiniMax M2.5
- Stop after 2 failures, no infinite retries
- Weekly API key health check

```json
{
  "retry": {
    "maxAttempts": 2,
    "backoffMs": 30000
  }
}
```

### Prevention

- TOOLS.md maintains a "known dead APIs" list
- Heartbeat check includes API availability verification
- Hard rule: 2 infrastructure failures → stop 30 min or change approach
- Check API bills weekly — estimated cost and actual cost are usually 3-5x apart
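The bounded-retry policy above (`maxAttempts: 2`, `backoffMs: 30000`) translates to a loop like this sketch (Python; illustrative, not OpenClaw's implementation — `sleep_fn` is injectable so the example runs instantly):

```python
import time

def call_with_retry(fn, max_attempts=2, backoff_ms=30000, sleep_fn=time.sleep):
    """Call fn(); attempt at most max_attempts times total, then give up.

    Giving up is the whole point: each retry burns quota,
    so failures must be capped, not looped forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise                      # stop: escalate instead of burning quota
            sleep_fn(backoff_ms / 1000)    # wait before the next attempt

calls = []
def flaky():
    calls.append(1)
    raise RuntimeError("503 server maintenance")

try:
    call_with_retry(flaky, max_attempts=2, sleep_fn=lambda s: None)
except RuntimeError:
    pass
print(len(calls))   # exactly 2 attempts, never more
```

Compare with the failure mode in this pit: dozens of attempts in 30 minutes against an endpoint that was down for maintenance.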
## Pit #3: Wrong model for the job

### What happened

Coder (the engineer agent) used OpenAI Codex for backend work — great quality. But when it did frontend UI:
- Used pure red (#FF0000) instead of design spec’s coral (#FF5A36)
- Zero aesthetic sense in layout
- CSS with backend engineer taste
### Why it happened

Codex is a backend/algorithm-oriented model. It is trained and optimized for logical reasoning and code generation, not visual design.
This isn’t a prompt problem. Same prompt given to Kimi 2.5 produces completely different UI quality.
### Fix & prevention

| Task type | Recommended model | Reason |
|---|---|---|
| Frontend/UI | Kimi 2.5 | Best CSS/design capability |
| Backend/infra | OpenAI Codex | Strong logical reasoning, weak UI |
| Research/writing | Kimi 2.5 | Multi-language, fast |
| Budget tasks | MiniMax M2.5 | Cheapest, adequate quality |
| Complex reasoning | Claude Opus | Best reasoning, use via API Key not OAuth |
Key finding: Same agent, different model = vastly different output quality. Models aren’t universal — match them to task types.
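The matrix is easy to encode so dispatch can't drift from it. A trivial routing helper (Python sketch; the model identifiers are this post's shorthand, not exact API model names):

```python
MODEL_BY_TASK = {
    "frontend": "kimi-2.5",       # best CSS/design capability
    "backend": "openai-codex",    # strong logic, weak UI
    "research": "kimi-2.5",       # multi-language, fast
    "budget": "minimax-m2.5",     # cheapest, adequate quality
    "reasoning": "claude-opus",   # via API key, never OAuth
}

def pick_model(task_type: str) -> str:
    """Fail loudly on unknown task types instead of silently defaulting."""
    try:
        return MODEL_BY_TASK[task_type]
    except KeyError:
        raise ValueError(f"No model mapped for task type: {task_type!r}")

print(pick_model("frontend"))
```

Failing loudly matters: a silent default is exactly how a backend model ends up writing your CSS.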
## Pit #4: Agent silent disappearance

### What happened

Dispatched a task to Coder. He said "30-40 minutes." Five hours later: no file changes, no response, Discord silent.
### Why it happened

Agents can silently fail for multiple reasons:
- Session timeout — OpenClaw sessions have lifespans, agent silently exits when expired
- Context window full — agent consumed all context on a complex task, starts hallucinating or freezing
- Tool call stuck — e.g., SSH connection timeout, agent waiting indefinitely
- Model API disconnected — API service interrupted, no reconnection mechanism
Core problem: Agents don’t have self-reporting for “I’m dead.” They don’t know they’ve stopped.
### Fix & prevention

- 30-minute task timeout (purely time-based, not dependent on agent behavior)
- Timeout → automatic Telegram alert to human
- Agent SOUL.md rule: must @mention COO after completion
- Three-layer monitoring (heartbeat + block detection + task timeout) — see Workflows
- Advanced: Deploy independent watchdog process outside OpenClaw — see Architecture
Known gap: If agent is actively outputting but going in the wrong direction, timeout won’t trigger. Needs output validation (not yet implemented).
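The purely time-based timeout is deliberately dumb: it never trusts the agent to report its own death. A sketch of the check (Python; `alert` stands in for the Telegram hook, and the whole function is illustrative):

```python
import time

def watch_task(started_at, is_done, alert, timeout_s=30 * 60, now=time.time):
    """Return True if done, False if timed out (and fire the alert), None if still OK.

    Based only on the wall clock, so it works even when the agent
    has silently exited and will never report anything.
    """
    if is_done():
        return True
    if now() - started_at > timeout_s:
        alert(f"Task silent for more than {timeout_s // 60} min; check the agent")
        return False
    return None   # still within budget, check again on the next tick

# Simulate a task that went silent for 5 hours.
alerts = []
result = watch_task(
    started_at=0,
    is_done=lambda: False,
    alert=alerts.append,
    timeout_s=30 * 60,
    now=lambda: 5 * 3600,
)
print(result, alerts)
```

Run it from cron or the watchdog process; `is_done` can be as crude as "did any file in the workspace change?".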
## Pit #5: @mention triple failure

See Architecture — Communication for the full story.
Summary: took three days to get inter-agent notifications working. Problem wasn’t in sending — it was in receiver configuration.
Three failures:
- Plain text @name → Discord doesn’t recognize
- Correct format but session cached old config → must restart session after SOUL.md changes
- Format correct but receiver's `users` list missing the bot ID → messages silently dropped
Lesson: Communication chains are more fragile than you think. Any single broken link = “sent but not received.” End-to-end testing is the only reliable verification.
## Pit #6: Shipping without QA

### What happened

February 8, 2026. Deployed DogSnap and FamilyVault simultaneously. The sub-agent said "deployment complete." I sent links directly to the human.
DogSnap’s scan feature was completely broken — used a non-existent model name. Human discovered it, not agents.
### Why it happened

- The sub-agent only verified "did deployment succeed?" (HTTP 200), didn't test core features
- COO trusted sub-agent’s “complete” report without independent verification
- No QA step — the entire chain was “done → ship”
Core problem: “Deployment succeeded” ≠ “Features work.” These are two completely different verifications.
### Impact

Trust shattered. The human questioned all subsequent agent deliveries. Rebuilding trust is 10x harder than fixing the bug.
### Fix & prevention

- QA Gate: all deliveries must pass review by QA (the QA agent)
- QA does code review + functional testing + deployment verification
- No QA PASS = COO never says “done”
- SOUL.md hard rule: `⛔ NEVER report done without QA PASS`
- Same task fails QA twice → escalate to human, stop cycling
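Stated as code, the QA gate is tiny. A sketch (Python; `run_qa` is a hypothetical hook to the QA agent, returning True on PASS):

```python
def deliver(task, run_qa, max_qa_failures=2):
    """QA PASS is the only path to 'done'; repeated failures escalate to a human."""
    for _ in range(max_qa_failures):
        if run_qa(task):
            return "done"              # never say 'done' without a PASS
    return "escalate-to-human"         # stop cycling, involve a person

print(deliver("dogsnap-scan", run_qa=lambda t: False))   # → escalate-to-human
```

Note what's absent: there is no branch where a sub-agent's "deployment complete" report alone produces "done".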
## Pit #7: Context window overflow

### What happened

The COO processed too many tasks in one session. Context hit 70% → slowdown. Hit 90% → completely frozen. The human sent messages, waited 5 minutes, no response.
### Why it happened

- The coordinator's session accumulated context from all tasks
- Every result, dispatch, QA feedback added to context
- No periodic cleanup mechanism
- At 70%, model inference speed drops. At 90%, nearly unresponsive.
### Fix & prevention

- A cron job checks context usage every 5 minutes
- At 60%: auto-save state to memory file + run /compact
- Never let context reach 70%
- After task completion: immediately archive to `memory/YYYY-MM-DD.md`
- Large tasks go to sub-agents, not into the COO's session
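The 60% check from the list above can be sketched as follows (Python; `save_state` and `compact` are hypothetical hooks for the memory-file save and the `/compact` step):

```python
def check_context(usage_pct, save_state, compact, threshold_pct=60):
    """Cron-style check: compact well before the 70% slowdown zone."""
    if usage_pct >= threshold_pct:
        save_state()   # persist to the memory file first, so nothing is lost
        compact()      # then shrink the session (the /compact step)
        return "compacted"
    return "ok"

events = []
print(check_context(63,
                    save_state=lambda: events.append("saved"),
                    compact=lambda: events.append("compacted")))
print(events)
```

The ordering matters: save first, compact second, so a failed compaction can't destroy unarchived state.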
Detailed memory management: See Workflows — Memory Management
## Model selection matrix (current, tested on 2026.3.2)

| Task type | Recommended model | Reason |
|---|---|---|
| Frontend/UI | Kimi 2.5 | Best CSS/design capability |
| Backend/infra | OpenAI Codex | Strong logical reasoning, weak UI |
| Research/writing | Kimi 2.5 | Multi-language, fast |
| Budget tasks | MiniMax M2.5 | Cheapest, adequate quality |
| Complex reasoning | Claude Opus | Via API Key, not OAuth |
## Model switching checklist

```shell
# 1. Edit OpenClaw config
vim ~/.openclaw/config/openclaw.json

# 2. Restart gateway (config doesn't hot-reload)
openclaw gateway restart

# 3. Confirm old sessions are terminated
openclaw status

# 4. Verify new session uses new model
openclaw logs --follow
```

Gotcha: changed config but didn't restart the gateway → old sessions continue with the old model. Changed SOUL.md but didn't kill the old session → the old session continues with the old rules. Config change = must restart.
Next: Workflow Recipes — QA gates, task queues, heartbeat monitoring.