# Model Selection War Stories: From Anthropic Ban to Kimi 2.5
## Emergency triage: quick diagnosis

Agent broken? Follow this flowchart:
```
openclaw status
├── Gateway: STOPPED
│   ├── Run: openclaw gateway start
│   ├── If fails → Check port: lsof -i :3000
│   │   ├── Port busy → kill -9 $(lsof -t -i :3000) → retry
│   │   └── Port free → Check logs: openclaw logs --level error
│   │       ├── "permission denied" → Permission issue, use sudo or fix file perms
│   │       ├── "EADDRINUSE" → Port conflict, change port or kill process
│   │       └── Other error → Search GitHub Issues
│   └── If succeeds → Continue below
│
├── Gateway: RUNNING, Agents: 0
│   ├── Agent not started → openclaw agent start --workspace [path]
│   ├── Agent start fails → Check SOUL.md syntax and model config
│   └── Config file errors → openclaw config check
│
├── Gateway: RUNNING, Agents: N (active)
│   ├── Agent running but not working
│   │   ├── Run: openclaw logs --follow
│   │   ├── See "429" → API quota exhausted, switch to fallback model
│   │   ├── See "403" → API key banned or expired, regenerate
│   │   ├── See "ECONNREFUSED" → Network problem
│   │   │   ├── Check proxy: echo $HTTPS_PROXY
│   │   │   ├── Proxy OK → API service itself is down, wait or switch
│   │   │   └── No proxy → export HTTPS_PROXY=http://127.0.0.1:7890
│   │   ├── See "context_length_exceeded" → Context full
│   │   │   └── Run /compact or restart session
│   │   └── Logs normal but no output → Agent may be waiting for input, check Discord
│   │
│   └── Agent output quality poor
│       ├── SOUL.md too long (>200 lines) → Trim to essential rules only
│       ├── Model doesn't match task → See model selection matrix below
│       └── Context polluted with irrelevant info → Start new session
│
└── Command itself errors
    ├── "command not found" → OpenClaw not installed or not in PATH
    ├── "EACCES" → Permission issue
    └── Other → openclaw --version to confirm version, search GitHub Issues
```

GitHub Issues search tip: Go to github.com/anthropics/claude-code/issues and search for the error message keywords. Most problems have been encountered and solved by someone.
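If you automate triage, the log-reading branch of the tree boils down to a substring lookup. A minimal sketch in Python (the error signatures and actions are the ones from the flowchart; the function itself is hypothetical, not part of OpenClaw):

```python
# Map error signatures from `openclaw logs` output to the triage actions above.
TRIAGE = {
    "429": "API quota exhausted: switch to fallback model",
    "403": "API key banned or expired: regenerate",
    "ECONNREFUSED": "Network problem: check $HTTPS_PROXY, then API status",
    "context_length_exceeded": "Context full: run /compact or restart session",
}

def triage(log_line: str) -> str:
    """Return the suggested action for a log line, or the default hint."""
    for signature, action in TRIAGE.items():
        if signature in log_line:
            return action
    return "No known signature, agent may be waiting for input: check Discord"

print(triage("POST /v1/messages -> 429 Too Many Requests"))
```

Pipe `openclaw logs` through something like this and the first two flowchart levels become a one-liner.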
## Pit #1: Anthropic OAuth ban — full post-mortem

### What happened

Running a Claude Max subscription ($200/month) for OpenClaw agents. Everything worked for three weeks.
Then one morning, all agents stopped simultaneously. Logs showed 403 Forbidden everywhere.
### Why it happened

Root cause: OAuth risk control, not a policy violation. Specifically:
- OpenClaw calls Anthropic API via OAuth token
- Claude Max subscription is a consumer product, designed for human use in web/app
- Anthropic’s risk control detected OAuth token being used for automated call patterns (high frequency, no intervals, no human interaction signatures)
- Triggered automatic ban — no warning, no email, no degradation. Straight 403.
Key distinction:
- ❌ Not because of too many calls
- ❌ Not because of overdue payment
- ❌ Not because of ToS violation
- ✅ Because consumer OAuth token was used for automated calls, triggering risk control
### Impact

All agents instantly down. No degradation, no warning. Token invalidated. System halted for ~4 hours until we switched to Kimi 2.5.
### Fix

- Immediately switched to Kimi 2.5 (`kimi-coding/kimi-for-coding`)
- Kept the Claude Max subscription for the VS Code plugin (human manual use)
- Removed all Anthropic models from OpenClaw agent configs
### Prevention

- MEMORY.md rule: "⛔ Don't use Anthropic models via OpenClaw's OAuth channel"
- Every agent’s SOUL.md has this reminder
- Removed all Anthropic fallbacks from model config
- Configure multi-model fallback from day 1
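A multi-model fallback chain is small enough to sketch in full. Python, assuming a generic `call_model(model, prompt)` client (hypothetical; substitute whatever SDK you actually use, and note a real client would catch specific error types, not bare `Exception`):

```python
FALLBACK_CHAIN = ["kimi-2.5", "openai-codex", "minimax-m2.5"]

class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""

def call_with_fallback(prompt, call_model):
    """Try each model in order; return (model, response) from the first success."""
    errors = {}
    for model in FALLBACK_CHAIN:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:
            errors[model] = exc   # remember why this model failed, then move on
    raise AllModelsFailed(errors)

# Usage: simulate the primary model being banned (403).
def fake_call(model, prompt):
    if model == "kimi-2.5":
        raise RuntimeError("403 Forbidden")
    return f"{model}: ok"

model, reply = call_with_fallback("hello", fake_call)
print(model, reply)   # falls through to the second model in the chain
```

The point of configuring this on day 1 is that the 403 morning becomes a log line, not a four-hour outage.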
## The more accurate conclusion

Original claim: "Never use Anthropic" — this was too absolute.
More accurately: Don’t use Anthropic via OpenClaw’s OAuth channel. The problem isn’t Claude (it’s one of the best models), but consumer subscription OAuth tokens aren’t designed for automated calls.
If you need Claude in your agents, the correct approaches are:
- Use Anthropic API Keys (not OAuth tokens)
- Use enterprise API subscriptions (with explicit automation support)
- Use third-party aggregators (e.g., OpenRouter) for indirect access
Community experience: this ban pattern is very common among OpenClaw users. Search GitHub for "anthropic ban" OR "OAuth 403" for more discussions and solutions.
## Pit #2: OpenAI quota burn

### What happened

The GPT-4 API has rate limits plus a monthly quota. After one week of agent operation, the quota hit zero. Constant 429 errors.
### Why it happened

Root cause: a sub-agent retrying failed API calls infinitely.
- An API endpoint was temporarily unavailable (server maintenance)
- Sub-agent’s default behavior: “if it fails, retry”
- No retry limit was set
- One sub-agent retried dozens of times in 30 minutes
- Each retry consumed API quota
- Monthly quota burned overnight by useless retries
Core problem: OpenClaw has no default retry limit. If you don’t set one, agents will retry until quota is gone.
### Fix

- Three-tier fallback: Kimi 2.5 → OpenAI Codex → MiniMax M2.5
- Stop after 2 failures, no infinite retries
- Weekly API key health check

```json
{
  "retry": {
    "maxAttempts": 2,
    "backoffMs": 30000
  }
}
```

### Prevention

- TOOLS.md maintains a "known dead APIs" list
- Heartbeat check includes API availability verification
- Hard rule: 2 infrastructure failures → stop 30 min or change approach
- Check API bills weekly — estimated cost and actual cost are usually 3-5x apart
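The bounded-retry policy above (`maxAttempts: 2`, `backoffMs: 30000`) translates to a loop like this sketch (Python; illustrative, not OpenClaw's implementation — `sleep_fn` is injectable so the example runs instantly):

```python
import time

def call_with_retry(fn, max_attempts=2, backoff_ms=30000, sleep_fn=time.sleep):
    """Call fn(); attempt at most max_attempts times total, then give up.

    Giving up is the whole point: each retry burns quota,
    so failures must be capped, not looped forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise                      # stop: escalate instead of burning quota
            sleep_fn(backoff_ms / 1000)    # wait before the next attempt

calls = []
def flaky():
    calls.append(1)
    raise RuntimeError("503 server maintenance")

try:
    call_with_retry(flaky, max_attempts=2, sleep_fn=lambda s: None)
except RuntimeError:
    pass
print(len(calls))   # exactly 2 attempts, never more
```

Compare with the failure mode in this pit: dozens of attempts in 30 minutes against an endpoint that was down for maintenance.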
## Pit #3: Wrong model for the job

### What happened

Coder (the engineer agent) used OpenAI Codex for backend work — great quality. But when it did frontend UI:
- Used pure red (#FF0000) instead of design spec’s coral (#FF5A36)
- Zero aesthetic sense in layout
- CSS with backend engineer taste
### Why it happened

Codex is a backend/algorithm-oriented model. It is trained and optimized for logical reasoning and code generation, not visual design.
This isn’t a prompt problem. Same prompt given to Kimi 2.5 produces completely different UI quality.
### Fix & prevention

| Task type | Recommended model | Reason |
|---|---|---|
| Frontend/UI | Kimi 2.5 | Best CSS/design capability |
| Backend/infra | OpenAI Codex | Strong logical reasoning, weak UI |
| Research/writing | Kimi 2.5 | Multi-language, fast |
| Budget tasks | MiniMax M2.5 | Cheapest, adequate quality |
| Complex reasoning | Claude Opus | Best reasoning, use via API Key not OAuth |
Key finding: Same agent, different model = vastly different output quality. Models aren’t universal — match them to task types.
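The matrix is easy to encode so dispatch can't drift from it. A trivial routing helper (Python sketch; the model identifiers are this post's shorthand, not exact API model names):

```python
MODEL_BY_TASK = {
    "frontend": "kimi-2.5",       # best CSS/design capability
    "backend": "openai-codex",    # strong logic, weak UI
    "research": "kimi-2.5",       # multi-language, fast
    "budget": "minimax-m2.5",     # cheapest, adequate quality
    "reasoning": "claude-opus",   # via API key, never OAuth
}

def pick_model(task_type: str) -> str:
    """Fail loudly on unknown task types instead of silently defaulting."""
    try:
        return MODEL_BY_TASK[task_type]
    except KeyError:
        raise ValueError(f"No model mapped for task type: {task_type!r}")

print(pick_model("frontend"))
```

Failing loudly matters: a silent default is exactly how a backend model ends up writing your CSS.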
## Pit #4: Agent silent disappearance

### What happened

Dispatched a task to Coder. He said "30-40 minutes." Five hours later: no file changes, no response, Discord silent.
### Why it happened

Agents can silently fail for multiple reasons:
- Session timeout — OpenClaw sessions have lifespans, agent silently exits when expired
- Context window full — agent consumed all context on a complex task, starts hallucinating or freezing
- Tool call stuck — e.g., SSH connection timeout, agent waiting indefinitely
- Model API disconnected — API service interrupted, no reconnection mechanism
Core problem: Agents don’t have self-reporting for “I’m dead.” They don’t know they’ve stopped.
### Fix & prevention

- 30-minute task timeout (purely time-based, not dependent on agent behavior)
- Timeout → automatic Telegram alert to human
- Agent SOUL.md rule: must @mention COO after completion
- Three-layer monitoring (heartbeat + block detection + task timeout) — see Workflows
- Advanced: Deploy independent watchdog process outside OpenClaw — see Architecture
Known gap: If agent is actively outputting but going in the wrong direction, timeout won’t trigger. Needs output validation (not yet implemented).
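The purely time-based timeout is deliberately dumb: it never trusts the agent to report its own death. A sketch of the check (Python; `alert` stands in for the Telegram hook, and the whole function is illustrative):

```python
import time

def watch_task(started_at, is_done, alert, timeout_s=30 * 60, now=time.time):
    """Return True if done, False if timed out (and fire the alert), None if still OK.

    Based only on the wall clock, so it works even when the agent
    has silently exited and will never report anything.
    """
    if is_done():
        return True
    if now() - started_at > timeout_s:
        alert(f"Task silent for more than {timeout_s // 60} min; check the agent")
        return False
    return None   # still within budget, check again on the next tick

# Simulate a task that went silent for 5 hours.
alerts = []
result = watch_task(
    started_at=0,
    is_done=lambda: False,
    alert=alerts.append,
    timeout_s=30 * 60,
    now=lambda: 5 * 3600,
)
print(result, alerts)
```

Run it from cron or the watchdog process; `is_done` can be as crude as "did any file in the workspace change?".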
## Pit #5: @mention triple failure

See Architecture — Communication for the full story.
Summary: took three days to get inter-agent notifications working. Problem wasn’t in sending — it was in receiver configuration.
Three failures:
- Plain text @name → Discord doesn’t recognize
- Correct format but session cached old config → must restart session after SOUL.md changes
- Format correct but receiver's `users` list missing the bot ID → messages silently dropped
Lesson: Communication chains are more fragile than you think. Any single broken link = “sent but not received.” End-to-end testing is the only reliable verification.
## Pit #6: Shipping without QA

### What happened

February 8, 2026. Deployed DogSnap and FamilyVault simultaneously. The sub-agent said "deployment complete." I sent links directly to the human.
DogSnap’s scan feature was completely broken — used a non-existent model name. Human discovered it, not agents.
### Why it happened

- The sub-agent only verified "did deployment succeed?" (HTTP 200), didn't test core features
- COO trusted sub-agent’s “complete” report without independent verification
- No QA step — the entire chain was “done → ship”
Core problem: “Deployment succeeded” ≠ “Features work.” These are two completely different verifications.
### Impact

Trust shattered. The human questioned all subsequent agent deliveries. Rebuilding trust is 10x harder than fixing the bug.
### Fix & prevention

- QA Gate: all deliveries must pass review by QA (the QA agent)
- QA does code review + functional testing + deployment verification
- No QA PASS = COO never says “done”
- SOUL.md hard rule: `⛔ NEVER report done without QA PASS`
- Same task fails QA twice → escalate to human, stop cycling
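Stated as code, the QA gate is tiny. A sketch (Python; `run_qa` is a hypothetical hook to the QA agent, returning True on PASS):

```python
def deliver(task, run_qa, max_qa_failures=2):
    """QA PASS is the only path to 'done'; repeated failures escalate to a human."""
    for _ in range(max_qa_failures):
        if run_qa(task):
            return "done"              # never say 'done' without a PASS
    return "escalate-to-human"         # stop cycling, involve a person

print(deliver("dogsnap-scan", run_qa=lambda t: False))   # → escalate-to-human
```

Note what's absent: there is no branch where a sub-agent's "deployment complete" report alone produces "done".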
## Pit #7: Context window overflow

### What happened

The COO processed too many tasks in one session. Context hit 70% → slowdown. Hit 90% → completely frozen. The human sent messages, waited 5 minutes, no response.
### Why it happened

- The coordinator's session accumulated context from all tasks
- Every result, dispatch, QA feedback added to context
- No periodic cleanup mechanism
- At 70%, model inference speed drops. At 90%, nearly unresponsive.
### Fix & prevention

- A cron job checks context usage every 5 minutes
- At 60%: auto-save state to memory file + run /compact
- Never let context reach 70%
- After task completion: immediately archive to `memory/YYYY-MM-DD.md`
- Large tasks go to sub-agents, not into the COO's session
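The 60% check from the list above can be sketched as follows (Python; `save_state` and `compact` are hypothetical hooks for the memory-file save and the `/compact` step):

```python
def check_context(usage_pct, save_state, compact, threshold_pct=60):
    """Cron-style check: compact well before the 70% slowdown zone."""
    if usage_pct >= threshold_pct:
        save_state()   # persist to the memory file first, so nothing is lost
        compact()      # then shrink the session (the /compact step)
        return "compacted"
    return "ok"

events = []
print(check_context(63,
                    save_state=lambda: events.append("saved"),
                    compact=lambda: events.append("compacted")))
print(events)
```

The ordering matters: save first, compact second, so a failed compaction can't destroy unarchived state.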
Detailed memory management: See Workflows — Memory Management
## Model selection matrix (current, tested on 2026.3.2)

| Task type | Recommended model | Reason |
|---|---|---|
| Frontend/UI | Kimi 2.5 | Best CSS/design capability |
| Backend/infra | OpenAI Codex | Strong logical reasoning, weak UI |
| Research/writing | Kimi 2.5 | Multi-language, fast |
| Budget tasks | MiniMax M2.5 | Cheapest, adequate quality |
| Complex reasoning | Claude Opus | Via API Key, not OAuth |
## Model switching checklist

```shell
# 1. Edit OpenClaw config
vim ~/.openclaw/config/openclaw.json

# 2. Restart gateway (config doesn't hot-reload)
openclaw gateway restart

# 3. Confirm old sessions are terminated
openclaw status

# 4. Verify new session uses new model
openclaw logs --follow
```

Gotcha: changed config but didn't restart the gateway → old sessions continue with the old model. Changed SOUL.md but didn't kill the old session → the old session continues with the old rules. Config change = must restart.
Next: Workflow Recipes — QA gates, task queues, heartbeat monitoring.