# Opinions: Contrarian Views on AI Agents
These views come from three months of real operations, not theory; they are conclusions hammered out by reality. You may disagree. That's fine.
## Opinion 1: Organization > Product

Most people use AI agents to build products: an app, a website, a bot.
I’m not building a product. I’m building an organization.
A 24/7 digital company staffed by AI agents: COO as chief of staff, Coder as engineer, Research as analyst, Marketing as marketer, QA as reviewer. Each has a role, processes, and performance standards.
Why organization over product?
- Products become obsolete. The app you build today might be irrelevant in 6 months.
- Organizations keep producing. The same agent team can build unlimited products.
- Organizations self-improve. Every failure gets written into SOUL.md. Product bug fixes are one-time. Organizational mechanism improvements are permanent.
Spend 80% of time building organization (processes, monitoring, communication, QA). 20% on specific products. Not the reverse.
Comparison: Frameworks like CrewAI and AutoGen provide multi-agent collaboration scaffolding, but they lean toward “use agents to build a product.” Our approach is closer to “use agents to build an organization.” Not mutually exclusive, but different starting points.
## Opinion 2: SOUL.md is the most important code

Agent behavior isn't determined by code; it's determined by system prompts. SOUL.md is the agent's "operating system."
A well-written SOUL.md has:
- Clear role boundaries (what you do, what you don’t)
- Specific operational rules (not “be safe,” but “never write API keys into chat”)
- Failure behavior constraints (“stop retrying after 2 failures”)
- Communication protocols (“must @mention COO after completion”)
A poorly written SOUL.md:
- “You are a helpful AI assistant”
- No constraints, relies entirely on agent judgment
- Rules too long, agent ignores half
Rule of thumb: Keep SOUL.md under 200 lines. Beyond 200, agents start ignoring later rules. Put critical rules first, mark hard constraints with ⛔.
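The rule of thumb above is easy to automate. A minimal sketch of a check you could run before committing SOUL.md changes; `lint_soul` and both thresholds are illustrative, not part of any real tooling:

```python
from pathlib import Path

MAX_LINES = 200        # beyond this, agents tend to ignore later rules
HARD_RULE_WINDOW = 50  # hard constraints should sit in the first 50 lines

def lint_soul(path: str) -> list[str]:
    """Return a list of warnings for a SOUL.md file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    warnings = []
    if len(lines) > MAX_LINES:
        warnings.append(f"{len(lines)} lines (> {MAX_LINES}): trim or split")
    # A ⛔ rule buried past the window is likely to be ignored
    late_rules = [i for i, line in enumerate(lines[HARD_RULE_WINDOW:],
                                             start=HARD_RULE_WINDOW + 1)
                  if "⛔" in line]
    if late_rules:
        warnings.append(f"⛔ rules after line {HARD_RULE_WINDOW}: {late_rules}")
    return warnings
```

Wiring this into a pre-commit hook turns the rule of thumb into a guardrail rather than a habit.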
### Concrete SOUL.md structure advice

```markdown
# Good SOUL.md structure

## 1. Identity (first 10 lines)
Who you are, your role, core responsibilities

## 2. Hard constraints (first 50 lines)
⛔-marked rules that must never be violated
Put these first; model compliance is highest for the first 50 lines

## 3. Workflow (lines 50-150)
Standard operating procedures, communication protocols, reporting formats

## 4. Context (lines 150-200)
Current project state, known issues, notes
```

Further reading: Anthropic has detailed system prompt best practices. Cursor Rules patterns also offer many transferable ideas.
## Opinion 3: Agents don't need long-term memory

Intuition says agents should remember all history. In practice, this causes context overflow and hallucination.
My approach: short-term memory plus on-demand search.

- Today's log auto-loaded (`memory/YYYY-MM-DD.md`)
- Historical info retrieved via semantic search as needed
- Long-term decisions written to MEMORY.md (manually maintained, under 200 lines)
This is 10x more effective than "remember everything." Agent context windows are finite; filling them with irrelevant history degrades current-task quality.
Analogy: You don’t need to remember what you ate every day last year. You need to know what you’re allergic to. The former is historical data; the latter is decision rules. Same for agents.
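The loading scheme above can be sketched as a small context builder that reads only the durable documents plus today's log; `build_context` and the directory layout are illustrative assumptions, not a real API:

```python
from datetime import date
from pathlib import Path

def build_context(base: Path) -> str:
    """Assemble a session context: durable rules + today's log only.

    History is deliberately NOT loaded; it is fetched on demand when a
    human points at a specific file.
    """
    parts = []
    # Deterministic documents, loaded every session
    for name in ("SOUL.md", "MEMORY.md"):
        f = base / name
        if f.exists():
            parts.append(f.read_text(encoding="utf-8"))
    # Short-term memory: today's log only
    today = base / "memory" / f"{date.today():%Y-%m-%d}.md"
    if today.exists():
        parts.append(today.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

Missing files are simply skipped, so the same builder works on day one and on day one hundred.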
### Why not vector databases / RAG?

Many people pursue better agent memory via vector databases (embed → store → retrieve). We tried it and found:
- Unreliable recall — semantic search recall rate isn’t high enough for critical info
- Noisy results — “relevant” content mixed with irrelevant info, diluting useful context
- High maintenance — requires additional infrastructure (vector DB, embedding API)
- Non-deterministic — you can’t be sure what the agent will retrieve each time
Our alternative:
- SOUL.md + MEMORY.md: deterministic documents, loaded every session, 100% reliable
- Daily logs: short-term memory, today's only, not drowned by history
- Historical info: the human specifies it when needed ("refer to memory/2026-02-08.md")
Detailed memory configs: See Workflows — Memory Management
## Opinion 4: Failure is the system's most valuable input

Every failure should become a permanent mechanism. Not "be more careful next time": write it down as a rule, script, or constraint.
Our process:
- Failure happens
- Ask the agent: “Walk me through your reasoning”
- Identify root cause (lost context? wrong tool? bad assumption?)
- Implement prevention: edit SOUL.md, add memory note, create new check
- Log to `memory/improvements-log.md`
Hard rule: never just retry. Always add a guardrail first.
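The "never just retry" rule can be enforced mechanically. A minimal sketch, assuming any callable task and an escalation hook; `run_with_guardrail` and both parameters are hypothetical names:

```python
MAX_ATTEMPTS = 2  # "stop retrying after 2 failures"

def run_with_guardrail(task, escalate):
    """Run a task at most MAX_ATTEMPTS times.

    On repeated failure, escalate with the collected errors instead of
    looping forever (never just retry).
    """
    errors = []
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return task()
        except Exception as exc:
            errors.append(exc)
    # e.g. @mention COO, or append to memory/improvements-log.md
    escalate(errors)
    return None
```

The escalation hook is where the failure enters the improvement loop: it should end up as a new rule or check, not just a log line.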
This is Harness Engineering: every failure makes the system stronger. After a few months, most common failure patterns are covered by rules. Agents become more reliable not because the models improved, but because the constraints became more comprehensive.
### Real examples

| Failure | Root cause | New rule |
|---|---|---|
| Shipped broken product | No QA step | ⛔ QA Gate — QA must PASS |
| Context froze COO | No context monitoring | ⛔ Auto-compact at 60% |
| API quota burned overnight | Infinite retries | ⛔ Stop after 2 failures |
| Agent silent for 6 hours | No timeout mechanism | 30-min task timeout alert |
| @mention not working | Format/config errors | End-to-end communication test checklist |
| SOUL.md update not taking effect | Session caching | Must restart gateway after config changes |
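Two of the rules in the table (auto-compact at 60%, the 30-minute timeout alert) reduce to threshold checks. A sketch with the numbers taken from the table; `check_agent` and its signature are illustrative:

```python
from datetime import datetime, timedelta

COMPACT_AT = 0.60                     # compact context at 60% usage
TASK_TIMEOUT = timedelta(minutes=30)  # alert if an agent goes silent this long

def check_agent(used_tokens: int, window_tokens: int,
                last_update: datetime, now: datetime) -> list[str]:
    """Return the monitor actions implied by the two table rules."""
    actions = []
    if used_tokens / window_tokens >= COMPACT_AT:
        actions.append("compact-context")
    if now - last_update > TASK_TIMEOUT:
        actions.append("timeout-alert")
    return actions
```

Run on a schedule (e.g. once a minute per agent), this is the whole monitoring loop; the point is that the guardrails are dumb and deterministic on purpose.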
After three months: ~30% of SOUL.md rules were distilled from failures. These rules are worth more than any individual task output.
## Opinion 5: Human attention is the real bottleneck

With 5 agents running simultaneously, the bottleneck isn't compute, API quota, or model capability. It's your attention.
5 agents producing simultaneously → you need to review 5 outputs → each needs quality judgment → each judgment needs context → your brain is single-threaded.
This is why the QA Gate isn't wasted time; it's an attention lever. QA does the first-pass review, so you only need to see PASS/FAIL and a summary.
Deeper insight: As agent count grows, you don’t need more agents — you need better filtering. Information must be tiered. Only things requiring human decisions reach you. Everything else, agents handle.
### Concrete practices

- COO is the only interface — human only talks to COO. COO handles all dispatch and aggregation.
- Agents shouldn’t ask questions — if the brief is good, no follow-up needed. If follow-up is needed, the brief was bad.
- Automated decisions — P2/P3 tasks, agents decide execution. Only P0/P1 need human input.
- Batch reporting — don’t report after every task. Batch 3-5 together.
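The tiering and batching practices above reduce to a few lines of dispatch logic. A sketch with hypothetical names (`route`, `Reporter`); the priority labels and batch size follow the text:

```python
HUMAN_PRIORITIES = {"P0", "P1"}  # only these interrupt the human
BATCH_SIZE = 3                   # report completed tasks in batches of 3-5

def route(priority: str) -> str:
    """Tiered decisions: P0/P1 escalate to the human, P2/P3 run autonomously."""
    return "human" if priority in HUMAN_PRIORITIES else "agent"

class Reporter:
    """Accumulate task summaries and flush them in batches, not one-by-one."""

    def __init__(self):
        self.pending: list[str] = []
        self.sent: list[list[str]] = []

    def report(self, summary: str) -> None:
        self.pending.append(summary)
        if len(self.pending) >= BATCH_SIZE:
            self.sent.append(list(self.pending))  # one batched message
            self.pending.clear()
```

The design choice here is that filtering lives in code, not in the agent's judgment: an agent never decides whether something is worth your attention, the priority label does.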
Goal: the human spends 30 minutes a day managing agents and the rest on things only humans can do. We're not there yet, but we're working toward it.
## Opinion 6: Speed is worthless, reliability is everything

A system that gives 80% results every time is far more valuable than one that gives 100% sometimes and 0% other times.
The former you can trust, delegate to, and sleep on. The latter you must watch, always ready to firefight.
This is why I spent enormous time on QA, monitoring, and constraints. It looks slow, but it builds trust. When you trust the system, you can truly let go, and once you do, its output far exceeds what you'd get by watching it.
Analogy: You wouldn’t give money to a fund manager who sometimes makes 100% and sometimes loses everything. You’d pick the one with steady 15% annual returns. Agent systems are the same.
## Summary

The common theme across all six opinions: AI agents' value isn't how smart they are; it's how much you can trust them. Trust comes from reliability. Reliability comes from constraints and monitoring.
Being smart is the model company's job. Your job is building systems that make "smart" reliable.
If you have different experiences and opinions — open a GitHub Issue for discussion. Multi-agent systems are still early stage. We need more real cases to validate or challenge these views.
More practical content: Architecture | Debug Playbook | Workflows