OpenClaw vs Hermes: the always-on personal-agent debate
The loudest harness argument of 2026. Both are open-source (MIT), self-hosted, always-on personal agents you talk to from chat apps — and they're two of the most-starred projects on this entire list. The flame war treats them as rivals; the architecture says they're opposite design philosophies that happen to share a category.
| OpenClaw | Hermes | |
|---|---|---|
| ⭐ Stars | 380k | 199k |
| Steward | OpenClaw Foundation (community) | Nous Research |
| License | MIT | MIT |
| Core language | TypeScript | Python |
| The bet | Presence — one event loop where messages, heartbeats, crons, and webhooks are a single input queue feeding one conversation; it accumulates unbounded memory of you and feels like a person | Discipline — separated execution domains: sandboxed cron, a deliberately bounded (~3k-char) user model, script-gated wake-ups; it feels like a harness, by design |
| Proactivity | Native HEARTBEAT — background events can inject into the main conversation | No heartbeat; cron runs are isolated from the main conversation, but a wakeAgent script gate decides whether a check is worth waking the LLM at all |
| Ecosystem | 13,700+ community skills (ClawHub), the most channels and integrations, native multi-agent | Lean: skills are mostly self-generated by the learning loop; checkpoint/rollback built in |
| Security posture | Relaxed by default — "free reign," users sandbox it themselves | Restrictive by default — permission prompts, confined script locations, per-domain isolation (users report it "works on the third try") |
| Config surface | Fully drivable from chat | Several commands are CLI/TUI-only — hard to administer from Telegram alone |
| Release tempo | 82 releases and counting; fast-moving, frequently breaking | 6 releases at last count; conservative |
| Tokens per request | ~48k avg (one user's OpenRouter telemetry) | ~62k avg, same source — heavier per request, lighter when idle |
| Autonomy (list axis) | headless | headless |
| Recovery (list axis) | resumable | resumable (checkpoint/rollback is a marquee feature) |
Stars as captured for the main list (see README for the capture date).
What the field reports actually say
Primary-source notes from the debate threads (r/openclaw, r/hermesagent, r/AskClaw, r/AI_Agents — links inline), claims attributed as claims:
- The migration driver is update churn, not features. The recurring OpenClaw complaint isn't capability — it's that frequent releases break working setups ("the last 5 updates in a row," broken Telegram sessions, flaky crons; experience report). Hermes converts say they're "in the repair shop one-eighth of the time." The confounder, named by an OpenClaw user: Hermes has shipped ~6 releases to OpenClaw's ~82 — it hasn't lived long enough to accumulate breakage, so its stability record is partly an artifact of age (thread). Counter-reports exist in both directions ("I went to Hermes and it was a nightmare… back with OpenClaw").
- The learning loop is real, and double-edged. A three-week side-by-side had Hermes turn a daily news-briefing task into a reusable skill plus scheduled routine unprompted. The dark side, from a detailed migration report: bad decisions get learned too, "etched in stone," and scraping a mislearned pattern out of skills/memory/crons "reminds me of scraping a virus infection out of your PC manually."
- Run-both is the power-user consensus, with a twist. OpenClaw as orchestrator, Hermes as execution specialist (they interoperate over ACP) — but the most-cited benefit isn't throughput, it's redundancy: multiple independent reports of telling one agent to diagnose and fix the other when it bricks. Cost of running both: roughly +30%, for reportedly much more than +30% output.
- Trust the threads less than usual. Both camps accuse the other of astroturfing, and the "everyone is migrating" narrative is itself contested in OpenClaw's home sub. Volume of sentiment here is unusually weak evidence; the mechanism-level reports above are what's load-bearing.
The billing earthquake (April → June 2026)
- Early April 2026: Anthropic banned third-party agents from running on Claude subscriptions (capacity issues; an estimated 135,000+ OpenClaw instances were on subscription auth).
- June 15, 2026: Reinstated, with a catch: programmatic and third-party usage now draws from a separate Agent SDK credit pool — $20–$200/month by plan, billed at API rates, non-rollover. Always-on agents on subscription auth are no longer flat-rate.
What the ops threads add — the cost problem is usage shape, not harness choice:
- The trap is frontier models on background work: "running Opus on heartbeats is hiring a PhD physicist to check whether the fridge door is closed every 15 minutes." One lightweight monitoring setup ran $0.50/hour ≈ $360/month before optimization.
- The fix that recurs across threads, in priority order (worked example, claimed ~80% reduction in an afternoon): (1) wake less — deterministic scripts and narrow watches that only invoke a model on a match (Hermes's
wakeAgentand OpenClaw's LLM-bypassmessage sendboth support this); (2) two-tier routing — cheap/free models (DeepSeek Flash, Gemini Flash) for background, a mid-tier model for conversations; (3) slim the tool list — big tool registries silently tax every turn. Field claims of monitoring setups landing under $5–10/month after all three. - Post-June 15 escape routes in the wild: open-weight models (local or via OpenRouter), and subscription arbitrage shifting to other providers' plans — both harnesses are provider-flexible, Hermes most natively (open weights are Nous's founding thesis).
Our read
Forget the flame war's framing of better/worse. OpenClaw optimizes for presence; Hermes optimizes for discipline — almost every observed difference (heartbeat vs sandboxed cron, unbounded vs bounded memory, chat-everything vs CLI-first admin, relaxed vs restrictive security) falls out of that one fork in the design tree. Pick by which failure you'd rather live with: an exuberant agent you have to contain and patch weekly, or a careful one you have to coax and can't fully drive from your phone. If you genuinely can't choose, the orchestrator/executor + mutual-repair pattern is real, ACP makes it practical, and ~+30% cost is a fair price for redundancy. And whatever you pick, your bill is determined by questions 3–5 of How to pick a harness, not by this page.
Part of best-of-Agent-Harnesses. New to this decision? Start with How to pick a harness. Spot an error or a stale claim? Open an issue.