Best of Agent Harnesses
Hand-curated, ranked list of 110 AI agent harnesses — the runtimes that close the loop between a stateless model and the outside world.
110 harnesses · 10 categories · MCP-ready · weekly-rescored
claude mcp add agent-harnesses -- uvx agent-harnesses-mcp
| Project | Tags | Category | Stars | Tier | OSS |
|---|---|---|---|---|---|
| OpenClaw | typescript, multi-agent | Personal agent runtimes | 380k | complex | ✅ |
| superpowers | memory, ide | Coding harness configs and SDKs | 235k | complex | ✅ |
| everything-claude-code | multi-agent | Coding harness configs and SDKs | 219k | complex | ✅ |
| Hermes | memory, python, provider-agnostic | Personal agent runtimes | 199k | slightly complex | ✅ |
| n8n | workflow, local, typescript | Frameworks | 193k | complex | ⚠️ Fair-code |
| AutoGPT | memory, evals, python | Frameworks | 185k | complex | ⚠️ Polyform-SU |
| opencode | mcp, provider-agnostic, cli, tui, typescript | Coding agent products (IDEs, CLIs, full suites) | 177k | slightly complex | ✅ |
| Anthropic Skills | | Coding harness configs and SDKs | 153k | mostly simple | ✅ |
| langflow | low-code, python | Frameworks | 150k | complex | ✅ |
| Dify | low-code, rag, python | Frameworks | 146k | complex | ⚠️ Fair-code |
| langchain | python | Frameworks | 140k | complex | ✅ |
| GStack | typescript | Coding harness configs and SDKs | 112k | slightly complex | ✅ |
| Gemini CLI | mcp, cli, typescript | Coding agent products (IDEs, CLIs, full suites) | 105k | slightly complex | ✅ |
| browser-use | mcp, browser, python | Frameworks | 99.9k | slightly complex | ✅ |
| Codex | sandbox, provider-agnostic, cli | Coding agent products (IDEs, CLIs, full suites) | 92.4k | slightly complex | ✅ |
| claude-mem | memory | Plugins, MCPs, CLI tools | 83.5k | slightly complex | ✅ |
| OpenHands | memory, browser, sandbox, python | Coding agent products (IDEs, CLIs, full suites) | 77.9k | complex | ⚠️ (multi-license) |
| Daytona | sandbox | Libraries and SDKs | 72.4k | slightly complex | ✅ |
| MetaGPT | multi-agent, python | Multi-agent and orchestration | 68.9k | complex | ✅ |
| get-shit-done | cli, python | Coding harness configs and SDKs | 64.4k | mostly simple | ✅ |
| Open Interpreter | cli, python | Coding agent products (IDEs, CLIs, full suites) | 64.1k | mostly simple | ✅ |
| Cline | ide, typescript | Coding agent products (IDEs, CLIs, full suites) | 63.6k | slightly complex | ✅ |
| autogen | multi-agent, python | Multi-agent and orchestration | 59.1k | complex | ✅ CC-BY |
| Mem0 | memory, python | Libraries and SDKs | 59k | slightly complex | ✅ |
| crewAI | python | Multi-agent and orchestration | 54.1k | complex | ✅ |
| Flowise | low-code, typescript | Frameworks | 53.9k | complex | ⚠️ Apache+CLA |
| LiteLLM | provider-agnostic, python | Libraries and SDKs | 51k | mostly simple | ✅ |
| llama-index | rag, python | Frameworks | 50.2k | complex | ✅ |
| goose | mcp, rust | Coding agent products (IDEs, CLIs, full suites) | 50k | slightly complex | ✅ |
| aider | mcp, cli, python | Plugins, MCPs, CLI tools | 46.5k | slightly complex | ✅ |
| agno | memory, evals, python | Frameworks | 40.8k | complex | ✅ |
| awesome-cursorrules | ide | Progressive disclosure harnesses | 40k | super simple | ✅ |
| langgraph | workflow, python | Frameworks | 35.3k | slightly complex | ✅ |
| Khoj | python | Personal agent runtimes | 35.2k | complex | ✅ |
| continue | ide, typescript | Plugins, MCPs, CLI tools | 34.2k | complex | ✅ |
| ChatDev | python | Multi-agent and orchestration | 33.5k | slightly complex | ✅ |
| github-mcp-server | mcp | Plugins, MCPs, CLI tools | 30.9k | slightly complex | ✅ |
| Composio | sandbox, tool-discovery, python, typescript | Libraries and SDKs | 28.9k | complex | ✅ |
| semantic-kernel | python | Frameworks | 28.2k | complex | ✅ |
| smolagents | sandbox, python | Libraries and SDKs | 27.9k | mostly simple | ✅ |
| gpt-researcher | multi-agent, python | Research and task-specific harnesses | 27.8k | complex | ✅ |
| openai-agents-python | python | Multi-agent and orchestration | 27.3k | mostly simple | ✅ |
| crush | memory, cli, tui | Coding agent products (IDEs, CLIs, full suites) | 25.5k | slightly complex | ⚠️ FSL-1.1-MIT |
| mastra | typed, typescript | Frameworks | 25.3k | slightly complex | ⚠️ Elastic-2.0 |
| vercel/ai | provider-agnostic, typescript | Libraries and SDKs | 25k | slightly complex | ✅ |
| deepagents | multi-agent, sandbox, python, typescript | Libraries and SDKs | 24.9k | slightly complex | ✅ |
| Roo Code | mcp, workflow, ide, typescript | Coding agent products (IDEs, CLIs, full suites) | 24.2k | slightly complex | ✅ |
| letta | memory, python | Frameworks | 23.4k | mostly simple | ✅ |
| MCP Python SDK | mcp, python | Plugins, MCPs, CLI tools | 23.4k | mostly simple | ✅ |
| agents.md | typescript | Progressive disclosure harnesses | 22.4k | super simple | ✅ |
| rasa | voice, python | Frameworks | 21.2k | complex | ✅ |
| Google ADK | evals, sandbox, python | Frameworks | 20.2k | complex | ✅ |
| SWE-agent | memory, evals, python | Coding harness configs and SDKs | 19.6k | slightly complex | ✅ |
| Eliza | memory, multi-agent, typescript | Personal agent runtimes | 18.6k | complex | ✅ |
| Agent Zero | memory, multi-agent, browser, sandbox, python | Personal agent runtimes | 18.1k | slightly complex | ❓ |
| pydantic-ai | mcp, typed, provider-agnostic, python | Libraries and SDKs | 17.9k | slightly complex | ✅ |
| Agent Lightning | evals, training, python | Evaluation and benchmarking harnesses | 17.3k | complex | ✅ |
| botpress | low-code, typescript | Frameworks | 14.7k | complex | ✅ |
| OpenHarness (HKUDS) | memory, multi-agent | Personal agent runtimes | 14k | complex | ✅ |
| MCP TypeScript SDK | mcp, typescript | Plugins, MCPs, CLI tools | 12.7k | mostly simple | ✅ |
| E2B | sandbox, python | Libraries and SDKs | 12.7k | slightly complex | ✅ |
| Microsoft Agent Framework | multi-agent, workflow, python | Multi-agent and orchestration | 11.5k | slightly complex | ✅ |
| MCP Inspector | mcp, typescript | Plugins, MCPs, CLI tools | 10.1k | super simple | ✅ |
| PraisonAI | multi-agent, python | Multi-agent and orchestration | 8.2k | mostly simple | ✅ |
| R2R | vision, rag, workflow, python | Frameworks | 7.9k | complex | ✅ |
| agent-squad | multi-agent | Frameworks | 7.7k | slightly complex | ✅ |
| Claude Agent SDK | mcp, memory, python, typescript | Coding harness configs and SDKs | 7.4k | complex | ✅ |
| MCP Registry | mcp | Plugins, MCPs, CLI tools | 6.9k | slightly complex | ✅ |
| strands-agents | mcp, multi-agent, typed, python | Libraries and SDKs | 6.2k | mostly simple | ✅ |
| SWE-bench | evals, sandbox, python | Evaluation and benchmarking harnesses | 5.2k | slightly complex | ✅ |
| Cloudflare Agents | memory, typescript | Libraries and SDKs | 5.1k | slightly complex | ✅ |
| AgentVerse | multi-agent, python | Frameworks | 5.1k | complex | ✅ |
| AgentBench | evals, sandbox, rag, workflow, python | Evaluation and benchmarking harnesses | 3.5k | complex | ✅ |
| Bee Agent Framework | mcp, multi-agent, python, typescript | Frameworks | 3.3k | complex | ✅ |
| openai-agents-js | multi-agent, voice, typescript | Libraries and SDKs | 3.3k | slightly complex | ✅ |
| inspect_ai | evals, sandbox, python | Evaluation and benchmarking harnesses | 2.2k | complex | ✅ |
| AgentStack | | Frameworks | 2.2k | slightly complex | ✅ |
| WebArena | python | Evaluation and benchmarking harnesses | 1.5k | complex | ✅ |
| Docker MCP Gateway | mcp, sandbox, cli | Plugins, MCPs, CLI tools | 1.5k | slightly complex | ✅ |
| AIlice | sandbox, python | Personal agent runtimes | 1.4k | slightly complex | ✅ |
| WebVoyager | evals, vision | Evaluation and benchmarking harnesses | 1.1k | slightly complex | ✅ |
| ARC-AGI-2 | | Evaluation and benchmarking harnesses | 715 | super simple | ✅ |
| SWE-Gym | evals, training, python | Evaluation and benchmarking harnesses | 694 | slightly complex | ✅ |
| swe-smith | training, python | Evaluation and benchmarking harnesses | 681 | slightly complex | ✅ |
| open-harness | mcp, multi-agent, typescript | Libraries and SDKs | 571 | slightly complex | ✅ |
| inspect_evals | evals, sandbox | Evaluation and benchmarking harnesses | 547 | slightly complex | ✅ |
| langgraph-bigtool | tool-discovery, python | Progressive disclosure harnesses | 542 | slightly complex | ✅ |
| RepoMaster | workflow, python | Coding harness configs and SDKs | 529 | slightly complex | ❓ |
| claw-code-agent | mcp, rust, python, typescript | Coding agent products (IDEs, CLIs, full suites) | 515 | slightly complex | ❓ |
| MCP-Zero | tool-discovery | Progressive disclosure harnesses | 489 | complex | ✅ |
| AgentSilex | python | Frameworks | 451 | super simple | ✅ |
| openagents | | Research and task-specific harnesses | 427 | complex | ✅ |
| arc-agi-benchmarking | evals, provider-agnostic, python | Evaluation and benchmarking harnesses | 350 | mostly simple | ✅ |
| AutoHarness | memory, multi-agent, provider-agnostic, python | Coding harness configs and SDKs | 326 | super simple | ✅ |
| AgentRL | training, python | Multi-agent and orchestration | 302 | complex | ✅ |
| SuperAgentX | multi-agent, python | Frameworks | 200 | mostly simple | ✅ |
| ToolGen | tool-discovery, python | Progressive disclosure harnesses | 180 | complex | ❓ |
| VitaBench | | Evaluation and benchmarking harnesses | 145 | complex | ✅ |
| AgencyBench | evals, sandbox, python | Evaluation and benchmarking harnesses | 87 | complex | ✅ |
| spring-ai-tool-search-tool | tool-discovery | Progressive disclosure harnesses | 74 | mostly simple | ✅ |
| letta-evals | memory, python | Evaluation and benchmarking harnesses | 72 | mostly simple | ✅ |
| SUPER | sandbox, python | Evaluation and benchmarking harnesses | 53 | slightly complex | ✅ |
| ToolRAG | mcp, tool-discovery | Progressive disclosure harnesses | 28 | mostly simple | ✅ |
| puppeteer-real-browser-mcp | mcp, browser, typescript | Plugins, MCPs, CLI tools | 23 | mostly simple | ❓ |
| TRAIL | | Evaluation and benchmarking harnesses | 19 | mostly simple | ✅ |
| Community-curated agent lists | | Libraries and SDKs | 11 | super simple | ❓ |
| Better-OpenCodeMCP | mcp, typescript | Plugins, MCPs, CLI tools | 8 | mostly simple | ✅ |
| coderClaw | ide, typescript | Coding agent products (IDEs, CLIs, full suites) | 3 | slightly complex | ❓ |
| pmstack | evals | Coding harness configs and SDKs | 2 | super simple | ✅ |
| agentlog | memory, cli, python | Plugins, MCPs, CLI tools | 0 | super simple | ✅ |
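The Stars column mixes abbreviated counts such as 99.9k with plain integers such as 715, so sorting it as text gives the wrong order. A minimal Python sketch of one way to normalize the values before ranking follows. The helper name `parse_stars` and the four sample rows (copied from the table above) are illustrative assumptions, not part of the list's own tooling.

```python
def parse_stars(text: str) -> int:
    """Convert a star count such as '99.9k' or '715' to an integer."""
    text = text.strip().lower()
    if text.endswith("k"):
        # '99.9k' -> 99900; round() absorbs floating-point noise
        return round(float(text[:-1]) * 1000)
    return int(text)


# (project, stars) pairs copied from the table above
rows = [
    ("ARC-AGI-2", "715"),
    ("Mem0", "59k"),
    ("browser-use", "99.9k"),
    ("Codex", "92.4k"),
]

# Rank by numeric star count, highest first
ranked = sorted(rows, key=lambda row: parse_stars(row[1]), reverse=True)

for name, stars in ranked:
    print(f"{name}: {parse_stars(stars)}")
```

The abbreviated values are rounded to the nearest hundred in the table, so the parsed integers are approximations of the real counts.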
Categories
- Progressive disclosure harnesses (7 projects)
- Coding agent products (IDEs, CLIs, full suites) (11 projects)
- Coding harness configs and SDKs (10 projects)
- Personal agent runtimes (7 projects)
- Frameworks (23 projects)
- Multi-agent and orchestration (8 projects)
- Plugins, MCPs, CLI tools (12 projects)
- Evaluation and benchmarking harnesses (16 projects)
- Research and task-specific harnesses (2 projects)
- Libraries and SDKs (14 projects)
Pick by use case
- I want a turnkey coding agent today — opencode, Cline, Codex, Gemini CLI, OpenHands
- I want an always-on personal agent that lives in my chat apps — OpenClaw, Hermes, Khoj, Agent Zero, OpenHarness (HKUDS)
- I want to extend Claude Code, Codex, or OpenCode with skills and slash commands — Anthropic Skills, everything-claude-code, superpowers, GStack, pmstack
- I want to build my own coding harness from scratch — Claude Agent SDK, Google ADK, AutoHarness, SWE-agent, RepoMaster
- I want a drop-in memory layer for agents — Mem0, claude-mem, agentlog, agno, letta
- I want to plug hundreds to thousands of tools without context bloat — MCP-Zero, ToolGen, ToolRAG, langgraph-bigtool, spring-ai-tool-search-tool
- I want multi-agent orchestration — openai-agents-python, crewAI, autogen, Microsoft Agent Framework, PraisonAI
- I want a general LLM app framework — langgraph, langchain, llama-index, pydantic-ai, agno
- I want low-code / visual workflows — langflow, Flowise, Dify, n8n
- I want browser-using agents — browser-use, WebVoyager, puppeteer-real-browser-mcp
- I want sandboxed code execution for agent-generated code — E2B, Daytona, smolagents, OpenHands
- I want to evaluate or benchmark agents — SWE-bench, AgencyBench, inspect_ai, WebArena, ARC-AGI-2
- I want a deep research / autonomous research agent — deepagents, gpt-researcher, openagents
- I want a provider-agnostic LLM pipe (not a framework) — LiteLLM, vercel/ai
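Some projects appear under more than one use case above, which is easier to see once the mapping is inverted. The Python sketch below copies four of the use cases from the list and builds a project-to-use-case index. The dictionary layout and the shortened use-case labels are assumptions made for illustration, not the format of the published use-case index.

```python
from collections import defaultdict

# Four use cases copied from the list above (labels shortened)
USE_CASES = {
    "turnkey coding agent": ["opencode", "Cline", "Codex", "Gemini CLI", "OpenHands"],
    "drop-in memory layer": ["Mem0", "claude-mem", "agentlog", "agno", "letta"],
    "general LLM app framework": ["langgraph", "langchain", "llama-index", "pydantic-ai", "agno"],
    "sandboxed code execution": ["E2B", "Daytona", "smolagents", "OpenHands"],
}


def invert(use_cases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each project to the use cases that recommend it."""
    index = defaultdict(list)
    for use_case, projects in use_cases.items():
        for project in projects:
            index[project].append(use_case)
    return dict(index)


by_project = invert(USE_CASES)

# Projects recommended for more than one use case
overlap = {name: cases for name, cases in by_project.items() if len(cases) > 1}
print(overlap)
```

With these four use cases, the overlap contains OpenHands (turnkey coding agent, sandboxed code execution) and agno (drop-in memory layer, general LLM app framework).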
Decision guides
- How to pick a harness — Six questions, in order. Each one eliminates most of the list; by the end you should be choosing between two or three projects, not 110. The
- Agent memory layers: Mem0 vs Letta vs claude-mem — "Add memory to my agent" hides three different products: a **memory API** you call from any agent (Mem0), an **agent runtime** where memory
- Multi-agent orchestration: OpenAI Agents SDK vs CrewAI vs AutoGen vs LangGraph — Four very different answers to "how should multiple agents coordinate?" — and the differences are architectural, not cosmetic. Picking wrong
- OpenClaw vs Hermes: the always-on personal-agent debate — The loudest harness argument of 2026. Both are open-source (MIT), self-hosted, always-on personal agents you talk to from chat apps — and th
- Terminal coding agents: opencode vs Codex vs Gemini CLI vs crush vs goose — The most-asked pick in this list: *"I want a turnkey coding agent in my terminal today."* These five are the open(ish)-source field. The clo
For agents
This list is published machine-readable so coding and research agents can recommend harnesses directly:
- harnesses.json — every project with tier, tags, axes, license, example, and use-case index.
- llms.txt — the whole list in one agent-readable file.
- harnesses.jsonld — schema.org Dataset + ItemList.
- feed.json — JSON Feed of refreshes.
- MCP server: uvx agent-harnesses-mcp, which provides pick_harness, search_harnesses, get_harness, comparisons.
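As a rough illustration of how an agent might consume the machine-readable list, the Python sketch below filters harness records by tag and by maximum tier. The record layout (`name`, `tier`, `tags`), the inline sample data, and the `pick` helper are all assumptions. The real schema of harnesses.json is not shown here and may differ. Ordering the tier labels from simplest to most complex is also an assumption based on the labels used in the table.

```python
import json

# Hypothetical sample in an ASSUMED shape; values copied from the table above
SAMPLE = """
[
  {"name": "pydantic-ai", "tier": "slightly complex",
   "tags": ["mcp", "typed", "provider-agnostic", "python"]},
  {"name": "LiteLLM", "tier": "mostly simple",
   "tags": ["provider-agnostic", "python"]},
  {"name": "MCP Inspector", "tier": "super simple",
   "tags": ["mcp", "typescript"]}
]
"""

# Assumed ordering of the tier labels, simplest first
TIER_ORDER = ["super simple", "mostly simple", "slightly complex", "complex"]


def pick(harnesses: list[dict], tag: str, max_tier: str) -> list[str]:
    """Return names of harnesses that carry `tag` and are no more complex than `max_tier`."""
    limit = TIER_ORDER.index(max_tier)
    return [
        h["name"]
        for h in harnesses
        if tag in h["tags"] and TIER_ORDER.index(h["tier"]) <= limit
    ]


harnesses = json.loads(SAMPLE)
print(pick(harnesses, "mcp", "mostly simple"))
print(pick(harnesses, "provider-agnostic", "slightly complex"))
```

The first call keeps only MCP Inspector, because pydantic-ai carries the mcp tag but sits above the requested tier. The second call returns pydantic-ai and LiteLLM.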