# Best of Agent Harnesses

> Hand-curated, ranked list of 110 AI agent harnesses — the runtimes that close the loop between a stateless model and the outside world. 10 categories, a 4-tier adoption-surface rating (simplicity ↔ capability), capability tags, a license signal, and one concrete example link per project. Stars captured 2026-06-21.

Maintained at https://github.com/RyanAlberts/best-of-Agent-Harnesses (CC-BY-SA-4.0).
Structured data: https://raw.githubusercontent.com/RyanAlberts/best-of-Agent-Harnesses/main/harnesses.json
Tiers, least to most adoption surface: super simple → mostly simple → slightly complex → complex.

## Pick by use case

- I want a turnkey coding agent today: opencode (https://github.com/anomalyco/opencode), Cline (https://github.com/cline/cline), Codex (https://github.com/openai/codex), Gemini CLI (https://github.com/google-gemini/gemini-cli), OpenHands (https://github.com/OpenHands/OpenHands), crush (https://github.com/charmbracelet/crush), Roo Code (https://github.com/RooCodeInc/Roo-Code)
- I want an always-on personal agent that lives in my chat apps: OpenClaw (https://github.com/openclaw/openclaw), Hermes (https://github.com/NousResearch/hermes-agent), Khoj (https://github.com/khoj-ai/khoj), Agent Zero (https://github.com/agent0ai/agent-zero), OpenHarness (HKUDS) (https://github.com/HKUDS/OpenHarness)
- I want to extend Claude Code, Codex, or OpenCode with skills and slash commands: Anthropic Skills (https://github.com/anthropics/skills), everything-claude-code (https://github.com/affaan-m/ECC), superpowers (https://github.com/obra/superpowers), GStack (https://github.com/garrytan/gstack), pmstack (https://github.com/RyanAlberts/pmstack)
- I want to build my own coding harness from scratch: Claude Agent SDK (https://github.com/anthropics/claude-agent-sdk-python), Google ADK (https://github.com/google/adk-python), AutoHarness (https://github.com/aiming-lab/AutoHarness), SWE-agent (https://github.com/SWE-agent/SWE-agent), RepoMaster (https://github.com/QuantaAlpha/RepoMaster), claw-code-agent (https://github.com/HarnessLab/claw-code-agent)
- I want a drop-in memory layer for agents: Mem0 (https://github.com/mem0ai/mem0), claude-mem (https://github.com/thedotmack/claude-mem), agentlog (https://github.com/RyanAlberts/agentlog), agno (https://github.com/agno-agi/agno), letta (https://github.com/letta-ai/letta)
- I want to plug hundreds to thousands of tools without context bloat: MCP-Zero (https://github.com/xfey/MCP-Zero), ToolGen (https://github.com/Reason-Wang/ToolGen), ToolRAG (https://github.com/antl3x/ToolRAG), langgraph-bigtool (https://github.com/langchain-ai/langgraph-bigtool), spring-ai-tool-search-tool (https://github.com/spring-ai-community/spring-ai-tool-search-tool)
- I want multi-agent orchestration: openai-agents-python (https://github.com/openai/openai-agents-python), crewAI (https://github.com/crewAIInc/crewAI), autogen (https://github.com/microsoft/autogen), Microsoft Agent Framework (https://github.com/microsoft/agent-framework), PraisonAI (https://github.com/MervinPraison/PraisonAI), agent-squad (https://github.com/2FastLabs/agent-squad)
- I want a general LLM app framework: langgraph (https://github.com/langchain-ai/langgraph), langchain (https://github.com/langchain-ai/langchain), llama-index (https://github.com/run-llama/llama_index), pydantic-ai (https://github.com/pydantic/pydantic-ai), agno (https://github.com/agno-agi/agno)
- I want low-code / visual workflows: langflow (https://github.com/langflow-ai/langflow), Flowise (https://github.com/FlowiseAI/Flowise), Dify (https://github.com/langgenius/dify), n8n (https://github.com/n8n-io/n8n)
- I want browser-using agents: browser-use (https://github.com/browser-use/browser-use), WebVoyager (https://github.com/MinorJerry/WebVoyager), puppeteer-real-browser-mcp (https://github.com/withLinda/puppeteer-real-browser-mcp-server)
- I want sandboxed code execution for agent-generated code: E2B (https://github.com/e2b-dev/E2B), Daytona (https://github.com/daytonaio/daytona), smolagents (https://github.com/huggingface/smolagents), OpenHands (https://github.com/OpenHands/OpenHands)
- I want to evaluate or benchmark agents: SWE-bench (https://github.com/SWE-bench/SWE-bench), AgencyBench (https://github.com/GAIR-NLP/AgencyBench), inspect_ai (https://github.com/UKGovernmentBEIS/inspect_ai), WebArena (https://github.com/web-arena-x/webarena), ARC-AGI-2 (https://github.com/arcprize/ARC-AGI-2), VitaBench (https://github.com/meituan-longcat/vitabench)
- I want a deep research / autonomous research agent: deepagents (https://github.com/langchain-ai/deepagents), gpt-researcher (https://github.com/assafelovic/gpt-researcher), openagents (https://github.com/OpenAgentsInc/openagents)
- I want a provider-agnostic LLM pipe (not a framework): LiteLLM (https://github.com/BerriAI/litellm), vercel/ai (https://github.com/vercel/ai)

## FAQ

### What is the best agent harness if I want a turnkey coding agent today?
Top picks: opencode, Cline, Codex. See the “Coding agent products (IDEs, CLIs, full suites)” category for the full ranked list.

### What is the best agent harness if I want an always-on personal agent that lives in my chat apps?
Top picks: OpenClaw, Hermes, Khoj. See the “Personal agent runtimes” category for the full ranked list.

### What is the best agent harness if I want to extend Claude Code, Codex, or OpenCode with skills and slash commands?
Top picks: Anthropic Skills, everything-claude-code, superpowers. See the “Coding harness configs and SDKs” category for the full ranked list.

### What is the best agent harness if I want to build my own coding harness from scratch?
Top picks: Claude Agent SDK, Google ADK, AutoHarness. See the “Coding harness configs and SDKs” category for the full ranked list.

### What is the best agent harness if I want a drop-in memory layer for agents?
Top picks: Mem0, claude-mem, agentlog. See the “Plugins, MCPs, CLI tools” category for the full ranked list.

### What is the best agent harness if I want to plug hundreds to thousands of tools without context bloat?
Top picks: MCP-Zero, ToolGen, ToolRAG. See the “Progressive disclosure harnesses” category for the full ranked list.

### What is the best agent harness if I want multi-agent orchestration?
Top picks: openai-agents-python, crewAI, autogen. See the “Multi-agent and orchestration” category for the full ranked list.

### What is the best agent harness if I want a general LLM app framework?
Top picks: langgraph, langchain, llama-index. See the “Frameworks” category for the full ranked list.

### What is the best agent harness if I want low-code / visual workflows?
Top picks: langflow, Flowise, Dify. See the “Frameworks” category for the full ranked list.

### What is the best agent harness if I want browser-using agents?
Top picks: browser-use, WebVoyager, puppeteer-real-browser-mcp. See the “Plugins, MCPs, CLI tools” category for the full ranked list.

### What is the best agent harness if I want sandboxed code execution for agent-generated code?
Top picks: E2B, Daytona, smolagents. See the “Libraries and SDKs” category for the full ranked list.

### What is the best agent harness if I want to evaluate or benchmark agents?
Top picks: SWE-bench, AgencyBench, inspect_ai. See the “Evaluation and benchmarking harnesses” category for the full ranked list.

### What is the best agent harness if I want a deep research / autonomous research agent?
Top picks: deepagents, gpt-researcher, openagents. See the “Research and task-specific harnesses” category for the full ranked list.

### What is the best agent harness if I want a provider-agnostic LLM pipe (not a framework)?
Top picks: LiteLLM, vercel/ai. See the “Libraries and SDKs” category for the full ranked list.

### Which agent harnesses can run unattended (headless)?
Harnesses designed for unattended runs, batches, and fleets: opencode, OpenHands, goose, SWE-agent, Claude Agent SDK, RepoMaster, OpenClaw, Hermes.

### Which agent harnesses survive a crash mid-task (durable)?
Harnesses whose execution state persists across restarts: langgraph-bigtool, n8n, langgraph, mastra, letta, deepagents, pydantic-ai, Cloudflare Agents.

### How many of these agent harnesses are open source?
96 of 110 carry a standard open-source license; the rest are source-available or unclear, and flagged per row.

### What is an agent harness?
The runtime that turns a model into an agent: it decides what the model's reasoning is allowed to touch, and supplies the orchestration, tool wiring, memory, error recovery, and guardrails around per-turn inference.

### How is this list ranked?
By relevance to harness concerns (environment, orchestration, lifecycle, guardrails) and by GitHub stars (captured 2026-06-21); each project also carries an adoption-surface tier and autonomy/recovery scores.

### How can an AI agent use this list directly?
Three machine-readable surfaces: harnesses.json (structured), llms.txt (one file), and an MCP server (uvx agent-harnesses-mcp) exposing pick_harness and search_harnesses.

## Progressive disclosure harnesses (7 projects)

Formats, runtimes, and patterns that reveal context, tools, or instructions in layers—index first, details on demand—to control tokens and improve agent focus (the "map, not encyclopedia" principle).

- [awesome-cursorrules](https://github.com/PatrickJS/awesome-cursorrules) — ⭐40k, super simple, autonomy: n/a, recovery: n/a, open-source: Curated .cursorrules and skills that leverage Cursor's index-then-load model; the canonical collection for rules-as-progressive-disclosure in the IDE. [ide]
- [agents.md](https://github.com/agentsmd/agents.md) — ⭐22.4k, super simple, autonomy: n/a, recovery: n/a, open-source: Open format for repo-scoped agent briefings; v1.1 adds hierarchical scope and progressive disclosure so agents get a map of what exists, then load only what's relevant. [typescript]
- [langgraph-bigtool](https://github.com/langchain-ai/langgraph-bigtool) — ⭐542, slightly complex, autonomy: bounded, recovery: durable, open-source: Build LangGraph agents with large tool sets; retrieval and on-demand tool loading so agents scale beyond context without stuffing every schema upfront. [tool-discovery, python]
- [MCP-Zero](https://github.com/xfey/MCP-Zero) — ⭐489, complex, autonomy: bounded, recovery: none, open-source: Active tool discovery for autonomous agents: model requests tools by requirement; hierarchical semantic routing over 308 servers / 2,797 tools with ~98% token reduction (APIBank). [tool-discovery]
- [ToolGen](https://github.com/Reason-Wang/ToolGen) — ⭐180, complex, autonomy: n/a, recovery: n/a, unknown: ICLR 2025: unified tool retrieval and calling via generation; 47k+ tools without context stuffing—retrieval and invocation in one generative step. [tool-discovery, python]
- [spring-ai-tool-search-tool](https://github.com/spring-ai-community/spring-ai-tool-search-tool) — ⭐74, mostly simple, autonomy: n/a, recovery: n/a, open-source: Dynamic tool discovery for Spring AI: model gets a search tool first, then pulls definitions for relevant tools; 34–64% token reduction across providers. [tool-discovery]
- [ToolRAG](https://github.com/antl3x/ToolRAG) — ⭐28, mostly simple, autonomy: n/a, recovery: n/a, open-source: Semantic tool retrieval for LLMs; serves only the tools the user query demands (MCP-compatible), unlimited tool sets with zero context penalty. [mcp, tool-discovery]

## Coding agent products (IDEs, CLIs, full suites) (11 projects)

Turnkey coding agents you install and run: IDE extensions, terminal CLIs, Dockerized workspaces. Each entry notes which part is the harness (the agent loop, tool wiring, approval model) versus the UI shell (VS Code extension, TUI, browser client).

- [opencode](https://github.com/anomalyco/opencode) — ⭐177k, slightly complex, autonomy: headless, recovery: resumable, open-source: Open-source terminal coding agent (formerly `sst/opencode`; transferred to anomalyco). The **harness** is a multi-provider tool-call loop (Claude, OpenAI, Gemini, local) with strong plugin and MCP support; the TUI is the shell. 100% OSS, very actively shipped. [mcp, provider-agnostic, cli, tui, typescript]
- [Gemini CLI](https://github.com/google-gemini/gemini-cli) — ⭐105k, slightly complex, autonomy: bounded, recovery: resumable, open-source: Google's first-party terminal agent for Gemini. The **harness** is the plugin/MCP tool-call loop; the terminal is the shell—Google's parallel to Claude Code / Codex, not just an API. [mcp, cli, typescript]
- [Codex](https://github.com/openai/codex) — ⭐92.4k, slightly complex, autonomy: bounded, recovery: resumable, open-source: OpenAI's terminal coding agent. The **harness** is the sandboxed tool-call loop with multi-provider support; the CLI is the shell. Reference implementation for "official CLI that ships code." [sandbox, provider-agnostic, cli]
- [OpenHands](https://github.com/OpenHands/OpenHands) — ⭐77.9k, complex, autonomy: headless, recovery: resumable, restricted ((multi-license)): Dockerized software-engineering agent. The **harness** is the bash/editor/browser toolset with micro-agents and event-stream session bridging; Docker is the sandbox. Main OSS choice for teams self-hosting autonomous repo work. [memory, browser, sandbox, python]
- [Open Interpreter](https://github.com/openinterpreter/openinterpreter) — ⭐64.1k, mostly simple, autonomy: bounded, recovery: resumable, open-source: Lightweight terminal coding agent oriented to open models (DeepSeek, Kimi, Qwen). The **harness** is a code-execution loop — the model writes code, the harness executes it with confirmation gates; the CLI is the shell. The original "let the LLM run code on my machine" project, reborn for open weights. [cli, python]
- [Cline](https://github.com/cline/cline) — ⭐63.6k, slightly complex, autonomy: step-gated, recovery: resumable, open-source: VS Code extension whose **harness** is a plan-then-act loop with per-step human approval and cost transparency; the VS Code integration is the UI shell. Open-source counterweight to Cursor. [ide, typescript]
- [goose](https://github.com/aaif-goose/goose) — ⭐50k, slightly complex, autonomy: headless, recovery: resumable, open-source: Block-originated Rust agent, now stewarded by the Linux Foundation's Agentic AI Foundation (`aaif-goose/goose`). The **harness** is the MCP/ACP extension model with recipes and provider choice; there's no fixed UI slot—you bolt it into whatever shell you use. [mcp, rust]
- [crush](https://github.com/charmbracelet/crush) — ⭐25.5k, slightly complex, autonomy: bounded, recovery: resumable, restricted (FSL-1.1-MIT): Charm's terminal coding agent (Charm's fork of the original OpenCode). The **harness** is the tool-calling loop with session persistence; the Bubble Tea TUI is the shell. [memory, cli, tui]
- [Roo Code](https://github.com/RooCodeInc/Roo-Code) — ⭐24.2k, slightly complex, autonomy: step-gated, recovery: resumable, open-source: VS Code/Cursor extension in the Cline lineage. The **harness** is the approval-gated agent with custom modes and a strong MCP story; the IDE is the UI. Popular community fork when you want that workflow without the upstream extension. [mcp, workflow, ide, typescript]
- [claw-code-agent](https://github.com/HarnessLab/claw-code-agent) — ⭐515, slightly complex, autonomy: checkpoint-gated, recovery: none, unknown: Python reimplementation of the Claude Code agent architecture with zero external dependencies; interactive chat, streaming, plugin runtime, nested agent delegation, cost tracking, MCP transport—portable harness without the Rust/TS toolchain. [mcp, rust, python, typescript]
- [coderClaw](https://github.com/SeanHogg/BuilderForceAgents) — ⭐3, slightly complex, autonomy: bounded, recovery: none, unknown: Self-hosted multi-role coding system (Creator, Reviewer, Test, Refactor, etc.) with AST and semantic maps; IDE-agnostic, chat-channel triggers. [ide, typescript]

## Coding harness configs and SDKs (10 projects)

Skill packs, slash-command libraries, meta-prompting frameworks, and official SDKs that give you the harness (the agent loop, planning, memory, hooks) without bundling a specific IDE or CLI shell.

- [superpowers](https://github.com/obra/superpowers) — ⭐235k, complex, autonomy: n/a, recovery: n/a, open-source: Performance-oriented harness pack for Claude Code, Codex, OpenCode, Cursor: skills, instincts, memory, security, research-first workflows. Treats harness engineering itself as the performance lever. [memory, ide]
- [everything-claude-code](https://github.com/affaan-m/ECC) — ⭐219k, complex, autonomy: n/a, recovery: n/a, open-source: The breakout 2026 harness pack for Claude Code: 28 specialized subagents, 119 reusable skills, 60 slash commands, 34 rules, 20+ automated hooks. Ships a full "AI engineering team" as config. [multi-agent]
- [Anthropic Skills](https://github.com/anthropics/skills) — ⭐153k, mostly simple, autonomy: n/a, recovery: n/a, open-source: Anthropic's official Agent Skills repository: SKILL.md-based folders (instructions, scripts, resources) Claude dynamically loads on Claude Code, Claude.ai, and the API. The reference for progressive-disclosure skill packs in 2026.
- [GStack](https://github.com/garrytan/gstack) — ⭐112k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Garry Tan's Claude Code skill stack: 23 slash-command modes (CEO/eng/design review, QA, ship, browse, retro, …) that structure one assistant as a virtual engineering team. Daily driver while running YC. [typescript]
- [get-shit-done](https://github.com/gsd-build/get-shit-done) — ⭐64.4k, mostly simple, autonomy: bounded, recovery: resumable, open-source: Goal-backward planning and wave-based execution over fresh context windows; avoids context rot by design. Python/JS meta-prompting for Claude Code, OpenCode, Gemini CLI. [cli, python]
- [SWE-agent](https://github.com/SWE-agent/SWE-agent) — ⭐19.6k, slightly complex, autonomy: headless, recovery: resumable, open-source: LM-driven harness built for SWE-bench: edit state, command execution, and issue-focused loop—the reference agent stack next to the benchmark itself. [memory, evals, python]
- [Claude Agent SDK](https://github.com/anthropics/claude-agent-sdk-python) — ⭐7.4k, complex, autonomy: headless, recovery: resumable, open-source: Official Anthropic SDK (Python + [TypeScript](https://github.com/anthropics/claude-agent-sdk-typescript), [demos](https://github.com/anthropics/claude-agent-sdk-demos), [quickstarts](https://github.com/anthropics/claude-quickstarts)): built-in tools, MCP, long-running coding agents with session bridging. [mcp, memory, python, typescript]
- [RepoMaster](https://github.com/QuantaAlpha/RepoMaster) — ⭐529, slightly complex, autonomy: headless, recovery: none, unknown: Repo-scoped research harness: builds function-call and module-dependency graphs to explore only what's needed; large relative gains on MLE-bench and GitTaskBench with lower token use. [workflow, python]
- [AutoHarness](https://github.com/aiming-lab/AutoHarness) — ⭐326, super simple, autonomy: bounded, recovery: none, open-source: Lightweight governance harness: wraps any LLM client in ~2 lines for automated harness engineering—6–14 step pipeline, YAML constitution, risk-pattern matching, session persistence with cost tracking, multi-agent profiles. [memory, multi-agent, provider-agnostic, python]
- [pmstack](https://github.com/RyanAlberts/pmstack) — ⭐2, super simple, autonomy: n/a, recovery: n/a, open-source: Claude Code config for AI product managers: CLAUDE.md plus skills for competitive analysis, PRD-from-signal, metric frameworks, stakeholder briefs, and agent eval design. "GStack for PMs." [evals]

## Personal agent runtimes (7 projects)

Always-on, self-hosted agents you run as a daemon and talk to from chat apps: gateway runtimes, second brains, and self-improving assistants. The agent as a product you operate, not a library you build with.

- [OpenClaw](https://github.com/openclaw/openclaw) — ⭐380k, complex, autonomy: headless, recovery: resumable, open-source: Self-hosted, always-on personal agent (formerly Clawdbot/Moltbot): a gateway + event-loop runtime that treats messages, heartbeats, crons, and webhooks as one input queue, persists state to local files, and lives in your chat apps (WhatsApp, Telegram, Slack, Discord). 13,700+ community skills; the fastest-growing repo in GitHub history. [typescript, multi-agent]
- [Hermes](https://github.com/NousResearch/hermes-agent) — ⭐199k, slightly complex, autonomy: headless, recovery: resumable, open-source: Nous Research's self-improving agent: a learning loop turns experience into reusable skills, builds a persistent user model across sessions, and checkpoints state to disk with rollback; lean enough for a $5 VPS, driven from chat, and model-agnostic (Nous Portal, OpenRouter, OpenAI, or any endpoint). [memory, python, provider-agnostic]
- [Khoj](https://github.com/khoj-ai/khoj) — ⭐35.2k, complex, autonomy: headless, recovery: resumable, open-source: Self-hostable "AI second brain": answers over your docs and the web, custom agents, scheduled automations, and multi-client reach (web, Obsidian, Emacs, WhatsApp). A personal-agent harness with retrieval at the core. [python]
- [Eliza](https://github.com/elizaOS/eliza) — ⭐18.6k, complex, autonomy: headless, recovery: resumable, open-source: Open "agentic operating system" (elizaOS): persistent multi-agent runtime with character files, a plugin ecosystem, and social/platform integrations — the harness behind a large share of autonomous social agents. [memory, multi-agent, typescript]
- [Agent Zero](https://github.com/agent0ai/agent-zero) — ⭐18.1k, slightly complex, autonomy: bounded, recovery: resumable, unknown: Organic, prompt-defined personal agent framework: hierarchical sub-agents, persistent memory, browser and code tools, and self-modifying behavior; runs in Docker with a web UI. [memory, multi-agent, browser, sandbox, python]
- [OpenHarness (HKUDS)](https://github.com/HKUDS/OpenHarness) — ⭐14k, complex, autonomy: bounded, recovery: resumable, open-source: Open agent harness with a built-in personal agent ("Ohmo") that runs across Feishu, Slack, Telegram, and Discord; core tool-use, skills, memory, multi-agent coordination with auto-compaction for multi-day sessions. [memory, multi-agent]
- [AIlice](https://github.com/myshell-ai/AIlice) — ⭐1.4k, slightly complex, autonomy: bounded, recovery: none, open-source: Fully autonomous general-purpose agent; one binary, Docker-ready, for when you want "set goal and walk away" without a framework. [sandbox, python]

## Frameworks (23 projects)

General-purpose agent and LLM application frameworks (the app layer, not harnesses per se).

- [n8n](https://github.com/n8n-io/n8n) — ⭐193k, complex, autonomy: headless, recovery: durable, restricted (Fair-code): Fair-code workflow engine with 400+ nodes and native AI nodes; the self-hosted Zapier that actually does agents and LangChain. [workflow, local, typescript]
- [AutoGPT](https://github.com/Significant-Gravitas/AutoGPT) — ⭐185k, complex, autonomy: headless, recovery: resumable, restricted (Polyform-SU): The original autonomous loop: goal in, agent iterates with tools and memory; Forge is the dev framework, Benchmark the eval harness. [memory, evals, python]
- [langflow](https://github.com/langflow-ai/langflow) — ⭐150k, complex, autonomy: headless, recovery: retry, open-source: Low-code UI to build and deploy LangChain/LangGraph flows; visual DAG editor and one-click run. [low-code, python]
- [Dify](https://github.com/langgenius/dify) — ⭐146k, complex, autonomy: headless, recovery: retry, restricted (Fair-code): One-stop LLM app platform: visual workflows, RAG pipeline, 50+ tools, model management; "ship from prototype to prod" in a single UI. [low-code, rag, python]
- [langchain](https://github.com/langchain-ai/langchain) — ⭐140k, complex, autonomy: bounded, recovery: retry, open-source: Chains, tools, retrievers, and agents; the usual entry point for "add tools to an LLM" in Python/JS. [python]
- [browser-use](https://github.com/browser-use/browser-use) — ⭐99.9k, slightly complex, autonomy: bounded, recovery: retry, open-source: Python layer over Playwright: natural-language goals become browser actions—web-agent loop without hand-rolling MCP or a custom driver for every site. [mcp, browser, python]
- [Flowise](https://github.com/FlowiseAI/Flowise) — ⭐53.9k, complex, autonomy: headless, recovery: retry, restricted (Apache+CLA): Drag-and-drop LangChain UI; deploy flows without code. The low-code sibling to Langflow, with a different component and hosting story. [low-code, typescript]
- [llama-index](https://github.com/run-llama/llama_index) — ⭐50.2k, complex, autonomy: bounded, recovery: retry, open-source: Data-centric: indexing, RAG, and query engines; agent abstractions sit on top of your data pipelines. [rag, python]
- [agno](https://github.com/agno-agi/agno) — ⭐40.8k, complex, autonomy: bounded, recovery: resumable, open-source: Python agents with memory, knowledge bases, tools, and structured outputs; continues the PhiData-era product line under the Agno name—production apps, evals, and pipelines. [memory, evals, python]
- [langgraph](https://github.com/langchain-ai/langgraph) — ⭐35.3k, slightly complex, autonomy: headless, recovery: durable, open-source: State-machine graphs over LLM steps; checkpointing, human-in-the-loop, and durable execution so workflows survive restarts. [workflow, python]
- [semantic-kernel](https://github.com/microsoft/semantic-kernel) — ⭐28.2k, complex, autonomy: bounded, recovery: retry, open-source: Microsoft's plugin and planner layer for LLMs; C#, Python, Java; strong on enterprise auth and orchestration. [python]
- [mastra](https://github.com/mastra-ai/mastra) — ⭐25.3k, slightly complex, autonomy: bounded, recovery: durable, restricted (Elastic-2.0): TypeScript-first; agents, tools, and workflows with a single runtime and minimal boilerplate. [typed, typescript]
- [letta](https://github.com/letta-ai/letta) — ⭐23.4k, mostly simple, autonomy: headless, recovery: durable, open-source: Python agent runtime with tool use and control flow; lean API; stateful agents with long-horizon memory. [memory, python]
- [rasa](https://github.com/RasaHQ/rasa) — ⭐21.2k, complex, autonomy: headless, recovery: resumable, open-source: Conversational AI stack (NLU, dialogue, actions); long-standing OSS choice for chat and voice bots. [voice, python]
- [Google ADK](https://github.com/google/adk-python) — ⭐20.2k, complex, autonomy: headless, recovery: resumable, open-source: Google's official Agent Development Kit: code-first Python toolkit for building, evaluating, and deploying agents. Optimized for Gemini but model-agnostic; deploys to Cloud Run / Vertex AI; ships a dev UI with eval and a code-execution sandbox. [evals, sandbox, python]
- [botpress](https://github.com/botpress/botpress) — ⭐14.7k, complex, autonomy: headless, recovery: resumable, open-source: Visual bot builder and runtime; multi-channel, open-source alternative to commercial bot platforms. [low-code, typescript]
- [R2R](https://github.com/SciPhi-AI/R2R) — ⭐7.9k, complex, autonomy: headless, recovery: retry, open-source: RAG-first: hybrid search, knowledge graphs, multimodal; the framework for "production RAG" when you care more about retrieval than chat UI. [vision, rag, workflow, python]
- [agent-squad](https://github.com/2FastLabs/agent-squad) — ⭐7.7k, slightly complex, autonomy: bounded, recovery: resumable, open-source: AWS-originated orchestrator (now under 2FastLabs): intent classification, streaming, SupervisorAgent; "agent-as-tools" so one agent delegates to a squad. [multi-agent]
- [AgentVerse](https://github.com/OpenBMB/AgentVerse) — ⭐5.1k, complex, autonomy: headless, recovery: none, open-source: Task-solving and simulation envs for multi-LLM agents; deploy many agents in custom environments without building infra from scratch. [multi-agent, python]
- [Bee Agent Framework](https://github.com/i-am-bee/beeai-framework) — ⭐3.3k, complex, autonomy: bounded, recovery: resumable, open-source: Python + TypeScript, LF AI–backed; MCP/ACP, workflows, Requirement Agent; the one that pushes "production multi-agent" without LangChain. [mcp, multi-agent, python, typescript]
- [AgentStack](https://github.com/agentstack-ai/AgentStack) — ⭐2.2k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Scaffolds full agent projects; plugs in CrewAI, LangGraph, OpenAI Swarm, LlamaStack and wires AgentOps observability from day one.
- [AgentSilex](https://github.com/howl-anderson/agentsilex) — ⭐451, super simple, autonomy: bounded, recovery: none, open-source: ~300 lines of readable agent code on top of LiteLLM; the "I want to see the whole loop" option for learning or minimal production. [python]
- [SuperAgentX](https://github.com/superagentxai/superagentx) — ⭐200, mostly simple, autonomy: bounded, recovery: none, open-source: Lightweight multi-agent orchestrator with an AGI-angle; minimal surface, docs-first, for teams that want orchestration without the kitchen sink. [multi-agent, python]

## Multi-agent and orchestration (8 projects)

Harnesses and patterns for multi-agent coordination and handoffs.

- [MetaGPT](https://github.com/FoundationAgents/MetaGPT) — ⭐68.9k, complex, autonomy: headless, recovery: resumable, open-source: The "AI software company" multi-agent framework: role-played PM, architect, and engineer agents turn a one-line requirement into specs, designs, and code along an SOP assembly line. The landmark of the genre; development pace has slowed in 2026. [multi-agent, python]
- [autogen](https://github.com/microsoft/autogen) — ⭐59.1k, complex, autonomy: bounded, recovery: resumable, open-source: Conversable agents and group chats; code execution and human-in-the-loop; Microsoft origin, AG2 ecosystem. [multi-agent, python]
- [crewAI](https://github.com/crewAIInc/crewAI) — ⭐54.1k, complex, autonomy: bounded, recovery: resumable, open-source: Role-based agents (roles, goals, backstories) in Crews; Flows add event-driven and hierarchical control for production. [python]
- [ChatDev](https://github.com/OpenBMB/ChatDev) — ⭐33.5k, slightly complex, autonomy: headless, recovery: none, open-source: Multi-agent software-company simulation (CEO, CTO, programmer, tester) built on chat chains with communicative dehallucination; ChatDev 2.0 continues the line. MetaGPT's conversational sibling. [python]
- [openai-agents-python](https://github.com/openai/openai-agents-python) — ⭐27.3k, mostly simple, autonomy: bounded, recovery: resumable, open-source: Handoffs, guardrails, and multi-LLM routing; minimal surface so you own the loop. [python]
- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) — ⭐11.5k, slightly complex, autonomy: bounded, recovery: resumable, open-source: Microsoft's convergence of AutoGen and Semantic Kernel: build, orchestrate, and deploy agents and multi-agent workflows in Python and .NET, with graph-based workflows and checkpointing — the designated successor harness for both lines. [multi-agent, workflow, python]
- [PraisonAI](https://github.com/MervinPraison/PraisonAI) — ⭐8.2k, mostly simple, autonomy: bounded, recovery: none, open-source: Autonomous multi-agent teams with a single entry point; emphasis on minimal config. [multi-agent, python]
- [AgentRL](https://github.com/THUDM/AgentRL) — ⭐302, complex, autonomy: headless, recovery: resumable, open-source: Multitask, multiturn RL for LLM agents; Ray-based scaling, rollout/actor workers—for teams that want to train agents, not just run them. [training, python]

## Plugins, MCPs, CLI tools (12 projects)

IDE plugins, concrete MCP servers, and CLI tools that give agents tools and context.

- [claude-mem](https://github.com/thedotmack/claude-mem) — ⭐83.5k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Claude Code plugin that captures everything an agent does during a session, AI-compresses it (via claude-agent-sdk), and injects the relevant context into future sessions—session-to-session memory as a drop-in. [memory]
- [aider](https://github.com/Aider-AI/aider) — ⭐46.5k, slightly complex, autonomy: checkpoint-gated, recovery: resumable, open-source: Git-aware CLI pair programmer; edits in-repo, supports multiple models and MCP so agents see version control and tools. [mcp, cli, python]
- [continue](https://github.com/continuedev/continue) — ⭐34.2k, complex, autonomy: checkpoint-gated, recovery: resumable, open-source: Open-source IDE extension (VS Code, JetBrains); in-editor completion and chat with local or API models. [ide, typescript]
- [github-mcp-server](https://github.com/github/github-mcp-server) — ⭐30.9k, slightly complex, autonomy: n/a, recovery: n/a, open-source: GitHub's official MCP server (Go): repos, issues, PRs, code search, Actions. Replaces the older community `cyanheads/github-mcp-server` as the canonical way to give agents GitHub access. [mcp]
- [MCP Python SDK](https://github.com/modelcontextprotocol/python-sdk) — ⭐23.4k, mostly simple, autonomy: n/a, recovery: n/a, open-source: Official SDK to build and consume MCP servers/clients in Python; stdio and SSE transports. [mcp, python]
- [MCP TypeScript SDK](https://github.com/modelcontextprotocol/typescript-sdk) — ⭐12.7k, mostly simple, autonomy: n/a, recovery: n/a, open-source: Official MCP implementation for Node/TS; reference for the protocol. [mcp, typescript]
- [MCP Inspector](https://github.com/modelcontextprotocol/inspector) — ⭐10.1k, super simple, autonomy: n/a, recovery: n/a, open-source: GUI to test and debug MCP servers; inspect tools, resources, and prompts. [mcp, typescript]
- [MCP Registry](https://github.com/modelcontextprotocol/registry) — ⭐6.9k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Official, community-driven registry for MCP servers—the "app store" MCP clients use to discover servers. Maintained by Anthropic + ecosystem maintainers; v0.1 API frozen, production-grade. [mcp]
- [Docker MCP Gateway](https://github.com/docker/mcp-gateway) — ⭐1.5k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Docker's official MCP CLI plugin / gateway; container-aware MCP tooling from Docker (replaces deprecated `docker/mcp-servers` path). [mcp, sandbox, cli]
- [puppeteer-real-browser-mcp](https://github.com/withLinda/puppeteer-real-browser-mcp-server) — ⭐23, mostly simple, autonomy: n/a, recovery: n/a, unknown: Puppeteer MCP with real-browser and anti-detection; for agents that need to drive sites that block headless. [mcp, browser, typescript]
- [Better-OpenCodeMCP](https://github.com/ajhcs/Better-OpenCodeMCP) — ⭐8, mostly simple, autonomy: n/a, recovery: n/a, open-source: MCP server for OpenCode/Crush: async task execution, model bridging (e.g. Claude→Gemini), process pooling. [mcp, typescript]
- [agentlog](https://github.com/RyanAlberts/agentlog) — ⭐0, super simple, autonomy: n/a, recovery: n/a, open-source: Persistent decision memory for any project: `remember`, `recall`, `reflect`. Single-file Python CLI that stores decisions as JSONL and uses Claude or Gemini to retrieve and synthesize patterns—Karpathy's LLM Wiki concept as a CLI. [memory, cli, python]

## Evaluation and benchmarking harnesses (16 projects)

Agentic eval systems, reasoning benchmarks, and open agent benchmarks.

- [Agent Lightning](https://github.com/microsoft/agent-lightning) — ⭐17.3k, complex, autonomy: headless, recovery: resumable, open-source: Microsoft's training-oriented harness: optimization loops for agent behavior—when you need to improve policies over rollouts, not only score a fixed prompt. [evals, training, python]
- [SWE-bench](https://github.com/SWE-bench/SWE-bench) — ⭐5.2k, slightly complex, autonomy: headless, recovery: resumable, open-source: LMs resolve real GitHub issues; Docker harness, instance IDs; standard for code-agent evals. [evals, sandbox, python]
- [AgentBench](https://github.com/THUDM/AgentBench) — ⭐3.5k, complex, autonomy: headless, recovery: none, open-source: ICLR'24 benchmark: agents across AlfWorld, DB, knowledge graphs, OS, webshop; Docker Compose, function-calling interface. [evals, sandbox, rag, workflow, python]
- [inspect_ai](https://github.com/UKGovernmentBEIS/inspect_ai) — ⭐2.2k, complex, autonomy: headless, recovery: resumable, open-source: Inspect AI core: composable eval tasks, sandboxes, scorers, and multi-model runs; the framework behind inspect_evals, not just the task bundle. [evals, sandbox, python]
- [WebArena](https://github.com/web-arena-x/webarena) — ⭐1.5k, complex, autonomy: headless, recovery: none, open-source: Realistic web env (e.g. e‑commerce, CMS, dev tools); 812 tasks; measures end-to-end web agent success. [python]
- [WebVoyager](https://github.com/MinorJerry/WebVoyager) — ⭐1.1k, slightly complex, autonomy: headless, recovery: none, open-source: End-to-end web agent with LMMs: screenshots + actions on real sites; benchmark on 15 sites, GPT-4V for automatic eval. [evals, vision]
- [ARC-AGI-2](https://github.com/arcprize/ARC-AGI-2) — ⭐715, super simple, autonomy: n/a, recovery: n/a, open-source: ARC Prize task set: grid-based abstraction/reasoning; public and private splits for generalization.
- [SWE-Gym](https://github.com/SWE-Gym/SWE-Gym) — ⭐694, slightly complex, autonomy: headless, recovery: none, open-source: Training and evaluation for SWE agents and verifiers (ICML 2025). [evals, training, python]
- [swe-smith](https://github.com/SWE-bench/SWE-smith) — ⭐681, slightly complex, autonomy: headless, recovery: none, open-source: Data generation for SWE agents; 50k+ instances across 128 repos; used for SWE-agent-LM training. [training, python]
- [inspect_evals](https://github.com/UKGovernmentBEIS/inspect_evals) — ⭐547, slightly complex, autonomy: headless, recovery: resumable, open-source: UK AISI/Arcadia/Vector: GAIA and other evals in Inspect AI; level 1–3, sandboxed, tool-calling solvers. [evals, sandbox]
- [arc-agi-benchmarking](https://github.com/arcprize/arc-agi-benchmarking) — ⭐350, mostly simple, autonomy: headless, recovery: retry, open-source: Runner for ARC-AGI: multi-provider (OpenAI, Anthropic, Gemini, etc.), rate limits, retries, and scoring. [evals, provider-agnostic, python]
- [VitaBench](https://github.com/meituan-longcat/vitabench) — ⭐145, complex, autonomy: headless, recovery: none, open-source: ICLR'26: 66 tools, real-world apps (delivery, travel, retail); 100 cross-scenario + 300 single-scenario tasks; adopted by Qwen/Seed.
- [AgencyBench](https://github.com/GAIR-NLP/AgencyBench) — ⭐87, complex, autonomy: headless, recovery: none, open-source: Long-horizon agent benchmark: 32 scenarios, 138 tasks, ~1M tokens and ~90 tool calls; Docker sandbox and rubric-based + LLM judges. [evals, sandbox, python]
- [letta-evals](https://github.com/letta-ai/letta-evals) — ⭐72, mostly simple, autonomy: headless, recovery: none, open-source: Eval harness for stateful Letta agents; configurable suites and grading (LLM or rule-based) so you can measure what you ship. [memory, python]
- [SUPER](https://github.com/allenai/super-benchmark) — ⭐53, slightly complex, autonomy: headless, recovery: none, open-source: Agents that set up and run ML/NLP from GitHub repos; 45 expert problems, 152 masked tasks, 602 AutoGen tasks; Docker-based. [sandbox, python]
- [TRAIL](https://github.com/patronus-ai/trail-benchmark) — ⭐19, mostly simple, autonomy: n/a, recovery: n/a, open-source: Trace reasoning and agentic issue localization; 148 long-context traces, 841 errors, 20+ error types; Hugging Face dataset.

## Research and task-specific harnesses (2 projects)

Deep research, document QA, and domain-specific agent loops.

- [gpt-researcher](https://github.com/assafelovic/gpt-researcher) — ⭐27.8k, complex, autonomy: bounded, recovery: retry, open-source: Autonomous deep-research agent: web + local sources, citation-grounded reports, multi-agent and deep-research modes. The reference open-source research harness. [multi-agent, python]
- [openagents](https://github.com/OpenAgentsInc/openagents) — ⭐427, complex, autonomy: headless, recovery: resumable, open-source: Platform for autonomous agents and autopilot-style workflows; decentralized/Nostr-oriented (Pylon runtime, actively shipped in 2026).

## Libraries and SDKs (14 projects)

Lightweight runtimes, tool loops, and provider-agnostic harness primitives.

- [Daytona](https://github.com/daytonaio/daytona) — ⭐72.4k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Elastic dev environments for AI-generated code: workspaces, Git, previews—infra harness between "the model wrote a patch" and "it ran in a real machine." [sandbox]
- [Mem0](https://github.com/mem0ai/mem0) — ⭐59k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Universal memory layer for AI agents: stores user/org/session memory, retrieves on demand. Apache-2.0; the de-facto memory primitive paired with most harnesses in 2026. [memory, python]
- [LiteLLM](https://github.com/BerriAI/litellm) — ⭐51k, mostly simple, autonomy: n/a, recovery: retry, open-source: One interface to 100+ LLMs; routing, caching, budgets. Not an agent framework—the pipe every agent framework uses. [provider-agnostic, python]
- [Composio](https://github.com/ComposioHQ/composio) — ⭐28.9k, complex, autonomy: n/a, recovery: n/a, open-source: 1,000+ toolkits with auth, tool search, and a sandboxed workbench—drop-in tool layer so agents stop reinventing OAuth + integrations. Python and TypeScript. [sandbox, tool-discovery, python, typescript]
- [smolagents](https://github.com/huggingface/smolagents) — ⭐27.9k, mostly simple, autonomy: bounded, recovery: none, open-source: Code-as-action agents: model outputs Python executed in sandbox (E2B, Modal, etc.); ~1k LOC core. [sandbox, python]
- [vercel/ai](https://github.com/vercel/ai) — ⭐25k, slightly complex, autonomy: bounded, recovery: retry, open-source: React and Node SDK for streaming, tool calls, and agent-style UIs; provider-agnostic. [provider-agnostic, typescript]
- [deepagents](https://github.com/langchain-ai/deepagents) — ⭐24.9k, slightly complex, autonomy: bounded, recovery: durable, open-source: LangChain's Python+TypeScript agent harness on top of LangGraph: planning tool, virtual filesystem, shell sandbox, sub-agent spawning—the "Claude Code-style" harness as a reusable library. [multi-agent, sandbox, python, typescript]
- [pydantic-ai](https://github.com/pydantic/pydantic-ai) — ⭐17.9k, slightly complex, autonomy: bounded, recovery: durable, open-source: Type-safe Python agents with Pydantic I/O; multi-provider, MCP, Logfire observability, and human-in-the-loop. [mcp, typed, provider-agnostic, python]
- [E2B](https://github.com/e2b-dev/E2B) — ⭐12.7k, slightly complex, autonomy: n/a, recovery: n/a, open-source: Firecracker sandboxes for executing agent-generated code; the hosted isolation layer many tool-calling demos use instead of running arbitrary LLM output on your laptop. [sandbox, python]
- [strands-agents](https://github.com/strands-agents/harness-sdk) — ⭐6.2k, mostly simple, autonomy: bounded, recovery: resumable, open-source: Model-driven Python SDK; decorators for tools, native MCP, multi-agent; "minimal code" without sacrificing provider choice. [mcp, multi-agent, typed, python]
- [Cloudflare Agents](https://github.com/cloudflare/agents) — ⭐5.1k, slightly complex, autonomy: headless, recovery: durable, open-source: Persistent, stateful agents on Durable Objects: state, websockets, scheduling, and AI chat baked in. The serverless answer to "where does the agent live?" [memory, typescript]
- [openai-agents-js](https://github.com/openai/openai-agents-js) — ⭐3.3k, slightly complex, autonomy: bounded, recovery: resumable, open-source: Official OpenAI Agents SDK for Node/TS: handoffs, guardrails, voice; the JS counterpart to openai-agents-python. [multi-agent, voice, typescript]
- [open-harness](https://github.com/MaxGfeller/open-harness) — ⭐571, slightly complex, autonomy: bounded, recovery: none, open-source: TypeScript Agent class on Vercel AI SDK; streaming events, filesystem/bash tools, MCP, and subagent delegation. [mcp, multi-agent, typescript]
- [Community-curated agent lists](https://github.com/brandonhimpfen/awesome-ai-agents) — ⭐11, super simple, autonomy: n/a, recovery: n/a, unknown: Broader directories: e.g. [brandonhimpfen/awesome-ai-agents](https://github.com/brandonhimpfen/awesome-ai-agents), [axioma-ai-labs/awesome-ai-agent-frameworks](https://github.com/axioma-ai-labs/awesome-ai-agent-frameworks), [mb-mal/awesome-ai-agents-frameworks](https://github.com/mb-mal/awesome-ai-agents-frameworks)—differ by scope and update cadence.