best-of-Agent-Harnesses

letta-evals

Evaluation harness for stateful Letta agents, with configurable test suites and grading (LLM-judged or rule-based) so you can measure what you ship.

Tags: memory, python

Stars: 72
Adoption surface: mostly simple
Autonomy: headless
Recovery: none
License: ✅ open-source
Category: Evaluation and benchmarking harnesses

Links: Repository; Example: LoCoMo memory benchmark
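To make "configurable suites and rule-based grading" concrete for stateful agents, here is a minimal sketch of the general pattern. It is not the letta-evals API: `Case`, `ToyMemoryAgent`, `contains_grader`, and `run_suite` are hypothetical names invented for illustration. Each case replays a short conversation against a fresh agent and grades only the final reply, which is how a memory test can check that a fact stated earlier is recalled later.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical test case (NOT a letta-evals class): a sequence of user turns,
# plus the text the agent's final reply is expected to contain.
@dataclass
class Case:
    turns: list[str]
    expected: str

Agent = Callable[[str], str]
Grader = Callable[[Case, str], float]

class ToyMemoryAgent:
    """Hypothetical stand-in for a stateful agent: it stores 'remember:' facts
    and answers any other message by reciting everything it remembers."""

    def __init__(self) -> None:
        self.memory: list[str] = []

    def __call__(self, message: str) -> str:
        if message.startswith("remember:"):
            self.memory.append(message[len("remember:"):].strip())
            return "ok"
        return " ".join(self.memory)

def contains_grader(case: Case, reply: str) -> float:
    """Rule-based grader: 1.0 if the expected text appears in the reply
    (case-insensitive), otherwise 0.0."""
    return 1.0 if case.expected.lower() in reply.lower() else 0.0

def run_suite(cases: list[Case], make_agent: Callable[[], Agent], grader: Grader) -> float:
    """Run each case against a fresh agent, grade the last reply, return the mean score."""
    if not cases:
        return 0.0
    scores = []
    for case in cases:
        agent = make_agent()  # fresh state per case so cases don't leak memory
        reply = ""
        for turn in case.turns:
            reply = agent(turn)
        scores.append(grader(case, reply))
    return sum(scores) / len(scores)

cases = [
    Case(["remember: my cat is named Miso", "What is my cat called?"], "miso"),
    Case(["What is my dog called?"], "Rex"),  # never told, so this case fails
]
score = run_suite(cases, ToyMemoryAgent, contains_grader)
print(score)  # → 0.5
```

Swapping `contains_grader` for a function that asks an LLM to score the reply would give the LLM-graded variant; the suite runner stays the same.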
