letta-evals

Evaluation harness for stateful Letta agents, with configurable suites and LLM-based or rule-based grading, so you can measure what you ship.

Tags: memory, python
Stars: 72
Adoption surface: mostly simple
Autonomy: headless
Recovery: none
License: ✅ open-source
Category: Evaluation and benchmarking harnesses

Repository ↗
Example: LoCoMo memory benchmark ↗
