inspect_evals

Collection of evals for Inspect AI from UK AISI, Arcadia Impact, and the Vector Institute. Includes GAIA (levels 1–3) and other benchmarks, run with sandboxed, tool-calling solvers.

evals, sandbox
Stars: 547
Adoption surface: slightly complex
Autonomy: headless
Recovery: resumable
License: ✅ open-source
Category: Evaluation and benchmarking harnesses

Repository ↗
Example: inspect SWE-bench eval ↗
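As a rough illustration of how these evals are launched headlessly, the sketch below assembles `inspect eval` command lines for the three GAIA levels and for SWE-bench. It only builds and prints the commands; it does not run them. The helper name `build_inspect_command`, the model string `openai/gpt-4o`, and the sample limit of 5 are assumptions made for illustration and do not come from this listing. Running the printed commands would additionally assume that `inspect_ai` and `inspect_evals` are installed, that a model API key is configured, and that Docker is available for the sandbox.

```python
import shlex
from typing import List, Optional


def build_inspect_command(task: str, model: str, limit: Optional[int] = None) -> List[str]:
    """Assemble an `inspect eval` command line (hypothetical helper; nothing is executed)."""
    cmd = ["inspect", "eval", f"inspect_evals/{task}", "--model", model]
    if limit is not None:
        # --limit restricts the run to the first N samples, useful for a smoke test.
        cmd += ["--limit", str(limit)]
    return cmd


# Assumed model name; substitute whichever provider/model you actually use.
MODEL = "openai/gpt-4o"

# One command per GAIA difficulty level (1 to 3).
gaia_commands = [build_inspect_command(f"gaia_level{n}", MODEL, limit=5) for n in (1, 2, 3)]

# The SWE-bench eval referenced in the example link, without a sample limit.
swe_bench_command = build_inspect_command("swe_bench", MODEL)

for cmd in gaia_commands + [swe_bench_command]:
    print(shlex.join(cmd))
```

Keeping the command as a list of arguments, rather than a single string, means it can be passed to `subprocess.run` directly if you later decide to execute it from a script.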
