arc-agi-benchmarking

Runner for the ARC-AGI benchmark: multi-provider support (OpenAI, Anthropic, Gemini, etc.), rate limiting, retries, and scoring.

Tags: evals, provider-agnostic, python
Stars: 350
Adoption surface: mostly simple
Autonomy: headless
Recovery: retry
License: ✅ open-source
Category: Evaluation and benchmarking harnesses

Repository ↗ Example: o3 prompt example ↗
