SUPER

Benchmark for agents that set up and run ML/NLP experiments from research GitHub repositories: 45 expert problems, 152 masked sub-problems, and 602 auto-generated tasks; Docker-based.

sandbox, python
Stars
53
Adoption surface
slightly complex
Autonomy
headless
Recovery
none
License
✅ open-source
Category
Evaluation and benchmarking harnesses

Repository ↗ Example: SUPER EMNLP paper ↗
