WebArena

Realistic web environment (e.g., e-commerce, CMS, and developer-tool sites) with 812 tasks; measures the end-to-end task success of web agents.
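A headline "success" number on a benchmark like this is typically a task-level success rate. The sketch below assumes that each run has already been reduced to a pass/fail flag by the benchmark's evaluator. It divides by the full task count (812 by default), so tasks that crashed or were never run count as failures. `TaskResult` and `success_rate` are hypothetical names for illustration and are not part of the WebArena codebase.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record of one episode: which task ran and whether it passed."""
    task_id: int
    success: bool


def success_rate(results: list[TaskResult], total_tasks: int = 812) -> float:
    """Fraction of all benchmark tasks the agent completed end to end.

    Tasks absent from `results` count as failures because the denominator is
    the full task count. Repeated runs of the same task are counted once.
    """
    if total_tasks <= 0:
        raise ValueError("total_tasks must be positive")
    passed = {r.task_id for r in results if r.success}
    return len(passed) / total_tasks


if __name__ == "__main__":
    runs = [TaskResult(0, True), TaskResult(1, False), TaskResult(2, True)]
    print(f"{success_rate(runs, total_tasks=4):.2%}")  # 2 of 4 tasks -> 50.00%
```

Using the fixed task count rather than the number of logged runs keeps partial or interrupted evaluations from inflating the score.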

Language: Python
Stars: 1.5k
Adoption surface: complex
Autonomy: headless
Recovery: none
License: open-source
Category: Evaluation and benchmarking harnesses

Links: Repository; example: WebArena leaderboard
