WebVoyager

End-to-end web agent built on large multimodal models (LMMs): it takes screenshots as input and performs actions on real websites. Includes a benchmark covering 15 websites and uses GPT-4V for automatic evaluation.

evals, vision
Stars: 1.1k
Adoption surface: slightly complex
Autonomy: headless
Recovery: none
License: ✅ open-source
Category: Evaluation and benchmarking harnesses

Repository
Example: 643 web tasks dataset
