VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
VitaBench is a challenging benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings. It comprises 66 tools and 400 tasks.
vitabench.github.io