liu-2023-agentbench - Skill Dossier

Comprehensive benchmark suite for evaluating LLM agents across diverse interactive environments

Research & Academic
#benchmarks #llm-agents #evaluation #agent-testing #capabilities

Coming in Spring 2026 Beta

WinDAGs will match this skill automatically. Then ask:

"Use liu-2023-agentbench to help me build..."
"Use liu-2023-agentbench to help me build a benchmarks system"
"I need expert help with comprehensive benchmark suite for evaluating llm a..."