empirical-systems-evaluation - Skill Dossier

Rigorous benchmarking of multi-agent coordination systems: experiment design, statistical analysis, human evaluation protocols, and reproducible reporting. NOT FOR ML model evaluation (use llm-evaluation-harness), A/B testing for web products, survey design, or general data science.



Skills use the open SKILL.md standard — the same file works across all platforms.

Install all 463+ skills as a plugin
claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills

Claude activates empirical-systems-evaluation automatically when your task matches its description.

"Use empirical-systems-evaluation to help me build a feature system"
"I need expert help with rigorous benchmarking of multi-agent coordination ..."