empirical-systems-evaluation - Skill Dossier

Rigorous benchmarking of multi-agent coordination systems: experiment design, statistical analysis, human evaluation protocols, and reproducible reporting. NOT FOR ML model evaluation (use llm-evaluation-harness), A/B testing for web products, survey design, or general data science.
Skills use the open SKILL.md standard — the same file works across all platforms.
Install all 463+ skills as a plugin:
claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills
Claude activates empirical-systems-evaluation automatically when your task matches its description.
