empirical-systems-evaluation - Skill Dossier

Benchmark multi-agent coordination systems with experiment design, power analysis, human rating protocols, bootstrap confidence intervals, and reproducible reporting. Use for salvage latency, recovery fidelity, coordination overhead, and protocol comparison studies. NOT for ML model benchmarking, web-product A/B testing, survey design, or general-purpose data science.
Uncategorized
Skills use the open SKILL.md standard — the same file works across all platforms.
Install all 551 skills as a plugin:
claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills
Claude activates empirical-systems-evaluation automatically when your task matches its description.
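
As an illustration of one technique named in the description, the following is a minimal sketch of a percentile bootstrap confidence interval, applied to salvage-latency measurements. It is not taken from the skill itself: the function name `bootstrap_ci`, its parameters, and the sample latencies are hypothetical, and the plain percentile method is only one of several bootstrap variants.

```python
import random
import statistics


def bootstrap_ci(samples, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat` over `samples` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the report is reproducible
    n = len(samples)
    # Resample with replacement n_resamples times and record the statistic.
    estimates = sorted(
        stat([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap estimates.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Made-up salvage latencies in seconds, for illustration only.
latencies = [1.2, 0.9, 1.5, 1.1, 2.3, 1.0, 1.4, 0.8, 1.7, 1.3]
low, high = bootstrap_ci(latencies)
print(f"mean = {statistics.mean(latencies):.2f} s, 95% CI = [{low:.2f}, {high:.2f}] s")
```

Fixing the seed keeps the interval identical across runs, which fits the reproducible-reporting goal. With small samples like this one, the percentile interval can be too narrow, so treat it as a starting point.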
