empirical-systems-evaluation - Skill Dossier

Benchmark multi-agent coordination systems with experiment design, power analysis, human rating protocols, bootstrap confidence intervals, and reproducible reporting. Use for salvage latency, recovery fidelity, coordination overhead, and protocol comparison studies. NOT for ML model benchmarking, web-product A/B testing, survey design, or general-purpose data science.
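As an illustration of the bootstrap confidence intervals named above, here is a minimal sketch using only the Python standard library. It assumes two independent samples of salvage latencies, one per coordination protocol, and computes a percentile bootstrap interval for the difference in medians. The function name `bootstrap_ci`, the latency values, and the choice of the percentile method are illustrative assumptions, not this skill's documented procedure.

```python
import random
import statistics


def bootstrap_ci(a, b, stat=statistics.median, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(b) - stat(a).

    Hypothetical helper: a and b are independent samples, e.g. salvage
    latencies (ms) under protocol A and protocol B.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so reported intervals are reproducible
    point = stat(b) - stat(a)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, independently of the other.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(stat(rb) - stat(ra))
    diffs.sort()
    # Percentile method: take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return point, diffs[lo_idx], diffs[hi_idx]


if __name__ == "__main__":
    # Made-up latencies for illustration only (not measured data).
    protocol_a = [120, 135, 128, 142, 131, 125, 138, 129]
    protocol_b = [110, 118, 115, 121, 112, 119, 117, 114]
    point, lower, upper = bootstrap_ci(protocol_a, protocol_b)
    print(f"median difference: {point} ms, 95% CI [{lower}, {upper}]")
```

With these made-up samples the point estimate is -14 ms (median 116 vs. 130). The percentile method is used here only for simplicity; with small or skewed samples a bias-corrected variant may be more appropriate.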

Uncategorized

Skills use the open SKILL.md standard, so the same file works across all platforms.

Install all 551 skills as a plugin
claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills

Claude activates empirical-systems-evaluation automatically when your task matches its description.

"Use empirical-systems-evaluation to help me build a feature system"
"I need expert help with benchmark multi-agent coordination systems with ex..."