llm-evaluation-harness - Skill Dossier
Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human evaluation workflows. Activates on: LLM evaluation, benchmark testing, eval pipelines, RAGAS, model regression tests. NOT for: traditional software testing (testing-expert) or model training (ai-engineer).
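As a minimal sketch of the regression-test style of pipeline this skill builds, the snippet below runs a fixed set of eval cases against a model callable and gates on an accuracy threshold. The `predict_fn` stub, `EvalCase` type, and exact-match metric are illustrative assumptions, not part of the skill itself; a real harness would swap in an actual model call and richer metrics (e.g. RAGAS scores).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class EvalCase:
    """One prompt/expected-answer pair in the regression suite."""
    prompt: str
    expected: str

def exact_match(output: str, expected: str) -> bool:
    # Simplest possible metric; real harnesses add semantic/RAGAS metrics.
    return output.strip().lower() == expected.strip().lower()

def run_eval(
    cases: List[EvalCase],
    predict_fn: Callable[[str], str],
    threshold: float = 0.9,
) -> Tuple[bool, float]:
    """Run every case through the model; return (passed, accuracy)."""
    hits = sum(exact_match(predict_fn(c.prompt), c.expected) for c in cases)
    accuracy = hits / len(cases)
    return accuracy >= threshold, accuracy

if __name__ == "__main__":
    # Stub dictionary stands in for a real LLM call (an assumption here).
    cases = [EvalCase("2+2=", "4"), EvalCase("capital of France?", "Paris")]
    stub = {"2+2=": "4", "capital of France?": "Paris"}
    passed, acc = run_eval(cases, lambda p: stub[p])
    print(passed, acc)  # True 1.0
```

In CI, a harness like this fails the build when accuracy drops below the threshold, which is what turns ad-hoc evals into regression tests.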
Category: Uncategorized
Allowed Tools
Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)
Skills use the open SKILL.md standard — the same file works across all platforms.
Install all 544 skills as a plugin:
claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills
Claude activates llm-evaluation-harness automatically when your task matches its description.
