llm-evaluation-harness - Skill Dossier

Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human eval workflows. Activate on: LLM evaluation, benchmark testing, eval pipeline, RAGAS, model regression tests. NOT for: traditional software testing (testing-expert), model training (ai-engineer).
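As context for the description above, a minimal regression-test gate of the kind this skill automates might look like the following sketch. All names here (run_model, CASES, exact_match_score, THRESHOLD) are illustrative assumptions, not the skill's actual API:

```python
# Hypothetical minimal sketch of an LLM regression-test harness.
# A real pipeline would call a model API and use richer metrics
# (e.g. RAGAS scores) instead of exact match.

CASES = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
]
THRESHOLD = 0.9  # minimum score before the pipeline fails

def run_model(prompt: str) -> str:
    # Stand-in for a real model call; returns canned answers
    # so the sketch is self-contained and deterministic.
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned[prompt]

def exact_match_score(cases) -> float:
    # Fraction of cases where the model output matches exactly.
    hits = sum(run_model(c["prompt"]).strip() == c["expected"] for c in cases)
    return hits / len(cases)

def regression_gate(score: float, threshold: float = THRESHOLD) -> bool:
    # True when quality meets the baseline; False fails the build.
    return score >= threshold
```

In practice the gate runs in CI: a drop below the threshold on the fixed benchmark set blocks the model change, which is the regression-testing workflow the skill describes.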
AI & Machine Learning
#evaluation #benchmarks #ragas #llm-testing #regression
Allowed Tools
Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)
Coming in Spring 2026 Beta
WinDAGs will match this skill automatically. Then ask:
"Use llm-evaluation-harness to help me build..."