llm-evaluation-harness - Skill Dossier

Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human eval workflows. Activate on: LLM evaluation, benchmark testing, eval pipeline, RAGAS, model regression tests. NOT for: traditional software testing (testing-expert), model training (ai-engineer).
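As a minimal illustration of the kind of pipeline this skill builds, the sketch below shows a regression-test style eval suite: fixed prompts are run through a model and scored against expected keywords, with a pass threshold. The `run_model` stub, the `EvalCase` structure, and the keyword scorer are hypothetical stand-ins, not part of any specific library; a real harness would call an LLM API and use richer metrics (e.g. RAGAS for RAG pipelines).

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    # One regression test: a prompt plus keywords the answer must contain.
    prompt: str
    expected_keywords: list = field(default_factory=list)

def run_model(prompt: str) -> str:
    # Hypothetical stub; a real harness would call an LLM API here.
    return "Paris is the capital of France."

def keyword_score(answer: str, keywords: list) -> float:
    # Fraction of expected keywords found in the answer (case-insensitive).
    hits = sum(1 for kw in keywords if kw.lower() in answer.lower())
    return hits / len(keywords) if keywords else 0.0

def run_suite(cases, threshold=0.8):
    # Run every case and flag regressions below the pass threshold.
    results = []
    for case in cases:
        answer = run_model(case.prompt)
        score = keyword_score(answer, case.expected_keywords)
        results.append({"prompt": case.prompt,
                        "score": score,
                        "passed": score >= threshold})
    return results

cases = [EvalCase("What is the capital of France?", ["Paris"])]
report = run_suite(cases)
```

In a real pipeline, `run_suite` would be wired into CI so that a model or prompt change failing any previously passing case blocks deployment.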

AI & Machine Learning
#evaluation #benchmarks #ragas #llm-testing #regression

Allowed Tools

Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)


Coming in Spring 2026 Beta

WinDAGs will match this skill automatically. Then ask:

"Use llm-evaluation-harness to help me build..."
"Use llm-evaluation-harness to help me build an evaluation system"
"I need expert help with build automated llm evaluation pipelines with benc..."
"Orchestrate llm-evaluation-harness with ai-engineer so that evaluation validates LLM application quality before deployment"