llm-evaluation-harness - Skill Dossier

Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human eval workflows. Activate on: LLM evaluation, benchmark testing, eval pipeline, RAGAS, model regression tests. NOT for: traditional software testing (testing-expert), model training (ai-engineer).
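The regression-test part of such a pipeline can be sketched in a few lines: score a model function against a golden dataset and fail the run if the pass rate drops below a threshold. This is a minimal illustration with hypothetical names (`EvalCase`, `run_regression`), not the skill's actual implementation; a real harness would add richer metrics (RAGAS, LLM-as-judge) and reporting.

```python
# Minimal sketch of a regression-test eval step (hypothetical names;
# the skill itself may structure this differently).
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected: str  # golden answer used for exact-match scoring

def exact_match(prediction: str, expected: str) -> float:
    # Normalize whitespace and case before comparing.
    return 1.0 if prediction.strip().lower() == expected.strip().lower() else 0.0

def run_regression(cases, model_fn, threshold=0.9):
    """Score model_fn on golden cases; flag failure if pass rate < threshold."""
    scores = [exact_match(model_fn(c.prompt), c.expected) for c in cases]
    pass_rate = sum(scores) / len(scores)
    return {"pass_rate": pass_rate, "passed": pass_rate >= threshold}

# Usage with a stubbed model standing in for a real LLM call:
cases = [EvalCase("2+2?", "4"), EvalCase("Capital of France?", "Paris")]
result = run_regression(
    cases, lambda p: {"2+2?": "4", "Capital of France?": "paris"}[p]
)
```

Swapping `exact_match` for a semantic or RAGAS-based scorer keeps the same harness shape while changing only the metric.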


Allowed Tools

Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)


Skills use the open SKILL.md standard — the same file works across all platforms.
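For context, a SKILL.md file is a markdown document with YAML frontmatter that declares the skill's name, its activation description, and optionally its allowed tools. The sketch below is an assumption about how this particular skill's file might look, reconstructed from the description and tool list on this page; the exact frontmatter fields beyond `name` and `description` may vary by platform.

```markdown
---
name: llm-evaluation-harness
description: Build automated LLM evaluation pipelines with benchmarks,
  regression tests, RAGAS, and human eval workflows.
allowed-tools: Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)
---

# llm-evaluation-harness

Instructions the agent follows when the skill activates go here.
```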

Install all 544 skills as a plugin:

claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills

Claude activates llm-evaluation-harness automatically when your task matches its description.

"Use llm-evaluation-harness to help me build a feature system"
"I need expert help with build automated llm evaluation pipelines with benc..."