llm-evaluation-harness - Skill Dossier

Build automated LLM evaluation pipelines with benchmarks, regression tests, RAGAS, and human eval workflows. Activate on: LLM evaluation, benchmark testing, eval pipeline, RAGAS, model regression tests. NOT for: traditional software testing (testing-expert), model training (ai-engineer).
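As a minimal illustration of the kind of pipeline this skill builds, the sketch below shows a regression-test style eval suite: fixed prompts are run through a model and scored against expected keywords, with a pass threshold. The `run_model` stub, the `EvalCase` structure, and the keyword scorer are hypothetical stand-ins, not part of any specific library; a real harness would call an LLM API and use richer metrics (e.g. RAGAS for RAG pipelines).

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    # One regression test: a prompt plus keywords the answer must contain.
    prompt: str
    expected_keywords: list = field(default_factory=list)

def run_model(prompt: str) -> str:
    # Hypothetical stub; a real harness would call an LLM API here.
    return "Paris is the capital of France."

def keyword_score(answer: str, keywords: list) -> float:
    # Fraction of expected keywords found in the answer (case-insensitive).
    hits = sum(1 for kw in keywords if kw.lower() in answer.lower())
    return hits / len(keywords) if keywords else 0.0

def run_suite(cases, threshold=0.8):
    # Run every case and flag regressions below the pass threshold.
    results = []
    for case in cases:
        answer = run_model(case.prompt)
        score = keyword_score(answer, case.expected_keywords)
        results.append({"prompt": case.prompt,
                        "score": score,
                        "passed": score >= threshold})
    return results

cases = [EvalCase("What is the capital of France?", ["Paris"])]
report = run_suite(cases)
```

In a real pipeline, `run_suite` would be wired into CI so that a model or prompt change failing any previously passing case blocks deployment.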

AI & Machine Learning
#evaluation #benchmarks #ragas #llm-testing #regression

Allowed Tools

Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)


Coming in Spring 2026 Beta

WinDAGs will match this skill automatically. Then ask:

"Use llm-evaluation-harness to help me build..."
"Use llm-evaluation-harness to help me build an evaluation system"
"I need expert help with build automated llm evaluation pipelines with benc..."
"Orchestrate llm-evaluation-harness with ai-engineer so that evaluation validates LLM application quality before deployment"