Skill Quality · Part 2

$0.14 per book. 88% compression. Zero decision-critical information lost.


How to Distill a Book Into a Skill

We turned 77 books and research papers into AI agent skills — at $0.14 per book. Here's the 3-pass pipeline, the 88% compression ratio, the five ways to screw it up, and what you get when it works.


George Polya's How to Solve It is 300 pages. The skill we extracted from it is 200 lines. That's an 88% compression ratio — and the skill is more useful than the book for an AI agent, because it contains only the parts an agent can execute.

We've done this 77 times now. 77 books and research papers, distilled into production skills at $0.14 per book. The WinDAGs skill library has 463+ skills, and 90 of them trace their lineage to a specific source text with full provenance.

This post is about how to do it, how to screw it up, and what you get when it works.


Why Bother?

A book is a terrible format for an AI agent. It's optimized for human reading — narrative scaffolding, pedagogical pacing, case studies that build intuition over hundreds of pages. An agent doesn't need intuition built over time. It needs decision procedures right now.

But a book contains something an agent does need: the compiled expertise of someone who spent years thinking about a problem. Polya spent decades studying how mathematicians solve problems. Klein spent decades on firegrounds and aircraft carriers studying how experts make decisions under pressure. Gawande watched patients die because surgeons didn't use checklists.

That expertise is in the book. It's just buried under 250 pages of narrative, case studies, historical context, and pedagogical scaffolding that the agent doesn't need.

The question is: can you extract the decision-critical knowledge and throw away the rest?

Yes. Here's how.


The 3-Pass Pipeline

We process every book through three passes, each using a different model at a different cost tier. The total cost is ~$0.14 per 300-page book.

Pass 1: The Haiku Army ($0.04/book)

Split the book into ~4,000 token chunks with 500-token overlaps. Send each chunk to Claude Haiku in parallel (10 concurrent calls) with a structured extraction prompt.
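The chunking step can be sketched in a few lines. This is a minimal version that approximates tokens by whitespace splitting; the real pipeline would use the model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 500) -> list[str]:
    """Split text into ~chunk_size-token chunks with overlap-token overlaps.

    Tokens are approximated by whitespace splitting; swap in the model's
    actual tokenizer for production use.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = chunk_size - overlap  # each chunk re-reads the last 500 tokens
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk then goes to Haiku independently, so the calls parallelize trivially.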

Haiku extracts 10 semantic categories from each chunk:

| Category | What It Captures | Example (from Polya) |
| --- | --- | --- |
| Key claims | Factual assertions | "Heuristic reasoning is the only reasoning available before the answer is known" |
| Processes | Step-by-step procedures | "Understand → Plan → Execute → Review" |
| Decisions | Heuristics and branch points | "If you can't solve it, find a related problem you can" |
| Failures | Anti-patterns and mistakes | "Understanding → Execution, skipping Planning entirely" |
| Aha moments | Insights and breakthroughs | "The more general problem is often easier to solve (Inventor's Paradox)" |
| Metaphors | Mental models and analogies | "Looking at the unknown is like looking at the keyhole" |
| Temporal | Evolution patterns | "Before Polya, problem-solving was considered unteachable talent" |
| Quotes | Notable direct quotes (max 3) | Kept for attribution and flavor |
| Domain terms | Specialized vocabulary | "Heuristic, auxiliary problem, stepping stone" |

Why Haiku for Pass 1? Extraction is mechanical — find claims, processes, decisions. It doesn't need reasoning. At $0.80/MTok input, Haiku processes a 300-page book for $0.04. The 500-token overlap costs an extra $0.005 but prevents splitting concepts mid-sentence.

Wall-clock time: ~3 minutes per book with 10-way parallelism. We processed 28 books in 6 hours for $1.50 total.

Pass 2: Sonnet Synthesis ($0.05/book)

Take all Pass 1 extractions (typically 50K-100K tokens for a full book) and feed them to Claude Sonnet as a single prompt. Sonnet's job: merge, deduplicate, structure.

The same concept often appears in 5-10 different chunks expressed differently. "Monster-barring" might appear in Lakatos's chapter 1 as a philosophical argument, chapter 3 as a mathematical example, and the appendix as a formal definition. Sonnet recognizes these as the same concept and merges them into one entry with the richest definition.

The output is a unified knowledge map — a JSON structure with:

  • Core concepts with relationships
  • Processes with decision points and common mistakes
  • Expertise patterns (what experts do vs. what novices do)
  • Anti-patterns with symptoms and fixes
  • Temporal evolution (how the field's understanding changed)
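A slice of such a knowledge map might look like the following. The field names here are illustrative, not the pipeline's actual schema:

```python
# Hypothetical knowledge-map slice for Polya; field names are illustrative.
knowledge_map = {
    "concepts": [
        {
            "name": "heuristic reasoning",
            "definition": "Provisional reasoning used before the answer is known",
            "related": ["auxiliary problem", "stepping stone"],
        }
    ],
    "processes": [
        {
            "name": "four-phase problem solving",
            "steps": ["Understand", "Plan", "Execute", "Review"],
            "common_mistakes": ["jumping from Understanding straight to Execution"],
        }
    ],
    "anti_patterns": [
        {
            "name": "plan skipping",
            "symptom": "Execution begins before the unknown is identified",
            "fix": "Return to the Understand phase and name the unknown",
        }
    ],
}
```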

Why Sonnet for Pass 2? This requires actual reasoning — recognizing the same idea expressed differently, building concept hierarchies, resolving contradictions. Haiku can't do this reliably. Opus could, but at 5x the cost. Sonnet is the sweet spot.

Pass 3: Skill Generation ($0.05/book)

Feed the knowledge map to Sonnet with a skill template. The output is a publication-ready SKILL.md with:

  • YAML frontmatter (name, description, category, tags)
  • When to Use / NOT for boundaries
  • Core process as decision trees
  • Anti-patterns with detection rules
  • Worked examples
  • Quality gates
  • References pointing back to source material

The critical constraint: the skill must be under 5,000 tokens. This is the progressive disclosure budget — the maximum that gets loaded into an agent's context when the skill activates. Everything else goes into references/ for on-demand loading.
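The budget gate itself is trivial to enforce. A crude version, assuming the common ~4-characters-per-token heuristic (a real check would use the actual tokenizer):

```python
def within_budget(skill_md: str, max_tokens: int = 5000) -> bool:
    """Check a draft SKILL.md against the progressive-disclosure budget.

    Uses the rough 4-characters-per-token heuristic; swap in a real
    tokenizer for production use. Anything over budget gets moved to
    references/ for on-demand loading.
    """
    estimated_tokens = len(skill_md) / 4
    return estimated_tokens <= max_tokens
```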


What Gets Kept vs. Discarded

This is where most people get it wrong. The instinct is to keep everything and compress it. That produces a worse skill than aggressively cutting.

Always Keep

| Content Type | Why | Example |
| --- | --- | --- |
| Decision trees | Agents execute branches | "If failure in argument → repair proof. If failure in claim → revise conjecture." |
| Anti-patterns with symptoms | Agents self-correct from these | "Monster-barring: redefining terms to exclude counterexamples" |
| Failure triage frameworks | This is the core expertise | "Local counterexample → fix the sub-argument. Global → abandon the claim." |
| Paradigm shifts | Context for when rules change | "Before 2024, CLIP was used for everything. After: use DETR for counting." |
| Concrete worked examples | Pattern matching is how agents work | One or two, not two hundred |

Always Discard

| Content Type | Why | Typical % of Source |
| --- | --- | --- |
| Domain-specific case studies | Not transferable | 50-70% |
| Narrative scaffolding | Agent doesn't need motivation to read | 30-40% |
| Historical context | What matters is current best practice | 20-30% |
| Deep technical notation | Decision trees, not formal proofs | 10-20% |
| Pedagogical pacing | Agent doesn't learn incrementally | 15-25% |

The numbers are dramatic. Polya's 300-page book is 50% worked math problems, 20% pedagogical narrative, and 15% historical context. The decision-critical content — the four phases, the heuristic strategies, the failure modes, the Looking Back techniques — fits in 200 lines.


Five Ways to Screw It Up

1. Keeping the case studies

The most common mistake. A book about surgical checklists has 150 pages of hospital case studies. They're compelling. They build intuition. They're useless for an agent.

An agent doesn't build intuition over time. It needs a decision tree now. The case study that took Gawande 8 pages to narrate distills to one line: "DO-CONFIRM for experts (perform from memory, then verify). READ-DO for novices (execute each step as read)."

Detection rule: If your skill is longer than 300 lines, you probably kept the case studies.

2. Preserving the author's structure

A book is organized for reading. Chapter 1 motivates. Chapter 2 provides background. Chapter 3 introduces the first concept. Chapters 4-8 build complexity. Chapter 9 applies it. Chapter 10 summarizes.

A skill is organized for execution. Decision tree first. Anti-patterns second. Worked examples third. Quality gates last. The author's chapter order is pedagogical — it teaches. The skill order is operational — it acts.

Detection rule: If your skill has a "Background" or "Introduction" section, you preserved the author's structure.

3. Extracting at the wrong granularity

Too fine: "On page 47, Polya suggests looking at the unknown." This is a quote, not a procedure.

Too coarse: "Polya's book teaches problem-solving." This is a description, not knowledge.

Right: "Before mobilizing any knowledge, identify the unknown. What kind of thing is the answer? A number? A function? A decision? A proof? This determines which heuristic strategies apply." This is executable.

Detection rule: If an agent reading your skill still doesn't know what to do, you extracted too coarse. If it knows 47 micro-actions but not the overall strategy, you extracted too fine.

4. Treating all books as equal

An 8-page research paper (Smith's Contract Net Protocol) and a 550-page textbook (Evans' Domain-Driven Design) need radically different extraction strategies. The paper has one core idea that can be extracted almost verbatim. The textbook has 50 ideas, 20 of which are noise for your use case.

Our pipeline adapts: short papers get lightweight processing (Pass 1 alone sometimes suffices). Long books get the full 3-pass treatment with aggressive filtering in Pass 2.

Detection rule: If your pipeline produces skills of similar length regardless of source length, your extraction is broken. An 8-page paper should produce a tighter skill than a 550-page book.
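One way to encode that adaptation is a simple router on source length. The thresholds and pass names here are illustrative, not the pipeline's actual values:

```python
def extraction_strategy(page_count: int) -> list[str]:
    """Pick pipeline passes based on source length (thresholds illustrative)."""
    if page_count <= 15:
        # Short paper: one core idea, Pass 1 alone often suffices.
        return ["pass1_extract"]
    if page_count <= 100:
        # Mid-length source: full 3-pass treatment.
        return ["pass1_extract", "pass2_synthesize", "pass3_generate"]
    # Long textbook: full treatment with aggressive filtering in Pass 2.
    return ["pass1_extract", "pass2_synthesize_aggressive", "pass3_generate"]
```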

5. No provenance tracking

You distill a book and ship the skill. Six months later, someone asks "where did this anti-pattern come from?" and you can't answer. The skill has no memory of its source.

Every skill in our library has a provenance.json and _book_identity.json linking it to its source text, the extraction parameters, the model versions, and the cost. The _raw_response.md preserves the full intermediate output. You can trace any claim in any skill back to a specific page in a specific book.

Detection rule: If you can't answer "which book did this decision tree come from?", your provenance is broken.
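A minimal provenance record might look like the following. The field names are hypothetical; the actual provenance.json schema may differ:

```python
import json

# Hypothetical provenance record; field names are illustrative.
provenance = {
    "source": {"title": "How to Solve It", "author": "George Polya", "pages": 300},
    "pipeline": {
        "pass1_model": "claude-haiku",
        "pass2_model": "claude-sonnet",
        "pass3_model": "claude-sonnet",
        "chunk_tokens": 4000,
        "overlap_tokens": 500,
    },
    "cost_usd": {"pass1": 0.04, "pass2": 0.05, "pass3": 0.05, "total": 0.14},
}

print(json.dumps(provenance, indent=2))
```

The point is not the schema but the habit: every skill ships with a machine-readable trail back to its source, models, and cost.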


What It Looks Like: Before and After

Lakatos → lakatos (Philosophy → Failure Triage)

The book (80 pages): A Socratic dialogue about Euler's polyhedron formula. Characters argue about whether a counterexample disproves a theorem. Deeply philosophical, dense with mathematical examples.

The skill (150 lines): A failure triage framework for AI agents. When something fails, do you:

  • Surrender — abandon the approach (global counterexample)
  • Monster-bar — redefine terms to exclude the failure (degenerate)
  • Lemma-incorporate — absorb the failure into a better approach (generative)

The 80 pages of Socratic dialogue became a 3-branch decision tree. The mathematical examples became detection rules: "If a skill responds to failure by narrowing its NOT-FOR clause, that's monster-barring." The philosophical argument became operational: track monster-barring rate as a quality metric.
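The three-branch decision tree is small enough to sketch directly. The labels follow the skill's branches; the encoding is illustrative:

```python
def triage_failure(kind: str) -> str:
    """Three-branch failure triage distilled from Lakatos (sketch).

    'global'     -> the claim itself is wrong: surrender.
    'degenerate' -> the counterexample is a monster: barring it is legal
                    but should be tracked as a quality metric.
    'generative' -> the failure teaches something: lemma-incorporate it.
    """
    branches = {
        "global": "surrender: abandon the approach",
        "degenerate": "monster-bar: redefine terms to exclude the failure",
        "generative": "lemma-incorporate: absorb the failure into a better approach",
    }
    if kind not in branches:
        raise ValueError(f"unknown failure kind: {kind}")
    return branches[kind]
```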

Evans (2003) → ddd-evans-2003 (Textbook → Strategic Decisions)

The book (550 pages): The definitive text on Domain-Driven Design. Half is tactical patterns (entities, value objects, repositories). Half is strategic (bounded contexts, ubiquitous language, context mapping).

The skill (200 lines): Focuses almost entirely on strategic decisions — where to invest modeling depth, how to draw context boundaries, when to separate vs. share models. The tactical patterns (which are the book's most famous content) were mostly discarded because they're too domain-specific and Claude already knows them from training data.

This is the hardest decision in distillation: cutting what the book is famous for because an LLM already knows it. Evans' entity/value-object distinction is in every programming tutorial on the internet. What's NOT in training data is his strategic reasoning about when DDD is worth the cost and where to draw boundaries. That's what the skill keeps.

Klein (1998) → klein-1998-sources-of-power (Field Research → Recognition Pattern)

The book (330 pages): Twenty years of field studies on fireground commanders, ICU nurses, and chess players. Rich with stories of how experts make split-second decisions.

The skill (180 lines): The Recognition-Primed Decision model: experts don't compare options — they recognize the situation, mentally simulate the first response that comes to mind, and act if no fatal flaw appears. The 330 pages of field studies became one decision loop and five detection rules for when an agent's situation assessment is likely wrong.
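That decision loop can be sketched in a few lines. The function names and the flaw check are illustrative, not the skill's actual interface:

```python
def rpd_decide(situation, recognize, simulate):
    """Recognition-primed decision loop (sketch).

    recognize(situation) -> candidate responses, most typical first.
    simulate(response)   -> a fatal-flaw description, or None if it holds up.
    Experts don't compare options side by side: they take the first
    recognized response that survives mental simulation.
    """
    for response in recognize(situation):
        flaw = simulate(response)
        if flaw is None:
            return response  # act on the first workable option
    return None  # nothing survives simulation: reassess the situation
```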


The Economics

| Metric | Value |
| --- | --- |
| Books/papers processed | 77 |
| Average cost per book | $0.14 |
| Total pipeline cost | ~$10.78 |
| Human review (7 core skills) | ~$350 |
| Total project cost | ~$361 |
| Equivalent expert analysis | ~$38,500 (77 books × $500/book) |
| Cost reduction | 99% |

The $0.14 breaks down as:

  • Pass 1 (Haiku extraction): $0.04
  • Pass 2 (Sonnet synthesis): $0.05
  • Pass 3 (Sonnet skill draft): $0.05

A 300-page book processes in ~5 minutes wall-clock time with parallel Haiku calls. We processed 28 books in a single 6-hour session.

The expensive part isn't the algorithm — it's the 7 skills that needed human review. The meta-skills (the ones that operate on the orchestration system itself, not within tasks) required a human to decide which concepts from the book apply to DAG orchestration. This can't be fully automated because it requires understanding the system being built, not just the source material.


What You Get When It Works

A skill distilled from a book gives an agent something it can't get from training data: structured procedural knowledge from a specific expert's lifetime of work, organized for execution rather than comprehension.

Training data teaches Claude what Domain-Driven Design is. The ddd-evans-2003 skill teaches it when to apply DDD and when not to — the strategic reasoning that Evans spent 550 pages building up to, compressed into a decision tree with quality gates.

Training data teaches Claude that checklists exist. The checklist-discipline skill teaches it how to design a checklist — Gawande's distinction between DO-CONFIRM and READ-DO, the killer-items heuristic (5-9 per pause point), the anti-pattern of paperwork compliance theater.

Training data teaches Claude about Recognition-Primed Decision Making. The rpd-klein-calderwood skill teaches it how to classify situations and detect when its classification is wrong — the specific failure modes Klein identified after 20 years of field research.

The difference is between knowing about something and knowing how to do it. A 300-page book contains both. A 200-line skill contains only the latter.


Try It Yourself

The distillation pipeline, all 463+ skills, and the full provenance chain are available:

claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills

Every skill with a _book_identity.json traces back to its source. Every _raw_response.md shows the full extraction. Every provenance.json records the cost, model, and parameters.

The corpus methodology is documented in corpus/METHODOLOGY.md (26,000 tokens — the longest document in the project, and the one that took the most human thought to write).


77 books. $0.14 each. 88% compression. Zero decision-critical information lost. The experts spent decades learning this. The algorithm reads it in 5 minutes. The skill makes it available to every agent, forever.