Skill Quality · Part 2

$0.14 per book. 88% compression. Zero decision-critical information lost.


How to Distill a Book Into a Skill

We turned 77 books and research papers into AI agent skills — at $0.14 per book. Here's the 3-pass pipeline, the 88% compression ratio, the five ways to screw it up, and what you get when it works.


George Polya's How to Solve It is 300 pages. The skill we extracted from it is 200 lines. That's an 88% compression ratio — and the skill is more useful than the book for an AI agent, because it contains only the parts an agent can execute.

We've done this 77 times now. 77 books and research papers, distilled into production skills at $0.14 per book. The WinDAGs skill library has 463+ skills, and 90 of them trace their lineage to a specific source text with full provenance.

This post is about how to do it, how to screw it up, and what you get when it works.


Why Bother?

A book is a terrible format for an AI agent. It's optimized for human reading — narrative scaffolding, pedagogical pacing, case studies that build intuition over hundreds of pages. An agent doesn't need intuition built over time. It needs decision procedures right now.

But a book contains something an agent does need: the compiled expertise of someone who spent years thinking about a problem. Polya spent decades studying how mathematicians solve problems. Klein spent decades on firegrounds and aircraft carriers studying how experts make decisions under pressure. Gawande watched patients die because surgeons didn't use checklists.

That expertise is in the book. It's just buried under 250 pages of narrative, case studies, historical context, and pedagogical scaffolding that the agent doesn't need.

The question is: can you extract the decision-critical knowledge and throw away the rest?

Yes. Here's how.


The 3-Pass Pipeline

We process every book through three passes, each using a different model at a different cost tier. The total cost is ~$0.14 per 300-page book.

Pass 1: The Haiku Army ($0.04/book)

Split the book into ~4,000 token chunks with 500-token overlaps. Send each chunk to Claude Haiku in parallel (10 concurrent calls) with a structured extraction prompt.
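The chunking step can be sketched in a few lines. This is a minimal version that approximates tokens by whitespace splitting; the real pipeline would use the model's tokenizer:

```python
def chunk_text(text: str, chunk_size: int = 4000, overlap: int = 500) -> list[str]:
    """Split text into ~chunk_size-token chunks with overlap-token overlaps.

    Tokens are approximated by whitespace splitting; swap in the model's
    actual tokenizer for production use.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = chunk_size - overlap  # each chunk re-reads the last 500 tokens
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk then goes to Haiku independently, so the calls parallelize trivially.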

Haiku extracts 10 semantic categories from each chunk:

| Category | What It Captures | Example (from Polya) |
| --- | --- | --- |
| Key claims | Factual assertions | "Heuristic reasoning is the only reasoning available before the answer is known" |
| Processes | Step-by-step procedures | "Understand → Plan → Execute → Review" |
| Decisions | Heuristics and branch points | "If you can't solve it, find a related problem you can" |
| Failures | Anti-patterns and mistakes | "Understanding → Execution, skipping Planning entirely" |
| Aha moments | Insights and breakthroughs | "The more general problem is often easier to solve (Inventor's Paradox)" |
| Metaphors | Mental models and analogies | "Looking at the unknown is like looking at the keyhole" |
| Temporal | Evolution patterns | "Before Polya, problem-solving was considered unteachable talent" |
| Quotes | Notable direct quotes (max 3) | Kept for attribution and flavor |
| Domain terms | Specialized vocabulary | "Heuristic, auxiliary problem, stepping stone" |

Why Haiku for Pass 1? Extraction is mechanical — find claims, processes, decisions. It doesn't need reasoning. At $0.80/MTok input, Haiku processes a 300-page book for $0.04. The 500-token overlap costs an extra $0.005 but prevents splitting concepts mid-sentence.

Wall-clock time: ~3 minutes per book with 10-way parallelism. We processed 28 books in 6 hours for $1.50 total.

Pass 2: Sonnet Synthesis ($0.05/book)

Take all Pass 1 extractions (typically 50K-100K tokens for a full book) and feed them to Claude Sonnet as a single prompt. Sonnet's job: merge, deduplicate, structure.

The same concept often appears in 5-10 different chunks expressed differently. "Monster-barring" might appear in Lakatos's chapter 1 as a philosophical argument, chapter 3 as a mathematical example, and the appendix as a formal definition. Sonnet recognizes these as the same concept and merges them into one entry with the richest definition.

The output is a unified knowledge map — a JSON structure with:

  • Core concepts with relationships
  • Processes with decision points and common mistakes
  • Expertise patterns (what experts do vs. what novices do)
  • Anti-patterns with symptoms and fixes
  • Temporal evolution (how the field's understanding changed)
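A slice of such a knowledge map might look like the following. The field names here are illustrative, not the pipeline's actual schema:

```python
# Hypothetical knowledge-map slice for Polya; field names are illustrative.
knowledge_map = {
    "concepts": [
        {
            "name": "heuristic reasoning",
            "definition": "Provisional reasoning used before the answer is known",
            "related": ["auxiliary problem", "stepping stone"],
        }
    ],
    "processes": [
        {
            "name": "four-phase problem solving",
            "steps": ["Understand", "Plan", "Execute", "Review"],
            "common_mistakes": ["jumping from Understanding straight to Execution"],
        }
    ],
    "anti_patterns": [
        {
            "name": "plan skipping",
            "symptom": "Execution begins before the unknown is identified",
            "fix": "Return to the Understand phase and name the unknown",
        }
    ],
}
```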

Why Sonnet for Pass 2? This requires actual reasoning — recognizing the same idea expressed differently, building concept hierarchies, resolving contradictions. Haiku can't do this reliably. Opus could, but at 5x the cost. Sonnet is the sweet spot.

Pass 3: Skill Generation ($0.05/book)

Feed the knowledge map to Sonnet with a skill template. The output is a publication-ready SKILL.md with:

  • YAML frontmatter (name, description, category, tags)
  • When to Use / NOT for boundaries
  • Core process as decision trees
  • Anti-patterns with detection rules
  • Worked examples
  • Quality gates
  • References pointing back to source material

The critical constraint: the skill must be under 5,000 tokens. This is the progressive disclosure budget — the maximum that gets loaded into an agent's context when the skill activates. Everything else goes into references/ for on-demand loading.
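The budget gate itself is trivial to enforce. A crude version, assuming the common ~4-characters-per-token heuristic (a real check would use the actual tokenizer):

```python
def within_budget(skill_md: str, max_tokens: int = 5000) -> bool:
    """Check a draft SKILL.md against the progressive-disclosure budget.

    Uses the rough 4-characters-per-token heuristic; swap in a real
    tokenizer for production use. Anything over budget gets moved to
    references/ for on-demand loading.
    """
    estimated_tokens = len(skill_md) / 4
    return estimated_tokens <= max_tokens
```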


What Gets Kept vs. Discarded

This is where most people get it wrong. The instinct is to keep everything and compress it. That produces a worse skill than aggressively cutting.

Always Keep

| Content Type | Why | Example |
| --- | --- | --- |
| Decision trees | Agents execute branches | "If failure in argument → repair proof. If failure in claim → revise conjecture." |
| Anti-patterns with symptoms | Agents self-correct from these | "Monster-barring: redefining terms to exclude counterexamples" |
| Failure triage frameworks | This is the core expertise | "Local counterexample → fix the sub-argument. Global → abandon the claim." |
| Paradigm shifts | Context for when rules change | "Before 2024, CLIP was used for everything. After: use DETR for counting." |
| Concrete worked examples | Pattern matching is how agents work | One or two, not two hundred |

Always Discard

| Content Type | Why | Typical % of Source |
| --- | --- | --- |
| Domain-specific case studies | Not transferable | 50-70% |
| Narrative scaffolding | Agent doesn't need motivation to read | 30-40% |
| Historical context | What matters is current best practice | 20-30% |
| Deep technical notation | Decision trees, not formal proofs | 10-20% |
| Pedagogical pacing | Agent doesn't learn incrementally | 15-25% |

The numbers are dramatic. Polya's 300-page book is 50% worked math problems, 20% pedagogical narrative, and 15% historical context. The decision-critical content — the four phases, the heuristic strategies, the failure modes, the Looking Back techniques — fits in 200 lines.


Five Ways to Screw It Up

1. Keeping the case studies

The most common mistake. A book about surgical checklists has 150 pages of hospital case studies. They're compelling. They build intuition. They're useless for an agent.

An agent doesn't build intuition over time. It needs a decision tree now. The case study that took Gawande 8 pages to narrate distills to one line: "DO-CONFIRM for experts (perform from memory, then verify). READ-DO for novices (execute each step as read)."

Detection rule: If your skill is longer than 300 lines, you probably kept the case studies.

2. Preserving the author's structure

A book is organized for reading. Chapter 1 motivates. Chapter 2 provides background. Chapter 3 introduces the first concept. Chapters 4-8 build complexity. Chapter 9 applies it. Chapter 10 summarizes.

A skill is organized for execution. Decision tree first. Anti-patterns second. Worked examples third. Quality gates last. The author's chapter order is pedagogical — it teaches. The skill order is operational — it acts.

Detection rule: If your skill has a "Background" or "Introduction" section, you preserved the author's structure.

3. Extracting at the wrong granularity

Too fine: "On page 47, Polya suggests looking at the unknown." This is a quote, not a procedure.

Too coarse: "Polya's book teaches problem-solving." This is a description, not knowledge.

Right: "Before mobilizing any knowledge, identify the unknown. What kind of thing is the answer? A number? A function? A decision? A proof? This determines which heuristic strategies apply." This is executable.

Detection rule: If an agent reading your skill still doesn't know what to do, you extracted too coarse. If it knows 47 micro-actions but not the overall strategy, you extracted too fine.

4. Treating all books as equal

An 8-page research paper (Smith's Contract Net Protocol) and a 550-page textbook (Evans' Domain-Driven Design) need radically different extraction strategies. The paper has one core idea that can be extracted almost verbatim. The textbook has 50 ideas, 20 of which are noise for your use case.

Our pipeline adapts: short papers get lightweight processing (Pass 1 alone sometimes suffices). Long books get the full 3-pass treatment with aggressive filtering in Pass 2.

Detection rule: If your pipeline produces skills of similar length regardless of source length, your extraction is broken. An 8-page paper should produce a tighter skill than a 550-page book.
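One way to encode that adaptation is a simple router on source length. The thresholds and pass names here are illustrative, not the pipeline's actual values:

```python
def extraction_strategy(page_count: int) -> list[str]:
    """Pick pipeline passes based on source length (thresholds illustrative)."""
    if page_count <= 15:
        # Short paper: one core idea, Pass 1 alone often suffices.
        return ["pass1_extract"]
    if page_count <= 100:
        # Mid-length source: full 3-pass treatment.
        return ["pass1_extract", "pass2_synthesize", "pass3_generate"]
    # Long textbook: full treatment with aggressive filtering in Pass 2.
    return ["pass1_extract", "pass2_synthesize_aggressive", "pass3_generate"]
```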

5. No provenance tracking

You distill a book and ship the skill. Six months later, someone asks "where did this anti-pattern come from?" and you can't answer. The skill has no memory of its source.

Every skill in our library has a provenance.json and _book_identity.json linking it to its source text, the extraction parameters, the model versions, and the cost. The _raw_response.md preserves the full intermediate output. You can trace any claim in any skill back to a specific page in a specific book.

Detection rule: If you can't answer "which book did this decision tree come from?", your provenance is broken.
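A minimal provenance record might look like the following. The field names are hypothetical; the actual provenance.json schema may differ:

```python
import json

# Hypothetical provenance record; field names are illustrative.
provenance = {
    "source": {"title": "How to Solve It", "author": "George Polya", "pages": 300},
    "pipeline": {
        "pass1_model": "claude-haiku",
        "pass2_model": "claude-sonnet",
        "pass3_model": "claude-sonnet",
        "chunk_tokens": 4000,
        "overlap_tokens": 500,
    },
    "cost_usd": {"pass1": 0.04, "pass2": 0.05, "pass3": 0.05, "total": 0.14},
}

print(json.dumps(provenance, indent=2))
```

The point is not the schema but the habit: every skill ships with a machine-readable trail back to its source, models, and cost.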


What It Looks Like: Before and After

Lakatos → lakatos (Philosophy → Failure Triage)

The book (80 pages): A Socratic dialogue about Euler's polyhedron formula. Characters argue about whether a counterexample disproves a theorem. Deeply philosophical, dense with mathematical examples.

The skill (150 lines): A failure triage framework for AI agents. When something fails, do you:

  • Surrender — abandon the approach (global counterexample)
  • Monster-bar — redefine terms to exclude the failure (degenerate)
  • Lemma-incorporate — absorb the failure into a better approach (generative)

The 80 pages of Socratic dialogue became a 3-branch decision tree. The mathematical examples became detection rules: "If a skill responds to failure by narrowing its NOT-FOR clause, that's monster-barring." The philosophical argument became operational: track monster-barring rate as a quality metric.
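The three-branch decision tree is small enough to sketch directly. The labels follow the skill's branches; the encoding is illustrative:

```python
def triage_failure(kind: str) -> str:
    """Three-branch failure triage distilled from Lakatos (sketch).

    'global'     -> the claim itself is wrong: surrender.
    'degenerate' -> the counterexample is a monster: barring it is legal
                    but should be tracked as a quality metric.
    'generative' -> the failure teaches something: lemma-incorporate it.
    """
    branches = {
        "global": "surrender: abandon the approach",
        "degenerate": "monster-bar: redefine terms to exclude the failure",
        "generative": "lemma-incorporate: absorb the failure into a better approach",
    }
    if kind not in branches:
        raise ValueError(f"unknown failure kind: {kind}")
    return branches[kind]
```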

Evans (2003) → ddd-evans-2003 (Textbook → Strategic Decisions)

The book (550 pages): The definitive text on Domain-Driven Design. Half is tactical patterns (entities, value objects, repositories). Half is strategic (bounded contexts, ubiquitous language, context mapping).

The skill (200 lines): Focuses almost entirely on strategic decisions — where to invest modeling depth, how to draw context boundaries, when to separate vs. share models. The tactical patterns (which are the book's most famous content) were mostly discarded because they're too domain-specific and Claude already knows them from training data.

This is the hardest decision in distillation: cutting what the book is famous for because an LLM already knows it. Evans' entity/value-object distinction is in every programming tutorial on the internet. What's NOT in training data is his strategic reasoning about when DDD is worth the cost and where to draw boundaries. That's what the skill keeps.

Klein (1998) → klein-1998-sources-of-power (Field Research → Recognition Pattern)

The book (330 pages): Twenty years of field studies on fireground commanders, ICU nurses, and chess players. Rich with stories of how experts make split-second decisions.

The skill (180 lines): The Recognition-Primed Decision model: experts don't compare options — they recognize the situation, mentally simulate the first response that comes to mind, and act if no fatal flaw appears. The 330 pages of field studies became one decision loop and five detection rules for when an agent's situation assessment is likely wrong.
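That decision loop can be sketched in a few lines. The function names and the flaw check are illustrative, not the skill's actual interface:

```python
def rpd_decide(situation, recognize, simulate):
    """Recognition-primed decision loop (sketch).

    recognize(situation) -> candidate responses, most typical first.
    simulate(response)   -> a fatal-flaw description, or None if it holds up.
    Experts don't compare options side by side: they take the first
    recognized response that survives mental simulation.
    """
    for response in recognize(situation):
        flaw = simulate(response)
        if flaw is None:
            return response  # act on the first workable option
    return None  # nothing survives simulation: reassess the situation
```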


The Economics

| Metric | Value |
| --- | --- |
| Books/papers processed | 77 |
| Average cost per book | $0.14 |
| Total pipeline cost | ~$10.78 |
| Human review (7 core skills) | ~$350 |
| Total project cost | ~$361 |
| Equivalent expert analysis | ~$38,500 (77 books × $500/book) |
| Cost reduction | 99% |

The $0.14 breaks down as:

  • Pass 1 (Haiku extraction): $0.04
  • Pass 2 (Sonnet synthesis): $0.05
  • Pass 3 (Sonnet skill draft): $0.05

A 300-page book processes in ~5 minutes wall-clock time with parallel Haiku calls. We processed 28 books in a single 6-hour session.

The expensive part isn't the algorithm — it's the 7 skills that needed human review. The meta-skills (the ones that operate on the orchestration system itself, not within tasks) required a human to decide which concepts from the book apply to DAG orchestration. This can't be fully automated because it requires understanding the system being built, not just the source material.


What You Get When It Works

A skill distilled from a book gives an agent something it can't get from training data: structured procedural knowledge from a specific expert's lifetime of work, organized for execution rather than comprehension.

Training data teaches Claude what Domain-Driven Design is. The ddd-evans-2003 skill teaches it when to apply DDD and when not to — the strategic reasoning that Evans spent 550 pages building up to, compressed into a decision tree with quality gates.

Training data teaches Claude that checklists exist. The checklist-discipline skill teaches it how to design a checklist — Gawande's distinction between DO-CONFIRM and READ-DO, the killer-items heuristic (5-9 per pause point), the anti-pattern of paperwork compliance theater.

Training data teaches Claude about Recognition-Primed Decision Making. The rpd-klein-calderwood skill teaches it how to classify situations and detect when its classification is wrong — the specific failure modes Klein identified after 20 years of field research.

The difference is between knowing about something and knowing how to do it. A 300-page book contains both. A 200-line skill contains only the latter.


Try It Yourself

The distillation pipeline, all 463+ skills, and the full provenance chain are available:

claude plugin marketplace add curiositech/windags-skills
claude plugin install windags-skills

Every skill with a _book_identity.json traces back to its source. Every _raw_response.md shows the full extraction. Every provenance.json records the cost, model, and parameters.

The corpus methodology is documented in corpus/METHODOLOGY.md (26,000 tokens — the longest document in the project, and the one that took the most human thought to write).


77 books. $0.14 each. 88% compression. Zero decision-critical information lost. The experts spent decades learning this. The algorithm reads it in 5 minutes. The skill makes it available to every agent, forever.