
Books teach what. People know when. We built tools for both.
The Reasoning Books Don't Have
Every team has the senior engineer who 'just knows.' That instinct doesn't fit in a doc — books can't write it, Wikipedia can't index it, and AI doesn't ship with it. We built two tools this weekend: one to pull it out of the people who have it, one to generate it when no one does.
Every engineering team has at least one of these people. The senior who glances at a graph and says "it's not the database, it's head-of-line blocking on the upstream queue." The PM who reads a feature spec in 20 seconds and points at the one constraint everyone missed. The designer who looks at a layout you've stared at for a week and says "the eyeline is wrong."
How do they do it? Years of pattern-matching, expensive mistakes, and a thousand cases their memory quietly indexed. Can they write it down? Not really. Try and you'll get a runbook that says things like "investigate carefully" or "trust your gut." Useless to anyone who isn't already them.
That's the part of expertise this post is about — the judgment part. The part that doesn't fit in a doc, doesn't show up in Wikipedia, and doesn't make it into the training data for AI models. We've been building AI agents at Curiositech for a year, and the single most valuable thing we can give an agent isn't more facts. It's structured judgment: if you see X and Y but not Z, do A — unless W, then B. That kind of conditional, anchored, falsifiable reasoning.
We've gotten pretty good at extracting the facts part. The Curiositech library has 90+ files — we call them skills, which is the standard term, lower-case — distilled from books and papers. Each skill is a ~2,000-word structured document that an AI agent loads when it's working on a relevant task. Polya's How to Solve It became a 200-line problem-solving skill. Klein's Sources of Power became a 180-line decision-making skill. (Previous post on the distillation pipeline.)
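For a sense of the shape, here is a minimal sketch of one of those files. The frontmatter fields follow the standard SKILL.md format; the skill name, headings, and bullet content are illustrative, not the actual Klein distillation:

```markdown
---
name: recognition-primed-decisions
description: Making good calls under time pressure, distilled from Klein's Sources of Power. Load when a decision has to be made before all the evidence is in.
---

## Core moves
- Match the situation to a known pattern before enumerating options.
- Mentally simulate the first workable option; only reach for a second if the simulation breaks.

## Anti-patterns
- Exhaustive option comparison when the clock is the real constraint.
```

The description line is what lets an agent decide when to pull the rest of the file into context.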
But here's the thing: a book is mostly what. The when and the why I trust this judgment in this situation — that's not in the book. The author can't write it down. They know it, but they can only describe it in stories, and our distillation pipeline can't reliably extract conditional reasoning from anecdote.
So we have two problems, and they need two different tools.
I built one tool for each this weekend. They live in ~/.claude/skills/ and each ships with a companion agent — meaning Claude can run them on demand, interactively, in your terminal. The two sections below cover one tool each: what it does, how it works, and what it refuses to let you do. Read them in either order.
When you've got the person, but not the words
You sit down with the senior engineer. You ask "how do you handle distributed-systems incidents?" They give you the runbook answer they'd give a journalist. You learn nothing new.
This is the universal failure mode of expert interviewing. People are bad at describing their own reasoning when asked abstractly — they reach for the polished version, the one that sounds smart, the one that's been worn smooth by retelling. The actual reasoning shows up only when you anchor them in a specific case and walk through it with structure.
The protocol that works is called the Critical Decision Method, or CDM. It was invented in 1989 by cognitive scientist Gary Klein, who was studying how fireground commanders make split-second calls at burning buildings. ("How did you know to pull your crew out 30 seconds before the floor collapsed?" "I just knew." That's the problem.) Klein figured out that if you pick one specific case and walk through it four times with different questions each pass, the expert discovers their own reasoning in real-time — usually surprising themselves.
The diagnostic that a CDM session is working is the expert saying "huh, I've never thought about it that way." That's the moment the scaffolding worked.
The agent I built runs Klein's four-pass protocol interactively. It refuses to start until you pick a specific incident with a date, place, and stakes. It builds a timeline before probing any decisions. It treats "I just knew" as the start of a five-step recovery sequence, not the end of an answer. And it ends every session with a closing probe — "is there anything about this you've never told anyone, or assume everyone knows but might not?" — that consistently produces surprises in the last ten percent of session time.
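For orientation, the four passes are the standard CDM sweeps. A condensed sketch of what each one covers, summarizing the method itself rather than the shipped four-sweeps.md, which goes into more detail:

```markdown
1. First telling: the expert recounts the chosen incident once, uninterrupted.
2. Timeline: build and verify the chronology; mark decision points without probing them yet.
3. Deepening: revisit each decision point for cues noticed, expectations, goals, and options considered.
4. What-ifs: expertise-contrast and hypothetical probes ("what would a novice have missed?", "what if X had been different?").
```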
What's in the bundle
    ~/.claude/skills/cdm-interviewer/
      SKILL.md                protocol + mental models
      references/
        four-sweeps.md        sweep-by-sweep procedure
        probe-library.md      question families with success signals
        case-selection.md     how to ground in a specific case
        output-format.md      artifact template + heuristic form
        anti-patterns.md      failure modes + recovery

    ~/.claude/agents/cdm-interviewer.md   interactive session driver
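The last file in that listing, the agent, is the interactive driver. A minimal sketch of its shape; the frontmatter follows the standard Claude Code agent-file format, and the body here is a paraphrase of the behavior described above, not the shipped prompt:

```markdown
---
name: cdm-interviewer
description: Runs a four-pass Critical Decision Method interview grounded in one specific past incident.
---

Read ~/.claude/skills/cdm-interviewer/SKILL.md before saying anything to the interviewee.
Refuse to begin until they name a specific incident with a date, a place, and real stakes.
Build the timeline before probing any decision point. Treat "I just knew" as the start of
the recovery sequence in references/probe-library.md, never as the end of an answer.
Always ask the closing probe before letting the session end.
```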
The output you actually want
A transcript isn't useful. Raw incident narrative isn't useful. What you want is conditional rules that capture when, why, and under what exceptions the expert would actually act. Every CDM session ends with the agent walking you through translation to this form:
when lag keeps climbing on a few partitions while the rest of the consumer group is healthy,
because that pattern indicates head-of-line blocking rather than backend failure,
do drain a pod before scaling out —
unless error rate is also rising, in which case scale first.
The because clause is the load-bearing part. Strip it and you have a brittle rule someone will misapply. Keep it and the rule generalizes — a future engineer in a related-but-different situation can decide whether the reason still applies. The unless clause is what keeps it from being a tautology. Without an exception, the rule covers everything, which means it covers nothing.
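In the finished artifact, each heuristic becomes one entry in that form. Here is a sketch of how the rule above might be recorded; the exact template lives in output-format.md, so treat the field layout and heading as illustrative:

```markdown
### Heuristic: partition-skewed consumer lag
- When: lag keeps climbing on a few partitions while the rest of the consumer group is healthy
- Because: that skew indicates head-of-line blocking on those partitions, not backend failure
- Do: drain a pod before scaling the consumer group out
- Unless: error rate is also rising, in which case scale first
- Source: CDM interview, Black Friday consumer-lag incident
```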
The interventions that actually matter
Most "expert interview" advice is just lists of open-ended questions. That doesn't help — the failure mode isn't running out of questions, it's letting the expert drift into abstraction. The agent intervenes on five specific failure patterns:
| Pattern you'll see | Why it's bad | What the agent does |
|---|---|---|
| "How do you generally handle X?" | Triggers the textbook answer, not actual reasoning | Refuses to ask. Forces you to pick a specific past incident first |
| Jumping to why before what | If you don't have a timeline, the expert reconstructs a plausible story instead of recalling actual reasoning | Builds the chronology first; flags decision points without probing them yet |
| "I'm not supposed to do this, but I always..." | Most novice interviewers redirect here ("but the procedure says..."). They lose the gold. | Probes deeper. Workarounds are evolved expertise. Procedure is documented; the workaround was learned. |
| "I just knew" | Treating this as an answer is the single most common interview failure | Runs a five-step recovery sequence: "what were you looking at?" → "what would a novice have missed?" → "how would you teach someone to see it?" |
| Closing too early | The last 10% of session time has disproportionate yield | Always asks the closing probe before letting the session end |
If your CDM session doesn't contain at least one moment of the expert saying "I never thought about it that way," the deepening pass didn't work. That's the detection rule.
We ran the agents on three real questions. Here's what came out.
Talking about these tools is one thing. Showing the dialogue they produce is another. Before publishing this post, we ran three live sessions in isolated agent contexts (no shared state between them, each agent reading the relevant skill from scratch): one CDM interview with a senior site reliability engineer, one with a senior emergency physician, and one brainstorm on a real WinDAGs product question. Below are three short excerpts that show what the protocol actually does. Full transcripts are linked at the bottom of each excerpt.
INTERVIEWER: You said at T-plus-eleven you scaled from twelve consumer pods to twenty-four. Walk me through the moment you decided that. What were you looking at?
EXPERT: Standard playbook. Lag climbs, you add consumers. That topic has twenty-four partitions, so you can go up to twenty-four consumers in the group before you stop getting parallelism gains. It is the first thing you do.
INTERVIEWER: Okay. And when you clicked the scale button, what were you expecting to see in the next two minutes?
EXPERT: Lag would start coming down. Maybe not immediately, but you would see the slope change.
INTERVIEWER: Did the slope change?
EXPERT: No. (pause) No, actually, it got worse on those two partitions. The other twenty-two were fine, they were always fine, but those two just kept climbing. (...) Honestly? I felt stupid for a second. Like, why did I just do that. Because if I had looked at the partition breakdown first, I would have seen the lag was not evenly distributed. (laughs) I ran the runbook because it was Black Friday and I wanted to do the safe thing first, and the safe thing was the wrong thing.
INTERVIEWER: Let me pause there. You just said you wanted to do the safe thing first. Is that something you would say in the post-mortem?
EXPERT: Probably not in those words, no.
INTERVIEWER: Walk me back to the moment the nurse offered to help her to the bathroom. What did you actually notice, in your body, in that moment, before you said anything?
EXPERT: I had already turned toward the door. I was leaving. And the daughter said the thing about her mom holding it for hours because getting up was hard. That stopped me. Not because I had a diagnosis yet. Because the word "hard" didn't match the chart. The chart said vertigo. Vertigo is awful but it doesn't usually make standing hard in that way. People with peripheral vertigo can usually stand, they just feel sick when they move their head. They walk wide-based, they hold the wall, but they get up. "Hard" the way she said it sounded like the patient couldn't physically do it.
INTERVIEWER: So the cue was the daughter's word choice before it was anything you saw.
EXPERT: Yes. I never thought about it that way, but yes. The daughter triaged this for me. She used a word that didn't match the leading hypothesis, and I noticed the mismatch before I noticed why.
INTERVIEWER: And then you watched her try to get up.
EXPERT: Yeah. And her trunk listed. That was the confirming finding. But I was already alert by then. If I hadn't been alert, I might have written off the listing as her being stiff from lying down, or the meclizine kicking in. Context determines what a movement means.
FACILITATOR: We just spent eight minutes generating variants on "an explainer agent talks to the user." Sixteen ideas, but four of them are wearing the same hat. I want to switch to Reverse. New question: how do we guarantee a new developer rage-quits inside the first hour?
TECH LEAD: Oh, that is much easier. Drop them at a grid of ninety untitled cards. Make them read about phases, waves, and topologies before anything will run. Require an API key for the first action. Print a stack trace on first launch if their env is not set up. Fail silently on missing config. Bury the cost meter so they get a surprise bill on day three. Have docs and product disagree.
FACILITATOR: Now invert each one. Quickly.
TECH LEAD: Untitled cards becomes filtered-to-six. Read first becomes do first. API key required becomes local-fallback model so the first DAG always completes. Stack trace becomes assumed-defaults with a "change these" link. Silent failure becomes a live thought stream. Surprise bill becomes pre-execution cost estimate. Doc-product disagreement becomes the SKILL.md is the docs. That is seven ideas in ninety seconds, and none of them are "an agent explains things." Reverse is doing something different.
FACILITATOR: That is exactly why we switch. The first technique gave us "what could we add." Reverse gives us "what must we remove." Different muscle.
The three sessions produced, in total: six structured heuristics in when-because-do-unless form, two anti-patterns with concrete teaching cases, and one ranked product shortlist of four ideas with two preserved-but-rejected options. None of this content existed in any book. All of it was extracted or generated by the agents in under two hours of session time. The full session artifacts are in /blog/sessions/two-skills-for-the-l3/ if you want to see what a complete output looks like.
Why they're complementary, not redundant
These are the two halves of the same problem. One pulls judgment out of someone who has it. The other generates options when nobody has the answer yet. They share architecture.
| Dimension | Interview agent | Brainstorm agent |
|---|---|---|
| The problem | Expert exists, judgment trapped in their head | No one has the answer yet |
| Core discipline | Four passes through one specific case | Strict separation of generate and evaluate |
| Worst failure mode | Accepting "I just knew" as an answer | Picking the first idea that survives critique |
| What you walk out with | Conditional rules with cause and exception | Shortlist + rejected-but-interesting list |
| Sign it worked | "I never thought about it that way" | The shortlist surprises someone |
Both agents have the same architectural bones: a tight 400-line specification, a library of references that load on demand, an interactive driver that intervenes when you start to drift, and a structured output artifact at the end. Neither session ends with "let's circle back." Both refuse the things people commonly want to do that compromise the protocol.
What you can do with these
If you're an engineering manager: run a CDM session with anyone leaving the team. The hour you spend will produce more useful documentation than they'd write in a week.
If you're a tech lead: run a brainstorm before the next architecture review, not during it. The architecture review then has options to weigh, not a single proposal to ratify.
If you're a product manager: run CDM on the customer support team about your top three churn patterns. The patterns they "just know" are usually the ones product roadmaps miss.
If you're a founder: run both. Capture your own reasoning before it gets blurry. Generate options before committing to the first plausible path.
If you're building AI agents: these are the input pipelines for the part of expertise that doesn't fit in a book. Run CDM on practitioners, feed the conditional rules into your agent's skill library. Run brainstorms on the open questions you don't have answers to yet. The combined output is a kind of knowledge your competitors won't have because they're still distilling Wikipedia.
Both bundles live in ~/.claude/skills/. In a Claude Code session, ask:

# Capture judgment from a past incident
> Use the cdm-interviewer agent to interview me about [a hard past incident]

# Generate options for an open question
> Use the brainstorm-facilitator agent to help me think through [open question]
The agents work with any Claude Code project on your machine — they're not tied to a specific codebase or product. To share them with a team, copy the four files (two skill directories plus two agent files) into a shared plugin.
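One way that shared plugin could be laid out on disk. The layout assumes the standard plugin structure with skills/ and agents/ directories at the plugin root; the plugin name and the brainstorm bundle's directory name are assumptions, since only the agent name appears above:

```
team-judgment/
  .claude-plugin/
    plugin.json                  plugin name + metadata
  skills/
    cdm-interviewer/             the bundle shown above
    brainstorm-facilitator/
  agents/
    cdm-interviewer.md
    brainstorm-facilitator.md
```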
The skills we shipped this weekend close a loop. Books gave us facts. Practitioners have judgment. Half of it can be captured from people who already have it. The other half has to be generated from open questions nobody's answered yet. The next CDM session we run will be on the brainstorm facilitator itself — we'll let you know what comes out.