
Books teach what. People know when. We built tools for both.
The Reasoning Books Don't Have
Every team has the senior engineer who 'just knows.' That instinct doesn't fit in a doc — books can't write it, Wikipedia can't index it, and AI doesn't ship with it. We built two tools this weekend: one to pull it out of the people who have it, one to generate it when no one does.
Every engineering team has at least one of these people. The senior who glances at a graph and says "it's not the database, it's head-of-line blocking on the upstream queue." The PM who reads a feature spec in 20 seconds and points at the one constraint everyone missed. The designer who looks at a layout you've stared at for a week and says "the eyeline is wrong."
How do they do it? Years of pattern-matching, expensive mistakes, and a thousand cases their memory quietly indexed. Can they write it down? Not really. Try and you'll get a runbook that says things like "investigate carefully" or "trust your gut." Useless to anyone who isn't already them.
That's the part of expertise this post is about — the judgment part. The part that doesn't fit in a doc, doesn't show up in Wikipedia, and doesn't make it into the training data for AI models. We've been building AI agents at Curiositech for a year, and the single most valuable thing we can give an agent isn't more facts. It's structured judgment: if you see X and Y but not Z, do A — unless W, then B. That kind of conditional, anchored, falsifiable reasoning.
We've gotten pretty good at extracting the facts part. The Curiositech library has 90+ files — we call them skills, which is the standard term, lower-case — distilled from books and papers. Each skill is a ~2,000-word structured document that an AI agent loads when it's working on a relevant task. Polya's How to Solve It became a 200-line problem-solving skill. Klein's Sources of Power became a 180-line decision-making skill. (Previous post on the distillation pipeline.)
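For a sense of the shape, here is a minimal sketch of one of those files. The frontmatter fields follow the standard SKILL.md format; the skill name, headings, and bullet content are illustrative, not the actual Klein distillation:

```markdown
---
name: recognition-primed-decisions
description: Making good calls under time pressure, distilled from Klein's Sources of Power. Load when a decision has to be made before all the evidence is in.
---

## Core moves
- Match the situation to a known pattern before enumerating options.
- Mentally simulate the first workable option; only reach for a second if the simulation breaks.

## Anti-patterns
- Exhaustive option comparison when the clock is the real constraint.
```

The description line is what lets an agent decide when to pull the rest of the file into context.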
But here's the thing: a book is mostly what. The when and the why I trust this judgment in this situation — that's not in the book. The author can't write it down. They know it, but they can only describe it in stories, and our distillation pipeline can't reliably extract conditional reasoning from anecdote.
So we have two problems, and they need two different tools.
I built one tool for each this weekend. They live in ~/.claude/skills/ and each ships with a companion agent — meaning Claude can run them on demand, interactively, in your terminal. The two sections below cover one tool each: what it does, how it works, and what it refuses to let you do. Read them in either order.
When you've got the person, but not the words
You sit down with the senior engineer. You ask "how do you handle distributed-systems incidents?" They give you the runbook answer they'd give a journalist. You learn nothing new.
This is the universal failure mode of expert interviewing. People are bad at describing their own reasoning when asked abstractly — they reach for the polished version, the one that sounds smart, the one that's been worn smooth by retelling. The actual reasoning shows up only when you anchor them in a specific case and walk through it with structure.
The protocol that works is called the Critical Decision Method, or CDM. It was invented in 1989 by cognitive scientist Gary Klein, who was studying how fireground commanders make split-second calls at burning buildings. ("How did you know to pull your crew out 30 seconds before the floor collapsed?" "I just knew." That's the problem.) Klein figured out that if you pick one specific case and walk through it four times with different questions each pass, the expert discovers their own reasoning in real-time — usually surprising themselves.
The diagnostic that a CDM session is working is the expert saying "huh, I've never thought about it that way." That's the moment the scaffolding worked.
The agent I built runs Klein's four-pass protocol interactively. It refuses to start until you pick a specific incident with a date, place, and stakes. It builds a timeline before probing any decisions. It treats "I just knew" as the start of a five-step recovery sequence, not the end of an answer. And it ends every session with a closing probe — "is there anything about this you've never told anyone, or assume everyone knows but might not?" — that consistently produces surprises in the last ten percent of session time.
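For orientation, the four passes are the standard CDM sweeps. A condensed sketch of what each one covers, summarizing the method itself rather than the shipped four-sweeps.md, which goes into more detail:

```markdown
1. First telling: the expert recounts the chosen incident once, uninterrupted.
2. Timeline: build and verify the chronology; mark decision points without probing them yet.
3. Deepening: revisit each decision point for cues noticed, expectations, goals, and options considered.
4. What-ifs: expertise-contrast and hypothetical probes ("what would a novice have missed?", "what if X had been different?").
```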
What's in the bundle
    ~/.claude/skills/cdm-interviewer/
      SKILL.md                protocol + mental models
      references/
        four-sweeps.md        sweep-by-sweep procedure
        probe-library.md      question families with success signals
        case-selection.md     how to ground in a specific case
        output-format.md      artifact template + heuristic form
        anti-patterns.md      failure modes + recovery

    ~/.claude/agents/cdm-interviewer.md   interactive session driver
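The last file in that listing, the agent, is the interactive driver. A minimal sketch of its shape; the frontmatter follows the standard Claude Code agent-file format, and the body here is a paraphrase of the behavior described above, not the shipped prompt:

```markdown
---
name: cdm-interviewer
description: Runs a four-pass Critical Decision Method interview grounded in one specific past incident.
---

Read ~/.claude/skills/cdm-interviewer/SKILL.md before saying anything to the interviewee.
Refuse to begin until they name a specific incident with a date, a place, and real stakes.
Build the timeline before probing any decision point. Treat "I just knew" as the start of
the recovery sequence in references/probe-library.md, never as the end of an answer.
Always ask the closing probe before letting the session end.
```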
The output you actually want
A transcript isn't useful. Raw incident narrative isn't useful. What you want is conditional rules that capture when, why, and under what exceptions the expert would actually act. Every CDM session ends with the agent walking you through translation to this form:
when lag keeps climbing on a few partitions while the rest of the consumer group is healthy,
because that pattern indicates head-of-line blocking rather than backend failure,
do drain a pod before scaling out —
unless error rate is also rising, in which case scale first.
The because clause is the load-bearing part. Strip it and you have a brittle rule someone will misapply. Keep it and the rule generalizes — a future engineer in a related-but-different situation can decide whether the reason still applies. The unless clause is what keeps it from being a tautology. Without an exception, the rule covers everything, which means it covers nothing.
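In the finished artifact, each heuristic becomes one entry in that form. Here is a sketch of how the rule above might be recorded; the exact template lives in output-format.md, so treat the field layout and heading as illustrative:

```markdown
### Heuristic: partition-skewed consumer lag
- When: lag keeps climbing on a few partitions while the rest of the consumer group is healthy
- Because: that skew indicates head-of-line blocking on those partitions, not backend failure
- Do: drain a pod before scaling the consumer group out
- Unless: error rate is also rising, in which case scale first
- Source: CDM interview, Black Friday consumer-lag incident
```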
The interventions that actually matter
Most "expert interview" advice is just lists of open-ended questions. That doesn't help — the failure mode isn't running out of questions, it's letting the expert drift into abstraction. The agent intervenes on five specific failure patterns:
| Pattern you'll see | Why it's bad | What the agent does |
|---|---|---|
| "How do you generally handle X?" | Triggers the textbook answer, not actual reasoning | Refuses to ask. Forces you to pick a specific past incident first |
| Jumping to why before what | If you don't have a timeline, the expert reconstructs a plausible story instead of recalling actual reasoning | Builds the chronology first; flags decision points without probing them yet |
| "I'm not supposed to do this, but I always..." | Most novice interviewers redirect here ("but the procedure says..."). They lose the gold. | Probes deeper. Workarounds are evolved expertise. Procedure is documented; the workaround was learned. |
| "I just knew" | Treating this as an answer is the single most common interview failure | Runs a five-step recovery sequence: "what were you looking at?" → "what would a novice have missed?" → "how would you teach someone to see it?" |
| Closing too early | The last 10% of session time has disproportionate yield | Always asks the closing probe before letting the session end |
If your CDM session doesn't contain at least one moment of the expert saying "I never thought about it that way," the deepening pass didn't work. That's the detection rule.
We ran the agents on three real questions. Here's what came out.
Talking about these tools is one thing. Showing the dialogue they produce is another. Before publishing this post, we ran three live sessions in isolated agent contexts (no shared state between them, each agent reading the relevant skill from scratch): one CDM interview with a senior site reliability engineer, one with a senior emergency physician, and one brainstorm on a real WinDAGs product question. Below are three short excerpts that show what the protocol actually does. Full transcripts are linked at the bottom of each excerpt.
INTERVIEWER: You said at T-plus-eleven you scaled from twelve consumer pods to twenty-four. Walk me through the moment you decided that. What were you looking at?
EXPERT: Standard playbook. Lag climbs, you add consumers. That topic has twenty-four partitions, so you can go up to twenty-four consumers in the group before you stop getting parallelism gains. It is the first thing you do.
INTERVIEWER: Okay. And when you clicked the scale button, what were you expecting to see in the next two minutes?
EXPERT: Lag would start coming down. Maybe not immediately, but you would see the slope change.
INTERVIEWER: Did the slope change?
EXPERT: No. (pause) No, actually, it got worse on those two partitions. The other twenty-two were fine, they were always fine, but those two just kept climbing. (...) Honestly? I felt stupid for a second. Like, why did I just do that. Because if I had looked at the partition breakdown first, I would have seen the lag was not evenly distributed. (laughs) I ran the runbook because it was Black Friday and I wanted to do the safe thing first, and the safe thing was the wrong thing.
INTERVIEWER: Let me pause there. You just said you wanted to do the safe thing first. Is that something you would say in the post-mortem?
EXPERT: Probably not in those words, no.
INTERVIEWER: Walk me back to the moment the nurse offered to help her to the bathroom. What did you actually notice, in your body, in that moment, before you said anything?
EXPERT: I had already turned toward the door. I was leaving. And the daughter said the thing about her mom holding it for hours because getting up was hard. That stopped me. Not because I had a diagnosis yet. Because the word "hard" didn't match the chart. The chart said vertigo. Vertigo is awful but it doesn't usually make standing hard in that way. People with peripheral vertigo can usually stand, they just feel sick when they move their head. They walk wide-based, they hold the wall, but they get up. "Hard" the way she said it sounded like the patient couldn't physically do it.
INTERVIEWER: So the cue was the daughter's word choice before it was anything you saw.
EXPERT: Yes. I never thought about it that way, but yes. The daughter triaged this for me. She used a word that didn't match the leading hypothesis, and I noticed the mismatch before I noticed why.
INTERVIEWER: And then you watched her try to get up.
EXPERT: Yeah. And her trunk listed. That was the confirming finding. But I was already alert by then. If I hadn't been alert, I might have written off the listing as her being stiff from lying down, or the meclizine kicking in. Context determines what a movement means.
FACILITATOR: We just spent eight minutes generating variants on "an explainer agent talks to the user." Sixteen ideas, but four of them are wearing the same hat. I want to switch to Reverse. New question: how do we guarantee a new developer rage-quits inside the first hour?
TECH LEAD: Oh, that is much easier. Drop them at a grid of ninety untitled cards. Make them read about phases, waves, and topologies before anything will run. Require an API key for the first action. Print a stack trace on first launch if their env is not set up. Fail silently on missing config. Bury the cost meter so they get a surprise bill on day three. Have docs and product disagree.
FACILITATOR: Now invert each one. Quickly.
TECH LEAD: Untitled cards becomes filtered-to-six. Read first becomes do first. API key required becomes local-fallback model so the first DAG always completes. Stack trace becomes assumed-defaults with a "change these" link. Silent failure becomes a live thought stream. Surprise bill becomes pre-execution cost estimate. Doc-product disagreement becomes the SKILL.md is the docs. That is seven ideas in ninety seconds, and none of them are "an agent explains things." Reverse is doing something different.
FACILITATOR: That is exactly why we switch. The first technique gave us "what could we add." Reverse gives us "what must we remove." Different muscle.
The three sessions produced, in total: six structured heuristics in when-because-do-unless form, two anti-patterns with concrete teaching cases, and one ranked product shortlist of four ideas with two preserved-but-rejected options. None of this content existed in any book. All of it was extracted or generated by the agents in under two hours of session time. The full session artifacts are in /blog/sessions/two-skills-for-the-l3/ if you want to see what a complete output looks like.
Why they're complementary, not redundant
These are the two halves of the same problem. One pulls judgment out of someone who has it. The other generates options when nobody has the answer yet. They share architecture.
| Dimension | Interview agent | Brainstorm agent |
|---|---|---|
| The problem | Expert exists, judgment trapped in their head | No one has the answer yet |
| Core discipline | Four passes through one specific case | Strict separation of generate and evaluate |
| Worst failure mode | Accepting "I just knew" as an answer | Picking the first idea that survives critique |
| What you walk out with | Conditional rules with cause and exception | Shortlist + rejected-but-interesting list |
| Sign it worked | "I never thought about it that way" | The shortlist surprises someone |
Both agents have the same architectural bones: a tight 400-line specification, a library of references that load on demand, an interactive driver that intervenes when you start to drift, and a structured output artifact at the end. Neither session ends with "let's circle back." Both refuse the things people commonly want to do that compromise the protocol.
What you can do with these
If you're an engineering manager: run a CDM session with anyone leaving the team. The hour you spend will produce more useful documentation than they'd write in a week.
If you're a tech lead: run a brainstorm before the next architecture review, not during it. The architecture review then has options to weigh, not a single proposal to ratify.
If you're a product manager: run CDM on the customer support team about your top three churn patterns. The patterns they "just know" are usually the ones product roadmaps miss.
If you're a founder: run both. Capture your own reasoning before it gets blurry. Generate options before committing to the first plausible path.
If you're building AI agents: these are the input pipelines for the part of expertise that doesn't fit in a book. Run CDM on practitioners, feed the conditional rules into your agent's skill library. Run brainstorms on the open questions you don't have answers to yet. The combined output is a kind of knowledge your competitors won't have because they're still distilling Wikipedia.
Both bundles live in ~/.claude/skills/. In a Claude Code session, ask:

# Capture judgment from a past incident
> Use the cdm-interviewer agent to interview me about [a hard past incident]

# Generate options for an open question
> Use the brainstorm-facilitator agent to help me think through [open question]
The agents work with any Claude Code project on your machine — they're not tied to a specific codebase or product. To share them with a team, copy the four files (two skill directories plus two agent files) into a shared plugin.
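One way that shared plugin could be laid out on disk. The layout assumes the standard plugin structure with skills/ and agents/ directories at the plugin root; the plugin name and the brainstorm bundle's directory name are assumptions, since only the agent name appears above:

```
team-judgment/
  .claude-plugin/
    plugin.json                  plugin name + metadata
  skills/
    cdm-interviewer/             the bundle shown above
    brainstorm-facilitator/
  agents/
    cdm-interviewer.md
    brainstorm-facilitator.md
```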
The skills we shipped this weekend close a loop. Books gave us facts. Practitioners have judgment. Half of it can be captured from people who already have it. The other half has to be generated from open questions nobody's answered yet. The next CDM session we run will be on the brainstorm facilitator itself — we'll let you know what comes out.