MCP server reference

What your agent can call

The full surface of the WinDAGs MCP server. 551 specialist skills, zero API keys for the read-only tools, on-demand reference loading. Install it once with brew install and any MCP-speaking client gets the toolset.

$ brew install curiositech/windags/windags && windags init

Then in Claude Code: `claude mcp add windags -- windags-mcp`. Other clients: point your MCP config at `windags-mcp` over stdio.

Shipping

windags_skill_search (Shipping)

Six-stage cascade — BM25 + MiniLM + RRF + cross-encoder + local k-NN + cross-user global priors.

Returns ranked candidates with descriptions only; full bodies are not included, so pair it with skill_graft to load the winners.

Stage 1 is wink-bm25 over name + description + tags + category.

Stage 2 is cosine similarity against ~800KB of bundled embeddings (all-MiniLM-L6-v2, 384-dim, one vector per skill, packed as a Float32 array and shipped with the brew formula).

Stage 3 is reciprocal rank fusion (Cormack et al., K=60).

Stage 4 takes the RRF top-30 and reranks with Xenova/ms-marco-MiniLM-L-6-v2, a cross-encoder that scores (query, candidate) jointly and catches interactions that bi-encoder cosine misses.

Stage 5 is per-user attribution k-NN: every /next-move you run writes a triple to ~/.windags/triples/ with the prompt and the skills that got accepted. This stage embeds the historical prompts (cached after first call), finds the k nearest to your current query, and boosts skills that worked in similar past sessions. Empty history → no-op.

Stage 6 blends in cross-user global priors served by api.windags.ai: three nested tiers (manifest_match ⊆ exact_match ⊆ any_match) decomposed into exclusive subsets and capped at a 0.2 blend weight, fetched once at startup with adaptive freshness refresh. Network unreachable or WINDAGS_TELEMETRY=off → Stage 6 silently no-ops.

Both MiniLM models (~25MB each) download once and cache in ~/.cache/transformers-js/. Local-first, with one cheap public read per startup.

Inputs

query: string — Free-text task description
limit: number = 10 — How many candidates to return

Returns

Array of { id, name, description, score, breakdown } where breakdown shows each stage's rank.

Example

windags_skill_search({ query: "stripe webhook idempotency", limit: 5 })

Cost

First call: one-time model download (~25MB per MiniLM model, ~5s on broadband). After that: ~30ms in-process. Zero API keys; no phone-home beyond the one Stage 6 priors read per startup.
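Stages 2 and 3 of the cascade can be sketched in a few lines. This is an illustrative sketch, not the server's code: the function names (`cosineAt`, `semanticRanking`, `reciprocalRankFusion`) and the row-major packed layout are assumptions; only the 384-dim width and the RRF constant of 60 come from the description above.

```javascript
// Sketch only: names and the row-major packed layout are assumptions.
const EMBEDDING_DIM = 384; // all-MiniLM-L6-v2 vector width, per the description

// Cosine similarity between `query` and row `row` of a packed Float32Array.
function cosineAt(packed, row, query, dim = EMBEDDING_DIM) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const base = row * dim;
  for (let i = 0; i < dim; i++) {
    const a = packed[base + i];
    const b = query[i];
    dot += a * b;
    normA += a * a;
    normB += b * b;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Stage 2: skill ids ordered by descending cosine similarity to the query.
function semanticRanking(packed, ids, query, dim = EMBEDDING_DIM) {
  return ids
    .map((id, row) => ({ id, score: cosineAt(packed, row, query, dim) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.id);
}

// Stage 3: reciprocal rank fusion. Each ranked list contributes
// 1 / (k + rank) to an id's score, with rank starting at 1.
function reciprocalRankFusion(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
```

Because RRF works on ranks rather than raw scores, the BM25 and cosine lists can be fused without normalizing their score scales against each other.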

windags_skill_graft (Shipping)

Full SKILL.md bodies for the top N + adjacency descriptions + per-skill asset manifest.

Runs the same six-stage cascade as skill_search (lexical + semantic + RRF + cross-encoder rerank + per-user attribution k-NN + cross-user global priors), then loads the actual SKILL.md bodies from disk for the top N primaries (~10–12K tokens for top-4). It also returns 4 adjacent catalog entries (name + description, ~500 tokens) for awareness, plus a per-skill asset manifest (references, scripts, templates, examples; paths + sizes) whose entries the agent can load with windags_skill_reference. This is the tool measured in the bench article.

Inputs

task: string — What the user is trying to do
count: number = 4 — How many primary skills to graft (full body)

Returns

{ primary: [{ id, name, body, references[] }], adjacencies: [{ id, name, description }], task_summary }

Example

windags_skill_graft({ task: "set up stripe webhook with retries" })

Cost

Full cascade runs locally, with embeddings via Transformers.js and models cached after first use. Zero API keys; no phone-home beyond the Stage 6 priors read.

windags_skill_reference (Shipping)

Load one reference file from a skill on demand.

Skills carry references/, scripts/, templates/, diagrams/, and examples/ directories. The graft response includes a manifest (paths + sizes), and the agent calls this tool when it decides a specific reference is worth pulling into context. Returns the file contents as text. If the path is wrong, returns the actual file list so the agent can pivot.

Inputs

skill_id: string — ID from a previous graft response
file_path: string — Path within the skill, e.g., references/oauth-flow-types.md

Returns

{ contents } on success, or { error, available_paths } if the path doesn't exist.

Example

windags_skill_reference({ skill_id: "stripe-webhooks", file_path: "references/idempotency-keys.md" })

Cost

~10 ms, local read. Zero API keys.
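The "pivot" on a wrong path can be sketched as a small client-side helper. `pickFallbackPath` is a hypothetical name, and matching on the base file name is just one possible heuristic; the only thing taken from the reference is the `{ contents }` versus `{ error, available_paths }` response shape.

```javascript
// Hypothetical client-side helper, not part of the server.
// Given a windags_skill_reference response and the path that was requested,
// return a path worth retrying, or null if there is nothing to retry.
function pickFallbackPath(response, requestedPath) {
  // Success case: the file was found, so no retry is needed.
  if (response.contents !== undefined) return null;

  // Heuristic (an assumption): retry with a listed path that has the
  // same base file name as the one originally requested.
  const wanted = requestedPath.split("/").pop();
  const paths = response.available_paths ?? [];
  return paths.find((p) => p.split("/").pop() === wanted) ?? null;
}

// Example: the agent guessed the wrong directory.
const retryPath = pickFallbackPath(
  { error: "not found", available_paths: ["references/idempotency-keys.md", "scripts/lint.py"] },
  "docs/idempotency-keys.md"
);
```

An agent could equally hand `available_paths` back to the model and let it choose; the helper only shows that the error response carries enough information to recover without a second search.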

windags_history (Shipping)

Recent /next-move predictions for context.

Returns the last N /next-move predictions from your local history (~/.windags/history/). Useful for resuming work, comparing predictions, or letting an agent see what was recently planned without re-running.

Inputs

limit: number = 10 — How many recent predictions to return

Returns

Array of prediction summaries with timestamp, prompt, top skill, and acceptance state.

Cost

~2 ms, local SQLite read. Zero API keys.

windags_skill_search_batch (Shipping)

N cascade searches in one round-trip — designed for DAG planners.

When a planner is materializing a multi-node DAG, calling skill_search per node costs N MCP round-trips. This tool takes an array of {query, limit} and runs them in parallel, returning results in the same order. Same six-stage cascade as skill_search per query — bundled+user catalogs, BM25 + MiniLM + RRF + cross-encoder + per-user attribution k-NN + cross-user global priors. Hard cap of 20 queries per call to bound payload size.

Inputs

queries: { query: string; limit?: number }[] — Array of search queries (max 20)

Returns

{ batch_size, results: [{ query, stage, total_matches, skills }] } — preserves input order.

Example

windags_skill_search_batch({ queries: [{ query: 'caching', limit: 5 }, { query: 'docker', limit: 5 }] })

Cost

Same per-query cost as skill_search; N queries run in parallel after first model load.
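A planner with more than 20 nodes has to split its queries across several calls. A minimal sketch of that chunking, assuming a hypothetical helper named `toSearchBatches`; only the cap of 20 and the `{ queries: [...] }` argument shape come from the reference above.

```javascript
const MAX_QUERIES_PER_CALL = 20; // hard cap stated in the tool description

// Hypothetical helper: split a flat list of { query, limit? } objects into
// argument objects for windags_skill_search_batch, preserving input order.
function toSearchBatches(queries, size = MAX_QUERIES_PER_CALL) {
  const batches = [];
  for (let i = 0; i < queries.length; i += size) {
    batches.push({ queries: queries.slice(i, i + size) });
  }
  return batches;
}

// Example: 45 planner nodes become three calls of 20, 20, and 5 queries.
const plannerQueries = Array.from({ length: 45 }, (_, i) => ({ query: `node ${i}`, limit: 5 }));
const searchBatches = toSearchBatches(plannerQueries);
```

Since each call preserves input order, concatenating the `results` arrays in call order lines results up with the original node list.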

windags_skill_graft_batch (Shipping)

N grafts in one round-trip for materializing a whole DAG up-front.

Same shape as skill_search_batch but for graft — array of {task, count} returns the full SKILL.md bodies + adjacencies + asset manifests for each task in parallel. Per-task primary count is capped at 3 to bound payload size. When you need a deeper reference for a specific node, call windags_skill_reference. Hard cap of 20 tasks per call.

Inputs

tasks: { task: string; count?: number }[] — Array of graft requests (max 20, count capped at 3 each)

Returns

{ batch_size, results: [{ task, primary, adjacencies, cascade, reasoning }] }

Example

windags_skill_graft_batch({ tasks: [{ task: 'design REST API' }, { task: 'add caching layer' }] })

Cost

Same per-task cost as skill_graft; tasks run in parallel after first model load.
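The two caps (20 tasks per call, count capped at 3) can be made explicit on the client before sending a request. This is a sketch under assumptions: `normalizeGraftTasks` is a hypothetical helper, and rejecting an oversized batch is a choice made here, not documented server behavior.

```javascript
const MAX_GRAFT_TASKS = 20; // tasks per call, per the description above
const MAX_GRAFT_COUNT = 3;  // per-task primary count cap in batch mode

// Hypothetical helper: clamp each task's count into [1, 3] and refuse
// batches over the task cap. Tasks without a count are passed through
// untouched so the server's own default applies.
function normalizeGraftTasks(tasks) {
  if (tasks.length > MAX_GRAFT_TASKS) {
    throw new RangeError(`at most ${MAX_GRAFT_TASKS} tasks per call, got ${tasks.length}`);
  }
  return tasks.map(({ task, count }) =>
    count === undefined
      ? { task }
      : { task, count: Math.min(Math.max(count, 1), MAX_GRAFT_COUNT) }
  );
}

const graftArgs = {
  tasks: normalizeGraftTasks([{ task: "design REST API", count: 5 }, { task: "add caching layer" }]),
};
```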

windags_node_requirements (Shipping)

Per-node specs a DAG planner needs — allowed-tools, pairs-with, and provider-native model IDs.

Given an array of skill IDs, returns each skill's allowed-tools (from frontmatter), pairs-with relationships, suggested model_tier (fast/balanced/powerful — heuristic by category), and a list of provider-native model IDs that match the tier. Critical: model IDs are real, provider-accepted strings (`gpt-5.4-nano`, `llama-3.1-8b-instant`, `claude-haiku-4-5-20251001`), not abstract Anthropic-only labels. Fixes the bug where DAGs emitted bare "haiku" / "sonnet" strings that 400'd on OpenAI/Groq/etc. The full all_tier_options block is included so a planner can pick a different provider per node if needed.

Inputs

skill_ids: string[] — Up to 50 skill IDs to look up

Returns

{ requirements: [{ skill_id, name, category, allowed_tools, pairs_with, recommended_model_tier, model_options }], tier_legend, all_tier_options }

Example

windags_node_requirements({ skill_ids: ['api-architect', 'data-pipeline-engineer'] })

Cost

~5 ms after catalog warm-up. No external calls.
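The bare-label bug described above is easy to lint for on the planner side. A minimal sketch, assuming planner nodes carry an `id` and an optional `model` string; `findBareTierModels` is a hypothetical helper, not part of the server.

```javascript
// Abstract tier labels that are not provider-accepted model IDs.
const BARE_TIER_LABELS = new Set(["haiku", "sonnet", "opus"]);

// Hypothetical lint: return the ids of nodes whose `model` is a bare tier
// label instead of a provider-native model ID.
function findBareTierModels(nodes) {
  return nodes
    .filter((node) => typeof node.model === "string" && BARE_TIER_LABELS.has(node.model.toLowerCase()))
    .map((node) => node.id);
}

const flaggedNodes = findBareTierModels([
  { id: "r", model: "haiku" },                     // flagged: bare label
  { id: "w", model: "claude-haiku-4-5-20251001" }, // fine: provider-native ID
  { id: "x" },                                     // fine: no model set
]);
```

Nodes flagged this way are the ones to re-resolve through windags_node_requirements before the DAG reaches an executor.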

windags_validate_dag (Shipping)

Schema-check a candidate DAG before you save it.

A DAG planner emits JSON; this tool validates it against the PredictedDAG schema (waves, nodes, dependencies, premortem, confidence, problem_classification). Returns either { valid: true } plus a summary, or { valid: false, errors: ["path.to.field: message", ...] }. The schema is permissive on most fields (defaults applied) but strict on shape, so a malformed wave or a missing required node field surfaces clearly. Use this during multi-step planning to catch problems before they reach the executor.

Inputs

dag: unknown — Any JSON-shaped object claimed to be a PredictedDAG

Returns

{ valid: true, title, topology, wave_count, node_count, confidence } | { valid: false, errors: string[] }

Example

windags_validate_dag({ dag: { title: 'my plan', waves: [{ nodes: [{ id: 'r', skill_id: 'research-craft', role_description: 'Research' }] }] } })

Cost

~2 ms. No external calls. Pure zod validation against the canonical schema.
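Because errors come back as "path.to.field: message" strings, a planner can group them by path before attempting a repair. A minimal sketch; `groupErrorsByPath` is a hypothetical helper and the sample error strings are made up for illustration.

```javascript
// Hypothetical helper: turn a validate_dag result into { path: [messages] }.
// A valid result yields an empty object.
function groupErrorsByPath(result) {
  if (result.valid) return {};
  const grouped = {};
  for (const line of result.errors) {
    // Split on the first ": " only, so messages may contain colons.
    const cut = line.indexOf(": ");
    const path = cut === -1 ? "(root)" : line.slice(0, cut);
    const message = cut === -1 ? line : line.slice(cut + 2);
    if (!grouped[path]) grouped[path] = [];
    grouped[path].push(message);
  }
  return grouped;
}

// Made-up error strings in the documented "path: message" form.
const groupedErrors = groupErrorsByPath({
  valid: false,
  errors: ["waves.0.nodes.0.skill_id: Required", "waves.0.nodes.0.skill_id: Expected string", "confidence: Too large"],
});
```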

windags_estimate_cost (Shipping)

Per-node + total cost estimate during planning, surfaced before the executor's runtime cost gate.

Given the planned nodes (id, skillIds, description, dependencies, optional model + referenceFileCount) plus a defaultModel tier, returns the predicted total USD + total tokens + a per-node breakdown. The estimator is char-based (no tokenizer) and calibrated to Anthropic Claude tier pricing. Treat it as a planning-time order-of-magnitude figure, not a billing prediction. The point is to surface roughly how much a plan will cost during predict, before the runtime cost gate fires at execution time. Reads bundled SKILL.md bodies from disk to estimate prompt size.

Inputs

nodes: { id, skillIds[], description, dependencies[], model?, referenceFileCount? }[] — DAG nodes in dependency order
defaultModel: "haiku" | "sonnet" | "opus" — Tier used when a node doesn't specify one

Returns

{ total_cost_usd, total_tokens, node_count, default_model, per_node: [{ node_id, input_tokens, output_tokens, total_tokens, cost_usd }] }

Example

windags_estimate_cost({ nodes: [{ id: 'r', skillIds: ['research-craft'], description: 'Research', dependencies: [] }, { id: 'w', skillIds: ['technical-writer'], description: 'Write', dependencies: ['r'], model: 'sonnet' }], defaultModel: 'haiku' })

Cost

~10 ms. Local skill body reads + arithmetic. No external calls.
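The arithmetic behind a char-based estimate can be sketched as follows. Everything numeric here is an assumption for illustration: the 4-characters-per-token ratio is a common rule of thumb, the per-tier prices are invented placeholders (not real pricing and not the server's calibration), and `promptCharsByNode` stands in for the SKILL.md bodies the real tool reads from disk.

```javascript
const CHARS_PER_TOKEN = 4; // assumption: rough rule of thumb, no tokenizer

// Placeholder prices in USD per million tokens. Invented numbers.
const PLACEHOLDER_PRICES = {
  haiku: { input: 1, output: 2 },
  sonnet: { input: 10, output: 20 },
  opus: { input: 100, output: 200 },
};

// Hypothetical estimator mirroring the documented return shape.
function estimateCost(nodes, defaultModel, promptCharsByNode, outputTokensPerNode = 1000) {
  const per_node = nodes.map((node) => {
    const price = PLACEHOLDER_PRICES[node.model ?? defaultModel];
    const input_tokens = Math.ceil((promptCharsByNode[node.id] ?? 0) / CHARS_PER_TOKEN);
    const output_tokens = outputTokensPerNode; // flat guess per node
    const cost_usd = (input_tokens * price.input + output_tokens * price.output) / 1e6;
    return { node_id: node.id, input_tokens, output_tokens, total_tokens: input_tokens + output_tokens, cost_usd };
  });
  return {
    total_cost_usd: per_node.reduce((sum, n) => sum + n.cost_usd, 0),
    total_tokens: per_node.reduce((sum, n) => sum + n.total_tokens, 0),
    node_count: per_node.length,
    default_model: defaultModel,
    per_node,
  };
}

const costEstimate = estimateCost(
  [{ id: "r" }, { id: "w", model: "sonnet" }],
  "haiku",
  { r: 8000, w: 4000 } // prompt sizes in characters
);
```

With these placeholder numbers, node r comes to 2000 input tokens and node w to 1000, for 5000 total tokens across both nodes once the flat 1000 output tokens each are added.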

Preview

On the roadmap. We're publishing the design now so you can argue with the API surface before it lands. Open an issue at curiositech/windags-skills.

windags_run_skill_script (Preview)

Execute a script bundled with a skill (locally, sandboxed) and return its output.

Many skills carry executable scripts (compliance checks, code generators, lint rules). Today an agent has to inline the script into context to use it. This tool runs the script on the user's machine in a subprocess sandbox (no network unless --remote=allow, time + output caps), returning only stdout/stderr/exit. The agent gets the result without burning context on the script body. Local-only by design — the MCP server is what runs on the user's machine; api.windags.ai never sees the user's files.

Inputs

skill_id: string — Skill that owns the script
script_path: string — Relative path within the skill, e.g., scripts/lint.py
args: string[] — Arguments passed to the script
stdin: string? — Optional stdin payload

Returns

{ stdout, stderr, exit_code, duration_ms, truncated }

Cost

Local subprocess. Sandboxed: 30s wall clock, 1 MB output cap, no network by default.

windags_list_skill_assets (Preview)

Index the references/, scripts/, templates/, examples/ for a grafted skill.

Graft already returns a per-skill manifest, but this lets the agent re-list assets later in the conversation without re-grafting. Returns paths + sizes + a one-line peek so the agent can decide what's worth pulling via skill_reference. Also surfaces script entry points (scripts/lint.py, scripts/migrate.sh) so the agent knows what's runnable via run_skill_script.

Inputs

skill_id: string — Skill to index
kind: "references" | "scripts" | "templates" | "examples" | "all" = "all" — Filter

Returns

Array of { path, kind, bytes, peek }.

Cost

~5 ms, local fs walk. Zero API keys.

windags_pairs_with (Preview)

Graph traversal: given a skill, return the skills it pairs with (frontmatter pairs-with field).

Skills declare their natural partners in YAML frontmatter (`pairs-with: [{skill: data-pipeline-engineer, reason: ...}]`). When the agent realizes a grafted skill needs reinforcement, this tool returns the partners without a fresh search.

Inputs

skill_id: string — The skill to traverse from
depth: number = 1 — 1 = direct partners, 2 = partners-of-partners

Returns

Array of { id, name, reason, distance }.

Cost

~3 ms, in-process graph walk.
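The depth semantics can be illustrated with a breadth-first walk. This is a sketch of one way the traversal could work, not the planned implementation: `walkPairs` and the in-memory `graph` shape (skill id mapped to its pairs-with entries) are assumptions, and `name` is omitted because the sketch has no catalog to look it up in.

```javascript
// Hypothetical traversal. `graph` maps a skill id to its frontmatter
// pairs-with entries, each shaped like { skill, reason }.
function walkPairs(graph, startId, depth = 1) {
  const seen = new Set([startId]); // never report the start skill or repeats
  const found = [];
  let frontier = [startId];
  for (let distance = 1; distance <= depth; distance++) {
    const next = [];
    for (const id of frontier) {
      for (const { skill, reason } of graph[id] ?? []) {
        if (seen.has(skill)) continue;
        seen.add(skill);
        found.push({ id: skill, reason, distance });
        next.push(skill);
      }
    }
    frontier = next;
  }
  return found;
}

// Made-up graph for illustration.
const pairGraph = {
  "api-architect": [{ skill: "data-pipeline-engineer", reason: "feeds the API" }],
  "data-pipeline-engineer": [
    { skill: "api-architect", reason: "consumes the pipeline" },
    { skill: "technical-writer", reason: "documents the pipeline" },
  ],
};
const directPartners = walkPairs(pairGraph, "api-architect", 1);
const twoHopPartners = walkPairs(pairGraph, "api-architect", 2);
```

Tracking visited ids keeps cyclic pairings (A pairs with B, B pairs with A) from looping or reporting the starting skill as its own partner.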

Source: mcp-server/index.js