MCP server reference

What your agent can call

The full surface of the WinDAGs MCP server. 551 specialist skills, zero API keys for the read-only tools, on-demand reference loading. Install it once with brew install and any MCP-speaking client gets the toolset.

shell
$ brew install curiositech/windags/windags && windags init

Then in Claude Code: claude mcp add windags -- windags-mcp. Other clients: point your MCP config at windags-mcp over stdio.

Shipping

windags_skill_graft (Shipping)

Skill grafting with discovery, estimate, and full-body modes so agents do not burn context blindly.

Runs the same six-stage cascade as skill_search, then either returns a body-free shortlist or loads the actual SKILL.md bodies for the primary skills. Use mode:'ids' while browsing: it returns skill IDs, one-line descriptions, confidence scores, and a windags_skill_reference pointer for each full body. Use mode:'estimate' to get the shortlist plus context_cost and warnings. Use mode:'full' when you are ready to load bodies; if estimated body tokens exceed max_body_tokens, the response preflights with no bodies unless allow_large_payload:true is set. The default body threshold is 8K estimated tokens, with warnings around 6K.

Inputs
  • task: string. What the user is trying to do
  • count: number = 2. How many primary skills to graft (capped at 3)
  • mode: "ids" | "estimate" | "full" = "full". Discovery, preflight estimate, or full-body response
  • max_body_tokens: number = 8000. Abort full-body return when estimated skill-body tokens exceed this threshold
  • allow_large_payload: boolean = false. Explicit opt-in for oversized full-body graft responses
Returns

{ task, mode, context_cost, warnings, primary, secondary, reasoning } or { mode:'preflight', aborted:true, retry_with } when bodies would be too large.

Example
windags_skill_graft({ task: "set up stripe webhook with retries", mode: "ids", count: 2 })
Cost: Cascade runs locally. Body tokens are estimated with ceil(characters / 4). Zero API keys, no phone-home.
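The preflight rule described above can be mirrored on the client to predict whether a mode:'full' call is likely to come back without bodies. The following is a minimal sketch that assumes only what this section states: tokens are estimated as ceil(characters / 4), the abort threshold defaults to 8000, and warnings start around 6000. The function names and the "ok" / "warn" / "abort" labels are illustrative and not part of the WinDAGs API.

```python
import math

# "Warnings around 6K" per the text above; the exact cutoff is an assumption.
WARN_TOKENS = 6000


def estimate_tokens(text: str) -> int:
    """Estimate tokens as ceil(characters / 4), the rule stated in this reference."""
    return math.ceil(len(text) / 4)


def preflight(bodies, max_body_tokens=8000, allow_large_payload=False):
    """Classify a prospective full-body graft as 'ok', 'warn', or 'abort'.

    Returns a (label, estimated_tokens) tuple. This is a local approximation,
    not the server's own implementation.
    """
    total = sum(estimate_tokens(body) for body in bodies)
    if total > max_body_tokens and not allow_large_payload:
        return "abort", total
    if total >= WARN_TOKENS:
        return "warn", total
    return "ok", total
```

An agent that already holds the candidate bodies (or their character counts) could run this check before deciding between mode:'ids' and mode:'full'.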
windags_skill_reference (Shipping)

Load SKILL.md or one reference file from a skill on demand.

Skills carry SKILL.md plus references/, scripts/, templates/, diagrams/, and examples/ directories. Discovery-mode graft responses point here with file_path:'SKILL.md' for full body loading. Full graft responses include manifests, and the agent calls this tool when a specific reference is worth pulling into context. Returns the file contents as text. If the path is wrong, returns the actual file list so the agent can pivot.

Inputs
  • skill_id: string. ID from a previous graft response
  • file_path: string. Path within the skill, e.g., SKILL.md or references/oauth-flow-types.md
Returns

{ contents } on success, or { error, available_paths } if the path doesn't exist.

Example
windags_skill_reference({ skill_id: "stripe-webhooks", file_path: "SKILL.md" })
Cost: ~10 ms, local read. Zero API keys.
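Because a failed lookup returns the real file list, a client can recover from a mistyped path without another search. The sketch below shows one way to pick the closest entry from available_paths using Python's standard difflib module. The helper name pick_fallback_path and the 0.6 similarity cutoff are assumptions made for illustration; only the { contents } and { error, available_paths } shapes come from this reference.

```python
import difflib


def pick_fallback_path(requested, response):
    """Return the closest available path when a lookup failed, else None.

    `response` is the parsed tool result: either {"contents": ...} on success
    or {"error": ..., "available_paths": [...]} when the path does not exist.
    """
    if "contents" in response:
        return None  # lookup succeeded, nothing to pivot to
    candidates = response.get("available_paths", [])
    # get_close_matches ranks candidates by similarity ratio; keep the best one.
    matches = difflib.get_close_matches(requested, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

If no candidate is similar enough, the function returns None and the agent can fall back to showing the full list.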
windags_history (Shipping)

Recent /next-move predictions for context.

Returns the last N /next-move predictions from your local history (~/.windags/history/). Useful for resuming work, comparing predictions, or letting an agent see what was recently planned without re-running.

Inputs
  • limit: number = 10. How many recent predictions to return
Returns

Array of prediction summaries with timestamp, prompt, top skill, and acceptance state.

Cost: ~2 ms, local SQLite read. Zero API keys.
windags_skill_search_batch (Shipping)

N cascade searches in one round-trip — designed for DAG planners.

When a planner is materializing a multi-node DAG, calling skill_search per node costs N MCP round-trips. This tool takes an array of {query, limit} and runs them in parallel, returning results in the same order. Each query goes through the same six-stage cascade as skill_search over the bundled and user catalogs: BM25, MiniLM, RRF, cross-encoder, per-user attribution k-NN, and cross-user global priors. A hard cap of 20 queries per call bounds payload size.

Inputs
  • queries: { query: string; limit?: number }[]. Array of search queries (max 20)
Returns

{ batch_size, results: [{ query, stage, total_matches, skills }] } — preserves input order.

Example
windags_skill_search_batch({ queries: [{ query: 'caching', limit: 5 }, { query: 'docker', limit: 5 }] })
Cost: Same per-query cost as skill_search; N queries run in parallel after first model load.
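A planner with more than 20 nodes has to split its queries across several calls. The sketch below shows one way to do that while keeping results aligned with the original query order. Here call_batch is a hypothetical callable standing in for whatever MCP client invokes windags_skill_search_batch; it is assumed to return the documented { batch_size, results } shape.

```python
MAX_BATCH = 20  # hard cap per call, per this reference


def chunk_queries(queries, size=MAX_BATCH):
    """Split queries into consecutive chunks of at most `size`, keeping order."""
    return [queries[i:i + size] for i in range(0, len(queries), size)]


def search_all(queries, call_batch):
    """Run every chunk through `call_batch` and flatten results in input order.

    `call_batch` is a placeholder for the real MCP client call (an assumption
    of this sketch). Each response preserves input order, so extending the
    list chunk by chunk keeps results aligned with `queries`.
    """
    results = []
    for chunk in chunk_queries(queries):
        results.extend(call_batch(chunk)["results"])
    return results
```

The chunks are issued sequentially here for simplicity; a client that wants more throughput could dispatch them concurrently and concatenate the responses in chunk order.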
windags_skill_graft_batch (Shipping)

N grafts in one round-trip, with aggregate context-cost preflight for planners.

Same shape as skill_search_batch but for graft. Array of {task, count} runs in parallel and supports the same mode:'ids' | 'estimate' | 'full' contract as the single graft tool. Per-task primary count is capped at 3, and max_body_tokens is evaluated against the aggregate skill-body payload for the whole batch. If a planner asks for full bodies and the batch would be too large, the tool returns mode:'preflight', a body-free result set, warnings, and retry_with instructions.

Inputs
  • tasks: { task: string; count?: number }[]. Array of graft requests (max 20, count capped at 3 each)
  • mode: "ids" | "estimate" | "full" = "full". Discovery, aggregate estimate, or full-body batch response
  • max_body_tokens: number = 8000. Aggregate skill-body token threshold for full-body batch responses
  • allow_large_payload: boolean = false. Explicit opt-in for oversized full-body batch responses
Returns

{ batch_size, mode, context_cost, warnings, results } or { mode:'preflight', aborted:true, retry_with } when aggregate bodies would be too large.

Example
windags_skill_graft_batch({ mode: 'ids', tasks: [{ task: 'design REST API' }, { task: 'add caching layer' }] })
Cost: Same per-task retrieval cost as skill_graft; tasks run in parallel after first model load. Context estimate covers the entire response.
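The aggregate rule matters because a batch can trip the threshold even when every task would pass on its own. The sketch below illustrates this, assuming the batch tool uses the same ceil(characters / 4) estimate that the single graft tool documents (this reference does not state that explicitly). The function names and the bodies_by_task mapping are hypothetical.

```python
import math


def aggregate_body_tokens(bodies_by_task):
    """Sum ceil(chars / 4) over every skill body across all tasks in the batch."""
    return sum(
        math.ceil(len(body) / 4)
        for bodies in bodies_by_task.values()
        for body in bodies
    )


def batch_would_preflight(bodies_by_task, max_body_tokens=8000):
    """True when the aggregate estimate exceeds the batch-wide threshold."""
    return aggregate_body_tokens(bodies_by_task) > max_body_tokens
```

For example, three tasks that each graft one 12,000-character body estimate to 3,000 tokens apiece, which is fine individually but 9,000 tokens in aggregate, above the default threshold of 8000.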
windags_node_requirements (Shipping)

Per-node specs a DAG planner needs — allowed-tools, pairs-with, and provider-native model IDs.

Given an array of skill IDs, returns each skill's allowed-tools (from frontmatter), pairs-with relationships, suggested model_tier (fast/balanced/powerful — heuristic by category), and a list of provider-native model IDs that match the tier. Critical: model IDs are real, provider-accepted strings (`gpt-5.4-nano`, `llama-3.1-8b-instant`, `claude-haiku-4-5-20251001`), not abstract Anthropic-only labels. Fixes the bug where DAGs emitted bare "haiku" / "sonnet" strings that 400'd on OpenAI/Groq/etc. The full all_tier_options block is included so a planner can pick a different provider per node if needed.

Inputs
  • skill_ids: string[]. Up to 50 skill IDs to look up
Returns

{ requirements: [{ skill_id, name, category, allowed_tools, pairs_with, recommended_model_tier, model_options }], tier_legend, all_tier_options }

Example
windags_node_requirements({ skill_ids: ['api-architect', 'data-pipeline-engineer'] })
Cost: ~5 ms after catalog warm-up. No external calls.
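The failure mode this tool addresses, bare tier labels reaching a provider, is easy to lint for before a DAG is executed. The sketch below flags nodes whose model field is one of the tier labels named in this reference instead of a provider-native ID. The node shape ({ id, model }) is borrowed from the estimate_cost inputs and is an assumption for executable DAG nodes; the function name is illustrative.

```python
# Tier labels named in this reference; providers other than Anthropic reject them.
BARE_TIER_LABELS = {"haiku", "sonnet", "opus"}


def find_bare_tier_labels(nodes):
    """Return (node_id, model) pairs whose model is a bare tier label.

    Nodes without a model field are skipped, since they inherit a default.
    """
    return [
        (node["id"], node["model"])
        for node in nodes
        if (node.get("model") or "").lower() in BARE_TIER_LABELS
    ]
```

Any node the check reports can be repaired by substituting one of the provider-native IDs returned in model_options for its tier.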
windags_validate_dag (Shipping)

Schema-check a candidate DAG before you save it.

A DAG planner emits JSON; this tool validates it against the PredictedDAG schema (waves, nodes, dependencies, premortem, confidence, problem_classification). Returns either { valid: true } plus a summary, or { valid: false, errors: ["path.to.field: message", ...] }. The schema is permissive on most fields (defaults are applied) but strict on shape, so a malformed wave or a missing required node field surfaces clearly. Use this during multi-step planning to catch problems before they reach the executor.

Inputs
  • dag: unknown. Any JSON-shaped object claimed to be a PredictedDAG
Returns

{ valid: true, title, topology, wave_count, node_count, confidence } | { valid: false, errors: string[] }

Example
windags_validate_dag({ dag: { title: 'my plan', waves: [{ nodes: [{ id: 'r', skill_id: 'research-craft', role_description: 'Research' }] }] } })
Cost: ~2 ms. No external calls. Pure zod validation against the canonical schema.
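Since each error arrives as a "path.to.field: message" string, a planner can group failures by the part of the DAG they touch before attempting a repair. The sketch below splits each entry on the first ": " and buckets it by its top-level field. It assumes the separator is exactly a colon followed by a space, as in the format shown above; the function name is illustrative.

```python
from collections import defaultdict


def group_errors(errors):
    """Group 'path.to.field: message' strings by their top-level field.

    Returns {top_level_field: [(full_path, message), ...]}. Entries without
    a 'path: ' prefix are filed under the empty-string key.
    """
    grouped = defaultdict(list)
    for entry in errors:
        path, sep, message = entry.partition(": ")
        if not sep:  # no "path: " prefix; keep the whole string as the message
            path, message = "", entry
        top_level = path.split(".")[0]
        grouped[top_level].append((path, message))
    return dict(grouped)
```

Grouping this way lets a planner regenerate only the failing wave or field instead of the whole DAG.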
windags_estimate_cost (Shipping)

Per-node + total cost estimate during planning, surfaced before the executor's runtime cost gate.

Given the planned nodes (id, skillIds, description, dependencies, optional model and referenceFileCount) plus a defaultModel tier, returns the predicted total USD, total tokens, and a per-node breakdown. The estimator is character-based (no tokenizer) and calibrated to Anthropic Claude tier pricing, so treat the result as a planning-time order-of-magnitude figure, not a billing prediction. The point is to surface a rough cost at predict time, before the runtime cost gate fires at execution time. Reads bundled SKILL.md bodies from disk to estimate prompt size.

Inputs
  • nodes: { id, skillIds[], description, dependencies[], model?, referenceFileCount? }[]. DAG nodes in dependency order
  • defaultModel: "haiku" | "sonnet" | "opus". Tier used when a node doesn't specify one
Returns

{ total_cost_usd, total_tokens, node_count, default_model, per_node: [{ node_id, input_tokens, output_tokens, total_tokens, cost_usd }] }

Example
windags_estimate_cost({ nodes: [{ id: 'r', skillIds: ['research-craft'], description: 'Research', dependencies: [] }, { id: 'w', skillIds: ['technical-writer'], description: 'Write', dependencies: ['r'], model: 'sonnet' }], defaultModel: 'haiku' })
Cost: ~10 ms. Local skill body reads + arithmetic. No external calls.
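The nodes input is expected in dependency order, which a planner may not produce naturally. The sketch below shows one way to sort nodes so that every node follows its dependencies, using graphlib from the Python standard library (3.9 or later). The function name is illustrative, and the sketch assumes every ID listed in dependencies also appears as a node.

```python
from graphlib import TopologicalSorter


def in_dependency_order(nodes):
    """Return nodes sorted so each node comes after all of its dependencies.

    Each node is a dict with "id" and "dependencies" keys, matching the
    input shape documented above.
    """
    by_id = {node["id"]: node for node in nodes}
    # TopologicalSorter expects {node: predecessors}, which is exactly
    # the id -> dependencies mapping.
    graph = {node["id"]: set(node.get("dependencies", [])) for node in nodes}
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    return [by_id[node_id] for node_id in TopologicalSorter(graph).static_order()]
```

A cycle raises CycleError, which is a useful early signal that the plan is not a valid DAG.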

Preview

On the roadmap. We're publishing the design now so you can argue with the API surface before it lands. Open an issue at curiositech/windags-skills.

windags_run_skill_script (Preview)

Execute a script bundled with a skill (locally, sandboxed) and return its output.

Many skills carry executable scripts (compliance checks, code generators, lint rules). Today an agent has to inline the script into context to use it. This tool runs the script on the user's machine in a subprocess sandbox (no network unless --remote=allow, time + output caps), returning only stdout/stderr/exit. The agent gets the result without burning context on the script body. Local-only by design — the MCP server is what runs on the user's machine; api.windags.ai never sees the user's files.

Inputs
  • skill_id: string. Skill that owns the script
  • script_path: string. Relative path within the skill, e.g., scripts/lint.py
  • args: string[]. Arguments passed to the script
  • stdin: string? Optional stdin payload
Returns

{ stdout, stderr, exit_code, duration_ms, truncated }

Cost: Local subprocess. Sandboxed: 30s wall clock, 1 MB output cap, no network by default.
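To make the proposed contract concrete, the sketch below runs a command with a wall-clock limit and an output cap and returns the documented { stdout, stderr, exit_code, duration_ms, truncated } shape. It is not the planned implementation: it provides no network isolation or real sandboxing, it buffers all output in memory before truncating, and it counts characters as a stand-in for bytes. The name run_capped is hypothetical.

```python
import subprocess
import sys
import time

TIMEOUT_S = 30          # wall-clock cap stated in this reference
MAX_OUTPUT = 1_000_000  # "1 MB" output cap, interpreted here as 10^6 characters


def _as_text(value):
    """Normalize captured output, which may be None, bytes, or str."""
    if value is None:
        return ""
    return value.decode(errors="replace") if isinstance(value, bytes) else value


def run_capped(argv, stdin=None, timeout=TIMEOUT_S, max_output=MAX_OUTPUT):
    """Run argv with a time limit and return capped stdout/stderr plus metadata."""
    started = time.monotonic()
    try:
        proc = subprocess.run(argv, input=stdin, capture_output=True,
                              text=True, timeout=timeout)
        stdout, stderr, exit_code = proc.stdout, proc.stderr, proc.returncode
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child on timeout; report partial output.
        stdout, stderr, exit_code = _as_text(exc.stdout), _as_text(exc.stderr), None
    truncated = len(stdout) > max_output or len(stderr) > max_output
    return {
        "stdout": stdout[:max_output],
        "stderr": stderr[:max_output],
        "exit_code": exit_code,
        "duration_ms": int((time.monotonic() - started) * 1000),
        "truncated": truncated,
    }


if __name__ == "__main__":
    # Run a trivial child process with the current interpreter.
    print(run_capped([sys.executable, "-c", "print('hello')"])["exit_code"])
```

Returning exit_code as None on timeout is a choice made for this sketch; the reference does not say how a timed-out script is reported.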
windags_list_skill_assets (Preview)

Index the references/, scripts/, templates/, and examples/ directories of a grafted skill.

Graft already returns a per-skill manifest, but this lets the agent re-list assets later in the conversation without re-grafting. Returns paths + sizes + a one-line peek so the agent can decide what's worth pulling via skill_reference. Also surfaces script entry points (scripts/lint.py, scripts/migrate.sh) so the agent knows what's runnable via run_skill_script.

Inputs
  • skill_id: string. Skill to index
  • kind: "references" | "scripts" | "templates" | "examples" | "all" = "all". Filter
Returns

Array of { path, kind, bytes, peek }.

Cost: ~5 ms, local fs walk. Zero API keys.
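The proposed return shape can be illustrated with a short filesystem walk. The sketch below lists files under the four asset directories and reports { path, kind, bytes, peek } for each, taking the first line of the file as the peek. The function name, the skill_dir argument, and the 80-character peek limit are assumptions; the reference only specifies the output fields and the kind filter.

```python
from pathlib import Path

KINDS = ("references", "scripts", "templates", "examples")


def list_assets(skill_dir, kind="all", peek_chars=80):
    """Walk a skill directory and describe each asset file.

    Returns a list of {"path", "kind", "bytes", "peek"} dicts, with paths
    relative to the skill root and the first line of each file as the peek.
    """
    root = Path(skill_dir)
    wanted = KINDS if kind == "all" else (kind,)
    assets = []
    for name in wanted:
        base = root / name
        if not base.is_dir():
            continue  # a skill need not carry every asset directory
        for path in sorted(p for p in base.rglob("*") if p.is_file()):
            with path.open("r", encoding="utf-8", errors="replace") as handle:
                first_line = handle.readline().strip()
            assets.append({
                "path": path.relative_to(root).as_posix(),
                "kind": name,
                "bytes": path.stat().st_size,
                "peek": first_line[:peek_chars],
            })
    return assets
```

Reading only the first line keeps the walk cheap, in line with the goal of letting the agent decide what is worth loading before it spends context.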
windags_pairs_with (Preview)

Graph traversal: given a skill, return the skills it pairs with (frontmatter pairs-with field).

Skills declare their natural partners in YAML frontmatter (`pairs-with: [{skill: data-pipeline-engineer, reason: ...}]`). When the agent realizes a grafted skill needs reinforcement, this tool returns the partners without a fresh search.

Inputs
  • skill_id: string. The skill to traverse from
  • depth: number = 1. 1 = direct partners, 2 = partners-of-partners
Returns

Array of { id, name, reason, distance }.

Cost: ~3 ms, in-process graph walk.
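The depth parameter corresponds to a bounded breadth-first walk over pairs-with edges. The sketch below shows that traversal over an in-memory mapping of skill ID to its frontmatter pairs-with entries. The graph argument and function signature are hypothetical, and the name field from the proposed return shape is omitted because it would require a catalog lookup that this reference does not describe.

```python
from collections import deque


def pairs_with(graph, skill_id, depth=1):
    """Breadth-first walk over pairs-with edges, up to `depth` hops.

    `graph` maps a skill ID to a list of {"skill", "reason"} entries,
    mirroring the frontmatter format. Returns {"id", "reason", "distance"}
    dicts in discovery order, excluding the starting skill.
    """
    seen = {skill_id}
    queue = deque([(skill_id, 0)])
    found = []
    while queue:
        current, distance = queue.popleft()
        if distance == depth:
            continue  # do not expand beyond the requested depth
        for edge in graph.get(current, []):
            partner = edge["skill"]
            if partner in seen:
                continue  # already reported at an equal or shorter distance
            seen.add(partner)
            found.append({"id": partner, "reason": edge["reason"],
                          "distance": distance + 1})
            queue.append((partner, distance + 1))
    return found
```

Tracking visited skills keeps mutual pairings (A pairs with B, B pairs with A) from producing duplicates or looping.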