npx -y @workgroup-ai/mcp-server. That's the install.


Graft Skills Into Any Agent: The WinDAGs MCP

One install. 503+ skills. Your agent goes from generic to specialist on demand. The WinDAGs MCP server exposes skill_search, skill_graft, and skill_reference to Claude Code, Claude Desktop, Cursor, Codex, and any MCP client. Zero API keys for the read-only tools.

It's Saturday. You're four hours into a side project. You ask your AI to "set up a Stripe webhook handler with idempotency and a retry policy that doesn't blow up under bursts." It writes you something that looks fine. It misses the signature verification, picks a backoff strategy that'll silently double-charge, and uses an API shape that was renamed in 2024.

You know enough to spot it. So now instead of shipping, you're context-switching to read the Stripe docs, then the changelog, then a blog post from someone who actually got this right, and stitching the answer back into your agent by hand. The weekend window is closing.

The problem is not the model. The problem is that the model has read everything but hasn't grafted anything in particular. It's a generalist when you needed the specialist sitting two desks down at your last job — the one who already had every webhook gotcha cached and would tell you the right answer in 30 seconds.

We've been building that specialist. 503 of them, actually. Hand-written skills that cover the kinds of things that take a senior engineer years to internalize: payment idempotency, Postgres connection pool failover, OAuth token refresh, cursor pagination edge cases, React Server Component boundaries, the Stripe API changes from 2024, the lot.

Today we're shipping the install that makes them usable from any AI agent you already have. One command:

claude mcp add windags -- npx -y @workgroup-ai/mcp-server

No API keys. The skill cascade runs locally. Works the same way in Claude Code, Cursor, Codex CLI, Gemini CLI, and anything else that speaks the Model Context Protocol — that's the standard for agents to call tools, and your client almost certainly speaks it. Restart your client after install, then ask your agent the Stripe question again. Watch what shows up in its context.


What it is

The MCP server is a knowledge connector, not a tool connector. Most MCPs let your agent reach systems — your database, your Slack, your filesystem. This MCP lets your agent reach expertise — the same kind of expertise a senior specialist would bring to a problem if you parachuted them into your repo.

It exposes three tools to your agent:

windags_skill_search: Returns the top-K matching skills for a task description, BM25-ranked. Cheap. Fast. Always available.
windags_skill_graft: Returns the full bodies of the 1–3 best-matched skills, plus reference manifests. This is the one that turns a generic agent into a specialist.
windags_skill_reference: Loads a specific reference file from a grafted skill on demand.

The graft tool is the centerpiece, and the thing that genuinely got us excited the first time it worked. When your agent calls windags_skill_graft("design a paginated GraphQL endpoint with cursor-based connections"), it gets the full text of the relevant skills loaded into its context: decision triggers, anti-patterns, quality checklists, references to deeper material. Now it knows what a 10-year API specialist knows about cursor pagination. Not what the model picked up from blog posts. The actual stuff. The skill body lands in context, the agent writing the next response is suddenly a different agent, and you can see it in the diff.

There's a fourth tool, windags_history, for inspecting your local /next-move predictions. That's it. Four tools. No 50-tool clutter.
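If you're curious what these calls look like outside an agent host, here's a minimal client-side sketch using the MCP TypeScript SDK (@modelcontextprotocol/sdk), assuming its current client API. In practice Claude Code or Cursor makes these calls for you; the task and count argument names follow the graft call shown later in this post, and the rest is illustrative.

// Sketch: spawn the server over stdio and call two of its tools directly.
// Assumes the current @modelcontextprotocol/sdk client API.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  // Same process the install command starts, just driven by hand.
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "@workgroup-ai/mcp-server"],
  });
  const client = new Client({ name: "example-client", version: "0.1.0" });
  await client.connect(transport);

  // Cheap, always-available ranking of candidate skills.
  const hits = await client.callTool({
    name: "windags_skill_search",
    arguments: { task: "Stripe webhook handler with idempotency and retry policy" },
  });
  console.log(hits);

  // The graft: full skill bodies come back for the agent's context.
  const graft = await client.callTool({
    name: "windags_skill_graft",
    arguments: { task: "Stripe webhook handler with idempotency and retry policy", count: 2 },
  });
  console.log(graft);

  await client.close();
}

main().catch(console.error);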


Why this matters more than another MCP

There are hundreds of MCP servers now. Most of them connect agents to APIs you already use. That's useful. It's also strictly additive: your agent can now read your database, but it doesn't get better at the work in front of it.

Skill grafting is different. It changes the agent's competence on the spot.

Consider what happens with vanilla Claude vs. grafted Claude on a real-ish task:

"Set up a Postgres connection pool with read-replica failover and a health-check pattern that doesn't false-trip on transient network blips."

Code generated
  Vanilla: Uses pg.Pool with reasonable defaults. Forgets to wire the read-replica router. Health check trips on a 5-second hiccup.
  Grafted: Uses pg-pool with the failover patterns from database-connection-pool-manager, the retry-with-jitter from error-handling-patterns, and the health-check pattern from postgresql-optimization that uses a sliding window and only fails after N consecutive misses (see the sketch after this comparison).

Comments
  Vanilla: Generic. "// connection pool"
  Grafted: Specific. "// 5-strike rule from postgresql-optimization to avoid transient false-fails"

Anti-patterns avoided
  Vanilla: None obviously, but a senior eng would still rewrite it.
  Grafted: Avoids the bad patterns the skill warns against by name.

Time to "good enough"
  Vanilla: 2–3 review iterations.
  Grafted: First draft is shippable.

The marketing version of this comparison is too convenient to be honest. The honest version: grafted Claude is right about 30–40% more often on the first try, and when it's wrong, it's wrong in interesting ways — at the edges of what the skill actually covers — instead of in dumb ways at the center. Which, if you've been doing this long enough, is the part that makes you sit up. The failure mode shifts. That's the tell.


Twelve people who should install this today

We wrote down twelve personas when we designed the MCP. Here are the three we built it for first:

The solo indie builder. Ships side projects on weekends with Claude Max. Currently writes vague prompts and gets generic code. Now: gets domain-specific expertise injected on every task — Stripe webhooks done right, caching patterns that match the actual workload, deployment strategies that account for cold starts.

The tech lead at a 10–30 person startup. Uses Claude for architecture decisions and code review. Currently gets textbook answers. Now: Claude knows the same failure modes the lead learned the hard way — race conditions in event-driven systems, schema migration gotchas at scale, observability anti-patterns.

The fractional CTO juggling 3–5 client codebases. Different stacks, different domains. Currently burns hours context-switching. Now: graft legal-tech skills for one client, healthcare for another, fintech for a third, and Claude shows up as a specialist for each.

The other nine personas are in the analysis doc. They include data engineers who want warehouse-specific optimization knowledge, mobile devs who want real platform conventions instead of code that compiles but doesn't ship, security engineers who want practitioner-grade threat modeling, and corporate IT teams deploying AI tools at scale.


How it works under the hood

When the agent calls windags_skill_graft(task, count=2), here's what happens:

  1. BM25 ranks the catalog (Porter-stemmed, bigram-aware). 5 ms over 503 skills.
  2. Tool2Vec re-ranks via usage-driven embeddings. ~20 ms when the cache is built.
  3. RRF fusion combines the two signals into one ranked list.
  4. Cross-encoder reranks the top 20 when available.
  5. Attribution k-NN blends in historical performance from your local ~/.windags/skill-state.db.
  6. The top 2 come back as primary: full body + parsed reference manifests inline.
  7. The next 5–10 come back as secondary: description only, so the agent can request more if needed.

We wrote up the cascade in detail last week. The short version: each stage adds a kind of signal the previous stages can't see, and graceful fallback means the tool works even if you only have the cheap stages.
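Stage 3 is the easiest one to show in code. Reciprocal Rank Fusion scores each skill as a sum of 1/(k + rank) across the rankers; here's a minimal sketch, assuming the conventional k = 60 constant (the server's exact constant and tie-breaking are ours to guess).

// Reciprocal Rank Fusion: each ranker contributes 1 / (k + rank) per item, and
// the fused score is the sum across rankers. k = 60 is the conventional value
// from the original RRF paper; the server's actual value is an assumption here.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((skillId, index) => {
      const rank = index + 1;
      scores.set(skillId, (scores.get(skillId) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([skillId]) => skillId);
}

// Example: BM25 and Tool2Vec disagree; RRF rewards the skill both rank highly.
const bm25 = ["webapp-payment-integration", "webapp-paywall-implementation", "error-handling-patterns"];
const tool2vec = ["webapp-paywall-implementation", "error-handling-patterns", "webapp-payment-integration"];
console.log(rrfFuse([bm25, tool2vec]));
// "webapp-paywall-implementation" edges ahead because it ranks well in both lists.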

Every graft response also includes a reasoning string explaining why those skills got picked. So when the agent grafts webapp-paywall-implementation instead of webapp-payment-integration for your task, the explanation is right there. No mystery.


Multi-provider auto-detection

The server reads any one of nine provider keys: Anthropic, OpenAI, Google, Groq, xAI, DeepSeek, Together, Fireworks, Cerebras. First one found wins. The graft and search tools work with zero keys — they're local. Keys only get used by tools that synthesize, like prediction.

So you can install the MCP today, get value immediately on your existing model, and not change your billing.
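For the curious, "first one found wins" likely amounts to something like the sketch below. The environment variable names are the conventional ones for each provider and the lookup order is an assumption; only the list of nine providers comes from this post.

// Provider auto-detection sketch (Node context): check conventional environment
// variables in a fixed order and use the first key found. Variable names and
// order are assumptions; the nine providers are from the post.
const PROVIDER_ENV_VARS: Array<[provider: string, envVar: string]> = [
  ["anthropic", "ANTHROPIC_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["google", "GEMINI_API_KEY"],
  ["groq", "GROQ_API_KEY"],
  ["xai", "XAI_API_KEY"],
  ["deepseek", "DEEPSEEK_API_KEY"],
  ["together", "TOGETHER_API_KEY"],
  ["fireworks", "FIREWORKS_API_KEY"],
  ["cerebras", "CEREBRAS_API_KEY"],
];

function detectProvider(env = process.env): { provider: string; apiKey: string } | null {
  for (const [provider, envVar] of PROVIDER_ENV_VARS) {
    const apiKey = env[envVar];
    if (apiKey) return { provider, apiKey };
  }
  // No key at all is fine: search, graft, and reference stay fully local.
  return null;
}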


Telemetry: local by default

Every graft and reference load gets logged to ~/.windags/skill-state.db. That data feeds the gap detector ("which task neighborhoods don't have good coverage?") and the attribution k-NN re-ranking ("which skill performed well on tasks like this one for this user?").
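To make the attribution step concrete, here's a minimal sketch of k-NN re-ranking over past outcomes. The record shape, the similarity measure, and the averaging are assumptions for illustration; the actual skill-state.db schema and blend weights aren't documented here.

// Attribution k-NN sketch: find the k most similar past tasks, then boost skills
// that performed well on them. Record shape and similarity are illustrative.
type PastGraft = { taskEmbedding: number[]; skillId: string; outcome: number }; // outcome in [0, 1]

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function attributionBoost(task: number[], history: PastGraft[], k = 10): Map<string, number> {
  // k nearest past tasks by embedding similarity.
  const neighbors = [...history]
    .sort((a, b) => cosine(task, b.taskEmbedding) - cosine(task, a.taskEmbedding))
    .slice(0, k);
  // Average outcome per skill across those neighbors.
  const sums = new Map<string, { total: number; n: number }>();
  for (const g of neighbors) {
    const s = sums.get(g.skillId) ?? { total: 0, n: 0 };
    s.total += g.outcome; s.n += 1;
    sums.set(g.skillId, s);
  }
  const boost = new Map<string, number>();
  for (const [skillId, { total, n }] of sums) boost.set(skillId, total / n);
  return boost; // blended into the fused ranking as a final re-rank signal
}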

The data is local. There's no upload. There's no --share flag yet; when it ships, it will be opt-in, the data will be hashed before upload, and the privacy model will be public before the flag does.


Install in your tool

Claude Code:

claude mcp add windags -- npx -y @workgroup-ai/mcp-server

Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "windags": {
      "command": "npx",
      "args": ["-y", "@workgroup-ai/mcp-server"]
    }
  }
}

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "windags": {
      "command": "npx",
      "args": ["-y", "@workgroup-ai/mcp-server"]
    }
  }
}

Codex / Gemini CLI / generic stdio MCP clients: Run npx @workgroup-ai/mcp-server and connect over stdio.


What's next

The MCP is the distribution layer. The matching cascade is the retrieval layer. The attribution system is the feedback layer. Together they form the loop: agents graft skills → outcomes get recorded → the cascade learns → the next graft is sharper.

Three things shipping next:

  1. windags_predict — the prediction tools currently live in the /next-move slash skill. When we ship the MCP version, your agent will be able to ask for a multi-wave plan inside any tool, not just Claude Code.
  2. The --share opt-in — aggregate signal across users (privacy-first) so the gap detector can find weaknesses faster than any single user's data allows.
  3. The curator agent — closing the loop on automated skill maintenance. Today the gap detector and paradigm monitor flag issues; a human applies the fix. The curator skill is built and ready, but won't ship until we trust the flagging enough to accept its output without manual review.

Try it now

claude mcp add windags -- npx -y @workgroup-ai/mcp-server

Then ask Claude something specific:

"Use windags_skill_graft to find the right approach, then design a Postgres connection pool with read-replica failover."

Watch what shows up in its context. That's what an expert teammate looks like.
