ENGINEERING SURFACE

What WinDAGs is, in code.

One page. Subsystems with file links, real numbers, what's shipped vs in flight, open questions. If you'd rather read code than copy, start here.

Numbers

Each row is a number I can produce on demand from the repo or a one-liner. No vanity metrics — only ones a reviewer can verify.

478 · Skills in catalog · Hand-written SKILL.md, frontmatter + body + reference manifest
~84k · Lines of TS in core · DAG executor, retrieval, observability, runtimes, types
94+ · Test files · Vitest, colocated with source
8.4 MB · Tool2Vec cache size · 478 × 768d float32 + synthetic task list per skill
~$0.20 · Cache build cost · OpenAI gpt-4o-mini for tasks, text-embedding-3-small for vectors
3.1 min · Cache build time · Concurrency 10, full 478-skill catalog, no failures
~25 ms · Per-query latency (warm) · BM25 + Tool2Vec cosine + RRF, all in-process
$0 · Cold-start cost · Plugin install ships with BM25 ready; cascade stages opt in

Subsystems

Every row links to the file in the repo. Status reflects what's actually running in the bundled plugin and the desktop app today, not what's designed.

DAG executor

SHIPPED

Topologically sorts subtasks into waves, runs each wave's nodes as parallel claude -p processes, propagates aborts via SIGTERM → SIGKILL. Zero token overhead per node.
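The wave scheduling described above is easy to sketch. A minimal version, with hypothetical node shapes (the real executor's types and process handling are richer):

```typescript
// Illustrative sketch of wave scheduling, not the WinDAGs implementation.
type TaskNode = { id: string; deps: string[] };

// Group nodes into waves: wave N holds every node whose dependencies all
// completed in earlier waves (Kahn's algorithm, taken level by level).
// Nodes inside one wave can run as parallel processes.
function computeWaves(nodes: TaskNode[]): string[][] {
  const waves: string[][] = [];
  const done = new Set<string>();
  let remaining = [...nodes];
  while (remaining.length > 0) {
    const ready = remaining.filter(n => n.deps.every(d => done.has(d)));
    if (ready.length === 0) throw new Error("cycle detected in DAG");
    waves.push(ready.map(n => n.id));
    for (const n of ready) done.add(n.id);
    remaining = remaining.filter(n => !done.has(n.id));
  }
  return waves;
}
```

A diamond graph a → (b, c) → d yields three waves, with b and c eligible to run concurrently in the middle one.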

Skill retrieval cascade

SHIPPED

Five stages: BM25 → Tool2Vec → RRF fusion → cross-encoder rerank → attribution k-NN. One scoring path: the DAG executor, MCP server, slash skill, and shortlister all call SkillSearchService.search(). The cascade is what makes the agent's first response look like the one a senior engineer would have written.
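The RRF fusion stage is small enough to show whole. A generic sketch (k = 60 is the conventional constant from the RRF literature; the cascade's actual parameters may differ):

```typescript
// Reciprocal Rank Fusion: merge ranked lists (e.g. BM25 and Tool2Vec)
// without comparing their raw scores. Each document earns 1/(k + rank)
// from every list it appears in; higher totals rank first.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // rank is 1-based: the top hit in a list contributes 1/(k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document ranked well by both retrievers beats one ranked well by only one, which is exactly the property that makes RRF a good bridge between lexical and embedding stages.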

Tool2Vec cache generator

SHIPPED

Pre-computes per-skill task-space vectors. For each skill: ask gpt-4o-mini for 15 diverse task descriptions; embed each via text-embedding-3-small@768d; average; L2-normalize. The provider chain (Anthropic → OpenAI → Gemini) falls through automatically on quota/auth errors. .env.local autoloads up the directory tree.
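The average-then-normalize step is the core of the cache format. A minimal sketch; `meanNormalize` is an illustrative name, not the repo's:

```typescript
// Collapse N task embeddings into one skill vector: element-wise mean,
// then L2-normalize so cosine similarity reduces to a dot product.
function meanNormalize(vectors: number[][]): number[] {
  const dim = vectors[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  const norm = Math.hypot(...mean); // Euclidean length of the mean vector
  return mean.map(x => x / norm);
}
```

Normalizing once at build time is what keeps the ~25 ms warm-query path cheap: scoring a query against 478 unit vectors is 478 dot products.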

MCP server (bundled)

PARTIAL

Self-contained MCP server distributed with the windags-skills plugin. Exposes windags_skill_search (BM25 today; cascade in v2.5.0) and windags_skill_graft (returns full skill bodies + reference manifests). Forwards to api.windags.ai when online; falls back to local BM25 when offline.

Cost gate + human-in-the-loop

SHIPPED

CostCalculator returns a per-node + per-wave estimate before execution. The onCostEstimate callback in DAGExecutorOptions can return false to abort. PhaseOrchestrator forwards the gate to each phase's inner executor.
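A gate that tracks cumulative spend against a budget might look like the following. Hedged sketch: onCostEstimate and DAGExecutorOptions are named above, but the estimate shape and the budget logic here are illustrative, not the shipped API:

```typescript
// Illustrative cost gate. Fields on CostEstimate are assumptions.
type CostEstimate = { perNodeUsd: number[]; waveTotalUsd: number };

// Returns a callback suitable in spirit for onCostEstimate:
// approve a wave only while cumulative spend stays under budget.
function makeCostGate(budgetUsd: number) {
  let spentUsd = 0;
  return (estimate: CostEstimate): boolean => {
    if (spentUsd + estimate.waveTotalUsd > budgetUsd) return false; // abort
    spentUsd += estimate.waveTotalUsd;
    return true; // proceed with the wave
  };
}
```

The point of the shape is that the gate runs before any node in the wave spawns a process, so a rejection costs zero tokens.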

Attribution + skill sharpening

PARTIAL

Five orthogonal signals per graft (completion, accept, downstream rating, self-confidence, shipped) keyed by query embedding in ~/.windags/skill-state.db. Stage 1 (deterministic Floor + Wall + Envelope) runs after every node. Stage 2 (Haiku-based eval) is wired but the haikuCallFn isn't configured by default.

Streaming execution events

SHIPPED

ExecutionEvent → DAGStateEvent via event-mapper. WebSocket server with rooms, buffering, heartbeat, sequence numbers + replay. useDAGStateEvents hook dispatches to a Zustand reducer. Handles node_status_change, wave_transition, cost_update, human_gate_waiting, evaluation, mutation.
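The sequence-number + replay mechanic can be sketched independently of the transport. Hypothetical shapes; the real server adds rooms, buffer limits, and heartbeats:

```typescript
// Illustrative replay buffer: the server stamps every event with a
// monotonically increasing sequence number; a client that reconnects
// with its last-seen seq receives only the events it missed.
type StateEvent = { seq: number; type: string };

class ReplayBuffer {
  private events: StateEvent[] = [];
  private nextSeq = 1;

  push(type: string): StateEvent {
    const event = { seq: this.nextSeq++, type };
    this.events.push(event);
    return event;
  }

  // Everything after the client's last acknowledged sequence number.
  replayAfter(lastSeen: number): StateEvent[] {
    return this.events.filter(e => e.seq > lastSeen);
  }
}
```

Gap detection falls out for free: if a client sees seq 5 arrive after seq 3, it asks for a replay from 3 instead of rendering inconsistent state.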

Topology dispatch

PARTIAL

Seven topologies built: DAG, Team Loop, Swarm, Blackboard, Team Builder, Recurring, Workflow. Core executors exist for all seven. /api/execute today only dispatches DAG; rolling out the others is gated on UI controls.

Workers API (api.windags.ai)

IN FLIGHT

Cloudflare Worker over D1 FTS5. Implements /v1/graft today (server-side BM25, returns full skill bodies). Tool2Vec via KV + Workers AI cross-encoder is the next-release plan — eliminates the 80 MB local model dependency entirely.

What's next

Concrete commitments, not aspirations. Each row corresponds to a specific commit or release.

Surface · What
Plugin v2.5.0 · Bundle Tool2Vec cache; runtime BM25 + Tool2Vec + RRF in mcp-server/index.js
Plugin v2.6.0 · Cross-encoder rerank via Workers AI (eliminates the local-model dependency entirely)
Workers API · Tool2Vec cosine over a KV-backed cache; bge-reranker-base via Workers AI
Bench · Vanilla Sonnet 4.7 vs Sonnet + WinDAGs on a 50-prompt held-out set; report at /engineering/benchmarks
Eval · Haiku-based Stage 2 evaluator; needs a default haikuCallFn wired into createQualityHooks()

Open questions

Things I haven't resolved. If you have an opinion on any of these, I'd actually like to hear it.

  1. Cross-encoder rerank served from Workers AI vs. shipping a signed local ONNX bundle via brew tap — same UX, very different supply-chain story.
  2. Whether attribution k-NN should learn from cross-user data with consent, or stay strictly local. Default is local; the question is whether opt-in pooled signal would change cascade quality enough to matter.
  3. How much the Tool2Vec cache drifts as the catalog evolves. Content-hash invalidation works per-skill; the empirical question is how often skills get edited enough to invalidate.
  4. Whether the bandit framing (deleted) was actually wrong, or whether we just hadn't built enough attribution data to make it work. Current bet: it was a category error. Open to being shown otherwise.

Built by Curiositech. Source available under BSL 1.1. The skill catalog is curiositech/windags-skills; the runtime is erichowens/workgroup-ai.