Taxonomy/Insights

Insights — what the taxonomy is actually telling us

Personal commentary after looking at all 559 skills, the embeddings, and the declared metadata. The numbers in the other reports are factual; this doc is editorial. Treat as a punch list.

TL;DR

The skill catalog is a mature corpus that's outgrown its filing cabinet. The skills themselves are well-written — bodies are thoughtful, references exist, quality is solid. The problems are organizational: too many categories, weak reciprocal pairing, a tag space with no controlled vocabulary, and ~22 skill IDs that were silently truncated to 40 chars and collide visually even when they don't collide as identifiers.

The good news is that almost everything below is fixable with one or two background agents that already half-exist in pd-fleet.yml.

Top 8 things I noticed

1. The category system is broken

60 categories for 559 skills. 32 are sparse (< 3 members). 39 skills are uncategorized. Worse, several of the giant categories have near-twins:

| Category A | Category B | Centroid cosine |
|---|---|---|
| Design & Creative | Design & UX | 0.954 |
| Agent & Orchestration | DAG Framework | 0.950 |
| Research & Academic | AI & Agents | 0.946 |
| Cognitive Science & Decision Making | (uncategorized) | 0.948 |
| DevOps & Infrastructure | DevOps & Site Reliability | 0.895 |
| Backend & Infrastructure | DevOps & Site Reliability | 0.902 |

The (uncategorized) cluster is itself coherent at 0.95 with Cognitive Science & Decision Making — meaning many uncategorized skills have an obvious home but never got assigned one.

Recommendation: define a controlled vocabulary of ~15–18 canonical categories, then run a one-shot Haiku reclassification pass with the embedding neighborhoods as the prior. The taxonomy schema should live in a small file (skills/_taxonomy.yaml or similar) so the validator can lint against it.
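To make the shape concrete, here is a minimal sketch of what skills/_taxonomy.yaml could look like. Every field name and category ID below is an illustrative assumption, not a decided schema:

```yaml
# Hypothetical shape for skills/_taxonomy.yaml -- names are illustrative.
version: 1
categories:
  - id: design-ux
    label: Design & UX
    absorbs: ["Design & Creative"]            # near-twin at 0.954
  - id: devops-sre
    label: DevOps & Site Reliability
    absorbs: ["DevOps & Infrastructure", "Backend & Infrastructure"]
  - id: ai-agents
    label: AI & Agents
    absorbs: ["Research & Academic", "Agent & Orchestration"]
  # ... ~15-18 canonical entries total
```

An `absorbs` list like this doubles as the migration map: the validator can reject any category that is neither canonical nor absorbed, and the reclassification pass can use it to rewrite legacy values mechanically.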

2. We have literal duplicates of papers

| Cluster | Skills |
|---|---|
| Knowledge distillation | knowledge-distillation-a-survey, knowledge-distillation-deep, knowledge-distillation-in-deep-learning, knowledge-distillation-survey (4 skills, all > 0.94 cosine to each other) |
| Rao & Georgeff BDI | rao-georgeff-1991-modeling-rational-agents-bdi, rao-georgeff-1995-bdi-agents-from-theory-to-practice, agentspeak-bdi |
| Wooldridge MAS | ai-wiley-wooldridge-an-introduction-to-multi-agent-systems, wooldridge-multiagent-intro (0.95) |
| Hypertree planning | chen-et-al-2025-hypertree-planning, hypertree-planning (0.95) |
| Refactoring | refactor-architect, refactoring-surgeon, script-refactorer |
| Cognitive systems engineering | the-state-of-cognitive-systems-engineeri, the-role-of-cognitive-systems-engineerin (0.945, note the truncated names) |

For the bibliography ones (Wooldridge, Rao & Georgeff, Knowledge Distillation), this looks like the result of multiple ingestion passes that didn't dedupe. For windags-architect vs windags-architect-v1 (cosine 0.94) it's clearly a versioning leak that should be one canonical skill.
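The near-dup check proposed later in this doc is small enough to sketch here. This is a hedged sketch, not a shipped detector: it assumes embeddings are plain number arrays and reuses the 0.9 cosine threshold suggested for the fleet agent.

```typescript
// Cosine similarity over two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Flag every catalog skill whose embedding sits above the dup threshold
// for a candidate skill's embedding. Names here are illustrative.
function nearDups(
  candidate: number[],
  catalog: Map<string, number[]>,
  threshold = 0.9,
): string[] {
  return [...catalog.entries()]
    .filter(([, vec]) => cosine(candidate, vec) > threshold)
    .map(([id]) => id);
}
```

Run on ingestion, this would have caught all six clusters above before they landed.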

3. ~22 skill IDs are truncated at exactly 40 chars

There's a slug-length cap somewhere in the ingestion pipeline that stops at 40 characters, often mid-word:

the-state-of-cognitive-systems-engineeri    ← engineering
the-role-of-cognitive-systems-engineerin    ← engineering
mastering-the-game-of-go-with-deep-neura    ← neural
fast-and-practical-dag-decomposition-wit    ← with…
causal-reasoning-initial-report-of-a-nat    ← naturalistic

These IDs collide visually (the first two are different papers), are hard to remember, and break IDE auto-complete heuristics. Four of them are also uncategorized — likely because the importer choked when it saw the malformed slug.

Fix: raise the cap to ~64, normalize, and have the next ingestion pass canonicalize the existing 22.
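A sketch of what the fixed slugifier could look like, assuming the pipeline does a simple lowercase-and-dash pass (the function name and exact normalization rules are assumptions):

```typescript
// Hypothetical replacement slugifier: 64-char cap, and never cut mid-word.
function slugify(title: string, maxLen = 64): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")   // collapse non-alphanumerics to dashes
    .replace(/^-+|-+$/g, "");      // trim leading/trailing dashes
  if (slug.length <= maxLen) return slug;
  let cut = slug.slice(0, maxLen);
  // If the cap landed mid-word, back off to the last complete word.
  if (slug[maxLen] !== "-") {
    const lastDash = cut.lastIndexOf("-");
    if (lastDash > 0) cut = cut.slice(0, lastDash);
  }
  return cut.replace(/-+$/, "");
}
```

With a 64-char cap, all five examples above fit without truncation, so the word-boundary fallback only matters for genuinely long paper titles.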

4. pairs-with reciprocity is 34%

Of 958 declarations, only 163 pairs are mutual (those pairs account for 326 declarations, hence the 34% headline). That means when skill A says "I work well with B", 66% of the time B doesn't say anything back. The recommender already uses this graph; one-way edges silently weaken every "what else should I run with this?" suggestion.

The existing pairs-with-validator agent in pd-fleet.yml only checks for broken references (5 found — see 02-pairs-with.md). It should be extended (or given a sibling agent) to propose the missing reciprocations — usually a one-line edit to the partner skill.
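A sketch of that sibling agent's core computation, assuming the declarations are loaded into a map keyed by skill ID (names illustrative):

```typescript
// For each declared A -> B edge, report it when B does not declare A back.
// Each entry in the result is a reciprocation the agent would propose.
function missingReciprocations(
  pairsWith: Map<string, string[]>,
): Array<{ from: string; to: string }> {
  const missing: Array<{ from: string; to: string }> = [];
  for (const [from, targets] of pairsWith) {
    for (const to of targets) {
      const back = pairsWith.get(to) ?? [];
      if (!back.includes(from)) missing.push({ from, to });
    }
  }
  return missing;
}
```

Each result row maps directly to one proposed one-line edit in the partner skill's frontmatter.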

5. 157 orphans (28%)

These skills have zero in-pairs and zero out-pairs. The recommender has nothing to suggest alongside them. A handful of these are clearly meant to be leaf skills (port-daddy-marketing-copy, init, review), but most are just skills nobody wired up.

The cluster that surprised me most: a chunk of the academic bibliography is orphaned. Papers like klein-1998-sources-of-power, polya-problem-solving, mythical-man-month have no pairs-with in or out — even though their applied counterparts do.

6. The tag space is feral

1799 unique tags for 559 skills. ~3.2 tags per skill on average, with a heavy long tail of singletons. There's no controlled vocabulary, no synonym map (e.g. agent vs agents, orchestration vs orchestrator), no parent tags. This makes tag-based filtering on the website mostly useless.

Solution: a tags.yaml controlled vocabulary plus a normalizer that runs on ingestion. ~30–60 canonical tags, with synonym aliases. The 1799 → ~50 collapse will probably also reveal natural sub-categories.
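A sketch of the normalizer's core, with an illustrative synonym map (the real tags.yaml would carry the full alias list):

```typescript
// Hypothetical alias map -- contents illustrative, would live in tags.yaml.
const TAG_ALIASES: Record<string, string> = {
  agents: "agent",
  "agent-systems": "agent",
  orchestrator: "orchestration",
};

// Lowercase, collapse whitespace, apply aliases, naively singularize.
function normalizeTag(raw: string): string {
  let tag = raw.trim().toLowerCase().replace(/\s+/g, "-");
  const direct = TAG_ALIASES[tag];
  if (direct) return direct;
  // Naive singularization; a real pass would want a proper inflector.
  if (tag.endsWith("s") && !tag.endsWith("ss")) tag = tag.slice(0, -1);
  return TAG_ALIASES[tag] ?? tag;
}
```

Note the second alias lookup after singularizing, so "orchestrators" still lands on the canonical "orchestration".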

7. The hubs are not who I'd have guessed

Top-degree nodes by pairs-with:

deg=25  web-design-expert
deg=21  task-decomposer
deg=20  devops-automator
deg=16  interview-loop-strategist
deg=15  ai-engineer
deg=14  interview-simulator
deg=13  color-theory-palette-harmony-expert
deg=13  technical-writer

web-design-expert is the most connected skill in the entire library. That's interesting — it's also a strong piece of evidence that the design sub-graph is the most internally consolidated, which matches the recent warm-palette migration project.

What's not there is striking: none of the WinDAGs DAG-framework meta-skills (dag-planner, dag-graph-builder, dag-orchestrator, windags-architect) are in the top 20. For a project where DAG composition is the marquee capability, that suggests the DAG family talks to itself but hasn't been wired into the rest of the catalog yet.

8. Tool2Vec cache: stale and the wrong dimensions

The cache at ~/.windags/tool2vec-embeddings.json:

  • Last updated 2026-04-29 — 6 days stale
  • Contains 547/559 skills (12 missing, including ideal-web-app-builder, ultimate-seo-geo-skill, port-daddy-marketing-copy, find-skills, imagegen, openai-docs, plugin-creator, skill-installer, ui-ux-pro-max, plus three port-daddy-related)
  • Embeddings are 1536-dimensional, but LocalEmbeddingService produces 384-dim all-MiniLM-L6-v2 vectors. Whoever built the cache used a different embedder (Voyage / OpenAI ada-style). New embeddings won't be comparable.

The tool2vec-refresher fleet agent (pd-fleet.yml, schedule 0 3 * * *) should be catching these. Either it's not firing or it's failing silently.
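Whatever the root cause, a cheap guard is a dimension healthcheck that fails loudly instead of silently. A sketch, assuming the cache maps skill IDs to raw vectors (the interface and constant names are assumptions):

```typescript
// Expected dimensionality of all-MiniLM-L6-v2 vectors from the local service.
const EXPECTED_DIM = 384;

// Assumed cache shape: skill ID -> embedding vector.
interface Tool2VecCache {
  [skillId: string]: number[];
}

// Return every skill whose cached vector has the wrong dimensionality.
// A CI step would fail the build if this list is non-empty.
function checkCacheDims(cache: Tool2VecCache, expected = EXPECTED_DIM): string[] {
  return Object.entries(cache)
    .filter(([, vec]) => vec.length !== expected)
    .map(([id]) => id);
}
```

Run against the current cache, this would flag all 547 entries at once, which is exactly the loud failure we want before any rebuild.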

Background fleet agents — what to add

pd-fleet.yml has good bones. Here's what I'd add or fix:

| Agent | Status | What to do |
|---|---|---|
| tool2vec-refresher | exists, broken or stale | Add a healthcheck note + alert. 12 skills missing, cache 6 days old. |
| Embedder dimension lock | missing | Pick one embedding service (recommend MiniLM 384-dim, free + local). Add a CI check that the cache dim matches LocalEmbeddingService.DIMENSIONS. |
| pairs-with-validator | exists, validates only | Extend to propose reciprocations when A→B exists but B→A doesn't; write a .skill-suggestions/<skill>-add-pairs.md per skill. |
| skill-id-canary | missing | Flag any skill with len(id) == 40 (truncation suspect). 22 cases right now. |
| near-dup-detector | missing | Run on commit. Compute the embedding and flag if cosine > 0.9 to anything in the catalog. Auto-comment on the commit asking "should this merge into X?". |
| category-normalizer | missing | Maintain skills/_taxonomy.yaml (canonical categories); reject SKILL.md frontmatter that uses a non-canonical category. Suggest the closest canonical when violated. |
| tag-normalizer | missing | Same idea but for tags. Maintain a synonym map and singularize/lowercase. Right now we have agent, agents, and agent-systems as separate tags. |
| orphan-suggester | missing | For each orphan, propose 3 candidate pairs-with from semantic neighbors with cosine > 0.6. The skill author can accept or reject. |
| skill-cartographer | exists, probably fine | Already counts skill growth — extend to report category-balance health (sparse/giant ratio over time). |

Critical first move: fix the dimension mismatch. Until then, half the recommender stack is operating on stale 1536-dim vectors, and any local re-embedding produces 384-dim vectors that can't be compared against them. Pick MiniLM (it's already shipped with the local service) and rebuild.

What I want to do next (if you say yes)

  1. Build skills/_taxonomy.yaml with ~16 canonical categories.
  2. One-shot Haiku reclassification of all 39 uncategorized + 32 sparse-category skills.
  3. Merge the 4-skill knowledge-distillation cluster into one canonical skill.
  4. Fix the 22 truncated IDs (rename + leave forwarding aliases for back-compat).
  5. Wire the missing fleet agents above as YAML in pd-fleet.yml.
  6. Re-run this analysis monthly; track the orphan/sparse/reciprocity numbers as headline health metrics.

Reproducing this analysis

```bash
# 1. scan SKILL.md frontmatter
npx tsx scripts/skill-taxonomy/scan.ts

# 2. compute stats + similarity from cached tool2vec
npx tsx scripts/skill-taxonomy/analyze.ts

# 3. render markdown reports
npx tsx scripts/skill-taxonomy/render-mermaid.ts

# 4. render interactive HTML explorer
npx tsx scripts/skill-taxonomy/render-explorer.ts

# Open in Typora to see mermaid:
open docs/skill-taxonomy/00-overview.md

# Open the explorer:
open docs/skill-taxonomy/explorer.html
```