# Insights — what the taxonomy is actually telling us
Personal commentary after looking at all 559 skills, the embeddings, and the declared metadata. The numbers in the other reports are factual; this doc is editorial. Treat as a punch list.
## TL;DR
The skill catalog is a mature corpus that's outgrown its filing cabinet. The skills themselves are well-written — bodies are thoughtful, references exist, quality is solid. The problems are organizational: too many categories, weak reciprocal pairing, a tag space with no controlled vocabulary, and ~22 skill IDs that were silently truncated to 40 chars and collide visually even when they don't collide as identifiers.
The good news is that almost everything below is fixable with one or two background agents that already half-exist in `pd-fleet.yml`.
## Top 8 things I noticed

### 1. The category system is broken
60 categories for 559 skills. 32 are sparse (< 3 members). 39 skills are uncategorized. Worse, several of the giant categories have near-twins:
| Pair | Centroid cosine |
|---|---|
| Design & Creative ⟷ Design & UX | 0.954 |
| Agent & Orchestration ⟷ DAG Framework | 0.950 |
| Research & Academic ⟷ AI & Agents | 0.946 |
| Cognitive Science & Decision Making ⟷ (uncategorized) | 0.948 |
| DevOps & Infrastructure ⟷ DevOps & Site Reliability | 0.895 |
| Backend & Infrastructure ⟷ DevOps & Site Reliability | 0.902 |
The (uncategorized) cluster is itself coherent at 0.95 with Cognitive Science & Decision Making — meaning many uncategorized skills have an obvious
home but never got assigned one.
Recommendation: define a controlled vocabulary of ~15–18 canonical categories, then run a one-shot Haiku reclassification pass with the embedding neighborhoods as the prior. The taxonomy schema should live in a small file (`skills/_taxonomy.yaml` or similar) so the validator can lint against it.
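As a rough sketch of that lint: validate each skill's frontmatter category against the canonical list and suggest the closest one when it doesn't match. The category names and the `lintCategory`/`levenshtein` helpers below are illustrative, not existing code; only the idea of linting against `skills/_taxonomy.yaml` comes from the recommendation above.

```typescript
// Illustrative canonical list; the real one would live in skills/_taxonomy.yaml.
const CANONICAL = [
  "Design & UX",
  "DevOps & Site Reliability",
  "Agent & Orchestration",
  // ...and ~15 more
];

// Cheap edit distance, good enough for "did you mean ...?" suggestions.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [
    i,
    ...new Array<number>(b.length).fill(0),
  ]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Accept a canonical category as-is; otherwise suggest the closest one.
function lintCategory(category: string): { ok: boolean; suggestion?: string } {
  if (CANONICAL.includes(category)) return { ok: true };
  const closest = [...CANONICAL].sort(
    (x, y) => levenshtein(category, x) - levenshtein(category, y),
  )[0];
  return { ok: false, suggestion: closest };
}
```

With this in the validator, a skill declaring `Design & Creative` would be rejected with `Design & UX` as the suggested canonical.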
### 2. We have literal duplicates of papers
| Cluster | Skills |
|---|---|
| Knowledge distillation | knowledge-distillation-a-survey, knowledge-distillation-deep, knowledge-distillation-in-deep-learning, knowledge-distillation-survey (4 skills, all > 0.94 cosine to each other) |
| Rao & Georgeff BDI | rao-georgeff-1991-modeling-rational-agents-bdi, rao-georgeff-1995-bdi-agents-from-theory-to-practice, agentspeak-bdi |
| Wooldridge MAS | ai-wiley-wooldridge-an-introduction-to-multi-agent-systems, wooldridge-multiagent-intro (0.95) |
| Hypertree planning | chen-et-al-2025-hypertree-planning, hypertree-planning (0.95) |
| Refactoring | refactor-architect, refactoring-surgeon, script-refactorer |
| Cognitive systems engineering | the-state-of-cognitive-systems-engineeri, the-role-of-cognitive-systems-engineerin (0.945, note the truncated names) |
For the bibliography ones (Wooldridge, Rao & Georgeff, Knowledge Distillation), this looks like the result of multiple ingestion passes that didn't dedupe. For `windags-architect` vs `windags-architect-v1` (cosine 0.94) it's clearly a versioning leak that should be one canonical skill.
### 3. ~22 skill IDs are truncated at exactly 40 chars
There's a slug-length cap somewhere in the ingestion pipeline that stops at 40 characters, often mid-word:
```
the-state-of-cognitive-systems-engineeri   ← engineering
the-role-of-cognitive-systems-engineerin   ← engineering
mastering-the-game-of-go-with-deep-neura   ← neural
fast-and-practical-dag-decomposition-wit   ← with…
causal-reasoning-initial-report-of-a-nat   ← naturalistic
```
These IDs collide visually (the first two are different papers), are hard to remember, and break IDE auto-complete heuristics. Four of them are also uncategorized — likely because the importer choked when it saw the malformed slug.
Fix: raise the cap to ~64, normalize, and have the next ingestion pass canonicalize the existing 22.
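Both halves of that fix are small. A sketch, assuming a canary on `len(id) == 40` and a word-boundary-aware slugger with the new 64-char cap (the function names are hypothetical):

```typescript
const OLD_CAP = 40; // the current ingestion pipeline's slug-length cap
const NEW_CAP = 64; // proposed replacement

// Any id sitting exactly at the old cap was probably cut mid-word.
function isTruncationSuspect(id: string): boolean {
  return id.length === OLD_CAP;
}

// Slugify a title, cutting at the last word boundary before the cap
// instead of mid-word.
function slugify(title: string, cap = NEW_CAP): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (slug.length <= cap) return slug;
  const cut = slug.slice(0, cap);
  const lastDash = cut.lastIndexOf("-");
  return lastDash > 0 ? cut.slice(0, lastDash) : cut;
}
```

Run against the AlphaGo paper title with the old 40-char cap, the word-boundary version yields `mastering-the-game-of-go-with-deep` rather than the `...-neura` stub in the catalog today.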
### 4. pairs-with reciprocity is 34%

Of 958 pairs-with declarations, 163 pairs are mutual, which accounts for 326 declarations (34%). In other words, when skill A says "I work well with B", 66% of the time B doesn't say anything back. The recommender already uses this graph; one-way edges silently weaken every "what else should I run with this?" suggestion.
The existing `pairs-with-validator` agent in `pd-fleet.yml` only checks for broken references (5 found; see 02-pairs-with.md). It should be extended (or given a sibling agent) to propose the missing reciprocations, which are usually a one-line edit to the partner skill.
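Proposing the reciprocations is a small graph walk. A sketch, assuming the declared pairs-with edges are available as a skill-id → partner-ids map (the shape and function name are assumptions):

```typescript
// pairs-with edges as declared in SKILL.md frontmatter (shape assumed).
type PairsGraph = Record<string, string[]>;

// Every A→B with no matching B→A; each hit is [skillToEdit, partnerToAdd],
// i.e. the one-line edit the sibling agent would propose.
function missingReciprocations(graph: PairsGraph): Array<[string, string]> {
  const missing: Array<[string, string]> = [];
  for (const [a, partners] of Object.entries(graph)) {
    for (const b of partners) {
      if (!(graph[b] ?? []).includes(a)) missing.push([b, a]);
    }
  }
  return missing;
}
```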
### 5. 157 orphans (28%)
These skills have zero in-pairs and zero out-pairs. The recommender has nothing to suggest alongside them. A handful of these are clearly meant to be leaf skills (`port-daddy-marketing-copy`, `init`, `review`), but most are just skills nobody wired up.

The cluster that surprised me most: a chunk of the academic bibliography is orphaned. Papers like `klein-1998-sources-of-power`, `polya-problem-solving`, and `mythical-man-month` have no pairs-with in or out — even though their applied counterparts do.
### 6. The tag space is feral
1799 unique tags for 559 skills. ~3.2 tags per skill on average, with a heavy long tail of singletons. There's no controlled vocabulary, no synonym map (e.g. `agent` vs `agents`, `orchestration` vs `orchestrator`), no parent tags. This makes tag-based filtering on the website mostly useless.
Solution: a `tags.yaml` controlled vocabulary plus a normalizer that runs on ingestion. ~30–60 canonical tags, with synonym aliases. The 1799 → ~50 collapse will probably also reveal natural sub-categories.
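A sketch of the normalizer core, with an illustrative synonym map standing in for the proposed `tags.yaml` (the map entries and function names are assumptions):

```typescript
// Illustrative synonym aliases; the real map would live in tags.yaml.
const SYNONYMS: Record<string, string> = {
  agents: "agent",
  "agent-systems": "agent",
  orchestrator: "orchestration",
};

// Lowercase and hyphenate, then collapse synonyms onto their canonical tag.
function normalizeTag(raw: string): string {
  const tag = raw.trim().toLowerCase().replace(/\s+/g, "-");
  return SYNONYMS[tag] ?? tag;
}

// Normalize a skill's tag list and drop duplicates created by the collapse.
function normalizeTags(tags: string[]): string[] {
  return [...new Set(tags.map(normalizeTag))];
}
```

For example, `["Agents", "agent", "orchestrator", "Orchestration"]` collapses to just two canonical tags, `agent` and `orchestration`.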
### 7. The hubs are not who I'd have guessed
Top-degree nodes by pairs-with:

```
deg=25  web-design-expert
deg=21  task-decomposer
deg=20  devops-automator
deg=16  interview-loop-strategist
deg=15  ai-engineer
deg=14  interview-simulator
deg=13  color-theory-palette-harmony-expert
deg=13  technical-writer
```
`web-design-expert` is the most connected skill in the entire library. That's interesting — it's also a strong piece of evidence that the design sub-graph is the most internally consolidated, which matches the recent warm-palette migration project.
What's not there is striking: none of the WinDAGs DAG-framework meta-skills (`dag-planner`, `dag-graph-builder`, `dag-orchestrator`, `windags-architect`) are in the top 20. For a project where DAG composition is the marquee capability, that suggests the DAG family talks to itself but hasn't been wired into the rest of the catalog yet.
### 8. Tool2Vec cache: stale and the wrong dimensions

The cache at `~/.windags/tool2vec-embeddings.json`:

- Last updated 2026-04-29 — 6 days stale.
- Contains 547/559 skills. The 12 missing include `ideal-web-app-builder`, `ultimate-seo-geo-skill`, `port-daddy-marketing-copy`, `find-skills`, `imagegen`, `openai-docs`, `plugin-creator`, `skill-installer`, and `ui-ux-pro-max`, plus three port-daddy-related skills.
- Embeddings are 1536-dimensional, but `LocalEmbeddingService` produces 384-dim all-MiniLM-L6-v2 vectors. Whoever built the cache used a different embedder (Voyage / OpenAI ada-style), so new embeddings won't be comparable.

The `tool2vec-refresher` fleet agent (`pd-fleet.yml`, schedule `0 3 * * *`) should be catching these. Either it's not firing or it's failing silently.
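The dimension lock is cheap to sketch. Assuming the cache deserializes to a skill-id → vector map (the shape is an assumption; 384 comes from all-MiniLM-L6-v2, and in CI the constant would be read from `LocalEmbeddingService.DIMENSIONS` rather than hardcoded):

```typescript
// Expected vector width for all-MiniLM-L6-v2.
const EXPECTED_DIMENSIONS = 384;

// Return one problem string per skill whose cached vector has the wrong width.
function checkCacheDimensions(cache: Record<string, number[]>): string[] {
  const problems: string[] = [];
  for (const [id, vec] of Object.entries(cache)) {
    if (vec.length !== EXPECTED_DIMENSIONS) {
      problems.push(`${id}: got ${vec.length}-dim, expected ${EXPECTED_DIMENSIONS}`);
    }
  }
  return problems;
}
```

Run against today's cache, every entry would fail this check, which is exactly the loud signal the refresher should have produced.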
## Background fleet agents — what to add

`pd-fleet.yml` has good bones. Here's what I'd add or fix:
| Agent | Status | What to do |
|---|---|---|
| `tool2vec-refresher` | exists, broken or stale | Add a healthcheck note + alert. 12 skills missing, cache 6 days old. |
| embedder dimension lock | missing | Pick one embedding service (recommend MiniLM 384-dim, free + local). Add a CI check that the cache dim matches `LocalEmbeddingService.DIMENSIONS`. |
| `pairs-with-validator` | exists, validates only | Extend to propose reciprocations when A→B exists but B→A doesn't; write a `.skill-suggestions/<skill>-add-pairs.md` per skill. |
| `skill-id-canary` | missing | Flag any skill with `len(id) == 40` (truncation suspect). 22 cases right now. |
| `near-dup-detector` | missing | Run on commit. Compute the embedding and flag if cosine > 0.9 to anything in the catalog. Auto-comment on the commit asking "should this merge into X?". |
| `category-normalizer` | missing | Maintain `skills/_taxonomy.yaml` (canonical categories); reject SKILL.md frontmatter that uses a non-canonical category and suggest the closest canonical when violated. |
| `tag-normalizer` | missing | Same idea but for tags. Maintain a synonym map and singularize/lowercase. Right now we have `agent`, `agents`, and `agent-systems` as separate tags. |
| `orphan-suggester` | missing | For each orphan, propose 3 candidate pairs-with from semantic neighbors with cosine > 0.6. The skill author can accept or reject. |
| `skill-cartographer` | exists, probably fine | Already counts skill growth — extend to report category-balance health (sparse/giant ratio over time). |
Critical first move: fix the dimension mismatch. Until then, half the recommender stack is operating on stale 1536-dim vectors, and any local re-embedding produces 384-dim vectors that can't be compared against them. Pick MiniLM (it already ships with the local service) and rebuild the cache.
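The near-dup check proposed above is just cosine similarity against the catalog. A sketch (catalog shape, threshold default, and function names are assumptions):

```typescript
// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Everything in the catalog above the threshold, best match first;
// the commit bot would comment with these as merge candidates.
function nearDuplicates(
  candidate: number[],
  catalog: Record<string, number[]>,
  threshold = 0.9,
): Array<{ id: string; score: number }> {
  return Object.entries(catalog)
    .map(([id, vec]) => ({ id, score: cosine(candidate, vec) }))
    .filter((hit) => hit.score > threshold)
    .sort((x, y) => y.score - x.score);
}
```

This only works once all vectors come from the same embedder, which is why the dimension fix has to land first.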
## What I want to do next (if you say yes)
- Build `skills/_taxonomy.yaml` with ~16 canonical categories.
- One-shot Haiku reclassification of all 39 uncategorized + 32 sparse-category skills.
- Merge the 4-skill `knowledge-distillation` cluster into one canonical skill.
- Fix the 22 truncated IDs (rename + leave forwarding aliases for back-compat).
- Wire the missing fleet agents above as YAML in `pd-fleet.yml`.
- Re-run this analysis monthly; track the orphan/sparse/reciprocity numbers as headline health metrics.
## Reproducing this analysis

```sh
# 1. scan SKILL.md frontmatter
npx tsx scripts/skill-taxonomy/scan.ts

# 2. compute stats + similarity from cached tool2vec
npx tsx scripts/skill-taxonomy/analyze.ts

# 3. render markdown reports
npx tsx scripts/skill-taxonomy/render-mermaid.ts

# 4. render interactive HTML explorer
npx tsx scripts/skill-taxonomy/render-explorer.ts

# Open in Typora to see mermaid:
open docs/skill-taxonomy/00-overview.md

# Open the explorer:
open docs/skill-taxonomy/explorer.html
```