Taxonomy/Insights

Insights — what the taxonomy is actually telling us

Personal commentary after looking at all 559 skills, the embeddings, and the declared metadata. The numbers in the other reports are factual; this doc is editorial. Treat as a punch list.

TL;DR

The skill catalog is a mature corpus that's outgrown its filing cabinet. The skills themselves are well-written — bodies are thoughtful, references exist, quality is solid. The problems are organizational: too many categories, weak reciprocal pairing, a tag space with no controlled vocabulary, and ~22 skill IDs that were silently truncated to 40 chars and collide visually even when they don't collide as identifiers.

The good news is that almost everything below is fixable with one or two background agents that already half-exist in pd-fleet.yml.

Top 8 things I noticed

1. The category system is broken

60 categories for 559 skills. 32 are sparse (< 3 members). 39 skills are uncategorized. Worse, several of the giant categories have near-twins:

| Category A | Category B | Centroid cosine |
|---|---|---|
| Design & Creative | Design & UX | 0.954 |
| Agent & Orchestration | DAG Framework | 0.950 |
| Research & Academic | AI & Agents | 0.946 |
| Cognitive Science & Decision Making | (uncategorized) | 0.948 |
| DevOps & Infrastructure | DevOps & Site Reliability | 0.895 |
| Backend & Infrastructure | DevOps & Site Reliability | 0.902 |

The (uncategorized) cluster is itself coherent at 0.95 with Cognitive Science & Decision Making — meaning many uncategorized skills have an obvious home but never got assigned one.

Recommendation: define a controlled vocabulary of ~15–18 canonical categories, then run a one-shot Haiku reclassification pass with the embedding neighborhoods as the prior. The taxonomy schema should live in a small file (skills/_taxonomy.yaml or similar) so the validator can lint against it.
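To make the shape concrete, here is a minimal sketch of what skills/_taxonomy.yaml could look like. Every field name and category ID below is an illustrative assumption, not a decided schema:

```yaml
# Hypothetical shape for skills/_taxonomy.yaml -- names are illustrative.
version: 1
categories:
  - id: design-ux
    label: Design & UX
    absorbs: ["Design & Creative"]            # near-twin at 0.954
  - id: devops-sre
    label: DevOps & Site Reliability
    absorbs: ["DevOps & Infrastructure", "Backend & Infrastructure"]
  - id: ai-agents
    label: AI & Agents
    absorbs: ["Research & Academic", "Agent & Orchestration"]
  # ... ~15-18 canonical entries total
```

An `absorbs` list like this doubles as the migration map: the validator can reject any category that is neither canonical nor absorbed, and the reclassification pass can use it to rewrite legacy values mechanically.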

2. We have literal duplicates of papers

| Cluster | Skills |
|---|---|
| Knowledge distillation | knowledge-distillation-a-survey, knowledge-distillation-deep, knowledge-distillation-in-deep-learning, knowledge-distillation-survey (4 skills, all > 0.94 cosine to each other) |
| Rao & Georgeff BDI | rao-georgeff-1991-modeling-rational-agents-bdi, rao-georgeff-1995-bdi-agents-from-theory-to-practice, agentspeak-bdi |
| Wooldridge MAS | ai-wiley-wooldridge-an-introduction-to-multi-agent-systems, wooldridge-multiagent-intro (0.95) |
| Hypertree planning | chen-et-al-2025-hypertree-planning, hypertree-planning (0.95) |
| Refactoring | refactor-architect, refactoring-surgeon, script-refactorer |
| Cognitive systems engineering | the-state-of-cognitive-systems-engineeri, the-role-of-cognitive-systems-engineerin (0.945, note the truncated names) |

For the bibliography ones (Wooldridge, Rao & Georgeff, Knowledge Distillation), this looks like the result of multiple ingestion passes that didn't dedupe. For windags-architect vs windags-architect-v1 (cosine 0.94) it's clearly a versioning leak that should be one canonical skill.
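The near-dup check proposed later in this doc is small enough to sketch here. This is a hedged sketch, not a shipped detector: it assumes embeddings are plain number arrays and reuses the 0.9 cosine threshold suggested for the fleet agent.

```typescript
// Cosine similarity over two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Flag every catalog skill whose embedding sits above the dup threshold
// for a candidate skill's embedding. Names here are illustrative.
function nearDups(
  candidate: number[],
  catalog: Map<string, number[]>,
  threshold = 0.9,
): string[] {
  return [...catalog.entries()]
    .filter(([, vec]) => cosine(candidate, vec) > threshold)
    .map(([id]) => id);
}
```

Run on ingestion, this would have caught all six clusters above before they landed.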

3. ~22 skill IDs are truncated at exactly 40 chars

There's a slug-length cap somewhere in the ingestion pipeline that stops at 40 characters, often mid-word:

the-state-of-cognitive-systems-engineeri    ← engineering
the-role-of-cognitive-systems-engineerin    ← engineering
mastering-the-game-of-go-with-deep-neura    ← neural
fast-and-practical-dag-decomposition-wit    ← with…
causal-reasoning-initial-report-of-a-nat    ← naturalistic

These IDs collide visually (the first two are different papers), are hard to remember, and break IDE auto-complete heuristics. Four of them are also uncategorized — likely because the importer choked when it saw the malformed slug.

Fix: raise the cap to ~64, normalize, and have the next ingestion pass canonicalize the existing 22.
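A sketch of what the fixed slugifier could look like, assuming the pipeline does a simple lowercase-and-dash pass (the function name and exact normalization rules are assumptions):

```typescript
// Hypothetical replacement slugifier: 64-char cap, and never cut mid-word.
function slugify(title: string, maxLen = 64): string {
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")   // collapse non-alphanumerics to dashes
    .replace(/^-+|-+$/g, "");      // trim leading/trailing dashes
  if (slug.length <= maxLen) return slug;
  let cut = slug.slice(0, maxLen);
  // If the cap landed mid-word, back off to the last complete word.
  if (slug[maxLen] !== "-") {
    const lastDash = cut.lastIndexOf("-");
    if (lastDash > 0) cut = cut.slice(0, lastDash);
  }
  return cut.replace(/-+$/, "");
}
```

With a 64-char cap, all five examples above fit without truncation, so the word-boundary fallback only matters for genuinely long paper titles.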

4. pairs-with reciprocity is 34%

Of 958 declarations, only 163 pairs are mutual (those pairs account for 326 declarations, hence the 34% headline). That means when skill A says "I work well with B", 66% of the time B doesn't say anything back. The recommender already uses this graph; one-way edges silently weaken every "what else should I run with this?" suggestion.

The existing pairs-with-validator agent in pd-fleet.yml only checks for broken references (5 found — see 02-pairs-with.md). It should be extended (or given a sibling agent) to propose the missing reciprocations — usually a one-line edit to the partner skill.
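A sketch of that sibling agent's core computation, assuming the declarations are loaded into a map keyed by skill ID (names illustrative):

```typescript
// For each declared A -> B edge, report it when B does not declare A back.
// Each entry in the result is a reciprocation the agent would propose.
function missingReciprocations(
  pairsWith: Map<string, string[]>,
): Array<{ from: string; to: string }> {
  const missing: Array<{ from: string; to: string }> = [];
  for (const [from, targets] of pairsWith) {
    for (const to of targets) {
      const back = pairsWith.get(to) ?? [];
      if (!back.includes(from)) missing.push({ from, to });
    }
  }
  return missing;
}
```

Each result row maps directly to one proposed one-line edit in the partner skill's frontmatter.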

5. 157 orphans (28%)

These skills have zero in-pairs and zero out-pairs. The recommender has nothing to suggest alongside them. A handful of these are clearly meant to be leaf skills (port-daddy-marketing-copy, init, review), but most are just skills nobody wired up.

The cluster that surprised me most: a chunk of the academic bibliography is orphaned. Papers like klein-1998-sources-of-power, polya-problem-solving, mythical-man-month have no pairs-with in or out — even though their applied counterparts do.

6. The tag space is feral

1799 unique tags for 559 skills. ~3.2 tags per skill on average, with a heavy long tail of singletons. There's no controlled vocabulary, no synonym map (e.g. agent vs agents, orchestration vs orchestrator), no parent tags. This makes tag-based filtering on the website mostly useless.

Solution: a tags.yaml controlled vocabulary plus a normalizer that runs on ingestion. ~30–60 canonical tags, with synonym aliases. The 1799 → ~50 collapse will probably also reveal natural sub-categories.
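A sketch of the normalizer's core, with an illustrative synonym map (the real tags.yaml would carry the full alias list):

```typescript
// Hypothetical alias map -- contents illustrative, would live in tags.yaml.
const TAG_ALIASES: Record<string, string> = {
  agents: "agent",
  "agent-systems": "agent",
  orchestrator: "orchestration",
};

// Lowercase, collapse whitespace, apply aliases, naively singularize.
function normalizeTag(raw: string): string {
  let tag = raw.trim().toLowerCase().replace(/\s+/g, "-");
  const direct = TAG_ALIASES[tag];
  if (direct) return direct;
  // Naive singularization; a real pass would want a proper inflector.
  if (tag.endsWith("s") && !tag.endsWith("ss")) tag = tag.slice(0, -1);
  return TAG_ALIASES[tag] ?? tag;
}
```

Note the second alias lookup after singularizing, so "orchestrators" still lands on the canonical "orchestration".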

7. The hubs are not who I'd have guessed

Top-degree nodes by pairs-with:

deg=25  web-design-expert
deg=21  task-decomposer
deg=20  devops-automator
deg=16  interview-loop-strategist
deg=15  ai-engineer
deg=14  interview-simulator
deg=13  color-theory-palette-harmony-expert
deg=13  technical-writer

web-design-expert is the most connected skill in the entire library. That's interesting — it's also a strong piece of evidence that the design sub-graph is the most internally consolidated, which matches the recent warm-palette migration project.

What's not there is striking: none of the WinDAGs DAG-framework meta-skills (dag-planner, dag-graph-builder, dag-orchestrator, windags-architect) are in the top 20. For a project where DAG composition is the marquee capability, that suggests the DAG family talks to itself but hasn't been wired into the rest of the catalog yet.

8. Tool2Vec cache: stale and the wrong dimensions

The cache at ~/.windags/tool2vec-embeddings.json:

  • Last updated 2026-04-29 — 6 days stale
  • Contains 547/559 skills (12 missing, including ideal-web-app-builder, ultimate-seo-geo-skill, port-daddy-marketing-copy, find-skills, imagegen, openai-docs, plugin-creator, skill-installer, ui-ux-pro-max, plus three port-daddy-related)
  • Embeddings are 1536-dimensional, but LocalEmbeddingService produces 384-dim all-MiniLM-L6-v2 vectors. Whoever built the cache used a different embedder (Voyage / OpenAI ada-style). New embeddings won't be comparable.

The tool2vec-refresher fleet agent (pd-fleet.yml, schedule 0 3 * * *) should be catching these. Either it's not firing or it's failing silently.
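Whatever the root cause, a cheap guard is a dimension healthcheck that fails loudly instead of silently. A sketch, assuming the cache maps skill IDs to raw vectors (the interface and constant names are assumptions):

```typescript
// Expected dimensionality of all-MiniLM-L6-v2 vectors from the local service.
const EXPECTED_DIM = 384;

// Assumed cache shape: skill ID -> embedding vector.
interface Tool2VecCache {
  [skillId: string]: number[];
}

// Return every skill whose cached vector has the wrong dimensionality.
// A CI step would fail the build if this list is non-empty.
function checkCacheDims(cache: Tool2VecCache, expected = EXPECTED_DIM): string[] {
  return Object.entries(cache)
    .filter(([, vec]) => vec.length !== expected)
    .map(([id]) => id);
}
```

Run against the current cache, this would flag all 547 entries at once, which is exactly the loud failure we want before any rebuild.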

Background fleet agents — what to add

pd-fleet.yml has good bones. Here's what I'd add or fix:

| Agent | Status | What to do |
|---|---|---|
| tool2vec-refresher | exists, broken or stale | Add a healthcheck note + alert. 12 skills missing, cache 6 days old. |
| Embedder dimension lock | missing | Pick one embedding service (recommend MiniLM 384-dim, free + local). Add a CI check that the cache dim matches LocalEmbeddingService.DIMENSIONS. |
| pairs-with-validator | exists, validates only | Extend to propose reciprocations when A→B exists but B→A doesn't; write a .skill-suggestions/<skill>-add-pairs.md per skill. |
| skill-id-canary | missing | Flag any skill with len(id) == 40 (truncation suspect). 22 cases right now. |
| near-dup-detector | missing | Run on commit. Compute the embedding and flag if cosine > 0.9 to anything in the catalog. Auto-comment on the commit asking "should this merge into X?". |
| category-normalizer | missing | Maintain skills/_taxonomy.yaml (canonical categories); reject SKILL.md frontmatter that uses a non-canonical category. Suggest the closest canonical when violated. |
| tag-normalizer | missing | Same idea but for tags. Maintain a synonym map and singularize/lowercase. Right now we have agent, agents, and agent-systems as separate tags. |
| orphan-suggester | missing | For each orphan, propose 3 candidate pairs-with from semantic neighbors with cosine > 0.6. The skill author can accept or reject. |
| skill-cartographer | exists, probably fine | Already counts skill growth — extend to report category-balance health (sparse/giant ratio over time). |

Critical first move: fix the dimension mismatch. Until then, half the recommender stack is operating on stale 1536-dim vectors, and any local re-embedding produces 384-dim vectors that can't be compared against them. Pick MiniLM (it's already shipped with the local service) and rebuild.

What I want to do next (if you say yes)

  1. Build skills/_taxonomy.yaml with ~16 canonical categories.
  2. One-shot Haiku reclassification of all 39 uncategorized + 32 sparse-category skills.
  3. Merge the 4-skill knowledge-distillation cluster into one canonical skill.
  4. Fix the 22 truncated IDs (rename + leave forwarding aliases for back-compat).
  5. Wire the missing fleet agents above as YAML in pd-fleet.yml.
  6. Re-run this analysis monthly; track the orphan/sparse/reciprocity numbers as headline health metrics.

Reproducing this analysis

```bash
# 1. scan SKILL.md frontmatter
npx tsx scripts/skill-taxonomy/scan.ts

# 2. compute stats + similarity from cached tool2vec
npx tsx scripts/skill-taxonomy/analyze.ts

# 3. render markdown reports
npx tsx scripts/skill-taxonomy/render-mermaid.ts

# 4. render interactive HTML explorer
npx tsx scripts/skill-taxonomy/render-explorer.ts

# Open in Typora to see mermaid:
open docs/skill-taxonomy/00-overview.md

# Open the explorer:
open docs/skill-taxonomy/explorer.html
```