multimodal-embedding-generator - Skill Dossier

Generate cross-modal embeddings with CLIP, SigLIP, and ImageBind for search across text, images, and audio. Activate on: multimodal search, text-to-image search, cross-modal embeddings, CLIP embeddings, visual search. NOT for: text-only embeddings (use ai-engineer), image classification (use computer-vision-pipeline).
AI & Machine Learning
#multimodal #embeddings #clip #cross-modal-search #siglip
Allowed Tools
Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)
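As a rough illustration of the text-to-image search this skill targets, the sketch below ranks image embeddings against a text query embedding by cosine similarity. It assumes the vectors were already produced by a CLIP-style encoder that maps text and images into one shared space; the function names, file names, and toy 3-dimensional vectors are hypothetical stand-ins, not output from any real model or part of the skill itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding, item_embeddings, top_k=3):
    """Return the top_k (item_id, score) pairs, most similar first."""
    scored = [
        (item_id, cosine_similarity(query_embedding, embedding))
        for item_id, embedding in item_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real image-encoder output.
image_embeddings = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.1, 0.9, 0.0],
    "car.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a text query such as "a photo of a cat".
text_query_embedding = [0.8, 0.2, 0.0]

results = rank_by_similarity(text_query_embedding, image_embeddings, top_k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

In a real pipeline the toy vectors would be replaced by encoder output, and a linear scan like this one would usually give way to a vector index once the collection grows large.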
Coming in Spring 2026 Beta
WinDAGs will match this skill automatically. Then ask:
"Use multimodal-embedding-generator to help me build..."