multimodal-embedding-generator - Skill Dossier


Generate cross-modal embeddings with CLIP, SigLIP, and ImageBind for text-image-audio search. Activate on: multimodal search, text-to-image search, cross-modal embeddings, CLIP embeddings, visual search. NOT for: text-only embeddings (ai-engineer), image classification (computer-vision-pipeline).
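The core of text-to-image search with models like CLIP or SigLIP is ranking image embeddings by cosine similarity to a text embedding in the shared vector space. The sketch below shows only that retrieval step, assuming embeddings have already been computed; the 4-d vectors and file names are hypothetical placeholders, not real model output (in practice they would come from a library such as `transformers` or `open_clip`).

```python
import math

def normalize(v):
    """L2-normalize so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def search(query_emb, image_embs, top_k=2):
    """Rank image embeddings by cosine similarity to a query embedding.

    Works for any modality pair embedded in the same space
    (text-to-image, audio-to-image, etc.).
    """
    q = normalize(query_emb)
    scored = []
    for name, emb in image_embs.items():
        e = normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical placeholder embeddings (illustration only).
images = {
    "dog.jpg": [0.9, 0.1, 0.0, 0.2],
    "cat.jpg": [0.1, 0.9, 0.1, 0.0],
    "car.jpg": [0.0, 0.1, 0.9, 0.1],
}
text_query = [0.8, 0.2, 0.1, 0.1]  # e.g. embedding of "a photo of a dog"
print(search(text_query, images, top_k=1))  # → ['dog.jpg']
```

Because both modalities live in one space, the same `search` call handles image-to-image or audio-to-text lookups unchanged; only the source of the embeddings differs.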

AI & Machine Learning
#multimodal #embeddings #clip #cross-modal-search #siglip

Allowed Tools

Read, Write, Edit, Bash(python:*, pip:*, npm:*, npx:*)


Coming in Spring 2026 Beta

WinDAGs will match this skill automatically. Then ask:

"Use multimodal-embedding-generator to help me build..."
"Use multimodal-embedding-generator to help me build a multimodal system"
"I need expert help with generating cross-modal embeddings with CLIP, SigLIP,..."
"Orchestrate multimodal-embedding-generator with clip-aware-embeddings for shared clip foundation for visual-semantic alignment"