Segment images into objects, parts, or scenes
Generate depth maps and 3D views from your photos
Annotate and describe images with text prompts
A tiny vision-language model