Unique3D
Create a colored 3D model with 1M faces from an image!
Try PaliGemma on document understanding tasks
Generate custom audio clips from text prompts
Annotate and describe images with text prompts
Edit your video style using a text prompt and control maps
Video upscaler/restorer
Annotate video with object boxes and captions
Generate images from prompts or images
Generate summaries from YouTube videos or uploaded videos
Chat about images by uploading them
Build and run language models visually
Upscale and enhance images using tile ControlNet
In-browser speech recognition w/ word-level timestamps
High-fidelity Virtual Try-on
Video-to-Audio Generation with Hidden Alignment
Multimodal Image-to-Video
Transcribe audio in any language using text data
Generate images from text prompts
Aesthetically Controllable Text-Driven Stylization without Training
Generate lifelike video animations from images and audio
Try on clothes virtually with images
Generate enhanced images by blending foreground with custom backgrounds
Try on clothes on a person image
Text-to-Video
Generate text from images or videos
Transcribe speech and generate AI response
Convert image text to markdown format
Create professional ID photos with automatic background removal
Answer questions about any uploaded image
Travel through the model latent space
Create a video from an image with camera motion
Analyse any image with Llama3.2
Fill and edit images using masks
Convert PDFs to individual page images
Generate document search queries from a page image
Answer questions about uploaded images and documents
Transcribe audio or YouTube videos into text
Generate music from text descriptions
Generate audio-ready scripts from your documents
Ultra-high resolution image synthesis
Generate and edit realistic audio from text prompts
VLMEvalKit Evaluation Results Collection
Generate personalized research profiles and chat with Arxiv Copilot
Run code snippets and get instant results
High-fidelity Virtual Try-on
Describe image contents with prompts
Visual Retrieval with ColPali and Vespa
Using RAG LLM to assist your academic writing
Generate new person images with swapped clothes or poses