UniWeTok: An Unified Binary Tokenizer with Codebook Size $2^{128}$ for Unified Multimodal Large Language Model Paper • 2602.14178 • Published 3 days ago • 10 • 2
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published 8 days ago • 25 • 2
SAGE: Scalable Agentic 3D Scene Generation for Embodied AI Paper • 2602.10116 • Published 8 days ago • 7 • 2
DIFFA-2: A Practical Diffusion Large Language Model for General Audio Understanding Paper • 2601.23161 • Published 19 days ago • 10 • 3
OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution Paper • 2601.20380 • Published 21 days ago • 8 • 2
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published Jan 5 • 29 • 3
Bridging Your Imagination with Audio-Video Generation via a Unified Director Paper • 2512.23222 • Published Dec 29, 2025 • 6 • 3
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published Dec 19, 2025 • 52 • 3
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators Paper • 2512.06963 • Published Dec 7, 2025 • 4 • 2
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling Paper • 2512.05343 • Published Dec 5, 2025 • 25 • 2