DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2512.05112 • Published about 1 month ago • 11
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published Nov 20, 2025 • 15
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 33
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1, 2025 • 44
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13, 2025 • 28
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step Paper • 2501.13926 • Published Jan 23, 2025 • 43
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines Paper • 2409.12959 • Published Sep 19, 2024 • 38
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following Paper • 2309.00615 • Published Sep 1, 2023 • 13
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding Paper • 2305.10764 • Published May 18, 2023 • 6
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching Paper • 2404.03653 • Published Apr 4, 2024 • 35
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? Paper • 2403.14624 • Published Mar 21, 2024 • 53