LuxDiT: Lighting Estimation with Video Diffusion Transformer Paper • 2509.03680 • Published Sep 3, 2025 • 17
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2, 2025 • 7
Lost in Embeddings: Information Loss in Vision-Language Models Paper • 2509.11986 • Published Sep 15, 2025 • 28
Look Again, Think Slowly: Enhancing Visual Reflection in Vision-Language Models Paper • 2509.12132 • Published Sep 15, 2025 • 6
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation Paper • 2510.08551 • Published Oct 9, 2025 • 33
BLIP3o-NEXT: Next Frontier of Native Image Generation Paper • 2510.15857 • Published Oct 17, 2025 • 24
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal Paper • 2510.15868 • Published Oct 17, 2025 • 26
Accelerating Vision Transformers with Adaptive Patch Sizes Paper • 2510.18091 • Published Oct 20, 2025 • 6
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives Paper • 2510.20822 • Published Oct 23, 2025 • 40
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story Paper • 2511.15210 • Published Nov 19, 2025 • 89
RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models Paper • 2510.10390 • Published Oct 12, 2025 • 4
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency Paper • 2311.02772 • Published Nov 5, 2023 • 8
Durian: Dual Reference-guided Portrait Animation with Attribute Transfer Paper • 2509.04434 • Published Sep 4, 2025 • 10
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11, 2025 • 48
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation Paper • 2509.00428 • Published Aug 30, 2025 • 17
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1, 2025 • 33
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10, 2025 • 128