InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Paper
• 2309.03895
• Published
• 15
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Paper
• 2309.16650
• Published
• 10
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Paper
• 2309.16496
• Published
• 9
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling
Paper
• 2310.15169
• Published
• 10
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Paper
• 2310.15008
• Published
• 22
Matryoshka Diffusion Models
Paper
• 2310.15111
• Published
• 45
TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
Paper
• 2310.13772
• Published
• 7
HyperFields: Towards Zero-Shot Generation of NeRFs from Text
Paper
• 2310.17075
• Published
• 15
SE(3) Diffusion Model-based Point Cloud Registration for Robust 6D Object Pose Estimation
Paper
• 2310.17359
• Published
• 1
CodeFusion: A Pre-trained Diffusion Model for Code Generation
Paper
• 2310.17680
• Published
• 74
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
Paper
• 2310.19784
• Published
• 10
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Paper
• 2310.20700
• Published
• 10
Beyond U: Making Diffusion Models Faster & Lighter
Paper
• 2310.20092
• Published
• 12
Controllable Music Production with Diffusion Models and Guidance Gradients
Paper
• 2311.00613
• Published
• 26
De-Diffusion Makes Text a Strong Cross-Modal Interface
Paper
• 2311.00618
• Published
• 23
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Paper
• 2311.00945
• Published
• 16
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper
• 2311.10093
• Published
• 58
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Paper
• 2311.09257
• Published
• 47
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer
Paper
• 2311.12052
• Published
• 32
Diffusion Model Alignment Using Direct Preference Optimization
Paper
• 2311.12908
• Published
• 49
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Paper
• 2312.03793
• Published
• 18
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Paper
• 2312.03491
• Published
• 34
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper
• 2312.12491
• Published
• 75
Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model
Paper
• 2312.13252
• Published
• 27
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Paper
• 2312.12490
• Published
• 19
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Paper
• 2312.13578
• Published
• 29
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Paper
• 2402.00769
• Published
• 22
Magic-Me: Identity-Specific Video Customized Diffusion
Paper
• 2402.09368
• Published
• 31
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Paper
• 2402.06178
• Published
• 14
FiT: Flexible Vision Transformer for Diffusion Model
Paper
• 2402.12376
• Published
• 48
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Paper
• 2402.13763
• Published
• 11
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper
• 2402.17485
• Published
• 194
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Paper
• 2402.17723
• Published
• 16
Scalable Diffusion Models with Transformers
Paper
• 2212.09748
• Published
• 18
Paper
• 2403.03954
• Published
• 13
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Paper
• 2403.05135
• Published
• 45
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control
Paper
• 2403.09055
• Published
• 26
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Paper
• 2406.08659
• Published
• 8
Note The authors adopt diffusion models and decompose the T2MVid generation problem into viewpoint-space and temporal components, reusing layers from pre-trained multi-view image and 2D video diffusion models to ensure multi-view consistency and temporal continuity of the generated video. An alignment module is introduced to resolve the layer incompatibility caused by the domain gap between 2D and multi-view data. The work also contributes a new multi-view video dataset.
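The view/time factorization described in the note can be illustrated very loosely with a toy block that alternates attention across the view axis (multi-view consistency) and the frame axis (temporal continuity) of a (views, frames, dim) feature tensor. This is a generic sketch, not the paper's actual architecture; all names here are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def self_attn(x):
    # toy single-head self-attention over the middle (sequence) axis
    scores = softmax(x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1]))
    return scores @ x

def factorized_block(x):
    # x: (V views, T frames, D features)
    # view axis: attend across views for each frame
    xv = self_attn(x.transpose(1, 0, 2)).transpose(1, 0, 2)
    # time axis: attend across frames for each view
    return self_attn(xv)

x = np.random.default_rng(0).normal(size=(4, 8, 16))
y = factorized_block(x)
```

Factorizing attention this way keeps the cost at O(V^2 + T^2) per token rather than O((V*T)^2), which is the usual motivation for separating spatial (view) and temporal layers.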
GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors
Paper
• 2406.10111
• Published
• 6
Note The proposed GaussianSR method introduces 2D generative priors and optimizes 3DGS while suppressing stochastic interference, achieving high-quality HRNVS and significantly surpassing prior state-of-the-art methods. The work offers a new approach to high-resolution novel view synthesis with clear practical value.
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper
• 2406.09416
• Published
• 29
Note The proposed DiMR and TD-LN methods effectively balance image-detail capture against computational complexity, significantly reduce image distortion, and show excellent performance on ImageNet generation benchmarks, setting a new bar for high-fidelity image generation.
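The note mentions TD-LN, a time-dependent layer norm. As a rough illustration of the general idea — not the paper's exact formulation — a layer norm whose scale and shift are produced from a timestep embedding might look like the following; the embedding and weight names are hypothetical.

```python
import numpy as np

def timestep_embedding(t, dim):
    # toy sinusoidal embedding of a scalar diffusion timestep
    freqs = np.exp(-np.linspace(0.0, 4.0, dim // 2))
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])

def td_layer_norm(x, t, W_scale, W_shift, eps=1e-5):
    # standard layer norm over the feature axis ...
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    xn = (x - mu) / np.sqrt(var + eps)
    # ... modulated by a time-dependent scale and shift
    emb = timestep_embedding(t, W_scale.shape[0])
    return xn * (1.0 + emb @ W_scale) + emb @ W_shift

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W_scale = np.zeros((16, 8))  # zero-init: reduces to plain layer norm
W_shift = np.zeros((16, 8))
y = td_layer_norm(x, t=10, W_scale=W_scale, W_shift=W_shift)
```

With zero-initialized modulation weights the layer starts as an ordinary layer norm, a common trick so that adding time conditioning does not perturb a model at initialization.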
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Paper
• 2406.07686
• Published
• 17
Note AV-DiT demonstrates an efficient audio-visual diffusion transformer architecture that achieves high-quality joint audio-video generation by reusing a pre-trained image-generation transformer with lightweight adapters. This not only fills a gap among existing methods but also shows the potential of multimodal generation to reduce computational cost and model complexity.
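The "lightweight adapter on a frozen pre-trained transformer" pattern the note describes can be sketched generically: a small bottleneck MLP added residually after a frozen layer, with its up-projection zero-initialized so the pre-trained behavior is preserved at the start of training. This is a standard adapter sketch, not AV-DiT's specific design; all weights and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_block(x, W):
    # stand-in for a frozen pre-trained transformer feed-forward layer
    return np.maximum(x @ W, 0.0)

def adapter(h, W_down, W_up):
    # bottleneck adapter: down-project, nonlinearity, up-project,
    # added residually on top of the frozen block's output
    return h + np.maximum(h @ W_down, 0.0) @ W_up

d, r = 16, 4                      # feature dim, adapter bottleneck rank
W = rng.normal(size=(d, d)) * 0.1
W_down = rng.normal(size=(d, r)) * 0.1
W_up = np.zeros((r, d))           # zero-init: adapter starts as identity

x = rng.normal(size=(2, d))
h = frozen_block(x, W)
out = adapter(h, W_down, W_up)
```

Only `W_down`/`W_up` (roughly 2*d*r parameters per layer) would be trained, which is what makes this kind of adaptation cheap relative to fine-tuning the full model.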
Repulsive Score Distillation for Diverse Sampling of Diffusion Models
Paper
• 2406.16683
• Published
• 4
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper
• 2407.01392
• Published
• 44
Diff2Lip: Audio Conditioned Diffusion Models for Lip-Synchronization
Paper
• 2308.09716
• Published
• 2
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model
Paper
• 2407.16982
• Published
• 42
Diffusion Feedback Helps CLIP See Better
Paper
• 2407.20171
• Published
• 36
DC3DO: Diffusion Classifier for 3D Objects
Paper
• 2408.06693
• Published
• 11
3D Gaussian Editing with A Single Image
Paper
• 2408.07540
• Published
• 13
TurboEdit: Instant text-based image editing
Paper
• 2408.08332
• Published
• 20