SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published 11 days ago • 42
SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published 12 days ago • 51
Geometry-Aware Rotary Position Embedding for Consistent Video World Model Paper • 2602.07854 • Published 17 days ago • 9
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization Paper • 2602.02958 • Published 22 days ago • 33
Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs Paper • 2601.17058 • Published Jan 22 • 188
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 95 • 7