Article How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day 20 days ago • 46
Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel Paper • 2508.18224 • Published Aug 25 • 1
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published Nov 12 • 68