A Survey on Inference Optimization Techniques for Mixture of Experts Models Paper • 2412.14219 • Published Dec 18, 2024
HOBBIT: A Mixed Precision Expert Offloading System for Fast MoE Inference Paper • 2411.01433 • Published Nov 3, 2024