Diversity-Incentivized Exploration for Versatile Reasoning Paper • 2509.26209 • Published Sep 30, 2025
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning Paper • 2509.20712 • Published Sep 25, 2025