Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities Paper • 2505.15692 • Published May 21, 2025
s3: You Don't Need That Much Data to Train a Search Agent via RL Paper • 2505.14146 • Published May 20, 2025