Evaluating Gemini Robotics Policies in a Veo World Simulator Paper • 2512.10675 • Published 14 days ago • 16
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published Nov 24 • 89
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models Paper • 2511.11007 • Published Nov 14 • 15
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published Nov 19 • 74
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published Nov 17 • 133
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28 • 97
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? Paper • 2510.23587 • Published Oct 27 • 65
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27 • 121
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning Paper • 2509.03646 • Published Sep 3 • 32