daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently Paper • 2602.02619 • Published 16 days ago
AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts Paper • 2601.11044 • Published Jan 16
DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery Paper • 2508.06960 • Published Aug 9, 2025