arxiv:2601.18207

PaperSearchQA: Learning to Search and Reason over Scientific Papers with RLVR

Published on Jan 26

· Submitted by

James Burgess on Feb 5

Upvote

Authors:

James Burgess ,

Duo Peng ,

Yuhui Zhang ,

Abstract

Search agents trained on scientific paper corpora demonstrate advanced reasoning capabilities for technical question-answering tasks, outperforming traditional retrieval methods through reinforcement learning with verifiable rewards.

AI-generated summary

Search agents are language models (LMs) that reason and search knowledge bases (or the web) to answer questions; recent methods supervise only the final answer accuracy using reinforcement learning with verifiable rewards (RLVR). Most RLVR search agents tackle general-domain QA, which limits their relevance to technical AI systems in science, engineering, and medicine. In this work we propose training agents to search and reason over scientific papers -- this tests technical question-answering, it is directly relevant to real scientists, and the capabilities will be crucial to future AI Scientist systems. Concretely, we release a search corpus of 16 million biomedical paper abstracts and construct a challenging factoid QA dataset called PaperSearchQA with 60k samples answerable from the corpus, along with benchmarks. We train search agents in this environment to outperform non-RL retrieval baselines; we also perform further quantitative analysis and observe interesting agent behaviors like planning, reasoning, and self-verification. Our corpus, datasets, and benchmarks are usable with the popular Search-R1 codebase for RLVR training and released on https://huggingface.co/collections/jmhb/papersearchqa. Finally, our data creation methods are scalable and easily extendable to other scientific domains.