Collections for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638)
- Renjie-Ranger/FCP_big_math_pro_C-plus_no_concise
  Viewer • Updated • 185k • 10
- Renjie-Ranger/FCP_general_reasoner_pro_C-plus_no_concise
  Viewer • Updated • 133k • 8
- Renjie-Ranger/FCP_general_reasoner_pro_SFT
  Viewer • Updated • 272k • 6
- Renjie-Ranger/FCP_big_math_pro_SFT
  Viewer • Updated • 384k • 19 • 1