Microsoft AI Introduces rStar2-Agent: A 14B Math Reasoning Model Trained with Agentic Reinforcement Learning to Achieve Frontier-Level Performance
Table of contents
The Problem with “Thinking Longer”
The Agentic Approach
Infrastructure Challenges and Solutions
GRPO-RoC: Learning from High-Quality Examples…