LLMs have gained outstanding reasoning capabilities through reinforcement learning (RL) on correctness rewards. Modern RL algorithms for LLMs, including GRPO,…
Sampling from complex probability distributions is important in many fields, including statistical modeling, machine learning, and physics. This involves generating…