Recent reasoning research: GRPO tweaks, base model RL, and data curation - discu.eu

Linking pages

Data Science Weekly - Issue 593 https://datascienceweekly.substack.com/p/data-science-weekly-issue-593 0 comments

Linked pages

[2503.01307] Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs https://arxiv.org/abs/2503.01307 103 comments
There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study | Notion https://oatllm.notion.site/oat-zero 8 comments
[2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 2 comments
[2503.14476] DAPO: An Open-Source LLM Reinforcement Learning System at Scale https://arxiv.org/abs/2503.14476 2 comments
[2501.12599] Kimi k1.5: Scaling Reinforcement Learning with LLMs https://arxiv.org/abs/2501.12599 0 comments
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1 0 comments
[2503.04697] L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning https://arxiv.org/abs/2503.04697 0 comments
