Hacker News
- Recent reasoning research: GRPO tweaks, base model RL and data curation https://www.interconnects.ai/p/papers-im-reading-base-model-rl-grpo 0 comments
Linked pages
- [2503.01307] Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs https://arxiv.org/abs/2503.01307 103 comments
- There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study https://oatllm.notion.site/oat-zero 8 comments
- [2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 2 comments
- [2503.14476] DAPO: An Open-Source LLM Reinforcement Learning System at Scale https://arxiv.org/abs/2503.14476 2 comments
- [2501.12599] Kimi k1.5: Scaling Reinforcement Learning with LLMs https://arxiv.org/abs/2501.12599 0 comments
- DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1 0 comments
- [2503.04697] L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning https://arxiv.org/abs/2503.04697 0 comments