Hacker News
Linking pages
- GitHub - sail-sg/understand-r1-zero: Understanding R1-Zero-Like Training: A Critical Perspective https://github.com/sail-sg/understand-r1-zero 21 comments
- Recent reasoning research: GRPO tweaks, base model RL, and data curation https://www.interconnects.ai/p/papers-im-reading-base-model-rl-grpo 0 comments
Related searches:
Search whole site: site:oatllm.notion.site
Search title: There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study | Notion