Hacker News
- Understanding R1-Zero-Like Training: A Critical Perspective https://github.com/sail-sg/understand-r1-zero 21 comments
Linking pages
- Sea AI Lab Researchers Introduce Dr. GRPO: A Bias-Free Reinforcement Learning Method that Enhances Math Reasoning Accuracy in Large Language Models Without Inflating Responses - MarkTechPost https://www.marktechpost.com/2025/03/22/sea-ai-lab-researchers-introduce-dr-grpo-a-bias-free-reinforcement-learning-method-that-enhances-math-reasoning-accuracy-in-large-language-models-without-inflating-responses/ 1 comment
Linked pages
- There May Not be Aha Moment in R1-Zero-like Training — A Pilot Study | Notion https://oatllm.notion.site/oat-zero 8 comments
- GitHub - microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. https://github.com/microsoft/DeepSpeed 1 comment
- GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs https://github.com/vllm-project/vllm 0 comments
- deepseek-ai/DeepSeek-V3-Base · Hugging Face https://huggingface.co/deepseek-ai/DeepSeek-V3-Base 0 comments