Reddit
[R] Literally recreated mathematical reasoning and DeepSeek’s aha moment for less than $10 via end-to-end simple reinforcement learning
https://medium.com/@rjusnba/overnight-end-to-end-rl-training-a-3b-model-on-a-grade-school-math-dataset-leads-to-reasoning-df61410c04c6
8 comments
20/2/2025
machinelearning