Hacker News
- Reviewing Post-Training Techniques: DeepSeek, Qwen-2, and Phi-4 https://brianfitzgerald.xyz/dpo-review/ 0 comments
Linked pages
- Proximal Policy Optimization — Spinning Up documentation https://spinningup.openai.com/en/latest/algorithms/ppo.html 8 comments
- [2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 1 comment
- [2203.02155] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 0 comments
- [2407.10671] Qwen2 Technical Report https://arxiv.org/abs/2407.10671 0 comments
- [2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments
- [2412.08905] Phi-4 Technical Report https://arxiv.org/abs/2412.08905 0 comments
Related searches:
Search title: Reviewing Post-Training Techniques from Recent Open LLMs | Brian Fitzgerald