Reviewing Post-Training Techniques from Recent Open LLMs | Brian Fitzgerald - discu.eu

Hacker News

Reviewing Post-Training Techniques: DeepSeek, Qwen-2, and Phi-4 https://brianfitzgerald.xyz/dpo-review/ 0 comments 7/1/2025

Linked pages

Proximal Policy Optimization — Spinning Up documentation https://spinningup.openai.com/en/latest/algorithms/ppo.html 8 comments
[2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 1 comment
[2203.02155] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 0 comments
[2407.10671] Qwen2 Technical Report https://arxiv.org/abs/2407.10671 0 comments
[2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments
[2412.08905] Phi-4 Technical Report https://arxiv.org/abs/2412.08905 0 comments
