Hacker News
- Direct Preference Optimization: Your Language Model Is a Reward Model https://arxiv.org/abs/2305.18290 2 comments
- [D] DPO Paper Potential Derivation Issue https://arxiv.org/abs/2305.18290 3 comments machinelearning
- [R] Direct Preference Optimization: Your Language Model Is Secretly A Reward Model https://arxiv.org/abs/2305.18290 3 comments machinelearning
Linking pages
- My AI Timelines Have Sped Up (Again) https://www.alexirpan.com/2024/01/10/ai-timelines-2024.html 95 comments
- AI and Open Source in 2023 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/ai-and-open-source-in-2023 67 comments
- Aligning a LLM with Human Preferences - DataDreamer https://datadreamer.dev/docs/latest/pages/get_started/quick_tour/aligning.html 66 comments
- Can LLMs invent better ways to train LLMs? https://sakana.ai/llm-squared/ 36 comments
- Ahead of AI #11: New Foundation Models https://magazine.sebastianraschka.com/p/ahead-of-ai-11-new-foundation-models 34 comments
- How RLHF actually works - by Nathan Lambert - Interconnects https://www.interconnects.ai/p/how-rlhf-works 32 comments
- 10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
- NLP Research in the Era of LLMs - by Sebastian Ruder https://nlpnewsletter.substack.com/p/nlp-research-in-the-era-of-llms 17 comments
- Bringing LLM Fine-Tuning and RLHF to Everyone https://argilla.io/blog/argilla-for-llms/ 11 comments
- GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
- GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
- Ahead of AI #9: LLM Tuning & Dataset Perspectives https://magazine.sebastianraschka.com/p/ahead-of-ai-9-llm-tuning-and-dataset 4 comments
- AI Research Highlights In 3 Sentences Or Less (May-June 2023) https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-2a1 3 comments
- The New Kings of Open Source AI (Oct 2023 Recap) https://www.latent.space/p/oct-2023 3 comments
- Fine Tuning LLMs - learnings from the DeepLearning SF Meetup https://www.anti-vc.com/p/fine-tuning-llms-learnings-from-the 2 comments
- Tips for LLM Pretraining and Evaluating Reward Models https://sebastianraschka.com/blog/2024/research-papers-in-march-2024.html 1 comment
- GitHub - eric-mitchell/direct-preference-optimization: Reference implementation for DPO (Direct Preference Optimization) https://github.com/eric-mitchell/direct-preference-optimization 0 comments
- AI Research Highlights In 3 Sentences Or Less (June -July 2023) https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-738 0 comments
- How instruction-tuning can encourage hallucinations https://peterjliu.substack.com/p/how-instruction-tuning-can-encourage 0 comments
- Specifying objectives in RLHF - by Nathan Lambert https://www.interconnects.ai/p/specifying-objectives-in-rlhf 0 comments