[2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model - discu.eu

Hacker News

Direct Preference Optimization: Your Language Model Is a Reward Model https://arxiv.org/abs/2305.18290 2 comments 12/1/2024

Reddit

[D] DPO Paper Potential Derivation Issue https://arxiv.org/abs/2305.18290 3 comments 17/1/2024 machinelearning
[R] Direct Preference Optimization: Your Language Model Is Secretly A Reward Model https://arxiv.org/abs/2305.18290 3 comments 5/9/2023 machinelearning

Linking pages

My AI Timelines Have Sped Up (Again) https://www.alexirpan.com/2024/01/10/ai-timelines-2024.html 95 comments
AI and Open Source in 2023 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/ai-and-open-source-in-2023 67 comments
Aligning a LLM with Human Preferences - DataDreamer https://datadreamer.dev/docs/latest/pages/get_started/quick_tour/aligning.html 66 comments
Can LLMs invent better ways to train LLMs? https://sakana.ai/llm-squared/ 36 comments
Ahead of AI #11: New Foundation Models https://magazine.sebastianraschka.com/p/ahead-of-ai-11-new-foundation-models 34 comments
How RLHF actually works - by Nathan Lambert - Interconnects https://www.interconnects.ai/p/how-rlhf-works 32 comments
10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
NLP Research in the Era of LLMs - by Sebastian Ruder https://nlpnewsletter.substack.com/p/nlp-research-in-the-era-of-llms 17 comments
Bringing LLM Fine-Tuning and RLHF to Everyone https://argilla.io/blog/argilla-for-llms/ 11 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
Ahead of AI #9: LLM Tuning & Dataset Perspectives https://magazine.sebastianraschka.com/p/ahead-of-ai-9-llm-tuning-and-dataset 4 comments
AI Research Highlights In 3 Sentences Or Less (May-June 2023) https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-2a1 3 comments
The New Kings of Open Source AI (Oct 2023 Recap) https://www.latent.space/p/oct-2023 3 comments
Fine Tuning LLMs - learnings from the DeepLearning SF Meetup https://www.anti-vc.com/p/fine-tuning-llms-learnings-from-the 2 comments
Tips for LLM Pretraining and Evaluating Reward Models https://sebastianraschka.com/blog/2024/research-papers-in-march-2024.html 1 comment
GitHub - eric-mitchell/direct-preference-optimization: Reference implementation for DPO (Direct Preference Optimization) https://github.com/eric-mitchell/direct-preference-optimization 0 comments
AI Research Highlights In 3 Sentences Or Less (June -July 2023) https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-738 0 comments
How instruction-tuning can encourage hallucinations https://peterjliu.substack.com/p/how-instruction-tuning-can-encourage 0 comments
Specifying objectives in RLHF - by Nathan Lambert https://www.interconnects.ai/p/specifying-objectives-in-rlhf 0 comments
