Linked pages
- [2305.18290] Direct Preference Optimization: Your Language Model is Secretly a Reward Model https://arxiv.org/abs/2305.18290 8 comments
- LMSys Chatbot Arena Leaderboard - a Hugging Face Space by lmsys https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard 3 comments
- Specifying objectives in RLHF - by Nathan Lambert https://www.interconnects.ai/p/specifying-objectives-in-rlhf 0 comments
- [2310.12036] A General Theoretical Paradigm to Understand Learning from Human Preferences https://arxiv.org/abs/2310.12036#deepmind 0 comments
- lightonai/alfred-40b-1023 · Hugging Face https://huggingface.co/lightonai/alfred-40b-1023 0 comments
- kyutai: open science AI lab http://kyutai.org/ 0 comments
- allenai/tulu-2-dpo-70b · Hugging Face https://huggingface.co/allenai/tulu-2-dpo-70b 0 comments
- [2306.05685] Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena https://arxiv.org/abs/2306.05685 0 comments