Hacker News
- Training and aligning LLMs with RLHF and RLHF alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
Linking pages
- AI and Open Source in 2023 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/ai-and-open-source-in-2023 67 comments
- Finetuning LLMs with LoRA and QLoRA: Insights from Hundreds of Experiments - Lightning AI https://lightning.ai/pages/community/lora-insights/ 39 comments
- Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms 37 comments
- Optimizing LLMs From a Dataset Perspective https://sebastianraschka.com/blog/2023/optimizing-LLMs-dataset-perspective.html 24 comments
- 10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
- Ahead of AI #12: LLM Businesses and Busyness https://magazine.sebastianraschka.com/p/ahead-of-ai-12-llm-businesses 0 comments
- Research Papers (October 2023) - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/research-papers-october-2023 0 comments
- Research Papers in November 2023 https://magazine.sebastianraschka.com/p/research-papers-in-november-2023 0 comments
- Research Papers in January 2024 - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/research-papers-in-january-2024 0 comments
- Foundation models, internet-scale data, and the path to generalized robots - Lukas Holoubek https://lukasholoubek.com/foundation-models-internet-scale-data-path-to-generalized-robots/ 0 comments
- Top 5 AI Substacks to Follow in 2024 https://aiauthority.dev/5-ai-dedicated-substacks-you-must-follow-in-2024 0 comments
- Tips for LLM Pretraining and Evaluating Reward Models https://magazine.sebastianraschka.com/p/tips-for-llm-pretraining-and-evaluating-rms 0 comments
Linked pages
- [2009.01325] Learning to summarize from human feedback https://arxiv.org/abs/2009.01325 12 comments
- [1602.01783] Asynchronous Methods for Deep Reinforcement Learning https://arxiv.org/abs/1602.01783 7 comments
- [1909.08593] Fine-Tuning Language Models from Human Preferences https://arxiv.org/abs/1909.08593 5 comments
- [1707.06347] Proximal Policy Optimization Algorithms https://arxiv.org/abs/1707.06347 3 comments
- Understanding Encoder And Decoder LLMs https://magazine.sebastianraschka.com/p/understanding-encoder-and-decoder 2 comments
- [2204.05862] Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback https://arxiv.org/abs/2204.05862 1 comment
- [2308.08998] Reinforced Self-Training (ReST) for Language Modeling https://arxiv.org/abs/2308.08998 1 comment
- [2203.02155] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155 0 comments
- [2307.09288] Llama 2: Open Foundation and Fine-Tuned Chat Models https://arxiv.org/abs/2307.09288 0 comments
- [2309.00267] RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback https://arxiv.org/abs/2309.00267 0 comments
Related searches:
- Search whole site: site:magazine.sebastianraschka.com
- Search title: LLM Training: RLHF and Its Alternatives