[2009.01325] Learning to summarize from human feedback - discu.eu

Hacker News

Learning to summarize from human feedback (2022) https://arxiv.org/abs/2009.01325 12 comments 4/3/2023

Linking pages

Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
Reinforcement Learning as a fine-tuning paradigm | Ankesh Anand https://ankeshanand.com/blog/2022/01/08/rl-fine-tuning.html 8 comments
Simulators :: — Moire https://generative.ink/posts/simulators/ 7 comments
ChatGPT Decoded: An expert guide to mastering the technology and building domain-specific intelligent bots with GPT and reinforcement learning on AWS SageMaker | by Arun Shankar | Feb, 2023 | Medium https://medium.com/@shankar.arunp/chatgpt-decoded-an-expert-guide-to-mastering-the-technology-and-building-domain-specific-3a95b42827bb?sk=e025c40b1a15863f94c1a6105d089222&source=friends_link 7 comments
Unpacking the HF in RLHF - by Justin Cranshaw https://maestroai.substack.com/p/unpacking-the-hf-in-rlhf 3 comments
Reward Modeling for Large language models (with code) https://explodinggradients.com/reward-modeling-for-large-language-models-with-code 1 comment
Janus' GPT Wrangling - by Scott Alexander https://astralcodexten.substack.com/p/janus-gpt-wrangling 0 comments
Learning to Summarize with Human Feedback https://openai.com/blog/learning-to-summarize-with-human-feedback/ 0 comments
Mostly Helpful Econometrics. Why machine learning researchers should… | by Mikey Shulman | Kensho Blog https://blog.kensho.com/mostly-helpful-econometrics-729cc32e722 0 comments
Human Feedback Improves OpenAI Model Summarizations | Synced https://syncedreview.com/2020/09/15/human-feedback-improves-openai-model-summarizations/ 0 comments
OpenAI's latest GPT-3 model generates better and longer texts https://the-decoder.com/openais-latest-gpt-3-model-generates-better-and-longer-texts/ 0 comments
ChatGPT: The Latest and Greatest of Large Language Models from OpenAI [Examples and Resources] :: f3.al https://f3.al/chatgpt-definitive-resource/ 0 comments
Ahead of AI #6: TrAIn Differently - by Sebastian Raschka https://magazine.sebastianraschka.com/p/ahead-of-ai-6-train-differently 0 comments
GitHub - opendilab/awesome-RLHF: A curated list of reinforcement learning with human feedback resources (continually updated) https://github.com/opendilab/awesome-RLHF 0 comments
GitHub - Mooler0410/LLMsPracticalGuide: A curated list of practical guide resources of LLMs (LLMs Tree, Examples, Papers) https://github.com/Mooler0410/LLMsPracticalGuide 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
Truth https://compphil.github.io/truth/ 0 comments
How instruction-tuning can encourage hallucinations https://peterjliu.substack.com/p/how-instruction-tuning-can-encourage 0 comments
