- [R] TruthfulQA: Measuring How Models Mimic Human Falsehoods https://arxiv.org/abs/2109.07958 7 comments machinelearning
Linking pages
- AI now surpasses humans in almost all performance benchmarks https://newatlas.com/technology/ai-index-report-global-impact/ 741 comments
- Lessons from the GPT-4Chan Controversy https://thegradient.pub/gpt-4chan-lessons/ 28 comments
- AI scientists are studying the “emergent” abilities of large language models – TechTalks https://bdtechtalks.com/2022/08/22/llm-emergent-abilities/ 10 comments
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing https://openai.com/blog/webgpt/ 7 comments
- WebGPT: Improving the Factual Accuracy of Language Models through Web Browsing https://openai.com/blog/improving-factual-accuracy/ 5 comments
- The New Version of GPT-3 Is Much, Much Better | by Alberto Romero | Towards Data Science https://towardsdatascience.com/the-new-version-of-gpt-3-is-much-much-better-53ac95f21cfb?sk=07d828c58539ae273327d434514f14c3 3 comments
- GitHub - inverse-scaling/prize: A prize for finding tasks that cause large language models to show inverse scaling https://github.com/inverse-scaling/prize 1 comment
- The Perils of Using Quotations to Authenticate NLG Content - Unite.AI https://www.unite.ai/the-perils-of-using-quotations-to-authenticate-nlg-content/ 0 comments
- AI Lies, Privacy, & OpenAI | Drew Breunig https://www.dbreunig.com/2023/04/10/the-privacy-question-and-open-ai.html 0 comments
- Google Might Have a Moat - by Will Seltzer - Intuitive AI https://intuitiveai.substack.com/p/google-might-have-a-moat 0 comments
- LLM Performance Benchmarks - Claude 3 Opus, GPT-4 and Gemini Ultra https://aisupremacy.substack.com/p/llm-performance-benchmarks-claude 0 comments
- GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments