Linking pages
- HellaSwag or HellaBad? 36% of this popular LLM benchmark contains errors https://www.surgehq.ai/blog/hellaswag-or-hellabad-36-of-this-popular-llm-benchmark-contains-errors 14 comments
- Foundation Models: The future (still) isn't happening fast enough https://www.madrona.com/foundation-models/ 1 comment
- Three Ideas for Regulating Generative AI https://aisnakeoil.substack.com/p/three-ideas-for-regulating-generative 1 comment
- Putting the human touch on LLMs - Molly Welch's Newsletter https://mewelch.substack.com/p/putting-the-human-touch-on-llms 0 comments
- Supporting benchmarks for AI safety with MLCommons – Google Research Blog http://blog.research.google/2023/10/supporting-benchmarks-for-ai-safety.html 0 comments