Hacker News
- LLM Leaderboard with explanations of what each score means https://crfm.stanford.edu/helm/lite/latest/#/leaderboard 3 comments
Linking pages
- Google's AI Will Help Decide Whether Unemployed Workers Get Benefits https://gizmodo.com/googles-ai-will-help-decide-whether-unemployed-workers-get-benefits-2000496215 19 comments
- AI leaderboards are no longer useful. It's time to switch to Pareto curves. https://www.aisnakeoil.com/p/ai-leaderboards-are-no-longer-useful 14 comments
- AI Evaluation Via An AI Led Turing Test (A Proposal) https://willthompson.name/ai-model-evaluation-via-ai-ab-testing 0 comments