Holistic Evaluation of Language Models (HELM) - discu.eu

Hacker News

LLM Leaderboard with explanations of what each score means (https://crfm.stanford.edu/helm/lite/latest/#/leaderboard), 3 comments, 20/4/2024

Linking pages

Google's AI Will Help Decide Whether Unemployed Workers Get Benefits (https://gizmodo.com/googles-ai-will-help-decide-whether-unemployed-workers-get-benefits-2000496215), 18 comments
AI leaderboards are no longer useful. It's time to switch to Pareto curves. (https://www.aisnakeoil.com/p/ai-leaderboards-are-no-longer-useful), 14 comments
AI Evaluation Via An AI Led Turing Test (A Proposal) (https://willthompson.name/ai-model-evaluation-via-ai-ab-testing), 0 comments
