Evaluating LLMs is a minefield - discu.eu

Linking pages

Normcore LLM Reads · GitHub https://gist.github.com/veekaybee/be375ab33085102f9027853128dc5f0e 54 comments
AI leaderboards are no longer useful. It's time to switch to Pareto curves. https://www.aisnakeoil.com/p/ai-leaderboards-are-no-longer-useful 14 comments
New paper: AI agents that matter https://www.aisnakeoil.com/p/new-paper-ai-agents-that-matter 10 comments
Is AI progress slowing down? https://www.aisnakeoil.com/p/is-ai-progress-slowing-down 3 comments
GitHub - JShollaj/awesome-llm-interpretability: A curated list of Large Language Model (LLM) Interpretability resources. https://github.com/JShollaj/awesome-llm-interpretability 1 comment
Evaluating LLMs is a minefield https://www.aisnakeoil.com/p/evaluating-llms-is-a-minefield 0 comments
Will AI transform law? https://www.aisnakeoil.com/p/will-ai-transform-law 0 comments

Made by Alexandru Cojocaru