GitHub - carlini/yet-another-applied-llm-benchmark: A benchmark to evaluate language models on questions I've previously asked them to solve. - discu.eu

Linking pages

You should have private evals – Jonathon Belotti [thundergolfer] https://thundergolfer.com/blog/private-evals 1 comment
Why you should write your own LLM benchmarks — with Nicholas Carlini, Google DeepMind https://www.latent.space/p/carlini 0 comments
Why you should maintain a personal LLM coding benchmark : ezyang’s blog https://blog.ezyang.com/2025/04/why-you-should-maintain-a-personal-llm-coding-benchmark/ 0 comments

Linked pages

Licenses - GNU Project - Free Software Foundation http://www.gnu.org/licenses/ 12 comments
