Hacker News
- My benchmark for large language models https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html 2 comments
- Stanford benchmarks and compares numerous Large Language Models https://crfm.stanford.edu/helm/latest/?group=core_scenarios 10 comments
- HellaSwag: 36% of this popular large language model benchmark contains errors https://www.surgehq.ai/blog/hellaswag-or-hellabad-36-of-this-popular-llm-benchmark-contains-errors 8 comments
- OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare https://www.marktechpost.com/2025/05/12/openai-releases-healthbench-an-open-source-benchmark-for-measuring-the-performance-and-safety-of-large-language-models-in-healthcare/ 0 comments (r/machinelearningnews)
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1) https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1 15 comments (r/nvidia)
- [R] Large Language Models trained on code reason better, even on benchmarks that have nothing to do with code https://arxiv.org/abs/2210.07128 50 comments (r/machinelearning)