Hacker News
- My benchmark for large language models https://nicholas.carlini.com/writing/2024/my-benchmark-for-large-language-models.html 2 comments
- Benchmarking Large Language Models for Handwritten Text Recognition https://arxiv.org/abs/2503.15195 0 comments
- Stanford benchmarks and compares numerous Large Language Models https://crfm.stanford.edu/helm/latest/?group=core_scenarios 10 comments
- HellaSwag: 36% of this popular large language model benchmark contains errors https://www.surgehq.ai/blog/hellaswag-or-hellabad-36-of-this-popular-llm-benchmark-contains-errors 8 comments
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1) https://www.mosaicml.com/blog/coreweave-nvidia-h100-part-1 15 comments (r/nvidia)
- [R] Large Language Models trained on code reason better, even on benchmarks that have nothing to do with code https://arxiv.org/abs/2210.07128 50 comments (r/machinelearning)