Hacker News
- Benchmarking LLM APIs – OpenAI, Cohere or Anthropic https://www.workorb.ai/blog/which-is-the-fastest-llm-a-comprehensive-benchmark 2 comments
- Are You Smarter Than An LLM? (Quiz based on the most popular LLM benchmark) https://d.erenrich.net/are-you-smarter-than-an-llm/index.html 9 comments
- Benchmarks and comparison of LLM AI models and API hosting providers https://artificialanalysis.ai 70 comments
- Structured Generation Improves LLM Performance: GSM8K Benchmark https://blog.dottxt.co/performance-gsm8k.html 4 comments
- Benchmarking NVIDIA's TensorRT-LLM https://jan.ai/post/benchmarking-nvidia-tensorrt-llm 8 comments nvidia
- Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 benchmark https://venturebeat.com/ai/nvidia-intel-claim-new-llm-training-speed-records-in-new-mlperf-3-1-benchmark/ 2 comments technology
- Nvidia, Intel claim new LLM training speed records in new MLPerf 3.1 benchmark https://venturebeat.com/ai/nvidia-intel-claim-new-llm-training-speed-records-in-new-mlperf-3-1-benchmark/ 57 comments hardware
- [R] Skeptical about LLM benchmarks telling the whole story? This paper shows how tiny tweaks to tests like MMLU can shuffle model rankings like a deck of cards. 🃏 https://arxiv.org/abs/2402.01781 12 comments machinelearning
- Intel's Gaudi2 Chip Is The Only Alternative To NVIDIA GPUs For LLM Training As Per MLPerf Benchmarks https://wccftech.com/intels-gaudi2-chip-is-the-only-alternative-to-nvidia-gpus-for-llm-training-as-per-mlperf-benchmarks/ 5 comments intel
- The Challenges of Building Effective LLM Benchmarks And The Future of LLM Evaluation https://codecompass00.substack.com/p/llm-evaluation-leaderboards?r=rcorn 0 comments compsci
- The Challenges of Building Effective LLM Benchmarks: A 5 minute deep-dive 🧠 https://codecompass00.substack.com/p/llm-evaluation-leaderboards 0 comments deeplearning