Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings | LMSYS Org - discu.eu

Hacker News

Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings https://lmsys.org/blog/2023-05-03-arena/ 7 comments 2023-05-03

Linking pages

AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ 63 comments
GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
Why Are Elo Ratings Everywhere Now? - The Atlantic https://www.theatlantic.com/technology/archive/2024/04/elo-ratings-are-everywhere/678129/ 1 comment
Fine-tuning a Large Language Model using Metaflow, featuring LLaMA and LoRA | Outerbounds https://outerbounds.com/blog/llm-tuning-metaflow/ 0 comments
Truth https://compphil.github.io/truth/ 0 comments
GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
Speculations on Building Superintelligence https://blog.sshh.io/p/speculations-on-building-superintelligence 0 comments
AI Pseudo Intelligence, brilliance without a brain? https://www.mindprison.cc/p/ai-pseudo-intelligence-brilliance 0 comments

