GitHub - mlfoundations/evalchemy: Automatic evals for LLMs - discu.eu

Linking pages

GitHub - open-thoughts/open-thoughts: Fully open data curation for reasoning models https://github.com/open-thoughts/open-thoughts 1 comment
Scaling up Open Reasoning with OpenThinker-32B | Open Thoughts https://www.open-thoughts.ai/blog/scale 1 comment

Linked pages

OpenAI https://openai.com/ 137 comments
GitHub - bespokelabsai/curator: Synthetic data curation for post-training and structured data extraction https://github.com/bespokelabsai/curator 7 comments
GitHub - lmarena/arena-hard-auto: Arena-Hard-Auto: An automatic LLM benchmark. https://github.com/lmarena/arena-hard-auto 3 comments
GitHub - open-thoughts/open-thoughts: Fully open data curation for reasoning models https://github.com/open-thoughts/open-thoughts 1 comment
DataComp https://www.datacomp.ai/ 0 comments
GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models. https://github.com/EleutherAI/lm-evaluation-harness 0 comments
vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention | vLLM Blog https://blog.vllm.ai/2023/06/20/vllm.html 0 comments
[2401.03065] CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution https://arxiv.org/abs/2401.03065 0 comments
LiveBench https://livebench.ai/ 0 comments
BFCL V3 • Multi-Turn & Multi-Step Function Calling https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html 0 comments

