Linked pages
- Introducing the next generation of Claude \ Anthropic https://www.anthropic.com/news/claude-3-family 704 comments
- [1911.01547] On the Measure of Intelligence https://arxiv.org/abs/1911.01547 37 comments
- [2108.07732] Program Synthesis with Large Language Models https://arxiv.org/abs/2108.07732 25 comments
- [2107.03374] Evaluating Large Language Models Trained on Code https://arxiv.org/abs/2107.03374 8 comments
- [2109.07958] TruthfulQA: Measuring How Models Mimic Human Falsehoods https://arxiv.org/abs/2109.07958 7 comments
- https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf 3 comments
- [2009.03300] Measuring Massive Multitask Language Understanding https://arxiv.org/abs/2009.03300 0 comments
- [2103.03874] Measuring Mathematical Problem Solving With the MATH Dataset https://arxiv.org/abs/2103.03874 0 comments
- [2206.04615] Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models https://arxiv.org/abs/2206.04615 0 comments
- [1903.00161] DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs https://arxiv.org/abs/1903.00161 0 comments
- ARC/README.md at master · fchollet/ARC · GitHub https://github.com/fchollet/ARC/blob/master/README.md 0 comments
Related searches:
Search whole site: site:aisupremacy.substack.com
Search title: LLM Performance Benchmarks - Claude 3 Opus, GPT-4 and Gemini Ultra