Hacker News
- Yi-34B, Llama 2, and common practices in LLM training https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- [2403.04652] Yi: Open Foundation Models by 01.AI https://arxiv.org/abs/2403.04652 81 comments
- China’s Rush to Dominate A.I. Comes With a Twist: It Depends on U.S. Technology - The New York Times https://www.nytimes.com/2024/02/21/technology/china-united-states-artificial-intelligence.html 69 comments
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
- 01-ai/Yi-34B · llama-compatibility https://huggingface.co/01-ai/Yi-34B/discussions/11 7 comments
- https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- [2204.02311] PaLM: Scaling Language Modeling with Pathways https://arxiv.org/abs/2204.02311 0 comments
- https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf 0 comments
- [2307.09288] Llama 2: Open Foundation and Fine-Tuned Chat Models https://arxiv.org/abs/2307.09288 0 comments
- [2305.13245] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints https://arxiv.org/abs/2305.13245 0 comments
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog