Linking pages
- A Picture is Worth 170 Tokens: How Does GPT-4o Encode Images? - OranLooney.com https://www.oranlooney.com/post/gpt-cnn/ 112 comments
- GPT-J-6B: 6B JAX-Based Transformer – Aran Komatsuzaki https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/ 79 comments
- You could have designed state of the art positional encoding https://fleetwood.dev/posts/you-could-have-designed-SOTA-positional-encoding 46 comments
- Transformers for software engineers - Made of Bugs https://blog.nelhage.com/post/transformers-for-software-engineers/ 20 comments
- Meta quietly releases Llama 2 Long AI model | VentureBeat https://venturebeat.com/ai/meta-quietly-releases-llama-2-long-ai-that-outperforms-gpt-3-5-and-claude-2-on-some-tasks/ 12 comments
- How to convert the SalesForce CodeGen models to GPT-J · GitHub https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566 3 comments
- GitHub - Const-me/Cgml: GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation. https://github.com/Const-me/Cgml 1 comment
- How to train a Million Context LLM — with Mark Huang of Gradient.ai https://www.latent.space/p/gradient 1 comment
- Gradient Update #1: FBI Usage of Facial Recognition and Rotary Embeddings For Large LM's https://thegradientpub.substack.com/p/update-1-fbi-usage-of-facial-recognition 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- LLaMA-2 from the Ground Up - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up 0 comments
- Dolma, OLMo, and the Future of Open-Source LLMs https://cameronrwolfe.substack.com/p/dolma-olmo-and-the-future-of-open 0 comments
- GitHub - likejazz/llama3.np: llama3.np is a pure NumPy implementation of the Llama 3 model. https://github.com/likejazz/llama3.np 0 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- GitHub - kingoflolz/mesh-transformer-jax: Model parallel transformers in JAX and Haiku https://github.com/kingoflolz/mesh-transformer-jax 146 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. https://github.com/EleutherAI/gpt-neo/ 127 comments
- [2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling https://arxiv.org/abs/2101.00027 81 comments
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. https://github.com/EleutherAI/gpt-neox 67 comments
- GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 40 comments
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer https://arxiv.org/abs/1910.10683 1 comment