Linking pages
- GPT-J-6B: 6B JAX-Based Transformer – Aran Komatsuzaki https://arankomatsuzaki.wordpress.com/2021/06/04/gpt-j/ 79 comments
- Transformers for software engineers - Made of Bugs https://blog.nelhage.com/post/transformers-for-software-engineers/ 20 comments
- Meta quietly releases Llama 2 Long AI model | VentureBeat https://venturebeat.com/ai/meta-quietly-releases-llama-2-long-ai-that-outperforms-gpt-3-5-and-claude-2-on-some-tasks/ 12 comments
- How to convert the SalesForce CodeGen models to GPT-J · GitHub https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566 3 comments
- Gradient Update #1: FBI Usage of Facial Recognition and Rotary Embeddings For Large LM's https://thegradientpub.substack.com/p/update-1-fbi-usage-of-facial-recognition 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- LLaMA-2 from the Ground Up - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/llama-2-from-the-ground-up 0 comments
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- GitHub - kingoflolz/mesh-transformer-jax: Model parallel transformers in JAX and Haiku https://github.com/kingoflolz/mesh-transformer-jax 146 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- GitHub - EleutherAI/gpt-neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library. https://github.com/EleutherAI/gpt-neo/ 127 comments
- [2101.00027] The Pile: An 800GB Dataset of Diverse Text for Language Modeling https://arxiv.org/abs/2101.00027 81 comments
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. https://github.com/EleutherAI/gpt-neox 67 comments
- GitHub - lucidrains/x-transformers: A simple but complete full-attention transformer with a set of promising experimental features from various papers https://github.com/lucidrains/x-transformers 37 comments
- [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding https://arxiv.org/abs/2104.09864 8 comments
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment