- [D] Why Are Sinusoidal Functions Used for Position Encoding https://arxiv.org/abs/2104.09864 8 comments learnmachinelearning
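For context on the linked question: the indexed paper, RoFormer (arXiv 2104.09864), uses sinusoids to *rotate* query and key vectors rather than adding a position vector to them. The sketch below is a minimal illustration, not the paper's reference code. It assumes the paper's frequency schedule θ_i = 10000^(-2i/d) and rotates consecutive coordinate pairs; the helper names `rope` and `dot` are hypothetical. It checks the property that motivates the design: the rotated query-key dot product depends only on the offset between positions.

```python
import math

def rope(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * theta_i.

    Assumption: theta_i = base ** (-2i / d), the RoFormer frequency schedule,
    with pairs taken as consecutive coordinates (a hypothetical layout choice).
    """
    d = len(x)
    assert d % 2 == 0, "dimension must be even"
    out = []
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2-D rotation of the pair (a, b).
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.5, -1.0, 2.0, 0.25]
k = [1.5, 0.75, -0.5, 1.0]

# Both pairs of positions have the same offset (4), so the scores match.
s1 = dot(rope(q, 7), rope(k, 3))
s2 = dot(rope(q, 104), rope(k, 100))
print(abs(s1 - s2) < 1e-9)  # True
```

Because each pair is rotated rather than shifted, the vector's norm is unchanged, and position 0 leaves the vector as-is.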
Linking pages
- GitHub - kingoflolz/mesh-transformer-jax: Model parallel transformers in JAX and Haiku https://github.com/kingoflolz/mesh-transformer-jax 146 comments
- GitHub - xenova/transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! https://github.com/xenova/transformers.js 55 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
- GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
- Is GPT-3 still King? Introducing GPT-J-6B https://ooshimus.com/is-gpt-3-still-king-introducing-gpt-j-6b 4 comments
- Absolute Unit NNs: Regression-Based MLPs for Everything · Gwern.net https://gwern.net/aunn 3 comments
- Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
- Stability AI launches StableCode, an LLM for code generation | VentureBeat https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/ 2 comments
- Rotary Embeddings: A Relative Revolution | EleutherAI Blog https://blog.eleuther.ai/rotary-embeddings/ 1 comment
- Cerebras Makes It Easy to Harness the Predictive Power of GPT-J | Cerebras https://www.cerebras.net/blog/cerebras-makes-it-easy-to-harness-the-predictive-power-of-gpt-j 1 comment
- ALiBi FlashAttention - Speeding up ALiBi by 3-5x with a hardware-efficient implementation | Princeton Language and Intelligence https://pli.princeton.edu/blog/2024/alibi-flashattention-speeding-alibi-3-5x-hardware-efficient-implementation 1 comment
- Gradient Update #1: FBI Usage of Facial Recognition and Rotary Embeddings For Large LM's https://thegradientpub.substack.com/p/update-1-fbi-usage-of-facial-recognition 0 comments
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/pytorch-transformers 0 comments
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/pytorch-pretrained-BERT 0 comments
- GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers 0 comments
- GitHub - databrickslabs/dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform https://github.com/databrickslabs/dolly 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- Five years of progress in GPTs - by Finbarr Timbers https://finbarrtimbers.substack.com/p/five-years-of-progress-in-gpts 0 comments
- Transformer Deep Dive: Parameter Counting https://orenleung.com/transformer-parameter-counting 0 comments
Related searches:
Search title: [2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding