[2104.09864] RoFormer: Enhanced Transformer with Rotary Position Embedding - discu.eu

Reddit

[D] Why Are Sinusoidal Functions Used for Position Encoding https://arxiv.org/abs/2104.09864 8 comments 10/4/2023 learnmachinelearning

Linking pages

GitHub - kingoflolz/mesh-transformer-jax: Model parallel transformers in JAX and Haiku https://github.com/kingoflolz/mesh-transformer-jax 146 comments
GitHub - xenova/transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! https://github.com/xenova/transformers.js 55 comments
The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
Is GPT-3 still King? Introducing GPT-J-6B https://ooshimus.com/is-gpt-3-still-king-introducing-gpt-j-6b 4 comments
Absolute Unit NNs: Regression-Based MLPs for Everything · Gwern.net https://gwern.net/aunn 3 comments
Yi-34B, Llama 2, and common practices in LLM training: a fact check of the New York Times | EleutherAI Blog https://blog.eleuther.ai/nyt-yi-34b-response/ 3 comments
Stability AI launches StableCode, an LLM for code generation | VentureBeat https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/ 2 comments
Rotary Embeddings: A Relative Revolution | EleutherAI Blog https://blog.eleuther.ai/rotary-embeddings/ 1 comment
Cerebras Makes It Easy to Harness the Predictive Power of GPT-J | Cerebras https://www.cerebras.net/blog/cerebras-makes-it-easy-to-harness-the-predictive-power-of-gpt-j 1 comment
ALiBi FlashAttention - Speeding up ALiBi by 3-5x with a hardware-efficient implementation | Princeton Language and Intelligence https://pli.princeton.edu/blog/2024/alibi-flashattention-speeding-alibi-3-5x-hardware-efficient-implementation 1 comment
Gradient Update #1: FBI Usage of Facial Recognition and Rotary Embeddings For Large LM's https://thegradientpub.substack.com/p/update-1-fbi-usage-of-facial-recognition 0 comments
GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/pytorch-transformers 0 comments
GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/pytorch-pretrained-BERT 0 comments
GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers 0 comments
GitHub - databrickslabs/dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform https://github.com/databrickslabs/dolly 0 comments
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
Five years of progress in GPTs - by Finbarr Timbers https://finbarrtimbers.substack.com/p/five-years-of-progress-in-gpts 0 comments
Transformer Deep Dive: Parameter Counting https://orenleung.com/transformer-parameter-counting 0 comments

