[2108.12409] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation - discu.eu

Hacker News

Attention with Linear Biases (ALiBi) https://arxiv.org/abs/2108.12409 15 comments 14/5/2023

Linking pages

What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
GitHub - mosaicml/composer: Train neural networks up to 7x faster https://github.com/mosaicml/composer 84 comments
The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention | PyTorch https://pytorch.org/blog/flexattention/ 24 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
GitHub - bytebreezestudios/ml-parakeet: Parakeet, a tiny language model by Byte Breeze Studios https://github.com/bytebreezestudios/ml-parakeet 2 comments
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
ALiBi FlashAttention - Speeding up ALiBi by 3-5x with a hardware-efficient implementation | Princeton Language and Intelligence https://pli.princeton.edu/blog/2024/alibi-flashattention-speeding-alibi-3-5x-hardware-efficient-implementation 1 comment
GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models https://github.com/PiotrNawrot/nanoT5 0 comments
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model - Cerebras https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/ 0 comments
List of Artificial Intelligence AI Advancements by Non-Profit Researchers - MarkTechPost https://www.marktechpost.com/2023/10/27/list-of-artificial-intelligence-ai-advancements-by-non-profit-researchers/ 0 comments
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
Positional Encoding for Self Attention - SWE to ML Engineer https://swe-to-mle.pages.dev/posts/positional-encoding-for-self-attention/ 0 comments
Position Information in Transformer-Based Models: Exploring the main Methods and Approaches – Reinforced Knowledge https://reinforcedknowledge.com/position-information-in-transformer-based-models-exploring-the-main-methods-and-approaches/ 0 comments

