Hacker News
Linking pages
- What We Know About LLMs (Primer) https://willthompson.name/what-we-know-about-llms-primer 164 comments
- GitHub - mosaicml/composer: Train neural networks up to 7x faster https://github.com/mosaicml/composer 84 comments
- The Transformer Family Version 2.0 | Lil'Log https://lilianweng.github.io/posts/2023-01-27-the-transformer-family-v2/ 46 comments
- FlexAttention: The Flexibility of PyTorch with the Performance of FlashAttention | PyTorch https://pytorch.org/blog/flexattention/ 24 comments
- GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
- GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
- GitHub - bytebreezestudios/ml-parakeet: Parakeet, a tiny language model by Byte Breeze Studios https://github.com/bytebreezestudios/ml-parakeet 2 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- ALiBi FlashAttention - Speeding up ALiBi by 3-5x with a hardware-efficient implementation | Princeton Language and Intelligence https://pli.princeton.edu/blog/2024/alibi-flashattention-speeding-alibi-3-5x-hardware-efficient-implementation 1 comment
- GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models https://github.com/PiotrNawrot/nanoT5 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
- GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
- BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model - Cerebras https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/ 0 comments
- List of Artificial Intelligence AI Advancements by Non-Profit Researchers - MarkTechPost https://www.marktechpost.com/2023/10/27/list-of-artificial-intelligence-ai-advancements-by-non-profit-researchers/ 0 comments
- GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
- Positional Encoding for Self Attention - SWE to ML Engineer https://swe-to-mle.pages.dev/posts/positional-encoding-for-self-attention/ 0 comments
- Position Information in Transformer-Based Models: Exploring the main Methods and Approaches – Reinforced Knowledge https://reinforcedknowledge.com/position-information-in-transformer-based-models-exploring-the-main-methods-and-approaches/ 0 comments
Search title: [2108.12409] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation