Linked pages
- GitHub - BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding. https://github.com/BlinkDL/RWKV-LM 179 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs https://arxiv.org/abs/2305.14314 129 comments
- From Deep to Long Learning? · Hazy Research https://hazyresearch.stanford.edu/blog/2023-03-27-long-learning 124 comments
- Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers https://www.together.ai/blog/stripedhyena-7b 72 comments
- GitHub - EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. https://github.com/EleutherAI/gpt-neox 67 comments
- QuIP# https://cornell-relaxml.github.io/quip-sharp/ 59 comments
- [2212.14052] Hungry Hungry Hippos: Towards Language Modeling with State Space Models https://arxiv.org/abs/2212.14052 54 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [1410.5401] Neural Turing Machines https://arxiv.org/abs/1410.5401 40 comments
- GitHub - srush/GPU-Puzzles: Solve puzzles. Learn CUDA. https://github.com/srush/GPU-Puzzles 38 comments
- [2112.05682] Self-attention Does Not Need $O(n^2)$ Memory https://arxiv.org/abs/2112.05682 37 comments
- [2303.06865] High-throughput Generative Inference of Large Language Models with a Single GPU https://arxiv.org/abs/2303.06865 36 comments
- [2307.08621] Retentive Network: A Successor to Transformer for Large Language Models https://arxiv.org/abs/2307.08621 36 comments
- [1803.03635] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks https://arxiv.org/abs/1803.03635 32 comments
- Batch computing and the coming age of AI systems · Hazy Research https://hazyresearch.stanford.edu/blog/2023-04-12-batch 32 comments
- Monarch Mixer: Revisiting BERT, Without Attention or MLPs · Hazy Research https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert 32 comments
- Making Deep Learning go Brrrr From First Principles https://horace.io/brrr_intro.html 20 comments
- [2310.01889] Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/abs/2310.01889 20 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments