Linked pages
- GitHub - BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding (see the sketch after this list). https://github.com/BlinkDL/RWKV-LM 179 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- PyTorch http://pytorch.org/ 100 comments
- [2307.08621] Retentive Network: A Successor to Transformer for Large Language Models https://arxiv.org/abs/2307.08621 36 comments
- [2311.01927] GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling https://arxiv.org/abs/2311.01927 23 comments
- [2307.14995] Scaling TransNormer to 175 Billion Parameters https://arxiv.org/abs/2307.14995 22 comments
- GitHub - openai/triton: Development repository for the Triton language and compiler https://github.com/openai/triton 5 comments
- [2102.11174] Linear Transformers Are Secretly Fast Weight Programmers https://arxiv.org/abs/2102.11174 2 comments
- [2310.01655] PolySketchFormer: Fast Transformers via Sketches for Polynomial Kernels https://arxiv.org/abs/2310.01655 1 comment
- Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers · Hazy Research https://hazyresearch.stanford.edu/blog/2023-12-11-zoology2-based 1 comment
- [2006.16236] Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention https://arxiv.org/abs/2006.16236 0 comments
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models. https://github.com/EleutherAI/lm-evaluation-harness 0 comments
- [2312.06635] Gated Linear Attention Transformers with Hardware-Efficient Training https://arxiv.org/abs/2312.06635 0 comments
- [2404.05892] Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence https://arxiv.org/abs/2404.05892 0 comments
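A common thread in the pages above (RWKV, RetNet, GLA, linear attention) is that an un-normalized linear-attention layer can be evaluated two equivalent ways: a parallel "transformer-style" form for training and a recurrent "RNN-style" form for O(1)-per-token inference. The sketch below is a minimal illustration of that equivalence only; it is not code from any linked repository, the function names are invented for this example, and the decay/gating terms that RWKV, RetNet, and GLA add on top are deliberately omitted.

```python
# Minimal sketch: parallel vs. recurrent evaluation of un-normalized,
# causal linear attention (no decay or gating, unlike the linked papers).
import torch

def linear_attention_parallel(q, k, v):
    # q, k: (T, d); v: (T, d_v). Causal mask keeps position t from seeing the future.
    T = q.shape[0]
    scores = q @ k.T                                        # (T, T) un-normalized scores
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~mask, 0.0)
    return scores @ v                                       # (T, d_v)

def linear_attention_recurrent(q, k, v):
    # Same computation as an RNN over a (d, d_v) state matrix:
    # S_t = S_{t-1} + k_t v_t^T,   o_t = q_t S_t
    d, d_v = q.shape[1], v.shape[1]
    state = torch.zeros(d, d_v)
    outputs = []
    for t in range(q.shape[0]):
        state = state + torch.outer(k[t], v[t])             # accumulate k_t v_t^T
        outputs.append(q[t] @ state)                        # o_t = q_t S_t
    return torch.stack(outputs)

if __name__ == "__main__":
    torch.manual_seed(0)
    q, k, v = torch.randn(6, 4), torch.randn(6, 4), torch.randn(6, 8)
    assert torch.allclose(linear_attention_parallel(q, k, v),
                          linear_attention_recurrent(q, k, v), atol=1e-5)
    print("parallel and recurrent forms agree")
```

Both functions compute o_t = sum over s <= t of (q_t . k_s) v_s; the parallel form is what makes GPT-style training feasible, while the recurrent form is what gives the constant-memory, RNN-style inference claimed in the RWKV description above.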