- [D] Transformers: Polynomial gated FFN is better than SwiGLU and reduces the number of parameters while improving model's performance https://arxiv.org/pdf/2002.05202.pdf 10 comments machinelearning
Linking pages
- GitHub - linkedin/Liger-Kernel: Efficient Triton Kernels for LLM Training https://github.com/linkedin/Liger-Kernel 19 comments
- GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models https://github.com/PiotrNawrot/nanoT5 0 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- Transformer Deep Dive: Parameter Counting https://orenleung.com/transformer-parameter-counting 0 comments
- BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model - Cerebras https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/ 0 comments