[D] Transformers: Polynomial gated FFN is better than SwiGLU and reduces the number of parameters while improving model's performance - discu.eu

Reddit

[D] Transformers: Polynomial gated FFN is better than SwiGLU and reduces the number of parameters while improving model's performance https://arxiv.org/pdf/2002.05202.pdf 10 comments 29/12/2023 machinelearning

Linking pages

GitHub - linkedin/Liger-Kernel: Efficient Triton Kernels for LLM Training https://github.com/linkedin/Liger-Kernel 19 comments
GitHub - PiotrNawrot/nanoT5: Fast & Simple repository for pre-training and fine-tuning T5-style models https://github.com/PiotrNawrot/nanoT5 0 comments
Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
Transformer Deep Dive: Parameter Counting https://orenleung.com/transformer-parameter-counting 0 comments
BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model - Cerebras https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/ 0 comments
