- [R] Sparse is Enough in Scaling Transformers https://arxiv.org/abs/2111.12763 5 comments machinelearning
Linking pages
- Large Transformer Model Inference Optimization | Lil'Log https://lilianweng.github.io/posts/2023-01-10-inference-optimization/ 20 comments
- GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
- Warsaw U, Google & OpenAI’s Terraformer Achieves a 37x Speedup Over Dense Baselines on 17B Transformer Decoding | Synced https://syncedreview.com/2021/12/03/deepmind-podracer-tpu-based-rl-frameworks-deliver-exceptional-performance-at-low-cost-158/ 0 comments
Would you like to stay up to date with Computer science? Checkout Computer science
Weekly.
Related searches:
Search whole site: site:arxiv.org
Search title: [R] Sparse is Enough in Scaling Transformers
See how to search.