Hacker News
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
Linked pages
- ChatGPT https://chat.openai.com/ 742 comments
- DALL·E: Creating Images from Text https://openai.com/blog/dall-e/ 461 comments
- What is backpropagation really doing? | Chapter 3, Deep learning - YouTube https://www.youtube.com/watch?v=Ilg3gGewQ5U 203 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- Gradient descent, how neural networks learn | Chapter 2, Deep learning - YouTube https://youtu.be/IHZwWFHWa-w 61 comments
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time. https://jalammar.github.io/illustrated-transformer/ 36 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- Matrix multiplication - Wikipedia https://en.wikipedia.org/wiki/Matrix_multiplication#Outer_product 4 comments
- Research Engineer https://openai.com/careers/research-engineer 4 comments
- [1710.03740] Mixed Precision Training https://arxiv.org/abs/1710.03740 1 comment
- MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR https://nv-adlr.github.io/MegatronLM 1 comment
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1412.6980] Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980 0 comments
- [1811.06965] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://arxiv.org/abs/1811.06965 0 comments
- [1604.06174] Training Deep Nets with Sublinear Memory Cost https://arxiv.org/abs/1604.06174 0 comments
- Technologies behind Distributed Deep Learning: AllReduce - Preferred Networks Research & Development https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/ 0 comments