Hacker News
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
Linked pages
- ChatGPT https://chat.openai.com/ 742 comments
- DALL·E: Creating Images from Text https://openai.com/blog/dall-e/ 461 comments
- What is backpropagation really doing? | Chapter 3, Deep learning - YouTube https://www.youtube.com/watch?v=Ilg3gGewQ5U 203 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- Gradient descent, how neural networks learn | Chapter 2, Deep learning - YouTube https://youtu.be/IHZwWFHWa-w 61 comments
- The Illustrated Transformer – Jay Alammar – Visualizing machine learning one concept at a time. https://jalammar.github.io/illustrated-transformer/ 36 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [2101.03961] Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity https://arxiv.org/abs/2101.03961 4 comments
- Matrix multiplication - Wikipedia https://en.wikipedia.org/wiki/Matrix_multiplication#Outer_product 4 comments
- Research Engineer https://openai.com/careers/research-engineer 4 comments
- [1710.03740] Mixed Precision Training https://arxiv.org/abs/1710.03740 1 comment
- MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR https://nv-adlr.github.io/MegatronLM 1 comment
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1412.6980] Adam: A Method for Stochastic Optimization https://arxiv.org/abs/1412.6980 0 comments
- [1811.06965] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://arxiv.org/abs/1811.06965 0 comments
- [1604.06174] Training Deep Nets with Sublinear Memory Cost https://arxiv.org/abs/1604.06174 0 comments
- Technologies behind Distributed Deep Learning: AllReduce - Preferred Networks Research & Development https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/ 0 comments