- [R] You can't train GPT-3 on a single GPU, but you *can* tune its hyperparameters on one https://arxiv.org/abs/2203.03466 36 comments machinelearning
Linking pages
- Google "We Have No Moat, And Neither Does OpenAI" https://www.semianalysis.com/p/google-we-have-no-moat-and-neither 1571 comments
- Transformer Taxonomy (the last lit review) | kipply's blog https://kipp.ly/blog/transformer-taxonomy/ 0 comments
- BTLM-3B-8K: 7B Performance in a 3 Billion Parameter Model - Cerebras https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/ 0 comments
- GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments
Title: [2203.03466] Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer