Hacker News
- Recent Advances in Language Model Fine-Tuning (2021) https://www.ruder.io/recent-advances-lm-fine-tuning/ 3 comments
- Do NLP Beyond English https://ruder.io/nlp-beyond-english/ 68 comments
- The State of Transfer Learning in NLP http://ruder.io/state-of-transfer-learning-in-nlp/ 11 comments
- An overview of gradient descent optimization algorithms http://ruder.io/optimizing-gradient-descent/index.html 3 comments
- Optimization for Deep Learning Highlights in 2017 http://ruder.io/deep-learning-optimization-2017/index.html 9 comments
- Word embeddings in 2017: Trends and future directions http://ruder.io/word-embeddings-2017/ 25 comments
- Deep Learning for NLP Best Practices http://ruder.io/deep-learning-nlp-best-practices/ 13 comments
Reddit
- [D] Importance of square root in denominator for AdaGrad https://www.ruder.io/optimizing-gradient-descent/ 2 comments r/machinelearning
- Stochastic gradient descent: from noisy gradients in millions of dimensions for neural network training - how to go to 2nd order methods? http://ruder.io/optimizing-gradient-descent/index.html 26 comments r/math