[1905.10650] Are Sixteen Heads Really Better than One? - discu.eu

Linking pages

Transformers are Graph Neural Networks https://thegradient.pub/transformers-are-graph-neural-networks/ 25 comments
Transformers are Graph Neural Networks | NTU Graph Deep Learning Lab https://graphdeeplearning.github.io/post/transformers-are-gnns/ 19 comments
All The Ways You Can Compress BERT | Mitchell A. Gordon http://mitchgordon.me/machine/learning/2019/11/18/all-the-ways-to-compress-BERT.html 0 comments
Aman's AI Journal • Primers • Transformers https://aman.ai/primers/ai/transformers/ 0 comments
Learn how to make BERT smaller and faster | The Rasa Blog | Rasa https://blog.rasa.com/compressing-bert-for-faster-prediction-2/ 0 comments
Attention for time series forecasting and classification | by Isaac Godfried | Towards Data Science https://towardsdatascience.com/attention-for-time-series-classification-and-forecasting-261723e0006d 0 comments
GitHub - tomohideshibata/BERT-related-papers: BERT-related papers https://github.com/tomohideshibata/BERT-related-papers 0 comments
2019: The Year of BERT. As we wrap up 2019, it’s interesting to… | by Natasha Latysheva | Towards Data Science https://medium.com/@natasha.latysheva/2019-the-year-of-bert-354e8106f7ba 0 comments
