[2105.02723] Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet - discu.eu

Hacker News

A stack of feed-forward layers does surprisingly well on ImageNet https://arxiv.org/abs/2105.02723 25 comments 18/1/2023

Linking pages

GitHub - cmhungsteve/Awesome-Transformer-Attention: An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites https://github.com/cmhungsteve/Awesome-Transformer-Attention 13 comments
Gradient Update #1: FBI Usage of Facial Recognition and Rotary Embeddings For Large LM's https://thegradientpub.substack.com/p/update-1-fbi-usage-of-facial-recognition 0 comments
