[2105.14103] An Attention Free Transformer - discu.eu

Reddit

[R] An Attention Free Transformer https://arxiv.org/abs/2105.14103 15 comments 1/6/2021 machinelearning

Linking pages

GitHub - BlinkDL/RWKV-LM: RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding. https://github.com/BlinkDL/RWKV-LM 179 comments
RWKV: Reinventing RNNs for the Transformer Era — with Eugene Cheah of UIlicious https://www.latent.space/p/rwkv#%C2%A7the-eleuther-mafia 66 comments

Would you like to stay up to date with Computer science? Checkout Computer science Weekly.

Related searches:

Search whole site: site:arxiv.org

Search title: [2105.14103] An Attention Free Transformer

See how to search.

Submit link to: