[2212.14052] Hungry Hungry Hippos: Towards Language Modeling with State Space Models - discu.eu

Reddit

H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! In H3, the researchers replace attention with a new layer based on state space models (SSMs). With the right modifications, it can outperform transformers. It also has no fixed context length. https://arxiv.org/abs/2212.14052 54 comments 24/1/2023 machinelearning

Linking pages

AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
100M Token Context Windows — Magic https://magic.dev/blog/100m-token-context-windows 22 comments
State-space LLMs: Do we need Attention? https://www.interconnects.ai/p/llms-beyond-attention 1 comment
GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://gonzoml.substack.com/p/mamba-linear-time-sequence-modeling 0 comments
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
