- H3 - a new generative language model that outperforms GPT-Neo-2.7B with only *2* attention layers! In H3, the researchers replace attention with a new layer based on state space models (SSMs). With the right modifications, it can outperform transformers. It also has no fixed context length. https://arxiv.org/abs/2212.14052 54 comments machinelearning
Linking pages
- AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
- 100M Token Context Windows — Magic https://magic.dev/blog/100m-token-context-windows 22 comments
- State-space LLMs: Do we need Attention? https://www.interconnects.ai/p/llms-beyond-attention 1 comment
- GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
- MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
- GitHub - RUCAIBox/LLMSurvey: The official GitHub page for the survey paper "A Survey of Large Language Models". https://github.com/RUCAIBox/LLMSurvey 0 comments
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://gonzoml.substack.com/p/mamba-linear-time-sequence-modeling 0 comments
- GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
Title: [2212.14052] Hungry Hungry Hippos: Towards Language Modeling with State Space Models