Linked pages
- Mistral 7B | Mistral AI | Open source models https://mistral.ai/news/announcing-mistral-7b/ 618 comments
- OpenAI suspends ByteDance’s account after it used GPT to train its own AI model. - The Verge https://www.theverge.com/2023/12/15/24003542/openai-suspends-bytedances-account-after-it-used-gpt-to-train-its-own-ai-model 285 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- GitHub - johnma2006/mamba-minimal: Simple, minimal implementation of Mamba in one file of PyTorch. https://github.com/johnma2006/mamba-minimal 109 comments
- Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers https://www.together.ai/blog/stripedhyena-7b 72 comments
- [2212.14052] Hungry Hungry Hippos: Towards Language Modeling with State Space Models https://arxiv.org/abs/2212.14052 54 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [1410.5401] Neural Turing Machines http://arxiv.org/abs/1410.5401 40 comments
- Batch computing and the coming age of AI systems · Hazy Research https://hazyresearch.stanford.edu/blog/2023-04-12-batch 32 comments
- Monarch Mixer: Revisiting BERT, Without Attention or MLPs · Hazy Research https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert 32 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- [2310.12109] Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture https://arxiv.org/abs/2310.12109 15 comments
- [2111.00396] Efficiently Modeling Long Sequences with Structured State Spaces https://arxiv.org/abs/2111.00396 8 comments
- [2302.10866] Hyena Hierarchy: Towards Larger Convolutional Language Models https://arxiv.org/abs/2302.10866 3 comments
- GitHub - state-spaces/mamba https://github.com/state-spaces/mamba 2 comments
- upstage/SOLAR-10.7B-Instruct-v1.0 · Hugging Face https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0 2 comments
- do-we-need-attention/DoWeNeedAttention.pdf at main · srush/do-we-need-attention · GitHub https://github.com/srush/do-we-need-attention/blob/main/DoWeNeedAttention.pdf 1 comment
- Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers · Hazy Research https://hazyresearch.stanford.edu/blog/2023-12-11-zoology2-based 1 comment
- Zoology (Blogpost 1): Measuring and Improving Recall in Efficient Language Models · Hazy Research https://hazyresearch.stanford.edu/blog/2023-12-11-zoology1-analysis 1 comment
- Is Attention All You Need? http://www.isattentionallyouneed.com/ 0 comments
Source article: State-space LLMs: Do we need Attention? (interconnects.ai)