State-space LLMs: Do we need Attention? - discu.eu

Linking pages

The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/p/dec-2023 0 comments
The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/i/140396949/mixtral-sparks-a-gpuinference-war 0 comments

Linked pages

Mistral 7B | Mistral AI | Open source models https://mistral.ai/news/announcing-mistral-7b/ 618 comments
OpenAI suspends ByteDance’s account after it used GPT to train its own AI model. - The Verge https://www.theverge.com/2023/12/15/24003542/openai-suspends-bytedances-account-after-it-used-gpt-to-train-its-own-ai-model 284 comments
[2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
GitHub - johnma2006/mamba-minimal: Simple, minimal implementation of Mamba in one file of PyTorch. https://github.com/johnma2006/mamba-minimal 108 comments
Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers https://www.together.ai/blog/stripedhyena-7b 72 comments
[2212.14052] Hungry Hungry Hippos: Towards Language Modeling with State Space Models https://arxiv.org/abs/2212.14052 54 comments
[2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
[1410.5401] Neural Turing Machines http://arxiv.org/abs/1410.5401 40 comments
Batch computing and the coming age of AI systems · Hazy Research https://hazyresearch.stanford.edu/blog/2023-04-12-batch 32 comments
Monarch Mixer: Revisiting BERT, Without Attention or MLPs · Hazy Research https://hazyresearch.stanford.edu/blog/2023-07-25-m2-bert 32 comments
[1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
[2310.12109] Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture https://arxiv.org/abs/2310.12109 15 comments
[2111.00396] Efficiently Modeling Long Sequences with Structured State Spaces https://arxiv.org/abs/2111.00396 8 comments
[2302.10866] Hyena Hierarchy: Towards Larger Convolutional Language Models https://arxiv.org/abs/2302.10866 3 comments
GitHub - state-spaces/mamba https://github.com/state-spaces/mamba 2 comments
upstage/SOLAR-10.7B-Instruct-v1.0 · Hugging Face https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0 2 comments
do-we-need-attention/DoWeNeedAttention.pdf at main · srush/do-we-need-attention · GitHub https://github.com/srush/do-we-need-attention/blob/main/DoWeNeedAttention.pdf 1 comment
Zoology (Blogpost 2): Simple, Input-Dependent, and Sub-Quadratic Sequence Mixers · Hazy Research https://hazyresearch.stanford.edu/blog/2023-12-11-zoology2-based 1 comment
Zoology (Blogpost 1): Measuring and Improving Recall in Efficient Language Models · Hazy Research https://hazyresearch.stanford.edu/blog/2023-12-11-zoology1-analysis 1 comment
Is Attention All You Need? http://www.isattentionallyouneed.com/ 0 comments

Related searches:

Search whole site: site:www.interconnects.ai

Search title: State-space LLMs: Do we need Attention?
