Hacker News
- [R] An Empirical Study of Mamba-based Language Models (8B Mamba-2-Hybrid on 3.5T tokens data) http://arxiv.org/abs/2406.07887 4 comments machinelearning
Linking pages
- LLM Research Papers: The 2024 List https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list 11 comments
- Why large language models struggle with long contexts https://www.understandingai.org/p/why-large-language-models-struggle 0 comments
- Why AI language models choke on too much text - Ars Technica https://arstechnica.com/ai/2024/12/why-ai-language-models-choke-on-too-much-text/ 0 comments