Linking pages
- The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka https://www.latent.space/p/yitay 4 comments
- ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt https://www.latent.space/p/iclr-2024-benchmarks-agents 0 comments
- How To Hire AI Engineers - by Adam Wiggins and James Brady https://www.latent.space/p/hiring 0 comments
- State of the Art: Training >70B LLMs on 10,000 H100 clusters https://www.latent.space/p/llm-training-2024 0 comments
Linked pages
- Introducing Gemini 1.5, Google's next-generation AI model https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ 715 comments
- GitHub - smol-ai/developer: smol developer that writes code for u https://github.com/smol-ai/developer 138 comments
- GitHub - 01-ai/Yi: A series of large language models trained from scratch by developers @01-ai https://github.com/01-ai/Yi 52 comments
- [2310.01889] Ring Attention with Blockwise Transformers for Near-Infinite Context https://arxiv.org/abs/2310.01889 20 comments
- [2106.09685] LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 8 comments
- Gradient https://gradient.ai/ 8 comments
- SlimPajama: A 627B token cleaned and deduplicated version of RedPajama - Cerebras https://www.cerebras.net/blog/slimpajama-a-627b-token-cleaned-and-deduplicated-version-of-redpajama 7 comments
- WebSim, WorldSim, and The Summer of Simulative AI — with Joscha Bach of Liquid AI, Karan Malhotra of Nous Research, Rob Haisfield of WebSim.ai https://www.latent.space/p/sim-ai 7 comments
- Rotary Embeddings: A Relative Revolution | EleutherAI Blog https://blog.eleuther.ai/rotary-embeddings/ 1 comment
- [2401.16380] Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling https://arxiv.org/abs/2401.16380 1 comment
- Emulating Humans with NSFW Chatbots - with Jesse Silver https://www.latent.space/p/nsfw-chatbots 1 comment
- Latent Space | swyx | Substack https://www.latent.space/ 0 comments
- MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
- GitHub - FanaHOVA/smol-podcaster: smol-podcaster is your autonomous podcast production intern 🐣 https://github.com/FanaHOVA/smol-podcaster 0 comments
- GitHub - gkamradt/LLMTest_NeedleInAHaystack: Doing simple retrieval from LLM models at various context lengths to measure accuracy https://github.com/gkamradt/LLMTest_NeedleInAHaystack 0 comments
- [2401.02954] DeepSeek LLM: Scaling Open-Source Language Models with Longtermism https://arxiv.org/abs/2401.02954 0 comments
- The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/p/dec-2023 0 comments
- ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models) ft. Christian Szegedy, Ilya Sutskever https://www.latent.space/p/iclr-2024-recap 0 comments