- At the Intersection of LLMs and Kernels - Research Roundup http://charlesfrye.github.io/programming/2023/11/10/llms-systems.html 4 comments programming
Linking pages
- We Are Running Out of Low-Background Tokens (Nov 2023 Recap) https://www.latent.space/i/139368545/the-concept-of-low-background-tokens 6 comments
- Beat GPT-4o at Python by searching with 100 dumb LLaMAs | Modal Blog https://modal.com/blog/llama-human-eval 2 comments
- [Paper Review] Efficient Memory Management for Large Language Model Serving with PagedAttention https://newsletter.micahlerner.com/p/paper-review-efficient-memory-management 0 comments
- Efficient Memory Management for Large Language Model Serving with PagedAttention https://www.micahlerner.com/2024/01/11/efficient-memory-management-for-large-language-model-serving-with-pagedattention.html 0 comments
Linked pages
- What the heck is the event loop anyway? | Philip Roberts | JSConf EU - YouTube https://www.youtube.com/watch?v=8aGhZQkoFbQ 307 comments
- Writing an OS in Rust https://os.phil-opp.com/ 297 comments
- [2304.03442] Generative Agents: Interactive Simulacra of Human Behavior https://arxiv.org/abs/2304.03442 276 comments
- [2310.02226] Think before you speak: Training Language Models With Pause Tokens https://arxiv.org/abs/2310.02226 107 comments
- [2310.08560] MemGPT: Towards LLMs as Operating Systems https://arxiv.org/abs/2310.08560 106 comments
- Replicate – Run open-source machine learning models with a cloud API https://replicate.ai/ 53 comments
- Amazon S3 - Cloud Object Storage - AWS http://aws.amazon.com/s3/ 31 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments
- Transformer Circuits Thread https://transformer-circuits.pub/ 8 comments
- [2211.17192] Fast Inference from Transformers via Speculative Decoding https://arxiv.org/abs/2211.17192 2 comments
- [2302.01318] Accelerating Large Language Model Decoding with Speculative Sampling https://arxiv.org/abs/2302.01318 0 comments
- Assisted Generation: a new direction toward low-latency text generation https://huggingface.co/blog/assisted-generation 0 comments
- [2307.03172] Lost in the Middle: How Language Models Use Long Contexts https://arxiv.org/abs/2307.03172 0 comments
- [2309.02427] Cognitive Architectures for Language Agents https://arxiv.org/abs/2309.02427 0 comments
- Andrej Karpathy on X: "With many 🧩 dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but the kernel process of a new Operating System. E.g. today it orchestrates: - Input & Output across modalities (text, audio, vision) - Code interpreter, ability to write & run… https://t.co/2HsyslOG2F" / X https://twitter.com/karpathy/status/1707437820045062561 0 comments
- [2309.16588] Vision Transformers Need Registers https://arxiv.org/abs/2309.16588 0 comments