Hacker News
- Fast Inference from Transformers via Speculative Decoding https://arxiv.org/abs/2211.17192 2 comments
Linking pages
- Accelerating Generative AI with PyTorch II: GPT, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai-2/ 69 comments
- GitHub - MDK8888/GPTFast: Accelerate your Hugging Face Transformers 6-7x. Native to Hugging Face and PyTorch. https://github.com/MDK8888/GPTFast 18 comments
- We Are Running Out of Low-Background Tokens (Nov 2023 Recap) https://www.latent.space/i/139368545/the-concept-of-low-background-tokens 6 comments
- At the Intersection of LLMs and Kernels - Research Roundup https://charlesfrye.github.io/programming/2023/11/10/llms-systems.html 4 comments
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory https://browse.arxiv.org/html/2312.11514v1 1 comment
- GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
- GitHub - koayon/awesome-adaptive-computation: A curated reading list of research in Adaptive Computation (AC). https://github.com/koayon/awesome-adaptive-computation 0 comments
- Transformer inference tricks - by Finbarr Timbers https://www.artfintel.com/p/transformer-inference-tricks 0 comments
- GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
- Speculative Decoding - philkrav https://philkrav.com/posts/speculative/ 0 comments