[2211.17192] Fast Inference from Transformers via Speculative Decoding - discu.eu

Hacker News

Fast Inference from Transformers via Speculative Decoding https://arxiv.org/abs/2211.17192 2 comments 5/9/2023

Linking pages

Accelerating Generative AI with PyTorch II: GPT, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai-2/ 69 comments
GitHub - MDK8888/GPTFast: Accelerate your Hugging Face Transformers 6-7x. Native to Hugging Face and PyTorch. https://github.com/MDK8888/GPTFast 18 comments
We Are Running Out of Low-Background Tokens (Nov 2023 Recap) https://www.latent.space/i/139368545/the-concept-of-low-background-tokens 6 comments
At the Intersection of LLMs and Kernels - Research Roundup https://charlesfrye.github.io/programming/2023/11/10/llms-systems.html 4 comments
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
LLM in a flash: Efficient Large Language Model Inference with Limited Memory https://browse.arxiv.org/html/2312.11514v1 1 comment
GitHub - HazyResearch/aisys-building-blocks: Building blocks for foundation models. https://github.com/HazyResearch/aisys-building-blocks 1 comment
GitHub - koayon/awesome-adaptive-computation: A curated reading list of research in Adaptive Computation (AC). https://github.com/koayon/awesome-adaptive-computation 0 comments
Transformer inference tricks - by Finbarr Timbers https://www.artfintel.com/p/transformer-inference-tricks 0 comments
GitHub - AIoT-MLSys-Lab/Efficient-LLMs-Survey: Efficient Large Language Models: A Survey https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey 0 comments
Speculative Decoding - philkrav https://philkrav.com/posts/speculative/ 0 comments
