Hacker News
- Accelerating Generative AI with PyTorch II: GPT, Fast https://pytorch.org/blog/accelerating-generative-ai-2/ 69 comments
Linking pages
- We Are Running Out of Low-Background Tokens (Nov 2023 Recap) https://www.latent.space/i/139368545/the-concept-of-low-background-tokens 6 comments
- Gemlite: Towards Building Custom Low-Bit Fused CUDA Kernels https://mobiusml.github.io/gemlite_blogpost/ 2 comments
- GitHub - pytorch-labs/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. https://github.com/pytorch-labs/gpt-fast 1 comment
- Accelerating Generative AI Part III: Diffusion, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai-3/ 0 comments
- Faster and Smaller Whisper: A Deep Dive into Quantization and Torch Compilation https://mobiusml.github.io/whisper-static-cache-blog/ 0 comments
- Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop and Mobile | PyTorch https://pytorch.org/blog/torchchat-local-llm-inference/ 0 comments
- Large language model inference optimizations on AMD GPUs — ROCm Blogs https://rocm.blogs.amd.com/artificial-intelligence/llm-inference-optimize/README.html 0 comments
- aie-book/resources.md at main · chiphuyen/aie-book · GitHub https://github.com/chiphuyen/aie-book/blob/main/resources.md 0 comments
Linked pages
- GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++ https://github.com/ggerganov/llama.cpp 286 comments
- GitHub - mlc-ai/mlc-llm: Enable everyone to develop, optimize and deploy AI models natively on everyone's devices. https://github.com/mlc-ai/mlc-llm 228 comments
- [2211.17192] Fast Inference from Transformers via Speculative Decoding https://arxiv.org/abs/2211.17192 2 comments
- Experience the power of PyTorch 2.0 on AMD Solutions | PyTorch https://pytorch.org/blog/experience-power-pytorch-2.0/ 1 comment
- GitHub - pytorch-labs/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. https://github.com/pytorch-labs/gpt-fast 1 comment
- [2210.17323] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers https://arxiv.org/abs/2210.17323 0 comments
- GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs https://github.com/vllm-project/vllm 0 comments
- Accelerating Generative AI with PyTorch: Segment Anything, Fast | PyTorch https://pytorch.org/blog/accelerating-generative-ai/ 0 comments