Linking pages
- GitHub - punica-ai/punica: Serving multiple LoRA finetuned LLM as one https://github.com/punica-ai/punica 26 comments
- GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE https://www.semianalysis.com/p/gpt-4-architecture-infrastructure 10 comments
- Benchmarking NVIDIA TensorRT-LLM - Jan https://jan.ai/post/benchmarking-nvidia-tensorrt-llm 9 comments
- Transformer Inference Arithmetic | kipply's blog https://kipp.ly/blog/transformer-inference-arithmetic/ 4 comments
- How to convert the SalesForce CodeGen models to GPT-J · GitHub https://gist.github.com/moyix/7896575befbe1b99162ccfec8d135566 3 comments
- Transformer Inference Arithmetic | kipply's blog https://carolchen.me/blog/transformer-inference-arithmetic/ 2 comments
- GitHub - facebookresearch/metaseq: Repo for external large-scale work https://github.com/facebookresearch/metaseq 2 comments
- GLM-130B: An Open Bilingual Pre-Trained Model | GLM-130B https://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/ 2 comments
- Replit - Ghostwriter AI & Complete Code Beta https://blog.replit.com/ai 1 comment
- GitHub - THUDM/GLM-130B: GLM-130B: An Open Bilingual Pre-Trained Model https://github.com/THUDM/GLM-130B 1 comment
- Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
- Replit - Productizing Large Language Models https://blog.replit.com/llms 0 comments
- New Z-code Mixture of Experts models improve quality, efficiency in Translator and Azure AI - Source https://blogs.microsoft.com/ai/new-z-code-mixture-of-experts-models-improve-quality-efficiency-in-translator-and-azure-ai/ 0 comments
- GitHub - huggingface/awesome-huggingface: 🤗 A list of wonderful open-source projects & applications integrated with Hugging Face libraries. https://github.com/huggingface/awesome-huggingface 0 comments
- MPT-7B and The Beginning of Context=Infinity — with Jonathan Frankle and Abhinav Venigalla of MosaicML https://www.latent.space/p/mosaic-mpt-7b 0 comments
- Potentials of Multitenancy Fine-Tuned LLM Serving https://le.qun.ch/en/blog/2023/09/11/multi-lora-potentials/ 0 comments
- GitHub - mit-han-lab/smoothquant: [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models https://github.com/mit-han-lab/smoothquant 0 comments
- Mixed-input matrix multiplication performance optimizations – Google Research Blog https://blog.research.google/2024/01/mixed-input-matrix-multiplication.html 0 comments
- Transformer Inference Arithmetic | kipply's blog https://kipp.ly/transformer-inference-arithmetic/ 0 comments