Linked pages
- [2205.01068] OPT: Open Pre-trained Transformer Language Models https://arxiv.org/abs/2205.01068 318 comments
- Introducing ChatGPT https://openai.com/blog/chatgpt/ 296 comments
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- The Illustrated Transformer – Jay Alammar https://jalammar.github.io/illustrated-transformer/ 36 comments
- Making Deep Learning go Brrrr From First Principles https://horace.io/brrr_intro.html 20 comments
- The Illustrated GPT-2 (Visualizing Transformer Language Models) – Jay Alammar http://jalammar.github.io/illustrated-gpt2/ 8 comments
- Transformer Inference Arithmetic | kipply's blog https://kipp.ly/blog/transformer-inference-arithmetic/ 4 comments
- [2205.14135] FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness https://arxiv.org/abs/2205.14135 3 comments
- Language Models are Unsupervised Multitask Learners (GPT-2) https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment
- Orca: A Distributed Serving System for Transformer-Based Generative Models | USENIX https://www.usenix.org/conference/osdi22/presentation/yu 1 comment
- Roofline model - Wikipedia https://en.wikipedia.org/wiki/Roofline_model 0 comments
- Improving Language Understanding by Generative Pre-Training (GPT-1) https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf 0 comments
Source article: Dissecting Batching Effects in GPT Inference (le.qun.ch)