- [P] What we learned by accelerating Hugging Face generative language models by 5X https://github.com/ELS-RD/transformer-deploy/ 17 comments on r/machinelearning
- [P] 4.5 times faster Hugging Face transformer inference by modifying some Python AST https://github.com/ELS-RD/transformer-deploy 33 comments on r/machinelearning
- [P] Python library to optimize Hugging Face transformers for inference: < 0.5 ms latency / 2850 infer/sec https://github.com/ELS-RD/transformer-deploy 19 comments on r/machinelearning
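The posts above all describe the same family of techniques: exporting a Hugging Face PyTorch model to an optimized runtime (ONNX Runtime or TensorRT) instead of running it through vanilla PyTorch. As a rough illustration of that pipeline, and not transformer-deploy's actual API, here is a minimal sketch that exports an encoder model to ONNX and runs it with ONNX Runtime; the model name, tensor names, file path, and shapes are illustrative assumptions:

```python
# Minimal sketch: export a Hugging Face encoder to ONNX, then run it with
# ONNX Runtime. Model name and shapes are placeholders, not values from
# the linked posts.
import torch
from transformers import AutoModel, AutoTokenizer
import onnxruntime as ort

model_name = "bert-base-uncased"  # assumption: any similar encoder works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

inputs = tokenizer("ONNX Runtime speeds up inference", return_tensors="pt")

# Export with dynamic batch/sequence axes so one graph serves any shape.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "last_hidden_state": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)

# Run the exported graph; on GPU, use CUDAExecutionProvider instead.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(
    None,
    {
        "input_ids": inputs["input_ids"].numpy(),
        "attention_mask": inputs["attention_mask"].numpy(),
    },
)
print(outputs[0].shape)  # (batch, seq, hidden)
```

Judging by the post titles and the repo, transformer-deploy layers more on top of this basic export step: TensorRT conversion, quantization (the "Python AST" post), and deployment behind Triton Inference Server.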
Linking pages
- GitHub - VoltaML/voltaML: ⚡VoltaML is a lightweight library to convert and run your ML/DL models in high-performance inference runtimes like TensorRT, TorchScript, ONNX and TVM. https://github.com/VoltaML/voltaML 14 comments
- GitHub - EthicalML/awesome-production-machine-learning: A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning https://github.com/EthicalML/awesome-production-machine-learning 0 comments
Linked pages
- Python Release Python 3.8.0 | Python.org https://www.python.org/downloads/release/python-380/ 361 comments
- FastAPI https://fastapi.tiangolo.com/ 243 comments
- PyTorch http://pytorch.org/ 100 comments
- GitHub - NVIDIA/nvidia-docker: Build and run Docker containers leveraging NVIDIA GPUs https://github.com/NVIDIA/nvidia-docker 55 comments
- GitHub - microsoft/onnxruntime: ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator https://github.com/Microsoft/onnxruntime 1 comment
- GitHub - triton-inference-server/server: The Triton Inference Server provides an optimized cloud and edge inferencing solution. https://github.com/triton-inference-server/server 1 comment
- GitHub - NVIDIA/TensorRT: NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications. https://github.com/NVIDIA/TensorRT 0 comments
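The linked pages above make up a typical deployment stack: ONNX Runtime or TensorRT as the execution engine, served behind Triton Inference Server. For completeness, here is a minimal, hedged sketch of a Python client querying a transformer model hosted on Triton over HTTP; the model name ("transformer_onnx"), tensor names, and shapes are placeholder assumptions, not values taken from the linked repositories:

```python
# Minimal sketch: query a model served by Triton Inference Server over HTTP.
# Model and tensor names are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy tokenized batch: (batch=1, seq=16) of int64 token ids.
input_ids = np.ones((1, 16), dtype=np.int64)
attention_mask = np.ones((1, 16), dtype=np.int64)

inputs = [
    httpclient.InferInput("input_ids", list(input_ids.shape), "INT64"),
    httpclient.InferInput("attention_mask", list(attention_mask.shape), "INT64"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

outputs = [httpclient.InferRequestedOutput("last_hidden_state")]
result = client.infer(model_name="transformer_onnx", inputs=inputs, outputs=outputs)
print(result.as_numpy("last_hidden_state").shape)
```

The same request can also be made over gRPC via tritonclient.grpc, which exposes an equivalent client API.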