transformer-deploy by Lefebvre Dalloz
QDQBert
src.transformer_deploy.QDQModels.QDQBert
QDQBert
This module adds quantization support to all models based on the BERT architecture.
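As a rough illustration of what quantization support involves, the sketch below shows a symmetric quantize/dequantize round trip in plain Python. It assumes that the "QDQ" prefix in the module name refers to quantize/dequantize nodes, as the term is commonly used. The function `fake_quantize` and its parameters are hypothetical names chosen for this example and are not part of this module's API.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers, then map them back to floats.

    Hypothetical helper for illustration only (not part of transformer-deploy).
    Uses symmetric, per-tensor scaling: one scale for the whole list.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    amax = max(abs(v) for v in values)      # largest magnitude in the tensor
    if amax == 0:
        return list(values), 1.0            # nothing to scale
    scale = amax / qmax                     # float value of one integer step
    dequantized = []
    for v in values:
        q = round(v / scale)                # quantize: float -> integer
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        dequantized.append(q * scale)       # dequantize: integer -> float
    return dequantized, scale


values = [-1.0, 0.0, 0.5, 1.27]
approx, scale = fake_quantize(values)
for original, restored in zip(values, approx):
    print(f"{original:+.4f} -> {restored:+.4f}")
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision a quantized model gives up in exchange for faster integer arithmetic.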