QDQRoberta
This module adds quantization support to all RoBERTa-based model architectures.
qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0)
Override of the position id creation function. The cumsum operator in TensorRT does not support integer types (see https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md), so this override performs the cumulative sum in float instead.
Source code in src/transformer_deploy/QDQModels/QDQRoberta.py
import torch


def qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0):
    """
    Override of the position id creation function.
    The cumsum operator in TensorRT does not support integer types,
    see https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md
    This override performs the cumulative sum in float instead.
    """
    # QDQ change below
    # The series of casts and type-conversions here are carefully balanced to both work with ONNX export and XLA.
    # int() -> float() because of a limitation of the cumsum operator implementation in TensorRT
    mask = input_ids.ne(padding_idx).float()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx
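To illustrate the behavior, here is a minimal, self-contained sketch (assuming only that `torch` is installed; the sample `input_ids` values are illustrative): non-padding tokens receive positions starting at `padding_idx + 1`, while padding tokens keep `padding_idx` as their position id, matching RoBERTa's convention.

```python
import torch


def qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0):
    # Non-padding tokens -> 1.0, padding -> 0.0; float because the cumsum
    # operator in TensorRT does not support integer inputs.
    mask = input_ids.ne(padding_idx).float()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx


# padding_idx=1 (RoBERTa's default); the last two tokens are padding
input_ids = torch.tensor([[5, 7, 1, 1]])
print(qdq_create_position_tensorrt(input_ids, padding_idx=1))
# tensor([[2, 3, 1, 1]]) -- real tokens get positions 2, 3; padding keeps padding_idx
```

The result is identical to the integer-based original; only the intermediate cumsum runs in float so the exported ONNX graph stays compatible with TensorRT.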