QDQRoberta

This module adds quantization support to all Roberta-architecture-based models.

qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0) #

Override of the position-ids creation function. The cumsum operator in TensorRT does not support integer types (see https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md), so this override computes the cumulative sum in float instead.

Source code in src/transformer_deploy/QDQModels/QDQRoberta.py
import torch


def qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0):
    """
    Override of the position-ids creation function.
    The cumsum operator in TensorRT does not support integer types
    (see https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md),
    so this override computes the cumulative sum in float instead.
    """
    # QDQ change below
    # The series of casts and type-conversions here are carefully balanced to both work with ONNX export and XLA.
    # int() -> float() because of a limitation in the cumsum operator implementation in TensorRT
    mask = input_ids.ne(padding_idx).float()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx
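
To illustrate the behavior, here is a minimal sketch (not from the library) that calls the function on a right-padded batch. It assumes the usual RoBERTa convention of `padding_idx=1`, so non-padding positions are numbered starting at `padding_idx + 1` and padding positions keep the value `padding_idx`:

```python
import torch


def qdq_create_position_tensorrt(input_ids, padding_idx, past_key_values_length=0):
    # Float mask instead of int: TensorRT's cumsum does not support integer types.
    mask = input_ids.ne(padding_idx).float()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx

# Batch of 2 sequences, right-padded with padding_idx=1 (RoBERTa convention).
input_ids = torch.tensor([
    [0, 42, 37, 2, 1],   # 4 real tokens, 1 pad
    [0, 99, 2, 1, 1],    # 3 real tokens, 2 pads
])
positions = qdq_create_position_tensorrt(input_ids, padding_idx=1)
print(positions)
# Real tokens get positions 2, 3, 4, ...; padding keeps padding_idx (1):
# tensor([[2, 3, 4, 5, 1],
#         [2, 3, 4, 1, 1]])
```

The `.long()` cast at the end restores the integer dtype expected by the embedding lookup, so the float round-trip is invisible to the rest of the model.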