Configure conv input and weight dtypes.
Parameters:
input_dtype (QuantDType) – Conv input dtype. Defaults to qint8.
weight_dtype (QuantDType) – Conv weight dtype. Defaults to qint8.
prefix (Optional[Sequence[str]]) – If given, only modify ops under these prefixes. Defaults to None.
class horizon_plugin_pytorch.quantization.qconfig_setter.templates.LoadFromFileTemplate (load_dir: str, prefix=None, only_set_mod_in_graph=True, overwrite: bool = True, freeze: bool = False)
Load model qconfig dtypes from a pt file.
Parameters:
load_dir (str) – Path to the pt file; a URL is also accepted.
prefix – Only apply to modules under the given prefixes. Defaults to None.
overwrite (bool) – Whether to clear existing qtypes before setting.
freeze (bool)
class horizon_plugin_pytorch.quantization.qconfig_setter.templates.MatmulDtypeTemplate (input_dtypes: QuantDType | Tuple[QuantDType, QuantDType] = 'qint8', prefix: Sequence[str] | None = None)
Configure matmul input dtypes.
Parameters:
input_dtypes (Union[QuantDType, Tuple[QuantDType, QuantDType]]) – Matmul input dtypes. Defaults to qint8.
prefix (Optional[Sequence[str]]) – If given, only modify ops under these prefixes. Defaults to None.
class horizon_plugin_pytorch.quantization.qconfig_setter.templates.ModuleNameTemplate (config_mapping: Dict[str, QuantDType | Dict[str, Any]], enable_propagate: bool = True, only_set_mod_in_graph: bool = True, overwrite: bool = False, freeze: bool = False)
Set operation dtypes by name or prefix.
Parameters:
config_mapping (Dict[str, Union[QuantDType, Dict[str, Any]]]) – A mapping from prefix to dtype setting.
If the value is a dtype, it is applied to the output. For example, {"conv": qint16}
If the value is a dict, it is applied by k-v pair. For example, {"conv": {"output": qint16, "weight": qint8}}
The quant threshold can also be configured.
If the threshold value is a float, it is applied to the output. For example, {"conv": {"threshold": 1.0}}
If the threshold value is a dict, it is applied by k-v pair. For example, {"conv": {"threshold": {"input": 1.0, "output": 1.0}}}
The quant scale is computed as scale = -(threshold/quant_min).
The dtype and threshold can be given at the same time. For example, {"conv": {"dtype": "qint16", "threshold": 1.0}}
enable_propagate (bool) – Whether to apply the dtype setting to all submodules. Defaults to True.
only_set_mod_in_graph (bool) – Whether to apply the dtype setting only to mods in the computing graph. Defaults to True.
overwrite (bool) – Whether to clear existing qtypes before setting.
freeze (bool) – Whether to freeze the qtype setting after it is set.
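As a rough illustration of the threshold formula described for config_mapping, the sketch below computes scale = -(threshold/quant_min) for one mapping entry. It does not call the plugin: the QUANT_MIN table and the scale_from_threshold helper are hypothetical, and the signed ranges (-128 for qint8, -32768 for qint16) are assumptions rather than values taken from this reference.

```python
# Assumed minimum quantized values for signed integer dtypes.
# These are illustrative assumptions, not read from the plugin.
QUANT_MIN = {"qint8": -128, "qint16": -32768}


def scale_from_threshold(threshold: float, dtype: str = "qint8") -> float:
    """Hypothetical helper applying scale = -(threshold / quant_min)."""
    quant_min = QUANT_MIN[dtype]
    return -(threshold / quant_min)


# A mapping entry in the documented form: dtype and threshold given together.
config_mapping = {"conv": {"dtype": "qint16", "threshold": 1.0}}

setting = config_mapping["conv"]
scale = scale_from_threshold(setting["threshold"], setting["dtype"])
print(scale)
```

Under these assumptions, the same threshold gives a smaller scale for qint16 than for qint8, because the threshold is spread over a wider integer range.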
class horizon_plugin_pytorch.quantization.qconfig_setter.templates.OpTypeTemplate (config_mapping: Dict[Any, QuantDType | Dict[str, QuantDType]], prefix=None, only_set_mod_in_graph: bool = True, overwrite: bool = False, freeze: bool = False)
Set operation dtypes by operation type.
Parameters:
config_mapping (Dict[Any, Union[QuantDType, Dict[str, QuantDType]]]) – A mapping from op type to dtype setting. Keys can be a torch module class or a string for a torch func. If the value is a dtype, it is applied to the output. If the value is a dict, it is applied by k-v pair. For example, {"conv": {"output": qint16, "weight": qint8}}
prefix – If given, only modify ops under these prefixes. Defaults to None.
only_set_mod_in_graph (bool) – Whether to apply the dtype setting only to mods in the computing graph. Defaults to True.
overwrite (bool) – Whether to clear existing qtypes before setting.
freeze (bool) – Whether to freeze the qtype setting after it is set.
class horizon_plugin_pytorch.quantization.qconfig_setter.templates.SensitivityTemplate (sensitive_table: List[Tuple[str, str]], topk_or_ratio: int | float, sensitive_type='both', low_precision_dtype='qint8', high_precision_dtype='qint16', spread=True)
Promote dtype precision by sensitivity table.
Parameters:
sensitive_table (List[Tuple[str, str]]) – Table sorted from high to low sensitivity; each item contains a module name and a sensitivity type. May also be a file path or URL to a saved sensitive table file.
topk_or_ratio (Union[int, float]) – How many items in the sensitive table to set to high precision.
sensitive_type – Choose from ('weight', 'activation', 'both'). "activation" includes input and output. Defaults to "both".
low_precision_dtype – Only modify an op if its original dtype is this dtype. Defaults to qint8.
high_precision_dtype – High precision dtype. Defaults to qint16.
spread – Whether to spread the high precision dtype to adjacent operations. Defaults to True.
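One plausible reading of topk_or_ratio is sketched below. It is not the plugin's implementation: select_high_precision is a hypothetical helper, the module names are made up, and the interpretation (an int is an absolute count, a float is a fraction of the table length rounded down) is an assumption based on the parameter name alone.

```python
from typing import List, Tuple, Union


def select_high_precision(
    sensitive_table: List[Tuple[str, str]],
    topk_or_ratio: Union[int, float],
) -> List[Tuple[str, str]]:
    """Return the leading entries of a table sorted from high to low sensitivity."""
    if isinstance(topk_or_ratio, float):
        # Assumption: a float is a fraction of the table length, rounded down.
        count = int(len(sensitive_table) * topk_or_ratio)
    else:
        # Assumption: an int is an absolute number of entries.
        count = topk_or_ratio
    return sensitive_table[:count]


# Hypothetical table: (module name, sensitivity type), most sensitive first.
table = [
    ("backbone.conv1", "activation"),
    ("head.conv", "weight"),
    ("backbone.conv2", "activation"),
    ("neck.conv", "weight"),
]

top2 = select_high_precision(table, 2)        # absolute count
top_half = select_high_precision(table, 0.5)  # fraction of the table
print(top2 == top_half)
```

Because the table is ordered from high to low sensitivity, taking a prefix of it selects the most sensitive entries first under either interpretation.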