Accuracy Tuning Practice
This chapter walks through the post-training quantization (PTQ) accuracy tuning pipeline, using precision problems encountered in real use as examples. Please read the Model Accuracy Tuning chapter first to understand the relevant theory and tool usage.
Typical accuracy scenarios include:
- All-node-type INT16 quantization meets the accuracy requirements, and the Accuracy Debug Tool provides a relatively accurate ranking of sensitive nodes;
- All-node-type INT16 quantization meets the accuracy requirements, but setting a large number of sensitive nodes to higher precision does not effectively improve quantization accuracy;
- All-node-type INT16 quantization does not meet the accuracy requirements, and we want to further improve quantization accuracy while keeping the model fully quantized on the BPU.
Sensitive Node Analysis
The Accuracy Debug Tool provides an interface for calculating node quantization sensitivity. It measures the impact of each operator's quantization on the output, so that nodes with high quantization loss can be set to higher precision to complete accuracy tuning. The tuning pipeline is described using the HybridNets model as an example.
Using HMCT default INT8 quantization with percentile selected as the calibration algorithm, the calibration accuracy does not meet the requirements (the accuracy of det and ll_seg drops by more than 1%):
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- ------------------ -------------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.75562(97.85%) 0.98012
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.89675(99.12%) 0.98012
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.81813(95.83%) 0.98012
All Node Type INT16
First, set all_node_type to INT16 and keep percentile as the calibration algorithm. At this point the calibration accuracy meets the requirements, so INT8+INT16 mixed precision can be used to complete the tuning:
quant_config = {"model_config": {"all_node_type": "int16"}}
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- ------------------ -------------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.76866(99.54%) 0.997147
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.90405(99.93%) 0.997147
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.84732(99.25%) 0.997147
Mixed Precision Debugging
Compile the INT8 calibrated model produced with the percentile calibration algorithm (the same one selected for the INT16 model), configure debug_mode: "dump_calibration_data" in the YAML file to save the calibration data, and output node quantization sensitivity through get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes hybridnets-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
===========================node sensitivity============================
node cosine-similarity
-----------------------------------------------------------------------
/encoder/_blocks.0/_depthwise_conv/Conv 0.98768
/encoder/_swish/Mul 0.99526
/encoder/_blocks.2/_depthwise_conv/Conv 0.99852
/encoder/_blocks.0/Mul 0.99887
/encoder/_blocks.0/GlobalAveragePool 0.99889
/encoder/_blocks.2/_swish/Mul 0.99957
/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv 0.99964
/encoder/_blocks.0/_swish/Mul 0.99969
/encoder/_blocks.2/Mul 0.99979
/encoder/_blocks.2/GlobalAveragePool 0.9998
/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv 0.99983
/encoder/_blocks.2/_swish_1/Mul 0.99984
/bifpn/bifpn.5/p4_downsample/Pad 0.99985
...er/seg_blocks.4/block/block.0/block/block.0/Pad 0.99985
/encoder/_blocks.0/_project_conv/Conv 0.99986
/encoder/_blocks.5/_depthwise_conv/Conv 0.99988
/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv 0.99989
/classifier/conv_list.0/depthwise_conv/conv/Conv 0.99989
..._blocks.4/block/block.0/block/block.0/conv/Conv 0.99989
/regressor/conv_list.0/depthwise_conv/conv/Conv 0.99989
/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv 0.99992
/encoder/_blocks.17/_se_expand/Conv 0.99992
/encoder/_blocks.13/Mul 0.99992
/encoder/_blocks.1/_depthwise_conv/Conv 0.99992
/encoder/_blocks.13/GlobalAveragePool 0.99992
/encoder/_blocks.14/_se_expand/Conv 0.99992
/classifier/header/pointwise_conv/conv/Conv 0.99992
/encoder/_blocks.1/Add 0.99993
/encoder/_blocks.3/Mul 0.99993
/encoder/_blocks.3/GlobalAveragePool 0.99993
/encoder/_blocks.1/_swish/Mul 0.99993
/encoder/_blocks.15/Mul 0.99993
/encoder/_blocks.15/GlobalAveragePool 0.99993
/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv 0.99993
/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv 0.99993
/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv 0.99994
/encoder/_blocks.8/_project_conv/Conv 0.99994
/bifpn/bifpn.5/swish_3/Mul 0.99994
/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv 0.99994
/encoder/_blocks.8/GlobalAveragePool 0.99994
/encoder/_conv_stem/Conv 0.99994
/encoder/_blocks.13/_project_conv/Conv 0.99994
/encoder/_blocks.8/Mul 0.99994
/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv 0.99995
...
Working through the list in ascending order of cosine similarity, gradually set operators to INT16 quantization; the calibrated model accuracy increases until it meets the requirements:
| Serial Number | Cosine Similarity Value (nodes <= value set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.75562(97.85%) | 0.89675(99.12%) | 0.81813(95.83%) |
| 2 | 0.999 | 0.76531(99.11%) | 0.90274(99.79%) | 0.83874(98.24%) |
| 3 | 0.9998 | 0.76545(99.12%) | 0.90340(99.86%) | 0.83961(98.34%) |
| 4 | 0.9999 | 0.76613(99.21%) | 0.90420(99.95%) | 0.84216(98.64%) |
| 5 | 0.99992 | 0.76712(99.34%) | 0.90356(99.88%) | 0.84397(98.85%) |
| 6 | 0.99993 | 0.76781(99.43%) | 0.90374(99.90%) | 0.84484(98.95%) |
| 7 | 0.99994 | 0.76811(99.47%) | 0.90344(99.86%) | 0.84528(99.01%) |
As the test table above shows, setting the sensitive nodes with a sensitivity value less than or equal to 0.99994 to INT16 brings the calibration accuracy up to the requirements:
quant_config = {
"model_config": {
"activation": {"calibration_type": "max", "max_percentile": 0.99995},
},
"node_config": {
"/encoder/_blocks.0/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.0/Mul": {"qtype": "int16"},
"/encoder/_blocks.0/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.2/_swish/Mul": {"qtype": "int16"},
"/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.0/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/GlobalAveragePool": {"qtype": "int16"},
"/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.2/_swish_1/Mul": {"qtype": "int16"},
"/bifpn/bifpn.5/p4_downsample/Pad": {"qtype": "int16"},
"/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/Pad": {"qtype": "int16"},
"/encoder/_blocks.0/_project_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.5/_depthwise_conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/classifier/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/conv/Conv": {"qtype": "int16"},
"/regressor/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.17/_se_expand/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/Mul": {"qtype": "int16"},
"/encoder/_blocks.1/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.14/_se_expand/Conv": {"qtype": "int16"},
"/classifier/header/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.1/Add": {"qtype": "int16"},
"/encoder/_blocks.3/Mul": {"qtype": "int16"},
"/encoder/_blocks.3/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.1/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.15/Mul": {"qtype": "int16"},
"/encoder/_blocks.15/GlobalAveragePool": {"qtype": "int16"},
"/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/_project_conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.5/swish_3/Mul": {"qtype": "int16"},
"/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_conv_stem/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/_project_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/Mul": {"qtype": "int16"},
},
}
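Rather than hand-writing a node_config like the one above, the INT16 node list can be generated from the sensitivity ranking. A minimal sketch in plain Python, assuming the sensitivity results have been collected into a name-to-value dict (build_node_config is a hypothetical helper, not an HMCT API):

```python
def build_node_config(sensitivity, threshold):
    """Mark every node whose cosine similarity is <= threshold as int16."""
    return {
        name: {"qtype": "int16"}
        for name, value in sensitivity.items()
        if value <= threshold
    }

# A few entries from the sensitivity table above
sensitivity = {
    "/encoder/_blocks.0/_depthwise_conv/Conv": 0.98768,
    "/encoder/_swish/Mul": 0.99526,
    "/encoder/_blocks.8/Mul": 0.99994,
    "/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv": 0.99995,
}

node_config = build_node_config(sensitivity, 0.99994)
# the first three nodes are selected; the node at 0.99995 stays at the default int8
```

The resulting dict can be placed under the "node_config" key of quant_config directly.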
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- -------------------- -----------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.76811(99.47%) 0.994576
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.90344(99.86%) 0.994576
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.84528(99.01%) 0.994576
A complete accuracy tuning deployment example is available at: HybridNets Accuracy Tuning Deployment Example.
Sensitive Node Analysis Failure
If setting the sensitive nodes reported by the Accuracy Debug Tool to higher precision fails to effectively improve model accuracy, we can first try specifying output nodes to filter out irrelevant nodes. Additionally, observe the model output error and select another metric to improve the correlation between the sensitivity ranking and the actual precision. Furthermore, by analyzing the model structure, typical substructures with a higher risk of quantization loss (such as model outputs, inputs, and structures with specific physical meaning) can be set to higher precision to complete accuracy tuning. This tuning pipeline is described using the YoloP model as an example.
Using HMCT default INT8 quantization with percentile selected as the calibration algorithm, the calibration accuracy does not meet the requirements (the accuracy of det drops by more than 1%):
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -------------------
yolop-384-640_det 0.76448 nash-e 10000 0.61507(80.46%) 0.999891
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88863(99.84%) 0.999891
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.65357(100.19%) 0.999891
All Node Type INT16
First, set all_node_type to INT16 and keep percentile as the calibration algorithm. At this point the calibration accuracy meets the requirements, so INT8+INT16 mixed precision can be used to complete the tuning:
quant_config = {"model_config": {"all_node_type": "int16"}}
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -----------------
yolop-384-640_det 0.76448 nash-e 10000 0.75890(99.27%) 0.99999
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88950(99.93%) 0.99999
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.64821(99.37%) 0.99999
Mixed Precision Debugging
Compile the INT8 calibrated model produced with the percentile calibration algorithm (the same one selected for the INT16 model), configure debug_mode: "dump_calibration_data" in the YAML file to save the calibration data, and output node quantization sensitivity through get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
=======================node sensitivity========================
node cosine-similarity
---------------------------------------------------------------
Mul_943 0.99736
Mul_647 0.99894
Mul_795 0.99909
Conv_50 0.99976
Div_49 0.99983
Conv_92 0.99989
Div_58 0.9999
Conv_1119 0.99995
Conv_88 0.99996
Conv_41 0.99996
Conv_59 0.99998
Slice_4 0.99998
Slice_9 0.99998
Slice_14 0.99998
Slice_19 0.99998
Slice_24 0.99998
Slice_29 0.99998
Slice_34 0.99998
Slice_39 0.99998
Concat_40 0.99998
Div_67 0.99998
MaxPool_297 0.99999
MaxPool_298 0.99999
MaxPool_299 0.99999
Concat_300 0.99999
Concat_1003 0.99999
Conv_177 0.99999
Div_296 0.99999
ScatterND_705 0.99999
Slice_645 0.99999
Reshape_706 0.99999
Conv_110 0.99999
Conv_87 0.99999
Concat_89 0.99999
LeakyRelu_91 0.99999
Mul_584 0.99999
ScatterND_640 0.99999
Concat_1105 0.99999
LeakyRelu_1107 0.99999
Conv_1004 0.99999
Add_582 0.99999
Conv_119 0.99999
Resize_1014 0.99999
Conv_1015 0.99999
Conv_1043 0.99999
Conv_266 0.99999
Conv_199 0.99999
Div_100 0.99999
...
Working through the list in ascending order of cosine similarity, we gradually set operators to INT16 quantization. However, even with a large number of sensitive nodes set to INT16, the accuracy still failed to meet the requirements:
| Serial Number | Cosine Similarity Value (nodes <= value set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60956(79.74%) | 0.88911(99.89%) | 0.65925(101.07%) |
| 3 | 0.99996 | 0.60978(79.76%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.99998 | 0.60956(79.73%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.99999 | 0.66426(86.89%) | 0.88958(99.94%) | 0.66065(101.28%) |
Observing the accuracy results of the INT8 calibrated model, only the det branch fails to reach 99%. When calculating node sensitivity through the get_sensitivity_of_nodes interface, we can use the -o option to specify the det output, which yields a better sensitivity ranking:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -v True -s det_debug_result/
=======================node sensitivity========================
node cosine-similarity
---------------------------------------------------------------
Mul_943 0.99736
Mul_647 0.99894
Mul_795 0.99909
Conv_50 0.99997
Div_58 0.99997
Div_49 0.99999
Concat_1003 0.99999
ScatterND_705 0.99999
Slice_645 0.99999
Reshape_706 0.99999
Mul_584 0.99999
ScatterND_640 0.99999
Add_582 0.99999
Conv_41 0.99999
Slice_4 0.99999
Slice_9 0.99999
Slice_14 0.99999
Slice_19 0.99999
Slice_24 0.99999
Slice_29 0.99999
Slice_34 0.99999
Slice_39 0.99999
Concat_40 0.99999
Conv_88 0.99999
Conv_59 0.99999
...
Working through the list in ascending order of cosine similarity and setting operators to INT16 quantization, focusing only on the det output does filter out irrelevant nodes, but the final accuracy still does not meet the requirements:
| Serial Number | Cosine Similarity Value (nodes <= value set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60868(79.62%) | 0.88836(99.81%) | 0.65300(100.11%) |
| 3 | 0.99997 | 0.60961(79.74%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 4 | 0.99999 | 0.66461(86.94%) | 0.88932(99.91%) | 0.65876(100.99%) |
Observe the output similarity of the INT8 calibrated model: the L1 and L2 distances of the det branch deviate greatly from the floating-point output even though its cosine similarity looks acceptable. Try replacing cosine similarity with another metric:
hmct-info yolop-384-640_calibrated_model.onnx -c ./calibration_data/images/00.npy
INFO:root:The quantized model output:
=================================================================================
Output Cosine Similarity L1 Distance L2 Distance Chebyshev Distance
---------------------------------------------------------------------------------
det_out 0.995352 7.633481 289.637665 552.817566
drive_area_seg 0.998973 0.004005 0.001132 0.592610
lane_line_seg 0.999933 0.000417 0.000069 0.564768
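The reason cosine similarity can look healthy while L1/L2 distances are large is that it only measures direction, not magnitude. A standalone illustration in plain Python (not part of the HMCT tooling):

```python
import math

float_out = [10.0, 20.0, 30.0, 40.0]       # reference float output
quant_out = [1.5 * v for v in float_out]   # quantized output with a pure scale error

dot = sum(a * b for a, b in zip(float_out, quant_out))
cosine = dot / (math.sqrt(sum(a * a for a in float_out))
                * math.sqrt(sum(b * b for b in quant_out)))
l1 = sum(abs(a - b) for a, b in zip(float_out, quant_out)) / len(float_out)

# cosine is 1.0 despite a 50% amplitude error, while the L1 distance (12.5)
# makes the deviation obvious -- hence switching metrics for the det branch
```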
When calculating node sensitivity through the get_sensitivity_of_nodes interface, we can specify mse as the metric to improve the discrimination between nodes:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -m mse -v True -s det_mse_debug_result/
===================node sensitivity====================
node mse
-------------------------------------------------------
Mul_943 164.82712
Mul_647 65.88637
Mul_795 56.86866
Conv_50 2.04226
Div_58 1.88065
Concat_1003 0.87797
Div_49 0.84962
ScatterND_705 0.67858
Slice_645 0.67379
Reshape_706 0.67379
ScatterND_640 0.55187
Mul_584 0.54884
Add_582 0.52263
Conv_41 0.4714
Slice_4 0.38413
Slice_9 0.38413
Slice_14 0.38413
Slice_19 0.38413
Slice_24 0.38413
Slice_29 0.38413
Slice_34 0.38413
Slice_39 0.38413
Concat_40 0.38413
Conv_88 0.35534
Conv_59 0.33164
Conv_92 0.30398
Div_67 0.16826
ScatterND_853 0.16711
Slice_793 0.1627
Reshape_854 0.1627
ScatterND_788 0.13494
Mul_732 0.132
Add_730 0.12381
ScatterND_1001 0.07478
Conv_550 0.07226
Conv_518 0.06468
Conv_546 0.06344
Conv_310 0.05974
Conv_68 0.05735
Conv_338 0.05684
Add_86 0.04319
ScatterND_936 0.0428
Concat_89 0.04201
LeakyRelu_91 0.04201
Concat_517 0.04193
Conv_87 0.04148
Slice_941 0.04148
Reshape_1002 0.04148
Conv_448 0.04034
MaxPool_297 0.03739
MaxPool_298 0.03739
MaxPool_299 0.03739
Concat_300 0.03739
Mul_880 0.03334
Div_100 0.03328
Conv_855 0.03146
Concat_547 0.03094
LeakyRelu_549 0.03094
Add_878 0.03016
Div_558 0.02966
...
Working through the list in descending order of MSE, set operators to higher precision; the required accuracy is finally achieved after adding a large number of INT16 nodes:
| Serial Number | MSE Value (nodes >= value set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.5 | 0.66471(86.95%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 3 | 0.2 | 0.66447(86.92%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.1 | 0.72969(95.45%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.05 | 0.73393(96.00%) | 0.88934(99.92%) | 0.66121(101.37%) |
| 6 | 0.04 | 0.73321(95.91%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 7 | 0.03 | 0.75707(99.03%) | 0.88944(99.93%) | 0.66137(101.39%) |
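Node selection works the same way as with cosine similarity, except the ordering flips: a larger MSE means a larger quantization loss, so nodes at or above the threshold are promoted. A sketch with a hypothetical helper and a few values from the MSE table above:

```python
def nodes_to_promote(mse_sensitivity, threshold):
    """Return node names with MSE >= threshold, most sensitive first."""
    selected = [(name, mse) for name, mse in mse_sensitivity.items()
                if mse >= threshold]
    return [name for name, _ in sorted(selected, key=lambda kv: kv[1],
                                       reverse=True)]

# A few entries from the MSE sensitivity output
mse_sensitivity = {
    "Mul_943": 164.82712,
    "Conv_50": 2.04226,
    "Div_67": 0.16826,
    "Conv_87": 0.04148,
}
promoted = nodes_to_promote(mse_sensitivity, 0.1)
# Conv_87 (0.04148) falls below the 0.1 threshold and stays int8
```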
Subgraph Structure Analysis
Even when focusing only on improving the det output with the mse metric, setting a large number of sensitive nodes to higher precision still cannot effectively improve accuracy. Considering that only the det task currently fails to meet the requirements, we infer that quantization loss is unlikely to originate from nodes in the da_seg and ll_seg branches or in the shared backbone. Focusing on the accuracy tuning of the det branch, and combining this with model structure analysis, we try configuring the subgraph at the det output to use higher precision and test the calibration accuracy:
quant_config = {
"subgraph_config": {
"det_head": {
"inputs": ["Conv_559", "Conv_707", "Conv_855"],
"outputs": ["Concat_1003"],
"qtype": "int16",
},
}
}
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -------------------
yolop-384-640_det 0.76448 nash-e 10000 0.76275(99.77%) 0.99991
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88863(99.84%) 0.99991
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.65357(100.19%) 0.99991
Attention
When the ranking of sensitive nodes is inaccurate, the source of the loss can be preliminarily located by configuring a subgraph to higher precision. If the latency of the higher-precision subgraph increases significantly, sensitivity analysis can be performed within the subgraph, so that fewer nodes need to be configured to higher precision.
A complete accuracy tuning deployment example is available at: YoloP Accuracy Tuning Deployment Example.
Quantization Loss Compensation
Due to hardware constraints and latency considerations, even when all_node_type is set to INT16, some tensors remain at INT8 precision, including the weights of Conv and ConvTranspose, the inputs of Resize and GridSample, and the second input of MatMul. PTQ introduces an additional operator to compensate for the quantization loss caused by INT8 precision, further improving the model's accuracy while keeping all nodes deployed on the BPU, and so completing accuracy tuning. The Lane model is used as an example to illustrate this tuning pipeline.
Using HMCT default INT8 quantization with the max calibration algorithm (asymmetric, per-channel), the accuracy does not meet the requirements (the average cosine similarity of all outputs is less than 0.99):
+------------+-------------------+-----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+-----------+----------+----------+
| mask | cosine-similarity | 0.575118 | 0.956694 | 0.875484 |
| field | cosine-similarity | 0.653135 | 0.948818 | 0.883109 |
| attr | cosine-similarity | 0.456388 | 0.986300 | 0.906577 |
| background | cosine-similarity | 0.878089 | 0.997221 | 0.979943 |
| cls | cosine-similarity | 0.444225 | 0.996364 | 0.958695 |
| box | cosine-similarity | 0.209923 | 0.993850 | 0.941041 |
| cls_sl | cosine-similarity | 0.353770 | 0.998059 | 0.956674 |
| box_sl | cosine-similarity | 0.196419 | 0.998049 | 0.948152 |
| occlusion | cosine-similarity | 0.786355 | 0.978297 | 0.939203 |
| cls_arrow | cosine-similarity | 0.079772 | 0.999830 | 0.940612 |
| box_arrow | cosine-similarity | -0.231095 | 0.998620 | 0.850558 |
+------------+-------------------+-----------+----------+----------+
All Node Type INT16
First, set all_node_type to INT16 and select max for the calibration algorithm. At this point the calibration accuracy still does not fully meet the requirements (the average cosine similarity of the occlusion and box_arrow outputs does not reach 0.99), so further accuracy tuning is required:
quant_config = {"model_config": {"all_node_type": "int16"}}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.926001 | 0.998617 | 0.992738 |
| field | cosine-similarity | 0.945007 | 0.999313 | 0.993549 |
| attr | cosine-similarity | 0.871824 | 0.999821 | 0.996161 |
| background | cosine-similarity | 0.981510 | 0.999835 | 0.998274 |
| cls | cosine-similarity | 0.918296 | 0.999851 | 0.997182 |
| box | cosine-similarity | 0.911032 | 0.999134 | 0.996155 |
| cls_sl | cosine-similarity | 0.933632 | 0.999918 | 0.997105 |
| box_sl | cosine-similarity | 0.850244 | 0.998877 | 0.996493 |
| occlusion | cosine-similarity | 0.943404 | 0.993528 | 0.983970 |
| cls_arrow | cosine-similarity | 0.560625 | 0.999993 | 0.994583 |
| box_arrow | cosine-similarity | 0.755858 | 0.999889 | 0.987496 |
+------------+-------------------+----------+----------+----------+
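To decide which outputs still need tuning, it is enough to scan the Avg column of such a table against the 0.99 bar. A trivial sketch in plain Python (outputs_below is a hypothetical helper; the values are copied from the table above):

```python
def outputs_below(avg_similarity, threshold=0.99):
    """Names of outputs whose average cosine similarity misses the threshold."""
    return [name for name, avg in avg_similarity.items() if avg < threshold]

avg_similarity = {  # Avg column of the all_node_type int16 table
    "mask": 0.992738, "field": 0.993549, "attr": 0.996161,
    "background": 0.998274, "cls": 0.997182, "box": 0.996155,
    "cls_sl": 0.997105, "box_sl": 0.996493, "occlusion": 0.983970,
    "cls_arrow": 0.994583, "box_arrow": 0.987496,
}
remaining = outputs_below(avg_similarity)
# only occlusion and box_arrow fall below 0.99
```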
Upper Limit Accuracy of INT16
Since nodes with INT8 precision still exist in the model after setting all_node_type to INT16, we can modify the qtype of all calibration nodes to INT16 through the IR interface provided by HMCT to obtain a true INT16 model:
from hmct.ir import load_model, save_model

# Load the calibrated model produced with all_node_type int16
model = load_model("lane_calibrated_model_int16.onnx")
# Force every calibration node to int16, including those that
# remained int8 due to hardware constraints
calibration_nodes = model.graph.type2nodes["HzCalibration"]
for node in calibration_nodes:
    node.qtype = "int16"
save_model(model, "lane_calibrated_model_real_int16.onnx")
Verify that the average similarity of the true INT16 calibrated model meets the requirements on all outputs, so that accuracy can be improved by compensating the quantization error:
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.999819 | 0.999995 | 0.999984 |
| field | cosine-similarity | 0.999833 | 0.999997 | 0.999985 |
| attr | cosine-similarity | 0.999743 | 0.999999 | 0.999992 |
| background | cosine-similarity | 0.999977 | 0.999999 | 0.999996 |
| cls | cosine-similarity | 0.999852 | 0.999999 | 0.999994 |
| box | cosine-similarity | 0.999681 | 0.999999 | 0.999990 |
| cls_sl | cosine-similarity | 0.999722 | 1.000000 | 0.999993 |
| box_sl | cosine-similarity | 0.999529 | 1.000000 | 0.999992 |
| occlusion | cosine-similarity | 0.999844 | 0.999996 | 0.999978 |
| cls_arrow | cosine-similarity | 0.998314 | 1.000000 | 0.999985 |
| box_arrow | cosine-similarity | 0.999478 | 1.000000 | 0.999971 |
+------------+-------------------+----------+----------+----------+
Compensate Quantization Loss
The compensation analysis must be based on the calibrated model configured with all_node_type int16; node sensitivity is output through get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_int16.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./int16_debug_result
=========================================node sensitivity=========================================
node cosine-similarity mre mse sqnr chebyshev
--------------------------------------------------------------------------------------------------
Conv_360 0.98855 0.07025 0.61746 7.52018 4.78374
Conv_3 0.99895 0.2379 79.3955 12.55955 134.92183
Conv_338 0.99896 0.88695 0.00029 13.35992 1.54686
Conv_336 0.9994 0.04776 0.0002 14.62035 1.03647
...
Then modify the calibrated model according to the node sensitivity ranking, increasing quantization precision from INT8 to INT16 until occlusion and box_arrow meet the requirements:
from hmct.common import find_input_calibration, find_output_calibration
from hmct.ir import load_model, save_model

model = load_model("lane_calibrated_model_int16.onnx")
improved_nodes = ["Conv_360", "Conv_3", "Conv_338"]
for node in model.graph.nodes:
    if node.name not in improved_nodes:
        continue
    if node.op_type in ["Conv", "ConvTranspose", "MatMul"]:
        input1_calib = find_input_calibration(node, 1)
        if input1_calib and input1_calib.tensor_type == "weight":
            input1_calib.qtype = "int16"
    if node.op_type == "Resize":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "nearest")
        # In nearest mode, output quantization precision can be improved
        # to be close to int16; in other modes, only the input quantization
        # precision can be improved to be close to int16.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
    if node.op_type == "GridSample":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "bilinear")
        # Same rule as Resize: only in nearest mode can the output
        # quantization precision be improved as well.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
save_model(model, "lane_calibrated_model_int16_improved.onnx")
| Serial Number | Cosine Similarity Value (nodes <= value set to INT16) | occlusion Min | occlusion Avg | box_arrow Min | box_arrow Avg |
|---|---|---|---|---|---|
| 1 | None | 0.943404 | 0.983970 | 0.755858 | 0.987496 |
| 2 | 0.999 | 0.983739 | 0.997729 | 0.893116 | 0.994958 |
| 3 | 0.99 | 0.952758 | 0.994116 | 0.745781 | 0.987434 |
As shown in the table above, increasing the weight quantization precision of Conv_360, Conv_3, and Conv_338 from INT8 to INT16 allows all output similarities to meet the requirements. To deploy all nodes on the BPU, an additional operator is introduced to compensate for the quantization loss caused by INT8 precision, bringing the final precision close to INT16:
quant_config = {
"model_config": {
"all_node_type": "int16",
"activation": {"calibration_type": "max"},
},
"node_config": {
# Weight quantization loss of Conv, ConvTranspose, MatMul
# can be compensated by configuring input1 to ec.
"Conv_360": {"input1": "ec"},
"Conv_3": {"input1": "ec"},
"Conv_338": {"input1": "ec"},
# Quantization loss of GridSample and Resize can be
# compensated by configuring input0 to ec.
# "GridSample_340": {"input0": "ec"},
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.983658 | 0.999305 | 0.997363 |
| field | cosine-similarity | 0.972155 | 0.999655 | 0.996506 |
| attr | cosine-similarity | 0.977372 | 0.999879 | 0.998771 |
| background | cosine-similarity | 0.994089 | 0.999934 | 0.999493 |
| cls | cosine-similarity | 0.984845 | 0.999909 | 0.999082 |
| box | cosine-similarity | 0.977550 | 0.999447 | 0.998403 |
| cls_sl | cosine-similarity | 0.980353 | 0.999958 | 0.998956 |
| box_sl | cosine-similarity | 0.979305 | 0.999973 | 0.999332 |
| occlusion | cosine-similarity | 0.982567 | 0.999608 | 0.997672 |
| cls_arrow | cosine-similarity | 0.904646 | 0.999996 | 0.998248 |
| box_arrow | cosine-similarity | 0.890971 | 0.999973 | 0.994999 |
+------------+-------------------+----------+----------+----------+
Attention
It is recommended that Resize and GridSample use the nearest sampling mode. In this case, the operator's output introduces no new values, and the error can be compensated. Otherwise, INT8 quantization of the new values in the output introduces additional loss that cannot be compensated.
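A toy illustration in plain Python (unrelated to the HMCT APIs) of why nearest sampling is compensation-friendly: nearest-neighbor upsampling only repeats existing values, while linear interpolation manufactures new intermediate values that must be quantized on their own:

```python
src = [0.0, 4.0, 8.0]

# Nearest 2x upsampling: every output value already exists in the input,
# so the output can reuse the input's quantization and be compensated.
nearest = [src[i // 2] for i in range(len(src) * 2)]

# Linear 2x upsampling: odd positions are midpoints (2.0, 6.0), i.e. new
# values; quantizing them at int8 adds loss that cannot be compensated.
linear = []
for i in range(len(src) * 2):
    left = i // 2
    right = min(left + 1, len(src) - 1)
    linear.append(src[left] + (src[right] - src[left]) * 0.5 * (i % 2))

assert set(nearest) <= set(src)      # no new values introduced
assert not set(linear) <= set(src)   # 2.0 and 6.0 are new values
```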
Mixed Precision Debugging
After compensating for the weight quantization losses of Conv_360, Conv_3, and Conv_338, the accuracy of the calibrated model configured with all_node_type int16 meets the requirements. We then attempt to optimize the INT8 calibrated model with error compensation applied. The accuracy of that INT8 calibrated model is as follows:
quant_config = {
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"input1": "ec"},
"Conv_338": {"input1": "ec"},
}
}
+------------+-------------------+-----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+-----------+----------+----------+
| mask | cosine-similarity | 0.578707 | 0.950058 | 0.874048 |
| field | cosine-similarity | 0.687287 | 0.946366 | 0.875366 |
| attr | cosine-similarity | 0.471613 | 0.986946 | 0.908879 |
| background | cosine-similarity | 0.851624 | 0.996991 | 0.976282 |
| cls | cosine-similarity | 0.536348 | 0.996753 | 0.959749 |
| box | cosine-similarity | 0.094459 | 0.994883 | 0.939461 |
| cls_sl | cosine-similarity | 0.374808 | 0.998186 | 0.959271 |
| box_sl | cosine-similarity | 0.079629 | 0.998462 | 0.947069 |
| occlusion | cosine-similarity | 0.702038 | 0.986074 | 0.945837 |
| cls_arrow | cosine-similarity | 0.060614 | 0.999781 | 0.942194 |
| box_arrow | cosine-similarity | -0.301507 | 0.998179 | 0.829580 |
+------------+-------------------+-----------+----------+----------+
Output node sensitivity through get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./debug_result
===========================================node sensitivity===========================================
node cosine-similarity mre mse sqnr chebyshev
------------------------------------------------------------------------------------------------------
Conv_265 0.43427 12.56779 32.42019 0.37585 24.77806
Conv_278 0.84973 0.80948 21.66625 2.71994 16.2646
Conv_287 0.87352 2.34926 817.71234 0.42356 183.87369
Conv_237 0.96676 1.17564 12526.42871 3.45996 1538.01672
Conv_267 0.96678 2.02166 4.81972 4.51482 14.91702
UNIT_CONV_FOR_BatchNormalization_141 0.96682 1.17458 12521.30957 3.4652 1537.55347
Conv_276 0.97024 0.63814 10.79584 4.23258 13.53813
Conv_289 0.97159 0.61387 60.08849 6.0926 61.38558
Conv_336 0.97212 1.70509 0.00951 6.28411 1.07831
Conv_135 0.97478 0.93508 11459.125 3.94841 1468.99048
Add_140 0.97482 0.93514 11391.92676 3.95557 1464.76538
Conv_404 0.97672 2.96196 0.20074 6.37718 5.93086
Conv_3 0.97977 0.49545 27730.66992 2.96076 2199.58203
Conv_3_split_low 0.9798 0.49563 27710.51758 2.96429 2198.6333
Conv_129 0.97991 0.5559 14292.41602 4.12395 1598.18835
Add_134 0.97991 0.55644 14300.95312 4.12708 1598.60938
Conv_338 0.98833 6.00584 0.0032 8.16733 0.40305
Conv_338_split_low 0.98833 6.00581 0.0032 8.16729 0.4033
Conv_333 0.99496 5.13682 0.00184 10.63743 0.19728
Conv_107 0.99543 0.23465 2859.23389 7.1595 751.46326
Concat_106 0.99546 0.23441 2843.88184 7.17062 749.37109
Conv_339 0.99564 0.25485 0.17183 10.29787 8.05724
Conv_285 0.99576 0.21163 9.77346 10.03632 38.60489
Conv_335 0.99779 0.1337 0.44365 11.66653 2.3832
Conv_401 0.9978 2.66814 0.01664 11.78462 1.694
Conv_337 0.99871 0.1687 1.17969 12.91167 5.95476
Conv_250 0.99894 0.17839 0.70945 14.60334 11.99677
Conv_272 0.99917 0.06468 0.15378 13.46442 7.07149
Conv_384 0.99919 0.61947 4.9547 17.0275 71.32216
Conv_83 0.99921 0.11831 328.16125 11.67288 266.15958
Conv_8 0.99925 0.06326 409.16638 11.83647 295.1091
Add_249 0.99934 0.2801 0.5793 16.36763 12.01501
Conv_300 0.9995 0.01909 0.31657 14.86126 5.5387
Slice_299 0.99951 0.01896 0.30647 14.9317 5.27295
...
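The thresholds used in the experiments below (0.99, 0.999, 0.9995) simply cut this ranked list at different points. Turning such a ranking into node_config entries can be done mechanically; a minimal sketch, where the helper name and the (node, cosine) pair layout are our own assumptions:

```python
def build_int16_node_config(ranking, threshold):
    """ranking: (node_name, cosine_similarity) pairs sorted ascending.
    Returns node_config entries for every node at or below the threshold."""
    return {name: {"qtype": "int16"} for name, cos in ranking if cos <= threshold}

# First few entries of the ranking above.
ranking = [("Conv_265", 0.43427), ("Conv_278", 0.84973),
           ("Conv_287", 0.87352), ("Conv_300", 0.9995)]
cfg = build_int16_node_config(ranking, 0.99)
# cfg contains Conv_265, Conv_278 and Conv_287, each mapped to {"qtype": "int16"}
```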
Sorting by cosine similarity in ascending order and progressively setting the most sensitive operators to INT16 quantization, the output similarity of the calibrated model increases accordingly:
| Serial Number | Cosine Similarity Threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.874048 | 0.875366 | 0.908879 | 0.976282 | 0.959749 | 0.939461 | 0.959271 | 0.947069 | 0.945837 | 0.942194 | 0.829580 |
| 2 | 0.99 | 0.980708 | 0.987483 | 0.989023 | 0.993368 | 0.991154 | 0.985205 | 0.990900 | 0.990375 | 0.985721 | 0.975350 | 0.963180 |
| 3 | 0.999 | 0.988858 | 0.990837 | 0.994669 | 0.995994 | 0.995570 | 0.994660 | 0.995466 | 0.996202 | 0.991201 | 0.979100 | 0.980218 |
| 4 | 0.9995 | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.981471 | 0.982875 |
Based on the table above, we optimized the model by setting sensitive nodes with a sensitivity value less than or equal to 0.9995 to INT16. Except for cls_arrow and box_arrow, the average similarity of every other output reached at least 0.99. Since cls_arrow and box_arrow share the same branch, we additionally configured the arrow output head subgraph to INT16; the quantization configuration and output similarities are as follows:
{
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"qtype": "int16", "input1": "ec"},
"Conv_338": {"qtype": "int16", "input1": "ec"},
# 0.99
"Conv_265": {"qtype": "int16"},
"Conv_278": {"qtype": "int16"},
"Conv_287": {"qtype": "int16"},
"Conv_237": {"qtype": "int16"},
"Conv_267": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
"Conv_276": {"qtype": "int16"},
"Conv_289": {"qtype": "int16"},
"Conv_336": {"qtype": "int16"},
"Conv_135": {"qtype": "int16"},
"Add_140": {"qtype": "int16"},
"Conv_404": {"qtype": "int16"},
"Conv_3_split_low": {"qtype": "int16"},
"Conv_129": {"qtype": "int16"},
"Add_134": {"qtype": "int16"},
"Conv_338_split_low": {"qtype": "int16"},
# 0.999
"Conv_333": {"qtype": "int16"},
"Conv_107": {"qtype": "int16"},
"Concat_106": {"qtype": "int16"},
"Conv_339": {"qtype": "int16"},
"Conv_285": {"qtype": "int16"},
"Conv_335": {"qtype": "int16"},
"Conv_401": {"qtype": "int16"},
"Conv_337": {"qtype": "int16"},
"Conv_250": {"qtype": "int16"},
# 0.9995
"Conv_272": {"qtype": "int16"},
"Conv_384": {"qtype": "int16"},
"Conv_83": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Add_249": {"qtype": "int16"},
"Conv_300": {"qtype": "int16"},
},
"subgraph_config": {
"arrow_head": {
"inputs": ["Reshape_390"],
"outputs": ["Conv_403", "Conv_404"],
"qtype": "int16",
}
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.926363 | 0.997833 | 0.991085 |
| field | cosine-similarity | 0.915524 | 0.999179 | 0.991593 |
| attr | cosine-similarity | 0.869666 | 0.999608 | 0.995369 |
| background | cosine-similarity | 0.983465 | 0.999664 | 0.997524 |
| cls | cosine-similarity | 0.929948 | 0.999513 | 0.995818 |
| box | cosine-similarity | 0.890618 | 0.999021 | 0.995149 |
| cls_sl | cosine-similarity | 0.937240 | 0.999738 | 0.996001 |
| box_sl | cosine-similarity | 0.880057 | 0.999896 | 0.996851 |
| occlusion | cosine-similarity | 0.966050 | 0.998043 | 0.992633 |
| cls_arrow | cosine-similarity | 0.380447 | 0.999980 | 0.990650 |
| box_arrow | cosine-similarity | 0.556044 | 0.999856 | 0.983423 |
+------------+-------------------+----------+----------+----------+
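From a result table like the one above, the failing heads can be picked out by comparing the Avg column against the target (we take 0.99 as the bar, per the requirement stated in this chapter; numbers copied from the table):

```python
# Average cosine similarity per output head, taken from the Avg column above.
avg_similarity = {
    "mask": 0.991085, "field": 0.991593, "attr": 0.995369,
    "background": 0.997524, "cls": 0.995818, "box": 0.995149,
    "cls_sl": 0.996001, "box_sl": 0.996851, "occlusion": 0.992633,
    "cls_arrow": 0.990650, "box_arrow": 0.983423,
}
failing = [name for name, avg in avg_similarity.items() if avg < 0.99]
print(failing)  # ['box_arrow']
```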
At this point, only the average similarity of box_arrow fails to meet the requirement. Specify the box_arrow output (node Conv_404) to recompute the sensitivity ranking:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_box.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -o Conv_404 -v True -s ./box_debug_result
========================================node sensitivity========================================
node cosine-similarity mre mse sqnr chebyshev
------------------------------------------------------------------------------------------------
Mul_116 0.9987 0.35957 0.03655 10.08165 18.38877
Conv_239 0.99871 0.38099 0.01399 12.16738 6.04099
UNIT_CONV_FOR_BatchNormalization_161 0.99871 0.3832 0.01404 12.15968 6.11046
Conv_10 0.99879 0.12717 0.04052 9.85789 19.9839
GridSample_340 0.99887 0.34926 0.02639 10.78928 15.52824
Conv_78 0.9989 0.33779 0.02165 11.21884 12.89857
Conv_163 0.9989 0.16412 0.03985 9.89421 20.19754
Relu_4 0.99902 0.39383 0.05239 9.29984 22.57018
Add_168 0.99902 0.16685 0.04009 9.88115 20.26648
Conv_8 0.99921 0.10613 0.04032 9.86872 19.64041
Conv_5 0.99932 0.37324 0.03799 9.99791 19.23952
Conv_57 0.9996 0.11046 0.01485 12.03829 11.99651
...
Again sort by cosine similarity in ascending order and progressively set operators to INT16 quantization until box_arrow meets the requirement:
| Serial Number | Cosine Similarity Threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.990650 | 0.983423 |
| 2 | 0.999 | 0.993978 | 0.993429 | 0.996853 | 0.998160 | 0.996915 | 0.996035 | 0.997124 | 0.997249 | 0.994457 | 0.992993 | 0.989119 |
| 3 | 0.9995 | 0.995126 | 0.994426 | 0.997494 | 0.998554 | 0.997245 | 0.996941 | 0.997775 | 0.998297 | 0.995518 | 0.995444 | 0.990272 |
Finally, by setting some sensitive nodes to INT16, the average similarities of all model outputs meet the requirements. The quantization configuration and output similarity are as follows:
{
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"qtype": "int16", "input1": "ec"},
"Conv_338": {"qtype": "int16", "input1": "ec"},
# 0.99
"Conv_265": {"qtype": "int16"},
"Conv_278": {"qtype": "int16"},
"Conv_287": {"qtype": "int16"},
"Conv_237": {"qtype": "int16"},
"Conv_267": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
"Conv_276": {"qtype": "int16"},
"Conv_289": {"qtype": "int16"},
"Conv_336": {"qtype": "int16"},
"Conv_135": {"qtype": "int16"},
"Add_140": {"qtype": "int16"},
"Conv_404": {"qtype": "int16"},
"Conv_3_split_low": {"qtype": "int16"},
"Conv_129": {"qtype": "int16"},
"Add_134": {"qtype": "int16"},
"Conv_338_split_low": {"qtype": "int16"},
# 0.999
"Conv_333": {"qtype": "int16"},
"Conv_107": {"qtype": "int16"},
"Concat_106": {"qtype": "int16"},
"Conv_339": {"qtype": "int16"},
"Conv_285": {"qtype": "int16"},
"Conv_335": {"qtype": "int16"},
"Conv_401": {"qtype": "int16"},
"Conv_337": {"qtype": "int16"},
"Conv_250": {"qtype": "int16"},
# 0.9995
"Conv_272": {"qtype": "int16"},
"Conv_384": {"qtype": "int16"},
"Conv_83": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Add_249": {"qtype": "int16"},
"Conv_300": {"qtype": "int16"},
# box_arrow 0.999
"Mul_116": {"qtype": "int16"},
"Conv_239": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_161": {"qtype": "int16"},
"Conv_10": {"qtype": "int16"},
"GridSample_340": {"input0": "ec"},
"Conv_78": {"qtype": "int16"},
"Conv_163": {"qtype": "int16"},
# box_arrow 0.9995
"Relu_4": {"qtype": "int16"},
"Add_168": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Conv_5": {"qtype": "int16"},
},
"subgraph_config": {
"arrow_head": {
"inputs": ["Reshape_390"],
"outputs": ["Conv_403", "Conv_404"],
"qtype": "int16",
}
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.905676 | 0.998650 | 0.995126 |
| field | cosine-similarity | 0.966730 | 0.999263 | 0.994426 |
| attr | cosine-similarity | 0.858142 | 0.999728 | 0.997494 |
| background | cosine-similarity | 0.990916 | 0.999713 | 0.998554 |
| cls | cosine-similarity | 0.881800 | 0.999640 | 0.997245 |
| box | cosine-similarity | 0.878442 | 0.999090 | 0.996941 |
| cls_sl | cosine-similarity | 0.917514 | 0.999845 | 0.997775 |
| box_sl | cosine-similarity | 0.923411 | 0.999942 | 0.998297 |
| occlusion | cosine-similarity | 0.972673 | 0.998806 | 0.995518 |
| cls_arrow | cosine-similarity | 0.678432 | 0.999992 | 0.995444 |
| box_arrow | cosine-similarity | 0.619935 | 0.999886 | 0.990272 |
+------------+-------------------+----------+----------+----------+
A complete accuracy tuning deployment example is available at: Lane Accuracy Tuning Deployment Example.
Accuracy Tuning Techniques
The PTQ accuracy tuning pipeline requires repeated modification of node configurations, model compilation, and accuracy verification, so the whole process is time-consuming and costly to debug. To address this, we provide the IR interface, which lets you modify quantization parameters directly in calibrated_model.onnx for rapid verification.
from hmct.ir import load_model, save_model
from hmct.common import find_input_calibration, find_output_calibration

model = load_model("calibrated_model.onnx")

# Modify a specific activation or weight calibration node to use a specific qtype
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.qtype)  # Supports qtype reading
node.qtype = "float32"  # Supports int8, int16, float16 and float32

# Configure activation and/or weight calibration nodes to use int16 qtype
calibration_nodes = model.graph.type2nodes["HzCalibration"]

# Configure all activation calibration nodes to use int16 qtype
for node in calibration_nodes:
    if node.tensor_type == "feature":
        node.qtype = "int16"

# Configure all weight calibration nodes to use int16 qtype
for node in calibration_nodes:
    if node.tensor_type == "weight":
        node.qtype = "int16"

# Configure all calibration nodes to use int16 qtype
for node in calibration_nodes:
    node.qtype = "int16"

# Configure a node with int16 input qtype
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # An HzCalibration node must be found on the input,
            # and its tensor_type must be "feature".
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Configure a node with int16 output qtype
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        output_calib = find_output_calibration(node)
        # An HzCalibration node must be found on the output,
        # and its tensor_type must be "feature".
        if output_calib and output_calib.tensor_type == "feature":
            output_calib.qtype = "int16"

# Configure all nodes of a specific op_type to int16
for node in model.graph.nodes:
    if node.op_type in ["Conv"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # An HzCalibration node must be found on the input,
            # and its tensor_type must be "feature".
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Modify a specific activation or weight calibration node's thresholds
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.thresholds)  # Supports thresholds reading
node.thresholds = [4.23]  # Supports np.array and List[float]

save_model(model, "calibrated_model_modified.onnx")
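The thresholds above feed directly into the quantization scale, which is why raising a sensitive node to int16 shrinks its error. A minimal illustration of symmetric fake-quantization (our own sketch, not the HMCT implementation):

```python
import numpy as np

def fake_quant_symmetric(x, threshold, bits=8):
    """Symmetric fake-quantization: scale derived from the calibration threshold."""
    qmax = 2 ** (bits - 1) - 1           # 127 for int8, 32767 for int16
    scale = threshold / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
# Values beyond the +/-4.23 calibration threshold saturate; in-range values
# are rounded to the nearest step of size threshold / qmax.
print(fake_quant_symmetric(x, threshold=4.23))
print(fake_quant_symmetric(x, threshold=4.23, bits=16))  # much finer steps
```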