Accuracy Tuning Practice

This chapter introduces the post-training quantization (PTQ) accuracy tuning pipeline, using precision problems encountered in actual use as examples. Please read the Model Accuracy Tuning chapter first to understand the relevant theoretical knowledge and tool usage.

Typical accuracy issues include:

  1. The all-node-type INT16 quantization accuracy meets the requirements, and the Accuracy Debug Tool can provide a relatively accurate ranking of sensitive nodes;
  2. The all-node-type INT16 quantization accuracy meets the requirements, but setting a large number of sensitive nodes to higher precision does not effectively improve the quantization accuracy;
  3. The all-node-type INT16 quantization accuracy does not meet the standard, and we hope to further improve the quantization accuracy while keeping the model fully quantized on the BPU.

Sensitive Node Analysis

The Accuracy Debug Tool provides an interface for calculating node quantization sensitivity. It measures the impact of each operator's quantization on the output results, so that nodes with high quantization loss can be set to higher precision to complete accuracy tuning. The tuning pipeline is described using the HybridNets model as an example.
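Conceptually, this sensitivity metric quantizes one node at a time while the rest of the model stays in floating point, then compares the output against the float output. The following is only a toy sketch of that idea, not the HMCT implementation; the layer names and the fake_quant helper are made up for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fake_quant(x, bits=8):
    # Symmetric per-tensor quantize/dequantize round trip.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

# Toy "model": a sequence of named layers.
layers = [("conv1", lambda x: 1.7 * x), ("act", np.tanh), ("conv2", lambda x: x - 0.3)]

def run(x, quantized_node=None):
    for name, fn in layers:
        x = fn(x)
        if name == quantized_node:
            x = fake_quant(x)  # quantize only this node's output
    return x

x = np.linspace(-3.0, 3.0, 256)
float_out = run(x)
# Lower cosine similarity => the node is more sensitive to quantization.
sensitivity = {name: cosine_similarity(float_out, run(x, name)) for name, _ in layers}
```

Sorting this dict in ascending order of similarity yields the same kind of ranking that the tool prints below.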

Using HMCT default INT8 quantization with percentile selected as the calibration algorithm, the calibration accuracy does not meet the requirements (the accuracy of det and ll_seg decreases by more than 1%):

```
Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.75562(97.85%)   0.98012
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.89675(99.12%)   0.98012
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.81813(95.83%)   0.98012
```

All Node Type INT16

First, set all_node_type to INT16 and keep percentile as the calibration algorithm. The calibration accuracy now meets the requirements, so INT8+INT16 mixed precision can be used to complete the tuning:

```python
quant_config = {"model_config": {"all_node_type": "int16"}}
```

```
Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.76866(99.54%)   0.997147
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.90405(99.93%)   0.997147
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.84732(99.25%)   0.997147
```

Mixed Precision Debugging

Compile the INT8 calibration model with the same percentile calibration algorithm selected for the INT16 model, configure debug_mode: "dump_calibration_data" in the YAML file to save the calibration data, and output the node quantization sensitivity through get_sensitivity_of_nodes:

```shell
hmct-debugger get-sensitivity-of-nodes hybridnets-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
```

```
===========================node sensitivity============================
node                                               cosine-similarity
-----------------------------------------------------------------------
/encoder/_blocks.0/_depthwise_conv/Conv            0.98768
/encoder/_swish/Mul                                0.99526
/encoder/_blocks.2/_depthwise_conv/Conv            0.99852
/encoder/_blocks.0/Mul                             0.99887
/encoder/_blocks.0/GlobalAveragePool               0.99889
/encoder/_blocks.2/_swish/Mul                      0.99957
/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv   0.99964
/encoder/_blocks.0/_swish/Mul                      0.99969
/encoder/_blocks.2/Mul                             0.99979
/encoder/_blocks.2/GlobalAveragePool               0.9998
/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv   0.99983
/encoder/_blocks.2/_swish_1/Mul                    0.99984
/bifpn/bifpn.5/p4_downsample/Pad                   0.99985
...er/seg_blocks.4/block/block.0/block/block.0/Pad 0.99985
/encoder/_blocks.0/_project_conv/Conv              0.99986
/encoder/_blocks.5/_depthwise_conv/Conv            0.99988
/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv   0.99989
/classifier/conv_list.0/depthwise_conv/conv/Conv   0.99989
..._blocks.4/block/block.0/block/block.0/conv/Conv 0.99989
/regressor/conv_list.0/depthwise_conv/conv/Conv    0.99989
/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv   0.99992
/encoder/_blocks.17/_se_expand/Conv                0.99992
/encoder/_blocks.13/Mul                            0.99992
/encoder/_blocks.1/_depthwise_conv/Conv            0.99992
/encoder/_blocks.13/GlobalAveragePool              0.99992
/encoder/_blocks.14/_se_expand/Conv                0.99992
/classifier/header/pointwise_conv/conv/Conv        0.99992
/encoder/_blocks.1/Add                             0.99993
/encoder/_blocks.3/Mul                             0.99993
/encoder/_blocks.3/GlobalAveragePool               0.99993
/encoder/_blocks.1/_swish/Mul                      0.99993
/encoder/_blocks.15/Mul                            0.99993
/encoder/_blocks.15/GlobalAveragePool              0.99993
/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv   0.99993
/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv   0.99993
/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv   0.99994
/encoder/_blocks.8/_project_conv/Conv              0.99994
/bifpn/bifpn.5/swish_3/Mul                         0.99994
/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv   0.99994
/encoder/_blocks.8/GlobalAveragePool               0.99994
/encoder/_conv_stem/Conv                           0.99994
/encoder/_blocks.13/_project_conv/Conv             0.99994
/encoder/_blocks.8/Mul                             0.99994
/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv   0.99995
...
```

Following the sensitivity ranking from top to bottom (lowest cosine similarity first), gradually set operators to INT16 quantization; the calibration model accuracy increases until it meets the requirements:

| Serial Number | Cosine Similarity Value (<= value will be set to INT16) | det | da_seg | ll_seg |
| --- | --- | --- | --- | --- |
| 1 | None | 0.75562(97.85%) | 0.89675(99.12%) | 0.81813(95.83%) |
| 2 | 0.999 | 0.76531(99.11%) | 0.90274(99.79%) | 0.83874(98.24%) |
| 3 | 0.9998 | 0.76545(99.12%) | 0.90340(99.86%) | 0.83961(98.34%) |
| 4 | 0.9999 | 0.76613(99.21%) | 0.90420(99.95%) | 0.84216(98.64%) |
| 5 | 0.99992 | 0.76712(99.34%) | 0.90356(99.88%) | 0.84397(98.85%) |
| 6 | 0.99993 | 0.76781(99.43%) | 0.90374(99.90%) | 0.84484(98.95%) |
| 7 | 0.99994 | 0.76811(99.47%) | 0.90344(99.86%) | 0.84528(99.01%) |

From the test table above, setting the sensitive nodes with a sensitivity value less than or equal to 0.99994 to INT16 makes the calibration accuracy meet the requirements:

```python
quant_config = {
    "model_config": {
        "activation": {"calibration_type": "max", "max_percentile": 0.99995},
    },
    "node_config": {
        "/encoder/_blocks.0/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.0/Mul": {"qtype": "int16"},
        "/encoder/_blocks.0/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.2/_swish/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.0/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/GlobalAveragePool": {"qtype": "int16"},
        "/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.2/_swish_1/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.5/p4_downsample/Pad": {"qtype": "int16"},
        "/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/Pad": {"qtype": "int16"},
        "/encoder/_blocks.0/_project_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.5/_depthwise_conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/classifier/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/conv/Conv": {"qtype": "int16"},
        "/regressor/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.17/_se_expand/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/Mul": {"qtype": "int16"},
        "/encoder/_blocks.1/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.14/_se_expand/Conv": {"qtype": "int16"},
        "/classifier/header/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.1/Add": {"qtype": "int16"},
        "/encoder/_blocks.3/Mul": {"qtype": "int16"},
        "/encoder/_blocks.3/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.1/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.15/Mul": {"qtype": "int16"},
        "/encoder/_blocks.15/GlobalAveragePool": {"qtype": "int16"},
        "/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/_project_conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.5/swish_3/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_conv_stem/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/_project_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/Mul": {"qtype": "int16"},
    },
}
```

```
Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.76811(99.47%)   0.994576
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.90344(99.86%)   0.994576
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.84528(99.01%)   0.994576
```
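Rather than hand-writing dozens of node_config entries, the per-node INT16 settings can be generated from the sensitivity ranking. A small sketch, assuming the hmct-debugger output has been parsed into a {node_name: cosine_similarity} dict (the helper name and the parsed dict here are hypothetical):

```python
def node_config_for_threshold(sensitivity, threshold, qtype="int16"):
    # Promote every node whose cosine similarity is at or below the threshold.
    return {name: {"qtype": qtype} for name, value in sensitivity.items() if value <= threshold}

# First entries of the ranking above, transcribed by hand for illustration.
sensitivity = {
    "/encoder/_blocks.0/_depthwise_conv/Conv": 0.98768,
    "/encoder/_swish/Mul": 0.99526,
    "/encoder/_blocks.2/_depthwise_conv/Conv": 0.99852,
    "/encoder/_blocks.0/Mul": 0.99887,
}
quant_config = {
    "model_config": {
        "activation": {"calibration_type": "max", "max_percentile": 0.99995},
    },
    "node_config": node_config_for_threshold(sensitivity, 0.996),
}
```

Sweeping the threshold then reduces to a single loop instead of editing the config by hand at each step.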

A complete accuracy tuning deployment example is available at: HybridNets Accuracy Tuning Deployment Example.

Sensitive Node Analysis Failure

If setting the sensitive nodes reported by the Accuracy Debug Tool to higher precision fails to effectively improve model accuracy, first try specifying output nodes to filter out irrelevant nodes. Additionally, observe the model output error and select another metric to improve the correlation between the sensitivity ranking and precision. Furthermore, by analyzing the model structure, typical substructures with a higher risk of quantization loss (such as model outputs, inputs, and structures with specific physical meaning) can be set to higher precision to complete accuracy tuning. This tuning pipeline is described using the YoloP model as an example.

Using HMCT default INT8 quantization with percentile selected as the calibration algorithm, the calibration accuracy does not meet the requirements (the accuracy of det decreases by more than 1%):

```
Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.61507(80.46%)   0.999891
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88863(99.84%)   0.999891
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.65357(100.19%)  0.999891
```

All Node Type INT16

First, set all_node_type to INT16 and keep percentile as the calibration algorithm. The calibration accuracy now meets the requirements, so INT8+INT16 mixed precision can be used to complete the tuning:

```python
quant_config = {"model_config": {"all_node_type": "int16"}}
```

```
Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.75890(99.27%)   0.99999
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88950(99.93%)   0.99999
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.64821(99.37%)   0.99999
```

Mixed Precision Debugging

Compile the INT8 calibration model with the same percentile calibration algorithm selected for the INT16 model, configure debug_mode: "dump_calibration_data" in the YAML file to save the calibration data, and output the node quantization sensitivity through get_sensitivity_of_nodes:

```shell
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
```

```
=======================node sensitivity========================
node             cosine-similarity
---------------------------------------------------------------
Mul_943          0.99736
Mul_647          0.99894
Mul_795          0.99909
Conv_50          0.99976
Div_49           0.99983
Conv_92          0.99989
Div_58           0.9999
Conv_1119        0.99995
Conv_88          0.99996
Conv_41          0.99996
Conv_59          0.99998
Slice_4          0.99998
Slice_9          0.99998
Slice_14         0.99998
Slice_19         0.99998
Slice_24         0.99998
Slice_29         0.99998
Slice_34         0.99998
Slice_39         0.99998
Concat_40        0.99998
Div_67           0.99998
MaxPool_297      0.99999
MaxPool_298      0.99999
MaxPool_299      0.99999
Concat_300       0.99999
Concat_1003      0.99999
Conv_177         0.99999
Div_296          0.99999
ScatterND_705    0.99999
Slice_645        0.99999
Reshape_706      0.99999
Conv_110         0.99999
Conv_87          0.99999
Concat_89        0.99999
LeakyRelu_91     0.99999
Mul_584          0.99999
ScatterND_640    0.99999
Concat_1105      0.99999
LeakyRelu_1107   0.99999
Conv_1004        0.99999
Add_582          0.99999
Conv_119         0.99999
Resize_1014      0.99999
Conv_1015        0.99999
Conv_1043        0.99999
Conv_266         0.99999
Conv_199         0.99999
Div_100          0.99999
...
```

Following the sensitivity ranking from top to bottom, gradually set operators to INT16 quantization. However, even with a large number of sensitive nodes set to INT16, the accuracy still fails to meet the requirements:

| Serial Number | Cosine Similarity Value (<= value will be set to INT16) | det | da_seg | ll_seg |
| --- | --- | --- | --- | --- |
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60956(79.74%) | 0.88911(99.89%) | 0.65925(101.07%) |
| 3 | 0.99996 | 0.60978(79.76%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.99998 | 0.60956(79.73%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.99999 | 0.66426(86.89%) | 0.88958(99.94%) | 0.66065(101.28%) |

Observing the accuracy results of the INT8 calibrated model, only the det branch does not reach 99%. When calculating node sensitivity through the get_sensitivity_of_nodes interface, the -o option can be used to specify the det output so that a better sensitivity ranking is obtained:

```shell
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -v True -s det_debug_result/
```

```
=======================node sensitivity========================
node             cosine-similarity
---------------------------------------------------------------
Mul_943          0.99736
Mul_647          0.99894
Mul_795          0.99909
Conv_50          0.99997
Div_58           0.99997
Div_49           0.99999
Concat_1003      0.99999
ScatterND_705    0.99999
Slice_645        0.99999
Reshape_706      0.99999
Mul_584          0.99999
ScatterND_640    0.99999
Add_582          0.99999
Conv_41          0.99999
Slice_4          0.99999
Slice_9          0.99999
Slice_14         0.99999
Slice_19         0.99999
Slice_24         0.99999
Slice_29         0.99999
Slice_34         0.99999
Slice_39         0.99999
Concat_40        0.99999
Conv_88          0.99999
Conv_59          0.99999
...
```

Following the sensitivity ranking from top to bottom and setting operators to INT16 quantization, focusing only on the det output filters out irrelevant nodes, but the final accuracy still does not meet the requirements:

| Serial Number | Cosine Similarity Value (<= value will be set to INT16) | det | da_seg | ll_seg |
| --- | --- | --- | --- | --- |
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60868(79.62%) | 0.88836(99.81%) | 0.65300(100.11%) |
| 3 | 0.99997 | 0.60961(79.74%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 4 | 0.99999 | 0.66461(86.94%) | 0.88932(99.91%) | 0.65876(100.99%) |

Observe the output similarity of the INT8 calibrated model: the L1 and L2 distances of the det branch deviate greatly from the floating-point results. Try replacing cosine similarity with another metric:

```shell
hmct-info yolop-384-640_calibrated_model.onnx -c ./calibration_data/images/00.npy
```

```
INFO:root:The quantized model output:
=================================================================================
Output          Cosine Similarity  L1 Distance  L2 Distance  Chebyshev Distance
---------------------------------------------------------------------------------
det_out         0.995352           7.633481     289.637665   552.817566
drive_area_seg  0.998973           0.004005     0.001132     0.592610
lane_line_seg   0.999933           0.000417     0.000069     0.564768
```
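For reference, the four metrics reported above can be reproduced with numpy. This is a sketch under the assumption that L1 and L2 are per-element averages (mean absolute error and mean squared error); the exact reductions used by hmct-info may differ:

```python
import numpy as np

def output_metrics(float_out, quant_out):
    a = np.ravel(float_out).astype(np.float64)
    b = np.ravel(quant_out).astype(np.float64)
    diff = a - b
    return {
        "cosine-similarity": float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))),
        "l1-distance": float(np.mean(np.abs(diff))),        # assumed: mean absolute error
        "l2-distance": float(np.mean(diff ** 2)),           # assumed: mean squared error
        "chebyshev-distance": float(np.max(np.abs(diff))),  # max absolute error
    }
```

Under these assumptions, a Chebyshev distance much larger than the L2 distance (as for det_out above) suggests a few large outlier errors rather than a uniform drift, which cosine similarity alone tends to hide.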

When calculating node sensitivity through the get_sensitivity_of_nodes interface, we can specify mse as the metric to improve the discrimination between different nodes:

```shell
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -m mse -v True -s det_mse_debug_result/
```

```
===================node sensitivity====================
node             mse
-------------------------------------------------------
Mul_943          164.82712
Mul_647          65.88637
Mul_795          56.86866
Conv_50          2.04226
Div_58           1.88065
Concat_1003      0.87797
Div_49           0.84962
ScatterND_705    0.67858
Slice_645        0.67379
Reshape_706      0.67379
ScatterND_640    0.55187
Mul_584          0.54884
Add_582          0.52263
Conv_41          0.4714
Slice_4          0.38413
Slice_9          0.38413
Slice_14         0.38413
Slice_19         0.38413
Slice_24         0.38413
Slice_29         0.38413
Slice_34         0.38413
Slice_39         0.38413
Concat_40        0.38413
Conv_88          0.35534
Conv_59          0.33164
Conv_92          0.30398
Div_67           0.16826
ScatterND_853    0.16711
Slice_793        0.1627
Reshape_854      0.1627
ScatterND_788    0.13494
Mul_732          0.132
Add_730          0.12381
ScatterND_1001   0.07478
Conv_550         0.07226
Conv_518         0.06468
Conv_546         0.06344
Conv_310         0.05974
Conv_68          0.05735
Conv_338         0.05684
Add_86           0.04319
ScatterND_936    0.0428
Concat_89        0.04201
LeakyRelu_91     0.04201
Concat_517      0.04193
Conv_87          0.04148
Slice_941        0.04148
Reshape_1002     0.04148
Conv_448         0.04034
MaxPool_297      0.03739
MaxPool_298      0.03739
MaxPool_299      0.03739
Concat_300       0.03739
Mul_880          0.03334
Div_100          0.03328
Conv_855         0.03146
Concat_547       0.03094
LeakyRelu_549    0.03094
Add_878          0.03016
Div_558          0.02966
...
```

Sorting by MSE from largest to smallest, set operators to higher precision; the required accuracy is finally reached after adding a large number of INT16 nodes:

| Serial Number | MSE Value (>= value will be set to INT16) | det | da_seg | ll_seg |
| --- | --- | --- | --- | --- |
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.5 | 0.66471(86.95%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 3 | 0.2 | 0.66447(86.92%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.1 | 0.72969(95.45%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.05 | 0.73393(96.00%) | 0.88934(99.92%) | 0.66121(101.37%) |
| 6 | 0.04 | 0.73321(95.91%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 7 | 0.03 | 0.75707(99.03%) | 0.88944(99.93%) | 0.66137(101.39%) |
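Note that the threshold direction flips between metric families: with a similarity metric such as cosine similarity, nodes at or below the threshold are promoted, while with an error metric such as mse it is nodes at or above it. A small helper covering both cases (the function name and the hand-transcribed dict are hypothetical):

```python
def select_nodes(sensitivity, threshold, higher_is_worse):
    """Pick nodes to set to higher precision from a {node_name: value} dict."""
    if higher_is_worse:  # error metrics, e.g. mse, mre, chebyshev
        return [name for name, value in sensitivity.items() if value >= threshold]
    # similarity metrics, e.g. cosine similarity
    return [name for name, value in sensitivity.items() if value <= threshold]

# Top of the mse ranking above.
mse = {"Mul_943": 164.82712, "Mul_647": 65.88637, "Conv_50": 2.04226, "Div_67": 0.16826}
```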

Subgraph Structure Analysis

Even when focusing only on improving the det output with the mse metric, setting a large number of sensitive nodes to higher precision still cannot effectively improve the accuracy. Considering that only the det task fails to meet the requirements, quantization loss is unlikely to originate from nodes in the da_seg and ll_seg branches or in the shared backbone. Focusing on the accuracy tuning of the det branch, combined with model structure analysis, try setting the subgraph at the det output position to higher precision and test the calibration accuracy:

```python
quant_config = {
    "subgraph_config": {
        "det_head": {
            "inputs": ["Conv_559", "Conv_707", "Conv_855"],
            "outputs": ["Concat_1003"],
            "qtype": "int16",
        },
    }
}
```

```
Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.76275(99.77%)   0.99991
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88863(99.84%)   0.99991
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.65357(100.19%)  0.99991
```
Attention

When the sorting of sensitive nodes is inaccurate, the source of the loss can be preliminarily located by configuring a subgraph for higher precision. If the latency of the higher-precision subgraph increases significantly, sensitivity analysis can then be performed within the subgraph so that fewer nodes need to be configured for higher precision.

A complete accuracy tuning deployment example is available at: YoloP Accuracy Tuning Deployment Example.

Quantization Loss Compensation

Due to hardware constraints and latency considerations, even when all_node_type is set to INT16, tensors with INT8 precision still exist in the model, including the weights of Conv and ConvTranspose, the feature inputs of Resize and GridSample, and the second input of MatMul. PTQ can introduce an additional operator to compensate for the quantization loss caused by the INT8 precision, further improving the accuracy of a model with all nodes deployed on the BPU, and complete accuracy tuning. The Lane model is used as an example to illustrate the tuning pipeline.
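The idea behind such compensation can be illustrated with numpy: quantize a tensor to int8, then represent the leftover rounding residual with a second int8 tensor; the sum of the two approximates int16 precision. This is only a toy sketch of the principle, not the HMCT implementation:

```python
import numpy as np

def quantize(x, bits):
    # Symmetric per-tensor quantize/dequantize round trip.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=10000)  # stand-in for a Conv weight tensor

w_int8 = quantize(w, 8)
w_int16 = quantize(w, 16)
# Compensation: quantize the int8 rounding residual with a second int8
# tensor, so the pair together approximates int16 precision.
w_compensated = w_int8 + quantize(w - w_int8, 8)

err_int8 = np.abs(w - w_int8).max()
err_int16 = np.abs(w - w_int16).max()
err_comp = np.abs(w - w_compensated).max()
```

The residual tensor has a much smaller dynamic range than the original weight, so its int8 grid is correspondingly finer, which is why the compensated error lands close to the int16 error.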

Using HMCT default INT8 quantization with the max calibration algorithm and asymmetric, per-channel quantization selected, the accuracy does not meet the requirements (the average cosine similarity of every output is less than 0.99):

```
+------------+-------------------+-----------+----------+----------+
| Output     | Metric            | Min       | Max      | Avg      |
+------------+-------------------+-----------+----------+----------+
| mask       | cosine-similarity | 0.575118  | 0.956694 | 0.875484 |
| field      | cosine-similarity | 0.653135  | 0.948818 | 0.883109 |
| attr       | cosine-similarity | 0.456388  | 0.986300 | 0.906577 |
| background | cosine-similarity | 0.878089  | 0.997221 | 0.979943 |
| cls        | cosine-similarity | 0.444225  | 0.996364 | 0.958695 |
| box        | cosine-similarity | 0.209923  | 0.993850 | 0.941041 |
| cls_sl     | cosine-similarity | 0.353770  | 0.998059 | 0.956674 |
| box_sl     | cosine-similarity | 0.196419  | 0.998049 | 0.948152 |
| occlusion  | cosine-similarity | 0.786355  | 0.978297 | 0.939203 |
| cls_arrow  | cosine-similarity | 0.079772  | 0.999830 | 0.940612 |
| box_arrow  | cosine-similarity | -0.231095 | 0.998620 | 0.850558 |
+------------+-------------------+-----------+----------+----------+
```

All Node Type INT16

First, set all_node_type to INT16 and select max for the calibration algorithm. The calibration accuracy still does not fully meet the requirements (the average cosine similarity of the occlusion and box_arrow outputs does not reach 0.99), so further accuracy tuning is required:

```python
quant_config = {"model_config": {"all_node_type": "int16"}}
```

```
+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.926001 | 0.998617 | 0.992738 |
| field      | cosine-similarity | 0.945007 | 0.999313 | 0.993549 |
| attr       | cosine-similarity | 0.871824 | 0.999821 | 0.996161 |
| background | cosine-similarity | 0.981510 | 0.999835 | 0.998274 |
| cls        | cosine-similarity | 0.918296 | 0.999851 | 0.997182 |
| box        | cosine-similarity | 0.911032 | 0.999134 | 0.996155 |
| cls_sl     | cosine-similarity | 0.933632 | 0.999918 | 0.997105 |
| box_sl     | cosine-similarity | 0.850244 | 0.998877 | 0.996493 |
| occlusion  | cosine-similarity | 0.943404 | 0.993528 | 0.983970 |
| cls_arrow  | cosine-similarity | 0.560625 | 0.999993 | 0.994583 |
| box_arrow  | cosine-similarity | 0.755858 | 0.999889 | 0.987496 |
+------------+-------------------+----------+----------+----------+
```

Upper Limit Accuracy of INT16

Since tensors with INT8 precision still exist in the model after setting all_node_type to INT16, we can change the qtype of all calibration nodes to INT16 through the IR interface provided by HMCT to obtain a true INT16 model:

```python
from hmct.ir import load_model, save_model

model = load_model("lane_calibrated_model_int16.onnx")
calibration_nodes = model.graph.type2nodes["HzCalibration"]
for node in calibration_nodes:
    node.qtype = "int16"
save_model(model, "lane_calibrated_model_real_int16.onnx")
```

Verify that the average similarity of the true INT16 calibrated model on all outputs meets the requirements, which means the accuracy can be improved by compensating the quantization error:

```
+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.999819 | 0.999995 | 0.999984 |
| field      | cosine-similarity | 0.999833 | 0.999997 | 0.999985 |
| attr       | cosine-similarity | 0.999743 | 0.999999 | 0.999992 |
| background | cosine-similarity | 0.999977 | 0.999999 | 0.999996 |
| cls        | cosine-similarity | 0.999852 | 0.999999 | 0.999994 |
| box        | cosine-similarity | 0.999681 | 0.999999 | 0.999990 |
| cls_sl     | cosine-similarity | 0.999722 | 1.000000 | 0.999993 |
| box_sl     | cosine-similarity | 0.999529 | 1.000000 | 0.999992 |
| occlusion  | cosine-similarity | 0.999844 | 0.999996 | 0.999978 |
| cls_arrow  | cosine-similarity | 0.998314 | 1.000000 | 0.999985 |
| box_arrow  | cosine-similarity | 0.999478 | 1.000000 | 0.999971 |
+------------+-------------------+----------+----------+----------+
```

Compensate Quantization Loss

The compensation analysis needs to be based on the calibrated model configured with all_node_type int16. Output the node sensitivity through get_sensitivity_of_nodes:

```shell
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_int16.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./int16_debug_result
```

```
=========================================node sensitivity=========================================
node      cosine-similarity  mre      mse      sqnr      chebyshev
--------------------------------------------------------------------------------------------------
Conv_360  0.98855            0.07025  0.61746  7.52018   4.78374
Conv_3    0.99895            0.2379   79.3955  12.55955  134.92183
Conv_338  0.99896            0.88695  0.00029  13.35992  1.54686
Conv_336  0.9994             0.04776  0.0002   14.62035  1.03647
...
```

Then modify the calibrated model according to the node sensitivity ranking, increasing the quantization precision from INT8 to INT16 until occlusion and box_arrow meet the requirements:

```python
from hmct.common import find_input_calibration, find_output_calibration
from hmct.ir import load_model, save_model

model = load_model("lane_calibrated_model_int16.onnx")
improved_nodes = ["Conv_360", "Conv_3", "Conv_338"]
for node in model.graph.nodes:
    if node.name not in improved_nodes:
        continue
    if node.op_type in ["Conv", "ConvTranspose", "MatMul"]:
        input1_calib = find_input_calibration(node, 1)
        if input1_calib and input1_calib.tensor_type == "weight":
            input1_calib.qtype = "int16"
    if node.op_type == "Resize":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "nearest")
        # In nearest mode, the output quantization precision can also be
        # improved to be close to int16; in other modes, only the input
        # quantization precision can be improved.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
    if node.op_type == "GridSample":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "bilinear")
        # Same rule as Resize: only nearest mode allows the output
        # quantization precision to be improved as well.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
save_model(model, "lane_calibrated_model_int16_improved.onnx")
```
| Serial Number | Cosine Similarity Value (<= value will be set to INT16) | occlusion Min | occlusion Avg | box_arrow Min | box_arrow Avg |
| --- | --- | --- | --- | --- | --- |
| 1 | None | 0.943404 | 0.983970 | 0.755858 | 0.987496 |
| 2 | 0.999 | 0.983739 | 0.997729 | 0.893116 | 0.994958 |
| 3 | 0.99 | 0.952758 | 0.994116 | 0.745781 | 0.987434 |

As shown in the table above, increasing the weight quantization precision of Conv_360, Conv_3, and Conv_338 from INT8 to INT16 allows all output similarities to meet requirements. To deploy all nodes on the BPU, a similar operator is introduced to compensate for the quantization loss caused by int8 precision, and the final precision will be improved to close to INT16:

```python
quant_config = {
    "model_config": {
        "all_node_type": "int16",
        "activation": {"calibration_type": "max"},
    },
    "node_config": {
        # Weight quantization loss of Conv, ConvTranspose, MatMul
        # can be compensated by configuring input1 to ec.
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"input1": "ec"},
        "Conv_338": {"input1": "ec"},
        # Quantization loss of GridSample and Resize can be
        # compensated by configuring input0 to ec.
        # "GridSample_340": {"input0": "ec"},
    }
}
```

```
+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.983658 | 0.999305 | 0.997363 |
| field      | cosine-similarity | 0.972155 | 0.999655 | 0.996506 |
| attr       | cosine-similarity | 0.977372 | 0.999879 | 0.998771 |
| background | cosine-similarity | 0.994089 | 0.999934 | 0.999493 |
| cls        | cosine-similarity | 0.984845 | 0.999909 | 0.999082 |
| box        | cosine-similarity | 0.977550 | 0.999447 | 0.998403 |
| cls_sl     | cosine-similarity | 0.980353 | 0.999958 | 0.998956 |
| box_sl     | cosine-similarity | 0.979305 | 0.999973 | 0.999332 |
| occlusion  | cosine-similarity | 0.982567 | 0.999608 | 0.997672 |
| cls_arrow  | cosine-similarity | 0.904646 | 0.999996 | 0.998248 |
| box_arrow  | cosine-similarity | 0.890971 | 0.999973 | 0.994999 |
+------------+-------------------+----------+----------+----------+
```
Attention

It is recommended that Resize and GridSample use the nearest sampling mode. In this case, the operator's output does not introduce any new values, and the error can be compensated. Otherwise, INT8 quantization of the new values in the output introduces additional loss that cannot be compensated.
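The "new values" point can be checked with a one-dimensional numpy sketch: nearest-neighbor resampling only repeats existing tensor values, so its output can reuse the input's quantization parameters, while linear interpolation produces values that did not exist in the input and would therefore need their own int8 quantization step:

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0])

# Nearest-neighbor 2x upsampling only repeats existing values.
nearest = np.repeat(x, 2)

# Linear interpolation at in-between positions creates new values
# (0.5 and 2.5 here) that were not present in the input.
positions = np.linspace(0, len(x) - 1, 2 * len(x) - 1)
linear = np.interp(positions, np.arange(len(x)), x)

assert set(nearest) <= set(x)      # no new values introduced
assert not set(linear) <= set(x)   # new values introduced
```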

Mixed Precision Debugging

After compensating for the weight quantization losses of Conv_360, Conv_3, and Conv_338, the accuracy of the calibrated model configured with all_node_type int16 meets the requirements. Next, we attempt to optimize the error-compensated INT8 calibrated model, whose accuracy is as follows:

```python
quant_config = {
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"input1": "ec"},
        "Conv_338": {"input1": "ec"},
    }
}
```

```
+------------+-------------------+-----------+----------+----------+
| Output     | Metric            | Min       | Max      | Avg      |
+------------+-------------------+-----------+----------+----------+
| mask       | cosine-similarity | 0.578707  | 0.950058 | 0.874048 |
| field      | cosine-similarity | 0.687287  | 0.946366 | 0.875366 |
| attr       | cosine-similarity | 0.471613  | 0.986946 | 0.908879 |
| background | cosine-similarity | 0.851624  | 0.996991 | 0.976282 |
| cls        | cosine-similarity | 0.536348  | 0.996753 | 0.959749 |
| box        | cosine-similarity | 0.094459  | 0.994883 | 0.939461 |
| cls_sl     | cosine-similarity | 0.374808  | 0.998186 | 0.959271 |
| box_sl     | cosine-similarity | 0.079629  | 0.998462 | 0.947069 |
| occlusion  | cosine-similarity | 0.702038  | 0.986074 | 0.945837 |
| cls_arrow  | cosine-similarity | 0.060614  | 0.999781 | 0.942194 |
| box_arrow  | cosine-similarity | -0.301507 | 0.998179 | 0.829580 |
+------------+-------------------+-----------+----------+----------+
```

Output the node sensitivity through get_sensitivity_of_nodes:

```shell
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./debug_result
```

```
===========================================node sensitivity===========================================
node                                  cosine-similarity  mre      mse          sqnr      chebyshev
------------------------------------------------------------------------------------------------------
Conv_265                              0.43427            12.56779  32.42019     0.37585   24.77806
Conv_278                              0.84973            0.80948   21.66625     2.71994   16.2646
Conv_287                              0.87352            2.34926   817.71234    0.42356   183.87369
Conv_237                              0.96676            1.17564   12526.42871  3.45996   1538.01672
Conv_267                              0.96678            2.02166   4.81972      4.51482   14.91702
UNIT_CONV_FOR_BatchNormalization_141  0.96682            1.17458   12521.30957  3.4652    1537.55347
Conv_276                              0.97024            0.63814   10.79584     4.23258   13.53813
Conv_289                              0.97159            0.61387   60.08849     6.0926    61.38558
Conv_336                              0.97212            1.70509   0.00951      6.28411   1.07831
Conv_135                              0.97478            0.93508   11459.125    3.94841   1468.99048
Add_140                               0.97482            0.93514   11391.92676  3.95557   1464.76538
Conv_404                              0.97672            2.96196   0.20074      6.37718   5.93086
Conv_3                                0.97977            0.49545   27730.66992  2.96076   2199.58203
Conv_3_split_low                      0.9798             0.49563   27710.51758  2.96429   2198.6333
Conv_129                              0.97991            0.5559    14292.41602  4.12395   1598.18835
Add_134                               0.97991            0.55644   14300.95312  4.12708   1598.60938
Conv_338                              0.98833            6.00584   0.0032       8.16733   0.40305
Conv_338_split_low                    0.98833            6.00581   0.0032       8.16729   0.4033
Conv_333                              0.99496            5.13682   0.00184      10.63743  0.19728
Conv_107                              0.99543            0.23465   2859.23389   7.1595    751.46326
Concat_106                            0.99546            0.23441   2843.88184   7.17062   749.37109
Conv_339                              0.99564            0.25485   0.17183      10.29787  8.05724
Conv_285                              0.99576            0.21163   9.77346      10.03632  38.60489
Conv_335                              0.99779            0.1337    0.44365      11.66653  2.3832
Conv_401                              0.9978             2.66814   0.01664      11.78462  1.694
Conv_337                              0.99871            0.1687    1.17969      12.91167  5.95476
Conv_250                              0.99894            0.17839   0.70945      14.60334  11.99677
Conv_272                              0.99917            0.06468   0.15378      13.46442  7.07149
Conv_384                              0.99919            0.61947   4.9547       17.0275   71.32216
Conv_83                               0.99921            0.11831   328.16125    11.67288  266.15958
Conv_8                                0.99925            0.06326   409.16638    11.83647  295.1091
Add_249                               0.99934            0.2801    0.5793       16.36763  12.01501
Conv_300                              0.9995             0.01909   0.31657      14.86126  5.5387
Slice_299                             0.99951            0.01896   0.30647      14.9317   5.27295
...
```

Sorting the nodes by cosine similarity in ascending order and progressively setting the most sensitive operators to INT16 quantization, the output similarity of the calibrated model increases accordingly:

| Serial Number | Cosine Similarity Threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.874048 | 0.875366 | 0.908879 | 0.976282 | 0.959749 | 0.939461 | 0.959271 | 0.947069 | 0.945837 | 0.942194 | 0.829580 |
| 2 | 0.99 | 0.980708 | 0.987483 | 0.989023 | 0.993368 | 0.991154 | 0.985205 | 0.990900 | 0.990375 | 0.985721 | 0.975350 | 0.963180 |
| 3 | 0.999 | 0.988858 | 0.990837 | 0.994669 | 0.995994 | 0.995570 | 0.994660 | 0.995466 | 0.996202 | 0.991201 | 0.979100 | 0.980218 |
| 4 | 0.9995 | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.981471 | 0.982875 |
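The threshold sweep above can be scripted instead of done by hand. The sketch below is a minimal illustration (the list of pairs is abridged from the ranking above, and the helper name is our own, not part of the tool): it selects every node whose cosine similarity is at or below a chosen threshold and emits the corresponding `node_config` entries.

```python
# Minimal sketch: turn a sensitivity ranking into INT16 node_config entries.
# The (node, cosine-similarity) pairs are abridged from the ranking above;
# in practice they would be parsed from the debug tool's saved results.
sensitivity = [
    ("Conv_265", 0.43427),
    ("Conv_278", 0.84973),
    ("Conv_287", 0.87352),
    ("Conv_300", 0.9995),
    ("Slice_299", 0.99951),
]

def int16_node_config(sensitivity, threshold):
    """Set every node with cosine similarity <= threshold to INT16."""
    return {
        name: {"qtype": "int16"}
        for name, cosine in sensitivity
        if cosine <= threshold
    }

config = int16_node_config(sensitivity, threshold=0.99)
print(sorted(config))  # the three most sensitive nodes
```

Raising the threshold step by step (0.99 → 0.999 → 0.9995) reproduces the rounds recorded in the table above.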

Based on the above table, we optimized the model by setting sensitive nodes with a sensitivity value less than or equal to 0.9995 to INT16. Except for cls_arrow and box_arrow, the average similarity of every output reached at least 0.99. Since cls_arrow and box_arrow share the same branch, we additionally configured the arrow output head subgraph to INT16. The quantization configuration and output similarity are as follows:

```python
{
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"qtype": "int16", "input1": "ec"},
        "Conv_338": {"qtype": "int16", "input1": "ec"},
        # 0.99
        "Conv_265": {"qtype": "int16"},
        "Conv_278": {"qtype": "int16"},
        "Conv_287": {"qtype": "int16"},
        "Conv_237": {"qtype": "int16"},
        "Conv_267": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
        "Conv_276": {"qtype": "int16"},
        "Conv_289": {"qtype": "int16"},
        "Conv_336": {"qtype": "int16"},
        "Conv_135": {"qtype": "int16"},
        "Add_140": {"qtype": "int16"},
        "Conv_404": {"qtype": "int16"},
        "Conv_3_split_low": {"qtype": "int16"},
        "Conv_129": {"qtype": "int16"},
        "Add_134": {"qtype": "int16"},
        "Conv_338_split_low": {"qtype": "int16"},
        # 0.999
        "Conv_333": {"qtype": "int16"},
        "Conv_107": {"qtype": "int16"},
        "Concat_106": {"qtype": "int16"},
        "Conv_339": {"qtype": "int16"},
        "Conv_285": {"qtype": "int16"},
        "Conv_335": {"qtype": "int16"},
        "Conv_401": {"qtype": "int16"},
        "Conv_337": {"qtype": "int16"},
        "Conv_250": {"qtype": "int16"},
        # 0.9995
        "Conv_272": {"qtype": "int16"},
        "Conv_384": {"qtype": "int16"},
        "Conv_83": {"qtype": "int16"},
        "Conv_8": {"qtype": "int16"},
        "Add_249": {"qtype": "int16"},
        "Conv_300": {"qtype": "int16"},
    },
    "subgraph_config": {
        "arrow_head": {
            "inputs": ["Reshape_390"],
            "outputs": ["Conv_403", "Conv_404"],
            "qtype": "int16",
        }
    }
}
```

```
+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.926363 | 0.997833 | 0.991085 |
| field      | cosine-similarity | 0.915524 | 0.999179 | 0.991593 |
| attr       | cosine-similarity | 0.869666 | 0.999608 | 0.995369 |
| background | cosine-similarity | 0.983465 | 0.999664 | 0.997524 |
| cls        | cosine-similarity | 0.929948 | 0.999513 | 0.995818 |
| box        | cosine-similarity | 0.890618 | 0.999021 | 0.995149 |
| cls_sl     | cosine-similarity | 0.937240 | 0.999738 | 0.996001 |
| box_sl     | cosine-similarity | 0.880057 | 0.999896 | 0.996851 |
| occlusion  | cosine-similarity | 0.966050 | 0.998043 | 0.992633 |
| cls_arrow  | cosine-similarity | 0.380447 | 0.999980 | 0.990650 |
| box_arrow  | cosine-similarity | 0.556044 | 0.999856 | 0.983423 |
+------------+-------------------+----------+----------+----------+
```
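For reference, the cosine similarity reported per output is the standard cosine of the angle between the flattened float-model and calibrated-model output tensors. A minimal sketch of the computation (our own illustration with made-up values, not the tool's internal code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened output tensors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

float_out = [0.2, 1.5, -0.7, 3.1]      # float model output (example values)
quant_out = [0.21, 1.48, -0.69, 3.05]  # calibrated model output (example values)
print(round(cosine_similarity(float_out, quant_out), 6))
```

A value of 1.0 means the two outputs are perfectly aligned in direction; quantization error pulls the value below 1.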

Currently, only the average similarity of box_arrow fails to meet the requirements. Specify the box_arrow output and recompute the sensitivity ranking:

```shell
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_box.onnx calibration_data/ \
    -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] \
    -n node -o Conv_404 -v True -s ./box_debug_result
```
```
========================================node sensitivity========================================
node                                  cosine-similarity  mre      mse      sqnr      chebyshev
------------------------------------------------------------------------------------------------
Mul_116                               0.9987             0.35957  0.03655  10.08165  18.38877
Conv_239                              0.99871            0.38099  0.01399  12.16738  6.04099
UNIT_CONV_FOR_BatchNormalization_161  0.99871            0.3832   0.01404  12.15968  6.11046
Conv_10                               0.99879            0.12717  0.04052  9.85789   19.9839
GridSample_340                        0.99887            0.34926  0.02639  10.78928  15.52824
Conv_78                               0.9989             0.33779  0.02165  11.21884  12.89857
Conv_163                              0.9989             0.16412  0.03985  9.89421   20.19754
Relu_4                                0.99902            0.39383  0.05239  9.29984   22.57018
Add_168                               0.99902            0.16685  0.04009  9.88115   20.26648
Conv_8                                0.99921            0.10613  0.04032  9.86872   19.64041
Conv_5                                0.99932            0.37324  0.03799  9.99791   19.23952
Conv_57                               0.9996             0.11046  0.01485  12.03829  11.99651
...
```

Sort by cosine similarity in ascending order and progressively set the operators to INT16 quantization until box_arrow meets the requirements:

| Serial Number | Cosine Similarity Threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.990650 | 0.983423 |
| 2 | 0.999 | 0.993978 | 0.993429 | 0.996853 | 0.998160 | 0.996915 | 0.996035 | 0.997124 | 0.997249 | 0.994457 | 0.992993 | 0.989119 |
| 3 | 0.9995 | 0.995126 | 0.994426 | 0.997494 | 0.998554 | 0.997245 | 0.996941 | 0.997775 | 0.998297 | 0.995518 | 0.995444 | 0.990272 |
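Each additional round only extends the existing `node_config`, so the box_arrow-specific nodes can be merged into the earlier configuration with a plain dict update. A minimal sketch (node names taken from the rankings above; the starting config is abridged for illustration):

```python
# Abridged node_config from the first tuning round (illustration only).
node_config = {
    "Conv_300": {"qtype": "int16"},
}

# Additional sensitive nodes from the box_arrow-specific ranking.
box_arrow_nodes = ["Mul_116", "Conv_239", "Conv_10"]
node_config.update({name: {"qtype": "int16"} for name in box_arrow_nodes})
print(len(node_config))  # 4 entries after the merge
```

Nodes that appear in both rankings (such as Conv_8 here) are simply overwritten with the same setting, so repeated entries are harmless.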

Finally, by setting some sensitive nodes to INT16, the average similarities of all model outputs meet the requirements. The quantization configuration and output similarity are as follows:

```python
{
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"qtype": "int16", "input1": "ec"},
        "Conv_338": {"qtype": "int16", "input1": "ec"},
        # 0.99
        "Conv_265": {"qtype": "int16"},
        "Conv_278": {"qtype": "int16"},
        "Conv_287": {"qtype": "int16"},
        "Conv_237": {"qtype": "int16"},
        "Conv_267": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
        "Conv_276": {"qtype": "int16"},
        "Conv_289": {"qtype": "int16"},
        "Conv_336": {"qtype": "int16"},
        "Conv_135": {"qtype": "int16"},
        "Add_140": {"qtype": "int16"},
        "Conv_404": {"qtype": "int16"},
        "Conv_3_split_low": {"qtype": "int16"},
        "Conv_129": {"qtype": "int16"},
        "Add_134": {"qtype": "int16"},
        "Conv_338_split_low": {"qtype": "int16"},
        # 0.999
        "Conv_333": {"qtype": "int16"},
        "Conv_107": {"qtype": "int16"},
        "Concat_106": {"qtype": "int16"},
        "Conv_339": {"qtype": "int16"},
        "Conv_285": {"qtype": "int16"},
        "Conv_335": {"qtype": "int16"},
        "Conv_401": {"qtype": "int16"},
        "Conv_337": {"qtype": "int16"},
        "Conv_250": {"qtype": "int16"},
        # 0.9995
        "Conv_272": {"qtype": "int16"},
        "Conv_384": {"qtype": "int16"},
        "Conv_83": {"qtype": "int16"},
        "Conv_8": {"qtype": "int16"},
        "Add_249": {"qtype": "int16"},
        "Conv_300": {"qtype": "int16"},
        # box_arrow 0.999
        "Mul_116": {"qtype": "int16"},
        "Conv_239": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_161": {"qtype": "int16"},
        "Conv_10": {"qtype": "int16"},
        "GridSample_340": {"input0": "ec"},
        "Conv_78": {"qtype": "int16"},
        "Conv_163": {"qtype": "int16"},
        # box_arrow 0.9995
        "Relu_4": {"qtype": "int16"},
        "Add_168": {"qtype": "int16"},
        "Conv_5": {"qtype": "int16"},
    },
    "subgraph_config": {
        "arrow_head": {
            "inputs": ["Reshape_390"],
            "outputs": ["Conv_403", "Conv_404"],
            "qtype": "int16",
        }
    }
}
```

```
+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.905676 | 0.998650 | 0.995126 |
| field      | cosine-similarity | 0.966730 | 0.999263 | 0.994426 |
| attr       | cosine-similarity | 0.858142 | 0.999728 | 0.997494 |
| background | cosine-similarity | 0.990916 | 0.999713 | 0.998554 |
| cls        | cosine-similarity | 0.881800 | 0.999640 | 0.997245 |
| box        | cosine-similarity | 0.878442 | 0.999090 | 0.996941 |
| cls_sl     | cosine-similarity | 0.917514 | 0.999845 | 0.997775 |
| box_sl     | cosine-similarity | 0.923411 | 0.999942 | 0.998297 |
| occlusion  | cosine-similarity | 0.972673 | 0.998806 | 0.995518 |
| cls_arrow  | cosine-similarity | 0.678432 | 0.999992 | 0.995444 |
| box_arrow  | cosine-similarity | 0.619935 | 0.999886 | 0.990272 |
+------------+-------------------+----------+----------+----------+
```

A complete accuracy tuning deployment example is available at: Lane Accuracy Tuning Deployment Example.

Accuracy Tuning Techniques

The PTQ accuracy tuning pipeline requires repeated rounds of node configuration changes, model compilation, and accuracy verification, which makes the whole process time-consuming and expensive to debug. To speed this up, we provide an IR interface that lets you modify the quantization parameters in calibrated_model.onnx directly for rapid verification.

```python
from hmct.ir import load_model, save_model
from hmct.common import find_input_calibration, find_output_calibration

model = load_model("calibrated_model.onnx")

# Modify a specific activation or weight calibration node to use a specific qtype
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.qtype)  # qtype reading is supported
node.qtype = "float32"  # int8, int16, float16 and float32 are supported

# Fetch all activation and weight calibration nodes
calibration_nodes = model.graph.type2nodes["HzCalibration"]

# Configure all activation calibration nodes to use int16 qtype
for node in calibration_nodes:
    if node.tensor_type == "feature":
        node.qtype = "int16"

# Configure all weight calibration nodes to use int16 qtype
for node in calibration_nodes:
    if node.tensor_type == "weight":
        node.qtype = "int16"

# Configure all calibration nodes to use int16 qtype
for node in calibration_nodes:
    node.qtype = "int16"

# Configure a node with int16 input qtype
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # An HzCalibration node must be found on the input,
            # and its tensor_type must be "feature".
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Configure a node with int16 output qtype
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        output_calib = find_output_calibration(node)
        # An HzCalibration node must be found on the output,
        # and its tensor_type must be "feature".
        if output_calib and output_calib.tensor_type == "feature":
            output_calib.qtype = "int16"

# Configure nodes with a specific op_type to int16
for node in model.graph.nodes:
    if node.op_type in ["Conv"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # An HzCalibration node must be found on the input,
            # and its tensor_type must be "feature".
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Modify a specific activation or weight calibration node with specific thresholds
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.thresholds)  # thresholds reading is supported
node.thresholds = [4.23]  # np.array and List[float] are supported

save_model(model, "calibrated_model_modified.onnx")
```