Accuracy Tuning in Practice

This chapter walks through the accuracy tuning workflow of the PTQ toolchain using accuracy problems encountered in real-world use. Please read the model accuracy tuning chapter first to become familiar with the relevant theory and tool usage.

Typical accuracy problems include:

  1. Full-INT16 quantization meets the accuracy target, and the accuracy debug tool produces a reasonably accurate sensitive-node ranking;
  2. Full-INT16 quantization meets the accuracy target, but setting a large number of sensitive nodes to high precision fails to effectively improve quantization accuracy;
  3. Full-INT16 quantization does not meet the accuracy target, and accuracy must be improved further while keeping the model fully quantized on the BPU.

Sensitive-Node Analysis

The accuracy debug tool provides an interface for computing per-node quantization sensitivity, i.e., how much quantizing each operator affects the model output. Setting the nodes with the highest quantization loss to high precision completes the tuning. The HybridNets model is used below to illustrate this process.

With HMCT default INT8 quantization and the percentile calibration algorithm, calibration accuracy does not meet the target (det and ll_seg drop by more than 1%):

Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.75562(97.85%)   0.98012
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.89675(99.12%)   0.98012
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.81813(95.83%)   0.98012

Full-INT16 Accuracy

First set all_node_type to int16 with the percentile calibration algorithm. Calibration accuracy now meets the requirement, so INT8+INT16 mixed precision can be used for tuning:

quant_config = {"model_config": {"all_node_type": "int16"}}

Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.76866(99.54%)   0.997147
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.90405(99.93%)   0.997147
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.84732(99.25%)   0.997147

Mixed-Precision Debugging

Compile the INT8 calibrated model with the percentile calibration algorithm chosen during the full-INT16 experiment, set debug_mode to "dump_calibration_data" in the yaml file to save the calibration data, and print per-node quantization sensitivity via get_sensitivity_of_nodes:

hmct-debugger get-sensitivity-of-nodes hybridnets-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
===========================node sensitivity============================
node                                                 cosine-similarity
-----------------------------------------------------------------------
/encoder/_blocks.0/_depthwise_conv/Conv              0.98768
/encoder/_swish/Mul                                  0.99526
/encoder/_blocks.2/_depthwise_conv/Conv              0.99852
/encoder/_blocks.0/Mul                               0.99887
/encoder/_blocks.0/GlobalAveragePool                 0.99889
/encoder/_blocks.2/_swish/Mul                        0.99957
/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv     0.99964
/encoder/_blocks.0/_swish/Mul                        0.99969
/encoder/_blocks.2/Mul                               0.99979
/encoder/_blocks.2/GlobalAveragePool                 0.9998
/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv     0.99983
/encoder/_blocks.2/_swish_1/Mul                      0.99984
/bifpn/bifpn.5/p4_downsample/Pad                     0.99985
...er/seg_blocks.4/block/block.0/block/block.0/Pad   0.99985
/encoder/_blocks.0/_project_conv/Conv                0.99986
/encoder/_blocks.5/_depthwise_conv/Conv              0.99988
/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv     0.99989
/classifier/conv_list.0/depthwise_conv/conv/Conv     0.99989
..._blocks.4/block/block.0/block/block.0/conv/Conv   0.99989
/regressor/conv_list.0/depthwise_conv/conv/Conv      0.99989
/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv     0.99992
/encoder/_blocks.17/_se_expand/Conv                  0.99992
/encoder/_blocks.13/Mul                              0.99992
/encoder/_blocks.1/_depthwise_conv/Conv              0.99992
/encoder/_blocks.13/GlobalAveragePool                0.99992
/encoder/_blocks.14/_se_expand/Conv                  0.99992
/classifier/header/pointwise_conv/conv/Conv          0.99992
/encoder/_blocks.1/Add                               0.99993
/encoder/_blocks.3/Mul                               0.99993
/encoder/_blocks.3/GlobalAveragePool                 0.99993
/encoder/_blocks.1/_swish/Mul                        0.99993
/encoder/_blocks.15/Mul                              0.99993
/encoder/_blocks.15/GlobalAveragePool                0.99993
/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv     0.99993
/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv     0.99993
/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv     0.99994
/encoder/_blocks.8/_project_conv/Conv                0.99994
/bifpn/bifpn.5/swish_3/Mul                           0.99994
/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv     0.99994
/encoder/_blocks.8/GlobalAveragePool                 0.99994
/encoder/_conv_stem/Conv                             0.99994
/encoder/_blocks.13/_project_conv/Conv               0.99994
/encoder/_blocks.8/Mul                               0.99994
/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv     0.99995
...

Working down the cosine-similarity ranking, progressively set operators to INT16 quantization; the calibrated model's accuracy increases accordingly until it meets the requirement:

No.  Threshold (<= set to INT16)  det              da_seg           ll_seg
---  ---------------------------  ---------------  ---------------  ---------------
1    None                         0.75562(97.85%)  0.89675(99.12%)  0.81813(95.83%)
2    0.999                        0.76531(99.11%)  0.90274(99.79%)  0.83874(98.24%)
3    0.9998                       0.76545(99.12%)  0.90340(99.86%)  0.83961(98.34%)
4    0.9999                       0.76613(99.21%)  0.90420(99.95%)  0.84216(98.64%)
5    0.99992                      0.76712(99.34%)  0.90356(99.88%)  0.84397(98.85%)
6    0.99993                      0.76781(99.43%)  0.90374(99.90%)  0.84484(98.95%)
7    0.99994                      0.76811(99.47%)  0.90344(99.86%)  0.84528(99.01%)

From the table above, setting the sensitive nodes with sensitivity no greater than 0.99994 to INT16 brings calibration accuracy up to the requirement:

quant_config = {
    "model_config": {
        "activation": {"calibration_type": "max", "max_percentile": 0.99995},
    },
    "node_config": {
        "/encoder/_blocks.0/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.0/Mul": {"qtype": "int16"},
        "/encoder/_blocks.0/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.2/_swish/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.0/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/Mul": {"qtype": "int16"},
        "/encoder/_blocks.2/GlobalAveragePool": {"qtype": "int16"},
        "/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.2/_swish_1/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.5/p4_downsample/Pad": {"qtype": "int16"},
        "/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/Pad": {"qtype": "int16"},
        "/encoder/_blocks.0/_project_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.5/_depthwise_conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/classifier/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/conv/Conv": {"qtype": "int16"},
        "/regressor/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.17/_se_expand/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/Mul": {"qtype": "int16"},
        "/encoder/_blocks.1/_depthwise_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.14/_se_expand/Conv": {"qtype": "int16"},
        "/classifier/header/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.1/Add": {"qtype": "int16"},
        "/encoder/_blocks.3/Mul": {"qtype": "int16"},
        "/encoder/_blocks.3/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_blocks.1/_swish/Mul": {"qtype": "int16"},
        "/encoder/_blocks.15/Mul": {"qtype": "int16"},
        "/encoder/_blocks.15/GlobalAveragePool": {"qtype": "int16"},
        "/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/_project_conv/Conv": {"qtype": "int16"},
        "/bifpn/bifpn.5/swish_3/Mul": {"qtype": "int16"},
        "/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/GlobalAveragePool": {"qtype": "int16"},
        "/encoder/_conv_stem/Conv": {"qtype": "int16"},
        "/encoder/_blocks.13/_project_conv/Conv": {"qtype": "int16"},
        "/encoder/_blocks.8/Mul": {"qtype": "int16"},
    },
}

Model                      Float    March   Samples  calibrated_model  Cosine_Similarity
-------------------------  -------  ------  -------  ----------------  -----------------
hybridnets-384-640_det     0.77222  nash-e  10000    0.76811(99.47%)   0.994576
hybridnets-384-640_da_seg  0.90467  nash-e  10000    0.90344(99.86%)   0.994576
hybridnets-384-640_ll_seg  0.85376  nash-e  10000    0.84528(99.01%)   0.994576

For the complete accuracy tuning and deployment walkthrough, see: HybridNets accuracy tuning deployment example

When Sensitive-Node Tuning Fails

When setting sensitive nodes to high precision with the accuracy debug tool fails to improve model accuracy, first try specifying the output node(s) to filter out irrelevant nodes. Next, inspect the model's output error and switch to another evaluation metric to improve the correlation between the sensitivity ranking and actual accuracy. Going further, analyze the model structure and set the typical substructures at greatest risk of quantization loss (model outputs, inputs, and structures with specific physical meaning, etc.) to high precision to complete the tuning. The YoloP model is used below to illustrate this process.

With HMCT default INT8 quantization and the percentile calibration algorithm, calibration accuracy does not meet the target (det drops by more than 1%):

Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.61507(80.46%)   0.999891
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88863(99.84%)   0.999891
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.65357(100.19%)  0.999891

Full-INT16 Accuracy

First set all_node_type to int16 with the percentile calibration algorithm. Calibration accuracy now meets the requirement, so INT8+INT16 mixed precision can be used for tuning:

quant_config = {"model_config": {"all_node_type": "int16"}}

Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.75890(99.27%)   0.99999
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88950(99.93%)   0.99999
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.64821(99.37%)   0.99999

Mixed-Precision Debugging

Compile the INT8 calibrated model with the percentile calibration algorithm chosen during the full-INT16 experiment, set debug_mode to "dump_calibration_data" in the yaml file to save the calibration data, and print per-node quantization sensitivity via get_sensitivity_of_nodes:

hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
=======================node sensitivity========================
node           cosine-similarity
---------------------------------------------------------------
Mul_943        0.99736
Mul_647        0.99894
Mul_795        0.99909
Conv_50        0.99976
Div_49         0.99983
Conv_92        0.99989
Div_58         0.9999
Conv_1119      0.99995
Conv_88        0.99996
Conv_41        0.99996
Conv_59        0.99998
Slice_4        0.99998
Slice_9        0.99998
Slice_14       0.99998
Slice_19       0.99998
Slice_24       0.99998
Slice_29       0.99998
Slice_34       0.99998
Slice_39       0.99998
Concat_40      0.99998
Div_67         0.99998
MaxPool_297    0.99999
MaxPool_298    0.99999
MaxPool_299    0.99999
Concat_300     0.99999
Concat_1003    0.99999
Conv_177       0.99999
Div_296        0.99999
ScatterND_705  0.99999
Slice_645      0.99999
Reshape_706    0.99999
Conv_110       0.99999
Conv_87        0.99999
Concat_89      0.99999
LeakyRelu_91   0.99999
Mul_584        0.99999
ScatterND_640  0.99999
Concat_1105    0.99999
LeakyRelu_1107 0.99999
Conv_1004      0.99999
Add_582        0.99999
Conv_119       0.99999
Resize_1014    0.99999
Conv_1015      0.99999
Conv_1043      0.99999
Conv_266       0.99999
Conv_199       0.99999
Div_100        0.99999
...

Working down the cosine-similarity ranking, operators were progressively set to INT16; however, even after a large number of sensitive nodes were set to INT16, accuracy still fails to meet the target:

No.  Threshold (<= set to INT16)  det              da_seg           ll_seg
---  ---------------------------  ---------------  ---------------  ----------------
1    None                         0.61507(80.46%)  0.88863(99.84%)  0.65357(100.19%)
2    0.999                        0.60956(79.74%)  0.88911(99.89%)  0.65925(101.07%)
3    0.99996                      0.60978(79.76%)  0.88933(99.92%)  0.66112(101.35%)
4    0.99998                      0.60956(79.73%)  0.88931(99.91%)  0.66125(101.37%)
5    0.99999                      0.66426(86.89%)  0.88958(99.94%)  0.66065(101.28%)

Looking at the INT8 calibrated model's results, only the det branch falls short of 99%. When computing node sensitivity with get_sensitivity_of_nodes, the -o option can specify the node corresponding to the det output, so that the ranking covers only nodes affecting the det output, improving the accuracy of problem localization:

hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -v True -s det_debug_result/
=======================node sensitivity========================
node           cosine-similarity
---------------------------------------------------------------
Mul_943        0.99736
Mul_647        0.99894
Mul_795        0.99909
Conv_50        0.99997
Div_58         0.99997
Div_49         0.99999
Concat_1003    0.99999
ScatterND_705  0.99999
Slice_645      0.99999
Reshape_706    0.99999
Mul_584        0.99999
ScatterND_640  0.99999
Add_582        0.99999
Conv_41        0.99999
Slice_4        0.99999
Slice_9        0.99999
Slice_14       0.99999
Slice_19       0.99999
Slice_24       0.99999
Slice_29       0.99999
Slice_34       0.99999
Slice_39       0.99999
Concat_40      0.99999
Conv_88        0.99999
Conv_59        0.99999
...

Setting operators to INT16 down this ranking, focusing only on the det output does filter out irrelevant nodes, but the final accuracy still fails to meet the target:

No.  Threshold (<= set to INT16)  det              da_seg           ll_seg
---  ---------------------------  ---------------  ---------------  ----------------
1    None                         0.61507(80.46%)  0.88863(99.84%)  0.65357(100.19%)
2    0.999                        0.60868(79.62%)  0.88836(99.81%)  0.65300(100.11%)
3    0.99997                      0.60961(79.74%)  0.88902(99.88%)  0.65664(100.66%)
4    0.99999                      0.66461(86.94%)  0.88932(99.91%)  0.65876(100.99%)

Inspecting the INT8 calibrated model's output similarity, the det branch's L1 and L2 distances deviate substantially from the float model, so try replacing cosine similarity with a different metric:

hmct-info yolop-384-640_calibrated_model.onnx -c ./calibration_data/images/00.npy

INFO:root:The quantized model output:
=================================================================================
Output          Cosine Similarity  L1 Distance  L2 Distance  Chebyshev Distance
---------------------------------------------------------------------------------
det_out         0.995352           7.633481     289.637665   552.817566
drive_area_seg  0.998973           0.004005     0.001132     0.592610
lane_line_seg   0.999933           0.000417     0.000069     0.564768
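For intuition, the four output-comparison metrics reported above can be reproduced with numpy roughly as follows. This is a sketch under assumptions: whether hmct-info averages or sums the L1/L2 distances is not documented here, and compare_outputs is an illustrative helper, not a tool API.

```python
import numpy as np

# Illustrative re-implementation of the four metrics in the hmct-info
# table above (exact reduction used by the tool is an assumption).
def compare_outputs(float_out, quant_out):
    f, q = float_out.ravel(), quant_out.ravel()
    diff = f - q
    return {
        "cosine-similarity": float(
            np.dot(f, q) / (np.linalg.norm(f) * np.linalg.norm(q))
        ),
        "l1-distance": float(np.mean(np.abs(diff))),      # mean absolute error
        "l2-distance": float(np.mean(diff ** 2)),          # mean squared error
        "chebyshev-distance": float(np.max(np.abs(diff))), # worst-case error
    }

m = compare_outputs(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 3.0]))
# Identical tensors: cosine similarity is 1 and every distance is 0.
```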

When computing node sensitivity with get_sensitivity_of_nodes, mse can be specified as the evaluation metric to better discriminate between nodes:

hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -m mse -v True -s det_mse_debug_result/
===================node sensitivity====================
node           mse
-------------------------------------------------------
Mul_943        164.82712
Mul_647        65.88637
Mul_795        56.86866
Conv_50        2.04226
Div_58         1.88065
Concat_1003    0.87797
Div_49         0.84962
ScatterND_705  0.67858
Slice_645      0.67379
Reshape_706    0.67379
ScatterND_640  0.55187
Mul_584        0.54884
Add_582        0.52263
Conv_41        0.4714
Slice_4        0.38413
Slice_9        0.38413
Slice_14       0.38413
Slice_19       0.38413
Slice_24       0.38413
Slice_29       0.38413
Slice_34       0.38413
Slice_39       0.38413
Concat_40      0.38413
Conv_88        0.35534
Conv_59        0.33164
Conv_92        0.30398
Div_67         0.16826
ScatterND_853  0.16711
Slice_793      0.1627
Reshape_854    0.1627
ScatterND_788  0.13494
Mul_732        0.132
Add_730        0.12381
ScatterND_1001 0.07478
Conv_550       0.07226
Conv_518       0.06468
Conv_546       0.06344
Conv_310       0.05974
Conv_68        0.05735
Conv_338       0.05684
Add_86         0.04319
ScatterND_936  0.0428
Concat_89      0.04201
LeakyRelu_91   0.04201
Concat_517     0.04193
Conv_87        0.04148
Slice_941      0.04148
Reshape_1002   0.04148
Conv_448       0.04034
MaxPool_297    0.03739
MaxPool_298    0.03739
MaxPool_299    0.03739
Concat_300     0.03739
Mul_880        0.03334
Div_100        0.03328
Conv_855       0.03146
Concat_547     0.03094
LeakyRelu_549  0.03094
Add_878        0.03016
Div_558        0.02966
...

Setting operators to high precision down the MSE ranking, accuracy only reaches the target after a large number of INT16 nodes have been added:

No.  MSE threshold (>= set to INT16)  det              da_seg           ll_seg
---  -------------------------------  ---------------  ---------------  ----------------
1    None                             0.61507(80.46%)  0.88863(99.84%)  0.65357(100.19%)
2    0.5                              0.66471(86.95%)  0.88902(99.88%)  0.65664(100.66%)
3    0.2                              0.66447(86.92%)  0.88933(99.92%)  0.66112(101.35%)
4    0.1                              0.72969(95.45%)  0.88931(99.91%)  0.66125(101.37%)
5    0.05                             0.73393(96.00%)  0.88934(99.92%)  0.66121(101.37%)
6    0.04                             0.73321(95.91%)  0.88931(99.91%)  0.66125(101.37%)
7    0.03                             0.75707(99.03%)  0.88944(99.93%)  0.66137(101.39%)
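Note that the selection rule flips direction with an error metric: nodes at or above the MSE threshold are promoted, whereas with cosine similarity it was nodes at or below. A minimal sketch (select_by_mse is an illustrative helper; values are copied from the ranking above):

```python
# Illustrative helper (not an HMCT API): with an error metric such as MSE,
# larger is worse, so nodes at or ABOVE the threshold are promoted to INT16.
def select_by_mse(sensitivity, threshold):
    return [node for node, mse in sensitivity.items() if mse >= threshold]

# First few entries of the mse ranking printed above.
mse_rank = {
    "Mul_943": 164.82712,
    "Mul_647": 65.88637,
    "Mul_795": 56.86866,
    "Conv_50": 2.04226,
    "Div_58": 1.88065,
    "Concat_1003": 0.87797,
}

print(select_by_mse(mse_rank, threshold=0.5))
# All six nodes shown have MSE >= 0.5; a threshold of 2.0 keeps only the first four.
```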

Analyzing the Subgraph Structure

Even with the MSE metric and attention restricted to the det output, setting a large number of sensitive nodes to high precision still does not improve accuracy effectively. Going further, since only the det task misses the target, it is unlikely that the sensitive nodes sit in the da_seg branch, the ll_seg branch, or the shared backbone. Focusing the tuning on the det branch and combining this with an analysis of the model structure, try quantizing the subgraph at the det output in high precision and measure the accuracy:

quant_config = {
    "subgraph_config": {
        "det_head": {
            "inputs": ["Conv_559", "Conv_707", "Conv_855"],
            "outputs": ["Concat_1003"],
            "qtype": "int16",
        },
    }
}

Model                 Float    March   Samples  calibrated_model  Cosine_Similarity
--------------------  -------  ------  -------  ----------------  -----------------
yolop-384-640_det     0.76448  nash-e  10000    0.76275(99.77%)   0.99991
yolop-384-640_da_seg  0.89008  nash-e  10000    0.88863(99.84%)   0.99991
yolop-384-640_ll_seg  0.6523   nash-e  10000    0.65357(100.19%)  0.99991
Note

When the sensitive-node ranking is inaccurate, configuring a subgraph to high precision helps narrow down the source of the loss. If the high-precision subgraph increases latency significantly, run sensitivity analysis within the subgraph to reduce the proportion of high-precision operators.

For the complete accuracy tuning and deployment walkthrough, see: YoloP accuracy tuning deployment example

Quantization Loss Compensation

Constrained by hardware and inference latency, some nodes remain INT8-quantized even when full-INT16 quantization is configured, including: Conv and ConvTranspose weights, Resize, GridSample, and the second input of MatMul. PTQ can compensate for the accuracy loss caused by this INT8 quantization by introducing an identical operator, further improving accuracy while keeping the model fully on the BPU. The Lane model is used below to illustrate this process.
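One way to picture this compensation is as a two-term approximation: a first INT8 quantization plus a second INT8 quantization of the residual, which together recover most of the precision a single INT8 tensor loses. The numpy sketch below only illustrates that idea under this assumption; it is not the HMCT implementation.

```python
import numpy as np

# Illustrative sketch (not the HMCT implementation): approximate a weight
# tensor with one INT8 quantization plus a second INT8 quantization of the
# residual, so the pair reconstructs the float values far more closely
# than a single INT8 tensor can.

def quantize_int8(x):
    """Symmetric per-tensor INT8 fake-quantization."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)

w_int8 = quantize_int8(w)                  # single INT8 pass
residual = w - w_int8                      # what INT8 lost
w_comp = w_int8 + quantize_int8(residual)  # second INT8 pass on the residual

err_int8 = np.abs(w - w_int8).max()
err_comp = np.abs(w - w_comp).max()
# The compensated reconstruction is markedly closer to the float weights.
```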

With HMCT default INT8 quantization and the max_asy_perchannel calibration algorithm, calibration accuracy does not meet the target (the average similarity of every output is below 0.99):

+------------+-------------------+-----------+----------+----------+
| Output     | Metric            | Min       | Max      | Avg      |
+------------+-------------------+-----------+----------+----------+
| mask       | cosine-similarity | 0.575118  | 0.956694 | 0.875484 |
| field      | cosine-similarity | 0.653135  | 0.948818 | 0.883109 |
| attr       | cosine-similarity | 0.456388  | 0.986300 | 0.906577 |
| background | cosine-similarity | 0.878089  | 0.997221 | 0.979943 |
| cls        | cosine-similarity | 0.444225  | 0.996364 | 0.958695 |
| box        | cosine-similarity | 0.209923  | 0.993850 | 0.941041 |
| cls_sl     | cosine-similarity | 0.353770  | 0.998059 | 0.956674 |
| box_sl     | cosine-similarity | 0.196419  | 0.998049 | 0.948152 |
| occlusion  | cosine-similarity | 0.786355  | 0.978297 | 0.939203 |
| cls_arrow  | cosine-similarity | 0.079772  | 0.999830 | 0.940612 |
| box_arrow  | cosine-similarity | -0.231095 | 0.998620 | 0.850558 |
+------------+-------------------+-----------+----------+----------+

Full-INT16 Accuracy

First set all_node_type to int16 with the max calibration algorithm. Calibration accuracy still does not meet the requirement (the average similarity of the occlusion and box_arrow outputs falls short of 0.99), so accuracy needs to be improved further:

quant_config = {"model_config": {"all_node_type": "int16"}}

+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.926001 | 0.998617 | 0.992738 |
| field      | cosine-similarity | 0.945007 | 0.999313 | 0.993549 |
| attr       | cosine-similarity | 0.871824 | 0.999821 | 0.996161 |
| background | cosine-similarity | 0.981510 | 0.999835 | 0.998274 |
| cls        | cosine-similarity | 0.918296 | 0.999851 | 0.997182 |
| box        | cosine-similarity | 0.911032 | 0.999134 | 0.996155 |
| cls_sl     | cosine-similarity | 0.933632 | 0.999918 | 0.997105 |
| box_sl     | cosine-similarity | 0.850244 | 0.998877 | 0.996493 |
| occlusion  | cosine-similarity | 0.943404 | 0.993528 | 0.983970 |
| cls_arrow  | cosine-similarity | 0.560625 | 0.999993 | 0.994583 |
| box_arrow  | cosine-similarity | 0.755858 | 0.999889 | 0.987496 |
+------------+-------------------+----------+----------+----------+

INT16 Upper-Bound Accuracy

Because INT8-quantized nodes remain even after all_node_type is set to int16, the IR interface provided by HMCT can be used to change the data type of every calibration node in the calibrated model to INT16, producing a "true INT16" model:

from hmct.ir import load_model, save_model

model = load_model("lane_calibrated_model_int16.onnx")
calibration_nodes = model.graph.type2nodes["HzCalibration"]
for node in calibration_nodes:
    node.qtype = "int16"
save_model(model, "lane_calibrated_model_real_int16.onnx")

Verifying the true INT16 calibrated model shows that the average similarity on every output meets the requirement, so this model can be tuned by compensating the quantization error.

+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.999819 | 0.999995 | 0.999984 |
| field      | cosine-similarity | 0.999833 | 0.999997 | 0.999985 |
| attr       | cosine-similarity | 0.999743 | 0.999999 | 0.999992 |
| background | cosine-similarity | 0.999977 | 0.999999 | 0.999996 |
| cls        | cosine-similarity | 0.999852 | 0.999999 | 0.999994 |
| box        | cosine-similarity | 0.999681 | 0.999999 | 0.999990 |
| cls_sl     | cosine-similarity | 0.999722 | 1.000000 | 0.999993 |
| box_sl     | cosine-similarity | 0.999529 | 1.000000 | 0.999992 |
| occlusion  | cosine-similarity | 0.999844 | 0.999996 | 0.999978 |
| cls_arrow  | cosine-similarity | 0.998314 | 1.000000 | 0.999985 |
| box_arrow  | cosine-similarity | 0.999478 | 1.000000 | 0.999971 |
+------------+-------------------+----------+----------+----------+

Compensating the Quantization Loss

The compensation analysis must be based on the full-INT16 calibrated model. Print per-node quantization sensitivity via get_sensitivity_of_nodes:

hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_int16.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./int16_debug_result
=========================================node sensitivity=========================================
node      cosine-similarity  mre      mse      sqnr      chebyshev
--------------------------------------------------------------------------------------------------
Conv_360  0.98855            0.07025  0.61746  7.52018   4.78374
Conv_3    0.99895            0.2379   79.3955  12.55955  134.92183
Conv_338  0.99896            0.88695  0.00029  13.35992  1.54686
Conv_336  0.9994             0.04776  0.0002   14.62035  1.03647
...

Next, following the node sensitivity ranking, modify the calibrated model to raise quantization precision from INT8 to INT16 until occlusion and box_arrow meet the accuracy requirement:

from hmct.common import find_input_calibration, find_output_calibration
from hmct.ir import load_model, save_model

model = load_model("lane_calibrated_model_int16.onnx")
improved_nodes = ["Conv_360", "Conv_3", "Conv_338"]
for node in model.graph.nodes:
    if node.name not in improved_nodes:
        continue
    if node.op_type in ["Conv", "ConvTranspose", "MatMul"]:
        input1_calib = find_input_calibration(node, 1)
        if input1_calib and input1_calib.tensor_type == "weight":
            input1_calib.qtype = "int16"
    if node.op_type == "Resize":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "nearest")
        # In nearest mode, compensation can raise the output quantization
        # type close to int16; in other modes only the input can be raised.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
    if node.op_type == "GridSample":
        input_calib = find_input_calibration(node, 0)
        if input_calib and input_calib.tensor_type == "feature":
            input_calib.qtype = "int16"
        interpolation_mode = node.attributes.get("mode", "bilinear")
        # In nearest mode, compensation can raise the output quantization
        # type close to int16; in other modes only the input can be raised.
        if interpolation_mode == "nearest":
            output_calib = find_output_calibration(node)
            if output_calib and output_calib.tensor_type == "feature":
                output_calib.qtype = "int16"
save_model(model, "lane_calibrated_model_int16_improved.onnx")
No.  Threshold (<= set to INT16)  occlusion Min  occlusion Avg  box_arrow Min  box_arrow Avg
---  ---------------------------  -------------  -------------  -------------  -------------
1    None                         0.943404       0.983970       0.755858       0.987496
2    0.999                        0.983739       0.997729       0.893116       0.994958
3    0.99                         0.952758       0.994116       0.745781       0.987434

From the table above, raising the weight quantization precision of Conv_360, Conv_3, and Conv_338 from INT8 to INT16 brings the similarity of every output up to the target. For full-BPU deployment, HMCT achieves this by introducing an identical operator that compensates the loss caused by INT8 quantization, lifting accuracy close to INT16:

quant_config = {
    "model_config": {
        "all_node_type": "int16",
        "activation": {"calibration_type": "max"},
    },
    "node_config": {
        # Conv, ConvTranspose and MatMul compensate the weight quantization
        # loss by setting input1 to "ec"
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"input1": "ec"},
        "Conv_338": {"input1": "ec"},
        # GridSample and Resize compensate the input quantization loss by
        # setting input0 to "ec"
        # "GridSample_340": {"input0": "ec"},
    }
}

+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.983658 | 0.999305 | 0.997363 |
| field      | cosine-similarity | 0.972155 | 0.999655 | 0.996506 |
| attr       | cosine-similarity | 0.977372 | 0.999879 | 0.998771 |
| background | cosine-similarity | 0.994089 | 0.999934 | 0.999493 |
| cls        | cosine-similarity | 0.984845 | 0.999909 | 0.999082 |
| box        | cosine-similarity | 0.977550 | 0.999447 | 0.998403 |
| cls_sl     | cosine-similarity | 0.980353 | 0.999958 | 0.998956 |
| box_sl     | cosine-similarity | 0.979305 | 0.999973 | 0.999332 |
| occlusion  | cosine-similarity | 0.982567 | 0.999608 | 0.997672 |
| cls_arrow  | cosine-similarity | 0.904646 | 0.999996 | 0.998248 |
| box_arrow  | cosine-similarity | 0.890971 | 0.999973 | 0.994999 |
+------------+-------------------+----------+----------+----------+
Note

Resize and GridSample are recommended to use nearest sampling. In that mode the operator output introduces no new values and the error can be fully compensated; otherwise, INT8 quantization of the output introduces additional loss that cannot be compensated.
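The reasoning behind this recommendation can be seen numerically: nearest-neighbor resampling only repeats values that already exist in the input, so its output stays within the input's calibration range, while bilinear interpolation creates new in-between values. A small numpy illustration (not tied to any HMCT API; the resampling formulas below are textbook 1-D versions, not the exact Resize kernel):

```python
import numpy as np

# Nearest-neighbor resampling of x to 6 samples: pure index lookup,
# so every output value already appears in x.
x = np.array([0.0, 1.0, 4.0])
t = np.linspace(0, len(x) - 1, 6)          # sample positions in x's coordinates
nearest = x[np.round(t).astype(int)]

# Linear (bilinear in 2-D) resampling at the same positions: a weighted
# blend of neighbors, producing values such as 0.4 that are not in x.
i = np.clip(t.astype(int), 0, len(x) - 2)  # left neighbor index
frac = t - i                               # blend weight toward the right neighbor
bilinear = x[i] * (1 - frac) + x[i + 1] * frac
```

Since `nearest` is a subset of x's values, an INT8 output quantizer for it needs no range beyond the input's, and the compensation applied to the input carries through; `bilinear`'s new values make the output quantization an independent source of loss.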

Mixed-Precision Debugging

With the weight quantization loss of Conv_360, Conv_3, and Conv_338 compensated, the full-INT16 calibrated model meets the accuracy target, so tuning can start from the error-compensated INT8 calibrated model, whose accuracy is as follows:

quant_config = {
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"input1": "ec"},
        "Conv_338": {"input1": "ec"},
    }
}

+------------+-------------------+-----------+----------+----------+
| Output     | Metric            | Min       | Max      | Avg      |
+------------+-------------------+-----------+----------+----------+
| mask       | cosine-similarity | 0.578707  | 0.950058 | 0.874048 |
| field      | cosine-similarity | 0.687287  | 0.946366 | 0.875366 |
| attr       | cosine-similarity | 0.471613  | 0.986946 | 0.908879 |
| background | cosine-similarity | 0.851624  | 0.996991 | 0.976282 |
| cls        | cosine-similarity | 0.536348  | 0.996753 | 0.959749 |
| box        | cosine-similarity | 0.094459  | 0.994883 | 0.939461 |
| cls_sl     | cosine-similarity | 0.374808  | 0.998186 | 0.959271 |
| box_sl     | cosine-similarity | 0.079629  | 0.998462 | 0.947069 |
| occlusion  | cosine-similarity | 0.702038  | 0.986074 | 0.945837 |
| cls_arrow  | cosine-similarity | 0.060614  | 0.999781 | 0.942194 |
| box_arrow  | cosine-similarity | -0.301507 | 0.998179 | 0.829580 |
+------------+-------------------+-----------+----------+----------+

Print per-node quantization sensitivity via get_sensitivity_of_nodes:

hmct-debugger get-sensitivity-of-nodes lane_calibrated_model.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./debug_result
===========================================node sensitivity===========================================
node                                  cosine-similarity  mre       mse          sqnr      chebyshev
------------------------------------------------------------------------------------------------------
Conv_265                              0.43427            12.56779  32.42019     0.37585   24.77806
Conv_278                              0.84973            0.80948   21.66625     2.71994   16.2646
Conv_287                              0.87352            2.34926   817.71234    0.42356   183.87369
Conv_237                              0.96676            1.17564   12526.42871  3.45996   1538.01672
Conv_267                              0.96678            2.02166   4.81972      4.51482   14.91702
UNIT_CONV_FOR_BatchNormalization_141  0.96682            1.17458   12521.30957  3.4652    1537.55347
Conv_276                              0.97024            0.63814   10.79584     4.23258   13.53813
Conv_289                              0.97159            0.61387   60.08849     6.0926    61.38558
Conv_336                              0.97212            1.70509   0.00951      6.28411   1.07831
Conv_135                              0.97478            0.93508   11459.125    3.94841   1468.99048
Add_140                               0.97482            0.93514   11391.92676  3.95557   1464.76538
Conv_404                              0.97672            2.96196   0.20074      6.37718   5.93086
Conv_3                                0.97977            0.49545   27730.66992  2.96076   2199.58203
Conv_3_split_low                      0.9798             0.49563   27710.51758  2.96429   2198.6333
Conv_129                              0.97991            0.5559    14292.41602  4.12395   1598.18835
Add_134                               0.97991            0.55644   14300.95312  4.12708   1598.60938
Conv_338                              0.98833            6.00584   0.0032       8.16733   0.40305
Conv_338_split_low                    0.98833            6.00581   0.0032       8.16729   0.4033
Conv_333                              0.99496            5.13682   0.00184      10.63743  0.19728
Conv_107                              0.99543            0.23465   2859.23389   7.1595    751.46326
Concat_106                            0.99546            0.23441   2843.88184   7.17062   749.37109
Conv_339                              0.99564            0.25485   0.17183      10.29787  8.05724
Conv_285                              0.99576            0.21163   9.77346      10.03632  38.60489
Conv_335                              0.99779            0.1337    0.44365      11.66653  2.3832
Conv_401                              0.9978             2.66814   0.01664      11.78462  1.694
Conv_337                              0.99871            0.1687    1.17969      12.91167  5.95476
Conv_250                              0.99894            0.17839   0.70945      14.60334  11.99677
Conv_272                              0.99917            0.06468   0.15378      13.46442  7.07149
Conv_384                              0.99919            0.61947   4.9547       17.0275   71.32216
Conv_83                               0.99921            0.11831   328.16125    11.67288  266.15958
Conv_8                                0.99925            0.06326   409.16638    11.83647  295.1091
Add_249                               0.99934            0.2801    0.5793       16.36763  12.01501
Conv_300                              0.9995             0.01909   0.31657      14.86126  5.5387
Slice_299                             0.99951            0.01896   0.30647      14.9317   5.27295
...

Working down the cosine-similarity ranking, progressively set operators to INT16 quantization; the calibrated model's similarity increases accordingly:

No.  Threshold  mask      field     attr      background  cls       box       cls_sl    box_sl    occlusion  cls_arrow  box_arrow
---  ---------  --------  --------  --------  ----------  --------  --------  --------  --------  ---------  ---------  ---------
1    None       0.874048  0.875366  0.908879  0.976282    0.959749  0.939461  0.959271  0.947069  0.945837   0.942194   0.829580
2    0.99       0.980708  0.987483  0.989023  0.993368    0.991154  0.985205  0.990900  0.990375  0.985721   0.975350   0.963180
3    0.999      0.988858  0.990837  0.994669  0.995994    0.995570  0.994660  0.995466  0.996202  0.991201   0.979100   0.980218
4    0.9995     0.991085  0.991593  0.995369  0.997524    0.995818  0.995149  0.996001  0.996851  0.992633   0.981471   0.982875

After the tuning shown in the table above, with the sensitive nodes at or below a threshold of 0.9995 set to INT16, every output except cls_arrow and box_arrow reaches an average similarity of at least 0.99. Since cls_arrow and box_arrow share the same branch, try starting from the 0.9995-threshold INT16 calibrated model and additionally configuring the arrow output-head subgraph as INT16. The quantization config and output similarity:

{
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"qtype": "int16", "input1": "ec"},
        "Conv_338": {"qtype": "int16", "input1": "ec"},
        # 0.99
        "Conv_265": {"qtype": "int16"},
        "Conv_278": {"qtype": "int16"},
        "Conv_287": {"qtype": "int16"},
        "Conv_237": {"qtype": "int16"},
        "Conv_267": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
        "Conv_276": {"qtype": "int16"},
        "Conv_289": {"qtype": "int16"},
        "Conv_336": {"qtype": "int16"},
        "Conv_135": {"qtype": "int16"},
        "Add_140": {"qtype": "int16"},
        "Conv_404": {"qtype": "int16"},
        "Conv_3_split_low": {"qtype": "int16"},
        "Conv_129": {"qtype": "int16"},
        "Add_134": {"qtype": "int16"},
        "Conv_338_split_low": {"qtype": "int16"},
        # 0.999
        "Conv_333": {"qtype": "int16"},
        "Conv_107": {"qtype": "int16"},
        "Concat_106": {"qtype": "int16"},
        "Conv_339": {"qtype": "int16"},
        "Conv_285": {"qtype": "int16"},
        "Conv_335": {"qtype": "int16"},
        "Conv_401": {"qtype": "int16"},
        "Conv_337": {"qtype": "int16"},
        "Conv_250": {"qtype": "int16"},
        # 0.9995
        "Conv_272": {"qtype": "int16"},
        "Conv_384": {"qtype": "int16"},
        "Conv_83": {"qtype": "int16"},
        "Conv_8": {"qtype": "int16"},
        "Add_249": {"qtype": "int16"},
        "Conv_300": {"qtype": "int16"},
    },
    "subgraph_config": {
        "arrow_head": {
            "inputs": ["Reshape_390"],
            "outputs": ["Conv_403", "Conv_404"],
            "qtype": "int16",
        }
    }
}

+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.926363 | 0.997833 | 0.991085 |
| field      | cosine-similarity | 0.915524 | 0.999179 | 0.991593 |
| attr       | cosine-similarity | 0.869666 | 0.999608 | 0.995369 |
| background | cosine-similarity | 0.983465 | 0.999664 | 0.997524 |
| cls        | cosine-similarity | 0.929948 | 0.999513 | 0.995818 |
| box        | cosine-similarity | 0.890618 | 0.999021 | 0.995149 |
| cls_sl     | cosine-similarity | 0.937240 | 0.999738 | 0.996001 |
| box_sl     | cosine-similarity | 0.880057 | 0.999896 | 0.996851 |
| occlusion  | cosine-similarity | 0.966050 | 0.998043 | 0.992633 |
| cls_arrow  | cosine-similarity | 0.380447 | 0.999980 | 0.990650 |
| box_arrow  | cosine-similarity | 0.556044 | 0.999856 | 0.983423 |
+------------+-------------------+----------+----------+----------+

Only the box_arrow output's average similarity still misses the target, so re-run the sensitivity ranking against the box_arrow output alone:

hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_box.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -o Conv_404 -v True -s ./box_debug_result
========================================node sensitivity========================================
node                                  cosine-similarity  mre      mse      sqnr      chebyshev
------------------------------------------------------------------------------------------------
Mul_116                               0.9987             0.35957  0.03655  10.08165  18.38877
Conv_239                              0.99871            0.38099  0.01399  12.16738  6.04099
UNIT_CONV_FOR_BatchNormalization_161  0.99871            0.3832   0.01404  12.15968  6.11046
Conv_10                               0.99879            0.12717  0.04052  9.85789   19.9839
GridSample_340                        0.99887            0.34926  0.02639  10.78928  15.52824
Conv_78                               0.9989             0.33779  0.02165  11.21884  12.89857
Conv_163                              0.9989             0.16412  0.03985  9.89421   20.19754
Relu_4                                0.99902            0.39383  0.05239  9.29984   22.57018
Add_168                               0.99902            0.16685  0.04009  9.88115   20.26648
Conv_8                                0.99921            0.10613  0.04032  9.86872   19.64041
Conv_5                                0.99932            0.37324  0.03799  9.99791   19.23952
Conv_57                               0.9996             0.11046  0.01485  12.03829  11.99651
...

Working down the cosine-similarity ranking, progressively set operators to INT16 quantization until the box_arrow output similarity meets the requirement:

No.  Threshold  mask      field     attr      background  cls       box       cls_sl    box_sl    occlusion  cls_arrow  box_arrow
---  ---------  --------  --------  --------  ----------  --------  --------  --------  --------  ---------  ---------  ---------
1    None       0.991085  0.991593  0.995369  0.997524    0.995818  0.995149  0.996001  0.996851  0.992633   0.990650   0.983423
2    0.999      0.993978  0.993429  0.996853  0.998160    0.996915  0.996035  0.997124  0.997249  0.994457   0.992993   0.989119
3    0.9995     0.995126  0.994426  0.997494  0.998554    0.997245  0.996941  0.997775  0.998297  0.995518   0.995444   0.990272

Finally, with some of the sensitive nodes set to INT16, the average similarity of every model output meets the requirement. The quantization config and output similarity are as follows:

{
    "model_config": {
        "activation": {
            "calibration_type": "max",
            "per_channel": True,
            "asymmetric": True,
        },
    },
    "node_config": {
        "Conv_360": {"input1": "ec"},
        "Conv_3": {"qtype": "int16", "input1": "ec"},
        "Conv_338": {"qtype": "int16", "input1": "ec"},
        # 0.99
        "Conv_265": {"qtype": "int16"},
        "Conv_278": {"qtype": "int16"},
        "Conv_287": {"qtype": "int16"},
        "Conv_237": {"qtype": "int16"},
        "Conv_267": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
        "Conv_276": {"qtype": "int16"},
        "Conv_289": {"qtype": "int16"},
        "Conv_336": {"qtype": "int16"},
        "Conv_135": {"qtype": "int16"},
        "Add_140": {"qtype": "int16"},
        "Conv_404": {"qtype": "int16"},
        "Conv_3_split_low": {"qtype": "int16"},
        "Conv_129": {"qtype": "int16"},
        "Add_134": {"qtype": "int16"},
        "Conv_338_split_low": {"qtype": "int16"},
        # 0.999
        "Conv_333": {"qtype": "int16"},
        "Conv_107": {"qtype": "int16"},
        "Concat_106": {"qtype": "int16"},
        "Conv_339": {"qtype": "int16"},
        "Conv_285": {"qtype": "int16"},
        "Conv_335": {"qtype": "int16"},
        "Conv_401": {"qtype": "int16"},
        "Conv_337": {"qtype": "int16"},
        "Conv_250": {"qtype": "int16"},
        # 0.9995
        "Conv_272": {"qtype": "int16"},
        "Conv_384": {"qtype": "int16"},
        "Conv_83": {"qtype": "int16"},
        "Conv_8": {"qtype": "int16"},
        "Add_249": {"qtype": "int16"},
        "Conv_300": {"qtype": "int16"},
        # box_arrow 0.999
        "Mul_116": {"qtype": "int16"},
        "Conv_239": {"qtype": "int16"},
        "UNIT_CONV_FOR_BatchNormalization_161": {"qtype": "int16"},
        "Conv_10": {"qtype": "int16"},
        "GridSample_340": {"input0": "ec"},
        "Conv_78": {"qtype": "int16"},
        "Conv_163": {"qtype": "int16"},
        # box_arrow 0.9995
        "Relu_4": {"qtype": "int16"},
        "Add_168": {"qtype": "int16"},
        "Conv_8": {"qtype": "int16"},
        "Conv_5": {"qtype": "int16"},
    },
    "subgraph_config": {
        "arrow_head": {
            "inputs": ["Reshape_390"],
            "outputs": ["Conv_403", "Conv_404"],
            "qtype": "int16",
        }
    }
}

+------------+-------------------+----------+----------+----------+
| Output     | Metric            | Min      | Max      | Avg      |
+------------+-------------------+----------+----------+----------+
| mask       | cosine-similarity | 0.905676 | 0.998650 | 0.995126 |
| field      | cosine-similarity | 0.966730 | 0.999263 | 0.994426 |
| attr       | cosine-similarity | 0.858142 | 0.999728 | 0.997494 |
| background | cosine-similarity | 0.990916 | 0.999713 | 0.998554 |
| cls        | cosine-similarity | 0.881800 | 0.999640 | 0.997245 |
| box        | cosine-similarity | 0.878442 | 0.999090 | 0.996941 |
| cls_sl     | cosine-similarity | 0.917514 | 0.999845 | 0.997775 |
| box_sl     | cosine-similarity | 0.923411 | 0.999942 | 0.998297 |
| occlusion  | cosine-similarity | 0.972673 | 0.998806 | 0.995518 |
| cls_arrow  | cosine-similarity | 0.678432 | 0.999992 | 0.995444 |
| box_arrow  | cosine-similarity | 0.619935 | 0.999886 | 0.990272 |
+------------+-------------------+----------+----------+----------+

For the complete accuracy tuning and deployment walkthrough, see: Lane accuracy tuning deployment example

Accuracy Tuning Tips

The PTQ tuning loop repeatedly edits high-precision node settings, compiles the model, and verifies accuracy, but a full compilation pass is slow and costly to iterate. To speed this up, an IR interface is provided that lets you modify the quantization parameters of calibrated_model.onnx directly for quick verification.

from hmct.ir import load_model, save_model
from hmct.common import find_input_calibration, find_output_calibration

model = load_model("calibrated_model.onnx")

# Set a specific activation/weight calibration node to a specific data type
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.qtype)       # the node's data type can be read
node.qtype = "float32"  # int8, int16, float16 and float32 are supported

# Set all activation/weight calibration nodes to int16 quantization
calibration_nodes = model.graph.type2nodes["HzCalibration"]
# all activation nodes to int16
for node in calibration_nodes:
    if node.tensor_type == "feature":
        node.qtype = "int16"
# all weight nodes to int16
for node in calibration_nodes:
    if node.tensor_type == "weight":
        node.qtype = "int16"
# all calibration nodes to int16
for node in calibration_nodes:
    node.qtype = "int16"

# Set a single ordinary node to int16
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # an HzCalibration must exist on the input with tensor_type "feature"
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Set a single ordinary node's output to int16
for node in model.graph.nodes:
    if node.name in ["Conv_0"]:
        output_calib = find_output_calibration(node)
        # an HzCalibration must exist on the output with tensor_type "feature"
        if output_calib and output_calib.tensor_type == "feature":
            output_calib.qtype = "int16"

# Set every node of a given op type to int16
for node in model.graph.nodes:
    if node.op_type in ["Conv"]:
        for i in range(len(node.inputs)):
            input_calib = find_input_calibration(node, i)
            # an HzCalibration must exist on the input with tensor_type "feature"
            if input_calib and input_calib.tensor_type == "feature":
                input_calib.qtype = "int16"

# Set a specific activation/weight calibration node's thresholds
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.thresholds)    # the node's thresholds can be read
node.thresholds = [4.23]  # np.array and List[float] are supported

save_model(model, "calibrated_model_modified.onnx")