This chapter takes precision problems encountered in real-world use as examples to walk through the precision-tuning workflow of the PTQ pipeline. Please read the model precision tuning chapter first to learn the relevant theory and tool usage.
Typical precision problems include the following cases:
The precision-debug tool provides an interface for computing per-node quantization sensitivity, which measures how strongly each operator's quantization affects the output. Setting the nodes with high quantization loss to high precision completes the tuning. The HybridNets model is used below to illustrate this process.
With HMCT default INT8 quantization and the percentile calibration algorithm, the calibration accuracy misses the target (det and ll_seg drop by more than 1%):
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- ------------------ -------------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.75562(97.85%) 0.98012
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.89675(99.12%) 0.98012
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.81813(95.83%) 0.98012
First set all_node_type to int16 with the percentile calibration algorithm. The calibration accuracy now meets the requirement, so INT8+INT16 mixed precision can be used to complete the tuning:
quant_config = {"model_config": {"all_node_type": "int16"}}
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- ------------------ -------------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.76866(99.54%) 0.997147
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.90405(99.93%) 0.997147
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.84732(99.25%) 0.997147
Compile an INT8 calibrated model using the percentile calibration algorithm chosen during the all-INT16 run, configure debug_mode as "dump_calibration_data" in the yaml file to save the calibration data, and output the node quantization sensitivity via get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes hybridnets-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
===========================node sensitivity============================
node cosine-similarity
-----------------------------------------------------------------------
/encoder/_blocks.0/_depthwise_conv/Conv 0.98768
/encoder/_swish/Mul 0.99526
/encoder/_blocks.2/_depthwise_conv/Conv 0.99852
/encoder/_blocks.0/Mul 0.99887
/encoder/_blocks.0/GlobalAveragePool 0.99889
/encoder/_blocks.2/_swish/Mul 0.99957
/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv 0.99964
/encoder/_blocks.0/_swish/Mul 0.99969
/encoder/_blocks.2/Mul 0.99979
/encoder/_blocks.2/GlobalAveragePool 0.9998
/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv 0.99983
/encoder/_blocks.2/_swish_1/Mul 0.99984
/bifpn/bifpn.5/p4_downsample/Pad 0.99985
...er/seg_blocks.4/block/block.0/block/block.0/Pad 0.99985
/encoder/_blocks.0/_project_conv/Conv 0.99986
/encoder/_blocks.5/_depthwise_conv/Conv 0.99988
/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv 0.99989
/classifier/conv_list.0/depthwise_conv/conv/Conv 0.99989
..._blocks.4/block/block.0/block/block.0/conv/Conv 0.99989
/regressor/conv_list.0/depthwise_conv/conv/Conv 0.99989
/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv 0.99992
/encoder/_blocks.17/_se_expand/Conv 0.99992
/encoder/_blocks.13/Mul 0.99992
/encoder/_blocks.1/_depthwise_conv/Conv 0.99992
/encoder/_blocks.13/GlobalAveragePool 0.99992
/encoder/_blocks.14/_se_expand/Conv 0.99992
/classifier/header/pointwise_conv/conv/Conv 0.99992
/encoder/_blocks.1/Add 0.99993
/encoder/_blocks.3/Mul 0.99993
/encoder/_blocks.3/GlobalAveragePool 0.99993
/encoder/_blocks.1/_swish/Mul 0.99993
/encoder/_blocks.15/Mul 0.99993
/encoder/_blocks.15/GlobalAveragePool 0.99993
/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv 0.99993
/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv 0.99993
/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv 0.99994
/encoder/_blocks.8/_project_conv/Conv 0.99994
/bifpn/bifpn.5/swish_3/Mul 0.99994
/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv 0.99994
/encoder/_blocks.8/GlobalAveragePool 0.99994
/encoder/_conv_stem/Conv 0.99994
/encoder/_blocks.13/_project_conv/Conv 0.99994
/encoder/_blocks.8/Mul 0.99994
/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv 0.99995
...
Set operators to INT16 quantization step by step, working through the cosine-similarity ranking from lowest to highest; the calibrated model's accuracy rises accordingly until it meets the requirement:
| # | Cosine-similarity threshold (nodes <= threshold set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.75562(97.85%) | 0.89675(99.12%) | 0.81813(95.83%) |
| 2 | 0.999 | 0.76531(99.11%) | 0.90274(99.79%) | 0.83874(98.24%) |
| 3 | 0.9998 | 0.76545(99.12%) | 0.90340(99.86%) | 0.83961(98.34%) |
| 4 | 0.9999 | 0.76613(99.21%) | 0.90420(99.95%) | 0.84216(98.64%) |
| 5 | 0.99992 | 0.76712(99.34%) | 0.90356(99.88%) | 0.84397(98.85%) |
| 6 | 0.99993 | 0.76781(99.43%) | 0.90374(99.90%) | 0.84484(98.95%) |
| 7 | 0.99994 | 0.76811(99.47%) | 0.90344(99.86%) | 0.84528(99.01%) |
As the test table above shows, setting the sensitive nodes whose sensitivity is at most 0.99994 to INT16 brings the calibration accuracy up to the requirement:
quant_config = {
"model_config": {
"activation": {"calibration_type": "max", "max_percentile": 0.99995},
},
"node_config": {
"/encoder/_blocks.0/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.0/Mul": {"qtype": "int16"},
"/encoder/_blocks.0/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.2/_swish/Mul": {"qtype": "int16"},
"/bifpn/bifpn.5/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.0/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/Mul": {"qtype": "int16"},
"/encoder/_blocks.2/GlobalAveragePool": {"qtype": "int16"},
"/bifpn/bifpn.2/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.2/_swish_1/Mul": {"qtype": "int16"},
"/bifpn/bifpn.5/p4_downsample/Pad": {"qtype": "int16"},
"/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/Pad": {"qtype": "int16"},
"/encoder/_blocks.0/_project_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.5/_depthwise_conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.3/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/classifier/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpndecoder/seg_blocks.4/block/block.0/block/block.0/conv/Conv": {"qtype": "int16"},
"/regressor/conv_list.0/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.2/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.17/_se_expand/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/Mul": {"qtype": "int16"},
"/encoder/_blocks.1/_depthwise_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.14/_se_expand/Conv": {"qtype": "int16"},
"/classifier/header/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.1/Add": {"qtype": "int16"},
"/encoder/_blocks.3/Mul": {"qtype": "int16"},
"/encoder/_blocks.3/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_blocks.1/_swish/Mul": {"qtype": "int16"},
"/encoder/_blocks.15/Mul": {"qtype": "int16"},
"/encoder/_blocks.15/GlobalAveragePool": {"qtype": "int16"},
"/bifpn/bifpn.4/conv3_up/depthwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.1/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.4/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/_project_conv/Conv": {"qtype": "int16"},
"/bifpn/bifpn.5/swish_3/Mul": {"qtype": "int16"},
"/bifpn/bifpn.3/conv3_up/pointwise_conv/conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/GlobalAveragePool": {"qtype": "int16"},
"/encoder/_conv_stem/Conv": {"qtype": "int16"},
"/encoder/_blocks.13/_project_conv/Conv": {"qtype": "int16"},
"/encoder/_blocks.8/Mul": {"qtype": "int16"},
},
}
Model Float March Samples calibrated_model Cosine_Similarity
------------------------- ------- ------- --------- -------------------- -----------------
hybridnets-384-640_det 0.77222 nash-e 10000 0.76811(99.47%) 0.994576
hybridnets-384-640_da_seg 0.90467 nash-e 10000 0.90344(99.86%) 0.994576
hybridnets-384-640_ll_seg 0.85376 nash-e 10000 0.84528(99.01%) 0.994576
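The node_config above was written out by hand from the sensitivity listing. It can equivalently be generated from a threshold; the sketch below assumes the get_sensitivity_of_nodes results have been parsed into a plain dict (the node names and values here are copied from the listing above, truncated for brevity):

```python
# Hypothetical parsed sensitivity results (node name -> cosine similarity),
# e.g. taken from the get_sensitivity_of_nodes text output.
sensitivity = {
    "/encoder/_blocks.0/_depthwise_conv/Conv": 0.98768,
    "/encoder/_swish/Mul": 0.99526,
    "/bifpn/bifpn.5/conv3_up/pointwise_conv/conv/Conv": 0.99995,
}

# Every node at or below the chosen threshold is set to INT16.
threshold = 0.99994
quant_config = {
    "node_config": {
        name: {"qtype": "int16"}
        for name, cos in sensitivity.items()
        if cos <= threshold
    }
}
```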
The complete precision-tuning deployment example is available at: HybridNets precision-tuning deployment example.
When setting sensitive nodes to high precision with the precision-debug tool fails to improve model accuracy effectively, first try specifying the output node so that irrelevant nodes are filtered out. Next, inspect the model's output error and switch to a different evaluation metric to improve the correlation between the sensitivity ranking and accuracy. Going further, analyze the model structure and set typical substructures with high quantization-loss risk (model outputs, inputs, and structures with specific physical meaning) to high precision to complete the tuning. The YoloP model is used below to illustrate this process.
With HMCT default INT8 quantization and the percentile calibration algorithm, the calibration accuracy misses the target (det drops by more than 1%):
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -------------------
yolop-384-640_det 0.76448 nash-e 10000 0.61507(80.46%) 0.999891
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88863(99.84%) 0.999891
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.65357(100.19%) 0.999891
First set all_node_type to int16 with the percentile calibration algorithm. The calibration accuracy now meets the requirement, so INT8+INT16 mixed precision can be used to complete the tuning:
quant_config = {"model_config": {"all_node_type": "int16"}}
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -----------------
yolop-384-640_det 0.76448 nash-e 10000 0.75890(99.27%) 0.99999
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88950(99.93%) 0.99999
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.64821(99.37%) 0.99999
Compile an INT8 calibrated model using the percentile calibration algorithm chosen during the all-INT16 run, configure debug_mode as "dump_calibration_data" in the yaml file to save the calibration data, and output the node quantization sensitivity via get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -v True -s ./debug_result
=======================node sensitivity========================
node cosine-similarity
---------------------------------------------------------------
Mul_943 0.99736
Mul_647 0.99894
Mul_795 0.99909
Conv_50 0.99976
Div_49 0.99983
Conv_92 0.99989
Div_58 0.9999
Conv_1119 0.99995
Conv_88 0.99996
Conv_41 0.99996
Conv_59 0.99998
Slice_4 0.99998
Slice_9 0.99998
Slice_14 0.99998
Slice_19 0.99998
Slice_24 0.99998
Slice_29 0.99998
Slice_34 0.99998
Slice_39 0.99998
Concat_40 0.99998
Div_67 0.99998
MaxPool_297 0.99999
MaxPool_298 0.99999
MaxPool_299 0.99999
Concat_300 0.99999
Concat_1003 0.99999
Conv_177 0.99999
Div_296 0.99999
ScatterND_705 0.99999
Slice_645 0.99999
Reshape_706 0.99999
Conv_110 0.99999
Conv_87 0.99999
Concat_89 0.99999
LeakyRelu_91 0.99999
Mul_584 0.99999
ScatterND_640 0.99999
Concat_1105 0.99999
LeakyRelu_1107 0.99999
Conv_1004 0.99999
Add_582 0.99999
Conv_119 0.99999
Resize_1014 0.99999
Conv_1015 0.99999
Conv_1043 0.99999
Conv_266 0.99999
Conv_199 0.99999
Div_100 0.99999
...
Set operators to INT16 quantization step by step following the cosine-similarity ranking; however, even after setting a large number of sensitive nodes to INT16, accuracy still misses the target:
| # | Cosine-similarity threshold (nodes <= threshold set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60956(79.74%) | 0.88911(99.89%) | 0.65925(101.07%) |
| 3 | 0.99996 | 0.60978(79.76%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.99998 | 0.60956(79.73%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.99999 | 0.66426(86.89%) | 0.88958(99.94%) | 0.66065(101.28%) |
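The threshold sweep recorded in the table above can be scripted generically. In the sketch below, `calibrate_and_eval` is a hypothetical placeholder for your own HMCT calibration plus accuracy-evaluation pipeline; it is not an HMCT API, and the sensitivity values are copied from the listing above:

```python
def sweep_thresholds(sensitivity, thresholds, calibrate_and_eval):
    """Build a node_config per threshold and re-evaluate.

    `calibrate_and_eval` is a hypothetical stand-in for a user-supplied
    HMCT calibration + accuracy-evaluation pipeline.
    """
    results = {}
    for th in thresholds:
        node_config = {
            name: {"qtype": "int16"}
            for name, cos in sensitivity.items()
            if cos <= th
        }
        results[th] = calibrate_and_eval({"node_config": node_config})
    return results

# Example with a dummy evaluator that just counts the INT16 nodes.
sens = {"Mul_943": 0.99736, "Conv_50": 0.99997, "Conv_59": 0.99999}
counts = sweep_thresholds(sens, [0.9999, 0.99999],
                          lambda cfg: len(cfg["node_config"]))
```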
Looking at the INT8 calibrated model's accuracy, only the det branch fails to reach 99%. When computing node quantization sensitivity with the get_sensitivity_of_nodes interface, the -o option can specify the node corresponding to the det output, so that only the sensitivity affecting det is computed, making the localization of the precision problem more accurate:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -v True -s det_debug_result/
=======================node sensitivity========================
node cosine-similarity
---------------------------------------------------------------
Mul_943 0.99736
Mul_647 0.99894
Mul_795 0.99909
Conv_50 0.99997
Div_58 0.99997
Div_49 0.99999
Concat_1003 0.99999
ScatterND_705 0.99999
Slice_645 0.99999
Reshape_706 0.99999
Mul_584 0.99999
ScatterND_640 0.99999
Add_582 0.99999
Conv_41 0.99999
Slice_4 0.99999
Slice_9 0.99999
Slice_14 0.99999
Slice_19 0.99999
Slice_24 0.99999
Slice_29 0.99999
Slice_34 0.99999
Slice_39 0.99999
Concat_40 0.99999
Conv_88 0.99999
Conv_59 0.99999
...
Setting operators to INT16 quantization following the cosine-similarity ranking while focusing only on the det output does filter out irrelevant nodes, but the final accuracy still misses the target:
| # | Cosine-similarity threshold (nodes <= threshold set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.9999 | 0.60868(79.62%) | 0.88836(99.81%) | 0.65300(100.11%) |
| 3 | 0.99997 | 0.60961(79.74%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 4 | 0.99999 | 0.66461(86.94%) | 0.88932(99.91%) | 0.65876(100.99%) |
Looking at the INT8 calibrated model's output similarity, the L1 and L2 distances of the det branch deviate heavily from the float model, so try replacing cosine similarity with another metric:
hmct-info yolop-384-640_calibrated_model.onnx -c ./calibration_data/images/00.npy
INFO:root:The quantized model output:
=================================================================================
Output Cosine Similarity L1 Distance L2 Distance Chebyshev Distance
---------------------------------------------------------------------------------
det_out 0.995352 7.633481 289.637665 552.817566
drive_area_seg 0.998973 0.004005 0.001132 0.592610
lane_line_seg 0.999933 0.000417 0.000069 0.564768
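The metrics reported by hmct-info above can be reproduced with common numpy definitions. Note that the exact normalization hmct-info applies (mean vs. sum for L1/L2, etc.) is an assumption here; the sketch only illustrates the standard formulas:

```python
import numpy as np

def output_metrics(float_out, quant_out):
    """Common definitions of the metrics printed by hmct-info; the exact
    normalization the tool uses is an assumption."""
    f = np.asarray(float_out, dtype=np.float64).ravel()
    q = np.asarray(quant_out, dtype=np.float64).ravel()
    diff = f - q
    return {
        # cos(f, q) = <f, q> / (||f|| * ||q||)
        "cosine-similarity": float(f @ q / (np.linalg.norm(f) * np.linalg.norm(q))),
        "l1-distance": float(np.abs(diff).mean()),
        "l2-distance": float(np.sqrt((diff ** 2).sum())),
        "chebyshev-distance": float(np.abs(diff).max()),
    }

# Identical outputs: similarity 1, all distances 0.
m = output_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```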
When computing node sensitivity with the get_sensitivity_of_nodes interface, mse can be specified as the evaluation metric to better separate the nodes:
hmct-debugger get-sensitivity-of-nodes yolop-384-640_calibrated_model.onnx calibration_data/ -n node -o Concat_1003 -m mse -v True -s det_mse_debug_result/
===================node sensitivity====================
node mse
-------------------------------------------------------
Mul_943 164.82712
Mul_647 65.88637
Mul_795 56.86866
Conv_50 2.04226
Div_58 1.88065
Concat_1003 0.87797
Div_49 0.84962
ScatterND_705 0.67858
Slice_645 0.67379
Reshape_706 0.67379
ScatterND_640 0.55187
Mul_584 0.54884
Add_582 0.52263
Conv_41 0.4714
Slice_4 0.38413
Slice_9 0.38413
Slice_14 0.38413
Slice_19 0.38413
Slice_24 0.38413
Slice_29 0.38413
Slice_34 0.38413
Slice_39 0.38413
Concat_40 0.38413
Conv_88 0.35534
Conv_59 0.33164
Conv_92 0.30398
Div_67 0.16826
ScatterND_853 0.16711
Slice_793 0.1627
Reshape_854 0.1627
ScatterND_788 0.13494
Mul_732 0.132
Add_730 0.12381
ScatterND_1001 0.07478
Conv_550 0.07226
Conv_518 0.06468
Conv_546 0.06344
Conv_310 0.05974
Conv_68 0.05735
Conv_338 0.05684
Add_86 0.04319
ScatterND_936 0.0428
Concat_89 0.04201
LeakyRelu_91 0.04201
Concat_517 0.04193
Conv_87 0.04148
Slice_941 0.04148
Reshape_1002 0.04148
Conv_448 0.04034
MaxPool_297 0.03739
MaxPool_298 0.03739
MaxPool_299 0.03739
Concat_300 0.03739
Mul_880 0.03334
Div_100 0.03328
Conv_855 0.03146
Concat_547 0.03094
LeakyRelu_549 0.03094
Add_878 0.03016
Div_558 0.02966
...
Set operators to high precision following the mse ranking from largest to smallest; accuracy only reaches the target after a large number of INT16 nodes have been added:
| # | MSE threshold (nodes >= threshold set to INT16) | det | da_seg | ll_seg |
|---|---|---|---|---|
| 1 | None | 0.61507(80.46%) | 0.88863(99.84%) | 0.65357(100.19%) |
| 2 | 0.5 | 0.66471(86.95%) | 0.88902(99.88%) | 0.65664(100.66%) |
| 3 | 0.2 | 0.66447(86.92%) | 0.88933(99.92%) | 0.66112(101.35%) |
| 4 | 0.1 | 0.72969(95.45%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 5 | 0.05 | 0.73393(96.00%) | 0.88934(99.92%) | 0.66121(101.37%) |
| 6 | 0.04 | 0.73321(95.91%) | 0.88931(99.91%) | 0.66125(101.37%) |
| 7 | 0.03 | 0.75707(99.03%) | 0.88944(99.93%) | 0.66137(101.39%) |
Even with the mse metric and attention restricted to the det output, setting a large number of sensitive nodes to high precision still fails to raise accuracy effectively. Going further, since only the det task misses the target, sensitive nodes are unlikely to lie in the da_seg branch, the ll_seg branch, or the shared backbone, so the tuning focuses on the det branch. Combining this with an analysis of the model structure, try quantizing the subgraph at the det output in high precision and test the accuracy:
quant_config = {
"subgraph_config": {
"det_head": {
"inputs": ["Conv_559", "Conv_707", "Conv_855"],
"outputs": ["Concat_1003"],
"qtype": "int16",
},
}
}
Model Float March Samples calibrated_model Cosine_Similarity
-------------------- ------- ------- --------- ------------------ -------------------
yolop-384-640_det 0.76448 nash-e 10000 0.76275(99.77%) 0.99991
yolop-384-640_da_seg 0.89008 nash-e 10000 0.88863(99.84%) 0.99991
yolop-384-640_ll_seg 0.6523 nash-e 10000 0.65357(100.19%) 0.99991
When the sensitive-node ranking is inaccurate, configuring a subgraph in high precision gives an initial determination of where the loss originates. If the high-precision subgraph increases latency significantly, sensitivity analysis can then be run inside the subgraph to reduce the proportion of high-precision operators.
The complete precision-tuning deployment example is available at: YoloP precision-tuning deployment example.
Constrained by the hardware and by inference latency, INT8-quantized nodes remain in the model even when full INT16 quantization is configured, including the weights of Conv and ConvTranspose, the second input of MatMul, and the Resize and GridSample operators. PTQ can compensate the precision loss caused by this INT8 quantization by introducing an identical operator, further improving the accuracy of the fully BPU-quantized model and completing the tuning. The Lane model is used below to illustrate this process.
With HMCT default INT8 quantization and the max_asy_perchannel calibration algorithm, the calibration accuracy misses the target (the average similarity of every output is below 0.99):
+------------+-------------------+-----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+-----------+----------+----------+
| mask | cosine-similarity | 0.575118 | 0.956694 | 0.875484 |
| field | cosine-similarity | 0.653135 | 0.948818 | 0.883109 |
| attr | cosine-similarity | 0.456388 | 0.986300 | 0.906577 |
| background | cosine-similarity | 0.878089 | 0.997221 | 0.979943 |
| cls | cosine-similarity | 0.444225 | 0.996364 | 0.958695 |
| box | cosine-similarity | 0.209923 | 0.993850 | 0.941041 |
| cls_sl | cosine-similarity | 0.353770 | 0.998059 | 0.956674 |
| box_sl | cosine-similarity | 0.196419 | 0.998049 | 0.948152 |
| occlusion | cosine-similarity | 0.786355 | 0.978297 | 0.939203 |
| cls_arrow | cosine-similarity | 0.079772 | 0.999830 | 0.940612 |
| box_arrow | cosine-similarity | -0.231095 | 0.998620 | 0.850558 |
+------------+-------------------+-----------+----------+----------+
First set all_node_type to int16 with the max calibration algorithm. The calibration accuracy still falls short (the average similarity of the occlusion and box_arrow outputs is below 0.99), so further improvement is needed:
quant_config = {"model_config": {"all_node_type": "int16"}}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.926001 | 0.998617 | 0.992738 |
| field | cosine-similarity | 0.945007 | 0.999313 | 0.993549 |
| attr | cosine-similarity | 0.871824 | 0.999821 | 0.996161 |
| background | cosine-similarity | 0.981510 | 0.999835 | 0.998274 |
| cls | cosine-similarity | 0.918296 | 0.999851 | 0.997182 |
| box | cosine-similarity | 0.911032 | 0.999134 | 0.996155 |
| cls_sl | cosine-similarity | 0.933632 | 0.999918 | 0.997105 |
| box_sl | cosine-similarity | 0.850244 | 0.998877 | 0.996493 |
| occlusion | cosine-similarity | 0.943404 | 0.993528 | 0.983970 |
| cls_arrow | cosine-similarity | 0.560625 | 0.999993 | 0.994583 |
| box_arrow | cosine-similarity | 0.755858 | 0.999889 | 0.987496 |
+------------+-------------------+----------+----------+----------+
Because INT8-quantized nodes remain in the model even after all_node_type is set to int16, the IR interface provided by HMCT can be used to change the data type of every calibration node in the calibrated model to int16, yielding a genuinely all-INT16 model:
from hmct.ir import load_model, save_model
model = load_model("lane_calibrated_model_int16.onnx")
calibration_nodes = model.graph.type2nodes["HzCalibration"]
for node in calibration_nodes:
node.qtype = "int16"
save_model(model, "lane_calibrated_model_real_int16.onnx")
Verifying the genuinely all-INT16 calibrated model shows that the average similarity of every output meets the requirement, so this model can be tuned by compensating the quantization error:
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.999819 | 0.999995 | 0.999984 |
| field | cosine-similarity | 0.999833 | 0.999997 | 0.999985 |
| attr | cosine-similarity | 0.999743 | 0.999999 | 0.999992 |
| background | cosine-similarity | 0.999977 | 0.999999 | 0.999996 |
| cls | cosine-similarity | 0.999852 | 0.999999 | 0.999994 |
| box | cosine-similarity | 0.999681 | 0.999999 | 0.999990 |
| cls_sl | cosine-similarity | 0.999722 | 1.000000 | 0.999993 |
| box_sl | cosine-similarity | 0.999529 | 1.000000 | 0.999992 |
| occlusion | cosine-similarity | 0.999844 | 0.999996 | 0.999978 |
| cls_arrow | cosine-similarity | 0.998314 | 1.000000 | 0.999985 |
| box_arrow | cosine-similarity | 0.999478 | 1.000000 | 0.999971 |
+------------+-------------------+----------+----------+----------+
The compensation analysis must be based on the all-INT16 calibrated model; output the node quantization sensitivity via get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_int16.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./int16_debug_result
=========================================node sensitivity=========================================
node cosine-similarity mre mse sqnr chebyshev
--------------------------------------------------------------------------------------------------
Conv_360 0.98855 0.07025 0.61746 7.52018 4.78374
Conv_3 0.99895 0.2379 79.3955 12.55955 134.92183
Conv_338 0.99896 0.88695 0.00029 13.35992 1.54686
Conv_336 0.9994 0.04776 0.0002 14.62035 1.03647
...
Then modify the calibrated model following the sensitivity ranking, raising the quantization precision of nodes from INT8 to INT16, until occlusion and box_arrow meet the accuracy requirement:
from hmct.common import find_input_calibration, find_output_calibration
from hmct.ir import load_model, save_model
model = load_model("lane_calibrated_model_int16.onnx")
improved_nodes = ["Conv_360", "Conv_3", "Conv_338"]
for node in model.graph.nodes:
if node.name not in improved_nodes:
continue
if node.op_type in ["Conv", "ConvTranspose", "MatMul"]:
input1_calib = find_input_calibration(node, 1)
if input1_calib and input1_calib.tensor_type == "weight":
input1_calib.qtype = "int16"
if node.op_type == "Resize":
input_calib = find_input_calibration(node, 0)
if input_calib and input_calib.tensor_type == "feature":
input_calib.qtype = "int16"
interpolation_mode = node.attributes.get("mode", "nearest")
        # In nearest mode, error compensation can raise the output quantization
        # type to near-int16; in other modes only the input quantization type
        # can be raised.
if interpolation_mode == "nearest":
output_calib = find_output_calibration(node)
if output_calib and output_calib.tensor_type == "feature":
output_calib.qtype = "int16"
if node.op_type == "GridSample":
input_calib = find_input_calibration(node, 0)
if input_calib and input_calib.tensor_type == "feature":
input_calib.qtype = "int16"
interpolation_mode = node.attributes.get("mode", "bilinear")
        # In nearest mode, error compensation can raise the output quantization
        # type to near-int16; in other modes only the input quantization type
        # can be raised.
if interpolation_mode == "nearest":
output_calib = find_output_calibration(node)
if output_calib and output_calib.tensor_type == "feature":
output_calib.qtype = "int16"
save_model(model, "lane_calibrated_model_int16_improved.onnx")
| # | Cosine-similarity threshold (nodes <= threshold set to INT16) | occlusion Min | occlusion Avg | box_arrow Min | box_arrow Avg |
|---|---|---|---|---|---|
| 1 | None | 0.943404 | 0.983970 | 0.755858 | 0.987496 |
| 2 | 0.999 | 0.983739 | 0.997729 | 0.893116 | 0.994958 |
| 3 | 0.99 | 0.952758 | 0.994116 | 0.745781 | 0.987434 |
As the table above shows, raising the weight quantization of Conv_360, Conv_3 and Conv_338 from INT8 to INT16 brings all output similarities up to the target. When deploying fully on the BPU, HMCT achieves this by introducing an identical operator that compensates the precision loss caused by INT8 quantization, raising the precision close to INT16:
quant_config = {
"model_config": {
"all_node_type": "int16",
"activation": {"calibration_type": "max"},
},
"node_config": {
        # Conv, ConvTranspose and MatMul compensate weight-quantization loss by setting input1 to "ec"
"Conv_360": {"input1": "ec"},
"Conv_3": {"input1": "ec"},
"Conv_338": {"input1": "ec"},
        # GridSample and Resize compensate input-quantization loss by setting input0 to "ec"
# "GridSample_340": {"input0": "ec"},
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.983658 | 0.999305 | 0.997363 |
| field | cosine-similarity | 0.972155 | 0.999655 | 0.996506 |
| attr | cosine-similarity | 0.977372 | 0.999879 | 0.998771 |
| background | cosine-similarity | 0.994089 | 0.999934 | 0.999493 |
| cls | cosine-similarity | 0.984845 | 0.999909 | 0.999082 |
| box | cosine-similarity | 0.977550 | 0.999447 | 0.998403 |
| cls_sl | cosine-similarity | 0.980353 | 0.999958 | 0.998956 |
| box_sl | cosine-similarity | 0.979305 | 0.999973 | 0.999332 |
| occlusion | cosine-similarity | 0.982567 | 0.999608 | 0.997672 |
| cls_arrow | cosine-similarity | 0.904646 | 0.999996 | 0.998248 |
| box_arrow | cosine-similarity | 0.890971 | 0.999973 | 0.994999 |
+------------+-------------------+----------+----------+----------+
Resize and GridSample are recommended to use the nearest sampling mode: the operator output then introduces no new values and the error can be fully compensated. With other modes, the INT8 quantization of the output introduces additional loss that cannot be compensated.
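This recommendation can be illustrated with a small numpy sketch: nearest-neighbor resampling only copies existing input values, while linear interpolation manufactures new ones that the output quantizer must represent on its own:

```python
import numpy as np

x = np.array([[0.0, 1.0], [2.0, 3.0]], dtype=np.float32)  # 2x2 feature map

# Nearest-neighbor 2x upsampling copies existing values, so the output value
# set is a subset of the input value set: the output quantizer sees nothing
# new and input-side error compensation carries through.
nearest = x.repeat(2, axis=0).repeat(2, axis=1)
assert set(nearest.ravel().tolist()) <= set(x.ravel().tolist())

# Linear interpolation creates values absent from the input (e.g. 0.5 between
# 0 and 1); quantizing such an output to INT8 adds loss that cannot be
# compensated on the input side.
midpoints = (x[:, :-1] + x[:, 1:]) / 2
assert 0.5 in midpoints.ravel().tolist()
```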
After compensating the weight-quantization loss of Conv_360, Conv_3 and Conv_338, the all-INT16 calibrated model meets the accuracy target. Next, tuning starts from the error-compensated INT8 calibrated model, whose accuracy is as follows:
quant_config = {
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"input1": "ec"},
"Conv_338": {"input1": "ec"},
}
}
+------------+-------------------+-----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+-----------+----------+----------+
| mask | cosine-similarity | 0.578707 | 0.950058 | 0.874048 |
| field | cosine-similarity | 0.687287 | 0.946366 | 0.875366 |
| attr | cosine-similarity | 0.471613 | 0.986946 | 0.908879 |
| background | cosine-similarity | 0.851624 | 0.996991 | 0.976282 |
| cls | cosine-similarity | 0.536348 | 0.996753 | 0.959749 |
| box | cosine-similarity | 0.094459 | 0.994883 | 0.939461 |
| cls_sl | cosine-similarity | 0.374808 | 0.998186 | 0.959271 |
| box_sl | cosine-similarity | 0.079629 | 0.998462 | 0.947069 |
| occlusion | cosine-similarity | 0.702038 | 0.986074 | 0.945837 |
| cls_arrow | cosine-similarity | 0.060614 | 0.999781 | 0.942194 |
| box_arrow | cosine-similarity | -0.301507 | 0.998179 | 0.829580 |
+------------+-------------------+-----------+----------+----------+
Output the node quantization sensitivity via get_sensitivity_of_nodes:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -v True -s ./debug_result
===========================================node sensitivity===========================================
node cosine-similarity mre mse sqnr chebyshev
------------------------------------------------------------------------------------------------------
Conv_265 0.43427 12.56779 32.42019 0.37585 24.77806
Conv_278 0.84973 0.80948 21.66625 2.71994 16.2646
Conv_287 0.87352 2.34926 817.71234 0.42356 183.87369
Conv_237 0.96676 1.17564 12526.42871 3.45996 1538.01672
Conv_267 0.96678 2.02166 4.81972 4.51482 14.91702
UNIT_CONV_FOR_BatchNormalization_141 0.96682 1.17458 12521.30957 3.4652 1537.55347
Conv_276 0.97024 0.63814 10.79584 4.23258 13.53813
Conv_289 0.97159 0.61387 60.08849 6.0926 61.38558
Conv_336 0.97212 1.70509 0.00951 6.28411 1.07831
Conv_135 0.97478 0.93508 11459.125 3.94841 1468.99048
Add_140 0.97482 0.93514 11391.92676 3.95557 1464.76538
Conv_404 0.97672 2.96196 0.20074 6.37718 5.93086
Conv_3 0.97977 0.49545 27730.66992 2.96076 2199.58203
Conv_3_split_low 0.9798 0.49563 27710.51758 2.96429 2198.6333
Conv_129 0.97991 0.5559 14292.41602 4.12395 1598.18835
Add_134 0.97991 0.55644 14300.95312 4.12708 1598.60938
Conv_338 0.98833 6.00584 0.0032 8.16733 0.40305
Conv_338_split_low 0.98833 6.00581 0.0032 8.16729 0.4033
Conv_333 0.99496 5.13682 0.00184 10.63743 0.19728
Conv_107 0.99543 0.23465 2859.23389 7.1595 751.46326
Concat_106 0.99546 0.23441 2843.88184 7.17062 749.37109
Conv_339 0.99564 0.25485 0.17183 10.29787 8.05724
Conv_285 0.99576 0.21163 9.77346 10.03632 38.60489
Conv_335 0.99779 0.1337 0.44365 11.66653 2.3832
Conv_401 0.9978 2.66814 0.01664 11.78462 1.694
Conv_337 0.99871 0.1687 1.17969 12.91167 5.95476
Conv_250 0.99894 0.17839 0.70945 14.60334 11.99677
Conv_272 0.99917 0.06468 0.15378 13.46442 7.07149
Conv_384 0.99919 0.61947 4.9547 17.0275 71.32216
Conv_83 0.99921 0.11831 328.16125 11.67288 266.15958
Conv_8 0.99925 0.06326 409.16638 11.83647 295.1091
Add_249 0.99934 0.2801 0.5793 16.36763 12.01501
Conv_300 0.9995 0.01909 0.31657 14.86126 5.5387
Slice_299 0.99951 0.01896 0.30647 14.9317 5.27295
...
Set operators to INT16 quantization step by step following the cosine-similarity ranking; the calibrated model's similarity rises accordingly:
| # | Cosine-similarity threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.874048 | 0.875366 | 0.908879 | 0.976282 | 0.959749 | 0.939461 | 0.959271 | 0.947069 | 0.945837 | 0.942194 | 0.829580 |
| 2 | 0.99 | 0.980708 | 0.987483 | 0.989023 | 0.993368 | 0.991154 | 0.985205 | 0.990900 | 0.990375 | 0.985721 | 0.975350 | 0.963180 |
| 3 | 0.999 | 0.988858 | 0.990837 | 0.994669 | 0.995994 | 0.995570 | 0.994660 | 0.995466 | 0.996202 | 0.991201 | 0.979100 | 0.980218 |
| 4 | 0.9995 | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.981471 | 0.982875 |
After the tuning recorded in the table above, with sensitive nodes whose sensitivity is at most 0.9995 set to INT16, the average similarity of every output except cls_arrow and box_arrow reaches 0.99 or higher. cls_arrow and box_arrow share one branch, so starting from the calibrated model with the 0.9995-threshold nodes set to INT16, additionally configure the arrow output-head subgraph as INT16. The quantization config and output similarities are:
{
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"qtype": "int16", "input1": "ec"},
"Conv_338": {"qtype": "int16", "input1": "ec"},
# 0.99
"Conv_265": {"qtype": "int16"},
"Conv_278": {"qtype": "int16"},
"Conv_287": {"qtype": "int16"},
"Conv_237": {"qtype": "int16"},
"Conv_267": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
"Conv_276": {"qtype": "int16"},
"Conv_289": {"qtype": "int16"},
"Conv_336": {"qtype": "int16"},
"Conv_135": {"qtype": "int16"},
"Add_140": {"qtype": "int16"},
"Conv_404": {"qtype": "int16"},
"Conv_3_split_low": {"qtype": "int16"},
"Conv_129": {"qtype": "int16"},
"Add_134": {"qtype": "int16"},
"Conv_338_split_low": {"qtype": "int16"},
# 0.999
"Conv_333": {"qtype": "int16"},
"Conv_107": {"qtype": "int16"},
"Concat_106": {"qtype": "int16"},
"Conv_339": {"qtype": "int16"},
"Conv_285": {"qtype": "int16"},
"Conv_335": {"qtype": "int16"},
"Conv_401": {"qtype": "int16"},
"Conv_337": {"qtype": "int16"},
"Conv_250": {"qtype": "int16"},
# 0.9995
"Conv_272": {"qtype": "int16"},
"Conv_384": {"qtype": "int16"},
"Conv_83": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Add_249": {"qtype": "int16"},
"Conv_300": {"qtype": "int16"},
},
"subgraph_config": {
"arrow_head": {
"inputs": ["Reshape_390"],
"outputs": ["Conv_403", "Conv_404"],
"qtype": "int16",
}
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.926363 | 0.997833 | 0.991085 |
| field | cosine-similarity | 0.915524 | 0.999179 | 0.991593 |
| attr | cosine-similarity | 0.869666 | 0.999608 | 0.995369 |
| background | cosine-similarity | 0.983465 | 0.999664 | 0.997524 |
| cls | cosine-similarity | 0.929948 | 0.999513 | 0.995818 |
| box | cosine-similarity | 0.890618 | 0.999021 | 0.995149 |
| cls_sl | cosine-similarity | 0.937240 | 0.999738 | 0.996001 |
| box_sl | cosine-similarity | 0.880057 | 0.999896 | 0.996851 |
| occlusion | cosine-similarity | 0.966050 | 0.998043 | 0.992633 |
| cls_arrow | cosine-similarity | 0.380447 | 0.999980 | 0.990650 |
| box_arrow | cosine-similarity | 0.556044 | 0.999856 | 0.983423 |
+------------+-------------------+----------+----------+----------+
Only the box_arrow output's average similarity still misses the target, so re-run the sensitivity ranking with the box_arrow output specified on its own:
hmct-debugger get-sensitivity-of-nodes lane_calibrated_model_box.onnx calibration_data/ -m ['cosine-similarity','mre','mse','sqnr','chebyshev'] -n node -o Conv_404 -v True -s ./box_debug_result
========================================node sensitivity========================================
node cosine-similarity mre mse sqnr chebyshev
------------------------------------------------------------------------------------------------
Mul_116 0.9987 0.35957 0.03655 10.08165 18.38877
Conv_239 0.99871 0.38099 0.01399 12.16738 6.04099
UNIT_CONV_FOR_BatchNormalization_161 0.99871 0.3832 0.01404 12.15968 6.11046
Conv_10 0.99879 0.12717 0.04052 9.85789 19.9839
GridSample_340 0.99887 0.34926 0.02639 10.78928 15.52824
Conv_78 0.9989 0.33779 0.02165 11.21884 12.89857
Conv_163 0.9989 0.16412 0.03985 9.89421 20.19754
Relu_4 0.99902 0.39383 0.05239 9.29984 22.57018
Add_168 0.99902 0.16685 0.04009 9.88115 20.26648
Conv_8 0.99921 0.10613 0.04032 9.86872 19.64041
Conv_5 0.99932 0.37324 0.03799 9.99791 19.23952
Conv_57 0.9996 0.11046 0.01485 12.03829 11.99651
...
Set operators to INT16 quantization step by step following the cosine-similarity ranking until the box_arrow output similarity meets the requirement:
| # | Cosine-similarity threshold | mask | field | attr | background | cls | box | cls_sl | box_sl | occlusion | cls_arrow | box_arrow |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | None | 0.991085 | 0.991593 | 0.995369 | 0.997524 | 0.995818 | 0.995149 | 0.996001 | 0.996851 | 0.992633 | 0.990650 | 0.983423 |
| 2 | 0.999 | 0.993978 | 0.993429 | 0.996853 | 0.998160 | 0.996915 | 0.996035 | 0.997124 | 0.997249 | 0.994457 | 0.992993 | 0.989119 |
| 3 | 0.9995 | 0.995126 | 0.994426 | 0.997494 | 0.998554 | 0.997245 | 0.996941 | 0.997775 | 0.998297 | 0.995518 | 0.995444 | 0.990272 |
Finally, with this set of sensitive nodes set to INT16, the average similarity of every model output meets the requirement. The quantization config and output similarities are:
{
"model_config": {
"activation": {
"calibration_type": "max",
"per_channel": True,
"asymmetric": True,
},
},
"node_config": {
"Conv_360": {"input1": "ec"},
"Conv_3": {"qtype": "int16", "input1": "ec"},
"Conv_338": {"qtype": "int16", "input1": "ec"},
# 0.99
"Conv_265": {"qtype": "int16"},
"Conv_278": {"qtype": "int16"},
"Conv_287": {"qtype": "int16"},
"Conv_237": {"qtype": "int16"},
"Conv_267": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_141": {"qtype": "int16"},
"Conv_276": {"qtype": "int16"},
"Conv_289": {"qtype": "int16"},
"Conv_336": {"qtype": "int16"},
"Conv_135": {"qtype": "int16"},
"Add_140": {"qtype": "int16"},
"Conv_404": {"qtype": "int16"},
"Conv_3_split_low": {"qtype": "int16"},
"Conv_129": {"qtype": "int16"},
"Add_134": {"qtype": "int16"},
"Conv_338_split_low": {"qtype": "int16"},
# 0.999
"Conv_333": {"qtype": "int16"},
"Conv_107": {"qtype": "int16"},
"Concat_106": {"qtype": "int16"},
"Conv_339": {"qtype": "int16"},
"Conv_285": {"qtype": "int16"},
"Conv_335": {"qtype": "int16"},
"Conv_401": {"qtype": "int16"},
"Conv_337": {"qtype": "int16"},
"Conv_250": {"qtype": "int16"},
# 0.9995
"Conv_272": {"qtype": "int16"},
"Conv_384": {"qtype": "int16"},
"Conv_83": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Add_249": {"qtype": "int16"},
"Conv_300": {"qtype": "int16"},
# box_arrow 0.999
"Mul_116": {"qtype": "int16"},
"Conv_239": {"qtype": "int16"},
"UNIT_CONV_FOR_BatchNormalization_161": {"qtype": "int16"},
"Conv_10": {"qtype": "int16"},
"GridSample_340": {"input0": "ec"},
"Conv_78": {"qtype": "int16"},
"Conv_163": {"qtype": "int16"},
# box_arrow 0.9995
"Relu_4": {"qtype": "int16"},
"Add_168": {"qtype": "int16"},
"Conv_8": {"qtype": "int16"},
"Conv_5": {"qtype": "int16"},
},
"subgraph_config": {
"arrow_head": {
"inputs": ["Reshape_390"],
"outputs": ["Conv_403", "Conv_404"],
"qtype": "int16",
}
}
}
+------------+-------------------+----------+----------+----------+
| Output | Metric | Min | Max | Avg |
+------------+-------------------+----------+----------+----------+
| mask | cosine-similarity | 0.905676 | 0.998650 | 0.995126 |
| field | cosine-similarity | 0.966730 | 0.999263 | 0.994426 |
| attr | cosine-similarity | 0.858142 | 0.999728 | 0.997494 |
| background | cosine-similarity | 0.990916 | 0.999713 | 0.998554 |
| cls | cosine-similarity | 0.881800 | 0.999640 | 0.997245 |
| box | cosine-similarity | 0.878442 | 0.999090 | 0.996941 |
| cls_sl | cosine-similarity | 0.917514 | 0.999845 | 0.997775 |
| box_sl | cosine-similarity | 0.923411 | 0.999942 | 0.998297 |
| occlusion | cosine-similarity | 0.972673 | 0.998806 | 0.995518 |
| cls_arrow | cosine-similarity | 0.678432 | 0.999992 | 0.995444 |
| box_arrow | cosine-similarity | 0.619935 | 0.999886 | 0.990272 |
+------------+-------------------+----------+----------+----------+
The complete precision-tuning deployment example is available at: Lane precision-tuning deployment example.
Tuning in the PTQ pipeline requires repeatedly editing the high-precision node configuration, compiling the model, and validating accuracy, yet the full compilation pipeline is slow and makes iteration costly. For this reason, an IR interface is provided that lets you modify the quantization parameters of calibrated_model.onnx directly for fast verification:
from hmct.ir import load_model, save_model
from hmct.common import find_input_calibration, find_output_calibration
model = load_model("calibrated_model.onnx")
# Change a specific activation/weight calibration node to a specific data type
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.qtype)  # reading a node's data type is supported
node.qtype = "float32"  # int8, int16, float16 and float32 can be configured
# Configure all activation/weight calibration nodes to use int16 quantization
calibration_nodes = model.graph.type2nodes["HzCalibration"]
# Set all activation calibration nodes to int16
for node in calibration_nodes:
if node.tensor_type == "feature":
node.qtype = "int16"
# Set all weight calibration nodes to int16
for node in calibration_nodes:
if node.tensor_type == "weight":
node.qtype = "int16"
# Set all calibration nodes to int16
for node in calibration_nodes:
node.qtype = "int16"
# Set the inputs of one specific ordinary node to int16
for node in model.graph.nodes:
if node.name in ["Conv_0"]:
for i in range(len(node.inputs)):
input_calib = find_input_calibration(node, i)
            # an HzCalibration must be found on the input, with tensor_type "feature"
if input_calib and input_calib.tensor_type == "feature":
input_calib.qtype = "int16"
# Set the output of one specific ordinary node to int16
for node in model.graph.nodes:
if node.name in ["Conv_0"]:
output_calib = find_output_calibration(node)
        # an HzCalibration must be found on the output, with tensor_type "feature"
        if output_calib and output_calib.tensor_type == "feature":
            output_calib.qtype = "int16"
# Set all nodes of a given op type to int16
for node in model.graph.nodes:
if node.op_type in ["Conv"]:
for i in range(len(node.inputs)):
input_calib = find_input_calibration(node, i)
            # an HzCalibration must be found on the input, with tensor_type "feature"
if input_calib and input_calib.tensor_type == "feature":
input_calib.qtype = "int16"
# Change a specific activation/weight calibration node to a specific threshold
node = model.graph.node_mappings["ReduceMax_1317_HzCalibration"]
print(node.thresholds)  # reading a node's thresholds is supported
node.thresholds = [4.23]  # np.array and List[float] are supported
save_model(model, "calibrated_model_modified.onnx")