Model Quantization and Compilation

The Convert Model stage turns the floating-point model into a board-side deployable model, after which you obtain a model that can run on the Horizon computing platform. Before performing the conversion, make sure the model has passed the check described in the Check the Model section.

During conversion, important steps such as model optimization and calibration quantization require data that matches the model's pre-processing requirements. Refer to the Data Preparation - Model Calibration Set Preparation section to prepare the calibration data in advance.

Convert the Model Using the hb_compile Tool

Model conversion is performed with the hb_compile tool. For the tool's usage and its specific configuration and parameters, refer to the Model Quantized Compilation section.

Model Conversion Interpretation

Model conversion takes a floating-point model and produces a board-side deployable model supported by Horizon's computing platform. To make the converted model run quickly and efficiently on the embedded side, the conversion focuses on two areas: input data processing and model optimization and compilation. This section discusses each in turn.

In terms of input data processing, Horizon's edge computing platform provides hardware-level solutions for specific types of input channels, but the output of these solutions may not match the input requirements of your models. For example, the video processing sub-systems for video channels can crop and scale images or optimize image quality, and their output is mostly in the YUV420 format; however, algorithm models are usually trained on commonly-used image formats such as bgr/rgb. To bridge this gap, Horizon provides two input descriptions for each converted model: one describes the original floating-point model input (input_type_train and input_layout_train), while the other describes the input data (input_type_rt) of the edge platform you are going to use.
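These two input descriptions are declared in the hb_compile YAML configuration. The fragment below is only a sketch: input_type_rt, input_type_train, and input_layout_train come from the description above, while the norm_type/mean_value/scale_value key names and all values are illustrative assumptions that should be checked against the Model Quantized Compilation section.

```yaml
# Sketch of the input-related section of a conversion config.
# Invoked along the lines of: hb_compile --config <this file>
# (exact command-line usage is in the Model Quantized Compilation section).
input_parameters:
  input_type_rt: 'nv12'             # data format fed by the edge platform
  input_type_train: 'bgr'           # data format the float model was trained on
  input_layout_train: 'NCHW'        # must match the original float model's layout
  norm_type: 'data_mean_and_scale'  # illustrative key name
  mean_value: '128.0 128.0 128.0'   # illustrative values
  scale_value: '0.0078125'          # illustrative values
```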

For frequently-used image pre-processing such as mean subtraction and scale, edge-platform data formats such as yuv420 are no longer convenient for these operations, so the toolchain integrates these common pre-processing steps into the model. After the above two processes, the input part of the converted model is shown as follows:

There are only two data layouts in the above diagram: NCHW and NHWC, where N denotes batch size, C channel, H height, and W width. The two layouts reflect different memory access characteristics: NHWC is more often used by TensorFlow models, while NCHW is used by Caffe models. Although Horizon's edge platform does not restrict the data layout, one requirement remains: input_layout_train must be consistent with the data layout of the original floating-point model, as specifying the correct data layout is the basis for parsing the data correctly.
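The difference between the two layouts is just an axis permutation. A minimal numpy sketch of converting an NHWC tensor (TensorFlow-style) to NCHW (Caffe-style):

```python
import numpy as np

# One 224x224 RGB image in NHWC layout, as a TensorFlow model would consume it.
nhwc = np.zeros((1, 224, 224, 3), dtype=np.float32)

# Permute the axes to NCHW: N stays first, C moves from position 3 to 1.
nchw = nhwc.transpose(0, 3, 1, 2)

print(nhwc.shape)  # (1, 224, 224, 3)
print(nchw.shape)  # (1, 3, 224, 224)
```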

Model Optimization and Compilation: this phase includes several important steps: model parsing, model optimization, model calibration and quantization, and model compilation. Its internal workflow is shown in the figure below.

  • Model Parse Stage: converts the Caffe floating-point model into an ONNX floating-point model. This stage assigns a unique name to each unnamed node/tensor and produces an original_float_model.onnx, whose computing accuracy is float32.

  • Model Optimization Stage: applies operator optimization strategies suited to the Horizon platform, such as fusing BN into Conv. The output of this phase is an optimized_float_model.onnx. The computational accuracy of this ONNX model is still float32; the optimization does not change the model's computational results.

  • Model Calibration Stage: uses the calibration data you provide to compute the necessary quantization parameters; the quantization parameters computed for each node are saved in calibration nodes. The output of this phase is a calibrated_model.onnx. After calibration, some further processing is performed on the model, producing a ptq_model.onnx.

  • Model Quantization Stage: Horizon's model compiler takes the ptq_model.onnx generated during the calibration stage and quantizes the model according to your pre-processing configuration (including the color conversion from input_type_rt to input_type_train and the handling of mean/scale). The output of this phase is a quantized_model.bc; the accuracy loss caused by quantization can be evaluated with this model. If nodes are removed at the model input/output during quantization, a quantized_removed_model.bc is also saved; in that case, if you need to do a consistency comparison later, we recommend comparing this HBIR model with the final generated hbm model.

Attention

Please note that if input_type_rt is nv12, the input layout of quantized_model.bc is NHWC.

  • Model Compilation Stage: Horizon's model compiler converts the quantized model into computational instructions and data supported by the Horizon platform. The output of this stage is a *.hbm model, which will subsequently run on the Horizon Edge embedded platform; it is the final output of the model conversion.
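The mean/scale handling mentioned above corresponds, per input pixel and channel, to (x - mean) * scale. A minimal numpy sketch with hypothetical per-channel mean/scale values (the actual values come from your conversion configuration, not from this example):

```python
import numpy as np

# Hypothetical per-channel mean and scale, in the form the toolchain
# folds into the converted model's integrated pre-processing.
mean = np.array([128.0, 128.0, 128.0], dtype=np.float32)
scale = np.array([1.0 / 128.0] * 3, dtype=np.float32)

def normalize(img_nhwc: np.ndarray) -> np.ndarray:
    """Apply (x - mean) * scale per channel, as the integrated pre-processing does."""
    return (img_nhwc - mean) * scale

img = np.full((1, 4, 4, 3), 255.0, dtype=np.float32)  # toy all-white image, NHWC
out = normalize(img)
print(out[0, 0, 0])  # each channel: (255 - 128) / 128 = 0.9921875
```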

Interpret Conversion Results

This section first explains how to interpret a successful model conversion, then how to analyze an unsuccessful one. To confirm that the conversion succeeded, check the similarity information of nodes and output tensors, the compile status information, and the working_dir output.

Similarity Information

The similarity information is printed to the console after compilation. After a successful conversion, it is also saved to a log file named hb_compile.log under the path where the compile command was executed. It takes the following form:

+--------------------------------------------+-------------------+------+-----------+-------------------+------------------+------------------+
| Node                                       | NodeType          | ON   | Threshold | Calibrated Cosine | Quantized Cosine | Output Data Type |
+--------------------------------------------+-------------------+------+-----------+-------------------+------------------+------------------+
| Conv_0+Relu_1                              | Conv+Relu         | BPU  | 2.64      | 0.999746          | 0.999315         | si8              |
| MaxPool_2                                  | MaxPool           | BPU  | 2.639386  | 0.999819          | 0.99959          | si8              |
| Conv_3+Relu_4                              | Conv+Relu         | BPU  | 2.639386  | 0.999598          | 0.999275         | si8              |
| Conv_5+Relu_6                              | Conv+Relu         | BPU  | 1.139316  | 0.999623          | 0.999461         | si8              |
| Conv_7                                     | Conv              | BPU  | 1.414045  | 0.999483          | 0.999306         | si8              |
...
| Relu_118                                   | Relu              | BPU  | --        | 0.992868          | 0.988587         | si8              |
| GlobalAveragePool_119                      | GlobalAveragePool | BPU  | 11.446257 | 0.997873          | 0.996466         | si8              |
| Gemm_121                                   | Conv              | BPU  | 6.047658  | 0.998330          | 0.997348         | si32             |
| Gemm_121+Gemm_121_transpose_output_reshape | Reshape           | NULL | --        | 0.998330          | 0.997348         | f32              |
+--------------------------------------------+-------------------+------+-----------+-------------------+------------------+------------------+
+------------+-------------------+------------------+
| TensorName | Calibrated Cosine | Quantized Cosine |
+------------+-------------------+------------------+
| output     | 0.998330          | 0.997348         |
+------------+-------------------+------------------+

As shown above:

  • Node and NodeType are the node's name and type.

  • ON is the device on which the node is executed.

  • TensorName is the name of the output tensor.

  • Threshold is the calibration threshold at each layer; it is used to give feedback to Horizon technical support in abnormal situations and is not of concern under normal conditions.

  • Calibrated Cosine is the cosine similarity between the corresponding Node/Output Tensor of optimized_float_model.onnx and calibrated_model.onnx.

  • Quantized Cosine is the cosine similarity between the corresponding Node/Output Tensor of optimized_float_model.onnx and the quantized_model.bc generated by model quantization.

  • Output Data Type is the node's output data type, one of ['si8', 'si16', 'si32', 'si64', 'ui8', 'ui16', 'ui32', 'ui64', 'f32'].

Attention

Note that the cosine similarity field only serves as a reference indicating the stability of the quantized data; it cannot directly measure the model's accuracy loss. In general, an output-node similarity below 0.8 indicates a significant loss of accuracy. Since similarity does not correlate with accuracy in any absolute way, the full accuracy evaluation procedure is described in the Model Accuracy Analysis section.
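The cosine similarity reported in the table can be computed for any pair of output tensors. A minimal sketch (the tensor values below are synthetic, not from a real model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two output tensors, flattened to vectors."""
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic float output vs. a slightly perturbed "quantized" output:
# a small perturbation keeps the similarity close to 1.0.
float_out = np.linspace(-1.0, 1.0, 1000)
quant_out = float_out + np.random.default_rng(0).normal(0, 0.01, 1000)

sim = cosine_similarity(float_out, quant_out)
print(sim)  # close to 1.0; values below 0.8 would suggest significant accuracy loss
```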

Compile Status Information

For the compile status information, after a successful conversion, the model's dependencies and parameters will be output on the console, as follows:

  • model deps info: versions of the tools that model compilation depends on.

  • model_parameters info: Information of model parameters specified in the model compilation configuration file.

  • input_parameters info: Information of input parameters specified in the model compilation configuration file.

  • calibration_parameters info: Information of calibration parameters specified in the model compilation configuration file.

  • compiler_parameters info: Information of compiler parameters specified in the model compilation configuration file.

  • memory info: Information related to memory usage during various stages of model compilation.

Conversion Output

The conversion output is stored in the path specified by the conversion configuration parameter working_dir. You will find the following files in this directory (the * part is the prefix you specify via the conversion configuration parameter output_model_file_prefix).

  • *_original_float_model.onnx

  • *_optimized_float_model.onnx

  • *_calibrated_model.onnx

  • *_ptq_model.onnx

  • *_quantized_model.bc

  • *_quantized_removed_model.bc (generated only when nodes are removed at the model input/output)

  • *.hbm

  • *_advice.json

  • *_quant_info.json

  • *_node_info.csv

  • *.html

  • *.json

  • hb_compile.log

The Interpret Conversion Output section explains the function of each output. However, before running on the board, we strongly recommend completing the procedures described in the Model Performance Analysis and Model Accuracy Analysis sections, to avoid carrying model conversion problems forward into the embedded stage.

If any of the three outputs used to verify a successful conversion is missing, something went wrong during the conversion, and the compile tool prints error messages to the console. For example, if the prototxt and caffe_model parameters are not configured during a Caffe model conversion, the tool reports:

2021-04-21 14:45:34,085 ERROR Key 'model_parameters' error: Missing keys: 'caffe_model', 'prototxt'
2021-04-21 14:45:34,085 ERROR yaml file parse failed. Please double check your input
2021-04-21 14:45:34,085 ERROR exception in command: compile
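The error above says the model_parameters section lacks its two required keys. A sketch of the corrected section follows; the key names come straight from the error message, while the file paths are placeholders you would replace with your own:

```yaml
model_parameters:
  prototxt: './mobilenet_deploy.prototxt'  # placeholder path to the Caffe prototxt
  caffe_model: './mobilenet.caffemodel'    # placeholder path to the Caffe weights
```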

Interpret Conversion Output

The previous section listed the outputs of a successful model conversion. This section explains the use of each output.

  • The output process of *_original_float_model.onnx can be found in Model Conversion Interpretation. The computing accuracy of this model is the same as the original floating-point model. In general, you don't need to use this model. In case of errors in the conversion results, it would be helpful to provide this model to Horizon's technical support to help you solve the problem quickly.

  • The output process of *_optimized_float_model.onnx can be found in Model Conversion Interpretation. This model has undergone some operator-level optimization operations, commonly known as operator fusion. If you visually compare it with the original_float model, you can clearly see some operator structural changes, which do not affect the computational accuracy of the model. In general, you do not need to use this model. In case of errors in the conversion results, it would be helpful to provide this model to Horizon's technical support to help you solve the problem quickly.

  • The output process of *_calibrated_model.onnx can be found in Model Conversion Interpretation. This model is an intermediate product obtained by the model conversion tool chain by taking the floating-point model after structural optimization, calculating the quantization parameters corresponding to each node from the calibration data and saving them in the calibration node.

  • The output process of *_ptq_model.onnx can be found in Model Conversion Interpretation. This model is the product of pre-quantization of the model obtained from calibration by the model conversion toolchain.

  • The output process of the *_quantized_model.bc can be found in Model Conversion Interpretation. This model has completed the calibration and quantization process, and the accuracy loss introduced by quantization can be evaluated with it. This model is mandatory in the accuracy verification process; refer to the Model Accuracy Analysis section.

  • The output process of the *_quantized_removed_model.bc can be found in Model Conversion Interpretation. If nodes are removed at the model input/output during quantization, this HBIR model with the removed nodes is saved automatically. In that case, if you need to do a consistency comparison later, we recommend comparing this HBIR model with the final generated hbm model.

  • The *.hbm is the model that can be loaded and run on the Horizon computing platform. After reading Embedded Application Development, you can quickly deploy the model on the computing platform. However, to ensure that the model's performance and accuracy meet your expectations, we strongly recommend completing the Model Performance Analysis and Model Accuracy Analysis before moving on to application development.

  • The *_advice.json file contains the results printed by the Horizon Model Compiler op checker.

  • The *_quant_info.json file contains the calibrated quantization information of the operators.

  • The *_node_info.csv file contains the result of the cosine similarity and other information of the operator after successful conversion, which is the same as the similarity information output in the console after successful execution of hb_compile.

  • The *.json is the model static performance evaluation file.

  • The *.html is the model static performance evaluation file (better readability).

  • The hb_compile.log is the log file generated after compilation.