Post-training quantization, which converts a floating-point model into a fixed-point model based on dozens or hundreds of calibration samples, inevitably introduces some accuracy loss. However, extensive production experience has shown that, as long as the optimal parameter combination is found, Horizon's conversion tools can keep the accuracy loss within about 1% in most cases.
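To make the source of this loss concrete, the following is a minimal, self-contained sketch (not Horizon's actual implementation) of symmetric per-tensor int8 quantization: a scale is derived from the calibration data's maximum absolute value, floats are rounded to fixed-point integers, and the rounding introduces a small, bounded error.

```python
# Conceptual sketch only: symmetric int8 quantization whose rounding
# error is the root cause of the small accuracy loss described above.

def quantize(values, num_bits=8):
    """Map floats to signed fixed-point integers using a scale derived
    from the calibration data's maximum absolute value."""
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the fixed-point integers."""
    return [q * scale for q in quantized]

calibration = [0.12, -0.87, 0.45, 1.00, -0.33]  # toy "calibration data"
q, scale = quantize(calibration)
recovered = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(calibration, recovered))
# The round-trip error is bounded by half of one quantization step.
assert max_error <= scale / 2
```

In a real toolchain the calibration data also determines per-layer scales for activations, which is why the choice of calibration samples and quantization parameters directly affects the final accuracy.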
This section explains how to correctly analyze model accuracy; the basic workflow is shown below. If the evaluation results fail to meet your expectations, please refer to the Model Accuracy Optimization section and try to optimize the accuracy. If the problem persists, please do not hesitate to contact Horizon for technical support.
You are expected to understand how to evaluate model accuracy before reading this section. This section explains how to run model inference using the outputs of model conversion. As previously described, a successful model conversion produces the following model outputs:
Although the final hbm model is the one deployed to the computing platform, we provide the *_quantized_model.bc to facilitate accuracy evaluation on Ubuntu development machines. This model has already been quantized and yields the same accuracy results as the final hbm model. The basic process for loading the model and running inference with the Horizon development library is shown below. The following illustrative code applies not only to the quantized model but also to the original and optimized ONNX models (just replace the model file); you only need to prepare the corresponding input data according to each model's input types and layouts.
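As a concrete illustration of the last point, the sketch below (plain Python, independent of any Horizon API; the function name is illustrative) shows one common piece of that data preparation: rearranging an input tensor from NCHW layout (channels first, typical for ONNX models) to NHWC (channels last), since different model stages may expect different layouts.

```python
# Illustrative layout conversion (not part of the Horizon toolkit):
# rearrange a batch of images from NCHW to NHWC using nested lists,
# so the example stays free of third-party dependencies.

def nchw_to_nhwc(tensor):
    """tensor: nested list of shape [N][C][H][W] -> [N][H][W][C]."""
    return [
        [
            [
                [image[c][h][w] for c in range(len(image))]
                for w in range(len(image[0][0]))
            ]
            for h in range(len(image[0]))
        ]
        for image in tensor
    ]

# A tiny 1x2x2x2 example: two channels of a 2x2 image.
nchw = [[
    [[1, 2],
     [3, 4]],      # channel 0
    [[5, 6],
     [7, 8]],      # channel 1
]]
nhwc = nchw_to_nhwc(nchw)
# Each pixel now holds its two channel values side by side.
assert nhwc == [[[[1, 5], [2, 6]],
                 [[3, 7], [4, 8]]]]
```

In practice you would perform this kind of transposition (typically with numpy) on your preprocessed input before feeding it to whichever model variant you are evaluating, matching the input layout reported by the model.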