Before model calibration and inference, the input data must be preprocessed to meet the model's requirements. Below, we describe the preparation steps for the model calibration set and for model inference respectively.
Model calibration set preparation: the accuracy of the calibration process depends directly on the correctness of the input data, so a proper calibration set must be prepared in order to obtain correct calibration results and preserve the accuracy of the calibrated model.
Calibration samples should be similar to those in the training or validation set and must undergo the same data preprocessing as the original floating-point model, so that their data type, shape, and layout remain consistent.
Model inference preparation: HBRuntime supports a variety of model inference scenarios (ONNX model, HBIR model, and HBM model). To ensure correct inference results, the input data used during inference needs to be preprocessed and converted as necessary to meet the model's requirements.
For example, the Pyramid input, Resizer input, and multi-batch splitting scenarios change the model structure and therefore require corresponding processing and data preparation.
Next, we introduce the processing and data preparation required before model calibration and inference from these two aspects.
If you want to follow this process in the sample folder, first execute the 00_init.sh script in that folder to obtain the corresponding original model and dataset.
Model calibration requires 20~100 samples, each stored as an independent data file. To ensure the accuracy of the calibrated model, these samples should come from the training or validation dataset used when training the model. In addition, avoid rare samples, e.g. solid-color images or images that contain no detection or classification targets.
You need to preprocess the samples from the training/validation set using the same preprocessing as for the original floating-point model,
so that the processed calibration samples have the same data type (input_type_train), shape (input_shape), and layout (input_layout_train) as the original floating-point model.
You can save each sample as an .npy file with the numpy.save command; the toolchain reads it with numpy.load during calibration.
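A minimal sketch of this save/load round trip is shown below. The file path and the array contents are hypothetical; the sample's dtype, shape, and layout must match your own model's input_type_train, input_shape, and input_layout_train.

```python
import os
import numpy as np

# Hypothetical example: save one preprocessed calibration sample per file.
# The array's dtype/shape/layout must match the original floating-point
# model (input_type_train / input_shape / input_layout_train).
os.makedirs("calibration_data", exist_ok=True)
sample = np.random.rand(1, 3, 224, 224).astype(np.float32)
np.save("calibration_data/sample_0001.npy", sample)

# What the toolchain effectively does during calibration:
restored = np.load("calibration_data/sample_0001.npy")
print(restored.shape, restored.dtype)  # (1, 3, 224, 224) float32
```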
The basic preprocessing flow for the calibration set is shown below:
For example, consider an original classification floating-point model trained on ImageNet with a single input node, described as follows: color space BGR, layout NCHW, shape 1x3x224x224.
The data preprocessing steps for this original floating-point model are as follows:
Uniformly scale the image so that the shorter side becomes 256.
Get 224x224 image using the center_crop method.
Align the input layout to the NCHW required by the model.
Convert the color space to the BGR required by the model.
Adjust the range of image values to [0, 255] as required by the model.
Subtract the mean value per channel.
Multiply the data by the scale factor.
The sample processing code for the above example model is as follows (to keep the code concise, some simple transformer implementations are omitted; the usage of transformers can be found in Image Processing).
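The steps above can be sketched in plain NumPy as follows. This is an illustrative stand-in for the toolchain's transformer classes: the nearest-neighbor resize and the mean/scale values are assumptions for demonstration, not values from the toolchain; substitute your own model's preprocessing parameters.

```python
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Sketch of the example model's preprocessing chain on an HWC, RGB,
    uint8 image. The toolchain's transformer classes are replaced here
    with plain NumPy; swap in your own implementations as needed."""
    # 1. Uniformly scale so that the shorter side becomes 256
    #    (nearest-neighbor resize, for illustration only).
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)
    rows = np.minimum((np.arange(round(h * scale)) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(round(w * scale)) / scale).astype(int), w - 1)
    img = img[rows][:, cols]

    # 2. Center-crop a 224x224 region.
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2
    img = img[top:top + 224, left:left + 224]

    # 3. Align the layout to the NCHW required by the model.
    data = img.astype(np.float32).transpose(2, 0, 1)[np.newaxis]  # 1x3x224x224

    # 4. Convert the color space from RGB to the BGR required by the model
    #    (flip the channel axis).
    data = data[:, ::-1]

    # 5. Values are already in [0, 255]; subtract the per-channel mean and
    #    multiply by the scale factor (example values, not from the source).
    mean = np.array([103.94, 116.78, 123.68], np.float32).reshape(1, 3, 1, 1)
    return (data - mean) * np.float32(1 / 128.0)

calib = preprocess(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
print(calib.shape, calib.dtype)  # (1, 3, 224, 224) float32
```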
Note that the input_shape parameter in the yaml file specifies the input size of the original floating-point model. For a dynamic-input model, you can use this parameter to set the converted input size; the shape of the calibration data must be consistent with input_shape.
For example, if the original floating-point model's input node shape is ?x3x224x224 (where "?" is a placeholder, i.e., the first dimension is dynamic) and input_shape: 8x3x224x224 is set in the conversion profile, then each calibration data file you prepare must have shape 8x3x224x224. (Please be aware that the input_batch parameter cannot modify the batch information of models whose input shape has a first dimension not equal to 1.)
As mentioned above, the input data needs to be processed before model inference so that it meets the model's requirements. Below, we introduce data preparation before model inference for ONNX model inference and HBIR/HBM model inference respectively.
After graph optimization and calibration, the inputs of the generated ONNX models (*_optimized_float_model.onnx, *_calibrated_model.onnx, and *_ptq_model.onnx) remain consistent with those of the original floating-point model.
If you have not configured the input_batch parameter in the yaml file, simply ensure that the input data matches that of the original floating-point model.
If you have configured the input_batch parameter in the yaml file and set it to 8, and the original model input shape is 1x3x224x224, then the input shape of the ptq_model.onnx generated during compilation and conversion is 8x3x224x224. In this case, you need to prepare the input data according to the 8x3x224x224 input shape.
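A minimal sketch of this preparation, under the assumption that `input_batch: 8` was configured: eight independently preprocessed samples are stacked along the batch axis (the random arrays stand in for real preprocessed images).

```python
import numpy as np

# Hypothetical sketch: with input_batch: 8, *_ptq_model.onnx expects a
# single 8x3x224x224 tensor, so stack eight independently preprocessed
# samples along the batch axis before feeding the model.
samples = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(8)]
batch_input = np.concatenate(samples, axis=0)
print(batch_input.shape)  # (8, 3, 224, 224)
```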
Since the HBIR/HBM model may be modified during conversion and compilation, these modifications may also change the model's input data requirements.
Therefore the input data must be processed before model inference; below we give several common scenarios for input data preparation.
If you have configured the input_type_rt (when it differs from input_type_train), mean_value, and scale_value/std_value parameters in the yaml file, the model itself performs the color conversion and normalization, so you only need to prepare data of the input_type_rt data type; no normalization is required.
If you have set the input_type_rt parameter to nv12 or gray, or set the input_source parameter to pyramid in the yaml file, this is treated as a Pyramid input scenario.
The Pyramid input scenario refers to the input scenario in the form of YUV420SP (NV12). The data preparation steps are as follows:
(If necessary) Resize the image to the appropriate size.
The original input type needs to be converted to the NV12 data type that the model requires.
If you configure parameters such as mean_value and scale_value/std_value in the yaml configuration file,
no further normalization is required during the data preparation phase.
When using the Pyramid input, the input is split into separate Y and UV channel inputs in order to better process the image data.
For example, if the original model input shape is 1x3x224x224, then after inserting the Pyramid input,
the required input_y shape is 1x224x224x1 and the input_uv shape is 1x112x112x2.
The following code converts RGB data to the NV12 data type as an example for your reference; replace it as needed in your actual use scenario.
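A sketch of such a conversion is shown below. It assumes BT.601 full-range coefficients and even image dimensions; adjust the coefficients to match your camera pipeline if needed.

```python
import numpy as np

def rgb_to_nv12_planes(rgb: np.ndarray):
    """Convert an HxWx3 uint8 RGB image (H and W even) to the two Pyramid
    inputs: a Y plane (1xHxWx1) and an interleaved UV plane (1 x H/2 x W/2 x 2).
    BT.601 full-range coefficients are used here as an illustrative assumption."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0

    # 4:2:0 subsampling: average each 2x2 block of U and V.
    h, w = y.shape
    u = u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    v = v.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    # NV12 stores the chroma interleaved as U, V pairs per 2x2 block.
    y_plane = np.clip(y, 0, 255).astype(np.uint8)[np.newaxis, ..., np.newaxis]
    uv_plane = np.clip(np.stack([u, v], axis=-1), 0, 255).astype(np.uint8)[np.newaxis]
    return y_plane, uv_plane

y, uv = rgb_to_nv12_planes(np.zeros((224, 224, 3), dtype=np.uint8))
print(y.shape, uv.shape)  # (1, 224, 224, 1) (1, 112, 112, 2)
```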
If you have configured the input_source parameter and set it to resizer in the yaml file, it will be regarded as a Resizer input scenario.
The Resizer input scenario refers to the scenario where the model on the BPU takes the form of YUV420SP (NV12) plus a rectangular ROI for input. The data preparation steps are as follows:
(If necessary) Resize the image to the appropriate size.
The original input type needs to be converted to the NV12 data type that the model requires.
Define an input ROI (Region of Interest), specifying its size via the four boundary coordinates of the ROI tensor input: left, top, right, and bottom. For a detailed introduction to the Region of Interest (ROI), refer to the Model Deployment Practice Guidance - Model Modification - Resizer Input Insertion - ROI Introduction and Constraints section.
If you configure parameters such as mean_value and scale_value/std_value in the yaml configuration file,
no further normalization is required during the data preparation phase.
When using the Resizer input, the input is split into separate Y and UV channel inputs in order to better process the image data, and the defined ROI is provided as an additional input.
For example, if the original model input shape is 1x3x224x224, then after inserting the Resizer input,
the required input_y shape is 1xNonexNonex1, the input_uv shape is 1xNonexNonex2, and the input_roi shape is 1x4.
The following code converts RGB data to the NV12 data type and also includes the definition of the ROI coordinates for your reference; replace it as needed in your actual use scenario.
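The ROI definition can be sketched as below. The Y/UV planes themselves can be produced the same way as in the Pyramid example; the coordinate values and the int32 dtype here are illustrative assumptions, so follow the ROI Introduction and Constraints section for the exact requirements.

```python
import numpy as np

# Hypothetical ROI, in pixels: the four boundaries of the region of interest.
left, top, right, bottom = 48, 32, 176, 160

# The ROI tensor packs [left, top, right, bottom] into a 1x4 input. It must
# satisfy the documented constraints (e.g. lie inside the image, with the
# even alignment NV12 data requires).
input_roi = np.array([[left, top, right, bottom]], dtype=np.int32)
print(input_roi.shape)  # (1, 4)
```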
If the original model is a multi-batch model, or if you have configured the input_batch parameter in the yaml file
and set separate_batch to True or configured the separate_name parameter to specify the node to split,
the corresponding input node will be split internally.
For a multi-batch model split along the batch dimension, each split input requires its own data, prepared according to the required input type and data layout.
For example, if the original model input shape is 1x3x224x224 and you set the input_batch parameter to 8 in the yaml file,
the shape of the ptq_model.onnx generated after calibration will be 8x3x224x224.
If the separate_batch or separate_name parameter is also configured in the yaml file,
the corresponding input of the model is batch-split.
After splitting, the model has eight inputs, each of shape 1x3x224x224.
In this case, you need to prepare the data for each input according to the 1x3x224x224 shape.
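This split can be sketched as follows, assuming `input_batch: 8` with `separate_batch: True` (the random array stands in for real preprocessed data).

```python
import numpy as np

# Hypothetical sketch: with input_batch: 8 plus separate_batch: True, the
# single 8x3x224x224 input becomes eight independent 1x3x224x224 inputs,
# so prepare one array per split input.
batch = np.random.rand(8, 3, 224, 224).astype(np.float32)
split_inputs = np.split(batch, 8, axis=0)  # eight arrays of shape 1x3x224x224
print(len(split_inputs), split_inputs[0].shape)  # 8 (1, 3, 224, 224)
```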
The above introduces calibration set preparation and the data preparation required for model inference in several typical scenarios. In practice, you can refer to this chapter to prepare the input data in a targeted way.