Algorithm Model PTQ + On-board Deployment Quick Start

To help you get started quickly, this section introduces the basic workflow of the PTQ scheme, using ResNet50 as an example to illustrate the details.

The basic workflow is as follows:

Attention

Before starting, make sure you have completed the environment installation on both dev PC and dev board by following the section Environment Deployment.

Floating-point Model Preparation

The OE package provides you with rich PTQ model samples under the samples/ai_toolchain/horizon_model_convert_sample path. The ResNet50 model sample is located under the 03_classification/03_resnet50 path.

Please first execute the 00_init.sh script to obtain the corresponding calibration dataset and the original model of the sample.

If you need to convert a private model, refer to section Floating-point Model Preparation to prepare an ONNX model with opset version 10-19 in advance. The following table shows reference schemes for exporting models from different training frameworks to the ONNX format.

Training Framework | Reference Scheme
Pytorch            | Export using the official API.
Tensorflow         | Convert using the onnx/tensorflow-onnx tool of the ONNX community.
PaddlePaddle       | Export using the official API.
MXNet              | Export using the official API.
Other frameworks   | Convert using the corresponding reference scheme.

Model Checking

After the floating-point model is ready, we recommend a quick check of the model to ensure that it meets the support constraints of the computing platform. For the ResNet50 model in ONNX format, we can complete the model check by running the following command:

hb_compile --model resnet50.onnx \
           --march nash-e

If your model is a multi-input model, you can refer to the following command:

hb_compile --march ${march} \
           --proto ${caffe_proto} \
           --model ${caffe_model/onnx_model} \
           --input-shape input.0 1x1x224x224 \
           --input-shape input.1 1x1x224x224 \
           --input-shape input.2 1x1x224x224

The main parameters of the hb_compile tool are described below; for more parameter descriptions, refer to section Model Checking.

Parameter | Description
--march   | Specifies the target processor type: set to nash-e for the S100 processor and nash-m for the S100P processor.
--proto   | Valid only when model-type specifies Caffe; the value is the name of the prototxt file of the Caffe model. You do not need to specify it for ONNX models.
--model   | For a Caffe model, the value is the name of the caffemodel file; for an ONNX model, the value is the ONNX model file name.

Take the ResNet50 model as an example, you can execute the 01_check.sh script to quickly complete model checking.

The main contents of the 01_check.sh script file are as follows:

set -ex
cd $(dirname $0) || exit

onnx_model="../../01_common/model_zoo/mapper/classification/resnet50/resnet50.onnx"
march="nash-e"

hb_compile --model ${onnx_model} --march ${march}

If the model check fails, review the error messages and modification suggestions shown in the terminal output or in the hb_compile.log file generated under the current path. Refer to section Model Checking for more instructions.

Model Conversion

After the model check passes, you can use the hb_compile tool to convert the model; refer to the following command:

hb_compile -c resnet50_config.yaml

Among them, resnet50_config.yaml is the configuration file corresponding to the model conversion, for details please refer to section Configuration File Template.

In addition, the model quantization of the PTQ scheme also depends on a certain number of pre-processed samples for calibration, which is described in section Pre-processing Calibration Data.

YAML Configuration File

The YAML configuration file contains 4 required parameter groups (model_parameters, input_parameters, calibration_parameters, compiler_parameters) and 1 optional parameter group (custom_op).

Each parameter group contains both required and optional parameters (optional parameters are hidden by default), you can refer to section Specific Parameter Information for specific requirements and filling methods.
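As an illustration only, a minimal configuration might look like the sketch below. The parameter names and values here are assumptions pieced together from this section's descriptions, not a verbatim copy of the sample's resnet50_config.yaml; always consult section Configuration File Template for the authoritative parameter list.

```yaml
# Hypothetical sketch of a ResNet50-style conversion configuration
model_parameters:
  onnx_model: 'resnet50.onnx'        # for ONNX models; Caffe uses caffe_model/prototxt
  march: 'nash-e'                    # nash-e for S100, nash-m for S100P
  working_dir: 'model_output'        # conversion outputs land here
input_parameters:
  input_type_rt: 'nv12'              # data type received on the board
  input_type_train: 'rgb'            # data type used during training
  mean_value: '123.675 116.28 103.53'
  scale_value: '0.0171 0.0175 0.0174'
calibration_parameters:
  cal_data_dir: './calibration_data_rgb'
compiler_parameters:
  optimize_level: 'O2'
```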

Note

For an ONNX model, configure the onnx_model parameter in the model_parameters parameter group instead of the caffe_model and prototxt parameters.

  • The input_type_rt and input_type_train parameters in the input_parameters parameter group are used to specify the data type (e.g., NV12) that the model will receive when it is actually deployed on the board and the data type (e.g., RGB) for its own training, respectively.

    When the two data types are inconsistent, the conversion tool automatically inserts a BPU-accelerated preprocessing node in the frontend of the model to complete the corresponding color space conversion.

    Meanwhile, the mean_value, scale_value and std_value parameters in this parameter group can be used to configure data normalization for image-input models; after configuration, the normalization will be folded into the preprocessing node for BPU acceleration by the conversion tool.

    The formula for data normalization is as follows:

    data_norm = (data - mean_value) * scale_value

  • The cal_data_dir parameter in the calibration_parameters parameter group needs to be configured with the path of the preprocessed calibration data folder, refer to the section Pre-processing Calibration Data for the descriptions of the preprocessing method.
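The normalization formula above can be checked numerically; below is a minimal numpy sketch, where the mean and scale constants are illustrative ImageNet-style values rather than those of any specific YAML file.

```python
import numpy as np

# Hypothetical per-channel mean/scale values for a 3-channel image input
mean_value = np.array([123.675, 116.28, 103.53], dtype=np.float32)
scale_value = np.array([0.0171, 0.0175, 0.0174], dtype=np.float32)

# One pixel of raw 0-255 input data
data = np.array([128.0, 128.0, 128.0], dtype=np.float32)

# data_norm = (data - mean_value) * scale_value
data_norm = (data - mean_value) * scale_value
print(data_norm)
```

When this normalization is configured in the YAML file, the conversion tool performs it inside the inserted preprocessing node, so the deployed application feeds raw image data directly.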

Pre-processing Calibration Data

Attention
  • Please note that before doing this step, make sure that you have already finished obtaining the calibration dataset by executing the 00_init.sh script in the corresponding sample directory.

  • If you are currently only concerned with model performance, you do not need to configure the cal_data_dir parameter in the YAML and can skip this subsection. The tool will perform pseudo calibration to facilitate quick verification.

The calibration data for the PTQ scheme is generally about 100 typical samples (the number can be increased or decreased as appropriate) selected from the training set or validation set. Very rare and unusual samples should be avoided, such as solid-color images or images without any detection or classification targets.
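Selecting the calibration set can be as simple as randomly sampling from the validation set; below is a sketch in which the file names are hypothetical stand-ins for your dataset listing.

```python
import random

# Hypothetical list of validation-set image names; in practice this would
# come from os.listdir() on your dataset directory.
val_images = [f"img_{i:05d}.jpg" for i in range(5000)]

random.seed(0)  # fix the seed so the selection is reproducible
calibration_set = random.sample(val_images, 100)  # ~100 typical samples
print(len(calibration_set))
```

Manual screening afterwards (dropping solid-color or target-free images) is still worthwhile, since random sampling alone cannot guarantee typicality.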

The filtered calibration data also needs to be preprocessed in the same way as the data fed to the model at inference time; after processing, the data type (input_type_train), layout (input_layout_train) and size (input_shape) should stay the same as those of the original model.

For the preprocessing of calibration data, Horizon recommends directly using and modifying the sample code. Take the ResNet50 model as an example, the calibration_transformers function in the preprocess.py file contains the pre-processing transformers for its calibration data, and the processed calibration data is consistent with its YAML configuration file, that is:

  • input_type_train : 'rgb'

  • input_layout_train : 'NCHW'

def calibration_transformers():
    transformers = [
        PILResizeTransformer(size=short_size),
        PILCenterCropTransformer(size=crop_size),
        HWC2CHWTransformer(),
        ScaleTransformer(scale_value=1 / 255),
        MeanTransformer(means=np.array([0.485, 0.456, 0.406])),
        ScaleTransformer(
            scale_value=np.array([1 / 0.229, 1 / 0.224, 1 / 0.225]))
    ]
    return transformers

The transformers are defined in the ../../../01_common/python/data/transformer.py file; refer to section Image Processing Transformer for details. You can modify and extend them as needed.
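As a rough numpy illustration of what this transformer chain computes (a sketch only, not the sample's actual Transformer classes; the resize and center-crop steps are replaced here by a pre-sized dummy image):

```python
import numpy as np

# Dummy 224x224 RGB image in HWC layout with 0-255 values, standing in
# for the output of PILResizeTransformer + PILCenterCropTransformer
image_hwc = np.random.randint(0, 256, (224, 224, 3)).astype(np.float32)

# HWC2CHWTransformer: HWC -> CHW
image_chw = np.transpose(image_hwc, (2, 0, 1))

# ScaleTransformer(1 / 255): scale values into 0-1
image_chw = image_chw * (1 / 255)

# MeanTransformer + second ScaleTransformer: per-channel normalization
mean = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
std = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)
calibration_sample = (image_chw - mean) / std

print(calibration_sample.shape)  # (3, 224, 224), i.e. CHW before batching
```

The result matches the YAML configuration quoted above: rgb data in NCHW layout once a batch dimension is added.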

After modifying the preprocess.py file, you can modify and execute the 02_preprocess.sh script to complete the preprocessing of the calibration data.

bash 02_preprocess.sh

The main contents of the 02_preprocess.sh script file are as follows:

set -e -v
cd $(dirname $0) || exit

python3 ../../data_preprocess.py \
  --src_dir ../../01_common/calibration_data/imagenet \
  --dst_dir ./calibration_data_rgb \
  --pic_ext .rgb \
  --read_mode PIL \
  --saved_data_type float32

The parameters of the data_preprocess.py file are described as follows:

  • src_dir: Path to the raw calibration data.

  • dst_dir: Storage path of the processed data, which can be customized.

  • pic_ext: File suffix of the processed data, which is mainly used to help remember the data type and can be left unconfigured.

  • read_mode: Image reading mode; can be configured as skimage, opencv or PIL. Note that skimage reads images in RGB format with values in the range 0-1, opencv reads in BGR format with values in the range 0-255, and PIL reads in RGB format with values in the range 0-255.

  • saved_data_type: Type of saved data after processing.
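Because the three read modes differ in channel order and value range, mixing them up is a common source of calibration error. The conversions between the conventions can be sketched in numpy as follows (illustrative only, not part of data_preprocess.py):

```python
import numpy as np

# A skimage-style image: RGB channel order, float values in 0-1
img_skimage = np.random.rand(4, 4, 3)

# Bring it to the PIL-style convention: RGB, integer values in 0-255
img_pil_like = (img_skimage * 255).round().astype(np.uint8)

# Bring it to the opencv-style convention: BGR, 0-255
# (reverse the channel axis to swap R and B)
img_cv_like = img_pil_like[:, :, ::-1]

print(img_pil_like.dtype, img_cv_like.shape)
```

Whichever mode you choose, keep it consistent between calibration preprocessing and the on-board inference preprocessing.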

If you choose to write your own Python code to preprocess the calibration data, you can use numpy.save to save each sample as a .npy file, which the toolchain will read during calibration with numpy.load.
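A minimal sketch of saving one preprocessed sample this way (the directory and file names are illustrative; in practice the output directory is the one you point cal_data_dir at):

```python
import os
import tempfile

import numpy as np

# A hypothetical preprocessed calibration sample: NCHW, float32, rgb
sample = np.random.rand(1, 3, 224, 224).astype(np.float32)

out_dir = tempfile.mkdtemp()                     # stands in for cal_data_dir
out_file = os.path.join(out_dir, "sample_0000.npy")
np.save(out_file, sample)                        # later read via numpy.load

# Round-trip check: the toolchain will see exactly what was saved
restored = np.load(out_file)
print(restored.shape, restored.dtype)
```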

Model Conversion

After preparing the calibration data and YAML configuration file, you can complete the entire process of model parsing, graph optimization, calibration, quantization, and compilation conversion in one command.

For a detailed explanation of the internal process, please refer to section Model Conversion Interpretation.

After conversion, the following outputs will be saved under the working_dir path specified by the YAML file. For details, refer to section Interpret Conversion Output.

├── resnet50_224x224_nv12.html                      # Static performance evaluation file (better readability)
├── resnet50_224x224_nv12.json                      # Static performance evaluation file
├── resnet50_224x224_nv12.hbm                       # Model for loading and running on the Horizon computing platform
├── resnet50_224x224_nv12_calibrated_model.onnx
├── resnet50_224x224_nv12_ptq_model.onnx
├── resnet50_224x224_nv12_optimized_float_model.onnx
├── resnet50_224x224_nv12_original_float_model.onnx
├── resnet50_224x224_nv12_quant_info.json           # File containing operators' calibrated quantization information
├── resnet50_224x224_nv12_advice.json               # Results file printed by the Horizon Model Compiler op checker
├── resnet50_224x224_nv12_node_info.csv             # Results file containing cosine similarity and other operator information
├── resnet50_224x224_nv12_quantized_model.bc
└── hb_compile.log                                  # Log file generated after compilation
Note

If nodes at the model input/output are removed during model conversion, the *_quantized_removed_model.bc file will also be saved. In this scenario, if you need to do a consistency comparison later, we recommend comparing this HBIR model with the final generated hbm model.

Fast Performance Verification

For the xxx.hbm model file generated by the conversion, Horizon supports estimating the static performance of the BPU part of the model on the dev PC first, and also provides an executable tool on the board to quickly evaluate the dynamic performance without any coding.

For more detailed descriptions and performance tuning recommendations, refer to sections Model Performance Analysis and Model Performance Optimization.

Static Performance Evaluation

As described in section Model Conversion, after conversion finishes, HTML and JSON files containing static performance evaluation information for the model are generated under the working_dir path, of which the HTML file is more readable. The following is the HTML file generated by the ResNet50 model conversion, where:

  • The Summary tab provides the performance of the BPU part of the model predicted by the compiler.


  • The Temporal Statistics tab provides the bandwidth usage of the model during the inference time of one frame.


  • The Layer Details tab provides, for each layer of BPU operators, the computation amount, computation time, data handling time, and the active time period of the compiled layer (the active time period does not represent the execution time of the layer, as multiple layers usually execute alternately or in parallel).


  • The Timeline tab provides the predicted time consumption of the instruction set within the inference time of one frame of the model and the information of the corresponding computing units.


    Some of the metrics included in the Timeline tab are as follows:

    • TAE: Tensor Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating Tensor computations. It mainly takes charge of various convolution (conv) computations and can also support some Matrix computations.

    • VAE: Vector Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating Vector computations. It mainly handles various element-wise operations in neural networks, such as A + B, A * B, and Look-Up Table (LUT) computations.

    • AAE: Auxiliary Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating auxiliary computations. It mainly provides acceleration for computations other than tensors, vectors, and scalars, such as Pooling, Resize, and Warp computations.

    • TRANS: It is a computing unit in the BPU that is used to handle data layout transformations.

    • STORE: It means writing data from the internal cache/register to the memory (or outside the computing platform).

    • LOAD: It means loading data from the memory (possibly outside the computing platform) into the on-computing-platform cache.

Dynamic Performance Evaluation

Once the static performance of the model meets expectations, we can further evaluate the dynamic performance of the model on the board, and the reference method is as follows:

  1. Make sure you have completed the environment deployment of the dev board according to section Environment Deployment.

  2. Copy the xxx.hbm model generated by the conversion to any path in the /userdata folder of the dev board.

  3. Use the hrt_model_exec perf tool to quickly evaluate the time consumption and frame rate of the model.

# Copy the model to the dev board
scp model_output/resnet50_224x224_nv12.hbm root@{board_ip}:/userdata
# Log in to the dev board to evaluate the performance
ssh root@{board_ip}
cd /userdata
# Evaluate the latency in the single-BPU-core single-threaded serial state
hrt_model_exec perf --model_file resnet50_224x224_nv12.hbm --thread_num 1 --frame_count 1000 --input_stride="50176,224,1,1;25088,224,2,1"
# Evaluate the FPS in the multi-threaded concurrent state
hrt_model_exec perf --model_file resnet50_224x224_nv12.hbm --core_id 0 --thread_num 8 --frame_count 1000 --input_stride="50176,224,1,1;25088,224,2,1"
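The input_stride values used above follow from the NV12 layout of a 224x224 input; the arithmetic can be sketched as follows (the semicolon separates the y and uv planes, and the interpretation of the four per-plane numbers is an assumption based on this example):

```python
# NV12 stores a full-resolution y plane followed by a half-size
# interleaved uv plane, i.e. 1.5 values per pixel in total
height, width = 224, 224

y_plane_size = height * width          # 224 * 224 = 50176
uv_plane_size = height * width // 2    # 224 * 224 / 2 = 25088

y_stride = f"{y_plane_size},{width},1,1"
uv_stride = f"{uv_plane_size},{width},2,1"
input_stride = f"{y_stride};{uv_stride}"
print(input_stride)  # 50176,224,1,1;25088,224,2,1
```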

The main parameters of the hrt_model_exec tool are described as follows, refer to the section hrt_model_exec Tool Introduction for more instructions:

Parameter    | Type   | Description
model_file   | string | [Required] Model file path.
core_id      | string | [Optional] Specifies the BPU core(s) to run on; defaults to 0. Separate multiple cores with commas, such as "1,2". 0: arbitrary core, the prediction library schedules automatically according to the load; 1: core0, 2: core1, and so on.
thread_num   | int    | [Optional] Number of threads used to run the program, in the range [1,32]; defaults to 1.
frame_count  | int    | [Optional] Total number of frames the model runs; defaults to 200.
profile_path | string | [Optional] Path for generated statistics logs; the run produces profiler.log and profiler.csv for analyzing op time and scheduling time consumption.
Note
  • If you can't find the hrt_model_exec tool on the board side, you can run the install_linux.sh or install_qnx.sh script (depending on your actual environment) under the package/board path in the OE package again.

  • When evaluating latency, you can set thread_num to 1 for single-threaded serial inference.

  • When evaluating FPS, multi-threaded concurrent inference is usually used to fill up the BPU resources, so you can set core_id to 0 and thread_num to a value greater than 1.

  • When the model input is dynamic, please fill in the input_valid_shape and input_stride parameters according to the actual input.

  • If you configure the profile_path parameter, the program must run to completion before the profiler.log and profiler.csv files are generated, so please do not interrupt the program with Ctrl+C.
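The relationship between the two measurement modes can be summarized roughly as follows; this is a simplified back-of-the-envelope model, not the tool's exact accounting, and the numbers are hypothetical.

```python
# Hypothetical result of a thread_num=1 perf run: per-frame latency in ms
single_thread_latency_ms = 4.0

# With N threads keeping the BPU busy, throughput approaches N frames in
# flight divided by the per-frame time, capped by actual BPU capacity
thread_num = 8
ideal_fps = thread_num * 1000.0 / single_thread_latency_ms
print(f"upper-bound FPS estimate: {ideal_fps:.0f}")
```

In practice the measured FPS saturates below this bound once the BPU is fully utilized, which is why the doc suggests increasing thread_num until the frame rate stops improving.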

When the dynamic performance of the model does not meet expectations, refer to the section Model Performance Optimization for performance tuning.

Accuracy Verification

Once the performance of the model has been verified as expected, subsequent accuracy verification can be performed. Please first ensure that you have prepared the relevant evaluation datasets and mounted them in a Docker container. The dataset used for the samples can be obtained by referring to the section Dataset Download.

As described in the section Model Conversion, the model conversion generates two quantized models, xxx_quantized_model.bc and xxx.hbm, and their outputs are kept numerically consistent.

You can also use the hb_verifier tool in the dev PC environment for consistency verification, the reference command is as follows, refer to section The hb_verifier Tool for detailed descriptions:

hb_verifier -m quantized.bc,model.hbm -i runtime_input_data.npy
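Consistency between two model outputs is commonly summarized with cosine similarity, the same metric reported per operator in the *_node_info.csv file. A minimal numpy sketch (the two output arrays below are synthetic stand-ins for the .bc and .hbm results):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Flatten both outputs and compute the normalized dot product
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical outputs of the .bc and .hbm models on the same input;
# the .hbm output differs only by tiny numerical noise
np.random.seed(0)
out_bc = np.random.rand(1, 1000).astype(np.float32)
out_hbm = out_bc + np.random.normal(0, 1e-4, out_bc.shape).astype(np.float32)

print(f"cosine similarity: {cosine_similarity(out_bc, out_hbm):.6f}")
```

A similarity very close to 1.0 indicates the two runtimes agree; a noticeably lower value is a cue to start accuracy analysis.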

Compared to evaluating xxx.hbm, Horizon recommends first verifying the quantization accuracy of xxx_quantized_model.bc in the Python environment on the dev PC, as it is easier and faster; refer to the section Development Machine Python Environment Verification.

xxx.hbm is evaluated on the board side based on C++ code, refer to the section Development Board C++ Environment Verification. For more detailed accuracy verification and optimization recommendations, refer to the sections Model Accuracy Analysis and PTQ Model Accuracy Optimization.

Development Machine Python Environment Verification

Take the ResNet50 model as an example: for single-image inference and validation-set accuracy evaluation of the quantized model resnet50_224x224_nv12_quantized_model.bc, refer to the scripts 04_inference.sh and 05_evaluate.sh in the sample directory. The reference commands are as follows:

# Test the single-image inference results of the quantized model
bash 04_inference.sh
# Test the single-image inference results of the floating-point model (optional)
bash 04_inference.sh origin
# Test the accuracy of the quantized model; make sure your evaluation dataset is properly mounted in the Docker container
bash 05_evaluate.sh
# Test the accuracy of the floating-point model (optional)
bash 05_evaluate.sh origin

The two scripts will perform the inference by calling ../../cls_inference.py and ../../cls_evaluate.py accordingly. Taking the cls_inference.py file as an example, the usage logic of main interfaces in the code is as follows:

import numpy as np
from typing import Iterable

from horizon_tc_ui import HBRuntime
from preprocess import infer_image_preprocess
from postprocess import postprocess


# For the HBIR model with data type NV12, the NV12 data needs to be
# split into two inputs, y and uv
def nv12_split_yuv(target_size: Iterable, input_shapes: list,
                   input_data: np.ndarray) -> list:
    width, height = target_size
    image = input_data.flatten()
    y_data = image[:width * height].reshape(input_shapes[0])
    uv_data = image[width * height:].reshape(input_shapes[1])
    return [y_data, uv_data]


def inference(sess, image_name, input_layout) -> None:
    if input_layout is None:
        input_layout = sess.layout[0]
    # Preprocessing
    input_names = sess.input_names
    output_names = sess.output_names
    image_data = infer_image_preprocess(image_name, input_layout)
    image_data_processed = nv12_split_yuv(target_size=sess.sess.get_hw(),
                                          input_shapes=sess.input_shapes,
                                          input_data=image_data)
    feed_data = dict(zip(input_names, image_data_processed))
    # Model inference
    output = sess.run(output_names, feed_data)
    # Postprocessing
    top_five_label_probs = postprocess(output)


def main(model, image, input_layout) -> None:
    sess = HBRuntime(model=model)
    inference(sess, image, input_layout)


if __name__ == '__main__':
    main()
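The splitting logic in nv12_split_yuv can be exercised with a synthetic buffer; the sketch below uses plain numpy with illustrative input shapes (the actual shapes come from the model's input properties).

```python
import numpy as np

width, height = 224, 224

# A synthetic NV12 buffer: a full y plane followed by a half-size
# interleaved uv plane, i.e. width * height * 3 / 2 values in total
nv12 = np.zeros(width * height * 3 // 2, dtype=np.uint8)

# Hypothetical HBIR input shapes for the y and uv inputs
y_data = nv12[:width * height].reshape(1, height, width, 1)
uv_data = nv12[width * height:].reshape(1, height // 2, width // 2, 2)

print(y_data.shape, uv_data.shape)
```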

The preprocessing operations of the infer_image_preprocess function come from the preprocess.py file described in the section Pre-processing Calibration Data. Compared to the calibration_transformers function, there is an additional step of transforming the data to input_type_rt (for parameter descriptions, see the section YAML Configuration File). The data preparation for HBIR model inference (including input_type_rt to input_type_train color conversion, mean/scale processing, etc.) is done internally, so there is no need to do additional normalization or other processing externally. The specific code is as follows:

def infer_transformers(input_layout="NHWC"):
    transformers = [
        PILResizeTransformer(size=short_size),
        PILCenterCropTransformer(size=crop_size),
        RGB2NV12Transformer(data_format="HWC")
    ]
    return transformers


def infer_image_preprocess(image_file, input_layout):
    transformers = infer_transformers(input_layout)
    image = SingleImageDataLoader(transformers, image_file, imread_mode='PIL')
    return image

Development Board C++ Environment Verification

On the development board side, Horizon provides the unify compute platform (UCP for short) to help you quickly deploy models, and provides related samples. You can refer to section Model Deployment to learn the basics of model deployment and the BPU SDK API, and then to section AI-Benchmark to learn the complete code framework of the sample model accuracy evaluation.

Model Deployment

The OE package provides a basic sample of model deployment in the following path to facilitate you to learn how to use the Model Inference API interface of UCP. For details on the sample, refer to the section Basic Sample User Guide.

Attention

Please note that before you can deploy the model, you need to obtain the models used on the board.

  • Execute resolve_ai_benchmark_ptq.sh and resolve_ai_benchmark_qat.sh scripts in the samples/ai_toolchain/model_zoo/runtime/ai_benchmark directory.

  • Execute resolve_runtime_sample.sh script in the samples/ai_toolchain/model_zoo/runtime/basic_samples directory.

Among them, the code/00_quick_start/src/main.cc file in the sample directory provides the complete flow of the resnet50 model, from preparing data to model inference and then post-processing to produce classification results.

The full-link sample starting from camera input can be found in samples/ucp_tutorial/all-round.

The main code logic in main.cc includes the following 6 steps. For instructions on the API interfaces involved in the code, refer to the section BPU SDK API DOC.

  1. Load the model and get the model handle.

  2. Prepare the model input and output tensor and apply the corresponding BPU memory space.

  3. Read the model input data and put it into the requested input tensor.

  4. Infer the model and get the model output.

  5. Implement the model post-processing based on the data in the output tensor.

  6. Release the related resources.

int main(int argc, char **argv) {
  // Step1: get model handle
  {
    hbDNNInitializeFromFiles(&packed_dnn_handle, &modelFileName, 1);
    hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
    hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);
  }
  // Step2: prepare input and output tensor
  std::vector<hbDNNTensor> input_tensors;
  std::vector<hbDNNTensor> output_tensors;
  int input_count = 0;
  int output_count = 0;
  {
    hbDNNGetInputCount(&input_count, dnn_handle);
    hbDNNGetOutputCount(&output_count, dnn_handle);
    input_tensors.resize(input_count);
    output_tensors.resize(output_count);
    prepare_tensor(input_tensors.data(), output_tensors.data(), dnn_handle);
  }
  // Step3: set input data to input tensor
  {
    // read a single picture for input_tensor[0], for multi_input model, you
    // should set other input data according to model input properties.
    read_image_2_tensor_as_nv12(FLAGS_image_file, input_tensors.data());
  }
  // Step4: run inference
  {
    // make sure memory data is flushed to DDR before inference
    for (int i = 0; i < input_count; i++) {
      hbUCPMemFlush(&input_tensors[i].sysMem, HB_SYS_MEM_CACHE_CLEAN);
    }
    // generate task handle
    hbDNNInferV2(&task_handle, output, input_tensors.data(), dnn_handle);
    // submit task
    hbUCPSchedParam ctrl_param;
    HB_UCP_INITIALIZE_SCHED_PARAM(&ctrl_param);
    ctrl_param.backend = HB_UCP_BPU_CORE_ANY;
    hbUCPSubmitTask(task_handle, &ctrl_param);
    // wait task done
    hbUCPWaitTaskDone(task_handle, 0);
  }
  // Step5: do postprocess with output data
  std::vector<Classification> top_k_cls;
  {
    // make sure CPU read data from DDR before using output tensor data
    for (int i = 0; i < output_count; i++) {
      hbUCPMemFlush(&output_tensors[i].sysMem, HB_SYS_MEM_CACHE_INVALIDATE);
    }
    get_topk_result(output, top_k_cls, FLAGS_top_k);
    for (int i = 0; i < FLAGS_top_k; i++) {
      LOGI("TOP {} result id: {}", i, top_k_cls[i].id);
    }
  }
  // Step6: release resources
  {
    // release task handle
    hbUCPReleaseTask(task_handle);
    // free input mem
    for (int i = 0; i < input_count; i++) {
      hbUCPFree(&(input_tensors[i].sysMem));
    }
    // free output mem
    for (int i = 0; i < output_count; i++) {
      hbUCPFree(&(output_tensors[i].sysMem));
    }
    // release model
    hbDNNRelease(packed_dnn_handle);
  }
  return 0;
}

The sample can be run as follows:

# Enter the code directory
cd samples/ucp_tutorial/dnn/basic_samples/code
# Resolve dependencies
bash resolve.sh
# In the dev PC environment, perform cross-compilation to generate executable programs
bash build_aarch64.sh
# Copy the runtime directory to the board
mkdir -p ../runtime/model/runtime/
scp -r ../runtime/ root@{board_ip}:/userdata
# Copy the model file to the board
scp -r ../../model_zoo/runtime/basic_samples/resnet50/ root@{board_ip}:/userdata/runtime/model/runtime
# Log in to the dev board
ssh root@{board_ip}
# Go to the runtime/script/ directory and execute the corresponding run script
cd /userdata/runtime/script/00_quick_start/
bash run_resnet_nv12.sh

AI-Benchmark

The OE package also provides sample packages for board-side performance and accuracy evaluation of typical classification, detection, segmentation, and optical flow models under the samples/ai_toolchain/ucp_tutorial/dnn/ai_benchmark path, on top of which you can continue with further application development.

For more details, you can refer to the section AI Benchmark User Guide.

Application Development

When you are satisfied with the performance and accuracy of the model, you can then continue with the development of the upper-level application by referring to the steps described in the section Embedded Application Development.