The Practical Guide to Deploying the ResNet18 Model with RGB Input

The overall PTQ pipeline of the Horizon OpenExplorer toolchain includes several phases: model optimization, model calibration, conversion to a fixed-point model, model compilation, and on-board deployment. This section takes an RGB-input classification model based on the public ResNet18 as a sample (on the S100 platform) and demonstrates the deployment practice step by step for your reference.

Prepare the Floating Point Model

To prepare the ResNet18 floating-point model, we use torchvision to export the desired model:

prepare_model.py
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
output_path = "resnet18.onnx"
torch.onnx.export(model,
                  input_data,
                  output_path,
                  input_names=["input"],
                  output_names=["output"],
                  opset_version=10)

Calibration Set Preparation

Information about the public ResNet18 model can be found in the ResNet18 description in the PyTorch documentation, which shows that the data preprocessing flow for the ResNet18 model is:

  1. Resize the image so that its short side is 256.
  2. Center-crop the resized image to 224x224.
  3. Normalize the data with mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225].

A sample data preprocessing code is shown below:

data_preprocess.py
import os
import numpy as np
from PIL import Image

ori_dataset_dir = "./calibration_data/imagenet"
calibration_dir = "./calibration_data_rgb"


def resize_transformer(image_data: np.ndarray, short_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    # PIL size is (width, height)
    w, h = image.size
    if (w <= h and w == short_size) or (h <= w and h == short_size):
        return np.array(image)
    if w < h:
        # I.e., the width of the image is the short side
        resize_size = (short_size, int(short_size * h / w))
    else:
        # I.e., the height of the image is the short side
        resize_size = (int(short_size * w / h), short_size)
    # Resize the image
    data = np.array(image.resize(resize_size, Image.BILINEAR))
    return data


def center_crop_transformer(image_data: np.ndarray, crop_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    image_width, image_height = image.size
    crop_height, crop_width = (crop_size, crop_size)
    crop_top = int(round((image_height - crop_height) / 2.))
    crop_left = int(round((image_width - crop_width) / 2.))
    image_data = image.crop(
        (crop_left, crop_top, crop_left + crop_width, crop_top + crop_height))
    return np.array(image_data).astype(np.float32)


os.makedirs(calibration_dir, exist_ok=True)
for image_name in os.listdir(ori_dataset_dir):
    image_path = os.path.join(ori_dataset_dir, image_name)
    # Load the image with PIL
    pil_image_data = Image.open(image_path).convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    # Resize the image
    image_data = resize_transformer(image_data, 256)
    # Crop the image
    image_data = center_crop_transformer(image_data, 224)
    # Adjust the data range from [0, 255] to [0, 1]
    image_data = image_data * (1 / 255)
    # Normalization: (data - mean) / std
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    image_data = (image_data - mean) / std
    # Convert format from HWC to CHW
    image_data = np.transpose(image_data, (2, 0, 1)).astype(np.float32)
    # Convert format from CHW to NCHW
    image_data = image_data[np.newaxis, :]
    # Save the npy file
    cali_file_path = os.path.join(calibration_dir, image_name[:-5] + ".npy")
    np.save(cali_file_path, image_data)

To support PTQ model calibration, we need a small batch of data from the ImageNet dataset; here we use the first 100 validation images as a sample:

./imagenet
├── ILSVRC2012_val_00000001.JPEG
├── ILSVRC2012_val_00000002.JPEG
├── ILSVRC2012_val_00000003.JPEG
├── ......
├── ILSVRC2012_val_00000099.JPEG
└── ILSVRC2012_val_00000100.JPEG
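A small helper can be sketched to assemble such a subset (the source directory name in the commented call is an assumption; point it at your full ImageNet validation set):

```python
import os
import shutil


def make_calibration_subset(src_dir, dst_dir, count=100):
    """Copy the first `count` images (sorted by file name) into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    names = sorted(os.listdir(src_dir))[:count]
    for name in names:
        shutil.copy(os.path.join(src_dir, name), dst_dir)
    return len(names)


# Hypothetical source path; adjust to where the validation set lives
# make_calibration_subset("./ILSVRC2012_val", "./imagenet")
```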

The directory structure of the calibration set generated by the data preprocessing code above is as follows:

./calibration_data_rgb
├── ILSVRC2012_val_00000001.npy
├── ILSVRC2012_val_00000002.npy
├── ILSVRC2012_val_00000003.npy
├── ......
├── ILSVRC2012_val_00000099.npy
└── ILSVRC2012_val_00000100.npy

Generate Board-side Model

The PTQ conversion flow supports both command-line tools and the PTQ API for quantizing and compiling a model into a board-side model. The two approaches are introduced below.

Command-line Tool

The command-line approach only requires you to install horizon_tc_ui (pre-installed in the Docker environment) and create a YAML file based on the model configuration. Here we use the YAML file for the RGB-input ResNet18 model (config.yaml) as an example:

config.yaml
model_parameters:
  onnx_model: 'resnet18.onnx'
  march: "nash-e"
  working_dir: 'model_output'
  output_model_file_prefix: 'resnet18_224x224_rgb'
input_parameters:
  input_name: ''
  input_shape: ''
  input_type_rt: 'featuremap'
  input_type_train: 'featuremap'
calibration_parameters:
  cal_data_dir: './calibration_data_rgb'
compiler_parameters:
  optimize_level: 'O2'
Note

Here, input_name and input_shape may be left empty because the tool supports this for a single-input model without dynamic shapes (i.e., the tool internally parses the ONNX model to obtain the input's name and shape).

When the YAML configuration is complete, simply call the hb_compile tool to execute the conversion. The key log output of the command is shown below:

[horizon@xxx xxx]$ hb_compile -c config.yaml
Start hb_compile...
INFO Start verifying yaml
INFO End verifying yaml
INFO Start to Horizon NN Model Convert.
INFO Start to prepare the onnx model.
INFO End to prepare the onnx model.
INFO Start to optimize the onnx model.
INFO End to optimize the onnx model.
INFO Start to calibrate the model.
INFO End to calibrate the model.
INFO Start to precompile the model.
INFO End to precompile the model.
INFO End to Horizon NN Model Convert.
INFO Successful covert model: /xxx/resnet18_224x224_rgb_quantized_model.bc
[==================================================]100%
INFO ############# Model input/output info #############
INFO NAME    TYPE    SHAPE             DATA_TYPE
INFO ------  ------  ----------------  ---------
INFO input   input   (1, 3, 224, 224)  float32
INFO output  output  (1, 1000)         float32
INFO The hb_compile completes running

After the command finishes, the directory configured by the working_dir parameter in the YAML file (model_output) contains the intermediate models of each stage, the final board-side model, and the model information files. Among them, resnet18_224x224_rgb.hbm is the model file used for board-side inference:

./model_output
├── ...
├── resnet18_224x224_rgb_calibrated_model.onnx
├── resnet18_224x224_rgb.hbm
├── resnet18_224x224_rgb_optimized_float_model.onnx
├── resnet18_224x224_rgb_original_float_model.onnx
├── resnet18_224x224_rgb_ptq_model.onnx
└── resnet18_224x224_rgb_quantized_model.bc

PTQ API

The command-line tool is easy to use but offers less flexibility. When you need that flexibility, you can use the PTQ API to quantize and compile the model. The following describes how to generate the board-side model with the API.

Attention

Please note that because some interfaces have many parameters, only the necessary parameters are configured in the samples below to simplify the end-to-end walkthrough. Please refer to the HMCT API Reference and the HBDK Tool API Reference for the full parameters of each interface.

Model Optimization and Calibration

First, graph optimization and calibration quantization are performed on the floating-point model. For this process we use the HMCT API, as shown below:

calibration.py
import os
import logging
import numpy as np
from hmct.api import build_model

logging.basicConfig(level=logging.INFO)

march = "nash"
onnx_path = "./resnet18.onnx"
cali_data_dir = "./calibration_data_rgb"
model_name = "resnet18_224x224_rgb"
working_dir = "./model_output/"

cali_data = []
for cali_data_name in os.listdir(cali_data_dir):
    data_path = os.path.join(cali_data_dir, cali_data_name)
    cali_data.append(np.load(data_path))

ptq_params = {
    'cali_dict': {
        'calibration_data': {
            'input': cali_data
        }
    },
    'debug_methods': [],
    'output_nodes': []
}

if not os.path.exists(working_dir):
    os.mkdir(working_dir)

build_model(onnx_file=onnx_path,
            march=march,
            name_prefix=working_dir + model_name,
            **ptq_params)

After build_model executes successfully, the ONNX model for each phase is generated in the working_dir directory, with the following structure:

./model_output
├── resnet18_224x224_rgb_calibrated_model.onnx
├── resnet18_224x224_rgb_optimized_float_model.onnx
├── resnet18_224x224_rgb_original_float_model.onnx
├── resnet18_224x224_rgb_ptq_model.onnx
└── resnet18_224x224_rgb_quant_info.json

The *_ptq_model.onnx file here is the ONNX model after graph optimization and calibration. For a description of the intermediate-stage ONNX models, please refer to the section Post-Training Quantization(PTQ) - PTQ Conversion Steps - Model Quantization and Compilation - Interpret Conversion Output.

Model Conversion to Fixed Point and Compilation

Next, we need to convert the PTQ model into a fixed-point model and compile it. This is done through the compiler's API, as shown below:

compile.py
import os
import onnx
from hbdk4.compiler.onnx import export
from hbdk4.compiler import convert, compile

march = "nash-e"
working_dir = "./model_output/"
model_name = "resnet18_224x224_rgb"
ptq_onnx_path = "./model_output/resnet18_224x224_rgb_ptq_model.onnx"

if not os.path.exists(working_dir):
    os.mkdir(working_dir)

ptq_onnx = onnx.load(ptq_onnx_path)
ptq_model = export(proto=ptq_onnx, name=model_name)
quantized_model = convert(m=ptq_model, march=march)
compile(m=quantized_model,
        path=working_dir + model_name + ".hbm",
        march=march,
        progress_bar=True)

After compilation, the working_dir directory holds the intermediate-stage models and the final model file that can be used on the board, with the following structure:

./model_output
├── resnet18_224x224_rgb_calibrated_model.onnx
├── resnet18_224x224_rgb.hbm
├── resnet18_224x224_rgb_optimized_float_model.onnx
├── resnet18_224x224_rgb_original_float_model.onnx
├── resnet18_224x224_rgb_ptq_model.onnx
└── resnet18_224x224_rgb_quant_info.json

Visualization

After generating the required hbm model, you can inspect it with the hb_model_info and hrt_model_exec tools. Reference commands:

  • Using hb_model_info
hb_model_info -v resnet18_224x224_rgb.hbm
  • Using hrt_model_exec
hrt_model_exec model_info --model_file resnet18_224x224_rgb.hbm

Building Board-side Sample

  1. Prepare the dependency libraries for the board-side sample.

To build the board-side sample as quickly as possible, we recommend using the samples/ucp_tutorial/deps_aarch64 directory from the OE package directly as the dependency libraries. The key header files and dynamic libraries that the board-side sample depends on are listed below:

./deps_aarch64
├── ......
└── ucp
    ├── include
    │   └── hobot
    │       ├── dnn
    │       │   ├── hb_dnn.h
    │       │   ├── hb_dnn_status.h
    │       │   └── hb_dnn_v1.h
    │       ├── ......
    │       ├── hb_sys.h
    │       ├── hb_ucp.h
    │       ├── hb_ucp_status.h
    │       └── hb_ucp_sys.h
    └── lib
        ├── ......
        ├── libdnn.so
        └── libhbucp.so
  2. Board-side sample development

The following sample shows how to run one inference with the board-side model, using a binary file as input, and obtain the TOP1 classification result.

main.cc
#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

#include "hobot/dnn/hb_dnn.h"
#include "hobot/hb_ucp.h"
#include "hobot/hb_ucp_sys.h"

const char *hbm_path = "resnet18_224x224_rgb.hbm";
std::string data_path = "input.bin";

// Read binary input file
int read_binary_file(std::string file_path, char **bin, int *length) {
  std::ifstream ifs(file_path, std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  *length = ifs.tellg();
  ifs.seekg(0, std::ios::beg);
  *bin = new char[sizeof(char) * (*length)];
  ifs.read(*bin, *length);
  ifs.close();
  return 0;
}

int main() {
  // Get model handle
  hbDNNPackedHandle_t packed_dnn_handle;
  hbDNNHandle_t dnn_handle;
  hbDNNInitializeFromFiles(&packed_dnn_handle, &hbm_path, 1);
  const char **model_name_list;
  int model_count = 0;
  hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
  hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);

  // Prepare input and output tensors
  std::vector<hbDNNTensor> input_tensors;
  std::vector<hbDNNTensor> output_tensors;
  input_tensors.resize(1);   // This model has only one input
  output_tensors.resize(1);  // This model has only one output

  // Initialize and allocate the input tensor
  hbDNNTensor input = input_tensors[0];
  hbDNNGetInputTensorProperties(&input.properties, dnn_handle, 0);
  int input_memSize = input.properties.alignedByteSize;
  hbUCPMallocCached(&input.sysMem, input_memSize, 0);

  // Initialize and allocate the output tensor
  hbDNNTensor output = output_tensors[0];
  hbDNNGetOutputTensorProperties(&output.properties, dnn_handle, 0);
  int output_memSize = output.properties.alignedByteSize;
  hbUCPMallocCached(&output.sysMem, output_memSize, 0);

  // Copy binary input data to the input tensor
  int32_t data_length = 0;
  char *data = nullptr;
  auto ret = read_binary_file(data_path, &data, &data_length);
  memcpy(reinterpret_cast<char *>(input.sysMem.virAddr), data,
         input.sysMem.memSize);
  hbUCPMemFlush(&(input.sysMem), HB_SYS_MEM_CACHE_CLEAN);

  // Submit the task and wait until it completes
  hbUCPTaskHandle_t task_handle{nullptr};
  hbDNNInferV2(&task_handle, &output, &input, dnn_handle);
  hbUCPSchedParam ctrl_param;
  HB_UCP_INITIALIZE_SCHED_PARAM(&ctrl_param);
  ctrl_param.backend = HB_UCP_BPU_CORE_ANY;
  hbUCPSubmitTask(task_handle, &ctrl_param);
  hbUCPWaitTaskDone(task_handle, 0);

  // Parse the inference result and calculate TOP1
  hbUCPMemFlush(&output.sysMem, HB_SYS_MEM_CACHE_INVALIDATE);
  auto result = reinterpret_cast<float *>(output.sysMem.virAddr);
  float max_score = 0.0;
  int label = -1;
  for (auto i = 0; i < 1000; i++) {
    float score = result[i];
    if (score > max_score) {
      label = i;
      max_score = score;
    }
  }
  std::cout << "label: " << label << std::endl;
  delete[] data;
  return 0;
}
  3. Cross-compile to generate the board-side executable program

Before cross-compiling, you need to prepare CMakeLists.txt and the sample files. The CMakeLists.txt content is shown below. Because the sample contains no data preprocessing or other such operations, it has few dependencies; the file mainly configures the GCC compilation parameters, dependency headers, and dynamic libraries. Here, dnn is the board-side inference library and hbucp is used for tensor operations.

CMakeLists.txt
# CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(sample)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wl,-unresolved-symbols=ignore-in-shared-libs")
message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
set(CMAKE_C_FLAGS_DEBUG "-g -O0")
set(CMAKE_CXX_FLAGS_RELEASE " -O3 ")
set(CMAKE_C_FLAGS_RELEASE " -O3 ")
set(CMAKE_BUILD_TYPE ${build_type})

set(DEPS_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/deps_aarch64)
include_directories(${DEPS_ROOT}/ucp/include)
link_directories(${DEPS_ROOT}/ucp/lib)

add_executable(run_sample src/main.cc)
target_link_libraries(run_sample dnn hbucp)

The environment directory structure for compilation is as follows:

.
├── CMakeLists.txt
├── deps_aarch64
│   └── ucp
│       ├── include
│       └── lib
└── src
    └── main.cc

When the sample files and CMakeLists.txt are ready, you can compile. A sample compile script is shown below:

Attention

Note that the compilation script must set CC and CXX to the actual paths of the cross-compilation GCC and G++.

#!/usr/bin/env bash
# Note: please configure according to the actual path
export CC=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
export CXX=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
rm -rf arm_build; mkdir arm_build; cd arm_build
cmake ..; make -j8
cd ..

Once compiled, the board-ready run_sample binary program is generated. At this point, the board-side sample build process is complete.

Preparation for Board-side Operation

When the executable program is compiled, the inputs to the model need to be prepared. Since the calibration set for this hands-on tutorial already matches the model input, we can use the calibration data directly as model input. Alternatively, you can modify the board-side program to apply the calibration set's preprocessing logic before feeding the model; in that case, note that the modified program must apply exactly the same preprocessing to the original image as was done during calibration.

Here we simply convert the calibration set in npy format to a binary file, as shown in the following sample:

input_data.py
import numpy as np

data = np.load("calibration_data_rgb/ILSVRC2012_val_00000001.npy")
data.tofile("input.bin")
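Note that tofile writes only the raw float32 bytes, with no shape or dtype metadata, so the board-side program relies on the model's known input layout. A quick size sanity check can be sketched as follows (using random stand-in data in place of the calibration npy):

```python
import os
import numpy as np

# Stand-in for the calibration npy (same shape/dtype as the model input)
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
data.tofile("input.bin")

# float32 = 4 bytes per element; the file holds exactly the raw tensor bytes
expected_bytes = 1 * 3 * 224 * 224 * 4
assert os.path.getsize("input.bin") == expected_bytes
```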

After preparing the model input data, i.e., correctly generating the binary input file for board-side sample inference, make sure you also have the following ready:

  • An S100 development board for actually running the board-side program.

  • The model (*.hbm) that can be used for board-side inference, the output of Generate Board-side Model.

  • The board-side program (the main.cc file and the cross-compiled board-side executable), the output of Building Board-side Sample.

  • The libraries the board-side program depends on; to reduce deployment cost, you can directly use the contents of the samples/ucp_tutorial/deps_aarch64/ucp/lib folder in the OE package.

Once ready, we gather the model file (*.hbm), input data (*.bin), board-side program, and dependency libraries above into one folder, with the following reference directory structure:

horizon
├── input.bin
├── lib
├── resnet18_224x224_rgb.hbm
└── run_sample

Copy this folder as a whole to the board-side environment; reference command:

scp -r horizon/ root@{board_ip}:/map/

Board-side Execution

Finally, you can configure LD_LIBRARY_PATH and run the program as follows:

horizon@hobot:/map/horizon# export LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH
horizon@hobot:/map/horizon# ./run_sample
......
label: 65

As you can see, the label: 65 printed in the log is exactly the label of the ILSVRC2012_val_00000001 image in the ImageNet dataset, i.e., the classification result is correct.

This concludes the full process of practicing the PTQ deployment of the ResNet18 model with RGB input.