The Practical Guide to Deploying the ResNet18 Model with Resizer Input

The overall PTQ pipeline of the Horizon OpenExplorer toolchain includes multiple phases: model optimization, model calibration, conversion to a fixed-point model, model compilation, and board-side deployment. This section takes a resizer-input classification model based on the public version of ResNet18 as an example (S100 platform) and demonstrates the deployment practice step by step for your reference.

Prepare the Floating Point Model

To prepare the ResNet18 floating-point model, we use torchvision to export the desired ONNX model:

prepare_model.py
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
input_shape = (1, 3, 224, 224)
input_data = torch.randn(input_shape)
output_path = "resnet18.onnx"
torch.onnx.export(model,
                  input_data,
                  output_path,
                  input_names=["input"],
                  output_names=["output"],
                  opset_version=10)

Calibration Set Preparation

Information about the public version of the ResNet18 model can be found in the ResNet18 description within the PyTorch documentation, from which it can be seen that the data preprocessing flow for the ResNet18 model is:

  1. Resize the image so that its short side is 256.
  2. Center-crop the image to 224x224.
  3. Normalize the data with mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225].
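The target size in step 1 can be computed from the original width and height; a minimal sketch of that arithmetic (the helper name is ours, not part of the toolchain):

```python
def short_side_resize_size(w: int, h: int, short_size: int = 256):
    """Return the (width, height) that scales the short side to short_size,
    keeping the aspect ratio."""
    if w < h:  # width is the short side
        return (short_size, int(short_size * h / w))
    # height is the short side (or the image is square)
    return (int(short_size * w / h), short_size)
```

For example, a 400x300 image maps to 341x256, so the subsequent 224x224 center crop always fits.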

A sample data preprocessing code is shown below:

data_preprocess.py
import os
import cv2
import PIL
import numpy as np
from PIL import Image

ori_dataset_dir = "./calibration_data/imagenet"
calibration_dir = "./calibration_data_rgb"


def resize_transformer(image_data: np.array, short_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    # Specify width, height
    w, h = image.size
    if (w <= h and w == short_size) or (h <= w and h == short_size):
        return np.array(image)
    # I.e., the width of the image is the short side
    if w < h:
        resize_size = (short_size, int(short_size * h / w))
    # I.e., the height of the image is the short side
    else:
        resize_size = (int(short_size * w / h), short_size)
    # Resize the image
    data = np.array(image.resize(resize_size, Image.BILINEAR))
    return data


def center_crop_transformer(image_data: np.array, crop_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    image_width, image_height = image.size
    crop_height, crop_width = (crop_size, crop_size)
    crop_top = int(round((image_height - crop_height) / 2.))
    crop_left = int(round((image_width - crop_width) / 2.))
    image_data = image.crop(
        (crop_left, crop_top, crop_left + crop_width, crop_top + crop_height))
    return np.array(image_data).astype(np.float32)


os.mkdir(calibration_dir)
for image_name in os.listdir(ori_dataset_dir):
    image_path = os.path.join(ori_dataset_dir, image_name)
    # load the image with PIL method
    pil_image_data = PIL.Image.open(image_path).convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    # Resize the image
    image_data = resize_transformer(image_data, 256)
    # Crop the image
    image_data = center_crop_transformer(image_data, 224)
    # Adjust the data range from [0, 255] to [0, 1]
    image_data = image_data * (1 / 255)
    # Normalization, (data - mean) / std
    mean = [0.485, 0.456, 0.406]
    image_data = image_data - mean
    std = [0.229, 0.224, 0.225]
    image_data = image_data / std
    # Convert format from HWC to CHW
    image_data = np.transpose(image_data, (2, 0, 1)).astype(np.float32)
    # Convert format from CHW to NCHW
    image_data = image_data[np.newaxis, :]
    # Save the npy file
    cali_file_path = os.path.join(calibration_dir, image_name[:-5] + ".npy")
    np.save(cali_file_path, image_data)

To support PTQ model calibration, we need a small batch of data taken from the ImageNet dataset; here we use the first 100 images as a sample:

./imagenet
├── ILSVRC2012_val_00000001.JPEG
├── ILSVRC2012_val_00000002.JPEG
├── ILSVRC2012_val_00000003.JPEG
├── ......
├── ILSVRC2012_val_00000099.JPEG
└── ILSVRC2012_val_00000100.JPEG

The directory structure of the calibration set generated by the data preprocessing code above is as follows:

./calibration_data_rgb
├── ILSVRC2012_val_00000001.npy
├── ILSVRC2012_val_00000002.npy
├── ILSVRC2012_val_00000003.npy
├── ......
├── ILSVRC2012_val_00000099.npy
└── ILSVRC2012_val_00000100.npy

Generate Board-side Model

The PTQ conversion stage supports both a command-line tool and the PTQ API for quantizing and compiling the model into a board-side model. The use of the two approaches is introduced below.

Command-line Tool

The command-line tool approach only requires you to install horizon_tc_ui (pre-installed in the Docker environment) and create the corresponding yaml file based on the model configuration. Here we show and explain the yaml file corresponding to the ResNet18 model with resizer input (config.yaml).

config.yaml
model_parameters:
  onnx_model: 'resnet18.onnx'
  march: "nash-e"
  working_dir: 'model_output'
  output_model_file_prefix: 'resnet18_224x224_nv12_resizer'
input_parameters:
  input_name: ''
  input_shape: ''
  input_type_rt: 'nv12'
  input_type_train: 'rgb'
  input_layout_train: 'NCHW'
  norm_type: 'data_mean_and_scale'
  # Formula with [0.485 * 255, 0.456 * 255, 0.406 * 255]
  mean_value: "123.675 116.28 103.53"
  # Formula with [1 / (0.229*255), 1 / (0.224*255), 1 / (0.225*255)]
  scale_value: "0.01712475 0.017507 0.01742919"
calibration_parameters:
  cal_data_dir: './calibration_data_rgb'
compiler_parameters:
  optimize_level: 'O2'
  input_source:
    input: resizer
Note

Here, input_name and input_shape are left empty because the tool supports this for single-input models without dynamic shapes (i.e., the tool internally parses the ONNX model and obtains the name and shape of the input).
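The mean_value and scale_value entries in the yaml fold the float-domain normalization (x/255 - mean)/std into a single affine transform in the uint8 domain; the configured numbers can be reproduced with a few lines of arithmetic:

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# mean in the uint8 domain: mean * 255
mean_value = [m * 255 for m in mean]
# scale in the uint8 domain: 1 / (std * 255)
scale_value = [1 / (s * 255) for s in std]

print(" ".join(f"{v:g}" for v in mean_value))    # 123.675 116.28 103.53
print(" ".join(f"{v:.8f}" for v in scale_value))
```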

When the yaml file configuration is complete, you just need to call the hb_compile tool to execute the conversion; the key log output of the tool is as follows:

[horizon@xxx xxx]$ hb_compile -c config.yaml
INFO Start hb_compile...
INFO Start verifying yaml
INFO End verifying yaml
INFO Start to Horizon NN Model Convert.
INFO Start to prepare the onnx model.
INFO End to prepare the onnx model.
INFO Start to optimize the onnx model.
INFO End to optimize the onnx model.
INFO Start to calibrate the model.
INFO End to calibrate the model.
INFO Start to precompile the model.
INFO End to precompile the model.
INFO End to Horizon NN Model Convert.
INFO Successful covert model: /xxx/resnet18_224x224_nv12_resizer_quantized_model.bc
[==================================================]100%
INFO ############# Model input/output info #############
INFO NAME       TYPE    SHAPE               DATA_TYPE
INFO ---------  ------  ------------------  ---------
INFO input_y    input   [1, None, None, 1]  UINT8
INFO input_uv   input   [1, None, None, 2]  UINT8
INFO input_roi  input   [1, 4]              INT32
INFO output     output  [1, 1000]           FLOAT32
INFO The hb_compile completes running

After the command completes, the directory configured by the working_dir parameter in the yaml file (model_output) will contain the intermediate models of each stage, the final board-side model, and the model information files, as shown below. Among them, resnet18_224x224_nv12_resizer.hbm is the model file that can be used for board-side inference:

./model_output
├── resnet18_224x224_nv12_resizer_advice.json
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer.hbm
├── resnet18_224x224_nv12_resizer.html
├── resnet18_224x224_nv12_resizer.json
├── resnet18_224x224_nv12_resizer_node_info.csv
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
├── resnet18_224x224_nv12_resizer_quant_info.json
└── resnet18_224x224_nv12_resizer_quantized_model.bc

PTQ API

The command-line tool offers high ease of use at the cost of some flexibility. Therefore, when you need more flexibility, you can use the PTQ API to complete the quantization and compilation of the model. The following introduces the specific process of generating the board-side model via the API.

Attention

Please note that, due to the large number of parameters in some interfaces, only the necessary parameters are configured in the samples below to facilitate your overall practice verification. Please refer to the HMCT API Reference and HBDK Tool API Reference for the full parameters of each interface.

Model Optimization and Calibration

First, graph optimization and calibration quantization are performed on the floating-point model. For this process we use the HMCT API, as shown below:

calibration.py
import os
import logging
import numpy as np
from hmct.api import build_model

logging.basicConfig(level=logging.INFO)

march = "nash"
onnx_path = "./resnet18.onnx"
cali_data_dir = "./calibration_data_rgb"
model_name = "resnet18_224x224_nv12_resizer"
working_dir = "./model_output/"

cali_data = []
for cali_data_name in os.listdir(cali_data_dir):
    data_path = os.path.join(cali_data_dir, cali_data_name)
    cali_data.append(np.load(data_path))

ptq_params = {
    'cali_dict': {
        'calibration_data': {
            'input': cali_data
        }
    },
    'debug_methods': [],
    'output_nodes': []
}

if not os.path.exists(working_dir):
    os.mkdir(working_dir)

build_model(onnx_file=onnx_path,
            march=march,
            name_prefix=working_dir + model_name,
            **ptq_params)

After build_model executes successfully, the ONNX model for each phase is generated in the working_dir directory, which has the following structure:

./model_output
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
└── resnet18_224x224_nv12_resizer_quant_info.json

The *ptq_model.onnx file here is the ONNX model file after the graph optimization and calibration process. For a specific description of the ONNX model in the intermediate stages, please refer to the section Post-Training Quantization(PTQ) - PTQ Conversion Steps - Model Quantization and Compilation - Interpret Conversion Output.

Fixed-Point Conversion and Compilation

Next, we need to convert the PTQ model to a fixed-point model and compile it. This process is completed through the compiler's API, as in the following sample:

compile.py
import os
import onnx
from hbdk4.compiler.onnx import export
from hbdk4.compiler import convert, compile

march = "nash-e"
working_dir = "./model_output/"
model_name = "resnet18_224x224_nv12_resizer"
ptq_onnx_path = "./model_output/resnet18_224x224_nv12_resizer_ptq_model.onnx"

if not os.path.exists(working_dir):
    os.mkdir(working_dir)

# load onnx model
ptq_onnx = onnx.load(ptq_onnx_path)
# Convert onnx model to hbir model
ptq_model = export(proto=ptq_onnx, name=model_name)

func = ptq_model.functions[0]
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
# Convert format from NCHW to NHWC
func.inputs[0].insert_transpose([0, 3, 1, 2])
# Insert node for color conversion and normalization
func.inputs[0].insert_image_preprocess(mode="yuvbt601full2rgb",
                                       divisor=255,
                                       mean=mean,
                                       std=std,
                                       is_signed=True)
# Insert node for conversion from nv12 to yuv444
func.inputs[0].insert_roi_resize(mode="nv12")
# Convert type from float to int
quantized_model = convert(m=ptq_model, march=march)
compile(m=quantized_model,
        path=working_dir + model_name + ".hbm",
        march=march,
        progress_bar=True)

After compilation, the working_dir directory will hold the intermediate stage and final model files that can be used on the board, with the following directory structure:

./model_output
├── resnet18_224x224_nv12_resizer_calibrated_model.onnx
├── resnet18_224x224_nv12_resizer.hbm
├── resnet18_224x224_nv12_resizer_optimized_float_model.onnx
├── resnet18_224x224_nv12_resizer_original_float_model.onnx
├── resnet18_224x224_nv12_resizer_ptq_model.onnx
└── resnet18_224x224_nv12_resizer_quant_info.json

Visualization

After generating the required hbm model, you can inspect it with the hb_model_info and hrt_model_exec tools; reference commands are as follows:

  • Using the hb_model_info
hb_model_info -v resnet18_224x224_nv12_resizer.hbm
  • Using the hrt_model_exec
hrt_model_exec model_info --model_file resnet18_224x224_nv12_resizer.hbm

Building Board-side Sample

  1. Prepare the dependency libraries for the board-side sample.

To build the board-side sample as quickly as possible, we recommend that you directly use the samples/ucp_tutorial/deps_aarch64 directory from the OE package as the dependency libraries. The key header files and dynamic libraries that the board-side sample depends on are listed below:

./deps_aarch64
├── ......
└── ucp
    ├── include
    │   └── hobot
    │       ├── dnn
    │       │   ├── hb_dnn.h
    │       │   ├── hb_dnn_status.h
    │       │   └── hb_dnn_v1.h
    │       ├── ......
    │       ├── hb_sys.h
    │       ├── hb_ucp.h
    │       ├── hb_ucp_status.h
    │       └── hb_ucp_sys.h
    └── lib
        ├── ......
        ├── libdnn.so
        └── libhbucp.so
  2. Board-side sample development

The following sample shows the process of completing one board-side model inference and obtaining the TOP1 classification result, based on binary file input and the board-side model.

Note

You can refer to the read_image_2_nv12 function in the example code below for how to prepare the input y and uv for the resizer model.
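The y/uv binary files must respect the 32-byte row alignment that the sample's ALIGN_32 macro enforces. The same arithmetic in Python, useful for sizing the buffers offline (the helper name is ours):

```python
def align_32(value: int) -> int:
    """Round value up to the next multiple of 32 (the y/uv row-stride requirement)."""
    return (value + 31) & ~31


input_h, input_w = 224, 224
w_stride = align_32(input_w)             # 224 is already 32-aligned
y_mem_size = input_h * w_stride          # bytes for the y plane
uv_mem_size = (input_h // 2) * w_stride  # bytes for the interleaved uv plane
```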

main.cc
#include <fstream>
#include <cstring>
#include <iostream>
#include <map>
#include <vector>

#include "hobot/dnn/hb_dnn.h"
#include "hobot/hb_ucp.h"
#include "hobot/hb_ucp_sys.h"

const char *model_file = "resnet18_224x224_nv12_resizer.hbm";
std::string data_y_path = "ILSVRC2012_val_00000001_y.bin";
std::string data_uv_path = "ILSVRC2012_val_00000001_uv.bin";

typedef struct Roi {
  int32_t left;
  int32_t top;
  int32_t right;
  int32_t bottom;
} Roi;

int read_image_2_nv12(std::string &y_path, std::string &uv_path,
                      std::vector<hbUCPSysMem> &image_mem, int &input_h,
                      int &input_w);

int prepare_roi_mem(const std::vector<Roi> &rois,
                    std::vector<hbUCPSysMem> &roi_mem);

int prepare_image_tensor(const std::vector<hbUCPSysMem> &image_mem,
                         int input_h, int input_w, hbDNNHandle_t dnn_handle,
                         std::vector<hbDNNTensor> &input_tensor);

int read_binary_file(std::string file_path, char **bin, int *length);

/**
 * prepare roi tensor
 * @param[in] roi_mem: roi mem info
 * @param[in] dnn_handle: dnn handle
 * @param[in] roi_tensor_id: tensor id of roi input in model
 * @param[out] roi_tensor: roi tensor
 */
int prepare_roi_tensor(const hbUCPSysMem *roi_mem, hbDNNHandle_t dnn_handle,
                       int32_t roi_tensor_id, hbDNNTensor *roi_tensor);

/**
 * prepare output tensor
 * @param[in] dnn_handle: dnn handle
 * @param[out] output: output tensor
 */
int prepare_output_tensor(hbDNNHandle_t dnn_handle,
                          std::vector<hbDNNTensor> &output);

int main(int argc, char **argv) {
  // load model
  hbDNNPackedHandle_t packed_dnn_handle;
  hbDNNHandle_t dnn_handle;
  const char **model_name_list;
  int model_count = 0;
  // Step1: get model handle
  hbDNNInitializeFromFiles(&packed_dnn_handle, &model_file, 1);
  hbDNNGetModelNameList(&model_name_list, &model_count, packed_dnn_handle);
  hbDNNGetModelHandle(&dnn_handle, packed_dnn_handle, model_name_list[0]);

  // Step2: set input data to nv12
  // In the sample, every task uses the same image, so one memory block can be
  // allocated and reused. image_mems saves the y and uv image data.
  std::vector<hbUCPSysMem> image_mems(2);
  // image input size
  int input_h = 224;
  int input_w = 224;
  read_image_2_nv12(data_y_path, data_uv_path, image_mems, input_h, input_w);

  // Step3: prepare roi mem
  /**
   * Suppose 2 roi tasks are to be inferred; the number of ROIs to be
   * prepared is then also 2.
   */
  // left = 0, top = 0, right = 223, bottom = 223
  Roi roi_1 = {0, 0, 223, 223};
  // left = 1, top = 1, right = 223, bottom = 223
  Roi roi_2 = {1, 1, 223, 223};
  std::vector<Roi> rois;
  rois.push_back(roi_1);
  rois.push_back(roi_2);
  int roi_num = 2;
  std::vector<hbUCPSysMem> roi_mems(2);
  prepare_roi_mem(rois, roi_mems);

  // Step4: prepare input and output tensor
  std::vector<std::vector<hbDNNTensor>> input_tensors(roi_num);
  std::vector<std::vector<hbDNNTensor>> output_tensors(roi_num);
  for (int i = 0; i < roi_num; ++i) {
    // prepare input tensor
    int input_count = 0;
    hbDNNGetInputCount(&input_count, dnn_handle);
    input_tensors[i].resize(input_count);
    // prepare image tensor
    /** Tips:
     * In the sample, all tasks use the same image, so one memory block is
     * allocated for the image and all input tensors reuse it. If your model
     * has different input images, please allocate separate memory for each
     * input.
     */
    prepare_image_tensor(image_mems, input_h, input_w, dnn_handle,
                         input_tensors[i]);
    auto roi_tensor_id = 2;
    prepare_roi_tensor(&roi_mems[i], dnn_handle, roi_tensor_id,
                       &input_tensors[i][roi_tensor_id]);
    // prepare output tensor
    int output_count = 0;
    hbDNNGetOutputCount(&output_count, dnn_handle);
    output_tensors[i].resize(output_count);
    prepare_output_tensor(dnn_handle, output_tensors[i]);
  }

  // Step5: run inference
  hbUCPTaskHandle_t task_handle{nullptr};
  /** Tips:
   * In the sample, multiple tasks are submitted at the same time:
   * when task_handle is nullptr, a new task is created;
   * when task_handle is created but not yet submitted, the new task is
   * attached to the previous one, which represents a multi-model task.
   */
  for (int i = 0; i < roi_num; ++i) {
    hbDNNInferV2(&task_handle, output_tensors[i].data(),
                 input_tensors[i].data(), dnn_handle);
  }
  // submit multi tasks
  hbUCPSchedParam infer_ctrl_param;
  HB_UCP_INITIALIZE_SCHED_PARAM(&infer_ctrl_param);
  hbUCPSubmitTask(task_handle, &infer_ctrl_param);
  // wait task done
  hbUCPWaitTaskDone(task_handle, 0);

  // Step6: do postprocess with output data for every task
  // Find the max score and corresponding label
  for (auto roi_idx = 0; roi_idx < roi_num; roi_idx++) {
    auto result =
        reinterpret_cast<float *>(output_tensors[roi_idx][0].sysMem.virAddr);
    float max_score = 0.0;
    int label = -1;
    for (auto i = 0; i < 1000; i++) {
      float score = result[i];
      if (score > max_score) {
        label = i;
        max_score = score;
      }
    }
    std::cout << "label: " << label << std::endl;
  }

  // Step7: release resources
  // release task handle
  hbUCPReleaseTask(task_handle);
  // free input mem
  for (auto &mem : image_mems) {
    hbUCPFree(&mem);
  }
  for (auto &mem : roi_mems) {
    hbUCPFree(&mem);
  }
  // free output mem
  for (auto &tensors : output_tensors) {
    for (auto &tensor : tensors) {
      hbUCPFree(&(tensor.sysMem));
    }
  }
  // release model
  hbDNNRelease(packed_dnn_handle);
  return 0;
}

#define ALIGN(value, alignment) \
  (((value) + ((alignment)-1)) & ~((alignment)-1))
#define ALIGN_32(value) ALIGN(value, 32)

int prepare_image_tensor(const std::vector<hbUCPSysMem> &image_mem,
                         int input_h, int input_w, hbDNNHandle_t dnn_handle,
                         std::vector<hbDNNTensor> &input_tensor) {
  // y and uv tensor
  for (int i = 0; i < 2; i++) {
    hbDNNGetInputTensorProperties(&input_tensor[i].properties, dnn_handle, i);
    input_tensor[i].sysMem = image_mem[i];
    /** Tips:
     * roi model should modify input valid shape to input image shape.
     * here the struct of y/uv shape is NHWC
     */
    input_tensor[i].properties.validShape.dimensionSize[1] = input_h;
    input_tensor[i].properties.validShape.dimensionSize[2] = input_w;
    if (i == 1) {  // uv input
      input_tensor[i].properties.validShape.dimensionSize[1] /= 2;
      input_tensor[i].properties.validShape.dimensionSize[2] /= 2;
    }
    /** Tips:
     * For input tensor, stride should be set according to the real padding
     * of the user's data. 32-byte alignment is the requirement of y/uv.
     */
    input_tensor[i].properties.stride[1] =
        ALIGN_32(input_tensor[i].properties.stride[2] *
                 input_tensor[i].properties.validShape.dimensionSize[2]);
    input_tensor[i].properties.stride[0] =
        input_tensor[i].properties.stride[1] *
        input_tensor[i].properties.validShape.dimensionSize[1];
  }
  return 0;
}

int prepare_roi_tensor(const hbUCPSysMem *roi_mem, hbDNNHandle_t dnn_handle,
                       int32_t roi_tensor_id, hbDNNTensor *roi_tensor) {
  hbDNNGetInputTensorProperties(&roi_tensor->properties, dnn_handle,
                                roi_tensor_id);
  roi_tensor->sysMem = *roi_mem;
  return 0;
}

int prepare_output_tensor(hbDNNHandle_t dnn_handle,
                          std::vector<hbDNNTensor> &output) {
  for (size_t i = 0; i < output.size(); i++) {
    hbDNNGetOutputTensorProperties(&output[i].properties, dnn_handle, i);
    hbUCPMallocCached(&output[i].sysMem, output[i].properties.alignedByteSize,
                      0);
  }
  return 0;
}

int read_binary_file(std::string file_path, char **bin, int *length) {
  std::ifstream ifs(file_path, std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  *length = ifs.tellg();
  ifs.seekg(0, std::ios::beg);
  *bin = new char[sizeof(char) * (*length)];
  ifs.read(*bin, *length);
  ifs.close();
  return 0;
}

/** You can define read_image_2_other_type to prepare your data **/
int read_image_2_nv12(std::string &y_path, std::string &uv_path,
                      std::vector<hbUCPSysMem> &image_mem, int &input_h,
                      int &input_w) {
  // copy y data
  auto w_stride = ALIGN_32(input_w);
  int32_t y_mem_size = input_h * w_stride;
  hbUCPMallocCached(&image_mem[0], y_mem_size, 0);
  int32_t y_data_length = 0;
  char *y_data = nullptr;
  read_binary_file(y_path, &y_data, &y_data_length);
  memcpy(reinterpret_cast<char *>(image_mem[0].virAddr), y_data, y_mem_size);

  // copy uv data
  int32_t uv_height = input_h / 2;
  int32_t uv_mem_size = uv_height * w_stride;
  hbUCPMallocCached(&image_mem[1], uv_mem_size, 0);
  int32_t uv_data_length = 0;
  char *uv_data = nullptr;
  read_binary_file(uv_path, &uv_data, &uv_data_length);
  memcpy(reinterpret_cast<char *>(image_mem[1].virAddr), uv_data, uv_mem_size);

  // make sure cached mem data is flushed to DDR before inference
  hbUCPMemFlush(&image_mem[0], HB_SYS_MEM_CACHE_CLEAN);
  hbUCPMemFlush(&image_mem[1], HB_SYS_MEM_CACHE_CLEAN);
  // buffers were allocated with new[] in read_binary_file
  delete[] y_data;
  delete[] uv_data;
  return 0;
}

int prepare_roi_mem(const std::vector<Roi> &rois,
                    std::vector<hbUCPSysMem> &roi_mem) {
  auto roi_size = rois.size();
  roi_mem.resize(roi_size);
  for (size_t i = 0; i < roi_size; ++i) {
    int32_t mem_size = 4 * sizeof(int32_t);
    hbUCPMallocCached(&roi_mem[i], mem_size, 0);
    int32_t *roi_data = reinterpret_cast<int32_t *>(roi_mem[i].virAddr);
    // The order of filling in the corner points of the roi tensor is
    // left, top, right, bottom
    roi_data[0] = rois[i].left;
    roi_data[1] = rois[i].top;
    roi_data[2] = rois[i].right;
    roi_data[3] = rois[i].bottom;
    // make sure cached mem data is flushed to DDR before inference
    hbUCPMemFlush(&roi_mem[i], HB_SYS_MEM_CACHE_CLEAN);
  }
  return 0;
}
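The ROI input prepared by prepare_roi_mem is just four int32 values in left/top/right/bottom order. The same 16-byte layout can be produced offline in Python with struct (little-endian assumed, as on the aarch64 board; the helper name is ours):

```python
import struct


def pack_roi(left: int, top: int, right: int, bottom: int) -> bytes:
    """Pack one ROI as four little-endian int32 values (16 bytes total)."""
    return struct.pack("<4i", left, top, right, bottom)


roi = pack_roi(0, 0, 223, 223)
assert len(roi) == 16
```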
  3. Cross-compile to generate the board-side executable program

Before cross-compiling, you need to prepare CMakeLists.txt and the sample files. The content of CMakeLists.txt is shown below. Because the sample does not contain data preprocessing or other operations, it has few dependencies; the file mainly configures the GCC compilation parameters, dependency headers, and dynamic libraries. Here, dnn is the board-side inference library, and hbucp is used for operations on tensors.

CMakeLists.txt
# CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(sample)

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wl,-unresolved-symbols=ignore-in-shared-libs")
message(STATUS "Build type: ${CMAKE_BUILD_TYPE}")
set(CMAKE_CXX_FLAGS_DEBUG "-g -O0")
set(CMAKE_C_FLAGS_DEBUG "-g -O0")
set(CMAKE_CXX_FLAGS_RELEASE " -O3 ")
set(CMAKE_C_FLAGS_RELEASE " -O3 ")
set(CMAKE_BUILD_TYPE ${build_type})

set(DEPS_ROOT ${CMAKE_CURRENT_SOURCE_DIR}/deps_aarch64)
include_directories(${DEPS_ROOT}/ucp/include)
link_directories(${DEPS_ROOT}/ucp/lib)

add_executable(run_sample src/main.cc)
target_link_libraries(run_sample dnn hbucp)

The environment directory structure for compilation is as follows:

.
├── CMakeLists.txt
├── deps_aarch64
│   └── ucp
│       ├── include
│       └── lib
└── src
    └── main.cc

When the sample files and CMakeLists.txt are ready, you can compile them. A sample of the compile command is shown below:

Attention

Note that the compilation script must configure CC and CXX with the actual paths to the cross-compilation GCC and G++.

#!/usr/bin/env bash
# Note, please configure according to the actual path
export CC=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
export CXX=/arm-gnu-toolchain-12.2.rel1-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-g++
rm -rf arm_build; mkdir arm_build; cd arm_build
cmake ..; make -j8
cd ..

Once compiled, the board-ready run_sample binary program is generated. At this point, the board-side sample build process is complete.

Preparation for Board-side Operation

Once the executable program is compiled, the model input needs to be prepared. To reduce the operational and dependency-configuration cost of this practice, we do the data processing here in Python; of course, you can also implement it in C++ following the data processing logic of the board-side sample (you need to make sure the data processing logic is the same). The sample is as follows:

input_data.py
import os
import cv2
import PIL
import numpy as np
from PIL import Image

image_path = "./ILSVRC2012_val_00000001.JPEG"


def resize_transformer(image_data: np.array, short_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    # Specify width, height
    w, h = image.size
    if (w <= h and w == short_size) or (h <= w and h == short_size):
        return np.array(image)
    # I.e., the width of the image is the short side
    if w < h:
        resize_size = (short_size, int(short_size * h / w))
    # I.e., the height of the image is the short side
    else:
        resize_size = (int(short_size * w / h), short_size)
    # Resize the image
    data = np.array(image.resize(resize_size, Image.BILINEAR))
    return data


def center_crop_transformer(image_data: np.array, crop_size: int):
    image = Image.fromarray(image_data.astype('uint8'), 'RGB')
    image_width, image_height = image.size
    crop_height, crop_width = (crop_size, crop_size)
    crop_top = int(round((image_height - crop_height) / 2.))
    crop_left = int(round((image_width - crop_width) / 2.))
    image_data = image.crop(
        (crop_left, crop_top, crop_left + crop_width, crop_top + crop_height))
    return np.array(image_data).astype(np.float32)


def rgb_to_nv12(image_data: np.array):
    r = image_data[:, :, 0]
    g = image_data[:, :, 1]
    b = image_data[:, :, 2]
    y = (0.299 * r + 0.587 * g + 0.114 * b)
    u = (-0.169 * r - 0.331 * g + 0.5 * b + 128)[::2, ::2]
    v = (0.5 * r - 0.419 * g - 0.081 * b + 128)[::2, ::2]
    uv = np.zeros(shape=(u.shape[0], u.shape[1] * 2))
    for i in range(0, u.shape[0]):
        for j in range(0, u.shape[1]):
            uv[i, 2 * j] = u[i, j]
            uv[i, 2 * j + 1] = v[i, j]
    y = y.astype(np.uint8)
    uv = uv.astype(np.uint8)
    return y, uv


if __name__ == '__main__':
    # load the image with PIL method
    pil_image_data = PIL.Image.open(image_path).convert('RGB')
    image_data = np.array(pil_image_data).astype(np.uint8)
    # Resize the image
    image_data = resize_transformer(image_data, 256)
    # Crop the image
    image_data = center_crop_transformer(image_data, 224)
    # Convert format from RGB to nv12
    y, uv = rgb_to_nv12(image_data)
    y.tofile("ILSVRC2012_val_00000001_y.bin")
    uv.tofile("ILSVRC2012_val_00000001_uv.bin")
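The per-pixel loop in rgb_to_nv12 above is easy to follow but slow; the UV interleaving can equivalently be done with NumPy strided slicing, producing the same byte layout (a drop-in sketch, with a helper name of our own):

```python
import numpy as np


def interleave_uv(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Interleave subsampled U and V planes into an NV12-style UV plane."""
    uv = np.empty((u.shape[0], u.shape[1] * 2), dtype=u.dtype)
    uv[:, 0::2] = u  # even columns hold U
    uv[:, 1::2] = v  # odd columns hold V
    return uv
```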

After preparing the model input data, i.e., correctly generating the binary-format input files for board-side sample inference, you also need to make sure that you have the following ready:

  • An S100 development board for actually running the board-side program.

  • The model (*.hbm) that can be used for board-side inference, i.e., the output of Generate Board-side Model.

  • The board-side program (the main.cc file and the cross-compiled board-side executable), i.e., the output of Building Board-side Sample.

  • The libraries the board-side program depends on; to reduce deployment cost, you can directly use the contents of the samples/ucp_tutorial/deps_aarch64/ucp/lib folder in the OE package.

Once ready, integrate the model file (*.hbm), input data (*.bin files), board-side program, and dependency libraries into one folder, with the following reference directory structure:

horizon
├── ILSVRC2012_val_00000001_y.bin
├── ILSVRC2012_val_00000001_uv.bin
├── lib
├── resnet18_224x224_nv12_resizer.hbm
└── run_sample

Copy this integrated folder as a whole to the board-side environment, refer to the following command:

scp -r horizon/ root@{board_ip}:/map/

Board-side Execution

Finally, you can configure LD_LIBRARY_PATH and run the program as follows:

horizon@hobot:/map/horizon# export LD_LIBRARY_PATH=./lib:$LD_LIBRARY_PATH
horizon@hobot:/map/horizon# ./run_sample
......
label: 65

As you can see, the label: 65 printed in the log is exactly the label corresponding to the ILSVRC2012_val_00000001 image in the ImageNet dataset, i.e., the classification result is correct.

This concludes the full process of practicing the PTQ deployment of the ResNet18 model with Resizer input.