This section introduces how to develop model inference applications on the Horizon platform and the key considerations to be aware of.
Prior to application development, make sure that you have completed the development environment preparations as described in Environment Deployment.
The simplest application development can be divided into three stages: project creation, project implementation, and project compilation and running.
However, because the development of actual business scenarios is more complicated, we also explain multi-model control concepts and offer suggestions on application tuning.
We recommend using CMake to manage your application project. As described in the previous sections, you should by now have CMake installed; before reading this section, we assume that you understand how to use it.
The Horizon development library provides the relevant project dependencies. They are located at:

${OE_DIR}/samples/ucp_tutorial/deps_aarch64/ucp/

The ${OE_DIR} above refers to the path of the OE package provided by Horizon.
To create a new project, you need to write the CMakeLists.txt file.
The CMakeLists.txt file defines some compilation options, as well as the paths to the dependency libraries and header files, as follows:
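The sample itself ships with the OE package. As a minimal illustrative sketch (the project name, source file name, and passing OE_DIR in as a CMake variable are assumptions, not the actual tutorial content), such a CMakeLists.txt could look like:

```cmake
cmake_minimum_required(VERSION 3.10)
project(ucp_infer_sample)  # hypothetical project name

set(CMAKE_CXX_STANDARD 11)

# Horizon dependencies from the OE package; pass OE_DIR on the command line,
# e.g. -DOE_DIR=/path/to/oe_package
set(DEPS_DIR ${OE_DIR}/samples/ucp_tutorial/deps_aarch64/ucp)

include_directories(${DEPS_DIR}/include)
link_directories(${DEPS_DIR}/lib)

add_executable(infer_sample main.cc)           # hypothetical source file
target_link_libraries(infer_sample hbucp dnn)  # libraries mentioned below
```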
In the above sample, we did not specify the compiler location; we will specify it at the project compilation stage, as described in the section Project Compilation and Running.
This section explains how to run hbm models on Horizon platforms.
The simplest procedure consists of model loading, input data preparation, output memory preparation, inference, and result parsing. Sample code for simple model deployment is as follows:
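The full sample code ships with the OE package. As a purely illustrative pseudocode sketch of the five steps (the function names below are hypothetical placeholders, not the real UCP API):

```
/* Pseudocode only -- all names are hypothetical placeholders. */
model  = load_model("model.hbm")           /* 1. model loading             */
input  = prepare_input(model)              /* 2. input data preparation    */
                                           /*    (observe alignment rules) */
output = alloc_output_memory(model)        /* 3. output memory preparation */
task   = submit_inference(model, input, output)  /* 4. inference           */
wait(task)
parse_results(output)                      /* 5. result parsing            */
release_all(model, input, output)
```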
To keep it simple, part of the model processing in the above sample is described in the form of comments. More details are explained in subsequent documents, such as:
For dynamic input instructions, please refer to section Dynamic Input Instruction.
For memory alignment rules, please refer to section Alignment Rule.
For more comprehensive instructions on the engineering implementation, refer to sections Model Inference API Instruction and Basic Sample User Guide.
Building on the CMake project configuration described in Project Creation, you can refer to the following compilation script:
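As a sketch (the build directory and the toolchain path are assumptions; substitute the cross-compiler you installed during Environment Deployment), such a script could look like:

```shell
#!/bin/sh
# Illustrative cross-compilation script. TOOLCHAIN_DIR is an assumption:
# point it at your installed aarch64 cross-compiler.
TOOLCHAIN_DIR=/opt/gcc-aarch64-linux-gnu

mkdir -p build && cd build
cmake .. \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
  -DCMAKE_C_COMPILER="${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-gcc" \
  -DCMAKE_CXX_COMPILER="${TOOLCHAIN_DIR}/bin/aarch64-linux-gnu-g++"
make -j"$(nproc)"
```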
After reading Environment Deployment, we assume that you have installed the required compiler on your development PC, so here you only need to associate the compiler configuration in the above script with your project.
Copy the compiled Arm program to the Horizon board to run it. Note that the program's dependent libraries also need to be copied to the board, and their paths must be configured in the startup script.
For example, our sample program depends on the following libraries: libhbucp.so, libdnn.so, and other BSP libraries.
These dependencies can be found in the OE package under the path ucp_tutorial/deps_aarch64/ and need to be uploaded to the board's runtime environment.
We recommend creating a new lib directory under the /userdata path on the board and transferring the libraries to it. The dependency library paths that need to be specified before running the program on the board are as follows:
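For example (assuming the libraries were copied to /userdata/lib as recommended; the program name is a placeholder), the startup script could export the library search path like this:

```shell
# Prepend the board-side library directory to the dynamic linker search path.
# /userdata/lib follows the recommendation above.
export LD_LIBRARY_PATH=/userdata/lib:${LD_LIBRARY_PATH}

# Then launch the program, e.g.:
# ./infer_sample
```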
In scenarios containing multiple models, each model needs to complete its inference with limited resources, so the models inevitably compete for computing resources.
To help you control the execution of multiple models, we provide control strategies based on model priority.
This feature is only supported on the dev board side and is not supported by the x86 simulator.
The BPU computing unit of the S100 chip has no hardware task preemption. Once an inference task is dispatched to the BPU and begins model computation, it occupies the BPU until it completes, and other tasks have to wait in line. If the BPU is occupied by a large model's inference task, other high-priority model inference tasks cannot be executed.
To address this, we added a software feature called BPU resource preemption to the Runtime SDK, based on model priorities.
Pay attention to the following:
To solve these two problems, we provide support in both model conversion and system software. The implementation principles and operation methods are as follows:
During model conversion, add the max_time_per_fc option to the extra parameter configuration of the compilation interface to limit the execution time (in microseconds) of each function call. The default value is 0 (no limit). By setting this option, you can control the on-board execution time of individual large function calls.
For example, suppose a function call takes 10 ms to execute and max_time_per_fc is set to 1000 during model compilation; this function call will then be split into 10 function calls.
If you are using the PTQ scheme to process the model, you can add the max_time_per_fc parameter to the compiler-related parameters (compiler_parameters) in the model's YAML configuration file at the model conversion stage.

On the system software side, the hbUCPSchedParam.priority parameter must be set when the inference task is submitted. Nested preemption according to priority is also supported.
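For the PTQ path mentioned above, a sketch of the YAML fragment (only compiler_parameters and max_time_per_fc come from this section; the rest of the file's layout is assumed) might look like:

```yaml
# Illustrative excerpt of a PTQ model conversion config.
compiler_parameters:
  max_time_per_fc: 1000   # in microseconds; default 0 means no limit
```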
For example, an inference task configured with a priority below 254 is a normal task and cannot preempt other tasks. A task with a priority equal to 254 is a high-priority preemptive task that can preempt normal tasks. A task with a priority equal to 255 is an urgent preemptive task that can preempt both normal tasks and high-priority preemptive tasks.

The application optimization strategies Horizon suggests include Engineering Task Scheduling and Algorithm Task Integration.
For Engineering Task Scheduling, we recommend using workflow scheduling and management tools to fully utilize the parallel-processing capability of the different task stages.
In general, an application can be divided into three stages: pre-processing, model inference, and post-processing of the output.
A simplified workflow is as follows:
After fully utilizing workflow management to run the different task stages in parallel, the ideal task processing workflow can be as follows:
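A minimal sketch of such stage-parallel scheduling, using generic Python queues and threads (the stage bodies are trivial placeholders, not the Horizon API):

```python
# Three-stage pipeline: the main thread "pre-processes" and feeds frames,
# two worker threads act as "inference" and "post-processing". Queues
# decouple the stages so different frames can be in different stages at once.
import queue
import threading

def stage(in_q, out_q, fn):
    while True:
        item = in_q.get()
        if item is None:              # poison pill: shut down and propagate
            if out_q is not None:
                out_q.put(None)
            break
        result = fn(item)
        if out_q is not None:
            out_q.put(result)
        else:
            results.append(result)    # final stage collects outputs

q1, q2 = queue.Queue(), queue.Queue()
results = []

threads = [
    threading.Thread(target=stage, args=(q1, q2, lambda x: x + 1)),    # "inference"
    threading.Thread(target=stage, args=(q2, None, lambda x: x * 2)),  # "post-processing"
]
for t in threads:
    t.start()
for frame in range(5):                # "pre-processing" feeds the pipeline
    q1.put(frame)
q1.put(None)
for t in threads:
    t.join()
print(results)  # -> [2, 4, 6, 8, 10]
```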
For Algorithm Task Integration, we recommend multi-task models.
On one hand, it avoids, to a certain extent, the difficulty of managing multi-model scheduling. On the other hand, because a multi-task model can fully share the computation of the backbone, it significantly reduces the total computation at the application level compared with using independent models, thereby achieving higher overall performance.
Multi-task models are also a common application-level optimization strategy within Horizon Robotics and in the business practices of many collaborating customers.
hrt_model_exec is a model execution tool that can evaluate a model's inference performance and retrieve model information directly on the dev board.
On one hand, it gives you a realistic view of the model's actual on-board performance; on the other hand, it shows the upper bound of the speed the model can achieve, which is useful information for application tuning.
hrt_model_exec provides three functions: model inference (infer), model performance analysis (perf), and model information viewing (model_info). For how to use the tool, please refer to hrt_model_exec Tool Introduction.
UCP also provides performance analysis tools to help you locate application performance bottlenecks. Among them, UCP Trace is used to analyze the application's pipeline scheduling, and hrt_ucp_monitor is used to monitor the occupancy rate of the hardware backend.
Please refer to the sections UCP Trace Instructions and hrt_ucp_monitor Tool Introduction for how to use these tools.