HBRuntime is an x86-side model inference library provided by Horizon. It supports direct inference on the original ONNX models exported by common training frameworks, on the intermediate ONNX models generated at each stage of the Horizon toolchain's PTQ conversion process, and on the HBIR (*.bc) and HBM (*.hbm) models produced by the toolchain conversion. The usage flow is shown as follows:
The reference usage of HBRuntime for model inference is as follows:
output_names: used to specify the output names; it can be set to None or to a custom configuration. If you have no special requirements, we recommend setting it to None.
input_feed: used to configure the inputs for the model run; the data must be prepared according to the model's input type and layout. It must be a dict in which each input name and its input data form a key-value pair.

In addition, HBRuntime allows you to view model attribute information during usage; the attributes listed below are supported. For example, to print the number of model inputs, use print(f"input_num: {sess.input_num}").
| Model attribute | Description |
|---|---|
| input_num | Number of model inputs |
| output_num | Number of model outputs |
| input_names | Names of model inputs |
| output_names | Names of model outputs |
| input_types | Types of model inputs |
| output_types | Types of model outputs |
| input_shapes | Shapes of model inputs |
| output_shapes | Shapes of model outputs |
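As a concrete illustration, the snippet below builds an input_feed dict with NumPy and sketches how the attributes above could be inspected. The session class name, import path, and constructor shown in the comments are assumptions (they vary by toolchain release and are not specified here); only the data preparation is concrete, and the input name "data" and shape are placeholders.

```python
import numpy as np

# Preparing input_feed: a dict mapping each input name to an ndarray whose
# dtype and layout match the model's declared input type and layout.
# The input name "data" and the 1x3x224x224 float32 shape are placeholders.
input_feed = {"data": np.zeros((1, 3, 224, 224), dtype=np.float32)}

# The session object itself comes from the Horizon toolchain; its import
# path is release-specific, so it is sketched here as comments only:
# sess = HBRuntime("model.onnx")               # hypothetical constructor
# print(f"input_num: {sess.input_num}")        # number of model inputs
# print(f"input_shapes: {sess.input_shapes}")  # shapes of model inputs
# outputs = sess.run(None, input_feed)         # None -> default outputs
```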
In the following, we provide usage samples of HBRuntime for three scenarios: ONNX model inference, HBIR model inference, and HBM model inference.
When performing ONNX model inference in a GPU Docker container, GPU acceleration is automatically utilized, resulting in higher efficiency.
The basic flow for loading an ONNX model and running inference with HBRuntime is shown below; this sample code applies to all ONNX models. Prepare the data according to the input type and layout requirements of each model:
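The flow can be sketched as follows. The session class, import path, and input name "data" are assumptions for illustration, not the verified API; the runnable part is the NumPy data preparation.

```python
import numpy as np

# 1. Load the ONNX model into an inference session (hypothetical API):
# sess = HBRuntime("mobilenetv2.onnx")

# 2. Prepare input data to match the model's input type and layout.
#    A float ONNX model commonly expects NCHW float32 input:
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
input_feed = {"data": image}  # "data" is a placeholder input name

# 3. Run inference; output_names=None returns the default outputs:
# outputs = sess.run(None, input_feed)
```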
The basic flow for loading an HBIR model and running inference with HBRuntime is shown below; this sample code applies to all HBIR models. Prepare the data according to the input type and layout requirements of each model:
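A minimal sketch of that flow follows. It assumes a uint8 NHWC input purely as an example; always check the model's declared input types and shapes (e.g. via the attributes listed earlier). The session calls are hedged as comments because the exact import path is release-specific.

```python
import numpy as np

# Load the HBIR model (hypothetical API; the real import path depends on
# your toolchain release):
# sess = HBRuntime("model_quantized.bc")

# Converted models often declare integer inputs; here we assume a uint8
# NHWC input as an example -- consult sess.input_types and
# sess.input_shapes for the actual requirements of your model.
data = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype=np.uint8)
input_feed = {"data": data}  # "data" is a placeholder input name

# outputs = sess.run(None, input_feed)
```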
The basic flow for loading an HBM model and running inference with HBRuntime is shown below; this sample code applies to all HBM models. Prepare the data according to the input type and layout requirements of each model:
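One way to keep the data preparation robust is to build input_feed from the model's own metadata (the attribute names below match the table earlier in this section). The session calls and the placeholder metadata values are assumptions for illustration.

```python
import numpy as np

# Load the HBM model (hypothetical API):
# sess = HBRuntime("model.hbm")

# Build input_feed from the model's own metadata so names, shapes, and
# dtypes always match. Placeholder values stand in for the real metadata:
input_names = ["data"]             # in practice: sess.input_names
input_shapes = [(1, 224, 224, 3)]  # in practice: sess.input_shapes
input_feed = {
    name: np.zeros(shape, dtype=np.uint8)  # dtype from sess.input_types
    for name, shape in zip(input_names, input_shapes)
}

# outputs = sess.run(None, input_feed)
```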