Performance Evaluation and Tuning

After quantizing and compiling the model into the hbm model that is deployed on the board side, you can predict the static performance of the BPU part of the model on the development side. We also provide executable tools on the board side that require no code development, so that you can quickly evaluate the dynamic performance of the model.

Static Performance Evaluation

  • If you quantize the model through the PTQ pipeline with the command line tool, the hb_compile tool generates static performance evaluation files (*.html and *.json) by default when it compiles the model. You can also evaluate the static performance of the model by calling the hbm_perf interface. The reference commands are as follows:

    from hbdk4.compiler import hbm_perf
    hbm_perf("model.hbm")

    After successful execution, basic information such as the model FPS is printed in the terminal, and static performance evaluation files (model.html and model.json) are generated in the directory from which the API is called. To specify the output path of the static performance evaluation files, refer to the following command:

    from hbdk4.compiler import hbm_perf
    hbm_perf("model.hbm", output_dir="target_dir")

    The file contains four tabs: Summary, Temporal Statistics, Layer Details, and Timeline. Among them:

    1. The Summary tab provides the performance of the BPU part of the model predicted by the compiler.

    2. The Temporal Statistics tab provides the bandwidth usage of the model during the inference time of one frame.

    3. The Layer Details tab provides, for each layer of BPU operators, the computation amount, the computation time, the data handling time, and the active time period of the compiled layer. The active time period does not represent the execution time of the layer, because multiple layers usually alternate or execute in parallel.

    4. The Timeline tab provides both the predicted time consumption of the instruction set and the information of the corresponding computing units during the inference time of one frame of the model.

      Some of the metrics included in the Timeline tab are as follows:

      • TAE: Tensor Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating Tensor computations. It mainly takes charge of various convolution (conv) computations and can also support some Matrix computations.

      • VAE: Vector Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating Vector computations. It mainly handles various element-wise operations in neural networks, such as A + B, A * B, and Look-Up Table (LUT) computations.

      • AAE: Auxiliary Acceleration Engine. It is an engine module in the BPU that is responsible for accelerating auxiliary computations. It mainly provides auxiliary acceleration for computations other than tensors, vectors, and scalars, such as Pooling, Resize, and Warp computations.

      • TRANS: It is a computing unit in the BPU that is used to handle data layout transformations.

      • STORE: It means writing data from the internal cache/register to the memory (or outside the computing platform).

      • LOAD: It means loading data from the memory (possibly outside the computing platform) into the cache on the computing platform.

  • If you are quantizing the model through the QAT path, you can just call the hbm_perf interface directly to perform a static performance evaluation of the model in the same way as described above.
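
As an illustration of the file naming described above, the following minimal sketch derives the report paths one would expect hbm_perf to produce, so that a script can check for them after the call. It does not call hbdk4 itself. The helper names expected_report_paths and missing_reports are hypothetical, and the assumption that the reports are named after the model file stem is inferred only from the model.hbm example in this section.

```python
from pathlib import Path


def expected_report_paths(hbm_path, output_dir="."):
    """Return the (html, json) report paths we expect for a model.

    Assumption: the reports take the stem of the model file name, as in
    the model.hbm -> model.html / model.json example in the text.
    """
    stem = Path(hbm_path).stem
    out = Path(output_dir)
    return out / f"{stem}.html", out / f"{stem}.json"


def missing_reports(hbm_path, output_dir="."):
    """List the expected report files that are not present on disk."""
    return [p for p in expected_report_paths(hbm_path, output_dir)
            if not p.exists()]


if __name__ == "__main__":
    # Example: paths expected after hbm_perf("model.hbm", output_dir="target_dir")
    for path in expected_report_paths("model.hbm", "target_dir"):
        print(path)
```

A script could call missing_reports right after hbm_perf and stop early if the returned list is not empty.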

Dynamic Performance Evaluation

Once the static performance of the model is as expected, you can use the hrt_model_exec tool to measure the dynamic performance of the model on the board. Because Pyramid and Resizer input models have dynamic properties, they take the additional parameters described later in this section. The basic reference command is as follows:

hrt_model_exec perf --model_file model.hbm

For dynamic inputs, the stride must be 32-aligned in the W direction. For the detailed calculation, refer to the relevant content in the UCP-Dynamic Inputs introduction.

Assume that the tensor has valid_shape=(1,112,112,2), tensor_type=HB_DNN_TENSOR_TYPE_U8, and stride=(-1,-1,2,1). The stride is then calculated as follows:

// stride[3]=sizeof(tensor_type) -> 1
// stride[2]=stride[3]*valid_shape[3] -> 2
// stride[1]=ALIGN_32(stride[2]*valid_shape[2]) -> 2*112=224
// stride[0]=stride[1]*valid_shape[1] -> 224*112=25088
#define ALIGN_32(value) ((value + (32-1)) & ~(32-1))

For example, for a Pyramid input model with Y=(1,224,224,1) and UV=(1,112,112,2), the reference command is as follows:

hrt_model_exec perf --model_file pyramid.hbm --input_stride="50176,224,1,1;25088,224,2,1"
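
The stride rule above can be sketched in Python to derive the --input_stride value from the tensor shapes. This is a minimal sketch, not part of the toolchain: the function names align_32, compute_stride, and input_stride_arg are hypothetical, and it assumes a 4-dimensional valid_shape and a 1-byte element type (HB_DNN_TENSOR_TYPE_U8) by default.

```python
def align_32(value):
    """Round value up to the next multiple of 32 (mirrors ALIGN_32)."""
    return (value + (32 - 1)) & ~(32 - 1)


def compute_stride(valid_shape, elem_size=1):
    """Compute the four byte strides following the steps in the text.

    Assumption: valid_shape has exactly four dimensions and elem_size
    is sizeof(tensor_type) in bytes (1 for U8).
    """
    stride3 = elem_size
    stride2 = stride3 * valid_shape[3]
    stride1 = align_32(stride2 * valid_shape[2])  # 32-aligned in W
    stride0 = stride1 * valid_shape[1]
    return (stride0, stride1, stride2, stride3)


def input_stride_arg(shapes, elem_size=1):
    """Join per-input strides with ',' and separate inputs with ';'."""
    return ";".join(
        ",".join(str(s) for s in compute_stride(shape, elem_size))
        for shape in shapes
    )


if __name__ == "__main__":
    y_shape = (1, 224, 224, 1)
    uv_shape = (1, 112, 112, 2)
    print(input_stride_arg([y_shape, uv_shape]))
```

For the Y and UV shapes of the example, the sketch yields the same string that is passed to --input_stride in the command above. For a width that is not a multiple of 32, for example a shape of (1,100,100,1), the row stride is rounded up from 100 to 128.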

For the Resizer input model, the tool specifies a default ROI internally, and the default value is [0,0,127,127]. If the default value does not meet the requirements of your model, you can refer to the following command to specify the ROI:

hrt_model_exec perf --model_file resizer.hbm --input_file=test.jpg,test.jpg,roi.txt --input_img_properties=Y,UV

The roi.txt file specifies the ROI (Region of Interest) as the four region coordinates in the order left, top, right, bottom, separated by spaces. For example:

0 0 200 200
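
The roi.txt file can also be generated from a script. The following minimal sketch formats the four coordinates in the order described above and writes them to a file. The function names format_roi and write_roi_file are hypothetical, and the checks (non-negative integers, right not less than left, bottom not less than top) are assumptions made for illustration, not constraints confirmed by the tool.

```python
def format_roi(left, top, right, bottom):
    """Format an ROI as 'left top right bottom', separated by spaces.

    Assumption: coordinates are non-negative integers with
    right >= left and bottom >= top.
    """
    coords = (left, top, right, bottom)
    if any(not isinstance(c, int) or c < 0 for c in coords):
        raise ValueError("ROI coordinates must be non-negative integers")
    if right < left or bottom < top:
        raise ValueError("ROI must satisfy right >= left and bottom >= top")
    return " ".join(str(c) for c in coords)


def write_roi_file(path, roi):
    """Write a single ROI line to a text file such as roi.txt."""
    with open(path, "w", encoding="utf-8") as f:
        # The trailing newline is a common text-file convention and is
        # an assumption here, not a documented requirement.
        f.write(format_roi(*roi) + "\n")


if __name__ == "__main__":
    write_roi_file("roi.txt", (0, 0, 200, 200))
```

Calling write_roi_file("roi.txt", (0, 0, 200, 200)) produces a file with the same line as the example above.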

Performance Tuning Recommendations

Based on the performance evaluation and analysis above, you may find that the performance is not as expected. In that case, refer to the Model Performance Tuning section, where we provide suggestions and measures for improving the performance of the model.