Model Performance Evaluation

The hrt_model_exec perf subcommand evaluates model performance.

In this mode, you do not need to provide input data: the program automatically constructs the input tensors according to the model and fills them with random numbers.

By default, the program runs 200 frames of data in a single thread. When perf_time is set to a non-zero value, frame_count is ignored: the program runs for the specified period of time (in minutes) and then exits.
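The stop condition described above can be sketched in Python as follows. This is a minimal single-threaded illustration, not the tool's implementation: run_frame is a hypothetical callback standing in for one model inference, and the real tool can run thread_num threads in parallel.

```python
import time
from typing import Callable


def run_perf_loop(run_frame: Callable[[], None],
                  frame_count: int = 200,
                  perf_time_min: float = 0.0,
                  clock: Callable[[], float] = time.monotonic) -> int:
    """Run frames until the stop condition is met; return frames executed.

    run_frame is a hypothetical stand-in for one inference.
    """
    frames = 0
    if perf_time_min > 0:
        # perf_time wins: frame_count is ignored and the loop runs until
        # the time budget (given in minutes) is used up.
        deadline = clock() + perf_time_min * 60.0
        while clock() < deadline:
            run_frame()
            frames += 1
    else:
        # Default behavior: a fixed number of frames (200 unless overridden).
        for _ in range(frame_count):
            run_frame()
            frames += 1
    return frames
```

The clock parameter exists only so the time-based branch can be exercised with a fake clock instead of waiting for real minutes to pass.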

The program reports the latency and frame rate of the model. Every 200 frames, it prints the maximum, minimum, and average latency; if fewer than 200 frames are run, it prints these statistics once before the program ends.

Finally, the program prints a summary of the run: the number of threads, the number of frames, the total model inference time, the average inference latency, and the frame rate.
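The periodic statistics and the final summary can be illustrated with a small single-threaded Python sketch. LatencyReporter is a hypothetical class that mimics what the tool prints, not its implementation. The exact FPS formula used by the tool is not documented here, so the sketch assumes frames divided by wall-clock run time; the sample output in the Usage Example (2345.876 FPS for 200 frames in 85.469 ms) is close to, but not exactly, that value (about 2340 FPS). Reporting a leftover partial window at the end is also an assumption for runs whose frame count is not a multiple of 200.

```python
from dataclasses import dataclass, field
from typing import Dict, List

REPORT_EVERY = 200  # reporting interval stated in the text above


@dataclass
class LatencyReporter:
    """Collects per-frame latencies (ms) and formats periodic reports."""
    window: List[float] = field(default_factory=list)
    all_ms: List[float] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)

    def record(self, latency_ms: float) -> None:
        self.window.append(latency_ms)
        self.all_ms.append(latency_ms)
        if len(self.window) == REPORT_EVERY:
            self._flush()

    def _flush(self) -> None:
        # One report line: max, min, and average latency of the window.
        w = self.window
        self.reports.append(
            f"max {max(w):.3f} ms, min {min(w):.3f} ms, "
            f"avg {sum(w) / len(w):.3f} ms"
        )
        self.window = []

    def finish(self, run_time_ms: float, thread_num: int = 1) -> Dict[str, float]:
        # A partial window (e.g. a run of fewer than 200 frames) is
        # reported once before the summary.
        if self.window:
            self._flush()
        frames = len(self.all_ms)
        if frames == 0:
            raise ValueError("no frames recorded")
        total = sum(self.all_ms)
        return {
            "thread_num": thread_num,
            "frame_count": frames,
            "total_latency_ms": total,
            "average_latency_ms": total / frames,
            # Assumption: FPS approximated as frames / wall-clock run time.
            "fps": frames / (run_time_ms / 1000.0),
        }
```

For example, recording 250 frames produces one report after frame 200 and a second one for the remaining 50 frames when finish() is called.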

Supported Models

The perf subcommand supports HBM models (.hbm files).

Warning

During performance testing, if no input is specified, the tool constructs random input internally. If the model's behavior depends strongly on its input values, this randomly constructed input may cause the program to crash with a core dump; in that case, provide real input data with --input_file.
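One way to avoid random input is to pass deterministic data through --input_file. The sketch below writes constant-valued y.bin and uv.bin files for an NV12 model that takes the Y and UV planes as two separate inputs, as the two-entry --input_stride of the resnet50_224x224_nv12.hbm example suggests. The file names, the fill value (128, a uniform mid-gray image), and the assumption that the tool accepts tightly packed (unpadded) uint8 .bin data are illustrative; check your model's input description before relying on them.

```python
from pathlib import Path
from typing import Tuple


def write_constant_nv12_inputs(height: int, width: int, out_dir: str,
                               value: int = 128) -> Tuple[str, str]:
    """Write y.bin (H x W x 1) and uv.bin (H/2 x W/2 x 2) filled with value.

    Assumption: uint8 data, tightly packed with no row padding.
    """
    if height % 2 or width % 2:
        raise ValueError("NV12 needs an even height and width")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    y_path, uv_path = out / "y.bin", out / "uv.bin"
    # Y plane: one byte per pixel.
    y_path.write_bytes(bytes([value]) * (height * width))
    # UV plane: half resolution in both directions, U and V interleaved.
    uv_path.write_bytes(bytes([value]) * ((height // 2) * (width // 2) * 2))
    return str(y_path), str(uv_path)
```

The resulting files would then be passed as --input_file=y.bin,uv.bin --input_img_properties=Y,UV. Constant data avoids crashes caused by extreme random values, but real captured inputs remain the safer choice for models that depend strongly on their input.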

Usage

Usage: hrt_model_exec [Option...] [Parameter]
[Option]              [instruction]
---------------------------------------------------------------------------------------------------------------
-h  --help            Display this information
-v  --version         Display this version
[Option]              [Parameter]
---------------------------------------------------------------------------------------------------------------
--model_file            [string]: Model file paths, separate by comma, each represents one model file path.
--model_name            [string]: Model name. When model_file has one more model and Subcommand is infer or perf, "model_name" must be specified!
--core_id               [string]: core id, 0 for any core, 1 for core 0, 2 for core 1 and etc, default is 0. Please confirm the number of bpu cores on the board before setting up. When you need to specify multiple cores, separate them with commas, such as "1,2".
--input_file            [string]: Input file paths, separate by comma, each represents one input. The extension of files should be one of [jpg, JPG, jpeg, JPEG, png, PNG, bin, txt] bin for binary such as image data, nv12 or yuv444 etc. txt for plain data such as image info.
--frame_count           [int]   : frame count for run loop, default 200, valid when perf_time is 0 in perf mode; default 1 for infer mode.
--dump_intermediate     [string]: dump intermediate layer input and output. The default is 0. Subcommand must be infer.
--perf_time             [int]   : minute, perf time for run loop, default 0. Subcommand must be perf.
--thread_num            [int]   : thread num for run loop, thread_num range:[1,32], Subcommand must be perf.
--profile_path          [string]: profile log and csv files path, set to get detail information of model execution.
--input_img_properties  [string]: Specify the color space of the image type input. Each image needs to specify the color space, separated by commas. The supported color spaces are [Y, UV].
--input_valid_shape     [string]: Complete the validshape of the model input, allowing only the dynamic part to change. Provide two ways to set:
                                  1. This only needs to be set when the validShape of the model input is dynamic.
                                  2. Set for all inputs. Different inputs are separated by semicolons, and different dimensions are separated by commas. For example: --input_valid_shape="1,376,376,1;1,188,188,2".
--input_stride          [string]: Complete the stride of the model input, allowing only the dynamic part to change. Provide two ways to set:
                                  1. This only needs to be set when the stride of model input is dynamic.
                                  2. Set for all inputs. Different inputs are separated by semicolons, and different dimensions are separated by commas. For example: --input_stride="144384,384,1,1;72192,384,2,1".
[Examples]
---------------------------------------------------------------------------------------------------------------
hrt_model_exec perf
  --model_file
  --model_name
  --core_id
  --input_file
  --input_img_properties
  --input_valid_shape
  --input_stride
  --frame_count
  --profile_path
  --perf_time
  --thread_num
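The --input_valid_shape and --input_stride values share one format: semicolons separate inputs and commas separate dimensions. The hypothetical Python helpers below (not part of the tool) split and rebuild such strings, and reject entries that still contain an unfilled -1 before the command is run. Rejecting zero as well is an added assumption.

```python
from typing import List


def parse_shape_list(arg: str) -> List[List[int]]:
    """Parse e.g. "1,376,376,1;1,188,188,2" into [[1,376,376,1], [1,188,188,2]]."""
    shapes = []
    # Tolerate surrounding double quotes copied from a shell command line.
    for entry in arg.strip().strip('"').split(";"):
        if not entry.strip():
            raise ValueError(f"empty input entry in {arg!r}")
        dims = [int(d) for d in entry.split(",")]
        # Every -1 (dynamic) dimension must be filled in; zero is also
        # rejected here as an assumption.
        if any(d <= 0 for d in dims):
            raise ValueError(f"unfilled or invalid dimension in {entry!r}")
        shapes.append(dims)
    return shapes


def format_shape_list(shapes: List[List[int]]) -> str:
    """Inverse of parse_shape_list: ';' between inputs, ',' between dims."""
    return ";".join(",".join(str(d) for d in dims) for dims in shapes)
```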

Parameter Descriptions

| Parameter | Data Type | Description | Correlated Parameters |
|---|---|---|---|
| -h, --help | None | Display the help information. | None |
| -v, --version | None | Display the version number of the tool's DNN prediction library. | None |
| perf | None | Subcommand that runs model performance analysis and reports the results. | Used together with model_file to obtain detailed information about the model. |
| model_file | string | Model file paths; separate multiple paths with commas. | None |
| model_name | string | Name of the model to run. Required when model_file contains more than one model. | None |
| core_id | string | Core to run on: 0 means any core, 1 means core 0, 2 means core 1, and so on. Default: 0. To specify multiple cores, separate them with commas, e.g. "1,2". | None |
| input_file | string | Model input files, separated by commas, e.g. xxx.jpg,input.txt. Each file must have one of the following suffixes: PNG, JPG, JPEG, png, jpg, jpeg, bin, txt. | None |
| input_img_properties | string | Color space of each image-type model input; valid values: Y, UV. | Used together with input_file: each image-type input in input_file needs a Y or UV color space, and the values are separated by commas, e.g. Y,UV. |
| input_valid_shape | string | validShape of dynamic model inputs. If an input's validShape contains -1, the -1 entries must be filled in. Separate the validShapes of different inputs with semicolons and dimensions with commas, e.g. --input_valid_shape="1,376,376,1;1,188,188,2". | None |
| input_stride | string | Stride of dynamic model inputs. If an input's stride contains -1, the -1 entries must be filled in. Separate the strides of different inputs with semicolons and dimensions with commas, e.g. --input_stride="144384,384,1,1;72192,384,2,1". | None |
| frame_count | int | Number of frames to run. Default: 1 for the infer subcommand, 200 for the perf subcommand. For perf, it takes effect only when perf_time is not set. | perf_time |
| dump_intermediate | string | Dumps the input and output of every layer of the model; range [0, 3], default 0. 0: dump disabled. 1: the inputs and outputs of each node are saved as bin files (stride data). 2: saved as bin and txt files (stride data). 3: saved as bin and txt files (valid data). Valid only with the infer subcommand. | None |
| perf_time | int | Perf run time in minutes. Default: 0. | None |
| thread_num | int | Number of threads (parallelism), i.e. the maximum number of tasks processed in parallel; range [1, 32], default 1. To measure latency, set it to 1 to avoid resource contention and get a more accurate latency. To measure throughput, a value greater than 3 * N (N = number of BPU cores) is recommended, so that BPU utilization is as high as possible and the throughput measurement is more accurate. | None |
| profile_path | string | Output path of the profiler logs. The run generates profiler.log and profiler.csv, which break down operator time and scheduling time; see profile_path Description for details. Usually --profile_path="." is enough, which writes the files to the current directory. | None |
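When a model's stride contains -1, the values passed to --input_stride have to be computed from the valid shape. The Python sketch below assumes an NHWC uint8 input whose rows are padded to a 64-byte boundary. These are assumptions, not documented rules: they were chosen because they reproduce the stride examples in this document (a 376-byte row rounds up to 384, and a 224-byte row rounds up to 256, the latter assuming valid shapes 1,224,224,1 and 1,112,112,2 for the resnet50 NV12 example). Take the real alignment from the model's input description.

```python
from typing import List


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def nhwc_stride(valid_shape: List[int], elem_size: int = 1,
                row_alignment: int = 64) -> List[int]:
    """Byte strides [N, H, W, C] for an NHWC input with padded rows.

    elem_size=1 (uint8) and row_alignment=64 are assumptions.
    """
    _, h, w, c = valid_shape
    c_stride = elem_size                              # next channel
    w_stride = c * elem_size                          # next pixel in a row
    h_stride = align_up(w * w_stride, row_alignment)  # next row, padded
    n_stride = h * h_stride                           # next image in batch
    return [n_stride, h_stride, w_stride, c_stride]


def stride_arg(valid_shapes: List[List[int]], **kwargs) -> str:
    """Format the strides of all inputs as an --input_stride value."""
    return ";".join(",".join(map(str, nhwc_stride(s, **kwargs)))
                    for s in valid_shapes)
```

For instance, stride_arg([[1, 376, 376, 1], [1, 188, 188, 2]]) yields the same string as the --input_stride example in the table.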

profile_path Description

After profile_path is set and the tool runs successfully, the files profiler.log and profiler.csv are generated. They contain the following fields:

• ucp_version: UCP and HBRT versions.

• perf_result: Performance results.

| Parameter | Description |
|---|---|
| FPS | Frames processed per second. |
| average_latency | Average time to run one frame. |

• running_condition: Runtime environment information.

| Parameter | Description |
|---|---|
| core_id | BPU core set for the program run. |
| frame_count | Total number of frames run by the program. |
| model_name | Name of the evaluated model. |
| run_time | Program running time. |
| thread_num | Number of threads the program runs on. |

• model_latency: Time statistics for each model node.

| Parameter | Description |
|---|---|
| Node-pad | Time spent padding the model input. |
| Node-NodeIdx-NodeType-NodeName | Time spent in each model node. NodeIdx is the node's index in the model topology, NodeType is the node type (e.g. Dequantize), and NodeName is the node name. |

• processor_latency: Time statistics for each processor.

| Parameter | Description |
|---|---|
| BPU_inference_time_cost | BPU time per frame of inference. |
| CPU_inference_time_cost | CPU time per frame of inference. |

• task_latency: Task time statistics of the model.

| Parameter | Description |
|---|---|
| TaskRunningTime | Actual running time of the task, including the time spent in the UCP framework. |

Usage Example

hrt_model_exec perf --model_file=xxx.hbm

For example:

../aarch64/bin/hrt_model_exec perf --model_file=resnet50_224x224_nv12.hbm --input_stride="57344,256,1,1;28672,256,2,1" --frame_count=200 --thread_num=16

Load model to DDR cost 1390.88ms.
Frame count: 200,  Thread Average: 3.423510 ms,  thread max latency: 84.261002 ms,  thread min latency: 1.024000 ms,  FPS: 2345.875977
Running condition:
  Thread number is: 16
  Frame count is: 200
  Program run time: 85.469 ms
Perf result:
  Frame totally latency is: 684.702 ms
  Average latency is: 3.424 ms
  Frame rate is: 2345.876 FPS
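To post-process runs like the one above, the summary lines can be parsed with regular expressions. The patterns below are written against the output format shown in this example only; other tool versions may print it differently. The helper also checks that total latency divided by frame count agrees with the reported average (684.702 / 200 = 3.42351 ms versus the printed 3.424 ms).

```python
import re
from typing import Dict

SAMPLE_OUTPUT = """\
Running condition:
  Thread number is: 16
  Frame count is: 200
  Program run time: 85.469 ms
Perf result:
  Frame totally latency is: 684.702 ms
  Average latency is: 3.424 ms
  Frame rate is: 2345.876 FPS
"""

# Field name -> (regex, converter); patterns follow the sample output above.
PATTERNS = {
    "thread_num": (r"Thread number is:\s*(\d+)", int),
    "frame_count": (r"Frame count is:\s*(\d+)", int),
    "run_time_ms": (r"Program run time:\s*([\d.]+)\s*ms", float),
    "total_latency_ms": (r"Frame totally latency is:\s*([\d.]+)\s*ms", float),
    "average_latency_ms": (r"Average latency is:\s*([\d.]+)\s*ms", float),
    "fps": (r"Frame rate is:\s*([\d.]+)\s*FPS", float),
}


def parse_perf_summary(text: str) -> Dict[str, float]:
    """Extract the final summary fields from hrt_model_exec perf output."""
    result = {}
    for key, (pattern, convert) in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found in perf output: {key}")
        result[key] = convert(match.group(1))
    return result


def average_is_consistent(summary: Dict[str, float], tol_ms: float = 1e-3) -> bool:
    # Average latency should equal total latency / frames, up to the
    # 3-decimal rounding used in the printed output.
    expected = summary["total_latency_ms"] / summary["frame_count"]
    return abs(expected - summary["average_latency_ms"]) <= tol_ms
```

The "Frame count is:" pattern deliberately requires "is:", so it does not match the periodic "Frame count: 200," line printed earlier in the run.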
Note

Performance testing supports only one model at a time. When model_file contains multiple models, use the model_name parameter to specify which one to test:

hrt_model_exec perf --model_file=xxx.hbm,xxx.hbm --model_name=xxx
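When scripting perf runs, building the argument list in code makes it easy to enforce the model_name rule from the help text, and it keeps semicolon-separated values such as --input_stride from being split by the shell. build_perf_argv and to_shell are hypothetical helpers, not part of the tool. Passing the list to subprocess.run without shell=True avoids quoting entirely; shlex.join produces a copy-pasteable command line.

```python
import shlex
from typing import Dict, List, Optional


def build_perf_argv(model_files: List[str],
                    model_name: Optional[str] = None,
                    options: Optional[Dict[str, object]] = None,
                    exe: str = "hrt_model_exec") -> List[str]:
    """Return an argv list for `hrt_model_exec perf` (hypothetical helper)."""
    # Rule from the help text: several models require --model_name.
    if len(model_files) > 1 and not model_name:
        raise ValueError("model_name must be set when model_file lists several models")
    argv = [exe, "perf", "--model_file=" + ",".join(model_files)]
    if model_name:
        argv.append("--model_name=" + model_name)
    for key, value in (options or {}).items():
        argv.append(f"--{key}={value}")
    return argv


def to_shell(argv: List[str]) -> str:
    # shlex.join quotes arguments containing ';' so a shell will not split them.
    return shlex.join(argv)
```

For example, build_perf_argv(["a.hbm", "b.hbm"], "b", {"thread_num": 16}) selects model b out of two model files, while omitting model_name raises ValueError.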