hrt_model_exec perf is used to test model performance.
In this mode, you do not need to provide input data: the program automatically constructs the input tensors according to the model and fills them with random numbers.
By default, the program runs 200 frames in a single thread. When perf_time is specified, frame_count is ignored, and the program runs for the specified period of time and then exits.
The program outputs the latency and the frame rate of the model. It prints the performance information every 200 frames: the maximum, minimum, and average latency. If fewer than 200 frames are run, it prints once before the program ends.
At the end, the program outputs summary data for the run: the number of program threads, the number of frames, the total model inference time, the average model inference latency, and the frame rate.
It supports HBM models.
When running a performance test, if no input is specified, the tool constructs random input internally. However, if the model depends strongly on its input, the randomly constructed input may cause the program to core dump.
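
The reporting behavior described above (a max/min/average latency report every 200 frames, or a single report when the run is shorter) can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names `latency_reports` and `frame_rate` are hypothetical, the per-frame latencies are assumed to be available as a list, and the frame rate is assumed to be frames divided by wall-clock time. How the tool treats a trailing partial window after the first 200 frames is not stated, so the sketch leaves it out.

```python
from statistics import mean


def latency_reports(latencies_ms, window=200):
    """Return one (max, min, average) tuple per report the run would print.

    A report is produced for every full window of `window` frames. If the
    run ends before the first window fills, one report covers all frames.
    """
    def stats(chunk):
        return (max(chunk), min(chunk), mean(chunk))

    full_windows = len(latencies_ms) // window
    reports = [
        stats(latencies_ms[i * window:(i + 1) * window])
        for i in range(full_windows)
    ]
    # Fewer than `window` frames in total: report once at the end.
    if full_windows == 0 and latencies_ms:
        reports.append(stats(latencies_ms))
    return reports


def frame_rate(frame_count, elapsed_seconds):
    """Frames per second over the whole run (assumption: frames / wall time)."""
    return frame_count / elapsed_seconds
```

For example, a run of only two frames yields a single report, while a 400-frame run yields two.
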
| Parameter | Type | Parameter Description | Correlated Parameters |
|---|---|---|---|
| -h, --help | None. | Display the help information. | None. |
| -v, --version | None. | View the version number of the DNN prediction library of the tool. | None. |
| perf | None. | Perform model performance analysis and obtain performance analysis results. | This parameter is used together with model_file to get detailed information about the model. |
| model_file | string | Model file path. Multiple paths can be separated by commas. | None. |
| model_name | string | Specify the name of a model. | None. |
| core_id | string | Specify the running core: 0 means any core, 1 means core0, 2 means core1, and so on. Default value is 0. To specify multiple cores, separate them with commas, such as "1,2". | None. |
| input_file | string | Model input information. An image-type input must have one of the following file name suffixes: PNG / JPG / JPEG / png / jpg / jpeg / bin / txt. Multiple inputs are separated by commas (,), such as xxx.jpg,input.txt. | None. |
| input_img_properties | string | The color space information of the model image input, range [Y, UV]. | This parameter is used together with input_file. Each image-type input in input_file needs a Y/UV type, and the color spaces are separated by commas (,), such as Y,UV. |
| input_valid_shape | string | Model dynamic validShape input information. If the validShape attribute of a model input contains -1, the -1 part must be filled in. Multiple validShape values are separated by semicolons. For example: --input_valid_shape="1,376,376,1;1,188,188,2". | None. |
| input_stride | string | Model dynamic stride input information. If the stride attribute of a model input contains -1, the -1 part must be filled in. Multiple strides are separated by semicolons. For example: --input_stride="144384,384,1,1;72192,384,2,1". | None. |
| frame_count | int | The number of frames the model runs. For infer, the default is 1. For perf, the default is 200. | When the subcommand is perf, it takes effect only when perf_time is not set. |
| dump_intermediate | string | Dump the input and output of each layer of the model, range [0, 3]. Default value is 0. dump_intermediate=0: the dump function is turned off. dump_intermediate=1: the input and output data of each node layer in the model are saved as bin, where the inputs and outputs of a node are stride data. dump_intermediate=2: the input and output data of each node layer in the model are saved as bin and txt, where the inputs and outputs of a node are stride data. dump_intermediate=3: the input and output data of each node layer in the model are saved as bin and txt, where the inputs and outputs of a node are valid data. | None. |
| perf_time | int | Set the perf run time in minutes. Default value is 0. | None. |
| thread_num | int | Set the number of threads (parallelism), that is, the maximum number of tasks processed in parallel, range [1, 32]. Default value is 1. When testing latency, set the value to 1 to avoid resource preemption and get more accurate latency. When testing throughput, it is recommended to set the value > 3 * N (N is the number of BPU cores) so that the BPU utilization is as high as possible and the throughput test is more accurate. | None. |
| profile_path | string | Path where the statistical tool logs are generated. The run generates profiler.log and profiler.csv, which are used to analyze op time and scheduling time consumption. For detailed instructions on the profiler.log and profiler.csv files, please refer to the profile_path Description. Generally, just set --profile_path=".", which means the log files are generated in the current directory. | None. |
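
As an illustration of how the parameters above combine, here is a minimal Python sketch that only assembles a perf command line and does not run the tool. The helper names (`core_id_arg`, `shape_arg`, `perf_command`) and the file name `model.hbm` are hypothetical. The `--name=value` flag form follows the `--input_valid_shape="..."` and `--profile_path="."` examples in the table, and the throughput thread count is taken as the smallest integer greater than 3 * N, capped at the documented maximum of 32.

```python
def core_id_arg(cores=None):
    """Map BPU core indices to a core_id value.

    No cores given -> "0" (any core); core k -> k + 1, comma-separated.
    """
    if not cores:
        return "0"
    return ",".join(str(k + 1) for k in cores)


def shape_arg(shapes):
    """Join dimensions with commas and inputs with semicolons."""
    return ";".join(",".join(str(d) for d in shape) for shape in shapes)


def perf_command(model_file, *, throughput=False, bpu_cores=1,
                 cores=None, perf_time=0, frame_count=200):
    """Build the argument list for a latency or throughput perf run."""
    # Latency tests use one thread; throughput tests use more than 3 * N.
    thread_num = min(32, 3 * bpu_cores + 1) if throughput else 1
    args = [
        "hrt_model_exec", "perf",
        f"--model_file={model_file}",
        f"--core_id={core_id_arg(cores)}",
        f"--thread_num={thread_num}",
    ]
    # perf_time overrides frame_count, so pass only one of them.
    if perf_time > 0:
        args.append(f"--perf_time={perf_time}")
    else:
        args.append(f"--frame_count={frame_count}")
    return args


print(" ".join(perf_command("model.hbm")))
print(" ".join(perf_command("model.hbm", throughput=True, bpu_cores=2,
                            cores=[0, 1], perf_time=1)))
```

The first call produces a single-thread latency run of 200 frames on any core. The second produces a one-minute throughput run with 7 threads on core0 and core1.
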
After you set the profile_path parameter and the tool runs normally, the profiler.log and profiler.csv files are generated. The files include the following parameters:
ucp_version: UCP and HBRT version.

perf_result: Record perf results.

| PARAMETER | DESCRIPTIONS |
|---|---|
| FPS | Frames processed per second. |
| average_latency | The average time it takes to run a frame. |

| PARAMETER | DESCRIPTIONS |
|---|---|
| core_id | The BPU core that the program is set to run on. |
| frame_count | The total number of frames the program runs. |
| model_name | The name of the evaluated model. |
| run_time | Program running time. |
| thread_num | The number of threads the program runs on. |

| PARAMETER | DESCRIPTIONS |
|---|---|
| Node-pad | Time consumed by model input padding. |
| Node-NodeIdx-NodeType-NodeName | Time consumption of model nodes. Note: NodeIdx is the sequence number of the node in the model topology, NodeType is the specific node type, such as Dequantize, and NodeName is the specific node name. |

| PARAMETER | DESCRIPTIONS |
|---|---|
| BPU_inference_time_cost | BPU processor time consumed by inference per frame. |
| CPU_inference_time_cost | CPU processor time consumed by inference per frame. |

| PARAMETER | DESCRIPTIONS |
|---|---|
| TaskRunningTime | The actual running time of the task, including the time consumed by the UCP framework. |

Performance testing only supports running one model at a time. When model_file contains multiple models, set the model_name parameter to specify which model to test.
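
The one-model-at-a-time rule can be checked before launching a run. The sketch below is a minimal illustration in Python with a hypothetical helper name (`model_args`) and hypothetical file and model names. It assumes that several comma-separated paths in model_file means several models. A single file that itself packs several models cannot be detected from its path, so model_name would still need to be set by hand in that case.

```python
def model_args(model_files, model_name=None):
    """Build the model-selection arguments for a perf run.

    Raises ValueError when several model files are given without a
    model_name, since perf tests only one model at a time.
    """
    if not model_files:
        raise ValueError("at least one model file is required")
    if len(model_files) > 1 and not model_name:
        raise ValueError("set model_name when model_file lists several models")
    args = ["--model_file=" + ",".join(model_files)]
    if model_name:
        args.append(f"--model_name={model_name}")
    return args


# Hypothetical file and model names, for illustration only.
print(model_args(["a.hbm", "b.hbm"], model_name="model_b"))
```
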