UCP Trace enables in-depth analysis of the scheduling logic of UCP applications by embedding trace records on the critical paths that UCP executes. When a performance anomaly occurs, analyzing the UCP trace lets you quickly locate the point in time at which it happened.
UCP trace provides two trace backend options: Perfetto Trace and Chrome Trace.
You can choose between them by setting an environment variable to meet your specific performance tracking needs.
Perfetto trace can retrieve UCP-recorded traces as well as system status, ftrace information, and so on. Chrome trace can only retrieve UCP-recorded traces and is mainly used to analyze UCP's scheduling logic.
The UCP trace tool and configuration files are located in the samples/ucp_tutorial/tools directory.
| Environment Variable | Range of Values | Default Value | Description |
|---|---|---|---|
| HB_UCP_ENABLE_PERFETTO | true, false | false | Whether to enable Perfetto trace. Disabled by default. |
| HB_UCP_PERFETTO_CONFIG_PATH | Perfetto configuration file path | "" | Path to the Perfetto configuration file. Empty by default; if not specified, the default system backend is used. |
| HB_UCP_TRACE_LOG_LEVEL | [0, 6] | 6 | UCP trace log level. Defaults to 6, at which no trace log is output. |
| HB_UCP_USE_ALOG | true, false | false | Whether to enable the alog sink. Disabled by default. If enabled, logs are written to the alog buffer, where they can be captured with logcat, and terminal output is disabled. |
Perfetto trace has the higher priority: if both export HB_UCP_ENABLE_PERFETTO=true and export HB_UCP_TRACE_LOG_LEVEL=0 are set, only Perfetto trace is started and the UCP trace log is ignored.
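The precedence rule above can be sketched in a few lines of Python. This only illustrates the rule as described here and is not UCP's actual implementation: the function name effective_trace_backend and its return labels are hypothetical, and the sketch assumes that any HB_UCP_TRACE_LOG_LEVEL below 6 produces trace log output.

```python
def effective_trace_backend(env):
    """Return which trace output wins for a given environment mapping.

    The labels "perfetto", "trace_log" and "none" are names chosen for
    this sketch; UCP itself does not expose them.
    """
    perfetto_enabled = env.get("HB_UCP_ENABLE_PERFETTO", "false") == "true"
    # Assumption: levels below the default of 6 produce trace log output.
    log_level = int(env.get("HB_UCP_TRACE_LOG_LEVEL", "6"))

    if perfetto_enabled:
        # Perfetto has priority, so the trace log setting is ignored.
        return "perfetto"
    if log_level < 6:
        return "trace_log"
    return "none"
```

Passing os.environ to the function shows which output the current shell configuration would select.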
UCP adds trace records in the application API and on internal critical scheduling paths, including task trace records and operator trace records.

Task trace records:

| Name | Description |
|---|---|
| hbDNNInfer | Create a model inference task |
| hbVPxxx | Create a vision processing task |
| hbHPLxxx | Create a high-performance computing task |
| hbUCPSubmitTask | Submit task |
| ${TaskType}::Wait | Wait for task completion |
| TaskSetDone | Notify task completion |
| hbUCPReleaseTask | Release task |

Operator trace records:

| Name | Description |
|---|---|
| SubmitOp | Submit operator |
| OpInfer | Operator inference |
| OpFnish | Operator finish |

Perfetto is a system analysis tool developed and open-sourced by Google, which can collect performance data from different data sources and provides the Perfetto UI for data visualization and analysis.
For more details on Perfetto, please refer to the official Perfetto documentation.
| Parameter | Data Type | Parameter Description | Correlated Parameters |
|---|---|---|---|
| backend | string | Perfetto backend to use. This document covers two values: in_process and system. | None. |
| trace_config | string | Perfetto trace configuration file, in protobuf text format. Available when backend is set to in_process. | None. |
The UCP trace configuration file is not needed when your application has already initialized Perfetto; in that case, you only need to export HB_UCP_ENABLE_PERFETTO=true to enable Perfetto.
By default, Perfetto stores trace data in an in-memory buffer until the trace session ends, at which point the data in the buffer is dumped to a file. If the trace data exceeds the buffer capacity, there is no guarantee of data integrity.
Perfetto supports periodically writing buffer data to a file, which can be achieved by adding the following fields to the trace configuration file.
| Configuration | Type | Description |
|---|---|---|
| write_into_file | bool | Set to true to enable periodic writes to a file. Disabled by default. |
| file_write_period_ms | uint32 | Period of the writes to the file. The default is 5 s. Choose a period based on the amount of data generated per second and the capacity of the trace buffer. |
| max_file_size_bytes | uint64 | Maximum size of the trace file. When it is reached, the trace terminates automatically. No limit by default. |
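As a rough illustration of how the write period relates to the buffer capacity and the data rate, the following Python sketch picks a period and renders the three fields as a protobuf text fragment. The helper names, the headroom factor, and the example numbers are assumptions made for this sketch; only the three field names come from the table above.

```python
def max_write_period_ms(buffer_kb, data_rate_kb_per_s, headroom=0.5):
    """Longest write period that fills at most `headroom` of the buffer.

    `headroom` is a safety factor invented for this sketch.
    """
    if data_rate_kb_per_s <= 0:
        raise ValueError("data_rate_kb_per_s must be positive")
    return int(buffer_kb * headroom / data_rate_kb_per_s * 1000)


def periodic_write_fragment(period_ms, max_file_size_bytes=None):
    """Render the periodic-write fields in protobuf text format."""
    lines = [
        "write_into_file: true",
        f"file_write_period_ms: {period_ms}",
    ]
    if max_file_size_bytes is not None:
        lines.append(f"max_file_size_bytes: {max_file_size_bytes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a 64 MiB buffer receiving about 8 MiB of trace data per second.
    period = max_write_period_ms(buffer_kb=65536, data_rate_kb_per_s=8192)
    print(periodic_write_fragment(period))
```

The printed lines would be added to the trace configuration file alongside the existing buffer and data source settings.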
You can configure UCP to use a Perfetto configuration file by specifying its path through the environment variable HB_UCP_PERFETTO_CONFIG_PATH.
When the backend is set to system, there is no need to specify trace_config separately for UCP.
For detailed information about Perfetto configuration files, please refer to the Perfetto TraceConfig Reference.
UCP provides the reference configuration files ucp_in_process.cfg and ucp_system.cfg, which you can modify to suit your application scenario.
In system mode, BPU trace capture is supported: simply add the BPU trace data source to the Perfetto configuration file. The ucp_bpu_trace.cfg file already includes the BPU trace data source by default.
The specific configuration items are shown below.
The bpu_trace_period_ms is used to set the period for reading the BPU trace. You can adjust this parameter according to your actual usage scenario. When the BPU load is high, you can shorten the reading period to prevent trace data from being overwritten because reads cannot keep up with writes.
Currently, the BPU trace feature cannot be enabled dynamically at runtime. To capture BPU trace data in real time while the application is running, you must enable the feature manually before launching the application, using the following system command:
echo 1 > /sys/devices/system/bpu/bpu0/trace
Before executing this command, make sure that the value of /sys/devices/system/bpu/bpu0/power_enable is 0. If it is not 0, first execute echo 0 > /sys/devices/system/bpu/bpu0/power_enable.
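The enabling sequence above can also be scripted. The following is a minimal Python sketch that mirrors the two echo commands; the function name enable_bpu_trace and the bpu_dir parameter are hypothetical, and on a real device the script would need permission to write to sysfs.

```python
from pathlib import Path

# sysfs directory for BPU core 0, as given in the text above.
BPU0_SYSFS = Path("/sys/devices/system/bpu/bpu0")


def enable_bpu_trace(bpu_dir=BPU0_SYSFS):
    """Enable BPU trace, first making sure power_enable is 0."""
    power_enable = Path(bpu_dir) / "power_enable"
    trace = Path(bpu_dir) / "trace"

    # Equivalent to: echo 0 > .../power_enable (only when it is not already 0)
    if power_enable.read_text().strip() != "0":
        power_enable.write_text("0\n")

    # Equivalent to: echo 1 > .../trace
    trace.write_text("1\n")
```

Call enable_bpu_trace() before launching the application, since the feature cannot be switched on while the application is running.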
In system mode, DSP trace capture is supported: simply add the DSP trace data source to the Perfetto configuration file. The ucp_dsp_trace.cfg file already includes the DSP trace data source by default.
The valid configuration items are shown below.
The dsptrace_period_ms is used to set the period for reading the DSP trace. You can adjust this parameter according to your actual usage scenario. When the DSP load is high, you can shorten the reading period to prevent trace data from being overwritten because reads cannot keep up with writes.
Currently, the DSP trace feature cannot be enabled dynamically at runtime. To capture DSP trace data in real time while the application is running, you must enable the feature manually before launching the application, using the following system command:
in_process mode, capture the trace information within the process
In in_process mode, only traces within the UCP process can be captured, and the Perfetto background process does not need to be started.
In the ucp_in_process.json, the configuration file for perfetto is specified as ucp_in_process.cfg, and the output_path specifies the path for output trace file.
Because Perfetto does not support directly overwriting an existing trace file, the file must be deleted first if it already exists.
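A small helper can take care of the cleanup before each run. The Python sketch below reads the output path from the UCP trace configuration file and deletes a stale trace if one exists. It assumes that output_path is a top-level key of ucp_in_process.json and that a relative path is resolved against the directory the program runs in; the function name remove_stale_trace is hypothetical.

```python
import json
from pathlib import Path


def remove_stale_trace(ucp_config_path, workdir="."):
    """Delete the trace file named by output_path, if it already exists.

    Returns True when a file was removed and False otherwise.
    Assumption: output_path is a top-level key in the JSON file.
    """
    config = json.loads(Path(ucp_config_path).read_text())
    # Joining with an absolute output_path keeps the absolute path as is.
    output = Path(workdir) / config["output_path"]
    if output.exists():
        output.unlink()
        return True
    return False
```

Running remove_stale_trace("ucp_in_process.json") from the program's working directory before each capture avoids the overwrite problem.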
Take hrt_model_exec as an example.
Because the specified file path is relative, the trace configuration file and scripts must be placed in the same directory as the program being run. You also need to configure the environment variables and run the program in the same shell environment.
The generated trace file is the ucp.pftrace specified by the perfetto command, and you can open it with Perfetto UI.
Common operations in Perfetto UI are as follows. For more detailed instructions, please refer to the help interface.

| Operations | Description |
|---|---|
| w or ctrl + scroll up with the mouse wheel | Zoom in |
| s or ctrl + scroll down with the mouse wheel | Zoom out |
| a or drag the time bar to the left | Pan left |
| d or drag the time bar to the right | Pan right |
| ? | Show help |

system mode, capture the trace information within the process
In system mode, UCP trace is just one of the data sources, so you need to run the corresponding tracebox commands to capture the trace.
Take hrt_model_exec as an example.
To capture complete data, make sure that the perfetto process does not exit before hrt_model_exec finishes executing.
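One way to guarantee this ordering is to drive both processes from a small launcher script. The Python sketch below starts the tracer, runs the application to completion, and only then asks the tracer to stop. The function name run_with_tracer, the startup delay, and the use of SIGINT to stop the tracer are assumptions made for this sketch; both command lines are passed in by the caller and are not prescribed here.

```python
import signal
import subprocess
import time


def run_with_tracer(tracer_cmd, app_cmd, startup_delay_s=1.0, stop_timeout_s=10.0):
    """Run app_cmd while tracer_cmd is alive and return the app's exit code."""
    tracer = subprocess.Popen(tracer_cmd)
    try:
        # Assumption: a short delay is enough for the tracer to start recording.
        time.sleep(startup_delay_s)
        return subprocess.run(app_cmd).returncode
    finally:
        if tracer.poll() is None:
            # Assumption: the tracer finishes its output file on SIGINT (Ctrl+C).
            tracer.send_signal(signal.SIGINT)
            try:
                tracer.wait(timeout=stop_timeout_s)
            except subprocess.TimeoutExpired:
                # Last resort; the trace file may be incomplete in this case.
                tracer.kill()
                tracer.wait()
```

Here tracer_cmd would be the tracebox command line used in your setup and app_cmd the hrt_model_exec invocation, each given as a list of arguments.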
The generated trace file is ucp.pftrace, and you can open it with Perfetto UI.
To demonstrate the BPU trace during the inference process of multiple models, an example of a multi-process application is provided here. Except for the different running programs being launched, the rest of the steps are the same as in the previous section.
Visualization of BPU trace requires the use of the hbperfetto tool, which is custom-developed by Horizon Robotics. You can obtain this tool by contacting the Horizon Robotics system software technical support personnel.
The effect of opening a trace file using hbperfetto is shown in the image below.
The scheduling of different model inference tasks presented in BPU trace is shown in the following figure.
hbperfetto supports associating UCP trace with BPU trace. The following diagram illustrates the complete process of a task, from creation and submission, through scheduling and execution, to completion and eventual release.
Additionally, you can query the raw data of BPU trace based on SQL.
| Event type | Data source name | hbperfetto custom | Configuration | Description |
|---|---|---|---|---|
| Application's track events | track_event | No | track_event_config | Used to capture data from applications that use the Perfetto SDK API for instrumentation. |
| ftrace | linux.ftrace | No | ftrace_config | Specify the events to capture through ftrace_events, such as sched/sched_switch. The supported events can be viewed in /sys/kernel/tracing/available_events. For detailed information on ftrace_config, please refer to FtraceConfig. |
| System memory | linux.sys_stats | No | sys_stats_config | Specify the sampling period through meminfo_period_ms, and specify the type of data to capture through meminfo_counters, such as MEMINFO_MEM_AVAILABLE. For detailed information on sys_stats_config, please refer to SysStatsConfig. |
| Process memory | linux.process_stats | No | process_stats_config | Specify the sampling period through proc_stats_poll_ms. For detailed information on process_stats_config, please refer to ProcessStatsConfig. |
| CPU usage | linux.sys_stats | No | sys_stats_config | Specify the sampling period through stat_period_ms. |
| perf | linux.perf | No | perf_event_config | Record process call stacks and perf counts. For detailed information on perf_event_config, please refer to PerfEventConfig. |
| DDR bandwidth | linux.sys_stats | Yes | sys_stats_config | Record DDR read and write bandwidth, and specify the sampling period through ddrinfo_period_ms. |
| ION memory | linux.sys_stats | Yes | sys_stats_config | Record ION memory information, and specify the sampling period through ion_period_ms. |
| BPU usage | linux.sys_stats | Yes | sys_stats_config | Record BPU usage, and specify the sampling period through bpuinfo_period_ms. |
| BPU trace | linux.sys_stats | Yes | sys_stats_config | Record BPU trace information, and specify the sampling period through bputrace_period_ms. |
| DSP trace | linux.sys_stats | Yes | sys_stats_config | Record DSP trace information, and specify the sampling period through dsptrace_period_ms. |
Chrome trace only supports capturing UCP trace and does not support capturing other data sources. To capture multiple data sources, please use Perfetto trace. Chrome trace is simple and easy to use: it records traces as text logs and does not depend on any extra third-party libraries or tools. If you are only interested in UCP's scheduling logic, you can use Chrome trace to capture it.
Before starting a new capture, it is recommended to delete old log files to avoid interference from old data.
Take hrt_model_exec as an example.
After capturing the trace logs, run the catch_trace.sh script provided in the UCP distribution package to convert the raw trace logs into a JSON-formatted trace file.
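Besides opening the converted file in a UI, it can be inspected with a short script. The Python sketch below sums the time spent per event name. It assumes the JSON file follows the standard Chrome Trace Event Format (complete events with ph "X" and a dur field, or paired "B"/"E" events); the exact layout produced by catch_trace.sh is not specified here, and the function name total_duration_by_name is hypothetical.

```python
import json
from collections import defaultdict


def total_duration_by_name(trace):
    """Sum event durations per name for a Chrome Trace Event Format trace."""
    # The format allows a bare event list or an object with "traceEvents".
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(int)
    open_events = defaultdict(list)  # (pid, tid) -> stack of (name, start ts)

    for event in sorted(events, key=lambda e: e.get("ts", 0)):
        phase = event.get("ph")
        thread = (event.get("pid"), event.get("tid"))
        if phase == "X":
            # Complete event: the duration is stored directly.
            totals[event["name"]] += event.get("dur", 0)
        elif phase == "B":
            open_events[thread].append((event["name"], event["ts"]))
        elif phase == "E" and open_events[thread]:
            # An end event closes the most recent begin on the same thread.
            name, start = open_events[thread].pop()
            totals[name] += event["ts"] - start
    return dict(totals)


def load_trace(path):
    """Load a JSON trace file from disk."""
    with open(path) as handle:
        return json.load(handle)
```

For example, total_duration_by_name(load_trace("ucp_trace_task.json")) returns a dictionary keyed by trace names such as hbUCPSubmitTask, with values in the trace's time unit.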
Open ucp_trace_task.json and ucp_trace_thread.json using Perfetto UI.
Open ucp_trace_thread.json in Perfetto UI:
