Configuration Info

Environment Variables

  • HB_NN_LOG_LEVEL // Sets the log level of the NN module. The values 0, 1, 2, 3, 4, 5, and 6 correspond to Trace, Debug, Info, Warning, Error, Critical, and Never, respectively. The default level is Warning.

  • HB_NN_HBIR_GPU_ENABLE // Sets whether to use GPU acceleration; `true` enables it.

  • HB_NN_ENABLE_MEM_LRU_CACHE // Sets whether the NN module uses a memory LRU cache; `true` enables it. Disabled by default.

  • HB_NN_MEM_LRU_CACHE_CLEAN_INTERVAL // Sets the interval, in milliseconds, at which the NN module cleans up the LRU cache. The default is 1000 ms.

Log Level Setting Instruction

  • Log level:

The logs in the NN module are divided into 7 levels.

The log level can be set to 0, 1, 2, 3, 4, 5, or 6, corresponding to Trace, Debug, Info, Warning, Error, Critical, and Never, with the default being Warning (3).

  • Log level setting rules:

    • A log message is printed only if its level is greater than or equal to the configured level; otherwise it is suppressed.
    • The lower the configured level, the more information is printed. For example, if the log level is set to 3 (Warning), messages at levels 3, 4, and 5 (Warning, Error, and Critical) are all printed. Since the default log level for the NN module is Warning, messages at the Warning, Error, and Critical levels are printed by default.
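The filtering rule above can be sketched in a few lines of POSIX shell. `printed_levels` is a hypothetical helper for illustration, not part of the NN module; it simply applies the "greater than or equal" rule to the seven named levels.

```shell
#!/bin/sh
# Sketch of the level-filtering rule. printed_levels lists which log
# levels would be emitted for a given configured level (0-6).
# Level 6 (Never) suppresses everything, since no message carries it.
printed_levels() {
    configured=$1   # 0=Trace .. 5=Critical, 6=Never
    i=0
    out=""
    for name in Trace Debug Info Warning Error Critical; do
        if [ "$i" -ge "$configured" ]; then
            out="$out $name"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "${out# }"
}

# Default level is 3 (Warning): Warning, Error, and Critical are printed.
printed_levels 3
```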

HBIR Model Inference Instructions

DNN supports inference of HBIR models in x86 environments. Since your environment may not have a GPU, GPU acceleration is disabled by default.

If there is a GPU on your machine, you can use GPU acceleration by setting the environment variable HB_NN_HBIR_GPU_ENABLE to true.
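A deployment script might opt in only when a CUDA-capable GPU looks present. This is a sketch under one assumption: using `nvidia-smi` as the probe is a convention of the deployment, not something the library requires.

```shell
#!/bin/sh
# Sketch: enable GPU acceleration only when a CUDA-capable GPU appears
# to be present. Probing with nvidia-smi is an assumption about the
# deployment environment.
if command -v nvidia-smi >/dev/null 2>&1; then
    export HB_NN_HBIR_GPU_ENABLE=true
else
    export HB_NN_HBIR_GPU_ENABLE=false
fi
echo "HB_NN_HBIR_GPU_ENABLE=$HB_NN_HBIR_GPU_ENABLE"
```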

Notes
  • When using GPU acceleration, make sure that the corresponding GPU driver and CUDA environment are installed on your machine. For environment requirements, please refer to Environment Deployment.

  • When using GPU acceleration, make sure libhbdnn.so is in the directory set by LD_LIBRARY_PATH.

  • When using GPU acceleration, you can set the deviceId in the control parameters to specify which GPU performs the computation. Make sure the value is in the range [0, gpu_device_count).
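The second note above can be checked mechanically before enabling acceleration. A minimal POSIX-shell sketch; `has_lib` is a hypothetical helper that just walks the colon-separated `LD_LIBRARY_PATH` entries:

```shell
#!/bin/sh
# Sketch: verify libhbdnn.so is reachable via LD_LIBRARY_PATH before
# enabling GPU acceleration. has_lib is a hypothetical helper, not part
# of the library.
has_lib() {
    lib=$1
    oldifs=$IFS
    IFS=:
    for dir in $LD_LIBRARY_PATH; do
        if [ -f "$dir/$lib" ]; then
            IFS=$oldifs
            echo true
            return 0
        fi
    done
    IFS=$oldifs
    echo false
    return 1
}

if [ "$(has_lib libhbdnn.so)" = "true" ]; then
    export HB_NN_HBIR_GPU_ENABLE=true
fi
```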

Memory LRU Cache Description

Before BPU memory can actually be used by the BPU, the NN module must perform special processing on it. Performing this processing frequently increases CPU load, which may cause performance problems.

To solve this problem, the inference library provides a memory LRU cache function, which can be used by setting the environment variable HB_NN_ENABLE_MEM_LRU_CACHE to true. The setting method is as follows:

export HB_NN_ENABLE_MEM_LRU_CACHE=true
Note
  • Input and output memory is managed internally according to the LRU (least recently used) principle. When the cache function is enabled, memory you applied for is therefore not released immediately when you call the release interface; it is only actually released when the cache is cleaned. You can control the cleanup interval by setting the HB_NN_MEM_LRU_CACHE_CLEAN_INTERVAL environment variable.

  • Scenario constraints: the cache is only beneficial when the memory you applied for is reused repeatedly.

  • This cache feature is only available in the NN module.
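Putting the two cache variables together, a configuration might look like the following. The 500 ms interval is an illustrative value, not a recommendation; the documented default is 1000 ms.

```shell
#!/bin/sh
# Sketch: enable the memory LRU cache and shorten the cleanup interval.
# 500 ms is chosen purely for illustration.
export HB_NN_ENABLE_MEM_LRU_CACHE=true
export HB_NN_MEM_LRU_CACHE_CLEAN_INTERVAL=500   # milliseconds; default is 1000
echo "cache=$HB_NN_ENABLE_MEM_LRU_CACHE interval=${HB_NN_MEM_LRU_CACHE_CLEAN_INTERVAL}ms"
```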

L2M Model Inference Support Guide

The UCP-side L2 Cache uses a static mapping scheme. You need to configure the size of the L2 Cache allocated to each BPU core via the environment variable HB_DNN_USER_DEFINED_L2M_SIZES. The sizes for each core are separated by `:`, the unit is MB, and only integer values are supported. During the inference preparation phase, UCP allocates the L2 Cache for all cores according to your configuration. Below are several configuration examples and explanations:

export HB_DNN_USER_DEFINED_L2M_SIZES=6:6:6:6    # Four cores, 6MB allocated to each core
export HB_DNN_USER_DEFINED_L2M_SIZES=0:6:0:0    # Four cores, 6MB allocated to core 1, no allocation for the other cores
export HB_DNN_USER_DEFINED_L2M_SIZES=12:0:0:12  # Four cores, 12MB allocated to core 0, 12MB allocated to core 3, no allocation for the other cores
Attention
  1. When enabling L2 Cache optimization, inference priority cannot be set to preemptive priority.

  2. In a multi-process environment, the L2 Cache allocated to different processes is independent of each other. Therefore, it is necessary to ensure that the total L2 Cache configured across all processes does not exceed the maximum hardware L2 Cache size.

  3. The maximum L2 cache size can be viewed using the command cat /sys/kernel/debug/ion/heaps/custom.
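The multi-process constraint in point 2 can be sanity-checked before launch by totalling the configured per-core sizes. A minimal POSIX-shell sketch; `l2m_total` is a hypothetical helper, and the real hardware maximum should be read from the debugfs entry in point 3:

```shell
#!/bin/sh
# Sketch: validate a HB_DNN_USER_DEFINED_L2M_SIZES value and report the
# total MB it would claim. l2m_total is a hypothetical helper.
l2m_total() {
    # $1: colon-separated per-core sizes, e.g. "6:6:6:6"
    total=0
    oldifs=$IFS
    IFS=:
    for size in $1; do
        case "$size" in
            ''|*[!0-9]*)
                IFS=$oldifs
                echo "invalid size: $size" >&2
                return 1
                ;;
        esac
        total=$((total + size))
    done
    IFS=$oldifs
    echo "$total"
}

l2m_total "6:6:6:6"   # → 24 (compare this against the hardware maximum)
```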