Reducing the amount of input and output data transferred between the X86 side and the board side can improve the tool's performance. The tool provides transmission optimization support for the following three usage scenarios:

- Periodic input update: if an input tensor of a model remains unchanged across multiple inference frames, it does not need to be transmitted repeatedly. It is stored on the board side once, and subsequent inferences directly reuse the on-board tensor.
- Output filtering: model outputs that are not used do not need to be transmitted back to the X86 side.
- Model chaining: in multi-model scenarios, the output of a preceding model can be used directly as the input of a subsequent model. The preceding model's output tensor is kept on the board side instead of being transmitted back, and the corresponding input of the subsequent model likewise does not need to be transmitted, since it reuses the tensor already stored on the board.
Class HTensor
HTensor is used as the input and output tensor in transmission optimization scenarios. It is a wrapper class for tensors on the X86 side or the board side, designed to provide a unified data interface and restrict modifications to certain attributes to ensure data consistency.
1). HTensor Member Method: __init__
Initialize an HTensor object.
| PARAMETER | DESCRIPTION |
|---|---|
| `data` | Tensor data to be encapsulated. Supported types include `numpy.ndarray`, `torch.Tensor`, or `None`. |
| `device` | Storage device of the tensor: `None` (unavailable), `"cpu"` (X86 side), `"bpu"` (board side), or `["cpu", "bpu"]` (both). |
| `key` | A key value used on the board side to uniquely identify the tensor. |
2). data Attribute of HTensor
Retrieves or assigns the tensor data. When assigning new data, if the existing data is not None, the data type must match that of the original.
3). device Attribute of HTensor
Obtains the storage device information of the tensor. This attribute is immutable once the object is constructed.
4). key Attribute of HTensor
Obtains the tensor's unique identification key on the board side. This attribute is immutable once the object is constructed.
5). shape Attribute of HTensor
Obtains the shape of the tensor. Manual modifications are prohibited, as it is maintained automatically by the tool.
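The attribute behavior described above can be sketched with a minimal stand-in class. Note that this is an illustrative mock, not the tool's actual implementation; the real `HTensor` ships with the tool's Python package:

```python
import numpy as np

class HTensorSketch:
    """Illustrative mock of HTensor: unified data access, read-only
    device/key, and type-checked data assignment."""

    def __init__(self, data=None, device=None, key=None):
        self._data = data      # numpy.ndarray, torch.Tensor, or None
        self._device = device  # None, "cpu", "bpu", or ["cpu", "bpu"]
        self._key = key        # unique board-side identifier

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # New data must match the type of the existing data (if any).
        if self._data is not None and type(value) is not type(self._data):
            raise TypeError("new data must have the same type as the original")
        self._data = value

    @property
    def device(self):  # read-only after construction
        return self._device

    @property
    def key(self):     # read-only after construction
        return self._key

    @property
    def shape(self):   # derived from the wrapped data, never set manually
        return None if self._data is None else tuple(self._data.shape)
```

Assigning to `device` or `key` raises `AttributeError`, and assigning `data` of a different type raises `TypeError`, mirroring the consistency constraints listed above.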
output_config Parameter of HbmRpcSession.__call__
This parameter configures the transmission behavior of the output tensors after the current inference frame ends. Its type is Dict[str, Dict[str, Any]] , where the first-level keys are the model output names; each second-level dict must contain "device" and may optionally contain "key" . The meanings and constraints of the "device" and "key" values are the same as those of the device and key parameters in the HTensor constructor.
When a certain output name of the model is correctly configured in output_config , the corresponding output tensor in the inference results returned for the current frame will be of type HTensor. Outputs that are not configured will be returned as regular types ( numpy.ndarray or torch.Tensor ).
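As a sketch, an output_config that keeps one output on the board under a fixed key while leaving the others untouched might look like the following. The output name "feat" and the key value are placeholders, not names from the documented model:

```python
# Hypothetical output_config: first-level keys are model output names;
# each entry sets "device" (and optionally "key") with the same meaning
# as the HTensor constructor parameters.
output_config = {
    "feat": {                # placeholder output name
        "device": "bpu",     # keep this output on the board side
        "key": "feat_cache", # optional fixed board-side identifier
    },
}

# Sketch of a call (session creation not shown; argument names assumed):
# outputs = session(input_0=x, output_config=output_config)
# outputs["feat"]  -> HTensor (stays on the board)
# other outputs    -> numpy.ndarray / torch.Tensor as usual
```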
Periodic Input Update
In this example, it is assumed that the model has an input named img , which is updated every 10 frames during inference. On the first frame after the img update, the tool will transfer it to the board side, while no input data will be transferred in the remaining frames.
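Since the session object itself is not reproduced here, the following sketch only simulates the transfer pattern: it counts how many times img would actually cross to the board over 30 frames with a 10-frame update period (frame count, period, and tensor shape are illustrative):

```python
import numpy as np

def count_img_transfers(num_frames, update_period):
    """Simulate the periodic-input-update optimization: 'img' crosses to
    the board only on frames where it actually changes; all other frames
    reuse the tensor already cached on the board side."""
    transfers = 0
    for frame in range(num_frames):
        if frame % update_period == 0:
            # img is refreshed -> exactly one transfer to the board
            _img = np.zeros((1, 3, 224, 224), dtype=np.float32)
            transfers += 1
        # else: the board reuses the cached tensor; nothing is sent
    return transfers

print(count_img_transfers(30, 10))  # 3 transfers instead of 30
```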
Output Filtering
In this example, the model has three outputs: output_0 , output_1 , and output_2 . Among them, output_2 is unused and filtered out, so only output_0 and output_1 are returned to the X86 side.
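The effect of the filtering can be simulated as below; this is a plain-Python illustration of which results reach the X86 side, not the tool's transport code (the output shapes are made up):

```python
import numpy as np

def filter_outputs(all_outputs, used_names):
    """Simulate output filtering: only outputs listed in used_names are
    transmitted back to the X86 side; the rest are dropped on the board."""
    return {name: data for name, data in all_outputs.items()
            if name in used_names}

board_results = {
    "output_0": np.zeros((1, 10), np.float32),
    "output_1": np.zeros((1, 5), np.float32),
    "output_2": np.zeros((1, 1000), np.float32),  # unused -> filtered out
}
returned = filter_outputs(board_results, {"output_0", "output_1"})
print(sorted(returned))  # ['output_0', 'output_1']
```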
Model Chaining
This example assumes that the input HBM file contains two models: model0 and model1. The output of model0 named output_0 will be directly used as the input named input_0 for model1. In this process, the output of model0 does not need to be transmitted back to the X86 side, and the input of model1 does not need to be transmitted to the board side.
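The chaining mechanism can be sketched with a simulated board-side cache indexed by HTensor-style keys. The function names and the key "m0_out" are invented for illustration; in the real tool the cache and handle passing are managed by the session:

```python
import numpy as np

# Simulated board-side tensor cache, indexed by HTensor-style keys.
board_cache = {}

def run_model0(x):
    """model0: its output 'output_0' stays on the board under a key;
    only a lightweight key handle is returned to the X86 side."""
    board_cache["m0_out"] = x * 2.0  # full tensor never leaves the board
    return {"output_0": {"device": "bpu", "key": "m0_out"}}

def run_model1(input_0_handle):
    """model1: resolves its input 'input_0' from the board cache by key,
    so no input data needs to cross over from the X86 side."""
    data = board_cache[input_0_handle["key"]]
    return {"result": data + 1.0}  # transmitted back as usual

handle = run_model0(np.ones((2, 2), np.float32))["output_0"]
out = run_model1(handle)["result"]
print(out)  # every element is 3.0: (1 * 2) + 1
```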
Comprehensive Application
The following inference pipeline covers the three scenarios of periodic input update, output filtering, and model chaining. The flowchart is as follows:
The reference code is as follows: