Please use the hbm_infer.hbm_rpc_session_flexible module for Flexible Mode.
Global Method: init_server
Constructs an HbmRpcServer object.
| PARAMETER | DESCRIPTION |
|---|---|
| host | IP address of the development board. |
| username | Board-side username. |
| password | Login password for the development board. |
| ssh_port | SSH destination port. |
| remote_root | Root directory for temporary files on the board. |
Returns an instance of the HbmRpcServer object.
Global Method: deinit_server
Clean up the server files on the board.
| PARAMETER | DESCRIPTION |
|---|---|
| hbm_rpc_server | Instance of the HbmRpcServer object. |
Call deinit_server explicitly to ensure that board-side storage resources are properly released.
Global Method: init_hbm
Constructs an HbmHandle object.
| PARAMETER | DESCRIPTION |
|---|---|
| local_hbm_path | Local path to the HBM file. |
| hbm_rpc_server | Instance of the HbmRpcServer object. |
Returns an instance of the HbmHandle object.
Global Method: deinit_hbm
Clean up the board-side HBM files.
| PARAMETER | DESCRIPTION |
|---|---|
| hbm_handle | Instance of the HbmHandle object. |
Call deinit_hbm explicitly to ensure that board-side storage resources are properly released.
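Because both init_server and init_hbm have a cleanup counterpart that must be called explicitly, one way to avoid leaking board-side files is to pair each constructor with its cleanup call in a context manager. The following is a minimal sketch under stated assumptions: paired is a hypothetical helper and not part of hbm_infer, the commented usage assumes the functions accept the parameter names from the tables above as keyword arguments, and all values in angle brackets are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def paired(init_fn, deinit_fn, *args, **kwargs):
    """Hypothetical helper: build a resource with init_fn and always
    hand it to deinit_fn afterwards, even if the body raises."""
    resource = init_fn(*args, **kwargs)
    try:
        yield resource
    finally:
        deinit_fn(resource)


# Usage sketch (requires a board; keyword names are taken from the
# parameter tables and are assumed to be accepted as keywords):
#
# from hbm_infer.hbm_rpc_session_flexible import (
#     init_server, deinit_server, init_hbm, deinit_hbm,
# )
#
# with paired(init_server, deinit_server,
#             host="<board-ip>", username="<user>", password="<password>",
#             ssh_port=22, remote_root="<remote-dir>") as server:
#     with paired(init_hbm, deinit_hbm,
#                 local_hbm_path="<model.hbm>",
#                 hbm_rpc_server=server) as hbm_handle:
#         ...  # create sessions and run inference here
```

Nesting the two blocks means the HBM files are cleaned up before the server files, which is the reverse of the order in which they were created.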
HbmRpcSession Member Method: __init__
Initializes an HbmRpcSession object.
| PARAMETER | DESCRIPTION |
|---|---|
| hbm_handle | Instance of the HbmHandle object. |
| hbm_rpc_server | Instance of the HbmRpcServer object. |
| frame_timeout | Per-frame timeout for gRPC communication, in seconds. |
| server_timeout | Server timeout, in minutes. After the timeout, the server terminates automatically and cleans up all non-log files. |
| with_profile | Whether to enable time statistics for each stage of inference. The default value is False. |
| debug | Whether to enable debug mode, which retains more logs. |
| compress_option | Enables the gRPC compression feature. Optional values are "IN", "INOUT", and "NONE", which indicate enabling compression for request data frames, enabling compression for both request and response data frames, and disabling compression, respectively. |
| core_id | Specifies BPU core IDs for inference: 0 for CORE_0, 1 for CORE_1, ..., -1 for CORE_ANY (default). Multiple cores can be listed. |
| remote_environment | Environment variables to configure on the board. This is a dictionary where the keys are environment variable names and the values are their corresponding values. The default is an empty dictionary. |
Compression is performed in software, so enabling it usually increases inference latency. Its purpose is to reduce network load and improve throughput, not to speed up a single frame. How well the data compresses depends on the internal correlation within the input and output data. Enabling compression is generally not recommended for floating-point input/output, but it may be worthwhile for image inputs or segmentation outputs.
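The guidance above can be turned into a small decision helper. The sketch below is illustrative only: pick_compress_option and check_compress_option are hypothetical names, not part of hbm_infer, and whether a given tensor counts as "compressible" is left to the caller (for example, image inputs or segmentation outputs rather than floating-point data). Only the three option strings come from the parameter table.

```python
# The three values documented for compress_option.
VALID_COMPRESS_OPTIONS = ("IN", "INOUT", "NONE")


def check_compress_option(value):
    """Reject anything other than the three documented option strings."""
    if value not in VALID_COMPRESS_OPTIONS:
        raise ValueError(
            f"compress_option must be one of {VALID_COMPRESS_OPTIONS}, got {value!r}"
        )
    return value


def pick_compress_option(input_compressible, output_compressible):
    """Hypothetical heuristic mapping caller judgment to an option.

    No output-only mode is documented, so when only the output is
    compressible this sketch falls back to "INOUT" (a judgment call;
    "NONE" is an equally defensible choice if input compression is costly).
    """
    if output_compressible:
        return "INOUT"
    if input_compressible:
        return "IN"
    return "NONE"
```

Since compression trades latency for bandwidth, it is worth measuring both settings on the target network before committing to one.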
HbmRpcSession Member Method: get_model_names
Get the list of model names in the current session.
Returns the list of model names.
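Several methods in this reference require model_name for multi-model sessions. A small wrapper around get_model_names can make that rule explicit before any call is sent to the board. This is a sketch, not library code: resolve_model_name is a hypothetical helper, and it assumes only that get_model_names returns a list of strings, as described above.

```python
def resolve_model_name(session, model_name=None):
    """Hypothetical helper: return a model name that is safe to pass on.

    - If model_name is given, check that the session knows it.
    - If it is omitted and the session holds exactly one model, use that.
    - Otherwise raise, because a multi-model session needs an explicit name.
    """
    names = list(session.get_model_names())
    if model_name is not None:
        if model_name not in names:
            raise ValueError(f"unknown model {model_name!r}; available: {names}")
        return model_name
    if len(names) == 1:
        return names[0]
    raise ValueError(f"model_name must be specified; available: {names}")
```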
HbmRpcSession Member Method: get_input_info
Get model input information.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | For multi-model sessions, model_name must be specified. |
A dictionary describing the model input information. For specific format details, refer to the example below:
HbmRpcSession Member Method: get_output_info
Get model output information.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | For multi-model sessions, model_name must be specified. |
Returns a dictionary describing the model output information, with a format consistent with the return value of get_input_info.
HbmRpcSession Member Method: show_input_output_info
Print model input and output information.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | For multi-model sessions, model_name must be specified. |
HbmRpcSession Member Method: __call__
Perform model inference.
| PARAMETER | DESCRIPTION |
|---|---|
| data | Model input, in dictionary format. The key is the input tensor name, and the value is the input tensor. Three formats are supported: torch.Tensor, numpy.ndarray, and HTensor. |
| output_config | See the Transmission Optimization section for more details. |
| model_name | For multi-model sessions, model_name must be specified. |
Returns the model output as a dictionary. The key is the output tensor name, and the value is the output tensor, which has the same type as the model input.
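Since a session object is called directly with an input dictionary, a loop over several frames can be sketched as follows. run_frames is a hypothetical helper, not part of hbm_infer. It assumes that model_name can be passed as a keyword argument and may be omitted for single-model sessions, and it leaves output_config out entirely because its format is described elsewhere.

```python
def run_frames(session, frames, model_name=None):
    """Hypothetical helper: call the session once per input dictionary.

    Each element of `frames` maps input tensor names to tensors
    (torch.Tensor, numpy.ndarray, or HTensor with a real session).
    Returns the list of output dictionaries in the same order.
    """
    # Pass model_name only when the caller supplied one (assumed keyword).
    extra = {} if model_name is None else {"model_name": model_name}
    outputs = []
    for index, data in enumerate(frames):
        if not isinstance(data, dict):
            raise TypeError(f"frame {index} must be a dict of name -> tensor")
        outputs.append(session(data, **extra))
    return outputs


# Usage sketch with a real session (input names are placeholders):
# outputs = run_frames(session, [{"<input-name>": tensor}], model_name="<model>")
```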
HbmRpcSession Member Method: close_server
Shut down the server and clean up server-side resources.
Call close_server explicitly to ensure that board-side processes, storage, and other resources are properly released.
HbmRpcSession Member Method: get_profile
Get the time statistics for each stage of inference. The with_profile parameter must be set to True.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | For multi-model sessions, model_name must be specified. |
Timing statistics for each inference stage, in dictionary format. The reference format is as follows:
HbmRpcSession Member Method: get_profile_last_frame
Get the time statistics for each stage of the most recent frame's inference. The with_profile parameter must be set to True.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | For multi-model sessions, model_name must be specified. |
Timing statistics for each inference stage of the most recent frame, in dictionary format. The reference format is as follows: