Standard Mode

Please use the hbm_infer.hbm_rpc_session module for Standard Mode.

API Parameter Descriptions

  1. HbmRpcSession Member Method: __init__

    def __init__(
        self,
        host: str,
        local_hbm_path: Union[str, List[str]],
        username: str = "root",
        password: Optional[str] = None,
        ssh_port: int = 22,
        remote_root: str = "/map/hbm_infer/",
        frame_timeout: int = 90,
        server_timeout: int = 5,
        with_profile: bool = False,
        debug: bool = False,
        compress_option: str = "NONE",
        core_id: Union[int, List[int]] = -1,
        remote_environment: Dict[str, Any] = {},
    ) -> None:

    Initializes an HbmRpcSession object.

    • Parameters

      • host: IP address of the development board.
      • local_hbm_path: Local path to the HBM file (a single path or a list of paths).
      • username: Board-side username. Defaults to "root".
      • password: Login password for the development board.
      • ssh_port: SSH destination port. Defaults to 22.
      • remote_root: Root directory for temporary files on the board. Defaults to "/map/hbm_infer/".
      • frame_timeout: Per-frame timeout for gRPC communication, in seconds. Defaults to 90.
      • server_timeout: Server timeout, in minutes. After the timeout, the server terminates automatically and cleans up all non-log files. Defaults to 5.
      • with_profile: Whether to collect timing statistics for each inference stage. Defaults to False.
      • debug: Whether to enable debug mode, which retains more logs. Defaults to False.
      • compress_option: gRPC compression setting. Valid values are "IN" (compress request data frames), "INOUT" (compress both request and response data frames), and "NONE" (disable compression, the default).
      • core_id: BPU core ID(s) used for inference: 0 for CORE_0, 1 for CORE_1, and so on; -1 for CORE_ANY (the default). A list of IDs selects multiple cores.
      • remote_environment: Environment variables to set on the board, as a dictionary mapping variable names to values. Defaults to an empty dictionary.
Note

Compression is performed in software, so enabling it usually increases inference latency; its purpose is to reduce network load and improve throughput. Compression effectiveness depends on the internal correlation of the input and output data: it is generally not worthwhile for floating-point inputs/outputs, but may pay off for image inputs or segmentation outputs.
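The rule of thumb above can be encoded in a small helper when deciding how to construct the session. This is an illustrative sketch, not part of hbm_infer; the `suggest_compress_option` name and the dtype-based heuristic are assumptions:

```python
import numpy as np

# Hypothetical helper (not an hbm_infer API): pick a compress_option value
# following the note above -- compression tends to pay off for integer data
# such as images or segmentation maps, but rarely for floating-point I/O.
def suggest_compress_option(input_dtypes, output_dtypes):
    def any_float(dtypes):
        return any(np.issubdtype(np.dtype(dt), np.floating) for dt in dtypes)

    if any_float(input_dtypes):
        return "NONE"   # float inputs rarely compress well
    if any_float(output_dtypes):
        return "IN"     # compress only the integer request frames
    return "INOUT"      # integer in and out (e.g. image in, segmentation out)

opt = suggest_compress_option(input_dtypes=[np.uint8], output_dtypes=[np.int8])
```

The returned string can then be passed as the `compress_option` argument of `HbmRpcSession`.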

  2. HbmRpcSession Member Method: get_model_names

    def get_model_names(self) -> List[str]:

    Get the list of model names in the current session.

    • Returns

    List of model names.

  3. HbmRpcSession Member Method: get_input_info

    def get_input_info(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get model input information.

    • Parameters

      • model_name: For multi-model sessions, model_name must be specified.
    • Returns

    A dictionary describing the model input information. For specific format details, refer to the example below:

    {
        "input_name0": {
            "valid_shape": [1, 3, 224, 224],
            "tensor_type": "DATA_TYPE_S8",
            "quanti_type": "QUANTI_TYPE_SCALE",
            "quantizeAxis": 0,
            "scale_data": [0.006861070170998573],
            "zero_point_data": [0]
        },
        ...
    }
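The returned dictionary is convenient for generating placeholder inputs, for example when smoke-testing a model. The sketch below is illustrative: the `make_dummy_inputs` helper is not part of hbm_infer, and the tensor_type-to-NumPy mapping covers only the type names shown here and is an assumption:

```python
import numpy as np

# Assumed mapping from documented tensor_type strings to NumPy dtypes;
# extend it for any other type names your model reports.
_TENSOR_TYPE_TO_NP = {
    "DATA_TYPE_S8": np.int8,
    "DATA_TYPE_U8": np.uint8,
    "DATA_TYPE_F32": np.float32,
}

def make_dummy_inputs(input_info):
    """Build zero-filled arrays matching each input's valid_shape/tensor_type."""
    return {
        name: np.zeros(info["valid_shape"],
                       dtype=_TENSOR_TYPE_TO_NP[info["tensor_type"]])
        for name, info in input_info.items()
    }

# Shaped like the example return value above.
info = {
    "input_name0": {
        "valid_shape": [1, 3, 224, 224],
        "tensor_type": "DATA_TYPE_S8",
    }
}
dummy = make_dummy_inputs(info)
```

In practice, `info` would come from `sess.get_input_info()` and `dummy` could be fed straight into the session call.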
  4. HbmRpcSession Member Method: get_output_info

    def get_output_info(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get model output information.

    • Parameters

      • model_name: For multi-model sessions, model_name must be specified.
    • Returns

    A dictionary describing the model output information, in the same format as the return value of get_input_info.

  5. HbmRpcSession Member Method: show_input_output_info

    def show_input_output_info(self, model_name: Optional[str] = None) -> None:

    Print model input and output information.

    • Parameters

      • model_name: For multi-model sessions, model_name must be specified.

  6. HbmRpcSession Member Method: __call__

    def __call__(
        self,
        data: Dict[str, Union[np.ndarray, torch.Tensor, HTensor]],
        output_config: Optional[Dict[str, Dict]] = None,
        model_name: Optional[str] = None,
    ) -> Dict[str, Union[np.ndarray, torch.Tensor, HTensor]]:

    Perform model inference.

    • Parameters

      • data: Model input, as a dictionary. The key is the input tensor name and the value is the input tensor. Three tensor types are supported: torch.Tensor, numpy.ndarray, and HTensor.
        Note:
        • The input data must match the model’s input specifications, including names, number of inputs, shapes, and data types.
        • torch.Tensor and numpy.ndarray cannot be mixed in a single input.
        • When using torch.Tensor, all tensors must be on the same device.
      • output_config: See the Transmission Optimization section for more details.
      • model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Model output, as a dictionary. The key is the name of the output tensor and the value is the output tensor, whose type matches that of the model input.
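For quantized outputs (quanti_type "QUANTI_TYPE_SCALE"), the scale_data and zero_point_data reported by get_output_info can be used to recover floating-point values from the integer tensors returned here. A minimal NumPy sketch; the `dequantize` name and the per-channel broadcasting along quantizeAxis are assumptions, not an hbm_infer API:

```python
import numpy as np

def dequantize(arr, scale_data, zero_point_data, quantize_axis=0):
    """Map quantized integers back to floats: (q - zero_point) * scale.

    scale_data/zero_point_data hold one entry for per-tensor quantization,
    or one entry per channel along quantize_axis (assumed semantics).
    """
    scale = np.asarray(scale_data, dtype=np.float32)
    zero_point = np.asarray(zero_point_data, dtype=np.float32)
    if scale.size > 1:
        # Per-channel case: reshape for broadcasting along quantize_axis.
        shape = [1] * arr.ndim
        shape[quantize_axis] = scale.size
        scale = scale.reshape(shape)
        zero_point = zero_point.reshape(shape)
    return (arr.astype(np.float32) - zero_point) * scale

q = np.array([[0, 64], [-64, 127]], dtype=np.int8)
f = dequantize(q, scale_data=[0.5], zero_point_data=[0])
```

In real use, `scale_data` and `zero_point_data` would be taken from the matching entry of `sess.get_output_info()`.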

  7. HbmRpcSession Member Method: close_server

    def close_server(self) -> None:

    Shut down the server and clean up server-side resources.

Attention

It is necessary to explicitly call the close_server interface to ensure that board-side processes, storage, and other resources are properly released.
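To guarantee that close_server runs even when inference raises, the session can be wrapped in a context manager. A small sketch: hbm_infer does not document context-manager support on HbmRpcSession, so the wrapper below is an assumption, demonstrated here with a stand-in object:

```python
from contextlib import contextmanager

@contextmanager
def hbm_session(sess):
    # Yield the session and always release board-side resources,
    # even if the body of the with-block raises.
    try:
        yield sess
    finally:
        sess.close_server()

# Stand-in object for demonstration only; in real use, pass an HbmRpcSession.
class _FakeSession:
    def __init__(self):
        self.closed = False
    def close_server(self):
        self.closed = True

sess = _FakeSession()
with hbm_session(sess):
    pass  # inference would happen here
```

With a real session this becomes `with hbm_session(HbmRpcSession(host=..., local_hbm_path=...)) as sess: ...`.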

  8. HbmRpcSession Member Method: get_profile

    def get_profile(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get timing statistics for each stage of inference. Requires the session to have been created with with_profile=True.

    • Parameters

      • model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Timing statistics for each inference stage, in dictionary format. The reference format is as follows:

    {
        // Total frame latency (ms)
        "frame_duration": {"avg": 6, "min": 6, "max": 6},
        // Latency from sending a gRPC request to receiving a response (ms)
        "sd2rv_duration": {"avg": 5, "min": 5, "max": 5},
        // Network communication latency (ms)
        "commu_duration": {"avg": 4, "min": 4, "max": 4},
        // Board-side total latency (ms)
        "board_duration": {"avg": 1, "min": 1, "max": 1},
        // Board-side inference latency (ms)
        "infer_duration": {"avg": 0.5, "min": 0.5, "max": 0.5},
        // Board-side preprocessing latency (ms)
        "prepr_duration": {"avg": 0.3, "min": 0.3, "max": 0.3},
        // Board-side postprocessing latency (ms)
        "pospr_duration": {"avg": 0.2, "min": 0.2, "max": 0.2}
    }
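The structure above lends itself to simple post-processing, for example computing what fraction of the average frame time is spent on the board versus on the network. A sketch over the documented keys; the `summarize_profile` helper name is an assumption:

```python
def summarize_profile(profile):
    """Break the average frame latency into board, network, and other shares."""
    frame = profile["frame_duration"]["avg"]
    board = profile["board_duration"]["avg"]
    commu = profile["commu_duration"]["avg"]
    return {
        "board_share": board / frame,    # fraction spent computing on the board
        "network_share": commu / frame,  # fraction spent on the wire
        "other_share": (frame - board - commu) / frame,
    }

# Dictionary shaped like the reference format above.
profile = {
    "frame_duration": {"avg": 6, "min": 6, "max": 6},
    "commu_duration": {"avg": 4, "min": 4, "max": 4},
    "board_duration": {"avg": 1, "min": 1, "max": 1},
}
shares = summarize_profile(profile)
```

A large `network_share` suggests the compress_option or network setup, rather than the model itself, is the bottleneck.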
  9. HbmRpcSession Member Method: get_profile_last_frame

    def get_profile_last_frame(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get timing statistics for each stage of the most recent frame's inference. Requires the session to have been created with with_profile=True.

    • Parameters

      • model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Timing statistics for each inference stage of the most recent frame, in dictionary format. The reference format is as follows:

    {
        // Total frame latency (ms)
        "frame_duration": 12,
        // Latency from sending a gRPC request to receiving a response (ms)
        "sd2rv_duration": 10,
        // Network communication latency (ms)
        "commu_duration": 6,
        // Board-side total latency (ms)
        "board_duration": 4,
        // Board-side inference latency (ms)
        "infer_duration": 2,
        // Board-side preprocessing latency (ms)
        "prepr_duration": 0.5,
        // Board-side postprocessing latency (ms)
        "pospr_duration": 0.5
    }
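Because this variant returns flat numbers rather than avg/min/max dictionaries, it suits per-frame logging inside an inference loop. An illustrative formatter; the `format_last_frame` helper and its choice of fields are assumptions:

```python
def format_last_frame(stats):
    """Render a per-frame profile dictionary as a compact log line."""
    fields = ["frame_duration", "commu_duration", "infer_duration"]
    return " ".join(f"{k}={stats[k]}ms" for k in fields if k in stats)

# Dictionary shaped like the reference format above.
stats = {
    "frame_duration": 12,
    "sd2rv_duration": 10,
    "commu_duration": 6,
    "board_duration": 4,
    "infer_duration": 2,
    "prepr_duration": 0.5,
    "pospr_duration": 0.5,
}
line = format_last_frame(stats)
```

In real use, `stats` would come from `sess.get_profile_last_frame()` after each call to the session.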

Usage Example

import torch

from hbm_infer.hbm_rpc_session import HbmRpcSession


def run_hbm_infer(run_epoch=10):
    # Create session
    sess = HbmRpcSession(
        host=<available_ip>,
        local_hbm_path=<local_hbm_path>,
    )

    # Print model input/output information
    sess.show_input_output_info()

    # Prepare input data
    input_data = {
        'img': torch.ones((1, 3, 224, 224), dtype=torch.int8)
    }

    # Execute inference and return results
    for i in range(run_epoch):
        output_data = sess(input_data)
        print([output_data[k].shape for k in output_data])

    # Close server
    sess.close_server()


if __name__ == '__main__':
    run_hbm_infer()