Flexible Mode

Please use the hbm_infer.hbm_rpc_session_flexible module for Flexible Mode.

API Parameter Descriptions

  1. Global Method: init_server

    def init_server(
        host: str,
        username: str = "root",
        password: Optional[str] = None,
        ssh_port: int = 22,
        remote_root: str = "/map/hbm_infer/",
    ) -> HbmRpcServer:

    Constructs an HbmRpcServer object.

    • Parameters
      - host: IP address of the development board.
      - username: Board-side username.
      - password: Login password for the development board.
      - ssh_port: SSH destination port.
      - remote_root: Root directory for temporary files on the board.
    • Returns

    An instance of the HbmRpcServer object.
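
init_server reaches the board over SSH, so a failing call may take a while to time out. A quick TCP reachability check beforehand gives a clearer error message; the sketch below is a hypothetical helper (can_reach is not part of hbm_infer) built only on the standard library:

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, guard the call with `if not can_reach(board_ip, ssh_port): raise RuntimeError("board unreachable")` before invoking `init_server(host=board_ip, ssh_port=ssh_port)`.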

  2. Global Method: deinit_server

    def deinit_server(hbm_rpc_server: HbmRpcServer) -> None:

    Clean up the server files on the board.

    • Parameters
      - hbm_rpc_server: Instance of the HbmRpcServer object.
Attention

It is necessary to explicitly call the deinit_server interface to ensure that board-side storage resources are properly released.
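
Because deinit_server must run even when inference fails, a context manager is a convenient way to guarantee the cleanup call. The wrapper below is a hypothetical sketch (managed_server is not part of hbm_infer); it only assumes init_server/deinit_server behave as documented:

```python
from contextlib import contextmanager


@contextmanager
def managed_server(init_fn, deinit_fn, **init_kwargs):
    """Initialize a server, yield it, and guarantee the deinit call on exit."""
    server = init_fn(**init_kwargs)
    try:
        yield server
    finally:
        deinit_fn(server)  # runs even if the body raises


# Intended usage (sketch):
#     with managed_server(init_server, deinit_server, host="10.0.0.2") as rpc:
#         ...  # init_hbm, create sessions, run inference
```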

  3. Global Method: init_hbm

    def init_hbm(
        local_hbm_path: Union[str, List[str]],
        hbm_rpc_server: HbmRpcServer,
    ) -> HbmHandle:

    Constructs an HbmHandle object.

    • Parameters
      - local_hbm_path: Local path to the HBM file, or a list of such paths.
      - hbm_rpc_server: Instance of the HbmRpcServer object.
    • Returns

    An instance of the HbmHandle object.

  4. Global Method: deinit_hbm

    def deinit_hbm(hbm_handle: HbmHandle) -> None:

    Clean up the board-side HBM files.

    • Parameters
      - hbm_handle: Instance of the HbmHandle object.
Attention

It is necessary to explicitly call the deinit_hbm interface to ensure that board-side storage resources are properly released.

  1. HbmRpcSession Member Method: __init__

    def __init__(
        self,
        hbm_handle: HbmHandle,
        hbm_rpc_server: HbmRpcServer,
        frame_timeout: int = 90,
        server_timeout: int = 5,
        with_profile: bool = False,
        debug: bool = False,
        compress_option: str = "NONE",
        core_id: Union[int, List[int]] = -1,
        remote_environment: Dict[str, Any] = {},
    ) -> None:

    Initializes an HbmRpcSession object.

    • Parameters
      - hbm_handle: Instance of the HbmHandle object.
      - hbm_rpc_server: Instance of the HbmRpcServer object.
      - frame_timeout: Per-frame timeout for gRPC communication, in seconds.
      - server_timeout: Server timeout, in minutes. The server auto-terminates and cleans up non-log files after the timeout.
      - with_profile: Whether to collect timing statistics for each stage of inference. The default value is False.
      - debug: Enable debug mode, which retains more logs.
      - compress_option: gRPC compression mode. Optional values are "IN" (compress request data frames), "INOUT" (compress both request and response data frames), and "NONE" (disable compression, the default).
      - core_id: BPU core IDs to use for inference: 0 for CORE_0, 1 for CORE_1, ..., -1 for CORE_ANY (the default). Multiple cores can be specified as a list.
      - remote_environment: Environment variables to set on the board, as a dictionary mapping variable names to values. The default is an empty dictionary.
Note

The compression feature is implemented in software, so enabling it usually increases inference latency; its benefit lies in reducing network load and improving throughput. How well the data compresses depends on the internal correlation of the input and output tensors: enabling compression is generally not recommended for floating-point inputs/outputs, but it may be worthwhile for image inputs or segmentation outputs.
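
This rule of thumb can be checked offline. The sketch below uses zlib as a rough stand-in for gRPC's deflate-based codecs (an assumption; the actual codec may differ): highly correlated image-like tensors compress well, while float tensors with near-random mantissa bits barely compress at all.

```python
import zlib

import numpy as np


def compression_ratio(arr: np.ndarray) -> float:
    """Compressed size / raw size; lower means more compressible."""
    raw = arr.tobytes()
    return len(zlib.compress(raw)) / len(raw)


# Image-like input: large uniform regions, strong spatial correlation.
image_like = np.zeros((1, 3, 224, 224), dtype=np.uint8)
image_like[:, :, 50:150, 50:150] = 255

# Float features: mantissa bits are close to random noise.
float_like = np.random.default_rng(0).standard_normal((64, 1024)).astype(np.float32)

print(f"image-like ratio: {compression_ratio(image_like):.3f}")  # far below 1.0
print(f"float ratio:      {compression_ratio(float_like):.3f}")  # close to 1.0
```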

  2. HbmRpcSession Member Method: get_model_names

    def get_model_names(self) -> List[str]:

    Get the list of model names in the current session.

    • Returns

    List of model names.
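
Several methods below require model_name to be specified for multi-model sessions. A small helper can centralize that rule; resolve_model_name is hypothetical (not part of hbm_infer) and only assumes get_model_names returns the session's model list:

```python
from typing import List, Optional


def resolve_model_name(model_names: List[str],
                       model_name: Optional[str] = None) -> str:
    """Return the model to use: the explicit name if given, else the only model."""
    if model_name is not None:
        if model_name not in model_names:
            raise KeyError(f"unknown model {model_name!r}; session has {model_names}")
        return model_name
    if len(model_names) == 1:
        return model_names[0]
    raise ValueError("multi-model session: model_name must be specified")
```

For example, `name = resolve_model_name(sess.get_model_names(), model_name)` before calling `sess.get_input_info(name)`.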

  3. HbmRpcSession Member Method: get_input_info

    def get_input_info(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get model input information.

    • Parameters
      - model_name: For multi-model sessions, model_name must be specified.
    • Returns

    A dictionary describing the model input information. For specific format details, refer to the example below:

    {
        "input_name0": {
            "valid_shape": [1, 3, 224, 224],
            "tensor_type": "DATA_TYPE_S8",
            "quanti_type": "QUANTI_TYPE_SCALE",
            "quantizeAxis": 0,
            "scale_data": [0.006861070170998573],
            "zero_point_data": [0]
        },
        ...
    }
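
The scale/zero-point fields let the host convert between float data and the model's quantized representation. Below is a hypothetical sketch for per-tensor QUANTI_TYPE_SCALE quantization, using the example info above (the affine convention real = (q - zero_point) * scale is assumed; verify it against your toolchain):

```python
import numpy as np

# Input info in the format returned by get_input_info (example values).
input_info = {
    "input_name0": {
        "valid_shape": [1, 3, 224, 224],
        "tensor_type": "DATA_TYPE_S8",
        "quanti_type": "QUANTI_TYPE_SCALE",
        "quantizeAxis": 0,
        "scale_data": [0.006861070170998573],
        "zero_point_data": [0],
    }
}


def quantize(real: np.ndarray, meta: dict) -> np.ndarray:
    """Float -> int8: q = round(real / scale + zero_point), clipped to [-128, 127]."""
    scale = np.float32(meta["scale_data"][0])
    zp = np.float32(meta["zero_point_data"][0])
    return np.clip(np.round(real / scale + zp), -128, 127).astype(np.int8)


def dequantize(q: np.ndarray, meta: dict) -> np.ndarray:
    """int8 -> float: real = (q - zero_point) * scale."""
    scale = np.float32(meta["scale_data"][0])
    zp = np.float32(meta["zero_point_data"][0])
    return (q.astype(np.float32) - zp) * scale
```
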
  4. HbmRpcSession Member Method: get_output_info

    def get_output_info(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get model output information.

    • Parameters
      - model_name: For multi-model sessions, model_name must be specified.
    • Returns

    A dictionary describing the model output information, in the same format as the return value of get_input_info.

  5. HbmRpcSession Member Method: show_input_output_info

    def show_input_output_info(self, model_name: Optional[str] = None) -> None:

    Print model input and output information.

    • Parameters
      - model_name: For multi-model sessions, model_name must be specified.
  6. HbmRpcSession Member Method: __call__

    def __call__(
        self,
        data: Dict[str, Union[np.ndarray, torch.Tensor, HTensor]],
        output_config: Optional[Dict[str, Dict]] = None,
        model_name: Optional[str] = None,
    ) -> Dict[str, Union[np.ndarray, torch.Tensor, HTensor]]:

    Perform model inference.

    • Parameters
      - data: Model input, in dictionary format. The key is the input tensor name, and the value is the input tensor. Three formats are supported: torch.Tensor, numpy.ndarray, and HTensor.
        Note:
        • The input data must match the model's input specifications, including names, number of inputs, shapes, and data types.
        • torch.Tensor and numpy.ndarray cannot be mixed in a single input.
        • When using torch.Tensor, all tensors must be on the same device.
      - output_config: See the Transmission Optimization section for more details.
      - model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Model output, of dictionary type. The key is the name of the output tensor, and the value is the output tensor, which has the same type as the model input.
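
Since mismatched names, shapes, or dtypes are only rejected on the board, a local pre-check against get_input_info fails faster and with clearer messages. check_inputs and DTYPE_MAP below are hypothetical helpers for numpy inputs, not part of hbm_infer:

```python
from typing import Dict

import numpy as np

# Assumed mapping from documented tensor_type strings to numpy dtypes.
DTYPE_MAP = {"DATA_TYPE_S8": np.int8}


def check_inputs(data: Dict[str, np.ndarray], input_info: Dict[str, dict]) -> None:
    """Raise ValueError if names, shapes, or dtypes deviate from the model spec."""
    if set(data) != set(input_info):
        raise ValueError(f"input names {sorted(data)} != expected {sorted(input_info)}")
    for name, arr in data.items():
        spec = input_info[name]
        if list(arr.shape) != list(spec["valid_shape"]):
            raise ValueError(f"{name}: shape {list(arr.shape)} != {spec['valid_shape']}")
        want = DTYPE_MAP.get(spec["tensor_type"])
        if want is not None and arr.dtype != want:
            raise ValueError(f"{name}: dtype {arr.dtype} != {np.dtype(want)}")
```

A natural place to call it is right before inference: `check_inputs(data, sess.get_input_info(model_name))` followed by `sess(data)`.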

  7. HbmRpcSession Member Method: close_server

    def close_server(self) -> None:

    Shut down the server and clean up server-side resources.

Attention

It is necessary to explicitly call the close_server interface to ensure that board-side processes, storage, and other resources are properly released.

  8. HbmRpcSession Member Method: get_profile

    def get_profile(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get timing statistics for each stage of inference. The with_profile parameter must be set to True when constructing the session.

    • Parameters
      - model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Timing statistics for each inference stage, in dictionary format. The reference format is as follows:

    {
        // Total frame latency (ms)
        "frame_duration": {"avg": 6, "min": 6, "max": 6},
        // Latency from sending a gRPC request to receiving a response (ms)
        "sd2rv_duration": {"avg": 5, "min": 5, "max": 5},
        // Network communication latency (ms)
        "commu_duration": {"avg": 4, "min": 4, "max": 4},
        // Board-side total latency (ms)
        "board_duration": {"avg": 1, "min": 1, "max": 1},
        // Board-side infer latency (ms)
        "infer_duration": {"avg": 0.5, "min": 0.5, "max": 0.5},
        // Board-side preprocess latency (ms)
        "prepr_duration": {"avg": 0.3, "min": 0.3, "max": 0.3},
        // Board-side postprocess latency (ms)
        "pospr_duration": {"avg": 0.2, "min": 0.2, "max": 0.2},
    }
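
The stages appear to nest (frame >= sd2rv >= commu >= board >= infer/prepr/pospr), so differencing the averages yields a latency breakdown. profile_breakdown is a hypothetical helper built on that nesting assumption:

```python
def profile_breakdown(profile: dict) -> dict:
    """Split average frame latency into stage contributions by differencing."""
    avg = {key: stats["avg"] for key, stats in profile.items()}
    return {
        "client_overhead_ms": avg["frame_duration"] - avg["sd2rv_duration"],
        "grpc_overhead_ms": avg["sd2rv_duration"] - avg["commu_duration"],
        "network_ms": avg["commu_duration"] - avg["board_duration"],
        "board_pre_ms": avg["prepr_duration"],
        "board_infer_ms": avg["infer_duration"],
        "board_post_ms": avg["pospr_duration"],
    }
```

For example, `profile_breakdown(sess.get_profile())` with the reference numbers above attributes 3 ms (commu minus board) to pure network transfer, the dominant cost in that sample.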
  9. HbmRpcSession Member Method: get_profile_last_frame

    def get_profile_last_frame(self, model_name: Optional[str] = None) -> Dict[str, Dict]:

    Get timing statistics for each stage of the most recent frame's inference. The with_profile parameter must be set to True when constructing the session.

    • Parameters
      - model_name: For multi-model sessions, model_name must be specified.
    • Returns

    Timing statistics for each inference stage of the most recent frame, in dictionary format. The reference format is as follows:

    {
        // Total frame latency (ms)
        "frame_duration": 12,
        // Latency from sending a gRPC request to receiving a response (ms)
        "sd2rv_duration": 10,
        // Network communication latency (ms)
        "commu_duration": 6,
        // Board-side total latency (ms)
        "board_duration": 4,
        // Board-side infer latency (ms)
        "infer_duration": 2,
        // Board-side preprocess latency (ms)
        "prepr_duration": 0.5,
        // Board-side postprocess latency (ms)
        "pospr_duration": 0.5,
    }

Usage Example

import torch
import multiprocessing as mp

from hbm_infer.hbm_rpc_session_flexible import (
    HbmRpcSession,
    init_server,
    deinit_server,
    init_hbm,
    deinit_hbm,
)


def single_session_entry(rpc_server, hbm_handle, run_epoch):
    # Create session
    sess = HbmRpcSession(
        hbm_rpc_server=rpc_server,
        hbm_handle=hbm_handle
    )

    # Print model input/output information
    sess.show_input_output_info()

    # Prepare input data
    input_data = {
        'img': torch.ones((1, 3, 224, 224), dtype=torch.int8)
    }

    # Execute inference and print the result shapes
    for i in range(run_epoch):
        output_data = sess(input_data)
        print([output_data[k].shape for k in output_data])

    # Close server
    sess.close_server()


def run_hbm_infer(num_process=8, run_epoch=20):
    # Initialize server
    rpc_server = init_server(
        host=<available_ip>
    )

    # Load HBM file
    hbm_handle = init_hbm(
        hbm_rpc_server=rpc_server,
        local_hbm_path=<local_hbm_path>
    )

    # Multi-process inference
    processes = list()
    for i in range(num_process):
        p = mp.Process(target=single_session_entry,
                       args=(rpc_server, hbm_handle, run_epoch))
        processes.append(p)
        p.start()
    for p in processes:
        p.join()

    # Clean up board-side server files
    deinit_server(rpc_server)

    # Clean up board-side HBM files
    deinit_hbm(hbm_handle)


if __name__ == "__main__":
    run_hbm_infer()