Introduction to the hbm_infer Tool

Tool Overview

hbm_infer is a validation tool that runs in a cooperative mode between an X86 host and a development board. Pre-processing and post-processing for the HBM model run as Python code on the X86 side, while the inference itself is executed by a server on the board side. This design makes model accuracy evaluation more efficient and reduces development costs. The workflow of the tool is summarized as follows:

[Figure: hbm_infer_process]

The hbm_infer tool uses the gRPC (Google Remote Procedure Call) framework to dispatch tasks and return responses between client and server processes in multi-process environments. In the communication architecture, each server is paired with exactly one client, so multiple processes form a multi-client, multi-server pattern. The communication workflow is outlined below:

[Figures: hbm_infer_communicate, hbm_infer_communicate_detail]

Attention
  • Connect the X86 platform directly to the development board over a local area network (LAN) to avoid network communication failures and related issues.
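
The one-client-per-server pairing described above can be pictured with a small sketch. It is illustrative only and does not use the hbm_infer API: the `Pairing` type, the `pair_clients_with_servers` function, the board address, and the consecutive-port scheme are all assumptions made for this example, and the tool may assign endpoints differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pairing:
    """One client process bound to exactly one board-side server (hypothetical type)."""
    client_id: int
    server_target: str  # "host:port" string, the usual form of a gRPC channel target


def pair_clients_with_servers(board_host: str, base_port: int, num_clients: int) -> List[Pairing]:
    """Give every client its own server target.

    Assumption for illustration: servers listen on consecutive ports
    starting at base_port, one port per client.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    return [
        Pairing(client_id=i, server_target=f"{board_host}:{base_port + i}")
        for i in range(num_clients)
    ]


# Hypothetical board address and port; replace with values from your own setup.
for pairing in pair_clients_with_servers("192.168.1.10", 50051, 3):
    print(f"client {pairing.client_id} -> server {pairing.server_target}")
```

Because no two clients share a target, a slow or failed server affects only the one client paired with it.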

Usage Method

The hbm_infer tool supports two modes: Standard Mode and Flexible Mode.

  • Standard Mode: This mode automatically uploads the server and HBM files to the development board and removes them after inference, with no user intervention required. However, because the HBM and server files are not reused, every session initialization consumes significant bandwidth on the board side.

  • Flexible Mode: This mode requires you to manually manage the upload of the server and HBM files, as well as the shutdown of the server. In exchange, you can reduce file-transfer bandwidth by reusing files that already exist on the board side.

Standard Mode is recommended for single-process usage. Flexible Mode is recommended for multi-process usage.
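
The bandwidth trade-off between the two modes can be estimated with a rough calculation. The sketch below is not part of hbm_infer: the function name `total_upload_bytes` and the file sizes are hypothetical, and it assumes that Standard Mode re-uploads the files for every session while Flexible Mode uploads them once and reuses them.

```python
MIB = 1024 * 1024


def total_upload_bytes(file_bytes: int, sessions: int, reuse_files: bool) -> int:
    """Estimate the bytes sent to the board across a number of sessions.

    reuse_files=False models Standard Mode: files are uploaded for every
    session and removed afterwards.
    reuse_files=True models Flexible Mode: files are uploaded once and
    kept on the board for later sessions.
    """
    if file_bytes < 0 or sessions < 0:
        raise ValueError("file_bytes and sessions must be non-negative")
    uploads = 1 if (reuse_files and sessions > 0) else sessions
    return file_bytes * uploads


# Hypothetical sizes: a 200 MiB HBM file plus a 20 MiB server package.
payload = (200 + 20) * MIB
sessions = 8

standard = total_upload_bytes(payload, sessions, reuse_files=False)
flexible = total_upload_bytes(payload, sessions, reuse_files=True)
print(f"Standard Mode: {standard // MIB} MiB")  # 1760 MiB
print(f"Flexible Mode: {flexible // MIB} MiB")  # 220 MiB
```

With a single session the two modes transfer the same amount, which is consistent with recommending Standard Mode for single-process usage; the saving from Flexible Mode grows with the number of sessions that share the same files.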

If you have performance requirements, refer to the Transmission Optimization section for targeted improvements.