hbm_infer is a validation tool that runs in an X86 + board-side co-communication mode.
It performs HBM model pre/post-processing on the X86 side using Python code, while the actual inference is executed by a board-side server.
This design enhances the efficiency of model accuracy evaluation and reduces development costs. The workflow of the tool is summarized as follows:
The hbm_infer tool employs the gRPC (Google Remote Procedure Call) framework to support task dispatching and responses between client and server processes in multi-process environments.
In terms of communication architecture, each server is paired with exactly one client, so running several clients forms a multi-client, multi-server pattern.
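The one-to-one pairing can be illustrated with a minimal sketch. It does not use gRPC or any hbm_infer API: in-process queues and threads stand in for the RPC channel and the board-side server, and all names (`server_loop`, `run_pair`, `run_all`) are hypothetical. It only shows the pattern in which every client dispatches tasks to, and waits for responses from, its own dedicated server.

```python
import queue
import threading


def server_loop(requests, responses):
    """Stand-in for one board-side server: it serves only its own client."""
    while True:
        item = requests.get()
        if item is None:  # shutdown signal from the paired client
            break
        client_id, task = item
        responses.put((client_id, f"result-for-{task}"))


def run_pair(client_id, tasks):
    """Run one client against its own dedicated server (1:1 pairing)."""
    requests, responses = queue.Queue(), queue.Queue()
    server = threading.Thread(target=server_loop, args=(requests, responses))
    server.start()
    results = []
    for task in tasks:
        requests.put((client_id, task))  # dispatch the task
        results.append(responses.get())  # block until the response arrives
    requests.put(None)  # ask the paired server to stop
    server.join()
    return results


def run_all(task_lists):
    """Start one client/server pair per task list and collect all results."""
    collected = {}

    def client(client_id, tasks):
        collected[client_id] = run_pair(client_id, tasks)

    clients = [
        threading.Thread(target=client, args=(i, tasks))
        for i, tasks in enumerate(task_lists)
    ]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    return collected
```

Because no server is shared, a slow task on one pair does not delay the responses of another pair.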
The communication workflow is outlined below:
The hbm_infer tool supports two modes: Standard Mode and Flexible Mode.
Standard Mode: This mode automatically uploads the server and HBM files to the development board and removes them after inference, requiring no user intervention. However, because the HBM and server files are not reused, every session initialization uploads them again, which consumes significant bandwidth on the board side.
Flexible Mode: This mode requires you to manually manage the upload of the server and HBM files, as well as the shutdown of the server. In return, it lets you reduce bandwidth usage during file transfers by reusing files that already exist on the board.
Standard Mode is recommended for single-process usage. Flexible Mode is recommended for multi-process usage.
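The bandwidth trade-off behind this recommendation can be estimated with a rough calculation. The sketch below is illustrative only: the function name, the file sizes, and the session count are assumptions, not values or APIs from hbm_infer. It assumes that without reuse (as in Standard Mode) both files are uploaded once per session, and that with reuse (as in Flexible Mode) they are uploaded a single time.

```python
def upload_bytes(sessions, hbm_bytes, server_bytes, reuse):
    """Estimate the total bytes uploaded to the board.

    reuse=False: files are re-uploaded for every session (Standard Mode behavior).
    reuse=True:  files are uploaded once and reused (Flexible Mode behavior).
    """
    per_upload = hbm_bytes + server_bytes
    return per_upload if reuse else per_upload * sessions


# Hypothetical sizes: a 50 MB HBM file, a 20 MB server package, 8 sessions.
standard_total = upload_bytes(8, 50_000_000, 20_000_000, reuse=False)
flexible_total = upload_bytes(8, 50_000_000, 20_000_000, reuse=True)
print(standard_total, flexible_total)
```

Under these assumed numbers, the upload cost grows linearly with the number of sessions when files are not reused, which is why reuse matters most when many processes start sessions against the same board.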
If you have performance requirements, refer to the Transmission Optimization section for targeted improvements.