Prepare the Floating-Point Model

The S100 toolchain allows you to quantize your model via either PTQ or QAT. The two paths have different input requirements for the floating-point model, as described below:

  • PTQ: Supports common DL frameworks, including Caffe, PyTorch, TensorFlow, PaddlePaddle, etc. Caffe models are supported directly; models from other frameworks must first be converted to ONNX, and the ONNX model must use an opset version between 10 and 19. A detailed introduction to the PTQ path can be found in section Post-training Quantization (PTQ).
Attention
  • For a Caffe model, you need to evaluate the model's floating-point accuracy to ensure that its weights and structure are correct.

  • For an ONNX model, you need to first run inference with HBRuntime and verify that its results are consistent with those of the original DL framework model (i.e., verify model validity).

  • QAT: We provide a plugin that can directly convert and compile a torch module exported from PyTorch. Before conversion, you should evaluate the exported module's floating-point accuracy to ensure that its weights and structure are correct. For a detailed description of the QAT path, please refer to section Quantization Aware Training (QAT).
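The consistency check described for the PTQ path can be sketched as follows. Since the HBRuntime API is not shown in this section, this minimal example uses plain NumPy to compare two sets of inference outputs element-wise; `framework_outputs` and `runtime_outputs` are hypothetical placeholders for the results of the original framework run and the HBRuntime run on the same input.

```python
import numpy as np

def outputs_consistent(framework_outputs, runtime_outputs,
                       rtol=1e-5, atol=1e-5):
    """Return True if every pair of output tensors matches within tolerance."""
    if len(framework_outputs) != len(runtime_outputs):
        return False
    return all(
        np.allclose(a, b, rtol=rtol, atol=atol)
        for a, b in zip(framework_outputs, runtime_outputs)
    )

# Hypothetical outputs: one from the original DL framework, one from HBRuntime
framework_outputs = [np.array([0.1, 0.9, 0.0], dtype=np.float32)]
runtime_outputs = [np.array([0.1, 0.9, 0.0], dtype=np.float32)]

print(outputs_consistent(framework_outputs, runtime_outputs))  # True
```

A small numeric tolerance (`rtol`/`atol`) is used rather than exact equality, because floating-point results can differ slightly between runtimes even when the model is correct.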