Floating-point models that meet the requirements of quantization-aware training (QAT).
The process of obtaining quantization parameters using calibration data.
The pseudo-quantized model obtained after calibration.
Quantization-aware training (QAT) takes a trained floating-point model and retrains it with quantization effects simulated. Since fixed-point values cannot be used for backward gradient computation, the actual procedure inserts fake-quantization nodes in front of certain operators to capture the truncated values of the data flowing through each operator during training, so that those values can be reused when the operators are quantized at deployment time. The quantization parameters are continuously optimized during training to obtain the best accuracy. Because model training is involved, QAT demands a higher level of technical skill from developers.
Pseudo-quantized models obtained after quantization-aware training.
The process of first quantizing and then dequantizing floating-point data, generally implemented in network models through pseudo-quantization (fake-quantize) nodes.
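The quantize-then-dequantize step can be illustrated with a minimal sketch. This is not the toolchain's implementation; it assumes symmetric per-tensor int8 quantization with a single scale, chosen here purely for illustration:

```python
import numpy as np

def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize then immediately dequantize: simulates int8 rounding and
    clipping while keeping the data in floating point, which is what a
    fake-quantize node does inside a pseudo-quantized model."""
    q = np.clip(np.round(x / scale), qmin, qmax)  # map onto the integer grid
    return q * scale                              # map back to float

x = np.array([0.05, -1.3, 2.7], dtype=np.float32)
scale = 2.7 / 127  # map the maximum magnitude onto the int8 range
print(fake_quantize(x, scale))
```

The output stays in floating point but only takes values representable on the int8 grid, so the rest of the network (and the backward pass) sees the truncation error that real fixed-point inference would introduce.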
Models with pseudo-quantized nodes, typically obtained by calibration or QAT.
The process of converting the floating-point parameters of a pseudo-quantized model into fixed-point parameters through parameter transformation, and converting the floating-point operators into fixed-point operators. The transformed model is called a quantized model (also a fixed-point model).
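One common parameter transformation when lowering to fixed point is to replace a floating-point scale with an integer multiplier plus a right shift, so that inference needs only integer arithmetic. The sketch below is illustrative, not the toolchain's actual conversion; the 15-bit width is an assumption:

```python
def float_to_fixed(scale, bits=15):
    """Approximate a floating-point scale as (multiplier, shift) such that
    x * scale ~= (x * multiplier) >> shift using only integers.
    The bit width is an illustrative assumption, not a toolchain constant."""
    shift = bits
    multiplier = round(scale * (1 << shift))  # scale rescaled to an integer
    return multiplier, shift

m, s = float_to_fixed(0.05)
x = 1000
print((x * m) >> s)  # close to 1000 * 0.05 = 50
```

The approximation error shrinks as the bit width grows, which is the usual trade-off between accuracy and the integer range available on the target hardware.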
Models exported for deployment, typically exported from a QAT model; they can be used for accuracy simulation and for compilation to run on boards.
Name of the BPU architecture.
Name of the processor.
| Processor | J6B Lite | J6B | J6B Plus | J6E | J6M | J6H | J6P |
|---|---|---|---|---|---|---|---|
| BPU | Nash-b | Nash-b | Nash-b | Nash-e | Nash-m | Nash-h | Nash-p |
| march string | nash-b-lite | nash-b | nash-b-plus | nash-e | nash-m | nash-h | nash-p |