Solution: it is recommended to refer to the GPU Docker description under the Environment Deployment section of the documentation.
Setting aside the randomness of training, calibration accuracy and QAT accuracy are positively correlated for the same model. The degree of correlation varies from model to model, and there is no precise metric for it.
Adjusting calibration parameters alone has only a small effect on accuracy. It is recommended to perform QAT after calibration:
If the calibration accuracy is already close to the target, adjusting the calibration parameters may be enough.
If QAT accuracy is higher than calibration accuracy but still far from the target, stay in the calibration phase to debug accuracy, fix structures that are unfriendly to quantization, and proceed with QAT only after the calibration accuracy improves significantly.
If QAT accuracy is lower than calibration accuracy, first check for potential issues in the QAT pipeline and training strategy.
The behavior of QAT training, such as loss and accuracy, should be consistent with floating-point training. If inconsistencies arise, it is recommended to check the QAT pipeline for issues first.
Solution: if the QAT/quantized accuracy is not as expected, NaN values appear, or the initial QAT loss is clearly anomalous compared with the floating-point loss, please refer to the Accuracy Tuning Tool Guide section.
When multi-machine training is enabled, the global batch size grows in proportion to the number of machines, so the learning rate and other hyperparameters need to be adjusted accordingly to match the larger batch size.
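As a rough illustration of this adjustment, the widely used linear scaling rule scales the learning rate in proportion to the global batch size. The base values below are hypothetical; tune against your own single-machine baseline:

```python
# Hypothetical base values for illustration only.
base_lr = 0.01          # LR tuned for single-machine training
base_batch_size = 64    # global batch size that base_lr was tuned for

def scaled_lr(num_machines: int, per_machine_batch: int) -> float:
    """Linear scaling rule: scale the LR proportionally to the global batch size."""
    global_batch = num_machines * per_machine_batch
    return base_lr * global_batch / base_batch_size

# 4 machines at the same per-machine batch -> 4x global batch -> 4x LR.
print(scaled_lr(4, 64))
```

A warmup phase is commonly combined with this rule when the scaling factor is large.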
Horizon provides Qconfig, which defines how activations and weights are quantized, and currently supports quantization algorithms such as FakeQuantize, LSQ, and PACT.
Please refer to the following code implementation:
Many factors can affect this problem; it is recommended to check the following aspects:
Check whether the input data contains NaN.
Check whether the floating-point model has converged. A floating-point model that has not converged may fluctuate strongly in response to small quantization errors.
Check whether calibration is enabled; it is recommended to enable it so that the model starts QAT with better initial quantization parameters.
Check whether the training strategy is appropriate. An unsuitable training strategy can also lead to NaN values, e.g. a learning rate that is too large (either lower the learning rate or use gradient clipping). By default, the QAT training strategy should stay consistent with the floating-point one; however, if floating-point training uses an optimizer or scheduler such as OneCycle that manipulates the LR, it is recommended to replace it with SGD for QAT.
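The NaN check and the gradient-clipping suggestion above can be sketched as follows. The model, data, and hyperparameter values are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder model and data for illustration.
model = nn.Linear(8, 2)
inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,))

# Check the input data for NaN, per the checklist above.
assert not torch.isnan(inputs).any(), "input data contains NaN"

# Plain SGD, as recommended when the float schedule (e.g. OneCycle) is unsuitable for QAT.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

loss = criterion(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
# Gradient clipping caps the gradient norm to avoid NaN from exploding updates.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```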
Solution: this may be due to a misconfiguration of module_name, which only supports strings and does not support configuration by index.
You can print the layers of the qat_model to see whether a layer contains (activation_post_process): FakeQuantize; if it does not, the layer is a high-accuracy output.
For example, the int32 high-accuracy conv prints as follows:
The int8 low-accuracy conv prints as follows:
It is recommended to insert fake-quantization nodes only for the part of the model that is deployed on the board. Since QAT training is global, auxiliary branches increase the difficulty of training, and if the data distribution at an auxiliary branch differs from that of the other branches, it also increases the accuracy risk, so it is recommended to remove such branches.
The non-public implementation of the gridsample operator in horizon_plugin_pytorch takes its grid input (input 2) as absolute coordinates of type int16, while the public torch version takes normalized coordinates of type float32 in the range [-1, 1].
Therefore, after switching to the public torch.nn.functional.grid_sample operator, the grid can be normalized in the following way:
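A minimal sketch of this normalization, assuming the align_corners=True convention of torch's grid_sample (where -1 and 1 map to the centers of the corner pixels); the function and variable names are illustrative:

```python
import torch
import torch.nn.functional as F

def normalize_grid(grid_abs: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """Convert absolute (x, y) pixel coordinates to the [-1, 1] range
    expected by F.grid_sample with align_corners=True."""
    grid = grid_abs.float()
    grid[..., 0] = 2.0 * grid[..., 0] / (width - 1) - 1.0   # x coordinate
    grid[..., 1] = 2.0 * grid[..., 1] / (height - 1) - 1.0  # y coordinate
    return grid

# Illustrative usage: sample two points of a 4x4 feature map from absolute coordinates.
feat = torch.arange(16.0).reshape(1, 1, 4, 4)
grid_abs = torch.tensor([[[[0, 0], [3, 3]]]], dtype=torch.int16)  # absolute (x, y)
out = F.grid_sample(feat, normalize_grid(grid_abs, 4, 4), align_corners=True)
```

If your pipeline uses align_corners=False, the mapping is slightly different; match the convention of the deployed operator.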
It can be checked in turn as follows:
Whether prepare is called before the optimizer is defined. prepare performs operator fusion, which changes the model structure, so the optimizer must be built from the prepared model's parameters.
Whether fake_quant_enabled and observer_enabled are 1.
Whether the training variable in the module is True.
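The last two checks above can be scripted. The sketch below uses torch's own FakeQuantize for illustration; horizon_plugin_pytorch's fake-quantize modules are assumed to expose the same fake_quant_enabled / observer_enabled buffers:

```python
import torch
from torch.ao.quantization import FakeQuantize

def report_fake_quant_state(model: torch.nn.Module) -> None:
    """Walk the prepared model and report the state of each fake-quantize node."""
    for name, module in model.named_modules():
        if isinstance(module, FakeQuantize):
            print(
                name,
                "fake_quant_enabled =", int(module.fake_quant_enabled),
                "observer_enabled =", int(module.observer_enabled),
                "training =", module.training,
            )
```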
Solution: set qconfig only for modules that need to be quantized.
Solution: select the correct BPU architecture based on the processor to be deployed, e.g. S100 requires Nash:
An error example is shown below:
Assume that the model is defined as follows:
Solution: in order to improve the model accuracy, set the model output node to high accuracy, as shown in the example below:
Due to underlying limitations, Calibration currently does not support multi-card execution; please use a single card for Calibration.
Since the image format supported by Horizon hardware is centered YUV444, it is recommended that you use the YUV444 format directly as the network input from the beginning of model training.
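For reference, RGB training data can be converted to YUV444 roughly as follows. The BT.601 full-range coefficients and the interpretation of "centered" as a -128 shift into the int8 range are assumptions; match the exact conversion used by your deployment pipeline:

```python
import torch

def rgb_to_yuv444(rgb: torch.Tensor) -> torch.Tensor:
    """Convert an RGB image tensor (C, H, W) with values in [0, 255] to YUV444.
    BT.601 full-range coefficients are assumed for illustration."""
    r, g, b = rgb[0], rgb[1], rgb[2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    return torch.stack([y, u, v])

def center_yuv444(yuv: torch.Tensor) -> torch.Tensor:
    """'Centered' is assumed to mean shifting each channel by -128 so values
    fall in the int8 range [-128, 127]."""
    return yuv - 128.0
```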
The error between QAT and Quantized arises because the QAT stage cannot fully simulate the pure fixed-point computation logic of the Quantized stage, so it is recommended to use the quantized model for model accuracy evaluation and monitoring.
This error is caused by calling the same variable defined by FloatFunctional() multiple times. The error example is as follows:
Solution: prohibit calling the same variable defined by FloatFunctional() multiple times in forward.
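A minimal sketch of the wrong and right patterns, using torch's FloatFunctional for illustration (horizon_plugin_pytorch provides its own FloatFunctional with the same one-instance-per-call-site constraint):

```python
import torch
import torch.nn as nn
from torch.nn.quantized import FloatFunctional

class Wrong(nn.Module):
    """Wrong: the same FloatFunctional instance is called twice in forward,
    so two different additions share one set of quantization statistics."""
    def __init__(self):
        super().__init__()
        self.ff = FloatFunctional()

    def forward(self, x, y, z):
        out = self.ff.add(x, y)
        return self.ff.add(out, z)  # reuses self.ff: not allowed

class Right(nn.Module):
    """Right: one FloatFunctional instance per call site."""
    def __init__(self):
        super().__init__()
        self.ff1 = FloatFunctional()
        self.ff2 = FloatFunctional()

    def forward(self, x, y, z):
        out = self.ff1.add(x, y)
        return self.ff2.add(out, z)
```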
The Quantized phase is not completely unable to add operators directly; for example, color space conversion operators can be added (see the documentation for details on how to add operators). However, not all operators can be added directly: operators such as cat must obtain their real quantization parameters from statistics collected in the calibration or QAT phase, otherwise the final accuracy is affected. If you have a similar need to adjust the network structure, you can consult the framework developers.
Common signs of model overfitting:
Small perturbations of the input data cause large changes in the output.
The model parameters have large values.
The model activations are large.
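The first sign above can be checked with a simple perturbation test. The model, perturbation size, and threshold below are placeholders:

```python
import torch
import torch.nn as nn

def perturbation_sensitivity(model: nn.Module, x: torch.Tensor, eps: float = 1e-3) -> float:
    """Return the relative output change under a tiny random input perturbation.
    A large ratio for a tiny eps suggests the model is overly sensitive."""
    model.eval()
    with torch.no_grad():
        base = model(x)
        pert = model(x + eps * torch.randn_like(x))
    return ((pert - base).norm() / (base.norm() + 1e-12)).item()

# Placeholder model; on a real model, compare the ratio against your own tolerance.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
ratio = perturbation_sensitivity(model, torch.randn(4, 8))
```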
Solution: Solve the floating-point model overfitting problem on your own.
During the QAT training phase, lookup-table operations are floating-point operators with fake-quantization nodes. They will be converted to actual lookup-table operations during export. Verify whether lookup-table operators cause the accuracy drop using the following code:
If the issue persists, please provide feedback to the technical support staff of Horizon Robotics.
If the model contains only deployment logic, there is no issue of inconsistency between the export and training code. Otherwise, the following methods should be used to handle the issue: