FAQ

Training Environment

Docker container can't use Nvidia resources?

Solution: refer to the GPU Docker section of the Environment Deployment chapter of the documentation.

QAT Quantized Training

Does poor Calibration accuracy indicate that the QAT accuracy is likely to be low as well?

Setting aside training randomness, calibration accuracy and QAT accuracy are positively correlated for the same model. The strength of this correlation varies from model to model, and there is no universal metric for it.

How to determine when to stop the Calibration phase and start QAT training?

Adjusting calibration parameters alone has only a small effect on accuracy, so it is recommended to run QAT after calibration and then decide:

  1. If the calibration accuracy is already close to the target, tuning the calibration parameters alone is usually enough.

  2. If QAT accuracy is higher than calibration accuracy but still far from the target, stay in the calibration phase to debug accuracy, fix structures that are unfriendly to quantization, and proceed with QAT only after calibration accuracy improves significantly.

  3. If QAT accuracy is lower than calibration accuracy, first check for problems in the QAT pipeline and training strategy.

Are the behaviors of QAT training and float training consistent?

The behavior of QAT training, such as loss and accuracy curves, should be consistent with floating-point training. If inconsistencies arise, first check the QAT pipeline for issues.

Quantized Accuracy Anomaly

Solution: if QAT/quantized accuracy is not as expected, NaN appears, or the initial QAT loss is clearly anomalous compared with the float model, please refer to the Accuracy Tuning Tool Guide section.

Why does accuracy performance deteriorate after turning on multi-machine training?

When multi-machine training is turned on, the effective batch size grows in proportion to the number of machines, so the learning rate (LR) and other hyperparameters need to be adjusted accordingly to match the larger batch size.
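As an illustration, the linear scaling rule is a common heuristic for this adjustment (a generic sketch, not a Horizon-specific requirement): scale the learning rate in proportion to the effective batch size.

```python
def scale_lr(base_lr, base_batch_size, num_machines, per_machine_batch_size):
    """Linear scaling rule: scale LR in proportion to the effective batch size."""
    effective_batch_size = num_machines * per_machine_batch_size
    return base_lr * effective_batch_size / base_batch_size

# Single-machine baseline: lr=0.1 at batch size 64.
# With 4 machines the effective batch size becomes 256, so lr becomes 0.4.
lr = scale_lr(0.1, 64, num_machines=4, per_machine_batch_size=64)
```

Warmup schedules are often combined with this rule in practice, since a large initial LR can destabilize the first epochs.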

Does Qconfig require user intervention?

Horizon provides default Qconfigs that define how activations and weights are quantized, so in most cases no user intervention is needed. Quantization algorithms such as FakeQuantize, LSQ, and PACT are currently supported.

How to export ONNX model for each phase?

Please refer to the following code implementation:

```python
from horizon_plugin_pytorch.utils import onnx_helper as horizon_onnx_helper

# [Optional] Export float / qat net to ONNX
# --------------------------------------------------------------------
logging.info("Export qat model to ONNX...")
data = torch.rand((1, 3, 228, 228), device=device)
horizon_onnx_helper.export_to_onnx(qat_net, data, "resnet_qat.onnx")

# [Optional] Export quantized_model to ONNX
# --------------------------------------------------------------------
horizon_onnx_helper.export_quantized_onnx(
    quantized_model, data, "resnet_quantized.onnx"
)
```

Why does nan exist during QAT training?

Several factors can cause this problem; it is recommended to check the following aspects:

  1. Check whether the input data contains NaN.

  2. Check whether the floating-point model has converged. A floating-point model that has not converged may show large fluctuations from even small quantization errors.

  3. Check whether calibration is turned on; it is recommended to enable it so that the model starts QAT with better initial quantization parameters.

  4. Check whether the training strategy is appropriate. An unsuitable strategy can also produce NaN values, e.g., a learning rate that is too large (lower the learning rate or use gradient clipping). By default, the QAT training strategy matches the floating-point one; if floating-point training used an optimizer or scheduler that manipulates the LR, such as OneCycle, it is recommended to switch to SGD for QAT.
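The first check above can be automated. A minimal sketch using plain torch (the `find_nan_inputs` helper and the batch-dict layout are illustrative, not a Horizon API):

```python
import torch

def find_nan_inputs(batch):
    """Return names of tensors in a batch dict that contain NaN or Inf."""
    bad = []
    for name, tensor in batch.items():
        if torch.isnan(tensor).any() or torch.isinf(tensor).any():
            bad.append(name)
    return bad

batch = {
    "img": torch.rand(2, 3, 8, 8),
    "label": torch.tensor([0.0, float("nan")]),
}
print(find_nan_inputs(batch))  # ['label']
```

Running such a check inside the data loader or the first training iterations quickly rules out bad inputs as the source of NaN losses.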

Configuring an int16 node or high-precision output node does not take effect

Solution: this may be caused by a misconfigured module_name, which only supports strings and cannot be configured by index.

How to check whether high-accuracy output is turned on for a particular layer?

You can print the corresponding layer of the qat_model and check whether it contains (activation_post_process): FakeQuantize; if it does not, the layer is a high-accuracy output. For example, an int32 high-accuracy conv prints as follows:

```
(1): ConvModule2d(
  (0): Conv2d(
    64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8),
      observer_enabled=tensor([1], dtype=torch.uint8),
      quant_min=-128, quant_max=127, dtype=qint8,
      qscheme=torch.per_channel_symmetric, ch_axis=0,
      scale=tensor([1., 1., 1.]), zero_point=tensor([0, 0, 0])
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
)
```

The int8 low-accuracy conv prints as follows:

```
(0): ConvModule2d(
  (0): ConvReLU2d(
    64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
    (weight_fake_quant): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8),
      observer_enabled=tensor([1], dtype=torch.uint8),
      quant_min=-128, quant_max=127, dtype=qint8,
      qscheme=torch.per_channel_symmetric, ch_axis=0,
      scale=tensor([1., 1., ..., 1.]),      # 64 entries
      zero_point=tensor([0, 0, ..., 0])     # 64 entries
      (activation_post_process): MovingAveragePerChannelMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
    (activation_post_process): FakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8),
      observer_enabled=tensor([1], dtype=torch.uint8),
      quant_min=-128, quant_max=127, dtype=qint8,
      qscheme=torch.per_tensor_symmetric, ch_axis=-1,
      scale=tensor([1.]), zero_point=tensor([0])
      (activation_post_process): MovingAverageMinMaxObserver(min_val=tensor([]), max_val=tensor([]))
    )
  )
)
```

Can pseudo-quantization nodes be inserted into auxiliary branches?

It is recommended to insert pseudo-quantization nodes only for the part of the model that is deployed on the board. Since QAT is a global training process, auxiliary branches increase the difficulty of training, and if their data distribution differs from that of the other branches, they also increase the accuracy risk, so it is recommended to remove them.

How to rewrite the Horizon gridsample operator into the public torch implementation?

The non-public gridsample operator in horizon_plugin_pytorch takes a grid input (input 2) of absolute coordinates in int16, while the public torch version takes normalized float32 coordinates in the range [-1, 1].

Therefore, after switching to torch.nn.functional.grid_sample, the grid can be normalized as follows:

```python
def norm_grid(x, grid):
    n = grid.size(0)
    h = grid.size(1)
    w = grid.size(2)
    base_coord_y = (
        torch.arange(h, dtype=grid.dtype, device=grid.device)
        .unsqueeze(-1)
        .unsqueeze(0)
        .expand(n, h, w)
    )
    base_coord_x = (
        torch.arange(w, dtype=grid.dtype, device=grid.device)
        .unsqueeze(0)
        .unsqueeze(0)
        .expand(n, h, w)
    )
    absolute_grid_x = grid[:, :, :, 0] + base_coord_x
    absolute_grid_y = grid[:, :, :, 1] + base_coord_y
    norm_grid_x = absolute_grid_x * 2 / (x.size(3) - 1) - 1
    norm_grid_y = absolute_grid_y * 2 / (x.size(2) - 1) - 1
    norm_grid = torch.stack((norm_grid_x, norm_grid_y), dim=-1)
    return norm_grid
```
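As a sanity check of this normalization (a self-contained sketch that repeats the same math in compact form): with align_corners=True, a grid of all-zero offsets maps each output pixel to its own location, so grid_sample should reproduce the input exactly.

```python
import torch
import torch.nn.functional as F

def norm_grid(x, grid):
    # Same normalization as above, written compactly.
    n, h, w = grid.size(0), grid.size(1), grid.size(2)
    base_y = torch.arange(h, dtype=grid.dtype).view(1, h, 1).expand(n, h, w)
    base_x = torch.arange(w, dtype=grid.dtype).view(1, 1, w).expand(n, h, w)
    gx = (grid[..., 0] + base_x) * 2 / (x.size(3) - 1) - 1
    gy = (grid[..., 1] + base_y) * 2 / (x.size(2) - 1) - 1
    return torch.stack((gx, gy), dim=-1)

x = torch.rand(1, 1, 4, 4)
grid = torch.zeros(1, 4, 4, 2)  # zero absolute offsets -> identity sampling
out = F.grid_sample(x, norm_grid(x, grid), align_corners=True)
assert torch.allclose(out, x, atol=1e-6)
```

Note that the `(size - 1)` denominator matches align_corners=True semantics; with align_corners=False the normalization formula would be different.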

Why are the weight parameters not updated in QAT training after loading calibration results?

It can be checked in turn as follows:

  1. Whether prepare is called before the optimizer is defined. prepare performs operator fusion and changes the model structure, so an optimizer created before prepare may hold parameters of the old model.

  2. Whether fake_quant_enabled and observer_enabled are 1.

  3. Whether the module's training attribute is True.
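The second and third checks can be scripted. The sketch below is a generic diagnostic that only relies on buffer names and the standard training flag (the `report_quant_state` helper and the `Dummy` stand-in module are made up for illustration, not Horizon APIs):

```python
import torch
import torch.nn as nn

def report_quant_state(model):
    """Return buffers named fake_quant_enabled/observer_enabled that are 0,
    plus module names whose training flag is False."""
    disabled = [
        name for name, buf in model.named_buffers()
        if name.split(".")[-1] in ("fake_quant_enabled", "observer_enabled")
        and not bool(buf.item())
    ]
    eval_modules = [name for name, mod in model.named_modules() if not mod.training]
    return disabled, eval_modules

# Dummy module standing in for a FakeQuantize layer with quantization disabled.
class Dummy(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer(
            "fake_quant_enabled", torch.tensor([0], dtype=torch.uint8)
        )

m = Dummy().eval()
disabled, eval_mods = report_quant_state(m)
```

Running such a report on the prepared qat_model before training highlights layers that would silently skip fake quantization or gradient updates.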

Setting Error

Modules that do not need to be quantized are given a non-None qconfig, e.g., pre/post-processing, loss functions, etc.

Solution: set qconfig only for modules that need to be quantized.
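Following the eager-mode quantization convention (where a qconfig of None excludes a module), the fix can look like the sketch below; `ToyModel` and its `loss`/`post_process` submodules are made up for illustration, and the real Horizon qconfig assignment is shown as a comment:

```python
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, 3)      # should be quantized
        self.loss = nn.MSELoss()                # training-only, keep in float
        self.post_process = nn.Softmax(dim=1)   # post-processing, keep in float

model = ToyModel()
# model.qconfig = default_qat_8bit_fake_quant_qconfig  # real qconfig in practice
model.loss.qconfig = None          # explicitly exclude from quantization
model.post_process.qconfig = None
```

Explicitly setting None on excluded submodules also documents the intended quantization boundary for later readers.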

The march is not set correctly, which may result in model compilation failures or inconsistent deployment accuracy.

Solution: select the correct BPU architecture based on the processor to be deployed, e.g. S100 requires Nash:

```python
horizon.march.set_march("nash-e")
```

The model output node is not set to high accuracy output, resulting in quantization accuracy that is not as expected.

An error example is shown below:

Assume that the model is defined as follows:

```python
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv0 = nn.Conv2d(4, 4, 3, 3)
        self.relu0 = nn.ReLU()
        self.classifier = nn.Conv2d(4, 4, 3, 3)

    def forward(self, x):
        out = self.conv0(x)
        out = self.relu0(out)
        out = self.classifier(out)
        return out

# Example of setting qconfig incorrectly:
float_model = ToyNet()
# Set the whole network to int8 quantization.
float_model.qconfig = default_qat_8bit_fake_quant_qconfig
qat_model = prepare(float_model, example_input)
```

Solution: in order to improve the model accuracy, set the model output node to high accuracy, as shown in the example below:

```python
qat_model = horizon.quantization.prepare(
    float_model,
    example_input,
    # Use the default template to automatically enable high precision output.
    qconfig_setter=horizon.quantization.qconfig_template.default_qat_qconfig_setter,
)
```

Method Error

The Calibration process uses multiple cards.

Due to underlying limitations, Calibration currently does not support multi-card execution; please use a single card for Calibration.

The model input image data is in a format other than centered YUV444, such as RGB, which may result in inconsistent model deployment accuracy.

Since the image format supported by Horizon hardware is centered YUV444, it is recommended that you use the YUV444 format directly as the network input from the beginning of model training.
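For reference, a BT.601 full-range RGB-to-YUV conversion followed by centering (subtracting 128) can be sketched as below; treat the exact coefficients and the centering convention as assumptions to verify against the toolchain documentation for your target hardware:

```python
import torch

def rgb_to_centered_yuv444(rgb):
    """rgb: float tensor in [0, 255], shape (N, 3, H, W). BT.601 full-range."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.5 * b + 128.0
    v = 0.5 * r - 0.419 * g - 0.081 * b + 128.0
    yuv = torch.stack((y, u, v), dim=1)
    return yuv - 128.0  # center values to roughly [-128, 127]

# Mid-gray (R=G=B=128) maps to approximately zero after centering.
gray = torch.full((1, 3, 2, 2), 128.0)
centered = rgb_to_centered_yuv444(gray)
```

Training on YUV444 from the start, as the answer recommends, avoids this conversion step and any train/deploy mismatch it could introduce.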

Using the qat model for accuracy evaluation and monitoring during quantization-aware training, which can fail to detect deployment-time accuracy anomalies in time.

The error between QAT and Quantized arises because the QAT stage cannot fully simulate the pure fixed-point computation logic of the Quantized stage, so it is recommended to use the quantized model for accuracy evaluation and monitoring:

```python
quantized_hbir_model = hbdk4.compiler.convert(qat_hbir_model)
acc = evaluate(quantized_hbir_model, eval_data_loader)
```

Network Error

Calling the same member defined by FloatFunctional() multiple times.

The error example is as follows:

```python
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.add = FloatFunctional()

    def forward(self, x, y, z):
        out = self.add.add(x, y)
        return self.add.add(out, z)  # error: same FloatFunctional reused
```

Solution: do not call the same member defined by FloatFunctional() multiple times in forward.

```python
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.add0 = FloatFunctional()
        self.add1 = FloatFunctional()

    def forward(self, x, y, z):
        out = self.add0.add(x, y)
        return self.add1.add(out, z)
```

Operator Error

Some operators in the Quantized model have not gone through calibration or QAT. For example, a post-processing operator intended to be accelerated on the BPU but never passed through the quantization stage will cause quantized inference to fail or produce abnormal accuracy at deployment.

It is not completely impossible to add operators directly in the Quantized phase; operators such as color-space conversion can be added, see the documentation for details. However, not all operators can be added this way. Operators such as cat must obtain their real quantization parameters from statistics gathered during the calibration or QAT phase, otherwise the final accuracy is affected. If you have a similar need to adjust the network structure, consult the framework developers.

Model Error

Floating-point model overfitting.

Common signs of model overfitting:

  • Large changes in the output after a slight perturbation of the input data.

  • The model parameters are assigned large values.

  • The model activations are large.

Solution: resolve the floating-point model overfitting problem on your own before proceeding with quantization.
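A quick sanity check for the first sign above (a generic sketch, not a Horizon tool) is to measure how much the output moves under a small random input perturbation:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def input_sensitivity(model, x, eps=1e-3):
    """Relative output change caused by a small random input perturbation."""
    model.eval()
    base = model(x)
    noisy = model(x + eps * torch.randn_like(x))
    return ((noisy - base).norm() / (base.norm() + 1e-12)).item()

# Toy model and data; substitute your own float model and a real batch.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(8, 16)
sens = input_sensitivity(model, x)
# A well-behaved model yields a small ratio on in-distribution data;
# a large ratio is one hint of overfitting.
```

This is only a heuristic: comparing train and validation metrics remains the primary way to diagnose overfitting.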

Model Export

Why does model export cause accuracy drop?

During the QAT training phase, lookup-table operations are floating-point operators wrapped with pseudo-quantization nodes; they are converted to actual lookup-table operations during export. You can verify whether the lookup-table operators cause the accuracy drop with the following code:

```python
from horizon_plugin_pytorch.nn.qat.segment_lut import QuantizedQATSegmentLUT

# Only lookup table operators are converted.
QuantizedQATSegmentLUT.convert_segment_lut(self.qat_model)
# The inference result can be used to verify whether the accuracy or
# visualization meets expectations.
qat_lut_ret = self.qat_model(self.example_input)
```

If the issue persists, please provide feedback to the technical support staff of Horizon Robotics.

How to adjust for inconsistencies between export and training code with minimal code intrusion?

If the model contains only deployment logic, there is no inconsistency between export and training code. Otherwise, one of the following methods can be used:

```python
# Option 1: use the module's training flag plus a custom deployment flag.
def forward(self, x):
    if self.training:
        ...
    elif self.deploy:
        ...
    else:
        ...

# Option 2: define different forward methods. This is not recommended, as
# maintaining three separate forward methods is error-prone.
def train_forward(self, x):
    ...

def val_forward(self, x):
    ...

def deploy_forward(self, x):
    ...
```