Error I: Cannot find the extension library(_C.so)
Solution:
Error II: RuntimeError: Cannot load custom ops. Please rebuild the horizon_plugin_pytorch
Solution: check that the local CUDA environment is set up correctly, e.g., that the paths and version are as expected.
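A quick way to inspect the local CUDA setup is to print the relevant environment variables and the nvcc version. The snippet below is a minimal sketch using only the Python standard library. The function name cuda_environment_report is hypothetical, and the assumption that CUDA_HOME and LD_LIBRARY_PATH are the variables that matter on your machine may not hold for every installation.

```python
import os
import shutil
import subprocess


def cuda_environment_report():
    """Collect basic facts about the local CUDA toolchain (illustrative helper)."""
    report = {
        # Assumption: these are the variables your build relies on.
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
        # Location of the CUDA compiler on PATH, or None if it is not found.
        "nvcc_path": shutil.which("nvcc"),
        "nvcc_version": None,
    }
    if report["nvcc_path"]:
        try:
            result = subprocess.run(
                [report["nvcc_path"], "--version"],
                capture_output=True,
                text=True,
                timeout=10,
            )
            report["nvcc_version"] = result.stdout.strip() or None
        except (OSError, subprocess.SubprocessError):
            # Leave the version as None if nvcc cannot be executed.
            pass
    return report


if __name__ == "__main__":
    for key, value in cuda_environment_report().items():
        print(f"{key}: {value}")
```

If nvcc_path is None, or the reported version differs from the CUDA version the plugin was built against, fix the environment before rebuilding.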
The QAT/quantized accuracy is lower than expected, NaN values appear, or the initial QAT loss is clearly abnormal compared with the float model.
Solution: please refer to the section Accuracy Tuning Tool Guide.
Some frameworks (e.g., PyTorch Lightning) wrap the native torch module in wrappers that do not support deepcopy.
Solution: implement the __deepcopy__ method for the model.
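The general shape of such a __deepcopy__ method can be sketched without any framework. In the example below, CoreModel and FrameworkWrapper are hypothetical stand-ins (not PyTorch Lightning or horizon_plugin_pytorch classes), and a threading.Lock plays the role of the member that the default deepcopy cannot handle. In a real model you would deep-copy the underlying torch module and recreate or skip whatever the wrapper cannot copy.

```python
import copy
import threading


class CoreModel:
    """Stand-in for the native model whose state must be copied (hypothetical)."""

    def __init__(self, weights):
        self.weights = list(weights)


class FrameworkWrapper:
    """Hypothetical wrapper holding a member that the default deepcopy rejects."""

    def __init__(self, model):
        self.model = model
        self._lock = threading.Lock()  # locks cannot be deep-copied by default

    def __deepcopy__(self, memo):
        # Create the new object without running __init__, and register it in
        # memo so that reference cycles back to this wrapper resolve correctly.
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new
        # Deep-copy the parts that matter and recreate the parts that cannot
        # be copied.
        new.model = copy.deepcopy(self.model, memo)
        new._lock = threading.Lock()
        return new


original = FrameworkWrapper(CoreModel([1.0, 2.0]))
clone = copy.deepcopy(original)
clone.model.weights[0] = 9.0  # does not affect the original
```

Without the custom method, copy.deepcopy(original) would raise a TypeError because of the lock member.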
Solution: this mainly happens when horizon_plugin_pytorch installs successfully but fails to import. The solution is as follows:
Solution: please make sure that the local CUDA environment is working properly, e.g., the path and version are as expected.
Solution: this mainly occurs when prepare cannot complete properly, which is usually caused by a non-leaf tensor in the model. Please set the inplace argument of prepare to True.
Solution: this is probably caused by a multi-threaded Python program that was not completely killed.
Solution: this error occurs mainly while pseudo-quantization nodes are being inserted, and is caused by the input scale of the operator being None. There are two likely causes: a dequant is inserted after the output-layer conv and is then followed by another op, giving a structure such as conv+dequant+conv; or a conv configured for high-accuracy output is connected to other operators. In either case, please check whether the dequant operator or the high-accuracy output configuration is used correctly.
Solution: this error is caused by using dynamic control flow (if statements, loops, etc.) in fx mode. fx mode currently supports only static control flow, so avoid dynamic statements such as if, for, and assert in forward.
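The difference between dynamic and static control flow can be illustrated without any framework. The sketch below uses a toy TraceProxy class (hypothetical, not part of torch.fx or horizon_plugin_pytorch) that, like a symbolic value during tracing, records operations but has no concrete truth value. A branch on the input data fails, while a branch on a plain Python flag that is known at trace time works.

```python
class TraceProxy:
    """Toy symbolic value: records operations instead of computing them."""

    def __init__(self, expr):
        self.expr = expr

    def __gt__(self, other):
        return TraceProxy(f"({self.expr} > {other})")

    def __mul__(self, other):
        return TraceProxy(f"({self.expr} * {other})")

    def __bool__(self):
        # A symbolic value has no concrete truth value while tracing.
        raise TypeError("symbolic value cannot be used in control flow")


def forward_dynamic(x):
    # Dynamic control flow: the branch depends on the input data.
    if x > 0:
        return x * 2
    return x * 3


def forward_static(x, use_double=True):
    # Static control flow: the branch depends on a constant Python flag.
    if use_double:
        return x * 2
    return x * 3


traced = forward_static(TraceProxy("x"))
print(traced.expr)

try:
    forward_dynamic(TraceProxy("x"))
except TypeError as err:
    print("tracing failed:", err)
```

In a real model, a data-dependent branch usually has to be moved out of forward or expressed with tensor operations so that no Python-level if is evaluated on a traced value.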
Solution: this error may occur during the Calibration phase in fx mode, because fx mode does not support expressions of the form (-x). Please change (-x) to (-1)*(x).
Solution: this error may occur during the Calibration phase in fx mode. When the first operand of a subtraction (the number being subtracted from) is a constant, fx mode does not perform operator substitution automatically, so you need to rewrite the subtraction as an addition, e.g., change (1-x) to (x+(-1))*(-1).
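The suggested rewrites of (-x) and (1-x) are algebraically equivalent to the original expressions, which can be checked with plain numbers. This is a minimal sketch: the function names are hypothetical, and in a model x would be a tensor rather than a float.

```python
def negate_rewritten(x):
    # (-x) rewritten as (-1) * (x)
    return (-1) * (x)


def one_minus_rewritten(x):
    # (1 - x) rewritten as (x + (-1)) * (-1)
    return (x + (-1)) * (-1)


# Check both rewrites against the original expressions.
for value in (0.0, 0.25, -3.0, 7.5):
    assert negate_rewritten(value) == -value
    assert one_minus_rewritten(value) == 1 - value
```

Both rewritten forms use only addition and multiplication with the variable as the left operand, which matches the workarounds described above.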