J6P/H Torch Operator BPU Constraint List

Note

The following alias substitution is assumed throughout:

import horizon_plugin_pytorch as horizon

In the table below:

lhs: left-hand side, the left operand of an operation.

rhs: right-hand side, the right operand of an operation.

Torch Operator | Eager Mode Operator | Constraint
torch.abs
torch.Tensor.abs
input:
Type: int8, int16, float16
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.acos
torch.Tensor.acos
horizon.nn.Acos
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.acosh
torch.Tensor.acosh
horizon.nn.Acosh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.add
torch.Tensor.add
torch.nn.quantized.FloatFunctional OR
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
others:
When inputs are of integer or quantized type, if the result overflows, it will saturate to the maximum or minimum value of the result data type.
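The saturation behavior described in the note above can be sketched in plain Python. This is a minimal illustration of the documented semantics, not the BPU arithmetic itself; the helper name is ours.

```python
def saturating_add(a: int, b: int, bits: int = 8) -> int:
    """Add two integers and clamp the result to the signed range of `bits`.

    Mirrors the documented behavior: on overflow the result saturates to
    the maximum or minimum value of the result data type (int8 by default).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))
```

For example, `saturating_add(120, 50)` returns 127 rather than 170, and `saturating_add(-100, -50)` returns -128.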
torch.all
torch.Tensor.all
input:
Type: bool8, int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: bool8
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
torch.any
torch.Tensor.any
input:
Type: bool8, int8, int16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: bool8
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
torch.argmax
torch.Tensor.argmax
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Dim: if type is int8, int16 or float16, reduce axis dim size ∈ [1, 32767]
Element : if type is int8, int16 or float16, reduce Elements size ∈ [1, 65535]
dims:
If type is int32, float32, only support one dim.
output:
Type: int8, int16, int32
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
torch.argmin
torch.Tensor.argmin
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Dim: if type is int8, int16 or float16, reduce axis dim size ∈ [1, 32767]
Element : if type is int8, int16 or float16, reduce Elements size ∈ [1, 65535]
dims:
If type is int32, float32, only support one dim.
output:
Type: int8, int16, int32
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
torch.asin
torch.Tensor.asin
horizon.nn.Asin
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.asinh
torch.Tensor.asinh
horizon.nn.Asinh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.atan 
torch.Tensor.atan
horizon.nn.Atan
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.atanh
torch.Tensor.atanh
horizon.nn.Atanh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.bitwise_and
lhs:
Type: int8, int16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.bitwise_not
input:
Type: int8, int16
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.bitwise_or
lhs:
Type: int8, int16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.bitwise_xor
lhs:
Type: int8, int16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.ceil
torch.Tensor.ceil
horizon.nn.Ceil
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.clamp
torch.clip
torch.Tensor.clamp
torch.Tensor.clip
 if minmax is scalar:
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
if minmax is Tensor:
lhs:
Type: int8, int16, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.cat
torch.concat
torch.concatenate
torch.nn.quantized.FloatFunctional OR
horizon.nn.quantized.FloatFunctional
input:
Type: No limits
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072
Size: size < 2G
output:
The input and output types need to be the same.
Dim: all dims < 131072
Size: size < 2G
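The cat limits above (input count, per-dim size, total size) can be checked ahead of time. A minimal sketch in plain Python; the helper name and the 1-byte default element size are our assumptions.

```python
from math import prod

def check_cat_constraints(shapes, itemsize=1):
    """Sketch of the documented torch.cat BPU limits:
    - number of inputs in [1, 1024]
    - every dim < 131072
    - every tensor size < 2 GiB (itemsize = bytes per element)
    """
    if not 1 <= len(shapes) <= 1024:
        return False
    for shape in shapes:
        if any(d >= 131072 for d in shape):
            return False
        if prod(shape) * itemsize >= 2 ** 31:
            return False
    return True
```

For example, two [4, 100, 64] inputs pass, while a single input with a 131072-sized dim does not.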
torch.cos
torch.Tensor.cos
horizon.nn.Cos
if it is a quantized int operation:
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
if it is a float operation:
input:
Type: float16, float32
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
others:
This operator uses a fast math implementation by default, which introduces minor numeric error. For numerically sensitive usage, set 'module._disable_op_fast_math_impl = true' to fall back to the high-precision implementation. The precision error is small for inputs in [-π, π]; outside this range the error increases significantly. With inputs sampled in [-10, 10]: max absolute error is 0.0076904, max relative error is 1.42285156.
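To apply the documented `_disable_op_fast_math_impl` flag across a whole model, the submodules can be walked with the standard torch.nn.Module API. A sketch under the assumption that each affected op reads the flag from its own module (in Python the value is `True`); which modules actually honor it depends on the compiled op.

```python
import torch

def disable_fast_math(model: torch.nn.Module) -> None:
    # Set the documented flag on every submodule; ops that support it
    # (e.g. cos/sin/exp) fall back to the high-precision implementation.
    for m in model.modules():
        m._disable_op_fast_math_impl = True

# Stand-in model; a real workflow would use the traced/quantized model.
model = torch.nn.Sequential(torch.nn.Identity())
disable_fast_math(model)
```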
torch.cosh
torch.Tensor.cosh
horizon.nn.Cosh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.cumsum
torch.Tensor.cumsum
horizon.nn.CumSum
input:
Type: int8, int16, float16, float32
Shape: [*, dim[axis], *]
Dim: * ∈ [1, 65536]; dim[axis] ∈ [1, 8192]
output:
Type: int8, int16, int32, float16, float32. only support combinations (int8→int8,int16,int32), (int16→int16,int32), (float16→float16), (float32→float32)
The Shape and Dim are the same as the input.
exclusive:
If type is float16 or float32, only support exclusive == 0; otherwise, exclusive is 0 or 1.
reverse:
If type is float16 or float32, only support reverse == 0; otherwise, reverse is 0 or 1.
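The exclusive/reverse attributes above belong to the compiled CumSum op rather than to torch.cumsum itself. A pure-Python sketch of their semantics, assuming they follow the usual ONNX-style CumSum definition:

```python
def cumsum_1d(xs, exclusive=False, reverse=False):
    """Reference semantics for the exclusive/reverse attributes.

    exclusive: each output excludes the current element (starts at 0).
    reverse: accumulate from the end of the sequence instead of the start.
    """
    seq = list(reversed(xs)) if reverse else list(xs)
    out, acc = [], 0
    for v in seq:
        if exclusive:
            out.append(acc)
            acc += v
        else:
            acc += v
            out.append(acc)
    return list(reversed(out)) if reverse else out
```

For example, `cumsum_1d([1, 2, 3])` gives `[1, 3, 6]` and `cumsum_1d([1, 2, 3], exclusive=True)` gives `[0, 1, 3]`.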
torch.div
torch.Tensor.div
lhs:
Type: quantized types support int8, int16; others support int16, int32, float16, float32
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
For quantized types, support (int8,int16→int8,int16); otherwise the input and output types need to be the same.
Shape: [*]
rounding_mode:
For integer, only TRUNC is supported; for float, only NONE.
others:
When inputs are of integer or quantized type, if the result overflows, it will saturate to the maximum or minimum value of the result data type.
torch.eq
torch.Tensor.eq
 lhs:
Type: int8, int16, int32, float16, float32, bool8
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.gather
torch.Tensor.gather
 input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Input will transpose to [N, W, C]. W is inputShape[dim], N is the product of inputShape[:dim], C is the product of inputShape[dim+1:].
N, C ∈ [1, 1048576]. N × C should not be larger than 1048576
When C <= 4, W ∈ [1, 1048576]; when C > 4, W ∈ [1, 131072]. If input type is int32, float32, W ∈ [1, 65536].
indices:
Type: int8, int16, int32; Unsupported negative indices.
Shape: [*] indices value should not be larger than 32767
Indices will transpose to [N, D, C]. D is indicesShape[dim], N is the product of indicesShape[:dim], C is the product of indicesShape[dim+1:].
N, C ∈ [1, 1048576].
IndicesShape[i] <= inputShape[i] for all dimensions i != dim.
output:
The input and output types need to be the same.
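The internal [N, W, C] view used in the gather constraints above can be computed directly from the tensor shape and dim. A sketch in plain Python; the helper names are ours and the limit check is not exhaustive.

```python
from math import prod

def gather_layout(shape, dim):
    """Return (N, W, C) for the documented internal [N, W, C] view:
    W = shape[dim], N = prod(shape[:dim]), C = prod(shape[dim+1:])."""
    return prod(shape[:dim]), shape[dim], prod(shape[dim + 1:])

def within_gather_limits(shape, dim, dtype_is_32bit=False):
    # Sketch of the documented ranges for the input operand.
    n, w, c = gather_layout(shape, dim)
    if not (1 <= n <= 1048576 and 1 <= c <= 1048576 and n * c <= 1048576):
        return False
    w_max = 65536 if dtype_is_32bit else (1048576 if c <= 4 else 131072)
    return 1 <= w <= w_max
```

For a [2, 3, 4, 5] input with dim=2, this gives N=6, W=4, C=5, which is within limits.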
torch.gt
torch.greater
torch.Tensor.gt
torch.Tensor.greater
 lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.ge
torch.greater_equal
torch.Tensor.ge
torch.Tensor.greater_equal
 lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.lt
torch.less
torch.Tensor.lt
torch.Tensor.less
 lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.le
torch.less_equal
torch.Tensor.le
torch.Tensor.less_equal
 lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.erf
torch.Tensor.erf
horizon.nn.Erf
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.exp
torch.Tensor.exp
horizon.nn.Exp
input:
Type: int8, int16, float16, float32
Shape: [*]
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8), (int16→int8,int16), (float16→float16), (float32→float32)
Shape: [*]
others:
If type is float16 or float32, this operator uses a fast math implementation by default, which introduces minor numeric error. For numerically sensitive usage, set 'module._disable_op_fast_math_impl = true' to fall back to the high-precision implementation. With inputs in the interval [-12, 12]: max absolute error is 256, max relative error is 0.0052032.
torch.Tensor.expand
input:
Type: No limits
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072
Size: size < 2G
output:
The input and output types need to be the same.
Dim: all dims < 131072
Size: size < 2G
torch.flatten
torch.Tensor.flatten
torch.nn.Flatten
input:
Type: No limits
output:
The input and output types need to be the same.
torch.flip
torch.Tensor.flip
input:
Type: int8, int16, int32
output:
The input and output types need to be the same.
torch.floor
torch.Tensor.floor
horizon.nn.Floor
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.fmod
torch.remainder
horizon.nn.FMod
horizon.nn.Remainder
lhs:
Type: int16, int32. Not support quantized type.
Shape: [*]
Value range: must be non-negative.
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
Value range: must be positive.
output:
The input and output types need to be the same.
Shape: [*]
torch.index_select
torch.Tensor.index_select
torch.unbind
torch.Tensor.unbind
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Input will transpose to [N, W, C]. W is inputShape[dim], N is the product of inputShape[:dim], C is the product of inputShape[dim+1:].
N, C ∈ [1, 1048576]. When C <= 4, W ∈ [1, 1048576]; when C > 4, W ∈ [1, 131072]. If input type is int32, float32, W ∈ [1, 65536].
index:
Type: int8, int16, int32; Unsupported negative indices.
Shape: [*] index value should not be larger than 1048576
output:
The input and output types need to be the same.
torch.log
torch.Tensor.log
horizon.nn.HardLog
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.logical_and
torch.Tensor.logical_and
lhs:
Type: int8, int16, bool8, float16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.logical_not
torch.Tensor.logical_not
input:
Type: int8, int16, bool8, float16
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.logical_or
torch.Tensor.logical_or
lhs:
Type: int8, int16, bool8, float16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.logical_xor
torch.Tensor.logical_xor
lhs:
Type: int8, int16, bool8, float16
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.Tensor.masked_fill
input:
Type: int8, int16, int32, float16, float32
output:
The output and input types need to be the same.
torch.matmul
torch.Tensor.matmul
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16
Shape: [*,M,C]
Dim: * ∈ [1, 4096], M,C ∈ [1, 8192]
rhs:
Type: int8, int16
Shape: [*,C,N]
Dim: * ∈ [1, 4096]; C ∈ [1, 8192], N ∈  [1, 1048576]
output:
Type: int8, int16, int32, float16, float32
Shape: [*,M,N]
Other constraints: Same as lhs and rhs
torch.max
torch.Tensor.max
torch.min
torch.Tensor.min
 input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Dim: if type is int8, int16 or float16, reduce axis dim size ∈ [1, 32767]
Element : if type is int8 or int16, reduce Elements size ∈ [1, 65535]
dims:
If type is int32, float16, float32, only support one dim
output:
Value Type: The input and output types need to be the same.
Index Type: int8, int16, int32
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
torch.maximum
torch.Tensor.maximum
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.mean
torch.Tensor.mean
horizon.nn.quantized.FloatFunctional
input:
Type: int8, int16, float16, float32
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8,int16), (int16→int16), (float16→float16,float32), (float32→float32)
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
dims:
If type is float32, only support one dim.
torch.minimum
torch.Tensor.minimum
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.mul
torch.Tensor.mul
torch.nn.quantized.FloatFunctional or
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: int8, int16, int32, float16, float32. only support (int8,int16→int8,int16,int32), (int32→int32), (float16→float16), (float32→float32).
Shape: [*]
others:
When inputs are of integer or quantized type, if the result overflows, it will saturate to the maximum or minimum value of the result data type.
torch.neg
torch.negative
torch.Tensor.neg
torch.Tensor.negative
input:
Type: int8, int16, int32, float16
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.ne
torch.not_equal
torch.Tensor.ne
torch.Tensor.not_equal
 lhs:
Type: int8, int16, int32, float16, float32, bool8
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: bool8
Shape: [*]
torch.permute
torch.Tensor.permute
 input:
Type: No limits
output:
The input and output types need to be the same.
torch.pow
torch.Tensor.pow
horizon.nn.Pow
if exponent is scalar 2:
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
output:
Type: int8, int16, int32, float16, float32. only support (int8,int16→int8,int16,int32), (int32→int32), (float16→float16), (float32→float32).
Shape: [*]
others:
When inputs are of integer or quantized type, if the result overflows, it will saturate to the maximum or minimum value of the result data type.
if exponent is scalar and not 2:
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.reciprocal
torch.Tensor.reciprocal
horizon.nn.Reciprocal
input:
Type: int8, int16, float16, float32
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8), (int16→int8,int16), (float16→float16), (float32→float32)
torch.Tensor.repeat
input:
Type: No limits
output:
The input and output types need to be the same.
torch.repeat_interleave
torch.Tensor.repeat_interleave
input:
Type: No limits
Dim: all dims < 2097152
output:
The input and output types need to be the same.
Dim: all dims < 131072
Size: size < 2G
torch.reshape
torch.Tensor.reshape
torch.Tensor.view
input:
Type: No limits
output:
The input and output types need to be the same.
torch.roll
torch.Tensor.roll
input:
Type: No limits
output:
The input and output types need to be the same.
torch.round
torch.Tensor.round
input:
Type: int8, int16, float16, float32
output:
The output and input types need to be the same.
torch.rsqrt
torch.Tensor.rsqrt
horizon.nn.Rsqrt
input:
Type: int8, int16, float16, float32
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8), (int16→int8,int16), (float16→float16), (float32→float32)
torch.scatter
torch.Tensor.scatter
torch.scatter_add
torch.Tensor.scatter_add
torch.scatter_reduce
torch.Tensor.scatter_reduce
horizon.nn.Scatter
horizon.nn.ScatterAdd
horizon.nn.ScatterReduce
input:
Type: float16, float32, if type is float32, scatterReduceMode must be add
Shape: [*]
indices:
Type: int16, int32
Shape: data and indices must be the same shape, except for scatterAxis
updates:
Type: float16
Shape: [*]
output:
The input and output types need to be the same.
scatterReduceMode:
Mode ∈ [none, add, max, min]
torch.sign
torch.Tensor.sign
input:
Type: float16
Shape: [*]
output:
The shape and type are the same as the input.
torch.sin
torch.Tensor.sin
horizon.nn.Sin
if it is a quantized int operation:
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
if it is a float operation:
input:
Type: float16, float32
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
others:
This operator uses a fast math implementation by default, which introduces minor numeric error. For numerically sensitive usage, set 'module._disable_op_fast_math_impl = true' to fall back to the high-precision implementation. The precision error is small for inputs in [-π, π]; outside this range the error increases significantly. With inputs sampled in [-10, 10]: max absolute error is 0.0088195, max relative error is 2.0292968.
torch.sinh
torch.Tensor.sinh
horizon.nn.Sinh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.slice_scatter
horizon.nn.SliceScatter
input:
Type: No limits
Dim: all dims < 2097152
output:
The input and output types need to be the same.
The input and output constraints are the same.
torch.split
torch.Tensor.split
 input:
Type: No limits
Dim: all dims < 2097152
output:
The input and output types need to be the same.
The input and output constraints are the same.
torch.sqrt
horizon.nn.Sqrt
input:
Type: int8, int16, float16, float32
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8), (int16→int8,int16), (float16→float16), (float32→float32)
torch.squeeze
torch.Tensor.squeeze
 input:
Type: No limits
output:
The input and output types need to be the same.
torch.stack
horizon.nn.quantized.FloatFunctional
input:
Type: No limits
Arg Number: input number ∈ [1, 1024]
Dim: all dims < 131072
Size: size < 2G
output:
The input and output types need to be the same.
Dim: all dims < 131072
Size: size < 2G
torch.sub
torch.Tensor.sub
horizon.nn.quantized.FloatFunctional
lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
others:
When inputs are of integer or quantized type, if the result overflows, it will saturate to the maximum or minimum value of the result data type.
torch.sum
torch.Tensor.sum
horizon.nn.quantized.FloatFunctional
input:
Type: int8, int16, float16, float32
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16, float16, float32. only support combinations (int8→int8,int16), (int16→int16), (float16→float16,float32), (float32→float32)
Shape: the reduced dim becomes 1 or is removed, depending on keepDim
dims:
If type is float32, only support one dim.
torch.tan
horizon.nn.Tan
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.tile
torch.Tensor.tile
input:
Type: No limits
output:
The input and output types need to be the same.
torch.Tensor.to
input:
Type: int8, int16, int32, float16, float32, bool8
Shape: [*]
output:
Type: int8, int16, int32, float16, float32, bool8. only support combinations (int8→int8,int16,int32,float16,float32,bool8), (int16→int8,int16,int32,float16,float32,bool8), (int32→int16,int32,float16,float32), (float16→int8,int16,int32,float16, float32), (float32→int8,int16,int32,float16,float32), (bool8→int8,int16,int32,float16,float32,bool8)
Shape: [*]
torch.Tensor.float
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
output:
Type: float32
Shape: [*]
torch.topk
torch.Tensor.topk
horizon.functional.stable_topk
 input:
Type: int8, int16, int32, float16, float32
output:
The type of output value is same as input, and indices is integer type.
k:
K <= 1024
others:
The combined size in bytes of the op's operands and outputs must be no more than 4.8MB.
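The 4.8 MB budget above can be estimated before tracing. A rough sketch; the byte-accounting scheme (input values plus output values plus output indices, with assumed element sizes) is our assumption, not a documented formula.

```python
from math import prod

def topk_within_budget(shape, dim, k, value_bytes=2, index_bytes=4,
                       budget=int(4.8 * 1024 * 1024)):
    """Rough check of the documented topk limits: K <= 1024, and the
    combined size of operands and outputs within ~4.8 MB."""
    if k > 1024:
        return False
    out_shape = list(shape)
    out_shape[dim] = k
    in_bytes = prod(shape) * value_bytes
    out_bytes = prod(out_shape) * (value_bytes + index_bytes)
    return in_bytes + out_bytes <= budget
```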
torch.transpose
torch.Tensor.transpose
 input:
Type: No limits
output:
The input and output types need to be the same.
torch.tril
torch.triu
input:
Type: int8, int16, int32, float16, float32
output:
The output and input types need to be the same.
torch.unsqueeze
torch.Tensor.unsqueeze
 input:
Type: No limits
output:
The input and output types need to be the same.
torch.where
horizon.nn.Where
input num must be 3.
condition:
Type: bool8
Shape: [*]
lhs:
Type: int8, int16, int32, float16, float32
Shape: [*]
rhs:
The lhs and rhs types need to be the same.
Shape: [*]
output:
Type: float16, float32
Shape: [*]
input:
Type: int8,int16,int32
Shape: [*]
torch.zeros_like
torch.ones_like
No limits
torch.linalg.norm
horizon.nn.LinalgNorm
only support ord in (1.0, 2.0, "fro")
input:
Type: int8, int16, float16
Dim: normalized dim size ∈ [1, 65535]
output:
Type: int8, int16, float16. only support combinations (int8,int16→int8,int16), (float16→float16)
torch.nn.functional.avg_pool1d
torch.nn.AvgPool1d
torch.nn.functional.avg_pool2d
torch.nn.AvgPool2d
torch.nn.functional.adaptive_avg_pool1d
torch.nn.AdaptiveAvgPool1d
torch.nn.functional.adaptive_avg_pool2d
torch.nn.AdaptiveAvgPool2d
input:
Type: int8, int16
Shape: [*,H,W,C] or [*,L,C]
output:
The input and output types need to be the same.
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW] or [SL]
Dim: SH, SW, SL ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END] or [PL_BEGIN,PL_END]
PH_BEGIN,PW_BEGIN,PL_BEGIN,PH_END,PW_END,PL_END ∈ [-255, 256]
dilation:
Shape: 1d: [DW]; 2d: [DH, DW]
Dim: 1d: DW ∈ {1}; 2d: DH, DW ∈ {1}
torch.nn.functional.affine_grid
lhs:
Type: int8, int16
Shape: [*,M,C]
Dim: * ∈ [1, 4096], M,C ∈ [1, 8192]
rhs:
Type: int8, int16
Shape: [*,C,N]
Dim: * ∈ [1, 4096]; C ∈ [1, 8192], N ∈  [1, 1048576]
output:
Type: int8, int16, int32, float16, float32
Shape: [*,M,N]
Other constraints: Same as lhs and rhs
torch.nn.functional.dropout
torch.nn.Dropout
torch.nn.functional.dropout1d
torch.nn.Dropout1d
torch.nn.functional.dropout2d
torch.nn.Dropout2d
torch.nn.functional.dropout3d
torch.nn.Dropout3d
torch.nn.Dropout
torch.nn.Dropout1d
torch.nn.Dropout2d
torch.nn.Dropout3d
N/A, collapsed in graph optimization phase
torch.nn.functional.elu
torch.nn.ELU
torch.nn.ELU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.embedding
torch.nn.Embedding
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
If gather1d, W = inputShape[batchDim], when C <= 4, W ∈ [1, 1048576]; when C > 4, W ∈ [1, 131072].
If gather2d, H = inputShape[batchDim], W = inputShape[batchDim+1], H ∈ [1, 65536], W ∈ [1, 1048576].
If input type is int32, float32, W ∈ [1, 65536].
B is product of inputShape[0: batchDim], B ∈ [1, 1048576].
C is product of inputShape[batchDim+D:], C ∈ [1, 1048576].
indices:
Type: int8, int16, int32; Unsupported negative indices.
Shape: [*, D] indices value should not be larger than 1048576. D ∈ [1, 2].
output:
Shape: [*]
The input and output types need to be the same.
batchDim:
The number of batch dimensions. The gather of indexing starts from dimension of input[batchDim:]
torch.nn.functional.gelu
torch.nn.GELU
torch.nn.GELU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.glu
torch.nn.GLU
torch.nn.GLU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. only support combinations (int8,int16→int8,int16), (float16→float16)
torch.nn.functional.grid_sample
input:
Type: int8, int16, float16
Shape: [*,H,W,C]
Dim: If type is int8, H,W ∈ [1, 32768], H and W cannot both be greater than 4096 at the same time, other dims ∈ [1, 65536]. If type is int16 or float16, H,W ∈ [1, 2048], H × W <= 1048576, other dims ∈ [1, 65536].
grid:
Type: int16, float32. If grid type is float32, input type should be int16 or float16.
Shape: [*,H,W,2]
output:
The input and output types need to be the same except Dim constraints
mode:
If input type is int8, support nearest and bilinear
If input type is int16 or float16, only support bilinear.
padding_mode:
If input type is int8, support zeros and border.
If input type is int16 or float16, only support zeros.
torch.nn.functional.hardsigmoid
torch.nn.HardSigmoid
torch.nn.HardSigmoid
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.interpolate
torch.nn.Upsample
torch.nn.UpsamplingNearest2d
torch.nn.UpsamplingBilinear2d
 input:
--case 1: int8--
Type: int8
Shape: [*,H,W,C]
--case 2: float16, int16--
Type: float16, int16(with per tensor quant info)
Shape: [*,H,W,C]
Dim: When steps and initialOffsets are all an integer power of 2, batch * C ∈ [1, 512], other dims ∈ [1, 65536]. When any of the steps or initialOffsets values is not an integer power of 2, H ∈ [1, 32768], W ∈ [1, 32768], C ∈ [1, 65536], other dims ∈ [1, 16384].
output:
The input and output types need to be the same.
mode:
If Type is int8, support nearest and bilinear
If Type is float16 or int16, only support bilinear
padValue:
If Type is int8, when padvalue is not equal to 0 and quantized, input only supports per tensor quantization
step:
If Type is int8, the integer part of step ∈ [-256, 255]
If Type is float16 or int16, only steps less than or equal to 1 are supported
expansionMode:
If Type is int8, support border and constant
If Type is float16 or int16, only support border
others:
When type is float16 or int16, H and W cannot both be greater than 4096 at the same time. KN*KH*KW*KC*bitWidth/8 <= 1048576
torch.nn.functional.leaky_relu
torch.nn.LeakyReLU
torch.nn.LeakyReLU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.log_softmax
torch.nn.LogSoftmax
torch.nn.LogSoftmax
input:
Type: int8, int16, float16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16, float16. only support combinations (int8,int16→int8,int16), (float16→float16)
torch.nn.functional.mish
torch.nn.Mish
torch.nn.Mish
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.normalize
horizon.nn.Normalize
only support p in (1.0, 2.0)
input:
Type: int8, int16, float16
Dim: normalized dim size ∈ [1, 65535]
output:
Type: int8, int16, float16. only support combinations (int8,int16→int8,int16), (float16→float16)
torch.nn.functional.pad
torch.nn.ConstantPad1d
torch.nn.ConstantPad2d
torch.nn.ConstantPad3d
torch.nn.ReplicationPad1d
torch.nn.ReplicationPad2d
torch.nn.ReplicationPad3d
torch.nn.ZeroPad2d
 input:
Type: int8, int16, float16, float32
Dim: all dims < 737280 when expansionMode is not 'constant' else no constraints
output:
The input and output constraints are the same.
begin/end:
Value should be in range [1, 1024]
torch.nn.functional.pixel_shuffle
torch.nn.PixelShuffle
 input:
dim ∈ [3, 7]
Type: No limits
output:
The output and input types need to be the same.
torch.nn.functional.pixel_unshuffle
torch.nn.PixelUnshuffle
 input:
Type: No limits.
output:
The output and input types need to be the same.
torch.nn.PReLU
torch.nn.PReLU
input:
Type: int8, int16
output:
The output and input types need to be the same.
torch.nn.functional.relu
torch.nn.ReLU
torch.nn.ReLU
input:
Type: int8, int16, int32, float16, float32, if type is int32 or float32, this op must be fusible to a Conv op
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.nn.functional.relu6
torch.nn.ReLU6
torch.nn.ReLU6
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
output:
The input and output types need to be the same.
Shape: [*]
torch.nn.functional.silu
torch.nn.SiLU
torch.nn.SiLU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.softmax
torch.nn.Softmax
torch.softmax
torch.Tensor.softmax
torch.nn.Softmax
input:
Type: int8, int16, float16
Shape: [*]
Dim: reduce axis dim size ∈ [1, 65535]
Element : reduce Elements size ∈ [1, 65535]
output:
Type: int8, int16, float16. only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.functional.softplus
torch.nn.Softplus
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support combinations (int8→int8), (int16→int8,int16), (float16→float16)
torch.nn.BatchNorm1d
torch.nn.BatchNorm2d
torch.nn.BatchNorm3d
 input:
Type: int8, int16
Shape: [*,H,W,C]
mean:
Type: float32
Shape: [C]
var:
Type: float32
Shape: [C]
weight:
Type: float32
Shape: [C]
bias:
Type: float32
Shape: [C]
output:
The input and output types need to be the same.
torch.nn.Conv1d
torch.nn.Conv2d
torch.nn.Conv3d
input:
--conv 1d--
Type: int8, int16
Dim: * ∈ [1, 4096]; L,C ∈ [1, 65536]
--conv 2d--
Type: int8, int16
Shape: [*,H,W,C]
Dim: * ∈ [1, 4096]; H,W,C ∈ [1, 65536]
--conv 3d--
Type: int8
Shape: [*,D,H,W,C]
Dim: * ∈ [1, 128]; D,H,W ∈ [1, 65536]; C ∈ [1, 4096];
weight:
--conv 1d--
Type: int8, int16
Shape: [N,KL,C]
Dim: C ∈ [1, 8192]; KL ∈ [1, 31]; N ∈ [1, 1048576] if fout is the last layer of conv else [1, 8192]
Size: KL × C ∈ [1, 65535]
--conv 2d--
Type: int8, int16
Shape: [N,KH,KW,C]
Dim: C ∈ [1, 8192]; KH,KW ∈ [1, 31]; N ∈ [1, 1048576] if fout is the last layer of conv else [1, 8192]
Size: KH × KW × C ∈ [1, 65535]
--conv 3d
Type: int8
Shape: [N,KD,KH,KW,C]
N ∈ [1, 65535]; KD,KH,KW ∈ [1, 9]; Dim: C ∈ [1, 4096];
Size: KD × KH × KW × C ∈ [1, 131072]
bias:
Type: float32
output:
--conv 1d--
Type: int8, int16, int32, float32, float16
Shape: [*,L,C]
Dim: * ∈ [1, 4096]; L,C ∈ [1, 65536]
--conv 2d--
Type: int8, int16, int32, float32, float16
If Type is float32, float16, input and weight only support int8
Shape: [*,H,W,C]
Dim: * ∈ [1, 4096]; H,W,C ∈ [1, 65536]
--conv 3d--
Type: int8, int16, int32, float32, float16
Shape: [*,D,H,W,C]
Dim: * ∈ [1, 128]; D,H,W ∈ [1, 65536]; C ∈ [1, 4096];
stride:
--conv 1d--
Shape: [SL]
Dim: SL ∈ [1, 256]; SL ∈ {1} if dilation > 1
--conv 2d--
Shape: [SH,SW]
Dim: SH,SW ∈ [1, 256]; SH,SW ∈ {1} if dilation > 1
--conv 3d--
Shape: [SD,SH,SW]
Dim: SD,SH,SW must be 1 or 2 and equal to each other.
pad:
--conv 1d--
Shape: [P_left,P_right]
Dim: P_left,P_right ∈ (-L, 256]
--conv 2d--
Shape: [P_top,P_left,P_bottom,P_right]
Dim: P_top,P_bottom ∈ (-H, 256], P_left,P_right ∈ (-W, 256]
--conv 3d--
Shape: [P_front, P_top, P_left, P_back, P_bottom, P_right]
Dim: P_front,P_back ∈ [0, KD/2], P_top,P_bottom ∈ [0, KH/2], P_left,P_right ∈ [0, KW/2]
groupNum:
Fin.c must be divisible by groupNum; conv 3d only supports 1
dilation:
--conv 1d--
Shape: [DL]
Dim: DL ∈ [1, 18]
--conv 2d--
Shape: [DH,DW]
Dim: DH,DW ∈ [1, 18]
--conv 3d--
Shape: [DD,DH,DW]
DD,DH,DW = 1
others:
--conv 1d--
Stride only supports odd values and 2 when the conv is an int16 depthwise conv
If groupNum > 1, for each group, fin.c' ∈ [1, 65535], KL × fin.c' ∈ [1, 65535]
--conv 2d--
Stride only supports odd values and 2 when the conv is an int16 depthwise conv
If groupNum > 1, for each group, fin.c' ∈ [1, 65535], KH × KW × fin.c' ∈ [1, 65535]
Fin.c' = fin.c × min(lcm(fout.c × (lcm(fin.c, 4) / fin.c), 8) / fout.c, groupNum)
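The conv2d weight limits above can be collected into a small checker. This is an illustrative sketch, not part of horizon_plugin_pytorch; the helper name `conv2d_weight_ok` is hypothetical, and `is_last_conv` stands in for the "fout is the last layer of conv" condition in the table.

```python
# Hypothetical checker for a conv2d weight of shape [N, KH, KW, C] against the
# BPU limits above: C <= 8192, KH/KW <= 31, KH*KW*C <= 65535, and N up to
# 1048576 only when the conv produces the model output (else 8192).
def conv2d_weight_ok(n, kh, kw, c, is_last_conv=False):
    n_max = 1048576 if is_last_conv else 8192
    return (1 <= c <= 8192
            and 1 <= kh <= 31 and 1 <= kw <= 31
            and 1 <= n <= n_max
            and 1 <= kh * kw * c <= 65535)

print(conv2d_weight_ok(64, 3, 3, 256))    # 3*3*256 = 2304, within limits
print(conv2d_weight_ok(64, 31, 31, 128))  # 31*31*128 = 123008 > 65535
```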
torch.nn.ConvTranspose1d
torch.nn.ConvTranspose2d
torch.nn.ConvTranspose3d
input:
--conv 1d/2d--
Type: int8, int16; input and weight cannot both be int16
1d_Shape: [*,W,C]
1d_Dim: * ∈ [1, 128]; W ∈ [1, 65536]; C ∈ [1, 2048]
2d_Shape: [*,H,W,C]
2d_Dim: * ∈ [1, 128]; H,W ∈ [1, 65536]; C ∈ [1, 2048]
--conv 3d--
Type: int8
3d_Shape: [*,D,H,W,C]
3d_Dim: * ∈ [1, 128]; D,H,W ∈ [1, 65536]; C ∈ [1, 2048]
weight:
--conv 1d/2d--
Type: int8, int16; input and weight cannot both be int16
1d_Shape: [N,KW,C]
1d_Dim: N,C ∈ [1, 2048]; KW ∈ [1, 14]
1d_Size: KW × C ∈ [1, 65535]
2d_Shape: [N,KH,KW,C]
2d_Dim: N,C ∈ [1, 2048]; KH,KW ∈ [1, 14]; KH,KW cannot both be 1
2d_Size: KH × KW × C ∈ [1, 65535]
--conv 3d--
Type: int8
3d_Shape: [N,KD,KH,KW,C]
3d_Dim: N,C ∈ [1, 2048]; KD,KH,KW ∈ [1, 14]; KD,KH,KW cannot all be 1
3d_Size: KH × KW × C ∈ [1, 65535]
bias:
Type: float32
output:
Type: int8, int16, int32, float16, float32
stride:
1d_Shape: [SW]
1d_Dim: SW ∈ [1, 14];
2d_Shape: [SH,SW]
2d_Dim: SH,SW ∈ [1, 14];
3d_Shape: [SD,SH,SW]
3d_Dim: SD,SH,SW ∈ [1, 14];
pad:
1d_Shape: [P_left,P_bottom]
1d_Dim: P_left,P_bottom ∈ [0, 256]
2d_Shape: [P_top,P_left,P_bottom,P_right]
2d_Dim: P_top,P_left,P_bottom,P_right ∈ [0, 256]
3d_Shape: [P_front,P_top,P_left,P_back,P_bottom,P_right]
3d_Dim: P_front,P_top,P_left,P_back,P_bottom,P_right ∈ [0, 256]
groupNum:
Fin.c must be divisible by the group number; conv 3d only supports groupNum = 1
dilation:
1d_Shape: [DW]
1d_Dim: DW ∈ {1}
2d_Shape: [DH,DW]
2d_Dim: DH,DW ∈ {1}
3d_Shape: [DD,DH,DW]
3d_Dim: DD,DH,DW ∈ {1}
torch.nn.GRU
dropout must be 0.0
input:
Type: int8, int16
Dim: C_in ∈ [1, 65535], Seq length < 1024, other dims < 2097152
output:
Type: int8, int16
Dim: all dims < 131072
size < 2G
torch.nn.LSTM
input:
Type: int8, int16
Dim: C_in ∈ [1, 65535], Seq length < 1024, other dims < 2097152
output:
Type: int8, int16
Dim: all dims < 131072
size < 2G
torch.nn.Identity
N/A, collapsed in graph optimization phase
torch.nn.LayerNorm
torch.nn.GroupNorm
torch.nn.InstanceNorm1d
torch.nn.InstanceNorm2d
torch.nn.InstanceNorm3d
horizon.nn.LayerNorm
input:
Type: int8, int16, float16
Dim: normalized dim size ∈ [1, 65535]
output:
Type: int8, int16, float16. Only supports combinations (int8,int16 → int8,int16), (float16 → float16)
torch.nn.Linear
lhs:
Type: int8, int16
Shape: [*,C_in]
Dim: *, C_in ∈ [1, 65536]
weight:
Type: int8, int16
Shape: [C_out, C_in]
Dim: C_out ∈ [1, 1048576]; C_in ∈ [1, 8192]
bias:
Type: float32
output:
Type: int8, int16, int32, float16, float32
Other constraints: Same as input
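The Linear weight limits above lend themselves to the same kind of check. A minimal sketch; the helper name `linear_weight_ok` is hypothetical and not part of the toolchain.

```python
# Hypothetical checker for a Linear weight of shape [C_out, C_in] against the
# limits above: C_out ∈ [1, 1048576], C_in ∈ [1, 8192].
def linear_weight_ok(c_out, c_in):
    return 1 <= c_out <= 1048576 and 1 <= c_in <= 8192

print(linear_weight_ok(1000, 512))    # within limits
print(linear_weight_ok(1000, 16384))  # C_in exceeds 8192
```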
torch.Tensor.masked_scatter
input:
Type: float16
Shape: [*]
indices:
Type: int16, int32
Shape: [*,C]
Dim: *, C ∈ [1, 2]
updates:
Type: float16
Shape: [*]
output:
The input and output types need to be the same.
torch.nn.functional.max_pool1d
torch.nn.MaxPool1d
torch.nn.functional.max_pool2d
torch.nn.MaxPool2d
torch.nn.functional.adaptive_max_pool1d
torch.nn.AdaptiveMaxPool1d
torch.nn.functional.adaptive_max_pool2d
torch.nn.AdaptiveMaxPool2d
 input:
Type: int8, int16, float16
Shape: [*,H,W,C]
output:
The input and output types need to be the same.
kernel:
Shape: [KL] or [KH,KW], only support 1d or 2d now
Dim: 1d: KL ∈ [1, 256], KL*bitWidth/8 <= 24576; 2d: KH, KW ∈ [1, 256], KH*KW*bitWidth/8 <= 24576
stride:
Shape: [SH,SW]
Dim: SH, SW ∈ [1, 256]
pad:
Shape: [PH_BEGIN,PW_BEGIN,PH_END,PW_END]
PH_BEGIN,PW_BEGIN,PH_END,PW_END ∈ [-4, 256]
dilation:
Shape: 1d: [DW]; 2d: [DH, DW]
Dim: 1d: DW ∈ {1}; 2d: DH, DW ∈ {1}
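The 2d pooling kernel constraint above couples the kernel area to the element bit width. A small illustrative sketch (the helper name `maxpool2d_kernel_ok` is hypothetical):

```python
# Hypothetical checker for the 2d pooling kernel limits above:
# KH, KW ∈ [1, 256] and KH * KW * bitWidth / 8 <= 24576 bytes.
def maxpool2d_kernel_ok(kh, kw, bit_width):
    return (1 <= kh <= 256 and 1 <= kw <= 256
            and kh * kw * bit_width / 8 <= 24576)

print(maxpool2d_kernel_ok(7, 7, 8))      # 7*7*8/8 = 49 bytes, within limits
print(maxpool2d_kernel_ok(256, 256, 16)) # 131072 bytes > 24576
```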
torch.nn.MultiheadAttention
src_len, tgt_len, head_dim ∈ [1, 8192]
embed_dim, kdim, vdim ∈ [1, 65536]
input:
Type: int8, int16
output:
Type: int8, int16
torch.nn.functional.selu
torch.nn.SELU
torch.nn.SELU
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support 
(int8->int8), (int16->int8,int16), 
(float16->float16)
torch.nn.functional.sigmoid
torch.sigmoid
torch.Tensor.sigmoid
torch.nn.Sigmoid
torch.nn.Sigmoid
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support 
(int8->int8), (int16->int8,int16), 
(float16->float16)
torch.sort
torch.Tensor.sort
input:
Type: int8, int16, int32, float16, float32
The dim to sort must have size <= maxSortSize, where maxSortSize = 8192 / ((valueByteWidth + indexByteWidth + 7) / 8 * 8) with integer division. For example, if the input value is float32 and the output index is int16, maxSortSize = 8192 / ((4 + 2 + 7) / 8 * 8) = 1024.
output:
The output value type is the same as the input; the indices are of integer type
others:
The combined size in bytes of the op's operands and outputs must be no more than 4.8MB.
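The maxSortSize formula above can be reproduced directly in Python; this sketch just restates the table's arithmetic with integer division (the function name `max_sort_size` is illustrative).

```python
# maxSortSize = 8192 / ((valueByteWidth + indexByteWidth + 7) / 8 * 8),
# computed with integer arithmetic as in the constraint above.
def max_sort_size(value_byte_width, index_byte_width):
    return 8192 // (((value_byte_width + index_byte_width + 7) // 8) * 8)

# Worked example from the table: float32 values (4 bytes), int16 indices (2 bytes).
print(max_sort_size(4, 2))  # 1024
```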
torch.tanh
torch.Tensor.tanh
torch.nn.Tanh
torch.nn.Tanh
input:
Type: int8, int16, float16
output:
Type: int8, int16, float16. Only support 
(int8->int8), (int16->int8,int16), 
(float16->float16)
torch.nn.TransformerDecoderLayer
xxx_is_causal is not supported
input:
Type: int8, int16
output:
Type: int8, int16
torch.nn.TransformerEncoderLayer
xxx_is_causal is not supported
input:
Type: int8, int16
output:
Type: int8, int16
torch.quantization.DeQuantStub
input:
Type: int8, int16, int32
output:
Type: float16, float32
torch.quantization.QuantStub
input:
Type: float16, float32
output:
Type: int8, int16
horizon.nn.AnchorGenerator
No limits
horizon.nn.BaseGridGenerator
No limits
horizon.nn.functional.filter
input:
Type: int8, int16
Shape: [*, H, W, C]
BPU filter batch dim must be 1 when rank is 4; H and W must be in range (0, 32768)
W*C < L1M_SIZE/4 && W <= 4096 when H != 1
threshold:
Type: int8, int16
output:
The input and output types need to be the same.
others:
All ops between the filterData and the last layer of the model should be CPU ops
horizon.nn.GridSample
input:
Type: nearest mode supports int8, int16, int32, float16, float32 (pad must be 0 when bit width > 8); bilinear mode with pad 0 supports int8, int16, float16; all other configurations only support int8.
Shape: [*,H,W,C]
Dim: If type is int8, H,W ∈ [1, 32768], H and W cannot both be greater than 4096 at the same time, other dims ∈ [1, 65536]. If type is int16 or float16, H,W ∈ [1, 2048], H × W <= 1048576, other dims ∈ [1, 65536].
grid:
Type: int16, float32. If grid type is float32, input type should be int16 or float16.
Shape: [*,H,W,2]
output:
The input and output types need to be the same except Dim constraints
horizon.nn.functional.scatter_reduce_nd
horizon.nn.ScatterReduceND
input:
Type: float16, float32, if type is float32, scatterReduceMode must be add or count
Shape: [*,C]
Dim: *, C=1 if scatterReduceMode is count
indices:
Type: int16, int32
Shape: [*,C]
Dim: *, C ∈ [1, 2]
updates:
Type: float16
Shape: [*]
output:
The input and output types need to be the same.
scatterReduceMode:
Mode ∈ [none, add, max, min, count]
torchvision.ops.DeformConv2d
input:
Type: int8
Shape: [*,H,W,C]
Dim: H,W ∈ [1, 1024]; H × W ≤ 720 × 1024; other dims ∈ [1, 65536]
offset:
Type: int16
Shape: [*,OH,OW,2 × offsetGroupNum × KH × KW]
Size: 2 × offsetGroupNum × KH × KW ∈ [2, 256], OH × KH × OW × KW ≤ 720 × 1024
mask:
Type: int8
Shape: [*,OH,OW,offsetGroupNum × KH × KW]
Size: offsetGroupNum × KH × KW ∈ [1, 128]
The value of mask is usually in [0, 1]
weight:
Type: int8
Shape: [N,KH,KW,C]
Dim: C ∈ [1, 8192]; KH,KW ∈ [1, 8]; N ∈ [1, 4096]
Size: KH × KW × C ∈ [1, 65535]
bias:
Type: float32
output:
Type: int8, int16, int32
Other constraints: Same as fin
stride:
Shape: [SH,SW]
Dim: SH,SW ∈ {1}
pad:
Shape: [P_top,P_left,P_bottom,P_right]
Dim: P_top,P_bottom ∈ [-H/2, 256], P_left,P_right ∈ [-W/2, 256]
groupNum:
Fin.c is divisible by group number
offsetGroupNum:
Fin.c is divisible by offset group number
Size: offsetGroupNum ∈ [1, 2]
dilation:
Shape: [DH,DW]
Dim: DH,DW ∈ {1}
others:
For each group, fin.c ∈ [1, 8192], KH × KW × fin.c ∈ [1, 65535], fin.c = C when group = 1
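The DeformConv2d offset channel count above is fully determined by offsetGroupNum and the kernel size. A small illustrative sketch (the helper name is hypothetical):

```python
# Hypothetical helper: computes the offset channel count for DeformConv2d and
# checks it against the constraint 2 * offsetGroupNum * KH * KW ∈ [2, 256].
def deform_conv2d_offset_channels(offset_group_num, kh, kw):
    ch = 2 * offset_group_num * kh * kw
    return ch, 2 <= ch <= 256

print(deform_conv2d_offset_channels(1, 3, 3))  # (18, True)
```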
torch.Tensor.__getitem__ (if indices is index Tensor)
input:
Type: int8, int16, int32, float16, float32
Shape: [*]
Input will transpose to [N, W, C]. W is inputShape[dim], N is the product of inputShape[:dim], C is the product of inputShape[dim+1:].
N, C ∈ [1, 1048576]. When C <= 4, W ∈ [1, 1048576]; when C > 4, W ∈ [1, 131072]. If input type is int32, float32, W ∈ [1, 65536].
index:
Type: int8, int16, int32; negative indices are not supported.
Shape: [*] index value should not be larger than 1048576
output:
The input and output types need to be the same.
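The [N, W, C] transposition rule above can be sketched in a few lines of Python; the helper name `getitem_layout` is illustrative and not part of the toolchain.

```python
from math import prod

# Hypothetical helper: computes the internal [N, W, C] layout described above
# for index-Tensor __getitem__: W = inputShape[dim], N = prod(inputShape[:dim]),
# C = prod(inputShape[dim+1:]). math.prod of an empty slice is 1.
def getitem_layout(input_shape, dim):
    return prod(input_shape[:dim]), input_shape[dim], prod(input_shape[dim + 1:])

print(getitem_layout([2, 8, 16, 4], 2))  # (16, 16, 4)
```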
torch.Tensor.__getitem__ (if indices is int scalar)
input:
Type: No limits
output:
The input and output types need to be the same.
torch.Tensor.__getitem__ (if indices is slice)
input:
Type: No limits
Dim: all dims < 2097152
output:
The input and output types need to be the same.
The input and output constraints are the same.
torch.Tensor.__setitem__
horizon.nn.SetItem
input:
Type: float16
Shape: [*]
indices:
Type: int16, int32
Shape: [*,C]
Dim: *, C ∈ [1, 2]
updates:
Type: float16
Shape: [*]
output:
The input and output types need to be the same.
torch.Tensor.clone
torch.Tensor.contiguous
torch.Tensor.detach
 N/A, collapsed in graph optimization phase