, other, acc=None, input_precision=None, allow_tf32=None, max_num_imprecise_acc=None, out_dtype=triton.language.float32)

Returns the matrix product of two blocks.

The two blocks must be two-dimensional and have compatible inner dimensions.
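As a rough illustration of the shape rule and of what reduced input precision means, here is a minimal pure-Python model. It is not Triton code and not how the library implements the operation. The helper names `to_tf32` and `reference_dot` are hypothetical, the `"ieee"` default is chosen only for this sketch, and TF32 is emulated by simply truncating a float32 to 10 explicit mantissa bits (an assumption; real hardware rounding may differ).

```python
import struct


def to_tf32(x):
    """Hypothetical helper: emulate TF32 by truncating a float32 mantissa.

    float32 has 23 explicit mantissa bits and TF32 keeps 10, so the low
    13 bits are cleared. Truncation is an assumption of this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 least significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def reference_dot(a, b, acc=None, input_precision="ieee"):
    """Hypothetical reference model of a 2D block product (lists of lists).

    a has shape (M, K) and b has shape (K, N); the inner dimensions must
    match. If acc is given, the product is added to a copy of it.
    """
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    if input_precision == "tf32":
        # Reduce the precision of the inputs only; accumulate at full precision.
        a = [[to_tf32(v) for v in row] for row in a]
        b = [[to_tf32(v) for v in row] for row in b]
    out = [[0.0] * n for _ in range(m)] if acc is None else [row[:] for row in acc]
    for i in range(m):
        for j in range(n):
            out[i][j] += sum(a[i][p] * b[p][j] for p in range(k))
    return out
```

In this model a (2, 3) block can be multiplied by a (3, 4) block but not by a (2, 4) block, and a value such as `1 + 2**-12` collapses to `1.0` once the inputs are reduced to TF32.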

  • input (2D tensor of scalar-type in {float16, bfloat16, float32}) – The first tensor to be multiplied.

  • other (2D tensor of scalar-type in {float16, bfloat16, float32}) – The second tensor to be multiplied.

  • input_precision (string. Available options for nvidia: "tf32", "tf32x3", "ieee". Default: "tf32". Available options for amd: "ieee".) – How to exercise the Tensor Cores for f32 x f32. If the device does not have Tensor Cores or the inputs are not of dtype f32, this option is ignored. For devices that do have Tensor Cores, the default precision is tf32.

  • allow_tf32 – Deprecated. If true, input_precision is set to “tf32”. Only one of input_precision and allow_tf32 can be specified (i.e. at least one must be None).