triton.experimental.gluon.language.amd.cdna4.mfma_scaled

triton.experimental.gluon.language.amd.cdna4.mfma_scaled(a, a_scale, a_format, b, b_scale, b_format, acc, _semantic=None)

AMD scaled MFMA (Matrix Fused Multiply-Add) operation.

`c = (a * a_scale) @ (b * b_scale) + acc`

Operands a and b use the microscaling (MX) formats described in the “OCP Microscaling Formats (MX) Specification”: https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf. This operation is currently supported only on CDNA4 hardware.
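
As a rough illustration of the formula above, the following plain-Python sketch computes the same result on nested lists. It is a reference model only and does not call the Gluon API. The name `scaled_matmul_ref`, the scale layouts (`M x K/32` for A, `N x K/32` for B), and the use of already-decoded float values are assumptions made for this example; the block size of 32 along K is the one defined by the OCP MX v1.0 specification and is assumed to apply here.

```python
# Reference sketch only: plain Python, not the Gluon API.
BLOCK = 32  # MX block size along K (OCP MX v1.0); assumed to apply here


def scaled_matmul_ref(a, a_scale, b, b_scale, acc, block=BLOCK):
    """Return (a * a_scale) @ (b * b_scale) + acc for nested lists.

    a:       M x K values, already decoded to float
    a_scale: M x (K // block) scales, already decoded to float (assumed layout)
    b:       K x N values, already decoded to float
    b_scale: N x (K // block) scales, already decoded to float (assumed layout)
    acc:     M x N accumulator
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [row[:] for row in acc]  # copy so the caller's accumulator is untouched
    for i in range(m):
        for j in range(n):
            total = 0.0
            for kk in range(k):
                g = kk // block  # index of the scale block that owns element kk
                total += (a[i][kk] * a_scale[i][g]) * (b[kk][j] * b_scale[j][g])
            out[i][j] += total
    return out
```

Each scale is shared by one block of consecutive K elements, so the scaling is applied to each operand before the products are summed, which is what the parenthesized form of the formula expresses.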

Parameters:

a (tensor) – The operand A to be multiplied.
a_scale (Optional[tensor]) – Scale factor for operand A.
a_format (str) – Format of the operand A. Available formats: e2m1, e4m3, e5m2.
b (tensor) – The operand B to be multiplied.
b_scale (Optional[tensor]) – Scale factor for operand B.
b_format (str) – Format of the operand B. Available formats: e2m1, e4m3, e5m2.
acc (tensor) – Accumulator tensor.
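
To make the format names more concrete, here is a minimal sketch that decodes single `e2m1` element codes and E8M0 shared scales, following the encodings in the OCP MX v1.0 specification. The helper names `decode_e2m1` and `decode_e8m0` are hypothetical and not part of Triton. This page does not state how elements are packed or which dtype the scale tensors must use, so treating the scales as E8M0 bytes is an assumption.

```python
import math

# Magnitudes of the eight non-negative E2M1 (FP4) codes in the OCP MX v1.0 spec.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(code):
    """Decode a 4-bit E2M1 code (0..15): bit 3 is the sign, bits 0-2 the magnitude."""
    if not 0 <= code <= 0xF:
        raise ValueError("E2M1 codes are 4 bits wide")
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(byte):
    """Decode an E8M0 shared scale: 2 ** (byte - 127), with 0xFF reserved for NaN."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("E8M0 scales are 8 bits wide")
    return math.nan if byte == 0xFF else 2.0 ** (byte - 127)
```

Under these assumptions, the value an element contributes to the product is `decode_e2m1(code) * decode_e8m0(scale_byte)`, where the scale byte is shared by the whole block that contains the element.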