triton.experimental.gluon.language.amd.AMDMFMALayout

class triton.experimental.gluon.language.amd.AMDMFMALayout(self, version: int, instr_shape: List[int], transposed: bool, warps_per_cta: List[int], element_bitwidth: int | None = None, tiles_per_warp: List[int] | None = None, cga_layout: List[List[int]] = <factory>)

Represents a layout for AMD MFMA (matrix core) operations.

Parameters:

version (int) – The MFMA version, which selects the target GPU architecture (see the supported versions below).
instr_shape (List[int]) – The shape of the MFMA instruction, in the form (M, N, K).
transposed (bool) – Whether the result tensor is transposed so that each thread holds consecutive elements in the same row rather than the same column. This benefits chained dot operations and global writes.
warps_per_cta (List[int]) – The warp layout in the block.
element_bitwidth (Optional[int]) – Bit width of the output element type. Supported values are 32 and 64. Defaults to 32.
tiles_per_warp (Optional[List[int]]) – The tile layout within a warp. Defaults to the unit tile layout, i.e., a single tile on all dimensions.
cga_layout (Optional[List[List[int]]]) – Bases describing CTA tiling.
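As a rough illustration of how the parameters above fit together, the sketch below collects one plausible set of keyword arguments and passes them to the constructor. The concrete values (version 3, a 32x32x8 instruction shape, a 4x1 warp layout) and the names `EXAMPLE_KWARGS` and `build_mfma_layout` are assumptions chosen for this example, not recommendations from the library. The import is deferred into the helper so the snippet can be read and loaded without a Gluon-enabled Triton build.

```python
# Illustrative keyword arguments matching the documented constructor signature.
# The values are assumptions for this sketch only.
EXAMPLE_KWARGS = dict(
    version=3,                # 3 corresponds to gfx942 in the supported-versions list
    instr_shape=[32, 32, 8],  # (M, N, K)
    transposed=True,          # each thread holds consecutive elements of a row
    warps_per_cta=[4, 1],     # warp layout in the block
    # element_bitwidth, tiles_per_warp, and cga_layout are left at their defaults
)


def build_mfma_layout(**overrides):
    """Hypothetical helper: construct an AMDMFMALayout from EXAMPLE_KWARGS.

    Requires a Triton installation that ships the experimental Gluon package;
    the import is done lazily so that defining this helper has no dependency.
    """
    from triton.experimental.gluon.language.amd import AMDMFMALayout

    kwargs = {**EXAMPLE_KWARGS, **overrides}
    return AMDMFMALayout(**kwargs)
```

Calling `build_mfma_layout(version=4)` would, under the same assumptions, request the same layout for gfx950; whether a given combination is accepted depends on the checks performed by the installed Triton version.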

Currently supported versions:

1: gfx908
2: gfx90a
3: gfx942
4: gfx950
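The version table above can be turned into a small lookup when the architecture name is only known at run time. The following is a minimal sketch; `MFMA_VERSION_TO_ARCH` and `mfma_version_for_arch` are hypothetical names that are not part of Triton, and the handling of feature suffixes (as in `gfx942:sramecc+:xnack-`) is an assumption about how the architecture string may be reported.

```python
# Mapping copied from the supported-versions list in this page.
MFMA_VERSION_TO_ARCH = {
    1: "gfx908",
    2: "gfx90a",
    3: "gfx942",
    4: "gfx950",
}


def mfma_version_for_arch(arch: str) -> int:
    """Return the `version` argument that corresponds to an architecture name.

    Hypothetical helper, not a Triton API. Anything after the first ':' is
    ignored, on the assumption that it is a feature suffix.
    """
    base = arch.split(":", 1)[0].strip().lower()
    for version, name in MFMA_VERSION_TO_ARCH.items():
        if name == base:
            return version
    raise ValueError(f"no MFMA layout version listed for architecture {arch!r}")
```

The result can be passed as the `version` argument of `AMDMFMALayout`; architectures outside the list raise `ValueError` rather than silently falling back to a default.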

__init__(self, version: int, instr_shape: List[int], transposed: bool, warps_per_cta: List[int], element_bitwidth: int | None = None, tiles_per_warp: List[int] | None = None, cga_layout: List[List[int]] = <factory>) → None

Methods

`__init__`(self, version, instr_shape, ...)
`format_hardware_view`(self, shape)
`format_tensor_view`(self, shape)
`mangle`(self)
`verify`(self)

Attributes

`element_bitwidth`
`rank`
`tiles_per_warp`
`type`
`version`
`instr_shape`
`transposed`
`warps_per_cta`
`cga_layout`