triton.experimental.gluon.language.amd.AMDMFMALayout
- class triton.experimental.gluon.language.amd.AMDMFMALayout(self, version: int, instr_shape: ~typing.List[int], transposed: bool, warps_per_cta: ~typing.List[int], element_bitwidth: int | None = None, tiles_per_warp: ~typing.List[int] | None = None, cga_layout: ~typing.List[~typing.List[int]] = <factory>)
Represents a layout for AMD MFMA (matrix core) operations.
- Parameters:
version (int) – The MFMA version, which identifies the target GPU architecture (see the supported versions below).
instr_shape (List[int]) – The shape of a single MFMA instruction, in the form (M, N, K).
transposed (bool) – Whether the result tensor is transposed so that each thread holds consecutive elements along a row instead of a column, which benefits chained dot operations and global writes.
warps_per_cta (List[int]) – The number of warps along each dimension of the CTA.
element_bitwidth (Optional[int]) – Bit width of the output element type. Supported values are 32 and 64. Defaults to 32.
tiles_per_warp (Optional[List[int]]) – The tile layout within a warp. Defaults to a unit tile layout, i.e., a single tile along every dimension.
cga_layout (Optional[List[List[int]]]) – Bases describing CTA tiling.
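The defaults and constraints stated for the optional parameters can be summarized in plain Python. The sketch below is a hypothetical helper, `normalize_mfma_params`, not part of Triton and not the class's own `verify()` logic; it also assumes the layout rank equals `len(warps_per_cta)`.

```python
from typing import List, Optional, Tuple


# Hypothetical helper (not a Triton API): applies the documented defaults
# and value constraints for AMDMFMALayout's optional arguments.
def normalize_mfma_params(
    instr_shape: List[int],
    warps_per_cta: List[int],
    element_bitwidth: Optional[int] = None,
    tiles_per_warp: Optional[List[int]] = None,
) -> Tuple[int, List[int]]:
    # Assumption: the layout rank is the number of warps_per_cta entries.
    rank = len(warps_per_cta)
    if len(instr_shape) != 3:
        raise ValueError("instr_shape must have the form (M, N, K)")
    # Documented default: 32-bit output elements; only 32 and 64 are supported.
    if element_bitwidth is None:
        element_bitwidth = 32
    if element_bitwidth not in (32, 64):
        raise ValueError("element_bitwidth must be 32 or 64")
    # Documented default: a single tile along every dimension.
    if tiles_per_warp is None:
        tiles_per_warp = [1] * rank
    if len(tiles_per_warp) != rank:
        raise ValueError("tiles_per_warp needs one entry per dimension")
    return element_bitwidth, tiles_per_warp


bitwidth, tiles = normalize_mfma_params([32, 32, 8], [4, 1])
print(bitwidth, tiles)  # 32 [1, 1]
```

Leaving both optional arguments unset yields 32-bit elements and one tile per dimension, matching the defaults described above.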
Currently supported versions:
1: gfx908
2: gfx90a
3: gfx942
4: gfx950
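Because `version` is an integer tied to the target architecture, callers typically translate a gfx target name into it. The table above can be expressed as a lookup; `MFMA_VERSION_BY_ARCH` and `mfma_version_for_arch` are hypothetical names for illustration, not functions exported by Triton.

```python
# Hypothetical lookup built from the supported-versions list; not a Triton API.
MFMA_VERSION_BY_ARCH = {
    "gfx908": 1,
    "gfx90a": 2,
    "gfx942": 3,
    "gfx950": 4,
}


def mfma_version_for_arch(arch: str) -> int:
    """Return the AMDMFMALayout `version` for a gfx target name."""
    # AMD target IDs may carry feature suffixes, e.g. "gfx90a:sramecc+:xnack-";
    # keep only the base processor name before the first colon.
    base = arch.split(":", 1)[0]
    try:
        return MFMA_VERSION_BY_ARCH[base]
    except KeyError:
        raise ValueError(f"no MFMA layout version listed for {arch!r}") from None


print(mfma_version_for_arch("gfx942"))  # 3
```

Unlisted architectures raise an error rather than falling back to a guessed version.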
- __init__(self, version: int, instr_shape: ~typing.List[int], transposed: bool, warps_per_cta: ~typing.List[int], element_bitwidth: int | None = None, tiles_per_warp: ~typing.List[int] | None = None, cga_layout: ~typing.List[~typing.List[int]] = <factory>) → None
Methods
__init__(self, version, instr_shape, ...)
format_hardware_view(self, shape)
format_tensor_view(self, shape)
mangle(self)
verify(self)
Attributes
element_bitwidth
rank
tiles_per_warp
type
version
instr_shape
transposed
warps_per_cta
cga_layout