triton.experimental.gluon.language.amd.AMDMFMALayout

class triton.experimental.gluon.language.amd.AMDMFMALayout(self, version: int, instr_shape: ~typing.List[int], transposed: bool, warps_per_cta: ~typing.List[int], element_bitwidth: int | None = None, tiles_per_warp: ~typing.List[int] | None = None, cga_layout: ~typing.List[~typing.List[int]] = <factory>)

Represents a layout for AMD MFMA (matrix core) operations.

Parameters:
  • version (int) – The MFMA version, which identifies the target GPU architecture.

  • instr_shape (List[int]) – The shape of the MFMA instruction, in the form (M, N, K).

  • transposed (bool) – Whether the result tensor is transposed so that each thread holds consecutive elements of the same row rather than the same column, which benefits chained dot operations and global writes.

  • warps_per_cta (List[int]) – The warp layout in the block.

  • element_bitwidth (Optional[int]) – Bit width of the output element type. Supported values are 32 and 64. Defaults to 32.

  • tiles_per_warp (Optional[List[int]]) – The tile layout within a warp. Defaults to the unit tile layout, i.e., a single tile in every dimension.

  • cga_layout (Optional[List[List[int]]]) – Bases describing CTA tiling.
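The parameter descriptions above can be gathered into a small argument builder. This is only a sketch: `mfma_layout_kwargs` is a hypothetical helper, not part of Gluon, and its checks encode nothing beyond what the descriptions state (a three-entry `instr_shape`, a bit width of 32 or 64, unit tiles by default). It also assumes that the number of dimensions equals `len(warps_per_cta)`, which this page does not state.

```python
from typing import Any, Dict, List, Optional


def mfma_layout_kwargs(
    version: int,
    instr_shape: List[int],
    transposed: bool,
    warps_per_cta: List[int],
    element_bitwidth: Optional[int] = None,
    tiles_per_warp: Optional[List[int]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of Gluon): normalize AMDMFMALayout arguments."""
    if len(instr_shape) != 3:
        raise ValueError("instr_shape must have three entries: (M, N, K)")
    if element_bitwidth is None:
        element_bitwidth = 32  # documented default
    if element_bitwidth not in (32, 64):
        raise ValueError("element_bitwidth must be 32 or 64")
    if tiles_per_warp is None:
        # Documented default: a single tile in every dimension.
        # Assumption: the dimension count equals len(warps_per_cta).
        tiles_per_warp = [1] * len(warps_per_cta)
    return {
        "version": version,
        "instr_shape": list(instr_shape),
        "transposed": transposed,
        "warps_per_cta": list(warps_per_cta),
        "element_bitwidth": element_bitwidth,
        "tiles_per_warp": list(tiles_per_warp),
    }


# Example values are illustrative only.
kwargs = mfma_layout_kwargs(
    version=3,
    instr_shape=[32, 32, 8],
    transposed=True,
    warps_per_cta=[4, 1],
)
print(kwargs)
```

With Triton installed, the resulting dictionary could be passed on as `AMDMFMALayout(**kwargs)`. Whether a particular combination of values is accepted depends on the checks in `verify`, which this page does not describe.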

Currently supported versions:

  • 1: gfx908

  • 2: gfx90a

  • 3: gfx942

  • 4: gfx950
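The version list above amounts to a lookup table. A minimal sketch of mapping an architecture name back to a `version` value follows; `MFMA_VERSION_TO_ARCH` and `mfma_version_for_arch` are hypothetical names introduced here, not Gluon APIs. Stripping a feature suffix such as `:sramecc+:xnack-` is an assumption about how the architecture string might be reported.

```python
# Table copied from the list of supported versions above.
MFMA_VERSION_TO_ARCH = {
    1: "gfx908",
    2: "gfx90a",
    3: "gfx942",
    4: "gfx950",
}


def mfma_version_for_arch(arch: str) -> int:
    """Hypothetical helper: return the MFMA version for an architecture name."""
    # Assumption: feature flags may follow the name after a colon,
    # e.g. "gfx90a:sramecc+:xnack-"; only the part before it is compared.
    base = arch.split(":", 1)[0].strip().lower()
    for version, name in MFMA_VERSION_TO_ARCH.items():
        if name == base:
            return version
    raise ValueError(f"no MFMA version listed for architecture {arch!r}")


print(mfma_version_for_arch("gfx942"))
```

Raising on an unlisted architecture, rather than falling back to a default, keeps a mismatched `version` from reaching the layout unnoticed.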

__init__(self, version: int, instr_shape: ~typing.List[int], transposed: bool, warps_per_cta: ~typing.List[int], element_bitwidth: int | None = None, tiles_per_warp: ~typing.List[int] | None = None, cga_layout: ~typing.List[~typing.List[int]] = <factory>) → None

Methods

__init__(self, version, instr_shape, ...)

format_hardware_view(self, shape)

format_tensor_view(self, shape)

mangle(self)

verify(self)

Attributes

element_bitwidth

rank

tiles_per_warp

type

version

instr_shape

transposed

warps_per_cta

cga_layout