triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout

class triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout(block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False, fp4_padded: bool = False)

Describes the layout of tensor memory (TMEM) in the Blackwell architecture.

Parameters:

block (Tuple[int, int]) – Number of contiguous elements per row / column in a CTA.
col_stride (int) – Number of 32-bit columns to advance between logically adjacent columns. Packed layouts use a stride of 1. Unpacked layouts use 32 / bitwidth.
cga_layout (Optional[List[List[int]]]) – CGA layout bases. Defaults to [].
two_ctas (bool) – Whether the layout is for two-CTA mode. Defaults to False.
fp4_padded (bool) – Whether byte-backed operand A uses the padded MMAv5 FP4 layout. In this layout, the descriptor keeps the packed `Mx(K/2)xi8` shape, MMAv5 treats the logical K as twice the descriptor's K, and physical TMEM reserves one byte per logical FP4 element. Defaults to False.
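The arithmetic in the `col_stride` and `fp4_padded` descriptions can be restated as two small helpers. `tmem_col_stride` and `fp4_padded_sizes` are hypothetical names, not part of Triton; they only encode the rules stated in the parameter list (packed stride 1, unpacked stride 32 / bitwidth; descriptor K is half the logical K, and TMEM holds one byte per logical FP4 element), so treat them as a sketch for checking numbers by hand.

```python
from typing import Tuple


def tmem_col_stride(bitwidth: int, packed: bool) -> int:
    """Hypothetical helper: col_stride is 1 if packed, else 32 // bitwidth."""
    if bitwidth <= 0 or 32 % bitwidth != 0:
        raise ValueError(f"bitwidth must divide 32, got {bitwidth}")
    return 1 if packed else 32 // bitwidth


def fp4_padded_sizes(m: int, logical_k: int) -> Tuple[Tuple[int, int], int, int]:
    """Hypothetical helper: shape bookkeeping for an fp4_padded operand A."""
    if logical_k % 2 != 0:
        raise ValueError("logical K must be even: two FP4 values share one byte")
    desc_shape = (m, logical_k // 2)  # packed Mx(K/2) i8 descriptor shape
    mma_k = 2 * desc_shape[1]         # MMAv5 reads logical K as 2x descriptor K
    tmem_bytes = m * logical_k        # one physical TMEM byte per FP4 element
    return desc_shape, mma_k, tmem_bytes


print(tmem_col_stride(16, packed=False))  # 2
print(fp4_padded_sizes(128, 64))          # ((128, 32), 64, 8192)
```

For comparison, a densely packed FP4 tile of the same logical shape would need only half as many bytes (`m * logical_k // 2`), which is the space the padded layout trades away.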

__init__(self, block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False, fp4_padded: bool = False) → None
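The `<factory>` default in the signature is how Python renders a dataclass field declared with `default_factory`. The class below, `TmemLayoutSketch`, is a hypothetical stand-in that only mirrors the documented fields and defaults; it is not the Triton class and omits any validation the real one may perform. It illustrates that omitting `cga_layout` gives each instance its own fresh empty list rather than a shared one.

```python
import inspect
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical stand-in mirroring the documented fields; NOT the Triton class.
@dataclass
class TmemLayoutSketch:
    block: Tuple[int, int]
    col_stride: int
    # Rendered as "= <factory>": each instance gets a newly created list.
    cga_layout: List[List[int]] = field(default_factory=list)
    two_ctas: bool = False
    fp4_padded: bool = False


a = TmemLayoutSketch(block=(128, 64), col_stride=1)
b = TmemLayoutSketch(block=(128, 64), col_stride=1)
print(a.cga_layout, a.cga_layout is b.cga_layout)  # [] False
print("<factory>" in str(inspect.signature(TmemLayoutSketch)))  # True
```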

Methods

`__init__`(self, block, col_stride, ...)
`mangle`(self)

Attributes

`fp4_padded`
`two_ctas`
`block`
`col_stride`
`cga_layout`