triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout

class triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout(self, block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False)

Describes the layout of tensor memory on the Blackwell architecture.

Parameters:
  • block (Tuple[int, int]) – Number of contiguous elements per row and per column in a CTA.

  • col_stride (int) – Number of 32-bit columns to advance between logically adjacent columns. Packed layouts use a stride of 1. Unpacked layouts use 32 / bitwidth, where bitwidth is the element width in bits (for example, a stride of 2 for 16-bit elements).

  • cga_layout (Optional[List[List[int]]]) – CGA layout bases. Defaults to [].

  • two_ctas (bool) – Whether the layout is for two-CTA mode. Defaults to False.
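
The col_stride rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: unpacked_col_stride and make_unpacked_layout are hypothetical helper names that are not part of Triton, the block shape (128, 128) is an arbitrary example value, and the constructor call relies solely on the signature documented on this page.

```python
from typing import Tuple

# Packed layouts use a stride of 1, as stated in the parameter description.
PACKED_COL_STRIDE = 1


def unpacked_col_stride(bitwidth: int) -> int:
    """Hypothetical helper (not part of Triton): col_stride for an unpacked
    layout, following the documented rule 32 / bitwidth."""
    if bitwidth <= 0 or 32 % bitwidth != 0:
        raise ValueError(f"bitwidth must be a positive divisor of 32, got {bitwidth}")
    return 32 // bitwidth


def make_unpacked_layout(block: Tuple[int, int], bitwidth: int):
    """Hypothetical helper: build a TensorMemoryLayout for an unpacked layout.

    The import is deferred so this sketch can be loaded on machines without
    a Triton build that ships the Gluon Blackwell module.
    """
    from triton.experimental.gluon.language.nvidia.blackwell import TensorMemoryLayout

    # cga_layout and two_ctas are left at their documented defaults.
    return TensorMemoryLayout(block=block, col_stride=unpacked_col_stride(bitwidth))
```

With a suitable Triton installation, make_unpacked_layout((128, 128), 16) would request a col_stride of 2, while a packed layout would pass PACKED_COL_STRIDE directly as col_stride.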

__init__(self, block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False) -> None

Methods

__init__(self, block, col_stride, ...)

mangle(self)

Attributes

two_ctas

block

col_stride

cga_layout