triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout
- class triton.experimental.gluon.language.nvidia.blackwell.TensorMemoryLayout(self, block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False)
Describes the layout of tensor memory on the Blackwell architecture.
- Parameters:
block (Tuple[int, int]) – Number of contiguous elements per row / column in a CTA.
col_stride (int) – Number of 32-bit columns to advance between logically adjacent columns. Packed layouts use a stride of 1. Unpacked layouts use 32 / bitwidth.
cga_layout (Optional[List[List[int]]]) – CGA layout bases. Defaults to [].
two_ctas (bool) – Whether the layout is for two-CTA mode. Defaults to False.
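The col_stride rule above can be checked with a little arithmetic. The sketch below does not import Triton; `unpacked_col_stride` and `PACKED_COL_STRIDE` are hypothetical names introduced only for illustration, and the assumption that the element bit width must divide 32 evenly is ours, not something the reference states.

```python
# Hypothetical helper, not part of the Gluon API: it only restates the
# documented rule "unpacked layouts use 32 / bitwidth".
PACKED_COL_STRIDE = 1  # documented stride for packed layouts


def unpacked_col_stride(bitwidth: int) -> int:
    """Return the number of 32-bit columns per element in an unpacked layout."""
    # Assumption: the element width divides the 32-bit column width evenly.
    if bitwidth <= 0 or 32 % bitwidth != 0:
        raise ValueError(f"bitwidth must be a positive divisor of 32, got {bitwidth}")
    return 32 // bitwidth


if __name__ == "__main__":
    for bitwidth in (8, 16, 32):
        print(f"{bitwidth}-bit elements: unpacked col_stride = {unpacked_col_stride(bitwidth)}")
```

Under this rule a 32-bit element type has a stride of 1 whether packed or unpacked, so the distinction only matters for narrower types. The resulting value is what would be supplied as the `col_stride` argument alongside `block`.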
- __init__(self, block: Tuple[int, int], col_stride: int, cga_layout: List[List[int]] = <factory>, two_ctas: bool = False) → None
Methods
__init__(self, block, col_stride, ...)
mangle(self)
Attributes
two_ctas
block
col_stride
cga_layout