triton.experimental.gluon.language.nvidia.blackwell.tma.async_copy_global_to_shared

triton.experimental.gluon.language.nvidia.blackwell.tma.async_copy_global_to_shared(tensor_desc, coord, barrier, result, pred=True, multicast=False, _semantic=None)

Asynchronously copy a tile from global memory to shared memory using the Tensor Memory Accelerator (TMA).

Parameters:
  • tensor_desc – Tensor descriptor (tiled)

  • coord – Coordinates in the source tensor at which the tile starts, one per dimension

  • barrier – Shared-memory barrier (mbarrier) that is signaled when the copy completes

  • result – Destination shared memory descriptor

  • pred – Predicate for conditional execution (defaults to True)

  • multicast – Enable multicast (defaults to False)
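
As a rough illustration of what the call does conceptually, the following plain-Python sketch models the copy on the CPU. It is not Gluon code and does not use the Triton API: `tile_copy_model`, its `block_shape` argument, the list-of-lists buffers, and the returned element count are all hypothetical stand-ins. It assumes out-of-bounds elements are zero-filled, which is the usual TMA default but may depend on how the descriptor was created. The asynchronous completion signaled through `barrier` and the `multicast` option are not modeled.

```python
def tile_copy_model(src, coord, block_shape, dst, pred=True):
    """Hypothetical CPU model of a tiled global-to-shared copy.

    src         -- 2D list standing in for the global tensor
    coord       -- (row, col) where the tile starts in src
    block_shape -- (rows, cols) of the tile (assumed to come from the descriptor)
    dst         -- 2D list standing in for the shared-memory destination
    pred        -- when False, the copy is skipped entirely

    Returns the number of elements written to dst.
    """
    if not pred:
        return 0  # predicated off: dst is left untouched

    row0, col0 = coord
    rows, cols = block_shape
    for i in range(rows):
        for j in range(cols):
            r, c = row0 + i, col0 + j
            in_bounds = 0 <= r < len(src) and 0 <= c < len(src[0])
            # Assumption: out-of-bounds elements are filled with zeros.
            dst[i][j] = src[r][c] if in_bounds else 0
    return rows * cols


# A 4x4 "global" tensor holding 0..15 in row-major order.
src = [[r * 4 + c for c in range(4)] for r in range(4)]

# Copy a 2x2 tile starting at the last element; three of the four
# tile elements fall outside the tensor.
dst = [[-1, -1], [-1, -1]]
copied = tile_copy_model(src, (3, 3), (2, 2), dst)
print(dst, copied)
```

In the real API the tile shape comes from `tensor_desc` rather than a separate argument, and the data is only safe to read from `result` after waiting on `barrier`.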