triton.experimental.gluon.language.nvidia.hopper.tma.async_load
- triton.experimental.gluon.language.nvidia.hopper.tma.async_load(tensor_desc, coord, barrier, result, pred=True, multicast=False, _semantic=None)
Load data from global memory to shared memory using TMA.
- Parameters:
tensor_desc – Tensor descriptor (tiled)
coord – Coordinates in the source tensor
barrier – Barrier for synchronization. In a two-CTA kernel, use a two-CTA barrier when this TMA load feeds a tcgen05 op; otherwise use a barrier allocated with two_ctas=False.
result – Destination memory descriptor
pred – Predicate for conditional execution
multicast – Enable multicast
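The roles of coord, result, and pred can be illustrated with a small host-side model in plain Python. This is only a conceptual sketch and not the Gluon API: the name model_async_load and the block_shape argument are hypothetical, the copy runs synchronously so no barrier is involved, and zero-filling of out-of-bounds elements is an assumption made for the illustration.

```python
def model_async_load(source, block_shape, coord, result, pred=True):
    """Hypothetical host-side model of a tiled load (not the Gluon API).

    Copies one block of `source` (a 2D list), whose top-left corner is at
    `coord`, into `result` (a 2D list with shape `block_shape`).
    Returns True if the copy was performed, False if it was predicated off.
    """
    if not pred:
        # A false predicate turns the operation into a no-op.
        return False

    rows, cols = block_shape
    row0, col0 = coord
    n_rows, n_cols = len(source), len(source[0])

    for i in range(rows):
        for j in range(cols):
            r, c = row0 + i, col0 + j
            in_bounds = 0 <= r < n_rows and 0 <= c < n_cols
            # Assumption for this sketch: out-of-bounds elements read as zero.
            result[i][j] = source[r][c] if in_bounds else 0
    return True


if __name__ == "__main__":
    # A 4x6 "global" tensor whose entries encode their own position.
    source = [[10 * r + c for c in range(6)] for r in range(4)]
    # A 2x3 destination buffer standing in for shared memory.
    result = [[-1] * 3 for _ in range(2)]

    model_async_load(source, (2, 3), (1, 2), result)
    print(result)
```

In the real operation the copy is asynchronous and completion is signalled through barrier, so a kernel would wait on that barrier before reading result; the model leaves that step out.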