triton.experimental.gluon.language.nvidia.hopper.tma.async_load
- triton.experimental.gluon.language.nvidia.hopper.tma.async_load(tensor_desc, coord, barrier, result, pred=True, multicast=False, _semantic=None)
Asynchronously load a tile of data from global memory into shared memory using the Tensor Memory Accelerator (TMA).
- Parameters:
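tensor_desc – Tiled tensor descriptor for the source tensor in global memory
coord – Coordinates in the source tensor at which the loaded tile starts
barrier – Barrier (mbarrier) that is signaled when the load completes
result – Destination shared memory descriptor
pred – Predicate for conditional execution; the load is issued only when it is true. Defaults to True
multicast – Enable multicast of the loaded data to multiple CTAs in the cluster. Defaults to False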
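
The parameters above can be illustrated with a conceptual model in plain Python. This is only a sketch of the intended semantics, not the Gluon API and not a description of the hardware: `ToyBarrier`, `toy_async_load`, `block_shape`, and `elem_bytes` are hypothetical names introduced for illustration. The model also assumes that out-of-bounds elements are filled with zero and that a load with a false predicate leaves both the destination and the barrier untouched.

```python
# Conceptual model only: plain Python, not the Gluon API and not real TMA hardware.
# All names here (ToyBarrier, toy_async_load, block_shape, elem_bytes) are hypothetical.


class ToyBarrier:
    """Stand-in for a barrier that tracks how many bytes are still expected."""

    def __init__(self):
        self.pending_bytes = 0

    def expect(self, nbytes):
        # The consumer announces how many bytes the pending load will deliver.
        self.pending_bytes += nbytes

    def arrive_bytes(self, nbytes):
        # The load reports the bytes it has written to the destination.
        self.pending_bytes -= nbytes

    def is_complete(self):
        return self.pending_bytes == 0


def toy_async_load(source, block_shape, coord, barrier, result, pred=True, elem_bytes=4):
    """Copy a block_shape tile of `source`, starting at `coord`, into `result`."""
    if not pred:
        # Assumption: a predicated-off load copies nothing and does not signal.
        return
    rows, cols = block_shape
    r0, c0 = coord
    for i in range(rows):
        for j in range(cols):
            r, c = r0 + i, c0 + j
            in_bounds = 0 <= r < len(source) and 0 <= c < len(source[0])
            # Assumption: out-of-bounds elements are filled with zero.
            result[i][j] = source[r][c] if in_bounds else 0
    barrier.arrive_bytes(rows * cols * elem_bytes)


# A 4x4 "global" tensor whose entry at (r, c) is r * 10 + c.
source = [[r * 10 + c for c in range(4)] for r in range(4)]
block_shape = (2, 2)

result = [[None] * 2 for _ in range(2)]
barrier = ToyBarrier()
barrier.expect(2 * 2 * 4)  # 2x2 tile of 4-byte elements
toy_async_load(source, block_shape, (1, 2), barrier, result)
# result is now [[12, 13], [22, 23]] and barrier.is_complete() is True
```

In this model the consumer must check `barrier.is_complete()` before reading `result`, which mirrors the role of the `barrier` argument: the load is asynchronous, so the destination is only safe to read once the barrier reports completion.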