triton.experimental.gluon.language.nvidia.hopper.tma.async_copy_global_to_shared_im2col
- triton.experimental.gluon.language.nvidia.hopper.tma.async_copy_global_to_shared_im2col(tensor_desc, coord, offsets, barrier, result, pred=True, multicast=False, _semantic=None)
Load data from global memory to shared memory using TMA in im2col mode.
- Parameters:
tensor_desc – Tensor descriptor (im2col)
coord – Coordinates in the source tensor
offsets – Im2col offsets (must be i16 values):
  - For 3D tensors: 1 offset
  - For 4D tensors: 2 offsets
  - For 5D tensors: 3 offsets
barrier – Barrier for synchronization
result – Destination memory descriptor
pred – Predicate for conditional execution (default: True)
multicast – Enable multicast (default: False)
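The rule for offsets listed above (one offset for 3D tensors, two for 4D, three for 5D, each an i16 value) can be checked on the host before launching a kernel. The following is a minimal sketch, not part of Gluon: check_im2col_offsets is a hypothetical helper, and it assumes the offsets are available as plain Python integers (for example, compile-time constants) and that the count is always the tensor rank minus 2, as the three listed cases suggest.

```python
# Hypothetical host-side helper; NOT part of the Gluon API.
# Assumption: offsets are plain Python ints known before kernel launch.

I16_MIN = -(2 ** 15)      # smallest signed 16-bit value
I16_MAX = 2 ** 15 - 1     # largest signed 16-bit value


def check_im2col_offsets(tensor_rank, offsets):
    """Validate im2col offsets against the documented rule.

    3D tensors take 1 offset, 4D take 2, 5D take 3 (rank - 2),
    and every offset must fit in a signed 16-bit integer.
    Returns the offsets as a tuple, or raises ValueError.
    """
    if tensor_rank not in (3, 4, 5):
        raise ValueError(f"im2col expects a 3D, 4D, or 5D tensor, got rank {tensor_rank}")

    offsets = tuple(offsets)
    expected = tensor_rank - 2
    if len(offsets) != expected:
        raise ValueError(
            f"rank-{tensor_rank} tensor needs {expected} offset(s), got {len(offsets)}"
        )

    for off in offsets:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(off, bool) or not isinstance(off, int):
            raise ValueError(f"offset {off!r} is not an integer")
        if not I16_MIN <= off <= I16_MAX:
            raise ValueError(f"offset {off} does not fit in i16")

    return offsets
```

A caller might run check_im2col_offsets(4, [0, 1]) for a 4D tensor and then supply the validated values as the offsets argument of async_copy_global_to_shared_im2col. How those values are materialized as i16 inside the kernel is not covered by this sketch.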