triton.experimental.gluon.language.nvidia.blackwell.tma.async_scatter
- triton.experimental.gluon.language.nvidia.blackwell.tma.async_scatter(tensor_desc, x_offsets, y_offset, src, _semantic=None)
Asynchronously scatter elements from shared memory to global memory using TMA.
- Parameters:
tensor_desc (tensor_descriptor) – The tensor descriptor for the destination tensor in global memory.
x_offsets (tensor) – 1D tensor of X offsets.
y_offset (int) – Scalar Y offset.
src (tensor_memory_descriptor) – The source data in shared memory; it must have an NVMMASharedLayout.
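To make the roles of the parameters concrete, here is a minimal host-side sketch of what a row scatter computes. It is not the Gluon API and does not run on the GPU. It assumes that each entry of x_offsets selects a destination row, that y_offset is the starting column shared by every row, and that row i of src is written to that location. The name scatter_rows_reference and the use of plain Python lists are hypothetical and chosen only for illustration. The real operation is asynchronous, whereas this model completes immediately.

```python
def scatter_rows_reference(dest, x_offsets, y_offset, src):
    """Synchronous model of a 2D row scatter (illustrative only, not Gluon).

    Assumption: row i of `src` is written to row `x_offsets[i]` of `dest`,
    starting at column `y_offset`.
    """
    for i, row in enumerate(x_offsets):
        width = len(src[i])
        # Overwrite `width` consecutive columns of the selected destination row.
        dest[row][y_offset:y_offset + width] = src[i]
    return dest


# A 4x6 destination of zeros stands in for the tensor in global memory.
dest = [[0] * 6 for _ in range(4)]
# Two source rows of width 2 stand in for the tile held in shared memory.
src = [[1, 2], [3, 4]]

# Send source row 0 to destination row 3 and source row 1 to destination row 0,
# both starting at column 2.
result = scatter_rows_reference(dest, [3, 0], 2, src)
```

Under these assumptions, rows 0 and 3 of the destination receive the two source rows in columns 2 and 3, and every other element is left unchanged.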