triton.experimental.gluon.language.nvidia.hopper.tma

Functions

async_atomic_add

Atomically add data from shared memory into global memory using the Tensor Memory Accelerator (TMA).

async_atomic_and

Atomically bitwise-and data from shared memory into global memory using TMA.

async_atomic_max

Atomically compute the element-wise maximum of data in shared memory and data in global memory using TMA, writing the result to global memory.

async_atomic_min

Atomically compute the element-wise minimum of data in shared memory and data in global memory using TMA, writing the result to global memory.

async_atomic_or

Atomically bitwise-or data from shared memory into global memory using TMA.

async_atomic_xor

Atomically bitwise-xor data from shared memory into global memory using TMA.
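The async_atomic_* entries above all follow the same pattern: a tile held in shared memory is combined element-wise with a region of global memory, and the result lands in global memory. The sketch below is a synchronous, host-side model of that pattern in plain Python. It is not Gluon code. The names reduce_tile_into_global and REDUCE_OPS are hypothetical, and skipping out-of-bounds elements is an assumption of the model rather than something this page states.

```python
import operator

# Hypothetical host-side model; none of these names are Gluon APIs.
REDUCE_OPS = {
    "add": operator.add,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "max": max,
    "min": min,
}


def reduce_tile_into_global(global_mem, tile, row0, col0, op):
    """Combine `tile` into `global_mem` in place, starting at (row0, col0)."""
    combine = REDUCE_OPS[op]
    n_rows, n_cols = len(global_mem), len(global_mem[0])
    for i, tile_row in enumerate(tile):
        for j, value in enumerate(tile_row):
            r, c = row0 + i, col0 + j
            # Assumption of this model: elements outside the tensor are skipped.
            if 0 <= r < n_rows and 0 <= c < n_cols:
                global_mem[r][c] = combine(global_mem[r][c], value)
    return global_mem
```

For example, reducing a 2x2 tile of tens with "add" at offset (0, 1) into [[1, 2, 3], [4, 5, 6]] leaves the first column untouched and adds 10 to the other four elements. The real operations are asynchronous and atomic with respect to other writers, which this sequential model does not capture.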

async_copy_global_to_shared

Load data from global memory to shared memory using TMA.
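To make the data movement concrete, here is a small host-side model of a tiled load in plain Python. It is only an illustration, not the Gluon API: load_tile and its parameters are hypothetical names, the copy is synchronous, and filling out-of-bounds elements with zero is an assumption of the sketch.

```python
def load_tile(global_mem, row0, col0, block_rows, block_cols, fill=0):
    """Copy a block_rows x block_cols tile starting at (row0, col0).

    Hypothetical model, not a Gluon function. Elements that fall outside
    `global_mem` are replaced by `fill` (an assumption of this sketch).
    """
    n_rows, n_cols = len(global_mem), len(global_mem[0])
    tile = []
    for i in range(block_rows):
        row = []
        for j in range(block_cols):
            r, c = row0 + i, col0 + j
            in_bounds = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(global_mem[r][c] if in_bounds else fill)
        tile.append(row)
    return tile
```

The returned nested list plays the role of the shared-memory buffer. On hardware the copy is issued asynchronously, so a kernel has to wait for completion before reading the tile.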

async_copy_global_to_shared_im2col

Load data from global memory to shared memory using TMA in im2col mode.
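For readers unfamiliar with the term, im2col rearranges the sliding windows of an image into the rows of a matrix so that a convolution becomes a matrix product. The sketch below shows the generic transformation for a single-channel 2D image in plain Python. It does not reproduce the addressing scheme used by the hardware or by this module, and im2col here is a hypothetical helper name.

```python
def im2col(image, kernel_h, kernel_w):
    """Return one row per window position, each holding the flattened window.

    Generic illustration (stride 1, no padding); not the TMA addressing scheme.
    """
    height, width = len(image), len(image[0])
    out_h, out_w = height - kernel_h + 1, width - kernel_w + 1
    rows = []
    for r in range(out_h):
        for c in range(out_w):
            window = [
                image[r + dr][c + dc]
                for dr in range(kernel_h)
                for dc in range(kernel_w)
            ]
            rows.append(window)
    return rows


def convolve_via_im2col(image, kernel):
    """Apply `kernel` (cross-correlation form) as a dot product per im2col row."""
    flat_kernel = [w for kernel_row in kernel for w in kernel_row]
    rows = im2col(image, len(kernel), len(kernel[0]))
    return [sum(x * w for x, w in zip(row, flat_kernel)) for row in rows]
```

Each output element is a dot product of one im2col row with the flattened kernel, which is why loading data in this layout suits matrix-multiply hardware.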

async_copy_shared_to_global

Store data from shared memory to global memory using TMA.
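The store direction can be modelled the same way. The following plain-Python sketch writes a tile back into a larger array; store_tile is a hypothetical name rather than a Gluon function, and dropping elements that fall outside the destination is an assumption of the model.

```python
def store_tile(global_mem, tile, row0, col0):
    """Overwrite a region of `global_mem` with `tile`, starting at (row0, col0).

    Hypothetical model, not a Gluon function. Elements of `tile` that would
    land outside `global_mem` are dropped (an assumption of this sketch).
    """
    n_rows, n_cols = len(global_mem), len(global_mem[0])
    for i, tile_row in enumerate(tile):
        for j, value in enumerate(tile_row):
            r, c = row0 + i, col0 + j
            if 0 <= r < n_rows and 0 <= c < n_cols:
                global_mem[r][c] = value
    return global_mem
```

Unlike the atomic variants, a plain store overwrites the destination instead of combining with it. Because the real store is asynchronous, the source buffer should not be reused until the store has completed.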

async_load

Load data from global memory to shared memory using TMA.

async_load_im2col

Load data from global memory to shared memory using TMA in im2col mode.

async_store

Store data from shared memory to global memory using TMA.

store_wait

make_tensor_descriptor

Classes

tensor_descriptor

A tiled tensor descriptor (see tensor_descriptor_type).

tensor_descriptor_im2col

An im2col tensor descriptor (see tensor_descriptor_im2col_type).

tensor_descriptor_type

Type for tiled tensor descriptors.

tensor_descriptor_im2col_type

Type for im2col tensor descriptors (convolution-friendly access patterns).