triton.experimental.gluon.language.nvidia.hopper.tma
Functions
Atomically add data from shared memory into global memory using TMA.
Atomically bitwise-and data from shared memory into global memory using TMA.
Atomically compute the maximum of shared memory data and global memory using TMA.
Atomically compute the minimum of shared memory data and global memory using TMA.
Atomically bitwise-or data from shared memory into global memory using TMA.
Atomically bitwise-xor data from shared memory into global memory using TMA.
Load data from global memory to shared memory using TMA.
Load data from global memory to shared memory using TMA in im2col mode.
Store data from shared memory to global memory using TMA.
Load data from global memory to shared memory using TMA.
Load data from global memory to shared memory using TMA in im2col mode.
Store data from shared memory to global memory using TMA.
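The function summaries above all describe moving a rectangular tile between shared and global memory, optionally combining it with what is already in global memory. As a rough mental model only, the following host-side Python sketch mimics those semantics on nested lists. Every name in it (`tile_load`, `tile_store`, `REDUCE_OPS`) is hypothetical and not part of the Gluon API. It also assumes the usual TMA convention that out-of-bounds elements are zero-filled on load and skipped on store, and it does not model asynchrony, barriers, or atomicity.

```python
import operator

# Hypothetical host-side model of tiled TMA semantics.
# None of these names exist in Gluon; they are for illustration only.
REDUCE_OPS = {
    "add": operator.add,
    "and": operator.and_,
    "or": operator.or_,
    "xor": operator.xor,
    "min": min,
    "max": max,
}


def tile_load(global_mem, row0, col0, tile_rows, tile_cols, fill=0):
    """Copy a tile starting at (row0, col0) out of global_mem.

    Elements that fall outside global_mem are replaced by `fill`
    (assumed zero-fill behaviour for out-of-bounds loads).
    """
    rows, cols = len(global_mem), len(global_mem[0])
    return [
        [
            global_mem[r][c] if 0 <= r < rows and 0 <= c < cols else fill
            for c in range(col0, col0 + tile_cols)
        ]
        for r in range(row0, row0 + tile_rows)
    ]


def tile_store(global_mem, row0, col0, tile, op=None):
    """Write `tile` into global_mem at (row0, col0), in place.

    Out-of-bounds elements are skipped. If `op` names an entry in
    REDUCE_OPS, each element is combined with the existing global value
    instead of overwriting it (a sequential stand-in for the atomic variants).
    """
    rows, cols = len(global_mem), len(global_mem[0])
    for i, tile_row in enumerate(tile):
        for j, value in enumerate(tile_row):
            r, c = row0 + i, col0 + j
            if not (0 <= r < rows and 0 <= c < cols):
                continue  # clipped: nothing is written outside the tensor
            if op is None:
                global_mem[r][c] = value
            else:
                global_mem[r][c] = REDUCE_OPS[op](global_mem[r][c], value)


global_mem = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# A 2x2 tile hanging off the bottom-right corner: only one element is in bounds.
corner = tile_load(global_mem, 2, 2, 2, 2)

# Accumulate a 2x2 tile of tens into the top-left corner.
tile_store(global_mem, 0, 0, [[10, 10], [10, 10]], op="add")
```

In this model the plain store and the six reductions differ only in the `op` argument, which mirrors how the summaries above differ only in the combining operation. The real functions operate on tensor descriptors and shared-memory buffers rather than Python lists.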
Classes
Type for tiled tensor descriptors.
Type for im2col tensor descriptors (convolution-friendly access patterns).