triton.experimental.gluon.language.nvidia.blackwell.tma
Functions
Asynchronously gather elements from global memory to shared memory using TMA.
Asynchronously scatter elements from shared memory to global memory using TMA.
Atomically add data from shared memory into global memory using TMA.
Atomically bitwise-and data from shared memory into global memory using TMA.
Atomically compute the maximum of shared memory data and global memory using TMA.
Atomically compute the minimum of shared memory data and global memory using TMA.
Atomically bitwise-or data from shared memory into global memory using TMA.
Atomically bitwise-xor data from shared memory into global memory using TMA.
Load data from global memory to shared memory using TMA.
Store data from shared memory to global memory using TMA.
Load data from global memory to shared memory using TMA.
Load data from global memory to shared memory using TMA in im2col mode.
Store data from shared memory to global memory using TMA.
Classes
Type for tiled tensor descriptors.