triton.experimental.gluon.language.nvidia.ampere.async_copy

Functions

async_copy_global_to_shared

Asynchronously load elements from global memory to shared memory.

async_load

Asynchronously load elements from global memory to shared memory.

mbarrier_arrive

Arrive on the mbarrier once all outstanding async copies are complete.

commit_group

Commit the current asynchronous copy group.

wait_group

Wait for outstanding committed asynchronous copy groups to complete.
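
The relationship between issuing copies, commit_group, and wait_group can be illustrated with a small host-side model. This is a sketch in plain Python, not Gluon kernel code: the class `AsyncCopyGroupModel`, its method `issue_copy`, and the parameter name `num_outstanding` are hypothetical names chosen for illustration, and the bookkeeping assumes the usual Ampere `cp.async` group semantics (a commit closes every copy issued since the previous commit into one group, and a wait returns once at most a given number of the most recent groups are still in flight).

```python
from collections import deque


class AsyncCopyGroupModel:
    """Toy model of async-copy group bookkeeping (illustrative only)."""

    def __init__(self):
        self.uncommitted = []      # copies issued since the last commit
        self.in_flight = deque()   # committed groups, oldest first
        self.completed = []        # copies guaranteed to have landed

    def issue_copy(self, tag):
        # Stands in for one asynchronous global-to-shared copy.
        self.uncommitted.append(tag)

    def commit_group(self):
        # Close all copies issued so far into a single group.
        self.in_flight.append(self.uncommitted)
        self.uncommitted = []

    def wait_group(self, num_outstanding=0):
        # Retire the oldest groups until at most `num_outstanding` remain.
        while len(self.in_flight) > num_outstanding:
            self.completed.extend(self.in_flight.popleft())


# Two pipeline stages, each copying an "a" tile and a "b" tile.
model = AsyncCopyGroupModel()
model.issue_copy("a0")
model.issue_copy("b0")
model.commit_group()
model.issue_copy("a1")
model.issue_copy("b1")
model.commit_group()

model.wait_group(1)                       # stage 0 is ready, stage 1 may still be in flight
after_partial_wait = list(model.completed)

model.wait_group(0)                       # everything committed so far is ready
after_full_wait = list(model.completed)
```

Waiting with one group left outstanding is the pattern that lets a software pipeline consume the oldest stage while the copies for the next stage are still in progress.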