triton.experimental.gluon.language.amd.cdna4.async_copy

Functions

global_load_to_shared(dest, ptr[, mask, ...])

AMD global load to shared operation.

buffer_load_to_shared(dest, ptr, offsets[, ...])

AMD buffer load to shared operation.

commit_group([_semantic])

Commit outstanding async operations.

wait_group([num_outstanding, _semantic])

Wait for outstanding commit groups.

load_shared_relaxed(smem, layout[, _semantic])

Load a tensor from shared memory with extra hints for the underlying compiler to avoid emitting unnecessary waits before loading from the target shared memory.
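
The relationship between commit_group and wait_group can be illustrated with a small host-side model. This is a toy sketch in plain Python, not the Gluon API: the class `AsyncCopyModel` and its `issue_copy` method are hypothetical names introduced for illustration. It also assumes that `wait_group(num_outstanding)` blocks until at most `num_outstanding` committed groups remain in flight, that groups complete oldest first, and that `num_outstanding` defaults to 0. These assumptions follow the usual commit-group convention and are not confirmed by the summaries above.

```python
from collections import deque


class AsyncCopyModel:
    """Toy host-side model of commit-group bookkeeping (NOT the Gluon API)."""

    def __init__(self):
        self.uncommitted = []   # copies issued since the last commit
        self.pending = deque()  # committed groups still in flight, oldest first
        self.completed = []     # copies that have landed, in completion order

    def issue_copy(self, name):
        # Hypothetical stand-in for an async load into shared memory.
        self.uncommitted.append(name)

    def commit_group(self):
        # Bundle every copy issued so far into one group.
        self.pending.append(self.uncommitted)
        self.uncommitted = []

    def wait_group(self, num_outstanding=0):
        # Assumption: retire the oldest groups until at most
        # `num_outstanding` groups are still pending.
        while len(self.pending) > num_outstanding:
            self.completed.extend(self.pending.popleft())


model = AsyncCopyModel()
model.issue_copy("tile0")
model.commit_group()
model.issue_copy("tile1")
model.commit_group()

model.wait_group(1)   # allow one group to stay in flight
first_wait = list(model.completed)

model.wait_group(0)   # drain everything
print(first_wait, model.completed)
```

Under these assumptions, waiting with a nonzero `num_outstanding` lets the most recently committed copies stay in flight while older data is consumed, which is the usual basis for software pipelining.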