triton.experimental.gluon.language.amd.cdna4.async_copy

Functions

`global_load_to_shared`(dest, ptr[, mask, ...])	AMD global load to shared operation.
`buffer_load_to_shared`(dest, ptr, offsets[, ...])	AMD buffer load to shared operation.
`commit_group`([_semantic])	Commit outstanding async operations.
`wait_group`([num_outstanding, _semantic])	Wait for outstanding commit groups.
`load_shared_relaxed`(smem, layout[, _semantic])	Load a tensor from shared memory, with extra hints that let the underlying compiler avoid emitting unnecessary waits before the load.
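
The relationship between `commit_group` and `wait_group` can be illustrated with a small host-side model. This is only a sketch of the bookkeeping, not the Gluon API and not GPU code. It assumes that `wait_group(num_outstanding)` blocks until at most `num_outstanding` committed groups remain in flight, and that groups complete in commit order. The class `AsyncCopyModel`, its `issue` method, and the default `num_outstanding=0` are hypothetical names and values chosen for illustration.

```python
from collections import deque


class AsyncCopyModel:
    """Toy model of commit/wait group bookkeeping (hypothetical, not the Triton API)."""

    def __init__(self):
        self.pending_ops = []   # async ops issued but not yet committed
        self.groups = deque()   # committed groups still in flight, oldest first
        self.completed = []     # ops whose data is assumed visible in shared memory

    def issue(self, name):
        # Stands in for an async load such as global_load_to_shared.
        self.pending_ops.append(name)

    def commit_group(self):
        # Bundle every op issued since the last commit into one group.
        self.groups.append(self.pending_ops)
        self.pending_ops = []

    def wait_group(self, num_outstanding=0):
        # Assumption: retire the oldest groups until at most
        # `num_outstanding` groups remain in flight.
        while len(self.groups) > num_outstanding:
            self.completed.extend(self.groups.popleft())


model = AsyncCopyModel()
model.issue("tile0")
model.commit_group()
model.issue("tile1")
model.commit_group()

# Allow one group to stay in flight: only the oldest group must finish.
model.wait_group(1)
print(model.completed)
```

Under these assumptions, waiting with a non-zero `num_outstanding` is what allows a software pipeline to consume one tile while the next tile's copy is still in progress.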