triton.experimental.gluon.language.amd.cdna4.async_copy
Functions
|
AMD global load to shared operation. |
|
AMD buffer load to shared operation. |
|
Commit outstanding async operations. |
|
Wait for outstanding commit groups. |
|
Load a tensor from shared memory with extra hints for the underlying compiler to avoid emitting unnecessary waits before loading from the target shared memory. |
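
The commit and wait entries above describe a group-based synchronization pattern. The following plain-Python sketch is only a conceptual model of that bookkeeping. It is not the Gluon API and does not run on a GPU. The class `AsyncCopyQueue`, its method names, and the `num_outstanding` parameter are hypothetical names chosen for illustration. The sketch assumes that waiting means "block until at most N committed groups remain in flight", which is the usual convention for this kind of interface but is not stated on this page.

```python
from collections import deque


class AsyncCopyQueue:
    """Toy host-side model of commit-group bookkeeping (hypothetical, not the Gluon API)."""

    def __init__(self):
        self.pending = []      # copies issued but not yet committed to a group
        self.groups = deque()  # committed groups, oldest first
        self.shared = {}       # stand-in for shared memory contents

    def issue_copy(self, key, value):
        # Record an asynchronous copy; nothing is visible in `shared` yet.
        self.pending.append((key, value))

    def commit_group(self):
        # Bundle every copy issued since the previous commit into one group.
        self.groups.append(self.pending)
        self.pending = []

    def wait_group(self, num_outstanding=0):
        # Assumed semantics: retire the oldest groups until at most
        # `num_outstanding` committed groups remain in flight.
        while len(self.groups) > num_outstanding:
            for key, value in self.groups.popleft():
                self.shared[key] = value


queue = AsyncCopyQueue()
queue.issue_copy("tile0", [1, 2, 3])
queue.commit_group()
queue.issue_copy("tile1", [4, 5, 6])
queue.commit_group()

queue.wait_group(num_outstanding=1)  # only the oldest group has to finish
ready_after_first_wait = sorted(queue.shared)

queue.wait_group(num_outstanding=0)  # drain every committed group
ready_after_drain = sorted(queue.shared)
```

In this model, leaving one group outstanding lets a consumer read the oldest tile while the next one is still in flight, which is the usual motivation for splitting copies into separately committed groups.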