AMD CDNA 3

buffer_load

Performs an AMD buffer load from global memory, addressing elements with a scalar base pointer and a tensor of offsets rather than a tensor of pointers.

buffer_store

Performs an AMD buffer store of a tensor directly to global memory, addressing elements with a scalar base pointer and a tensor of offsets rather than a tensor of pointers.
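
The addressing model shared by buffer_load and buffer_store can be illustrated with a small pure-Python reference. This is only a sketch of the semantics, not the library's API: the helper names buffer_load_ref and buffer_store_ref are hypothetical, global memory is modeled as a flat Python list, the base pointer is modeled as an integer index, and offsets are assumed to be in element units.

```python
# Hypothetical reference model of "scalar base + tensor of offsets" addressing.
# Memory is a flat list; base is a single integer shared by all elements.

def buffer_load_ref(memory, base, offsets):
    """Return one value per offset, read from memory[base + offset]."""
    return [memory[base + off] for off in offsets]


def buffer_store_ref(memory, base, offsets, values):
    """Write values[i] to memory[base + offsets[i]] in place."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


# A 16-element "global memory" holding 100, 101, ..., 115.
memory = list(range(100, 116))

# One scalar base (4) and a tensor of offsets, instead of three full pointers.
loaded = buffer_load_ref(memory, base=4, offsets=[0, 2, 5])

# Store two values starting at base 8.
buffer_store_ref(memory, base=8, offsets=[0, 1], values=[-1, -2])
```

Here loaded holds the elements at indices 4, 6, and 9, and the store overwrites indices 8 and 9. In this model only the offsets vary per element, while the base stays a single scalar.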

buffer_atomic_add

buffer_atomic_and

buffer_atomic_max

buffer_atomic_min

buffer_atomic_or

buffer_atomic_xchg

buffer_atomic_xor

mfma

Computes the matrix multiply-accumulate a * b + acc using AMD's native matrix core units.
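
The arithmetic that mfma computes, a * b + acc, can be written out as a plain Python reference. This is a sketch of the math only, assuming a, b, and acc are nested lists with shapes M x K, K x N, and M x N. The name mfma_ref is hypothetical, and nothing here reflects the hardware tile layouts or data types the real operation requires.

```python
# Hypothetical reference for the result of a matrix multiply-accumulate.

def mfma_ref(a, b, acc):
    """Return a @ b + acc for nested-list matrices."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [acc[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
acc = [[1, 1], [1, 1]]

result = mfma_ref(a, b, acc)
print(result)  # [[20, 23], [44, 51]]
```

In a tiled matrix multiplication, the returned value would typically be passed back in as acc on the next step along the K dimension, so partial products accumulate across iterations.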