AMD CDNA 4

async_copy

buffer_load

Loads a tensor from global memory with an AMD buffer load, addressing elements through a scalar base pointer plus a tensor of offsets instead of a tensor of pointers.
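The base-plus-offsets addressing can be illustrated on the host with a minimal pure-Python sketch. This is not the library's API: `buffer_load_ref` and its `mask` and `other` parameters are hypothetical names used only for illustration, and global memory is modeled as a flat list.

```python
def buffer_load_ref(memory, base, offsets, mask=None, other=0):
    """Host-side model (hypothetical helper): result[i] = memory[base + offsets[i]].

    `mask` and `other` are assumptions added for illustration: lanes whose
    mask entry is False do not touch memory and receive `other` instead.
    """
    if mask is None:
        mask = [True] * len(offsets)
    return [memory[base + off] if keep else other
            for off, keep in zip(offsets, mask)]


# Flat "global memory" holding 100..115.
memory = list(range(100, 116))

# One scalar base plus per-element offsets, rather than one pointer per element.
loaded = buffer_load_ref(memory, base=4, offsets=[0, 2, 5],
                         mask=[True, False, True], other=-1)
print(loaded)  # [104, -1, 109]
```

The point of the sketch is that only the offsets vary per element; the base is shared by the whole tensor.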

buffer_store

Stores a tensor directly to global memory with an AMD buffer store, addressing elements through a scalar base pointer plus a tensor of offsets instead of a tensor of pointers.
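A host-side sketch of the store direction, under the same caveats: `buffer_store_ref` and its `mask` parameter are hypothetical names for illustration, not the library's signature, and global memory is modeled as a flat list.

```python
def buffer_store_ref(memory, base, offsets, values, mask=None):
    """Host-side model (hypothetical helper): memory[base + offsets[i]] = values[i].

    `mask` is an assumption added for illustration: lanes whose mask entry
    is False leave memory untouched.
    """
    if mask is None:
        mask = [True] * len(offsets)
    for off, val, keep in zip(offsets, values, mask):
        if keep:
            memory[base + off] = val


# Flat "global memory" of eight zeros.
store_target = [0] * 8

# Scatter three values relative to a single scalar base; the last lane is masked off.
buffer_store_ref(store_target, base=2, offsets=[0, 1, 4], values=[7, 8, 9],
                 mask=[True, True, False])
print(store_target)  # [0, 0, 7, 8, 0, 0, 0, 0]
```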

buffer_atomic_add

buffer_atomic_and

buffer_atomic_max

buffer_atomic_min

buffer_atomic_or

buffer_atomic_xchg

buffer_atomic_xor

get_mfma_scale_layout

Get the scale layout for MFMA scaled operands.

mfma

Computes the matrix multiply-accumulate a * b + acc using AMD's native matrix core units.
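The arithmetic performed is an ordinary matrix multiply-accumulate. A minimal reference sketch in pure Python follows; `mfma_ref` is a hypothetical helper for checking results on the host, not the library call, and it says nothing about layouts, data types, or tile shapes on the device.

```python
def mfma_ref(a, b, acc):
    """Reference model (hypothetical helper): returns a @ b + acc.

    a is M x K, b is K x N, acc is M x N, all as nested lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [[acc[i][j] + sum(a[i][p] * b[p][j] for p in range(k))
             for j in range(n)]
            for i in range(m)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
acc = [[1, 1], [1, 1]]

result = mfma_ref(a, b, acc)
print(result)  # [[20, 23], [44, 51]]
```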

mfma_scaled

Scaled variant of mfma: a matrix multiply-accumulate on AMD matrix core units whose operands carry scale factors.
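One way to picture a scaled multiply-accumulate is the block-scaled model sketched below. Everything specific in it is an assumption made for illustration and is not taken from this reference: the helper name `mfma_scaled_ref`, the idea that each operand has one scale per block of `block` consecutive elements along K, and the orientation of the scale arrays. Consult the function's own documentation for the actual formats and layouts.

```python
def mfma_scaled_ref(a, a_scale, b, b_scale, acc, block):
    """Assumed block-scaled model (hypothetical helper).

    a is M x K with a_scale of shape M x (K // block).
    b is K x N with b_scale of shape (K // block) x N.
    Each K-block's partial dot product is multiplied by both scales
    before being added to the accumulator.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [row[:] for row in acc]  # copy so the caller's acc is not modified
    for i in range(m):
        for j in range(n):
            for kb in range(k // block):
                partial = sum(a[i][p] * b[p][j]
                              for p in range(kb * block, (kb + 1) * block))
                out[i][j] += a_scale[i][kb] * b_scale[kb][j] * partial
    return out


# K = 4 split into two blocks of 2; power-of-two scales keep the arithmetic exact.
scaled = mfma_scaled_ref(
    a=[[1, 2, 3, 4]], a_scale=[[2.0, 0.5]],
    b=[[1], [1], [1], [1]], b_scale=[[1.0], [4.0]],
    acc=[[10.0]], block=2,
)
print(scaled)  # [[30.0]]
```

Here the first block contributes 2.0 * 1.0 * (1 + 2) = 6 and the second 0.5 * 4.0 * (3 + 4) = 14, added to the initial accumulator of 10.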