AMD CDNA 3
AMD buffer load: reads a tensor from global memory using a scalar base pointer and a tensor of offsets instead of a tensor of pointers.

AMD buffer store: writes a tensor directly to global memory using a scalar base pointer and a tensor of offsets instead of a tensor of pointers.

Computes the matrix multiply-accumulate a * b + acc using AMD's native matrix core units.
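The semantics of the three operations above can be illustrated with a small pure-Python reference model. This is only a sketch of the addressing and arithmetic, not the GPU API: the function names `buffer_load_ref`, `buffer_store_ref`, and `mma_ref` are hypothetical, global memory is modeled as a flat Python list, and the base pointer is modeled as an integer index into it.

```python
# Reference model only: hypothetical names, plain lists instead of GPU tensors.

def buffer_load_ref(memory, base, offsets):
    """Gather elements at base + offset for every offset.

    A tensor-of-pointers load would instead receive the precomputed
    addresses [base + off for off in offsets]; here only one scalar
    base and the per-element offsets are passed.
    """
    return [memory[base + off] for off in offsets]


def buffer_store_ref(memory, base, offsets, values):
    """Scatter values to base + offset for every offset, in place."""
    for off, val in zip(offsets, values):
        memory[base + off] = val


def mma_ref(a, b, acc):
    """Return a * b + acc for row-major nested lists.

    a is M x K, b is K x N, acc is M x N.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [acc[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# "Global memory" holding the values 100..115.
memory = list(range(100, 116))

# Load three elements relative to base index 4.
loaded = buffer_load_ref(memory, base=4, offsets=[0, 2, 5])

# Store two elements relative to base index 8.
buffer_store_ref(memory, base=8, offsets=[0, 1], values=[-1, -2])

# 2 x 2 multiply-accumulate.
result = mma_ref([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[1, 1], [1, 1]])
```

In this model the load and store differ from pointer-based access only in how addresses are formed: the base is shared by all elements, and only the offsets vary per element.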