NVIDIA Hopper

async_copy

cluster

mbarrier

tma

fence_async_shared

Issue a fence to complete asynchronous shared memory operations.

mma_v2

warpgroup_mma

Perform warpgroup MMA (Tensor Core) operations.

warpgroup_mma_wait

Wait until at most num_outstanding warpgroup MMA operations are in flight.
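
The wait semantics can be illustrated with a toy model in plain Python. This is only a sketch of the "at most num_outstanding remain in flight" idea, not the actual API: the class ToyAsyncQueue and its methods are hypothetical names introduced here, and the model assumes operations retire in the order they were issued.

```python
from collections import deque


class ToyAsyncQueue:
    """Hypothetical toy model of in-flight asynchronous operations.

    Not part of any real library. Assumes operations retire in issue order.
    """

    def __init__(self):
        self._in_flight = deque()
        self.completed = []

    def issue(self, name):
        """Start an operation without waiting for it to finish."""
        self._in_flight.append(name)

    def wait(self, num_outstanding):
        """Retire the oldest operations until at most num_outstanding remain."""
        if num_outstanding < 0:
            raise ValueError("num_outstanding must be non-negative")
        while len(self._in_flight) > num_outstanding:
            self.completed.append(self._in_flight.popleft())

    @property
    def in_flight(self):
        """Return a snapshot of the operations that have not retired yet."""
        return list(self._in_flight)


queue = ToyAsyncQueue()
for step in range(3):
    queue.issue(f"mma{step}")

# Allow one operation to stay in flight; the two oldest must retire first.
queue.wait(num_outstanding=1)
print(queue.completed)  # ['mma0', 'mma1']
print(queue.in_flight)  # ['mma2']

# Waiting with zero outstanding drains everything.
queue.wait(num_outstanding=0)
print(queue.in_flight)  # []
```

In this model, a nonzero num_outstanding lets the most recently issued operations keep running while earlier results become safe to use, and num_outstanding=0 acts as a full drain.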