NVIDIA Blackwell

async_copy

clc

Cluster Launch Control (CLC) for Blackwell (SM100+) dynamic persistent kernels.

mbarrier

tma

allocate_tensor_memory

Allocate tensor memory.

fence_async_shared

Issue a fence to complete asynchronous shared memory operations.

mma_v2

tensor_memory_descriptor

Represents a tensor memory descriptor handle for Tensor Core Gen5 operations.

tensor_memory_descriptor_type

TensorMemoryLayout

Describes the layout of tensor memory on the Blackwell architecture.

TensorMemoryScalesLayout

Describes the layout of tensor memory scales on the Blackwell architecture.