NVGPUOps

`nvgpu.cluster_arrive` (triton::nvgpu::ClusterArriveOp)

Syntax:

operation ::= `nvgpu.cluster_arrive` attr-dict

Attributes:

Attribute	MLIR Type	Description
`relaxed`	::mlir::IntegerAttr	1-bit signless integer attribute

`nvgpu.cluster_id` (triton::nvgpu::ClusterCTAIdOp)

Syntax:

operation ::= `nvgpu.cluster_id` attr-dict

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result	Description
`result`	32-bit signless integer
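As a minimal sketch of the syntax above: the op takes no operands, and because it implements InferTypeOpInterface and the format has no type clause, the `i32` result type is not written in the textual form. The SSA name `%cta_id` is illustrative only.

```mlir
// Hypothetical snippet: %cta_id is an illustrative name.
// The i32 result type is inferred, so no trailing type appears.
%cta_id = nvgpu.cluster_id
```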

`nvgpu.cluster_wait` (triton::nvgpu::ClusterWaitOp)

Syntax:

operation ::= `nvgpu.cluster_wait` attr-dict
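The following sketch pairs `nvgpu.cluster_wait` with `nvgpu.cluster_arrive`. It assumes the two ops mirror the PTX `barrier.cluster.arrive` / `barrier.cluster.wait` pair; the source page does not state this, so treat the ordering and comments as an illustration rather than documented behavior.

```mlir
// Assumption: arrive/wait form a split cluster barrier, as in PTX.
// `relaxed` is a 1-bit integer attribute; `false` selects the non-relaxed form.
nvgpu.cluster_arrive {relaxed = false}
// ... work that does not depend on the other CTAs could go here ...
nvgpu.cluster_wait
```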

`nvgpu.fence_async_shared` (triton::nvgpu::FenceAsyncSharedOp)

Syntax:

operation ::= `nvgpu.fence_async_shared` attr-dict

Attributes:

Attribute	MLIR Type	Description
`bCluster`	::mlir::BoolAttr	bool attribute
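A minimal sketch of the textual form, with the attribute supplied through the attribute dictionary. The name `bCluster` suggests it selects cluster scope for the fence, but the source page does not describe its meaning, so that reading is an assumption.

```mlir
// `bCluster` is a bool attribute; its exact semantics are assumed, not documented here.
nvgpu.fence_async_shared {bCluster = false}
```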

`nvgpu.ld_acquire` (triton::nvgpu::LoadAcquireOp)

Syntax:

operation ::= `nvgpu.ld_acquire` $sem `,` $scope `,` $addr (`,` $mask^)? attr-dict `:` functional-type($addr, $result)

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute	MLIR Type	Description
`sem`	::mlir::triton::nvgpu::MemSemanticAttr	allowed 32-bit signless integer cases: 1, 2, 3, 4
`scope`	::mlir::triton::nvgpu::MemSyncScopeAttr	allowed 32-bit signless integer cases: 1, 2, 3

Operands:

Operand	Description
`addr`	LLVM pointer in address space 1
`mask`	1-bit signless integer (optional)

Results:

Result	Description
`result`	floating-point or integer
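A sketch of how the syntax above might be written out. The enum keywords `acquire` and `gpu` are assumed spellings for one `sem` case and one `scope` case; the page lists only the integer cases, so check the dialect definition before relying on them. The SSA names are illustrative.

```mlir
// Assumed keyword spellings: `acquire` (sem) and `gpu` (scope).
// %ptr : !llvm.ptr<1> is a global-memory pointer; %pred : i1 is the optional mask.
%val = nvgpu.ld_acquire acquire, gpu, %ptr : (!llvm.ptr<1>) -> i32

// Same load with the optional mask operand. Only $addr appears in the
// functional type, so the mask's i1 type is not spelled out.
%val_masked = nvgpu.ld_acquire acquire, gpu, %ptr, %pred : (!llvm.ptr<1>) -> i32
```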

`nvgpu.ldmatrix` (triton::nvgpu::LoadMatrixOp)

Syntax:

operation ::= `nvgpu.ldmatrix` $addr attr-dict `:` functional-type($addr, $result)

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute	MLIR Type	Description
`trans`	::mlir::UnitAttr	unit attribute

Operands:

Operand	Description
`addr`	LLVM pointer in address space 3

Results:

Result	Description
`result`	LLVM structure type or 32-bit signless integer
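A sketch of the two result shapes the table allows. The four-element struct is an assumption chosen for illustration (the page only says "LLVM structure type or 32-bit signless integer"), and the SSA names are hypothetical.

```mlir
// %smem_ptr : !llvm.ptr<3> points into shared memory (address space 3).
// Struct result; the choice of four i32 fields is illustrative.
%frag = nvgpu.ldmatrix %smem_ptr : (!llvm.ptr<3>) -> !llvm.struct<(i32, i32, i32, i32)>

// `trans` is a unit attribute, so it is written without a value.
%frag_t = nvgpu.ldmatrix %smem_ptr {trans} : (!llvm.ptr<3>) -> !llvm.struct<(i32, i32, i32, i32)>
```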

`nvgpu.stmatrix` (triton::nvgpu::StoreMatrixOp)

Syntax:

operation ::= `nvgpu.stmatrix` operands attr-dict `:` type(operands)

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute	MLIR Type	Description
`trans`	::mlir::UnitAttr	unit attribute

Operands:

Operand	Description
`addr`	LLVM pointer in address space 3
`vals`	variadic of 32-bit signless integer
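A sketch of the store form. Because the format is `operands ... : type(operands)`, every operand type is listed after the colon, pointer first. The count of four values is an assumption for illustration; `vals` is variadic.

```mlir
// %smem_ptr : !llvm.ptr<3>; %v0..%v3 : i32 (names and count are illustrative).
nvgpu.stmatrix %smem_ptr, %v0, %v1, %v2, %v3 : !llvm.ptr<3>, i32, i32, i32, i32
```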

`nvgpu.tensor_memory_base` (triton::nvgpu::TensorMemoryBaseAddress)

Syntax:

operation ::= `nvgpu.tensor_memory_base` attr-dict

Represents the base address of tensor memory in a kernel. The op exists to simplify lowering from TritonGPU to LLVM.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result	Description
`result`	LLVM pointer in address space 6

`nvgpu.wgmma_commit_group` (triton::nvgpu::WGMMACommitGroupOp)

Syntax:

operation ::= `nvgpu.wgmma_commit_group` attr-dict

`nvgpu.wgmma_fence` (triton::nvgpu::WGMMAFenceOp)

Syntax:

operation ::= `nvgpu.wgmma_fence` attr-dict

`nvgpu.wgmma` (triton::nvgpu::WGMMAOp)

Syntax:

operation ::= `nvgpu.wgmma` $opA `,` $opB `,` $useC (`,` $opC^)? attr-dict `:` functional-type(operands, $res)

Attributes:

Attribute	MLIR Type	Description
`m`	::mlir::IntegerAttr	32-bit signless integer attribute
`n`	::mlir::IntegerAttr	32-bit signless integer attribute
`k`	::mlir::IntegerAttr	32-bit signless integer attribute
`eltTypeC`	::mlir::triton::nvgpu::WGMMAEltTypeAttr	wgmma operand type, either 's8', 's32', 'e4m3', 'e5m2', 'f16', 'bf16', 'tf32', or 'f32'
`eltTypeA`	::mlir::triton::nvgpu::WGMMAEltTypeAttr	wgmma operand type, either 's8', 's32', 'e4m3', 'e5m2', 'f16', 'bf16', 'tf32', or 'f32'
`eltTypeB`	::mlir::triton::nvgpu::WGMMAEltTypeAttr	wgmma operand type, either 's8', 's32', 'e4m3', 'e5m2', 'f16', 'bf16', 'tf32', or 'f32'
`layoutA`	::mlir::triton::nvgpu::WGMMALayoutAttr	wgmma layout, either 'row' or 'col'
`layoutB`	::mlir::triton::nvgpu::WGMMALayoutAttr	wgmma layout, either 'row' or 'col'

Operands:

Operand	Description
`opA`	wgmma operand A/B type
`opB`	wgmma operand A/B type
`useC`	1-bit signless integer
`opC`	LLVM structure type (optional)

Results:

Result	Description
`res`	LLVM structure type

`nvgpu.wgmma_wait_group` (triton::nvgpu::WGMMAWaitGroupOp)

Syntax:

operation ::= `nvgpu.wgmma_wait_group` $input attr-dict `:` type($input)

Interfaces: InferTypeOpInterface

Attributes:

Attribute	MLIR Type	Description
`pendings`	::mlir::IntegerAttr	32-bit signless integer attribute

Operands:

Operand	Description
`input`	LLVM structure type

Results:

Result	Description
`output`	LLVM structure type
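The fence, commit, and wait ops are typically used together around `nvgpu.wgmma`. The sketch below assumes they follow the ordering of the PTX `wgmma.fence`, `wgmma.commit_group`, and `wgmma.wait_group` instructions; the source page does not spell this out. The accumulator struct type and SSA names are illustrative.

```mlir
// Assumed ordering, modeled on the PTX wgmma instruction sequence.
nvgpu.wgmma_fence
// ... one or more nvgpu.wgmma ops producing %acc would go here ...
nvgpu.wgmma_commit_group
// The result type is inferred from the input, so only the input type is written.
// pendings = 0 is assumed to mean "wait until no committed groups remain pending".
%done = nvgpu.wgmma_wait_group %acc {pendings = 0 : i32} : !llvm.struct<(f32, f32, f32, f32)>
```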

`nvgpu.warp_id` (triton::nvgpu::WarpIdOp)

Syntax:

operation ::= `nvgpu.warp_id` attr-dict

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result	Description
`result`	32-bit signless integer
