TritonNvidiaGPUOps

triton_nvidia_gpu.alloc_mbarrier (triton::nvidia_gpu::AllocMBarrierOp)

Allocate a vector of mbarriers

Syntax:

operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result)

Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type. Each mbarrier is initialized as: 1, the current phase initialized to 0. 2, the expected arrival count initialized to ‘count’. 3, the pending arrival count initialized to ‘count’. 4, the tx-count initialized to 0.

Example:

case a. when created in vector:

case b. when created in scalar:

Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::triton::gpu::SharedMemory}

Attributes:

AttributeMLIR TypeDescription
count::mlir::IntegerAttr32-bit signless integer attribute

Results:

Result

Description

result

ptr or tensor of 64-bit signless integer values

triton_nvidia_gpu.cluster_arrive (triton::nvidia_gpu::ClusterArriveOp)

Syntax:

operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict

Traits: VerifyTensorLayoutsTrait

Attributes:

AttributeMLIR TypeDescription
relaxed::mlir::IntegerAttr1-bit signless integer attribute

triton_nvidia_gpu.cluster_wait (triton::nvidia_gpu::ClusterWaitOp)

Syntax:

operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict

Traits: VerifyTensorLayoutsTrait

triton_nvidia_gpu.consumer_release (triton::nvidia_gpu::ConsumerReleaseOp)

Syntax:

operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.consumer_wait (triton::nvidia_gpu::ConsumerWaitOp)

Syntax:

operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.create_mutex (triton::nvidia_gpu::CreateMutexOp)

Syntax:

operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result)

Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

Results:

Result

Description

result

triton_nvidia_gpu.create_token (triton::nvidia_gpu::CreateTokenOp)

Syntax:

operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result)

Traits: VerifyTensorLayoutsTrait

Attributes:

AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute

Results:

Result

Description

result

tensor of values

triton_nvidia_gpu.dot_async (triton::nvidia_gpu::DotAsyncOp)

Dot async

Syntax:

operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d)

$d = matrix_multiply($a, $b) + $c

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes:

AttributeMLIR TypeDescription
allowTF32::mlir::BoolAttrbool attribute
maxNumImpreciseAcc::mlir::IntegerAttr32-bit signless integer attribute

Operands:

Operand

Description

a

tensor of floating-point values or tensor of integer values

b

tensor of floating-point values or tensor of integer values

c

tensor of floating-point values or tensor of integer values

Results:

Result

Description

d

tensor of floating-point values or tensor of integer values

triton_nvidia_gpu.dot_wait (triton::nvidia_gpu::DotWaitOp)

Dot wait

Syntax:

operation ::= `triton_nvidia_gpu.dot_wait` $inputs attr-dict `:` type($inputs)

This operation defining the waiting action for a async dot, MMAv3 .e.g. The subsequent operations should not execute until this operation completes waiting.

Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

Attributes:

AttributeMLIR TypeDescription
pendings::mlir::IntegerAttr32-bit signless integer attribute

Operands:

Operand

Description

inputs

variadic of tensor of floating-point values or tensor of integer values

Results:

Result

Description

outputs

variadic of tensor of floating-point values or tensor of integer values

triton_nvidia_gpu.extract_mbarrier (triton::nvidia_gpu::ExtractMBarrierOp)

Extract a mbarrier from a vector of mbarriers

Syntax:

operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result)

Extract a mbarrier from a vector of mbarriers

Example:

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand

Description

tensor

tensor of 64-bit signless integer values

index

32-bit signless integer

Results:

Result

Description

result

ptr

triton_nvidia_gpu.fence_async_shared (triton::nvidia_gpu::FenceAsyncSharedOp)

Fence proxy async

Syntax:

operation ::= `triton_nvidia_gpu.fence_async_shared` attr-dict

Traits: VerifyTensorLayoutsTrait

Attributes:

AttributeMLIR TypeDescription
bCluster::mlir::BoolAttrbool attribute

triton_nvidia_gpu.get_agent_id (triton::nvidia_gpu::GetAgentIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result)

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_canonical_warp_id (triton::nvidia_gpu::GetCanonicalWarpId)

Syntax:

operation ::= `triton_nvidia_gpu.get_canonical_warp_id` attr-dict `:` type($result)

Returns the one dimensional warpId when it’s used for producing warp uniform values.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_cluster_cta_id (triton::nvidia_gpu::GetClusterCTAIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result)

Returns the one dimensional cluster_cta_id.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_mutex_role_id (triton::nvidia_gpu::GetMutexRoleIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result)

Traits: VerifyTensorLayoutsTrait

Interfaces: InferTypeOpInterface

Attributes:

AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_thread_id (triton::nvidia_gpu::GetThreadIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result)

Returns the one dimensional threadId.

Traits: AlwaysSpeculatableImplTrait, VerifyTensorLayoutsTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.insert_slice_tma (triton::nvidia_gpu::InsertSliceTMAOp)

Syntax:

operation ::= `triton_nvidia_gpu.insert_slice_tma` operands attr-dict `:` type(operands) `->` type($result)

Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding, VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::GlobalMemory, MemoryEffects::Write on ::mlir::triton::gpu::SharedMemory}

Attributes:

AttributeMLIR TypeDescription
cache::mlir::triton::CacheModifierAttr
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}}
evict::mlir::triton::EvictionPolicyAttr
allowed 32-bit signless integer cases: 1, 2, 3{{% markdown %}}Enum cases: * evict_normal (`NORMAL`) * evict_first (`EVICT_FIRST`) * evict_last (`EVICT_LAST`){{% /markdown %}}
isVolatile::mlir::BoolAttrbool attribute
axis::mlir::IntegerAttr32-bit signless integer attribute

Operands:

Operand

Description

src

ptr or tensor of ptr values

dst

tensor of floating-point values or tensor of integer values or tensor of ptr values

index

32-bit signless integer

mbar

ptr

mask

tensor of 1-bit signless integer values or 1-bit signless integer

other

floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr

Results:

Result

Description

result

tensor of floating-point values or tensor of integer values or tensor of ptr values

triton_nvidia_gpu.lock (triton::nvidia_gpu::LockOp)

Syntax:

operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

mutex

triton_nvidia_gpu.mbarrier_arrive (triton::nvidia_gpu::MBarrierArriveOp)

Mbarrier arrive

Syntax:

operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands)

This operation defining the arriving action for a mbarrier. txCount: An optional attribute that set tx-count. This Op will be lowered into mbarrier.arrive.expect_tx if the optional attribute exist. trackAsyncOp: If true, this op will be lowered into cp.async.mbarrier.arrive.noinc. pred: Only perform arrive action when pred is true. remoteCtaId: if set, perform an remote arrive action.

Example:

triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr

Traits: AttrSizedOperandSegments, VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::triton::gpu::SharedMemory}

Attributes:

AttributeMLIR TypeDescription
trackAsyncOp::mlir::IntegerAttr1-bit signless integer attribute
txCount::mlir::IntegerAttr32-bit signless integer attribute

Operands:

Operand

Description

mbarrier

ptr

pred

1-bit signless integer

remoteCtaId

32-bit signless integer

triton_nvidia_gpu.mbarrier_wait (triton::nvidia_gpu::MBarrierWaitOp)

Mbarrier wait

Syntax:

operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier)

This operation defining the waiting action for a mbarrier. The subsequent operations should not execute until this operation completes waiting.

Example:

triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr

Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::gpu::SharedMemory, MemoryEffects::Write on ::mlir::triton::gpu::SharedMemory}

Operands:

Operand

Description

mbarrier

ptr

phase

1-bit signless integer

triton_nvidia_gpu.bar_arrive (triton::nvidia_gpu::NamedBarrierArriveOp)

Named barrier arrive

Syntax:

operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

bar

32-bit signless integer

numThreads

32-bit signless integer

triton_nvidia_gpu.bar_wait (triton::nvidia_gpu::NamedBarrierWaitOp)

Named barrier wait

Syntax:

operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

bar

32-bit signless integer

numThreads

32-bit signless integer

triton_nvidia_gpu.producer_acquire (triton::nvidia_gpu::ProducerAcquireOp)

Syntax:

operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.producer_commit (triton::nvidia_gpu::ProducerCommitOp)

Syntax:

operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.reg_alloc (triton::nvidia_gpu::RegAllocOp)

Register allocation

Syntax:

operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict

Traits: VerifyTensorLayoutsTrait

Attributes:

AttributeMLIR TypeDescription
regCount::mlir::IntegerAttr32-bit signless integer attribute

triton_nvidia_gpu.reg_dealloc (triton::nvidia_gpu::RegDeallocOp)

Register deallocation

Syntax:

operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict

Traits: VerifyTensorLayoutsTrait

Attributes:

AttributeMLIR TypeDescription
regCount::mlir::IntegerAttr32-bit signless integer attribute

triton_nvidia_gpu.store_async_tma (triton::nvidia_gpu::StoreAsyncTMAOp)

Store asynchronous by a tensor pointer

Syntax:

operation ::= `triton_nvidia_gpu.store_async_tma` operands attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::gpu::SharedMemory, MemoryEffects::Write on ::mlir::triton::GlobalMemory}

Attributes:

AttributeMLIR TypeDescription
cache::mlir::triton::CacheModifierAttr
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6{{% markdown %}}Enum cases: * none (`NONE`) * ca (`CA`) * cg (`CG`) * wb (`WB`) * cs (`CS`) * wt (`WT`){{% /markdown %}}

Operands:

Operand

Description

dst

ptr

src

tensor of floating-point values or tensor of integer values or tensor of ptr values

triton_nvidia_gpu.unlock (triton::nvidia_gpu::UnlockOp)

Syntax:

operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands)

Traits: VerifyTensorLayoutsTrait

Operands:

Operand

Description

mutex