TritonNvidiaGPUOps

triton_nvidia_gpu.alloc_mbarrier (::mlir::triton::nvidia_gpu::AllocMBarrierOp)

allocate a vector of mbarriers

Syntax:

operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result)

Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type. Each mbarrier is initialized as: 1, the current phase initialized to 0. 2, the expected arrival count initialized to ‘count’. 3, the pending arrival count initialized to ‘count’. 4, the tx-count initialized to 0.

Example:

case a. when created in vector:

case b. when created in scalar:

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute

MLIR Type

Description

count

::mlir::IntegerAttr

32-bit signless integer attribute

Results:

Result

Description

result

ptr or tensor of 64-bit signless integer values

triton_nvidia_gpu.cluster_arrive (::mlir::triton::nvidia_gpu::ClusterArriveOp)

Syntax:

operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict

Attributes:

Attribute

MLIR Type

Description

relaxed

::mlir::IntegerAttr

1-bit signless integer attribute

triton_nvidia_gpu.cluster_wait (::mlir::triton::nvidia_gpu::ClusterWaitOp)

Syntax:

operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict

triton_nvidia_gpu.consumer_release (::mlir::triton::nvidia_gpu::ConsumerReleaseOp)

Syntax:

operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands)

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.consumer_wait (::mlir::triton::nvidia_gpu::ConsumerWaitOp)

Syntax:

operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands)

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.create_mutex (::mlir::triton::nvidia_gpu::CreateMutexOp)

Syntax:

operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result)

Interfaces: InferTypeOpInterface

Results:

Result

Description

result

triton_nvidia_gpu.create_token (::mlir::triton::nvidia_gpu::CreateTokenOp)

Syntax:

operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result)

Attributes:

Attribute

MLIR Type

Description

num

::mlir::IntegerAttr

32-bit signless integer attribute

Results:

Result

Description

result

tensor of values

triton_nvidia_gpu.dot_async (::mlir::triton::nvidia_gpu::DotAsyncOp)

dot async

Syntax:

operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d)

$d = matrix_multiply($a, $b) + $c

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes:

Attribute

MLIR Type

Description

allowTF32

::mlir::BoolAttr

bool attribute

maxNumImpreciseAcc

::mlir::IntegerAttr

32-bit signless integer attribute

Operands:

Operand

Description

a

tensor of floating-point values or tensor of integer values

b

tensor of floating-point values or tensor of integer values

c

tensor of floating-point values or tensor of integer values

Results:

Result

Description

d

tensor of floating-point values or tensor of integer values

triton_nvidia_gpu.dot_wait (::mlir::triton::nvidia_gpu::DotWaitOp)

dot wait

Syntax:

operation ::= `triton_nvidia_gpu.dot_wait` attr-dict

This operation defining the waiting action for a async dot, MMAv3 .e.g. The subsequent operations should not execute until this operation completes waiting.

Attributes:

Attribute

MLIR Type

Description

pendings

::mlir::IntegerAttr

32-bit signless integer attribute

triton_nvidia_gpu.extract_mbarrier (::mlir::triton::nvidia_gpu::ExtractMBarrierOp)

extract a mbarrier from a vector of mbarriers

Syntax:

operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result)

Extract a mbarrier from a vector of mbarriers

Example:

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

Operand

Description

tensor

tensor of 64-bit signless integer values

index

32-bit signless integer

Results:

Result

Description

result

ptr

triton_nvidia_gpu.fence_async_shared (::mlir::triton::nvidia_gpu::FenceAsyncSharedOp)

fence proxy async

Syntax:

operation ::= `triton_nvidia_gpu.fence_async_shared` attr-dict

Attributes:

Attribute

MLIR Type

Description

bCluster

::mlir::BoolAttr

bool attribute

triton_nvidia_gpu.get_agent_id (::mlir::triton::nvidia_gpu::GetAgentIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_cluster_cta_id (::mlir::triton::nvidia_gpu::GetClusterCTAIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result)

Returns the one dimensional cluster_cta_id.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_mutex_role_id (::mlir::triton::nvidia_gpu::GetMutexRoleIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result)

Interfaces: InferTypeOpInterface

Attributes:

Attribute

MLIR Type

Description

num

::mlir::IntegerAttr

32-bit signless integer attribute

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.get_thread_id (::mlir::triton::nvidia_gpu::GetThreadIdOp)

Syntax:

operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result)

Returns the one dimensional threadId.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results:

Result

Description

result

32-bit signless integer

triton_nvidia_gpu.insert_slice_async_v2 (::mlir::triton::nvidia_gpu::InsertSliceAsyncV2Op)

Syntax:

operation ::= `triton_nvidia_gpu.insert_slice_async_v2` operands attr-dict `:` type(operands) `->` type($result)

Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute

MLIR Type

Description

cache

::mlir::triton::CacheModifierAttr

allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6

evict

::mlir::triton::EvictionPolicyAttr

allowed 32-bit signless integer cases: 1, 2, 3

isVolatile

::mlir::BoolAttr

bool attribute

axis

::mlir::IntegerAttr

32-bit signless integer attribute

Operands:

Operand

Description

src

ptr or tensor of ptr values

dst

tensor of floating-point values or tensor of integer values or tensor of ptr values

index

32-bit signless integer

mbar

ptr

mask

tensor of 1-bit signless integer values or 1-bit signless integer

other

floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr

Results:

Result

Description

result

tensor of floating-point values or tensor of integer values or tensor of ptr values

triton_nvidia_gpu.lock (::mlir::triton::nvidia_gpu::LockOp)

Syntax:

operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands)

Operands:

Operand

Description

mutex

triton_nvidia_gpu.mbarrier_arrive (::mlir::triton::nvidia_gpu::MBarrierArriveOp)

mbarrier arrive

Syntax:

operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands)

This operation defining the arriving action for a mbarrier. txCount: An optional attribute that set tx-count. This Op will be lowered into mbarrier.arrive.expect_tx if the optional attribute exist. trackAsyncOp: If true, this op will be lowered into cp.async.mbarrier.arrive.noinc. pred: Only perform arrive action when pred is true. remoteCtaId: if set, perform an remote arrive action.

Example:

triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr

Traits: AttrSizedOperandSegments

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute

MLIR Type

Description

trackAsyncOp

::mlir::IntegerAttr

1-bit signless integer attribute

txCount

::mlir::IntegerAttr

32-bit signless integer attribute

Operands:

Operand

Description

mbarrier

ptr

pred

1-bit signless integer

remoteCtaId

32-bit signless integer

triton_nvidia_gpu.mbarrier_wait (::mlir::triton::nvidia_gpu::MBarrierWaitOp)

mbarrier wait

Syntax:

operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier)

This operation defining the waiting action for a mbarrier. The subsequent operations should not execute until this operation completes waiting.

Example:

triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Operands:

Operand

Description

mbarrier

ptr

phase

1-bit signless integer

triton_nvidia_gpu.bar_arrive (::mlir::triton::nvidia_gpu::NamedBarrierArriveOp)

named barrier arrive

Syntax:

operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands)

Operands:

Operand

Description

bar

32-bit signless integer

numThreads

32-bit signless integer

triton_nvidia_gpu.bar_wait (::mlir::triton::nvidia_gpu::NamedBarrierWaitOp)

named barrier wait

Syntax:

operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands)

Operands:

Operand

Description

bar

32-bit signless integer

numThreads

32-bit signless integer

triton_nvidia_gpu.producer_acquire (::mlir::triton::nvidia_gpu::ProducerAcquireOp)

Syntax:

operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands)

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.producer_commit (::mlir::triton::nvidia_gpu::ProducerCommitOp)

Syntax:

operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands)

Operands:

Operand

Description

token

tensor of values

idx

32-bit signless integer

triton_nvidia_gpu.reg_alloc (::mlir::triton::nvidia_gpu::RegAllocOp)

register allocation

Syntax:

operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict

Attributes:

Attribute

MLIR Type

Description

regCount

::mlir::IntegerAttr

32-bit signless integer attribute

triton_nvidia_gpu.reg_dealloc (::mlir::triton::nvidia_gpu::RegDeallocOp)

register deallocation

Syntax:

operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict

Attributes:

Attribute

MLIR Type

Description

regCount

::mlir::IntegerAttr

32-bit signless integer attribute

triton_nvidia_gpu.store_async (::mlir::triton::nvidia_gpu::StoreAsyncOp)

store asynchronous by a tensor pointer

Syntax:

operation ::= `triton_nvidia_gpu.store_async` operands attr-dict `:` type(operands)

Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}

Attributes:

Attribute

MLIR Type

Description

cache

::mlir::triton::CacheModifierAttr

allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6

Operands:

Operand

Description

dst

ptr

src

tensor of floating-point values or tensor of integer values or tensor of ptr values

triton_nvidia_gpu.unlock (::mlir::triton::nvidia_gpu::UnlockOp)

Syntax:

operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands)

Operands:

Operand

Description

mutex