TritonNvidiaGPUOps¶
triton_nvidia_gpu.alloc_mbarrier
(::mlir::triton::nvidia_gpu::AllocMBarrierOp)¶
allocate a vector of mbarriers
Syntax:
operation ::= `triton_nvidia_gpu.alloc_mbarrier` attr-dict `:` type($result)
Allocate and initialize a vector of mbarriers. The size of the vector is implied in the returned type. Each mbarrier is initialized as: 1, the current phase initialized to 0. 2, the expected arrival count initialized to ‘count’. 3, the pending arrival count initialized to ‘count’. 4, the tx-count initialized to 0.
Example:
case a. when created in vector:
case b. when created in scalar:
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::SideEffects::DefaultResource}
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Results:¶
Result |
Description |
---|---|
|
ptr or tensor of 64-bit signless integer values |
triton_nvidia_gpu.cluster_arrive
(::mlir::triton::nvidia_gpu::ClusterArriveOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.cluster_arrive` attr-dict
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
1-bit signless integer attribute |
triton_nvidia_gpu.cluster_wait
(::mlir::triton::nvidia_gpu::ClusterWaitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.cluster_wait` attr-dict
triton_nvidia_gpu.consumer_release
(::mlir::triton::nvidia_gpu::ConsumerReleaseOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.consumer_release` $token `,` $idx attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.consumer_wait
(::mlir::triton::nvidia_gpu::ConsumerWaitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.consumer_wait` $token `,` $idx attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.create_mutex
(::mlir::triton::nvidia_gpu::CreateMutexOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.create_mutex` attr-dict `:` type($result)
Interfaces: InferTypeOpInterface
Results:¶
Result |
Description |
---|---|
|
triton_nvidia_gpu.create_token
(::mlir::triton::nvidia_gpu::CreateTokenOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.create_token` attr-dict `:` type($result)
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Results:¶
Result |
Description |
---|---|
|
tensor of values |
triton_nvidia_gpu.dot_async
(::mlir::triton::nvidia_gpu::DotAsyncOp)¶
dot async
Syntax:
operation ::= `triton_nvidia_gpu.dot_async` $a`,` $b`,` $c attr-dict `:` type($a) `*` type($b) `->` type($d)
$d = matrix_multiply($a, $b) + $c
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::BoolAttr |
bool attribute |
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Operands:¶
Operand |
Description |
---|---|
|
tensor of floating-point values or tensor of integer values |
|
tensor of floating-point values or tensor of integer values |
|
tensor of floating-point values or tensor of integer values |
Results:¶
Result |
Description |
---|---|
|
tensor of floating-point values or tensor of integer values |
triton_nvidia_gpu.dot_wait
(::mlir::triton::nvidia_gpu::DotWaitOp)¶
dot wait
Syntax:
operation ::= `triton_nvidia_gpu.dot_wait` attr-dict
This operation defining the waiting action for a async dot, MMAv3 .e.g. The subsequent operations should not execute until this operation completes waiting.
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
triton_nvidia_gpu.extract_mbarrier
(::mlir::triton::nvidia_gpu::ExtractMBarrierOp)¶
extract a mbarrier from a vector of mbarriers
Syntax:
operation ::= `triton_nvidia_gpu.extract_mbarrier` $tensor `[` $index `]` attr-dict `:` type($tensor) `,` type($index) `->` type($result)
Extract a mbarrier from a vector of mbarriers
Example:
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:¶
Operand |
Description |
---|---|
|
tensor of 64-bit signless integer values |
|
32-bit signless integer |
Results:¶
Result |
Description |
---|---|
|
ptr |
triton_nvidia_gpu.get_agent_id
(::mlir::triton::nvidia_gpu::GetAgentIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_agent_id` attr-dict `:` type($result)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_cluster_cta_id
(::mlir::triton::nvidia_gpu::GetClusterCTAIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_cluster_cta_id` attr-dict `:` type($result)
Returns the one dimensional cluster_cta_id.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_mutex_role_id
(::mlir::triton::nvidia_gpu::GetMutexRoleIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_mutex_role_id` attr-dict `:` type($result)
Interfaces: InferTypeOpInterface
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Results:¶
Result |
Description |
---|---|
|
32-bit signless integer |
triton_nvidia_gpu.get_thread_id
(::mlir::triton::nvidia_gpu::GetThreadIdOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.get_thread_id` attr-dict `:` type($result)
Returns the one dimensional threadId.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:¶
Result |
Description |
---|---|
|
32-bit signless integer |
triton_nvidia_gpu.insert_slice_async_v2
(::mlir::triton::nvidia_gpu::InsertSliceAsyncV2Op)¶
Syntax:
operation ::= `triton_nvidia_gpu.insert_slice_async_v2` operands attr-dict `:` type(operands) `->` type($result)
Traits: AttrSizedOperandSegments, ResultsAreSharedEncoding
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::triton::CacheModifierAttr |
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6 |
|
::mlir::triton::EvictionPolicyAttr |
allowed 32-bit signless integer cases: 1, 2, 3 |
|
::mlir::BoolAttr |
bool attribute |
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Operands:¶
Operand |
Description |
---|---|
|
ptr or tensor of ptr values |
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
|
32-bit signless integer |
|
ptr |
|
tensor of 1-bit signless integer values or 1-bit signless integer |
|
floating-point or tensor of floating-point values or integer or tensor of integer values or ptr or tensor of ptr values or ptr |
Results:¶
Result |
Description |
---|---|
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
triton_nvidia_gpu.lock
(::mlir::triton::nvidia_gpu::LockOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.lock` $mutex attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
triton_nvidia_gpu.mbarrier_arrive
(::mlir::triton::nvidia_gpu::MBarrierArriveOp)¶
mbarrier arrive
Syntax:
operation ::= `triton_nvidia_gpu.mbarrier_arrive` operands attr-dict `:` type(operands)
This operation defining the arriving action for a mbarrier. txCount: An optional attribute that set tx-count. This Op will be lowered into mbarrier.arrive.expect_tx if the optional attribute exist. trackAsyncOp: If true, this op will be lowered into cp.async.mbarrier.arrive.noinc. pred: Only perform arrive action when pred is true. remoteCtaId: if set, perform an remote arrive action.
Example:
triton_nvidia_gpu.mbarrier_arrive %0 {trackAsyncOp = false} : !tt.ptr
Traits: AttrSizedOperandSegments
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
1-bit signless integer attribute |
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
Operands:¶
Operand |
Description |
---|---|
|
ptr |
|
1-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.mbarrier_wait
(::mlir::triton::nvidia_gpu::MBarrierWaitOp)¶
mbarrier wait
Syntax:
operation ::= `triton_nvidia_gpu.mbarrier_wait` $mbarrier `,` $phase attr-dict `:` type($mbarrier)
This operation defining the waiting action for a mbarrier. The subsequent operations should not execute until this operation completes waiting.
Example:
triton_nvidia_gpu.mbarrier_wait %0, %1 : !tt.ptr
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::SideEffects::DefaultResource, MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Operands:¶
Operand |
Description |
---|---|
|
ptr |
|
1-bit signless integer |
triton_nvidia_gpu.bar_arrive
(::mlir::triton::nvidia_gpu::NamedBarrierArriveOp)¶
named barrier arrive
Syntax:
operation ::= `triton_nvidia_gpu.bar_arrive` $bar `,` $numThreads attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
32-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.bar_wait
(::mlir::triton::nvidia_gpu::NamedBarrierWaitOp)¶
named barrier wait
Syntax:
operation ::= `triton_nvidia_gpu.bar_wait` $bar `,` $numThreads attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
32-bit signless integer |
|
32-bit signless integer |
triton_nvidia_gpu.producer_acquire
(::mlir::triton::nvidia_gpu::ProducerAcquireOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.producer_acquire` $token `,` $idx attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.producer_commit
(::mlir::triton::nvidia_gpu::ProducerCommitOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.producer_commit` $token `,` $idx attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|
tensor of values |
|
32-bit signless integer |
triton_nvidia_gpu.reg_alloc
(::mlir::triton::nvidia_gpu::RegAllocOp)¶
register allocation
Syntax:
operation ::= `triton_nvidia_gpu.reg_alloc` $regCount attr-dict
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
triton_nvidia_gpu.reg_dealloc
(::mlir::triton::nvidia_gpu::RegDeallocOp)¶
register deallocation
Syntax:
operation ::= `triton_nvidia_gpu.reg_dealloc` $regCount attr-dict
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::IntegerAttr |
32-bit signless integer attribute |
triton_nvidia_gpu.store_async
(::mlir::triton::nvidia_gpu::StoreAsyncOp)¶
store asynchronous by a tensor pointer
Syntax:
operation ::= `triton_nvidia_gpu.store_async` operands attr-dict `:` type(operands)
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::SideEffects::DefaultResource}
Attributes:¶
Attribute |
MLIR Type |
Description |
---|---|---|
|
::mlir::triton::CacheModifierAttr |
allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6 |
Operands:¶
Operand |
Description |
---|---|
|
ptr |
|
tensor of floating-point values or tensor of integer values or tensor of ptr values |
triton_nvidia_gpu.unlock
(::mlir::triton::nvidia_gpu::UnlockOp)¶
Syntax:
operation ::= `triton_nvidia_gpu.unlock` $mutex attr-dict `:` type(operands)
Operands:¶
Operand |
Description |
---|---|
|