TritonGPUOps
triton_gpu.async_commit_group (triton::gpu::AsyncCommitGroupOp)
Async commit group
Syntax:
operation ::= `triton_gpu.async_commit_group` $inputTokens attr-dict
Traits: VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
Operands:

| Operand | Description |
|---|---|
| inputTokens | variadic of async token type |
Results:

| Result | Description |
|---|---|
| asyncToken | async token type |
triton_gpu.async_copy_global_to_local (triton::gpu::AsyncCopyGlobalToLocalOp)
Copy data from global memory to local memory asynchronously
Syntax:
operation ::= `triton_gpu.async_copy_global_to_local` $src `,` $result (`mask` $mask^)? (`other` $other^)?
oilist(`cacheModifier` `=` $cache | `evictionPolicy` `=` $evict)
attr-dict `:` type($src) `->` type($result)
This operation copies data from global memory to local memory asynchronously. It is analogous to tt.load, except that the data are copied to the local memory pointed to by the memory descriptor instead of to a distributed tensor. The rest of the operands are the same as for tt.load.
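A minimal sketch of how this op is typically chained with triton_gpu.async_commit_group, triton_gpu.async_wait, and triton_gpu.local_load. Here #blocked and #shared stand for previously defined blocked and shared layout encodings, and the !tt.memdesc spelling (and any extra memory-space or mutability parameters) varies across Triton versions, so treat the types as illustrative.

```mlir
// Asynchronously copy a 64x64 f16 tile into shared memory, then wait for the
// copy to complete before reading it back into registers (types illustrative).
%tok   = triton_gpu.async_copy_global_to_local %ptrs, %buf mask %m
           : tensor<64x64x!tt.ptr<f16>, #blocked> -> !tt.memdesc<64x64xf16, #shared>
%group = triton_gpu.async_commit_group %tok
%done  = triton_gpu.async_wait %group {num = 0 : i32}
%tile  = triton_gpu.local_load %buf token %done
           : !tt.memdesc<64x64xf16, #shared> -> tensor<64x64xf16, #blocked>
```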
Traits: AttrSizedOperandSegments
, VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
, MemoryEffectOpInterface
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| cache | ::mlir::triton::CacheModifierAttr | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6, 7. Enum cases: none (`NONE`), ca (`CA`), cg (`CG`), wb (`WB`), cs (`CS`), wt (`WT`), cv (`CV`) |
| evict | ::mlir::triton::EvictionPolicyAttr | allowed 32-bit signless integer cases: 1, 2, 3. Enum cases: evict_normal (`NORMAL`), evict_first (`EVICT_FIRST`), evict_last (`EVICT_LAST`) |
| isVolatile | ::mlir::BoolAttr | bool attribute |
Operands:

| Operand | Description |
|---|---|
| src | ranked tensor of ptr values |
| result | memory descriptor type |
| mask | tensor of 1-bit signless integer values |
| other | floating-point or ranked tensor of floating-point values or integer or ranked tensor of integer values or ptr or ranked tensor of ptr values or ptr |
Results:

| Result | Description |
|---|---|
| token | async token type |
triton_gpu.async_wait (triton::gpu::AsyncWaitOp)
Async wait
Syntax:
operation ::= `triton_gpu.async_wait` $asyncToken attr-dict
Traits: VerifyTensorLayoutsTrait
Interfaces: InferTypeOpInterface
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| num | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:

| Operand | Description |
|---|---|
| asyncToken | variadic of async token type |
Results:

| Result | Description |
|---|---|
| retToken | async token type |
triton_gpu.convert_layout (triton::gpu::ConvertLayoutOp)
Convert layout
Syntax:
operation ::= `triton_gpu.convert_layout` $src attr-dict `:` type($src) `->` type($result)
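A minimal sketch of a layout conversion; #blocked and #mma are assumed to be previously defined distributed layout encodings, chosen only for illustration.

```mlir
// Re-distribute the same 128x64 tensor across threads using a different layout.
%converted = triton_gpu.convert_layout %src
               : tensor<128x64xf16, #blocked> -> tensor<128x64xf16, #mma>
```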
Traits: AlwaysSpeculatableImplTrait
, SameOperandsAndResultElementType
, SameOperandsAndResultShape
, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:

| Operand | Description |
|---|---|
| src | ranked tensor of floating-point or integer or ptr values |
Results:

| Result | Description |
|---|---|
| result | ranked tensor of floating-point or integer or ptr values |
triton_gpu.global_scratch_alloc (triton::gpu::GlobalScratchAllocOp)
Allocate a global memory buffer
Syntax:
operation ::= `triton_gpu.global_scratch_alloc` attr-dict `:` qualified(type($result))
This operation allocates a buffer in global memory that is private to the current program.
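A minimal sketch of a scratch allocation; the pointee type and address-space spelling of the result pointer are illustrative assumptions.

```mlir
// Reserve 1024 bytes of program-private global scratch memory, 128-byte aligned.
%scratch = triton_gpu.global_scratch_alloc {nbytes = 1024 : i32, alignment = 128 : i32} : !tt.ptr<i8>
```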
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Allocate on ::mlir::triton::GlobalMemory}
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| nbytes | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| alignment | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results:

| Result | Description |
|---|---|
| result | ptr |
triton_gpu.local_alloc (triton::gpu::LocalAllocOp)
Allocate tensor
Syntax:
operation ::= `triton_gpu.local_alloc` $src attr-dict `:` functional-type(operands, results)
This operation allocates a buffer in shared memory and returns a descriptor containing the address and a view of the buffer.
Explicitly deallocating a buffer is optional; see local_dealloc.
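A minimal sketch of an allocation initialized from a distributed tensor; #blocked and #shared are assumed layout encodings and the !tt.memdesc spelling is illustrative.

```mlir
// Allocate a shared-memory buffer holding the contents of %cst.
%buf = triton_gpu.local_alloc %cst : (tensor<32x32xf16, #blocked>) -> !tt.memdesc<32x32xf16, #shared>
```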
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| alignment | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:

| Operand | Description |
|---|---|
| src | ranked tensor of floating-point or integer or ptr values |
Results:

| Result | Description |
|---|---|
| result | memory descriptor type |
triton_gpu.local_dealloc (triton::gpu::LocalDeallocOp)
Dealloc buffer
Syntax:
operation ::= `triton_gpu.local_dealloc` $src attr-dict `:` qualified(type($src))
This operation deallocates a buffer explicitly. Using the buffer after this operation is undefined.
This operation is optional. If you don’t explicitly dealloc a buffer, the compiler assumes it’s deallocated at the first point that post-dominates all uses of the alloc.
Because we assume a memdesc is dead at the first point that post-dominates its uses, ops that wait for an async operation on a memdesc to complete (such as triton_nvidia_gpu.warp_group_dot_wait) should also take the memdesc as an operand.
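A minimal sketch of an explicit deallocation paired with its allocation; #blocked and #shared are assumed layout encodings and the !tt.memdesc spelling is illustrative.

```mlir
%buf = triton_gpu.local_alloc %cst : (tensor<32x32xf16, #blocked>) -> !tt.memdesc<32x32xf16, #shared>
// ... uses of %buf ...
triton_gpu.local_dealloc %buf : !tt.memdesc<32x32xf16, #shared>
```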
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Free on ::mlir::triton::gpu::SharedMemory}
Operands:

| Operand | Description |
|---|---|
| src | memory descriptor type |
triton_gpu.local_load (triton::gpu::LocalLoadOp)
Load a buffer from local memory into a distributed tensor
Syntax:
operation ::= `triton_gpu.local_load` $src (`token` $token^)? attr-dict `:` qualified(type($src)) `->` type($result)
Load a tensor from the local memory descriptor into a distributed tensor.
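A minimal sketch of the non-token form (the optional `token` operand is shown in the async copy example above); #blocked and #shared are assumed layout encodings and the !tt.memdesc spelling is illustrative.

```mlir
// Read a shared-memory buffer back into a register-distributed tensor.
%vals = triton_gpu.local_load %buf : !tt.memdesc<16x16xf16, #shared> -> tensor<16x16xf16, #blocked>
```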
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface
Operands:

| Operand | Description |
|---|---|
| src | memory descriptor type |
| token | async token type |
Results:

| Result | Description |
|---|---|
| result | ranked tensor of floating-point or integer or ptr values |
triton_gpu.local_store (triton::gpu::LocalStoreOp)
Store a distributed tensor into a buffer in local memory
Syntax:
operation ::= `triton_gpu.local_store` $src `,` $dst attr-dict `:` type($src) `->` qualified(type($dst))
Store a distributed tensor into a buffer in local memory.
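A minimal sketch of a store into a previously allocated buffer; #blocked and #shared are assumed layout encodings and the !tt.memdesc spelling is illustrative.

```mlir
// Write the register-distributed tensor %vals into the shared-memory buffer %buf.
triton_gpu.local_store %vals, %buf : tensor<16x16xf16, #blocked> -> !tt.memdesc<16x16xf16, #shared>
```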
Traits: VerifyTensorLayoutsTrait
Interfaces: MemoryEffectOpInterface
Operands:

| Operand | Description |
|---|---|
| src | ranked tensor of floating-point or integer or ptr values |
| dst | memory descriptor type |
triton_gpu.memdesc_subview (triton::gpu::MemDescSubviewOp)
Take a subview of the descriptor.
Syntax:
operation ::= `triton_gpu.memdesc_subview` $src `[` $offsets `]` attr-dict `:` qualified(type($src)) `->` qualified(type($result))
This operation returns a new descriptor representing a subview of the buffer. It doesn’t affect the underlying memory. The subview can be rank-reduced.
For example, suppose that:
- the input shape is 2x4x16xf16,
- the output shape is 4x4xf16, and
- offsets = [1, 0, 4].
Then, in Python syntax, the subview covers input[1][0:4][4:8] (see the sketch after this list).
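A minimal sketch of that example in IR form; #shared is an assumed shared layout encoding (the result encoding of a rank-reduced subview may differ in practice) and the !tt.memdesc spelling is illustrative.

```mlir
// Rank-reducing subview of a 2x4x16xf16 buffer at offsets [1, 0, 4],
// covering input[1][0:4][4:8].
%c0 = arith.constant 0 : i32
%c1 = arith.constant 1 : i32
%c4 = arith.constant 4 : i32
%view = triton_gpu.memdesc_subview %src[%c1, %c0, %c4]
          : !tt.memdesc<2x4x16xf16, #shared> -> !tt.memdesc<4x4xf16, #shared>
```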
Traits: AlwaysSpeculatableImplTrait
, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:

| Operand | Description |
|---|---|
| src | memory descriptor type |
| offsets | variadic of 32-bit signless integer |
Results:

| Result | Description |
|---|---|
| result | memory descriptor type |
triton_gpu.memdesc_trans (triton::gpu::MemDescTransOp)
Transpose the descriptor
Syntax:
operation ::= `triton_gpu.memdesc_trans` $src attr-dict `:` qualified(type($src)) `->` qualified(type($result))
This operation returns a new descriptor representing a transposed view of the buffer.
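A minimal sketch of a 2-D transpose of a descriptor; #shared and #shared_trans are assumed shared layout encodings (the result encoding reflects the permuted order) and the !tt.memdesc spelling is illustrative.

```mlir
// View a 64x32 shared-memory buffer as its 32x64 transpose; no data moves.
%t = triton_gpu.memdesc_trans %src {order = array<i32: 1, 0>}
       : !tt.memdesc<64x32xf16, #shared> -> !tt.memdesc<32x64xf16, #shared_trans>
```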
Traits: AlwaysSpeculatableImplTrait
, SameOperandsAndResultElementType
, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable
, InferTypeOpInterface
, NoMemoryEffect (MemoryEffectOpInterface)
, TransposeOpInterface
Effects: MemoryEffects::Effect{}
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| order | ::mlir::DenseI32ArrayAttr | i32 dense array attribute |
Operands:

| Operand | Description |
|---|---|
| src | memory descriptor type |
Results:

| Result | Description |
|---|---|
| result | memory descriptor type |
triton_gpu.upcast_mxfp (triton::gpu::UpcastMXFPOp)
Convert an mxfp tensor to bf16
Syntax:
operation ::= `triton_gpu.upcast_mxfp` $src `,` $scale `fp_type` `=` $fp_type attr-dict `:` type($src) `,` type($scale) `->` type($result)
Compute the bf16 encoded in the given mxfp number as per https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
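A minimal sketch of an e2m1 (fp4) upcast; the shapes assume two fp4 values packed per i8 and one scale per 32 values, and #dot_op and #scale_layout are assumed layout encodings, so treat the whole snippet as illustrative.

```mlir
// Decode a packed e2m1 operand into bf16 using per-group scales.
%hi = triton_gpu.upcast_mxfp %lo, %scale fp_type = e2m1
        : tensor<128x32xi8, #dot_op>, tensor<128x2xi8, #scale_layout> -> tensor<128x64xbf16, #dot_op>
```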
Traits: AlwaysSpeculatableImplTrait
, VerifyTensorLayoutsTrait
Interfaces: ConditionallySpeculatable
, InferTypeOpInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:

| Attribute | MLIR Type | Description |
|---|---|---|
| fp_type | ::mlir::triton::ScaleDotElemTypeAttr | allowed 32-bit signless integer cases: 0, 1, 2, 3, 4, 5. Enum cases: e4m3 (`E4M3`), e5m2 (`E5M2`), e2m3 (`E2M3`), e3m2 (`E3M2`), e2m1 (`E2M1`), bf16 (`BF16`) |
Operands:

| Operand | Description |
|---|---|
| src | ranked tensor of floating-point or integer or ptr values |
| scale | ranked tensor of floating-point or integer or ptr values |
Results:

| Result | Description |
|---|---|
| result | ranked tensor of floating-point or integer or ptr values |