# TritonGPUOps
### `triton_gpu.async_commit_group` (triton::gpu::AsyncCommitGroupOp)
_Async commit group_
Syntax:
```
operation ::= `triton_gpu.async_commit_group` $inputTokens attr-dict
```
Traits: `VerifyTensorLayoutsTrait`
Interfaces: `InferTypeOpInterface`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `inputTokens` | variadic of async token type
#### Results:
| Result | Description |
| :----: | ----------- |
| `asyncToken` | async token type
### `triton_gpu.async_copy_global_to_local` (triton::gpu::AsyncCopyGlobalToLocalOp)
_Copy data from global memory to local memory asynchronously_
Syntax:
```
operation ::= `triton_gpu.async_copy_global_to_local` $src `,` $result (`mask` $mask^)? (`other` $other^)?
oilist(`cacheModifier` `=` $cache | `evictionPolicy` `=` $evict)
attr-dict `:` type($src) `->` type($result)
```
This operation copies data from global memory to local memory asynchronously.
It is analogous to tt.load, except that the data are copied to the local memory
referenced by the memory descriptor instead of into a distributed tensor. The
remaining operands are the same as for tt.load.
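The role of `mask` and `other` can be illustrated with a small element-wise model in Python. This is only a sketch of the tt.load-style semantics the description refers to: the helper name `masked_copy` and the flat-list buffers are hypothetical, the asynchronous behaviour and the returned token are not modelled, and leaving masked-off elements untouched when `other` is absent is an assumption of the model.

```python
def masked_copy(src, dst, mask=None, other=None):
    """Hypothetical element-wise model of a masked global-to-local copy."""
    for i, value in enumerate(src):
        if mask is None or mask[i]:
            # Element is enabled: copy it from the source.
            dst[i] = value
        elif other is not None:
            # Element is masked off: fall back to the `other` value.
            dst[i] = other[i]
        # Masked off with no `other`: this model leaves dst[i] unchanged.
    return dst


global_mem = [10, 20, 30, 40]
local_buf = [0, 0, 0, 0]
masked_copy(global_mem, local_buf,
            mask=[True, False, True, False],
            other=[-1, -1, -1, -1])
print(local_buf)  # [10, -1, 30, -1]
```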
Traits: `AttrSizedOperandSegments`, `VerifyTensorLayoutsTrait`
Interfaces: `InferTypeOpInterface`, `MemoryEffectOpInterface (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::GlobalMemory, MemoryEffects::Write on ::mlir::triton::gpu::SharedMemory}`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | --------- | ----------- |
| `cache` | ::mlir::triton::CacheModifierAttr | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6. Enum cases: none (`NONE`), ca (`CA`), cg (`CG`), wb (`WB`), cs (`CS`), wt (`WT`)
| `evict` | ::mlir::triton::EvictionPolicyAttr | allowed 32-bit signless integer cases: 1, 2, 3. Enum cases: evict_normal (`NORMAL`), evict_first (`EVICT_FIRST`), evict_last (`EVICT_LAST`)
| `isVolatile` | ::mlir::BoolAttr | bool attribute
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | ranked tensor of ptr values
| `result` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system
| `mask` | tensor of 1-bit signless integer values
| `other` | floating-point or ranked tensor of floating-point values or integer or ranked tensor of integer values or ptr or ranked tensor of ptr values
#### Results:
| Result | Description |
| :----: | ----------- |
| `token` | async token type
### `triton_gpu.async_wait` (triton::gpu::AsyncWaitOp)
_Async wait_
Syntax:
```
operation ::= `triton_gpu.async_wait` $asyncToken attr-dict
```
Traits: `VerifyTensorLayoutsTrait`
Interfaces: `InferTypeOpInterface`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | --------- | ----------- |
| `num` | ::mlir::IntegerAttr | 32-bit signless integer attribute
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `asyncToken` | variadic of async token type
#### Results:
| Result | Description |
| :----: | ----------- |
| `retToken` | async token type
### `triton_gpu.convert_layout` (triton::gpu::ConvertLayoutOp)
_Convert layout_
Syntax:
```
operation ::= `triton_gpu.convert_layout` $src attr-dict `:` type($src) `->` type($result)
```
Traits: `AlwaysSpeculatableImplTrait`, `SameOperandsAndResultElementType`, `SameOperandsAndResultShape`, `VerifyTensorLayoutsTrait`
Interfaces: `ConditionallySpeculatable`, `NoMemoryEffect (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | ranked tensor of floating-point or integer or ptr values
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | ranked tensor of floating-point or integer or ptr values
### `triton_gpu.local_alloc` (triton::gpu::LocalAllocOp)
_Allocate tensor_
Syntax:
```
operation ::= `triton_gpu.local_alloc` $src attr-dict `:` functional-type(operands, results)
```
This operation allocates a buffer in shared memory and returns a descriptor
containing the address and a view of the buffer.
Explicitly deallocating a buffer is optional; see local_dealloc.
Traits: `VerifyTensorLayoutsTrait`
Interfaces: `MemoryEffectOpInterface`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | ranked tensor of floating-point or integer or ptr values
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system
### `triton_gpu.local_dealloc` (triton::gpu::LocalDeallocOp)
_Dealloc buffer_
Syntax:
```
operation ::= `triton_gpu.local_dealloc` $src attr-dict `:` qualified(type($src))
```
This operation deallocates a buffer explicitly. Using the buffer after this
operation is undefined.
This operation is optional. If you don't explicitly dealloc a buffer, the
compiler assumes it's deallocated at the first point that post-dominates all
uses of the alloc.
Because we assume a memdesc is dead at the first point that post-dominates
its uses, ops that wait for an async operation on a memdesc to complete
(such as triton_nvidia_gpu.dot_wait) should also take the memdesc as an
operand.
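To make the "first point that post-dominates all uses" rule concrete, the following Python sketch computes it at basic-block granularity on a toy control-flow graph. It is an illustrative model only, not the compiler's implementation: the function names, the dictionary-based CFG, and the block names are all hypothetical, and every block other than the exit is assumed to have at least one successor.

```python
def post_dominators(succ, exit_block):
    """Return {block: set of blocks that post-dominate it}."""
    blocks = set(succ)
    pdom = {b: set(blocks) for b in blocks}
    pdom[exit_block] = {exit_block}
    changed = True
    while changed:
        changed = False
        for b in blocks - {exit_block}:
            # A block is post-dominated by itself plus whatever
            # post-dominates every one of its successors.
            new = {b} | set.intersection(*(pdom[s] for s in succ[b]))
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom


def implicit_dealloc_block(succ, exit_block, use_blocks):
    """Block holding the first point that post-dominates all uses."""
    pdom = post_dominators(succ, exit_block)
    common = set.intersection(*(pdom[u] for u in use_blocks))
    # Common post-dominators form a chain; the earliest one is itself
    # post-dominated by all the others, so its set is the largest.
    return max(common, key=lambda b: len(pdom[b]))


# Diamond CFG: the buffer is used on both branches.
cfg = {
    "entry": ["then", "else"],
    "then": ["merge"],
    "else": ["merge"],
    "merge": ["exit"],
    "exit": [],
}
dealloc_block = implicit_dealloc_block(cfg, "exit", ["then", "else"])
print(dealloc_block)  # merge
```

In this model a buffer used on both branches is considered dead only once control reaches `merge`. An asynchronous operation that is still reading the buffer at that point would not be counted as a use unless the op that waits on it also takes the memdesc as an operand, which is the situation the paragraph above describes.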
Traits: `VerifyTensorLayoutsTrait`
Interfaces: `MemoryEffectOpInterface (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{MemoryEffects::Free on ::mlir::triton::gpu::SharedMemory}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system
### `triton_gpu.local_load` (triton::gpu::LocalLoadOp)
_Load a buffer from local memory into a distributed tensor_
Syntax:
```
operation ::= `triton_gpu.local_load` $src (`token` $token^)? attr-dict `:` qualified(type($src)) `->` type($result)
```
Load a tensor from the local memory descriptor into a distributed tensor.
Traits: `VerifyTensorLayoutsTrait`
Interfaces: `MemoryEffectOpInterface (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::gpu::SharedMemory}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system
| `token` | async token type
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | ranked tensor of floating-point or integer or ptr values
### `triton_gpu.memdesc_subview` (triton::gpu::MemDescSubviewOp)
_Take a subview of the descriptor._
Syntax:
```
operation ::= `triton_gpu.memdesc_subview` $src `[` $offsets `]` attr-dict `:` qualified(type($src)) `->` qualified(type($result))
```
This operation returns a new descriptor representing a subview of the buffer.
It doesn't affect the underlying memory. The subview can be rank-reduced.
For example, suppose that
- the input shape is 2x4x16xf16,
- the output shape is 4x4xf16, and
- offsets = [1, 0, 4].
Then, in NumPy-style indexing, the subview covers input[1, 0:4, 4:8].
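The rank-reduced example can be reproduced with a short Python sketch that derives the covered region from the shapes and offsets. The helpers `subview_slices` and `take_subview` are hypothetical names, and the model assumes that rank reduction drops leading dimensions, which is consistent with the example but is not stated by the op description. Unlike the real op, which returns a descriptor aliasing the same memory, this sketch copies the elements.

```python
def subview_slices(input_shape, output_shape, offsets):
    """Per-dimension index (dropped dim) or (start, stop) range (kept dim)."""
    rank_diff = len(input_shape) - len(output_shape)
    slices = []
    for dim, offset in enumerate(offsets):
        if dim < rank_diff:
            # Assumed: leading dimensions are dropped, selecting one index.
            slices.append(offset)
        else:
            extent = output_shape[dim - rank_diff]
            slices.append((offset, offset + extent))
    return slices


def take_subview(data, slices):
    """Apply the slices to a nested list (copies; the real op aliases)."""
    if not slices:
        return data
    head, rest = slices[0], slices[1:]
    if isinstance(head, int):
        return take_subview(data[head], rest)
    lo, hi = head
    return [take_subview(row, rest) for row in data[lo:hi]]


slices = subview_slices((2, 4, 16), (4, 4), [1, 0, 4])
print(slices)  # [1, (0, 4), (4, 8)]

# Each element records its own coordinates so the covered region is visible.
data = [[[(i, j, k) for k in range(16)] for j in range(4)] for i in range(2)]
view = take_subview(data, slices)
print(view[0][0], view[3][3])  # (1, 0, 4) (1, 3, 7)
```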
Traits: `AlwaysSpeculatableImplTrait`, `VerifyTensorLayoutsTrait`
Interfaces: `ConditionallySpeculatable`, `NoMemoryEffect (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `src` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system
| `offsets` | variadic of 32-bit signless integer
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | memory descriptor type (`::mlir::triton::MemDescType`) in Triton IR type system