# NVWSOps
### `nvws.aref.buffer` (triton::nvws::ArefBufferOp)
_Get buffer from aref_
Syntax:
```
operation ::= `nvws.aref.buffer` $aref (`[` $stage^ `]`)? `,` $token attr-dict
`:` type($aref) `,` type($token) `->` type(results)
```
Interfaces: `ArefStageInterface`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `aref` | Asynchronous Reference |
| `token` | async token type |
| `stage` | 32-bit signless integer |
#### Results:
| Result | Description |
| :----: | ----------- |
| `buffers` | variadic of memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
### `nvws.aref.create` (triton::nvws::ArefCreateOp)
_Create an asynchronous reference._
Syntax:
```
operation ::= `nvws.aref.create` $buffers attr-dict `:` type($result)
```
Create an asynchronous reference.
Takes a variadic number of buffers as inputs and returns an ARef.
The inputs are expected to be array-like (e.g., Tensor, MemDesc),
and the first axis of the shape should match across all inputs; it represents
the multi-buffering of the values.
Traits: `AlwaysSpeculatableImplTrait`
Interfaces: `ConditionallySpeculatable`, `NoMemoryEffect (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `buffers` | variadic of memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | Asynchronous Reference |
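
The shape constraint described above can be illustrated with a small Python model. This is only a sketch of the rule "the first axis must match across all inputs"; the helper name `check_multibuffer_shapes` is hypothetical and not part of Triton or NVWS, and rejecting an empty buffer list is an assumption of the sketch, not something the op documentation states.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def check_multibuffer_shapes(shapes: Sequence[Shape]) -> int:
    """Return the shared leading dimension (the multi-buffering depth).

    Hypothetical helper: models the documented expectation that all
    buffers passed to `nvws.aref.create` agree on their first axis.
    """
    if not shapes:
        # Assumption of this sketch: at least one buffer is required.
        raise ValueError("expected at least one buffer")
    for i, shape in enumerate(shapes):
        if len(shape) == 0:
            raise ValueError(f"buffer {i} is not array-like (rank 0)")
    depth = shapes[0][0]
    for i, shape in enumerate(shapes):
        if shape[0] != depth:
            raise ValueError(
                f"buffer {i} has leading dimension {shape[0]}, expected {depth}"
            )
    return depth


# Two buffers that are both triple-buffered along axis 0.
num_stages = check_multibuffer_shapes([(3, 128, 64), (3, 64, 256)])
```

Here `num_stages` is the common leading dimension; a mismatch such as `[(3, 128), (2, 128)]` raises `ValueError` in this model.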
### `nvws.aref.get.enter` (triton::nvws::ArefGetEnterOp)
_Enter ArefGet region where the buffer can be used to read data_
Syntax:
```
operation ::= `nvws.aref.get.enter` $aref ( `[` $stage^ `,` $phase `]`)? attr-dict
`:` type($aref) `->` type(results)
```
Enter a "region" where you can freely read from the buffer.
These ArefGet "regions" can span multiple iterations.
Traits: `AttrSizedOperandSegments`
Interfaces: `ArefStageInterface`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `aref` | Asynchronous Reference |
| `stage` | 32-bit signless integer |
| `phase` | 32-bit signless integer |
#### Results:
| Result | Description |
| :----: | ----------- |
| `buffers` | variadic of memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
| `token` | async token type |
### `nvws.aref.get.exit` (triton::nvws::ArefGetExitOp)
_Exit ArefGet region, where the buffer should no longer be used_
Syntax:
```
operation ::= `nvws.aref.get.exit` $aref (`[` $stage^ `]`)? `,` $token $async_ops attr-dict
`:` type($aref) `,` type($token)
```
Leave the region where you can freely read from the buffer.
These ArefGet "regions" can span multiple iterations.
Interfaces: `ArefStageInterface`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `async_ops` | `::mlir::ArrayAttr` | array of async op attributes |
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `aref` | Asynchronous Reference |
| `token` | async token type |
| `stage` | 32-bit signless integer |
### `nvws.aref.put.enter` (triton::nvws::ArefPutEnterOp)
_Enter ArefPut region where the buffer can be used to write data_
Syntax:
```
operation ::= `nvws.aref.put.enter` $aref ( `[` $stage^ `,` $phase `]`)? attr-dict
`:` type($aref) `->` type(results)
```
Enter a "region" where you can freely write to the buffer.
These ArefPut "regions" can span multiple iterations.
Traits: `AttrSizedOperandSegments`
Interfaces: `ArefStageInterface`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `aref` | Asynchronous Reference |
| `stage` | 32-bit signless integer |
| `phase` | 32-bit signless integer |
#### Results:
| Result | Description |
| :----: | ----------- |
| `buffers` | variadic of memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
| `token` | async token type |
### `nvws.aref.put.exit` (triton::nvws::ArefPutExitOp)
_Exit ArefPut region, where the buffer should no longer be used_
Syntax:
```
operation ::= `nvws.aref.put.exit` $aref (`[` $stage^ `]`)? `,` $token $async_ops attr-dict
`:` type($aref) `,` type($token)
```
Leave the region where you can freely write to the buffer.
These ArefPut "regions" can span multiple iterations.
Interfaces: `ArefStageInterface`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `async_ops` | `::mlir::ArrayAttr` | array of async op attributes |
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `aref` | Asynchronous Reference |
| `token` | async token type |
| `stage` | 32-bit signless integer |
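
To make the relationship between the four enter/exit ops easier to picture, here is a conceptual Python model of one aref with several stages. It is a sketch under assumptions: the per-stage state machine (empty, writing, full, reading) and the class `ArefModel` are hypothetical and are not taken from the NVWS implementation, and where the real ops would presumably wait, the model raises an error instead.

```python
from enum import Enum


class SlotState(Enum):
    EMPTY = "empty"
    WRITING = "writing"
    FULL = "full"
    READING = "reading"


class ArefModel:
    """Hypothetical model of a multi-buffered aref (not the real lowering)."""

    def __init__(self, depth: int, width: int):
        self.slots = [[0] * width for _ in range(depth)]
        self.state = [SlotState.EMPTY] * depth

    def _move(self, stage: int, expected: SlotState, new: SlotState) -> None:
        if self.state[stage] is not expected:
            raise RuntimeError(
                f"stage {stage} is {self.state[stage].value}, "
                f"expected {expected.value}"
            )
        self.state[stage] = new

    def put_enter(self, stage: int) -> list:
        # Models nvws.aref.put.enter: the slot may now be written.
        self._move(stage, SlotState.EMPTY, SlotState.WRITING)
        return self.slots[stage]

    def put_exit(self, stage: int) -> None:
        # Models nvws.aref.put.exit: the slot must no longer be written.
        self._move(stage, SlotState.WRITING, SlotState.FULL)

    def get_enter(self, stage: int) -> list:
        # Models nvws.aref.get.enter: the slot may now be read.
        self._move(stage, SlotState.FULL, SlotState.READING)
        return self.slots[stage]

    def get_exit(self, stage: int) -> None:
        # Models nvws.aref.get.exit: the slot must no longer be read.
        self._move(stage, SlotState.READING, SlotState.EMPTY)


aref = ArefModel(depth=2, width=4)
consumed = []
for i in range(4):
    stage = i % 2                 # cycle through the two stages
    buf = aref.put_enter(stage)
    buf[:] = [i] * 4              # write inside the put region
    aref.put_exit(stage)
    view = aref.get_enter(stage)
    consumed.append(sum(view))    # read inside the get region
    aref.get_exit(stage)
```

The model runs producer and consumer steps in one thread for simplicity; in a warp-specialized kernel the put and get regions would live in different warp groups.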
### `nvws.consumer_release` (triton::nvws::ConsumerReleaseOp)
_Consumer releases the token_
Syntax:
```
operation ::= `nvws.consumer_release` $token `,` $idx attr-dict `:` type(operands)
```
The consumer will release the token and signal the producer
that the buffers are ready to be filled.
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
### `nvws.consumer_wait` (triton::nvws::ConsumerWaitOp)
_Consumer awaits buffer readiness_
Syntax:
```
operation ::= `nvws.consumer_wait` $token `,` $idx `,` $phase attr-dict `:` type(operands)
```
The consumer waits until the buffers are ready
to be consumed. If the buffers are not ready, the consumer waits to be
signalled by the producer, which does so once it finishes filling the buffers and
releases the token.
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
| `phase` | 1-bit signless integer |
### `nvws.create_token` (triton::nvws::CreateTokenOp)
_Create a token to be used for synchronizations in communication channels_
Syntax:
```
operation ::= `nvws.create_token` attr-dict `:` type($result)
```
A token is used by the producer and consumer to synchronize.
The producer acquires and holds the token until it has filled the buffers,
then signals the waiting consumer.
The consumer holds the token until it has consumed the buffers,
then signals the producer that is waiting to acquire the token.
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `numBuffers` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |
| `loadType` | `::mlir::triton::nvws::TokenLoadTypeAttr` | allowed 32-bit signless integer cases: 0, 1, 2, 3, 4 |
#### Results:
| Result | Description |
| :----: | ----------- |
| `result` | tensor of values |
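
The handshake described above resembles a classic bounded-buffer protocol. The following Python sketch models it with two semaphores per buffer slot. The class `TokenModel` and its method names are hypothetical and merely mirror the op names `nvws.producer_acquire`, `nvws.producer_commit`, `nvws.consumer_wait`, and `nvws.consumer_release`; the `phase` operand is deliberately left out, and nothing here reflects how the ops are actually lowered.

```python
import threading


class TokenModel:
    """Hypothetical model of a token guarding `num_buffers` slots."""

    def __init__(self, num_buffers: int):
        # Every slot starts out empty, so the producer may fill it.
        self.empty = [threading.Semaphore(1) for _ in range(num_buffers)]
        self.full = [threading.Semaphore(0) for _ in range(num_buffers)]

    def producer_acquire(self, idx: int) -> None:
        self.empty[idx].acquire()   # wait until the slot may be filled

    def producer_commit(self, idx: int) -> None:
        self.full[idx].release()    # signal: slot is ready to be consumed

    def consumer_wait(self, idx: int) -> None:
        self.full[idx].acquire()    # wait until the slot has been filled

    def consumer_release(self, idx: int) -> None:
        self.empty[idx].release()   # signal: slot is ready to be refilled


NUM_BUFFERS = 2
NUM_ITEMS = 6
token = TokenModel(NUM_BUFFERS)
buffers = [None] * NUM_BUFFERS
received = []


def producer() -> None:
    for i in range(NUM_ITEMS):
        idx = i % NUM_BUFFERS
        token.producer_acquire(idx)
        buffers[idx] = i * i        # fill the buffer
        token.producer_commit(idx)


def consumer() -> None:
    for i in range(NUM_ITEMS):
        idx = i % NUM_BUFFERS
        token.consumer_wait(idx)
        received.append(buffers[idx])   # consume the buffer
        token.consumer_release(idx)


threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the producer cannot refill a slot before the consumer releases it, the consumer in this model observes the values in production order regardless of thread scheduling.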
### `nvws.descriptor_gather` (triton::nvws::DescriptorGatherOp)
_Gather multiple rows from a descriptor into shared memory_
Syntax:
```
operation ::= `nvws.descriptor_gather` $desc `[` $x_offsets `,` $y_offset `]` $txCount $result
attr-dict `:` type(operands)
```
This op behaves exactly like the op with the same name in the Triton dialect, except that the result of the load is stored into shared memory.
The execution is still synchronous.
Interfaces: `NVWS_DescriptorLoadOpInterface`, `TT_DescriptorOpInterface`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `txCount` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `desc` | Tensor descriptor type (`::mlir::triton::TensorDescType`) in Triton IR type system |
| `x_offsets` | ranked tensor of 32-bit signless integer values |
| `y_offset` | 32-bit signless integer |
| `result` | memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
### `nvws.descriptor_load` (triton::nvws::DescriptorLoadOp)
_Load from descriptor and store into shared memory_
Syntax:
```
operation ::= `nvws.descriptor_load` $desc `[` $indices `]` $txCount $result
oilist(
`cacheModifier` `=` $cache |
`evictionPolicy` `=` $evict
)
attr-dict `:` type(operands)
```
This op behaves exactly like the op with the same name in the Triton dialect, except that the result of the load is stored into shared memory.
The execution is still synchronous.
Interfaces: `NVWS_DescriptorLoadOpInterface`, `TT_DescriptorOpInterface`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `txCount` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |
| `cache` | `::mlir::triton::CacheModifierAttr` | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6, 7 |
| `evict` | `::mlir::triton::EvictionPolicyAttr` | allowed 32-bit signless integer cases: 1, 2, 3 |
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `desc` | Tensor descriptor type (`::mlir::triton::TensorDescType`) in Triton IR type system |
| `indices` | variadic of 32-bit signless integer |
| `result` | memory descriptor type (`::mlir::triton::gpu::MemDescType`) in Triton IR type system |
### `nvws.producer_acquire` (triton::nvws::ProducerAcquireOp)
_Producer acquires a token to fill buffers_
Syntax:
```
operation ::= `nvws.producer_acquire` $token `,` $idx `,` $phase attr-dict `:` type(operands)
```
The producer tries to acquire the token before filling
the buffers. If the buffers are not ready to be filled, the producer waits to be
signalled by the consumer, which does so once it finishes consuming the buffers and
releases the token.
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
| `phase` | 1-bit signless integer |
### `nvws.producer_commit` (triton::nvws::ProducerCommitOp)
_Producer commits the buffer changes_
Syntax:
```
operation ::= `nvws.producer_commit` $token `,` $idx attr-dict `:` type(operands)
```
The producer will release the token and signal the consumer
that the buffers are ready to be consumed.
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
### `nvws.warp_group` (triton::nvws::WarpGroupOp)
_Container Op for Warp Specialization_
Higher-level container for warp specialization analysis.
Contains a variadic number of warp groups, each with
its number of warps and a region that holds the
computation for that warp group.
The results of this op, if any, are those of the first region, as returned by
the `nvws.warp_group.yield` op.
`nvws.warp_group` should be lowered to `ttg.warp_specialize`
before execution.
Traits: `RecursiveMemoryEffects`, `RecursivelySpeculatableImplTrait`
Interfaces: `ConditionallySpeculatable`
#### Attributes:
| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `numWarps` | `::mlir::DenseI32ArrayAttr` | i32 dense array attribute |
#### Results:
| Result | Description |
| :----: | ----------- |
| `results` | variadic of any type |
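
The container structure can be pictured with a small Python model: one `numWarps` entry per region, results taken from the first region only, and the remaining regions communicating through shared state. This is a sketch under assumptions. The class `WarpGroupModel` is hypothetical, the one-entry-per-region check is inferred from the description rather than quoted from the verifier, and the regions are run sequentially here although real warp groups execute concurrently.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class WarpGroupModel:
    """Hypothetical model of the nvws.warp_group container."""

    num_warps: Sequence[int]
    regions: Sequence[Callable[[], List[int]]]

    def __post_init__(self) -> None:
        # Assumption: exactly one numWarps entry per region.
        if len(self.num_warps) != len(self.regions):
            raise ValueError("need one numWarps entry per region")

    def run(self) -> List[int]:
        outputs = [region() for region in self.regions]
        # Only the first region contributes results (warp_group.yield);
        # the others end in warp_group.return and yield nothing.
        return outputs[0]


shared = [0] * 4  # stands in for memory that regions modify by reference


def first_region() -> List[int]:
    return [42]   # values yielded by the first region


def store_region() -> List[int]:
    for i in range(len(shared)):
        shared[i] = i + 1   # side effect instead of a returned value
    return []


op = WarpGroupModel(num_warps=[4, 4], regions=[first_region, store_region])
results = op.run()
total_warps = sum(op.num_warps)
```

In this model `results` holds only what the first region yields, while the second region's work is visible solely through `shared`.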
### `nvws.warp_group.return` (triton::nvws::WarpGroupReturnOp)
_Terminator for a warp group region_
Syntax:
```
operation ::= `nvws.warp_group.return` attr-dict
```
Warp groups are expected to return values via referential modification
of their inputs. Thus, the warp_group.return op takes no values to
return from the warp group.
Traits: `AlwaysSpeculatableImplTrait`, `HasParent`, `Terminator`
Interfaces: `ConditionallySpeculatable`, `NoMemoryEffect (MemoryEffectOpInterface)`
Effects: `MemoryEffects::Effect{}`
### `nvws.warp_group.yield` (triton::nvws::WarpGroupYieldOp)
_Yield from the first region of `nvws.warp_group`_
Syntax:
```
operation ::= `nvws.warp_group.yield` ($values^)? attr-dict (`:` type($values)^)?
```
This op plays the same role for `nvws.warp_group` that `ttg.warp_yield` plays for `ttg.warp_specialize`.
TODO: Decide if we should move nvws.warp_group to TritonGPU, or continue to
have TritonGPU depend on NVWS. In the former case, this op can be removed.
The latter one involves a circular dependency between TritonGPU and NVWS.
Traits: `AlwaysSpeculatableImplTrait`, `HasParent`, `ReturnLike`, `Terminator`
Interfaces: `ConditionallySpeculatable`, `NoMemoryEffect (MemoryEffectOpInterface)`, `RegionBranchTerminatorOpInterface`
Effects: `MemoryEffects::Effect{}`
#### Operands:
| Operand | Description |
| :-----: | ----------- |
| `values` | variadic of any type |