NVWSOps

nvws.aref.create (triton::nvws::ArefCreateOp)

Create an asynchronous reference.

Syntax:

operation ::= `nvws.aref.create` $buffers attr-dict `:` type($result)

Takes a variadic number of buffers as input and returns an ARef. The inputs are expected to be array-like (e.g., Tensor, MemDesc), and the first axis of the shape should match across all inputs; that axis represents the multi-buffering of the values.
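The shape constraint on the inputs can be illustrated with a small Python check. This is only a sketch of the rule stated above, not the dialect's verifier: the function name `aref_depth` is hypothetical, and rejecting an empty buffer list is a choice made for the sketch.

```python
from typing import Sequence


def aref_depth(buffer_shapes: Sequence[Sequence[int]]) -> int:
    """Return the shared leading dimension (multi-buffering depth).

    Hypothetical helper: mirrors the rule that the first axis of every
    buffer passed to nvws.aref.create should match.
    """
    if not buffer_shapes:
        # Sketch-only choice: an empty list has no depth to report.
        raise ValueError("at least one buffer shape is required")
    depths = set()
    for shape in buffer_shapes:
        if len(shape) == 0:
            raise ValueError("buffers must have rank >= 1")
        depths.add(shape[0])
    if len(depths) != 1:
        raise ValueError(f"leading dimensions differ: {sorted(depths)}")
    return depths.pop()
```

For example, two buffers of shapes `(3, 128, 64)` and `(3, 64, 64)` share a depth of 3, while `(3, 128)` and `(2, 128)` would be rejected.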

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

| Operand | Description |
| --- | --- |
| `buffers` | variadic of memory descriptor type (::mlir::triton::gpu::MemDescType) in Triton IR type system |

Results:

| Result | Description |
| --- | --- |
| `result` | Asynchronous Reference |

nvws.aref.get.enter (triton::nvws::ArefGetEnterOp)

Enter ArefGet region where the buffer can be used to read data

Syntax:

operation ::= `nvws.aref.get.enter` $aref `[` $stage `,` $phase `]` attr-dict
              `:` type($aref) `->` type(results)

Enter a “region” in which the buffer can be freely read. These ArefGet “regions” can span multiple iterations.

Operands:

| Operand | Description |
| --- | --- |
| `aref` | Asynchronous Reference |
| `stage` | 32-bit signless integer |
| `phase` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `buffers` | variadic of memory descriptor type (::mlir::triton::gpu::MemDescType) in Triton IR type system |
| `token` | async token type |

nvws.aref.get.exit (triton::nvws::ArefGetExitOp)

Exit ArefGet region, where the buffer should no longer be used

Syntax:

operation ::= `nvws.aref.get.exit` $aref `[` $stage `]` `,` $token $async_ops attr-dict
              `:` type($aref) `,` type($token)

Leave the “region” in which the buffer can be freely read. These ArefGet “regions” can span multiple iterations.

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `async_ops` | `::mlir::ArrayAttr` | array of async op attributes |

Operands:

| Operand | Description |
| --- | --- |
| `aref` | Asynchronous Reference |
| `token` | async token type |
| `stage` | 32-bit signless integer |

nvws.aref.put.enter (triton::nvws::ArefPutEnterOp)

Enter ArefPut region where the buffer can be used to write data

Syntax:

operation ::= `nvws.aref.put.enter` $aref `[` $stage `,` $phase `]` attr-dict
              `:` type($aref) `->` type(results)

Enter a “region” in which the buffer can be freely written. These ArefPut “regions” can span multiple iterations.

Operands:

| Operand | Description |
| --- | --- |
| `aref` | Asynchronous Reference |
| `stage` | 32-bit signless integer |
| `phase` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `buffers` | variadic of memory descriptor type (::mlir::triton::gpu::MemDescType) in Triton IR type system |
| `token` | async token type |

nvws.aref.put.exit (triton::nvws::ArefPutExitOp)

Exit ArefPut region, where the buffer should no longer be used

Syntax:

operation ::= `nvws.aref.put.exit` $aref `[` $stage `]` `,` $token $async_ops attr-dict
              `:` type($aref) `,` type($token)

Leave the “region” in which the buffer can be freely written. These ArefPut “regions” can span multiple iterations.
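The enter/exit pairing of the four aref ops can be pictured as a per-stage state machine. The Python sketch below is a toy model under stated assumptions: `ToyAref` and `SlotState` are hypothetical names, the ordering put, then get, then put again is assumed from the producer/consumer reading of the ops, and the model raises an error where real code would presumably wait for the stage to become available.

```python
from enum import Enum


class SlotState(Enum):
    EMPTY = "empty"      # may be entered by a put region
    WRITING = "writing"  # inside a put region
    FULL = "full"        # may be entered by a get region
    READING = "reading"  # inside a get region


class ToyAref:
    """Hypothetical single-threaded model of an aref with `depth` stages."""

    def __init__(self, depth):
        self.buffers = [None] * depth
        self.states = [SlotState.EMPTY] * depth

    def _advance(self, stage, expected, new):
        if self.states[stage] is not expected:
            raise RuntimeError(
                f"stage {stage} is {self.states[stage].value}, "
                f"expected {expected.value}"
            )
        self.states[stage] = new

    def put_enter(self, stage):
        self._advance(stage, SlotState.EMPTY, SlotState.WRITING)

    def write(self, stage, value):
        # Writes are only legal between put_enter and put_exit.
        if self.states[stage] is not SlotState.WRITING:
            raise RuntimeError("write outside a put region")
        self.buffers[stage] = value

    def put_exit(self, stage):
        self._advance(stage, SlotState.WRITING, SlotState.FULL)

    def get_enter(self, stage):
        self._advance(stage, SlotState.FULL, SlotState.READING)
        return self.buffers[stage]

    def get_exit(self, stage):
        self._advance(stage, SlotState.READING, SlotState.EMPTY)
```

A typical sequence in this model is `put_enter(0)`, `write(0, tile)`, `put_exit(0)`, `get_enter(0)`, `get_exit(0)`; calling `get_enter(0)` again without an intervening put is rejected.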

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `async_ops` | `::mlir::ArrayAttr` | array of async op attributes |

Operands:

| Operand | Description |
| --- | --- |
| `aref` | Asynchronous Reference |
| `token` | async token type |
| `stage` | 32-bit signless integer |

nvws.consumer_release (triton::nvws::ConsumerReleaseOp)

Consumer releases the token

Syntax:

operation ::= `nvws.consumer_release` $token `,` $idx attr-dict `:` type(operands)

The consumer will release the token and signal the producer that the buffers are ready to be filled.

Operands:

| Operand | Description |
| --- | --- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |

nvws.consumer_wait (triton::nvws::ConsumerWaitOp)

Consumer awaits buffer readiness

Syntax:

operation ::= `nvws.consumer_wait` $token `,` $idx `,` $phase attr-dict `:` type(operands)

The consumer waits for the buffers to be ready to be consumed. If they are not ready, the consumer blocks until it is signalled by the producer, which does so once it has finished filling the buffers and released the token.
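The text above does not say how `$idx` and the 1-bit `$phase` are chosen. One common convention for ring buffers guarded by parity-based barriers is to step the index round-robin and flip the phase bit each time the ring wraps. The Python sketch below shows that convention only; it is an assumption, not something confirmed for NVWS, and `ring_position` is a hypothetical helper.

```python
from typing import Tuple


def ring_position(iteration: int, num_buffers: int) -> Tuple[int, int]:
    """Map a loop iteration to (idx, phase) under the assumed convention.

    idx walks the buffers round-robin; phase is a single bit that flips
    every time idx wraps back to 0.
    """
    if num_buffers <= 0:
        raise ValueError("num_buffers must be positive")
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    idx = iteration % num_buffers
    phase = (iteration // num_buffers) & 1
    return idx, phase
```

With two buffers, iterations 0 through 5 map to `(0, 0)`, `(1, 0)`, `(0, 1)`, `(1, 1)`, `(0, 0)`, `(1, 0)`.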

Operands:

| Operand | Description |
| --- | --- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
| `phase` | 1-bit signless integer |

nvws.create_token (triton::nvws::CreateTokenOp)

Create a token to be used for synchronizations in communication channels

Syntax:

operation ::= `nvws.create_token` attr-dict `:` type($result)

A token is used by the producer and the consumer to synchronize. The producer acquires the token and holds it until it has filled the buffers, then signals the waiting consumer. The consumer holds the token until it has consumed the buffers, then signals the producer that is waiting to acquire the token.
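The handshake described above can be modelled on the host with ordinary semaphores. The Python sketch below is only an analogy for the protocol, not how the ops are lowered: `Token` and `run_pipeline` are hypothetical names, the method names are borrowed from the four token ops, and the `$phase` operand that `nvws.producer_acquire` and `nvws.consumer_wait` take is omitted because the semaphores already track slot state.

```python
import threading


class Token:
    """Toy model of a token guarding `num_buffers` slots."""

    def __init__(self, num_buffers):
        # Every slot starts empty: the producer may fill it immediately,
        # and the consumer has nothing to read yet.
        self.empty = [threading.Semaphore(1) for _ in range(num_buffers)]
        self.full = [threading.Semaphore(0) for _ in range(num_buffers)]

    def producer_acquire(self, idx):
        self.empty[idx].acquire()   # wait until the slot may be filled

    def producer_commit(self, idx):
        self.full[idx].release()    # signal: slot is ready to consume

    def consumer_wait(self, idx):
        self.full[idx].acquire()    # wait until the slot has been filled

    def consumer_release(self, idx):
        self.empty[idx].release()   # signal: slot may be filled again


def run_pipeline(items, num_buffers=2):
    """Pass `items` from a producer thread to a consumer thread."""
    items = list(items)
    token = Token(num_buffers)
    buffers = [None] * num_buffers
    consumed = []

    def producer():
        for i, item in enumerate(items):
            idx = i % num_buffers
            token.producer_acquire(idx)
            buffers[idx] = item
            token.producer_commit(idx)

    def consumer():
        for i in range(len(items)):
            idx = i % num_buffers
            token.consumer_wait(idx)
            consumed.append(buffers[idx])
            token.consumer_release(idx)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

Because each slot is handed back and forth strictly in turn, the consumer in this model sees the items in the order the producer wrote them, regardless of thread scheduling.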

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `numBuffers` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |
| `loadType` | `::mlir::triton::nvws::TokenLoadTypeAttr` | allowed 32-bit signless integer cases: 0, 1, 2, 3, 4 |

Results:

| Result | Description |
| --- | --- |
| `result` | tensor of values |

nvws.descriptor_gather (triton::nvws::DescriptorGatherOp)

Gather multiple rows from a descriptor into shared memory

Syntax:

operation ::= `nvws.descriptor_gather` $desc `[` $x_offsets `,` $y_offset `]` $txCount $result
              attr-dict `:` type(operands)

This op behaves exactly like the op of the same name in the Triton dialect, except that the result of the load is stored into shared memory. Execution is still synchronous.

Interfaces: NVWS_DescriptorLoadOpInterface, TT_DescriptorOpInterface

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `txCount` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `desc` | Tensor descriptor type (::mlir::triton::TensorDescType) in Triton IR type system |
| `x_offsets` | ranked tensor of 32-bit signless integer values |
| `y_offset` | 32-bit signless integer |
| `result` | memory descriptor type (::mlir::triton::gpu::MemDescType) in Triton IR type system |

nvws.descriptor_load (triton::nvws::DescriptorLoadOp)

Load from descriptor and store into shared memory

Syntax:

operation ::= `nvws.descriptor_load` $desc `[` $indices `]` $txCount $result
              oilist(
              `cacheModifier` `=` $cache |
              `evictionPolicy` `=` $evict
              )
              attr-dict `:` type(operands)

This op behaves exactly like the op of the same name in the Triton dialect, except that the result of the load is stored into shared memory. Execution is still synchronous.

Interfaces: NVWS_DescriptorLoadOpInterface, TT_DescriptorOpInterface

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `txCount` | `::mlir::IntegerAttr` | 32-bit signless integer attribute |
| `cache` | `::mlir::triton::CacheModifierAttr` | allowed 32-bit signless integer cases: 1, 2, 3, 4, 5, 6, 7 |
| `evict` | `::mlir::triton::EvictionPolicyAttr` | allowed 32-bit signless integer cases: 1, 2, 3 |

Operands:

| Operand | Description |
| --- | --- |
| `desc` | Tensor descriptor type (::mlir::triton::TensorDescType) in Triton IR type system |
| `indices` | variadic of 32-bit signless integer |
| `result` | memory descriptor type (::mlir::triton::gpu::MemDescType) in Triton IR type system |

nvws.producer_acquire (triton::nvws::ProducerAcquireOp)

Producer acquires a token to fill buffers

Syntax:

operation ::= `nvws.producer_acquire` $token `,` $idx `,` $phase attr-dict `:` type(operands)

The producer tries to acquire the token before filling the buffers. If the buffers are not ready to be filled, the producer blocks until it is signalled by the consumer, which does so once it has finished consuming the buffers and released the token.

Operands:

| Operand | Description |
| --- | --- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |
| `phase` | 1-bit signless integer |

nvws.producer_commit (triton::nvws::ProducerCommitOp)

Producer commits the buffer changes

Syntax:

operation ::= `nvws.producer_commit` $token `,` $idx attr-dict `:` type(operands)

The producer will release the token and signal the consumer that the buffers are ready to be consumed.

Operands:

| Operand | Description |
| --- | --- |
| `token` | tensor of values |
| `idx` | 32-bit signless integer |

nvws.warp_group (triton::nvws::WarpGroupOp)

Container Op for Warp Specialization

Higher level container for Warp Specialization Analysis.

Contains a variadic number of warp groups, each with its number of warps and a region that holds the computation for that warp group.

The results of this op, if any, are those of the first region, as returned by the nvws.warp_group.yield op.

nvws.warp_group should be lowered to ttg.warp_specialize before execution.

Traits: RecursiveMemoryEffects, RecursivelySpeculatableImplTrait

Interfaces: ConditionallySpeculatable

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `numWarps` | `::mlir::DenseI32ArrayAttr` | i32 dense array attribute |

Results:

| Result | Description |
| --- | --- |
| `results` | variadic of any type |

nvws.warp_group.return (triton::nvws::WarpGroupReturnOp)

Terminator for a warp group region

Syntax:

operation ::= `nvws.warp_group.return` attr-dict

Warp groups are expected to return values via referential modification of their inputs. Thus, the warp_group.return op takes no values to return from the warp group.

Traits: AlwaysSpeculatableImplTrait, HasParent<WarpGroupOp>, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

nvws.warp_group.yield (triton::nvws::WarpGroupYieldOp)

Yield from the first region of nvws.warp_group

Syntax:

operation ::= `nvws.warp_group.yield` ($values^)? attr-dict (`:` type($values)^)?

This op is the equivalent, for nvws.warp_group, of the ttg.warp_yield op for the ttg.warp_specialize op.

TODO: Decide if we should move nvws.warp_group to TritonGPU, or continue to have TritonGPU depend on NVWS. In the former case, this op can be removed. The latter one involves a circular dependency between TritonGPU and NVWS.

Traits: AlwaysSpeculatableImplTrait, HasParent<WarpGroupOp>, ReturnLike, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface), RegionBranchTerminatorOpInterface

Effects: MemoryEffects::Effect{}

Operands:

| Operand | Description |
| --- | --- |
| `values` | variadic of any type |