TritonAMDGPUOps
amdgpu.buffer_load (triton::amdgpu::BufferLoadOp)
Load from a scalar base pointer and a tensor of offsets
Syntax:
operation ::= `amdgpu.buffer_load` $ptr `[` $offsets `]` (`,` $mask^)? (`,` $other^)?
attr-dict `:` type($result)
AMD buffer load operation. A buffer load is similar to a normal load, but it accesses global memory via a scalar base pointer and a tensor of offsets instead of a tensor of pointers. The other fields are the same as for a normal load: mask is a boolean vector that determines whether a given element should be read from memory, and other is the element returned on lane i when mask[i] == 0.
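For illustration, here is a minimal sketch that follows the declared syntax above; the value names, tensor shape, element type, and #blocked encoding are hypothetical:
// lane i reads from %ptr + %offsets[i]; it receives the corresponding
// element of %other instead when %mask[i] == 0
%res = amdgpu.buffer_load %ptr[%offsets], %mask, %other : tensor<256xf32, #blocked>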
Traits: AttrSizedOperandSegments, SameLoadStoreOperandsAndResultEncoding
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Read on ::mlir::triton::GlobalMemory}
Operands:

| Operand | Description |
| --- | --- |
| ptr | ptr |
| offsets | tensor of 32-bit signless integer values |
| mask | ranked tensor of 1-bit signless integer values |
| other | ranked tensor of floating-point or integer or ptr values |
Results:

| Result | Description |
| --- | --- |
| result | ranked tensor of floating-point or integer or ptr values |
amdgpu.buffer_store (triton::amdgpu::BufferStoreOp)
Store into a scalar base pointer and a tensor of offsets
Syntax:
operation ::= `amdgpu.buffer_store` $value `,` $ptr `[` $offsets `]` (`,` $mask^)?
attr-dict `:` type($value)
AMD buffer store operation. A buffer store is similar to a normal store, but it accesses global memory via a scalar base pointer and a tensor of offsets instead of a tensor of pointers. The other fields are the same as for a normal store: mask is a boolean vector that determines whether a given element should be written to memory, and value is the tensor of elements written on lane i when mask[i] == 1.
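Analogously, a minimal sketch following the declared syntax above; the value names, tensor shape, element type, and #blocked encoding are again hypothetical:
// lane i writes the corresponding element of %value to %ptr + %offsets[i]
// only when %mask[i] == 1
amdgpu.buffer_store %value, %ptr[%offsets], %mask : tensor<256xf32, #blocked>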
Traits: SameLoadStoreOperandsEncoding
Interfaces: MemoryEffectOpInterface (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{MemoryEffects::Write on ::mlir::triton::GlobalMemory}
Operands:

| Operand | Description |
| --- | --- |
| value | ranked tensor of floating-point or integer or ptr values |
| ptr | ptr |
| offsets | tensor of 32-bit signless integer values |
| mask | ranked tensor of 1-bit signless integer values |
amdgpu.extract_slice (triton::amdgpu::ExtractSliceOp)
Extract slice operation
Syntax:
operation ::= `amdgpu.extract_slice` $source $static_offsets attr-dict `:` type($source) `to` type($result)
The “extract_slice” operation extracts a slice of a tensor held in registers.
The “extract_slice” operation supports the following arguments:
source: the base tensor on which to create a view tensor
offsets: offsets into the base tensor at which to create the view
Example 1:
#blocked = #triton_gpu.blocked<{sizePerThread = [1, 8],
threadsPerWarp = [4, 16], warpsPerCTA = [4, 1], order = [0, 1]}>
#blocked1 = #triton_gpu.blocked<{sizePerThread = [1, 8],
threadsPerWarp = [16, 4], warpsPerCTA = [4, 1], order = [0, 1]}>
%1 = triton_gpu.convert_layout %0 : tensor<128x128xf16, #blocked>
-> tensor<128x128xf16, #blocked1>
// create a slice of base tensor %1 with static offsets
%2 = amdgpu.extract_slice %1 [0, 0] :
tensor<128x128xf16, #blocked1> to tensor<128x32xf16, #blocked1>
Example 1 shows how the “extract_slice” operation may be used. In this example a new 128x32 slice is created. “extract_slice” operates on tensors whose layout lets the desired slice keep the same layout as the source tensor. “%0” cannot be sliced directly because the resulting slice could not have the same layout as “%0”, so it first needs to be converted to a layout suitable for slicing. The “#blocked1” layout is appropriate here because it keeps sizePerThread the same, and with it the coalescing properties. In order to utilize all threads in a warp, “threadsPerWarp” is set to [16, 4] for this new layout. Carrying out this layout conversion before “extract_slice” ensures that slicing still uses all threads efficiently. The size of the slice is determined by the result type.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| static_offsets | ::mlir::DenseI64ArrayAttr | i64 dense array attribute |
Operands:

| Operand | Description |
| --- | --- |
| source | ranked tensor of any type values |
Results:

| Result | Description |
| --- | --- |
| result | ranked tensor of any type values |
amdgpu.instruction_sched_hint (triton::amdgpu::InstructionSchedHint)
A placeholder op for instruction scheduling hints within a basic block
Syntax:
operation ::= `amdgpu.instruction_sched_hint` attr-dict
A placeholder op for instruction scheduling hints applied to instructions within the basic block where the placeholder op is located. This op is primarily intended to be used to adjust instruction scheduling inside the main loop resulting from a tt.dot operation: it is easier to identify dot ops at a high level and, thus, to mark the intended scheduling regions. The hint ops are eventually lowered into LLVM AMDGPU instruction scheduling primitives, which are meant to control how different kinds of instructions (valu/mfma, global/shared memory, etc.) should interleave for better instruction-level parallelism.
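For illustration, a minimal sketch of where such a hint might sit; the surrounding instructions are elided and all attributes are omitted:
// placed inside the basic block (e.g. the main loop body around a tt.dot)
// whose instruction mix the lowered scheduling primitives should influence
amdgpu.instruction_sched_hint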
Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| numDsReadsA | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| numDsReadsB | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| numDsWritesA | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| numDsWritesB | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| numGlobalLoadsA | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| numGlobalLoadsB | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |
| isBufferLoadsAEnabled | ::mlir::BoolAttr | bool attribute |
| isBufferLoadsBEnabled | ::mlir::BoolAttr | bool attribute |
| numMMAs | ::mlir::triton::amdgpu::InstCounterAttr | An instruction counter attribute. The attribute holds the number of issued LLVM instructions of a specific kind as well as the data type. |