triton.language
Programming Model
Represents an N-dimensional array of values or pointers. |
|
Returns the id of the current program instance along the given |
|
Returns the number of program instances launched along the given |
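To make the programming model concrete, here is a minimal pure-Python sketch (not Triton code) of a one-dimensional launch. Each emulated program instance uses its id to select a block of offsets and skips offsets that fall outside the data. The names `add_program`, `launch_1d`, and `BLOCK` are hypothetical and exist only for this illustration; a real launch runs instances in parallel on the device rather than in a loop.

```python
def add_program(pid, x, y, out, block):
    """One emulated program instance: handles `block` consecutive elements."""
    # Offsets covered by this instance: the half-open range
    # [pid * block, (pid + 1) * block).
    offsets = [pid * block + k for k in range(block)]
    for off in offsets:
        # Mask: skip offsets that fall past the end of the data.
        if off < len(x):
            out[off] = x[off] + y[off]


def launch_1d(program, num_programs, *args):
    """Emulate a 1D launch by running every program id in turn (sequentially)."""
    for pid in range(num_programs):
        program(pid, *args)


BLOCK = 4
x = list(range(10))
y = [10 * v for v in x]
out = [None] * len(x)

# Ceiling division: enough instances to cover all 10 elements with blocks of 4.
num_programs = (len(x) + BLOCK - 1) // BLOCK
launch_1d(add_program, num_programs, x, y, out, BLOCK)
```

The last instance covers offsets 8 to 11, of which only 8 and 9 are valid, which is why the bounds check is needed.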
Creation Ops
Returns contiguous values within the half-open interval |
|
Concatenate the given blocks |
|
Returns a tensor filled with the scalar value for the given |
|
Returns a tensor filled with the scalar value 0 for the given |
|
Returns a tensor of zeros with the same shape and type as a given tensor. |
|
Casts a tensor to the given |
Shape Manipulation Ops
Tries to broadcast the two given blocks to a common compatible shape. |
|
Tries to broadcast the given tensor to a new |
|
Expand the shape of a tensor, by inserting new length-1 dimensions. |
|
Interleaves the values of two tensors along their last dimension. |
|
Join the given tensors in a new, minor dimension. |
|
Permutes the dimensions of a tensor. |
|
Returns a contiguous flattened view of |
|
Returns a tensor with the same number of elements as input but with the provided shape. |
|
Split a tensor in two along its last dim, which must have size 2. |
|
Permutes the dimensions of a tensor. |
|
Returns a tensor with the same elements as input but a different shape. |
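The relationship between joining, interleaving, and splitting can be sketched with plain Python lists. This is only a model for 1-D inputs of equal length, and the helper functions below are hypothetical stand-ins written for this illustration, not the library's implementation.

```python
def join(a, b):
    """Pair up elements of two equal-length 1-D lists in a new last dimension."""
    return [[u, v] for u, v in zip(a, b)]


def interleave(a, b):
    """Alternate the elements of two equal-length 1-D lists."""
    return [v for pair in zip(a, b) for v in pair]


def split(t):
    """Undo `join`: split a list of 2-element rows into two lists."""
    return [row[0] for row in t], [row[1] for row in t]


a, b = [1, 2, 3], [10, 20, 30]
joined = join(a, b)        # shape (3,) and (3,) -> shape (3, 2)
mixed = interleave(a, b)   # shape (3,) and (3,) -> shape (6,)
left, right = split(joined)
```

In this model, interleaving is the same as joining and then flattening the last two dimensions, and splitting recovers the original inputs.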
Linear Algebra Ops
Returns the matrix product of two blocks. |
|
Returns the matrix product of two blocks in microscaling format. |
Memory/Pointer Ops
Return a tensor of data whose values are loaded from memory at location defined by pointer: |
|
Store a tensor of data into memory locations defined by pointer. |
|
Returns a pointer to a block in a parent tensor |
|
Advance a block pointer |
Indexing Ops
Flips a tensor x along the dimension dim. |
|
Returns a tensor of elements from either |
|
Transforms the indices of a row-major size_i * size_j matrix into the indices of a column-major matrix for each group of size_g rows. |
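One way to read the index transformation described above is sketched below in plain Python. The function `swizzle2d_ref` is a hypothetical reference written for this illustration, under the assumption that indices within each group of size_g rows are renumbered column by column; it is not the library's code.

```python
def swizzle2d_ref(i, j, size_i, size_j, size_g):
    """Map a row-major index (i, j) to its position after grouping rows."""
    ij = i * size_j + j               # linear row-major index
    size_gj = size_g * size_j         # number of elements in a full group
    group_id = ij // size_gj          # which group of rows (i, j) falls in
    off_i = group_id * size_g         # first row of that group
    rows = min(size_i - off_i, size_g)  # the last group may have fewer rows
    ij = ij % size_gj                 # linear index within the group
    # Within the group, walk down the rows first, then across the columns.
    return off_i + ij % rows, ij // rows


size_i = size_j = 4
grid = [[None] * size_j for _ in range(size_i)]
for i in range(size_i):
    for j in range(size_j):
        new_i, new_j = swizzle2d_ref(i, j, size_i, size_j, 2)
        grid[new_i][new_j] = i * size_j + j  # place the original linear id
```

With size_g = 2 on a 4 x 4 matrix, the ids 0 to 7 fill the first two rows column by column, and the ids 8 to 15 fill the last two rows the same way.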
Math Ops
Computes the element-wise absolute value of |
|
Computes the ceiling division of |
|
Computes the element-wise ceil of |
|
Clamps the input tensor |
|
Computes the element-wise cosine of |
|
Computes the element-wise precise division (rounding to nearest, as defined by the IEEE standard) of |
|
Computes the element-wise error function of |
|
Computes the element-wise exponential of |
|
Computes the element-wise exponential (base 2) of |
|
Computes the element-wise fast division of |
|
Computes the element-wise floor of |
|
Computes the element-wise fused multiply-add of |
|
Computes the element-wise natural logarithm of |
|
Computes the element-wise logarithm (base 2) of |
|
Computes the element-wise maximum of |
|
Computes the element-wise minimum of |
|
Computes the element-wise inverse square root of |
|
Computes the element-wise sigmoid of |
|
Computes the element-wise sine of |
|
Computes the element-wise softmax of |
|
Computes the element-wise fast square root of |
|
Computes the element-wise precise square root (rounding to nearest, as defined by the IEEE standard) of |
|
Computes the element-wise most significant N bits of the 2N-bit product of |
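A few of the less familiar entries above can be pinned down with scalar Python equivalents. These are hypothetical reference helpers for illustration only: they operate on single Python numbers (or a flat list for softmax) rather than tensors, and the 32-bit width chosen for the high-bits product is an assumption of this sketch.

```python
import math


def cdiv(x, div):
    """Ceiling division for positive integers."""
    return (x + div - 1) // div


def umulhi32(x, y):
    """Most significant 32 bits of the 64-bit product of two unsigned 32-bit ints."""
    # Assumes 0 <= x, y < 2**32, so the product fits in 64 bits.
    return (x * y) >> 32


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Softmax over a flat list, shifted by the maximum for numerical stability."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


blocks_needed = cdiv(10, 4)
high_bits = umulhi32(0xFFFFFFFF, 0xFFFFFFFF)
probs = softmax([1.0, 2.0, 3.0])
```

Ceiling division is the usual way to compute how many fixed-size blocks are needed to cover a given number of elements.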
Reduction Ops
Returns the maximum index of all elements in the |
|
Returns the minimum index of all elements in the |
|
Returns the maximum of all elements in the |
|
Returns the minimum of all elements in the |
|
Applies the combine_fn to all elements in |
|
Returns the sum of all elements in the |
|
Returns the xor sum of all elements in the |
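The reductions above all follow one pattern: a binary combine function is folded over the elements along an axis. The sketch below shows that pattern on nested Python lists, reducing along the last axis. The helpers `reduce_axis` and `argmax` are hypothetical, and the tie-breaking rule in `argmax` (lowest index wins) is a choice made for this sketch.

```python
from functools import reduce
import operator


def reduce_axis(rows, combine_fn):
    """Reduce each row of a 2-D list with `combine_fn` (along the last axis)."""
    return [reduce(combine_fn, row) for row in rows]


def argmax(row):
    """Index of the largest element; ties go to the lowest index in this sketch."""
    return max(range(len(row)), key=row.__getitem__)


data = [[3, 9, 4], [7, 1, 7]]
sums = reduce_axis(data, operator.add)   # sum of each row
xors = reduce_axis(data, operator.xor)   # xor sum of each row
maxes = reduce_axis(data, max)           # maximum of each row
max_indices = [argmax(row) for row in data]
```

Sum, xor sum, max, and min are all instances of the generic reduction with a different combine function.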
Scan/Sort Ops
Applies the combine_fn to each element with a carry in |
|
Returns the cumprod of all elements in the |
|
Returns the cumsum of all elements in the |
|
Computes a histogram of the input tensor with num_bins bins; the bins have a width of 1 and start at 0. |
|
Sorts a tensor along a specified dimension. |
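A scan keeps every intermediate result of a reduction instead of only the final one. The following plain-Python sketch illustrates a scan with a carry, the cumulative sum and product as special cases, and a histogram with unit-width bins starting at 0. The helper names are hypothetical, and ignoring values outside the bin range is a choice made for this sketch, not a documented behaviour.

```python
from itertools import accumulate
import operator


def scan(values, combine_fn):
    """Inclusive scan: element k combines the carry so far with values[k]."""
    return list(accumulate(values, combine_fn))


def histogram(values, num_bins):
    """Count integers into bins [0, 1), [1, 2), ..., [num_bins - 1, num_bins)."""
    counts = [0] * num_bins
    for v in values:
        if 0 <= v < num_bins:  # out-of-range values are ignored in this sketch
            counts[v] += 1
    return counts


vals = [2, 0, 3, 3, 1]
cumsum = scan(vals, operator.add)
cumprod = scan([1, 2, 3, 4], operator.mul)
hist = histogram(vals, 4)
```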
Atomic Ops
Performs an atomic add at the memory location specified by |
|
Performs an atomic logical and at the memory location specified by |
|
Performs an atomic compare-and-swap at the memory location specified by |
|
Performs an atomic max at the memory location specified by |
|
Performs an atomic min at the memory location specified by |
|
Performs an atomic logical or at the memory location specified by |
|
Performs an atomic exchange at the memory location specified by |
|
Performs an atomic logical xor at the memory location specified by |
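The read-modify-write logic of two of these operations can be sketched in single-threaded Python, with a list standing in for memory. This model ignores concurrency entirely, which is the whole point of the real operations. The helper names are hypothetical, and returning the previously stored value is an assumption of this sketch based on the common fetch-and-op convention.

```python
def atomic_cas(memory, addr, cmp, val):
    """Compare-and-swap: store `val` only if the current value equals `cmp`."""
    old = memory[addr]
    if old == cmp:
        memory[addr] = val
    return old  # assumed convention: return the value seen before the update


def atomic_add(memory, addr, val):
    """Add `val` to the stored value and return what was there before."""
    old = memory[addr]
    memory[addr] = old + val
    return old


mem = [0, 5]
first = atomic_cas(mem, 0, 0, 1)    # matches: mem[0] goes 0 -> 1
second = atomic_cas(mem, 0, 0, 2)   # no match: mem[0] stays 1
previous = atomic_add(mem, 1, 3)    # mem[1] goes 5 -> 8
```

A caller can tell whether a compare-and-swap succeeded by checking whether the returned value equals the expected one.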
Random Number Generation
Given a |
|
Given a |
|
Given a |
|
Given a |
Iterators
Iterator that counts upward forever. |
|
Iterator that counts upward forever. |
Inline Assembly
Execute inline assembly over a tensor. |
Compiler Hint Ops
Insert a barrier to synchronize all threads in a block. |
|
Let the compiler know that the value first values in |
|
Let the compiler know that the value first values in |
|
Let the compiler know that the values in |
Debug Ops
Print the values at compile time. |
|
Assert the condition at compile time. |
|
Print the values at runtime from the device. |
|
Assert the condition at runtime from the device. |