triton.language

Programming Model

program_id Returns the id of the current program instance along the given axis.
num_programs Returns the number of program instances launched along the given axis.
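The two queries above are easiest to read together: each program instance uses its id to pick the slice of data it owns. The following is a plain-Python emulation of that launch model, not Triton code. `BLOCK_SIZE`, `add_kernel`, and `launch` are hypothetical names, and the ids are passed as arguments here, whereas a real kernel would obtain them from `program_id` and `num_programs`.

```python
# Plain-Python emulation of the launch model; this is NOT Triton code.
BLOCK_SIZE = 4  # hypothetical block size chosen for illustration


def add_kernel(x, y, out, n_elements, pid, num_programs):
    """Body of one emulated program instance.

    `pid` stands in for program_id(axis=0) and `num_programs` for
    num_programs(axis=0).
    """
    assert 0 <= pid < num_programs
    # Offsets owned by this instance: pid * BLOCK_SIZE + [0, BLOCK_SIZE).
    offsets = [pid * BLOCK_SIZE + i for i in range(BLOCK_SIZE)]
    for off in offsets:
        # The last block may run past the end of the data, so guard each access.
        if off < n_elements:
            out[off] = x[off] + y[off]


def launch(x, y):
    """Run one emulated instance per block; a GPU would run them in parallel."""
    n = len(x)
    out = [0] * n
    grid = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division
    for pid in range(grid):
        add_kernel(x, y, out, n, pid, grid)
    return out


result = launch(list(range(10)), [10] * 10)
```

With 10 elements and a block size of 4, three instances are launched and the third one only touches two valid offsets, which is why the bounds check is needed.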

Creation Ops

arange Returns contiguous values within the half-open interval [start, end).
zeros Returns a tensor filled with the scalar value 0 for the given shape and dtype.

Shape Manipulation Ops

broadcast_to Tries to broadcast the given tensor to a new shape.
reshape Tries to reshape the given tensor to a new shape.
ravel Returns a contiguous flattened view of x.

Linear Algebra Ops

dot Returns the matrix product of two blocks.
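Because `dot` multiplies blocks rather than whole matrices, a full matrix product is usually built by accumulating block products along the shared dimension. The sketch below emulates that pattern with nested Python lists. It is not Triton code: `dot` here is a pure-Python stand-in, and `tiled_matmul` and `block_k` are hypothetical names.

```python
def dot(a, b):
    """Matrix product of two blocks given as lists of rows (pure-Python stand-in)."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def tiled_matmul(A, B, block_k):
    """Accumulate block products over the shared dimension in steps of block_k."""
    m, k, n = len(A), len(B), len(B[0])
    acc = [[0] * n for _ in range(m)]  # accumulator, initialised like zeros(...)
    for k0 in range(0, k, block_k):
        a_blk = [row[k0:k0 + block_k] for row in A]  # columns k0..k0+block_k of A
        b_blk = B[k0:k0 + block_k]                   # rows k0..k0+block_k of B
        partial = dot(a_blk, b_blk)
        for i in range(m):
            for j in range(n):
                acc[i][j] += partial[i][j]
    return acc


A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
C = tiled_matmul(A, B, block_k=2)
```

The result does not depend on `block_k`; the block size only changes how the work is split.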

Memory Ops

load Returns a tensor of data whose values are loaded, element-wise, from the memory locations defined by pointer.
store Stores a tensor of values, element-wise, at the memory locations specified by pointer.
atomic_cas Performs an atomic compare-and-swap at the memory location specified by pointer.
atomic_xchg Performs an atomic exchange at the memory location specified by pointer.

Indexing Ops

where Returns a tensor of elements from either x or y, depending on condition.

Math Ops

exp Computes the element-wise exponential of x.
log Computes the element-wise natural logarithm of x.
cos Computes the element-wise cosine of x.
sin Computes the element-wise sine of x.
sqrt Computes the element-wise square root of x.
sigmoid Computes the element-wise sigmoid of x.
softmax Computes the element-wise softmax of x.

Reduction Ops

max Returns the maximum of all elements in the input tensor along the provided axis.
min Returns the minimum of all elements in the input tensor along the provided axis.
sum Returns the sum of all elements in the input tensor along the provided axis.
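Reductions are typically combined with element-wise math ops. One common combination is a numerically stable softmax over a single row, built from `max`, `exp`, and `sum`. The sketch below uses Python built-ins and the `math` module as stand-ins for the block-level ops, so it illustrates the data flow only; `softmax_row` is a hypothetical name.

```python
import math


def softmax_row(row):
    """Numerically stable softmax of one row, written as max / exp / sum steps."""
    row_max = max(row)                      # reduction: max along the row
    shifted = [v - row_max for v in row]    # subtract the max so exp cannot overflow
    numer = [math.exp(v) for v in shifted]  # element-wise exponential
    denom = sum(numer)                      # reduction: sum along the row
    return [v / denom for v in numer]


probs = softmax_row([1.0, 2.0, 3.0])
large = softmax_row([1000.0, 1000.0])  # would overflow without the max subtraction
```

Subtracting the row maximum leaves the result unchanged mathematically, but keeps every exponent at or below zero.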

Atomic Ops

atomic_cas Performs an atomic compare-and-swap at the memory location specified by pointer.
atomic_xchg Performs an atomic exchange at the memory location specified by pointer.
atomic_add Performs an atomic add at the memory location specified by pointer.
atomic_max Performs an atomic max at the memory location specified by pointer.
atomic_min Performs an atomic min at the memory location specified by pointer.
atomic_and Performs an atomic logical and at the memory location specified by pointer.
atomic_or Performs an atomic logical or at the memory location specified by pointer.
atomic_xor Performs an atomic logical xor at the memory location specified by pointer.
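The semantics of compare-and-swap and atomic add can be sketched on the CPU. The example below is an emulation, not Triton code: the `Cell` class is hypothetical, a `threading.Lock` stands in for hardware atomicity, and the assumption that each operation returns the value stored before the update should be checked against the individual function pages.

```python
import threading


class Cell:
    """One memory location with emulated atomic operations (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def atomic_cas(self, cmp, val):
        """Write `val` only if the cell currently holds `cmp`; return the old value."""
        with self._guard:
            old = self.value
            if old == cmp:
                self.value = val
            return old

    def atomic_add(self, val):
        """Add `val` to the cell; return the value held before the add."""
        with self._guard:
            old = self.value
            self.value = old + val
            return old


cell = Cell(0)
first = cell.atomic_cas(0, 1)   # comparison succeeds, so the cell becomes 1
second = cell.atomic_cas(0, 2)  # comparison fails, so the cell stays 1

counter = Cell(0)


def worker():
    for _ in range(1000):
        counter.atomic_add(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every read-modify-write happens as one indivisible step, no increments are lost even though four threads update the same cell.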

Comparison Ops

minimum Computes the element-wise minimum of x and y.
maximum Computes the element-wise maximum of x and y.

Random Number Generation

randint4x Given a seed scalar and an offset block, returns four blocks of random int32.
randint Given a seed scalar and an offset block, returns a single block of random int32.
rand Given a seed scalar and an offset block, returns a block of random float32 in $$U(0, 1)$$.
randn Given a seed scalar and an offset block, returns a block of random float32 in $$\mathcal{N}(0, 1)$$.
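The signatures above suggest a stateless design: the output is a function of the seed and the offsets, with no generator state to carry around. The sketch below illustrates that property and a dropout-style use of it. It is not Triton's generator: SHA-256 is swapped in purely as a stand-in hash, and `rand` and `dropout` here are hypothetical pure-Python helpers.

```python
import hashlib
import struct


def rand(seed, offsets):
    """Stateless uniform floats in [0, 1), one per offset (SHA-256 stand-in)."""
    out = []
    for off in offsets:
        digest = hashlib.sha256(struct.pack("<qq", seed, off)).digest()
        bits = int.from_bytes(digest[:4], "little")  # take 32 bits of the hash
        out.append(bits / 2**32)
    return out


def dropout(x, p, seed):
    """Zero each element with probability p and rescale the survivors by 1 / (1 - p)."""
    randoms = rand(seed, range(len(x)))  # the offset of an element is its index
    return [v / (1 - p) if r > p else 0.0 for v, r in zip(x, randoms)]


sample = rand(42, range(8))
dropped = dropout([1.0] * 8, p=0.5, seed=123)
```

Since the mask can be regenerated from the seed alone, a backward pass can recompute it instead of storing it.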

Compiler Hint Ops

multiple_of Lets the compiler know that the values in input are all multiples of value.