triton.experimental.gluon.language.warp_specialize

triton.experimental.gluon.language.warp_specialize(functions_and_args, worker_num_warps, worker_num_regs=None, _semantic=None, _generator=None)

Create a warp-specialized execution region, partitioning work across warps.

This forks the current execution into a "default partition" and an arbitrary number of "worker partitions". The default partition runs on the same num_warps warps as the parent region, and it may accept tensor arguments and return tensors. Worker partitions run on additional warps, which sit idle while the parent region executes.
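To make the warp accounting concrete, here is a small pure-Python sketch. It is not Gluon code and does not run on a GPU. It assumes only what the paragraph above states: the default partition reuses the parent's num_warps warps, and each worker partition adds its own warps on top. The helper names plan_warp_layout, total_warps, and PartitionLayout are hypothetical. The contiguous warp numbering (default first, workers stacked after it) is also an illustrative assumption, not necessarily how the compiler assigns warps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionLayout:
    name: str        # "default" or "workerN" (hypothetical labels)
    first_warp: int  # first warp index in this illustrative numbering
    num_warps: int   # number of warps assigned to the partition


def plan_warp_layout(num_warps: int, worker_num_warps: List[int]) -> List[PartitionLayout]:
    """Hypothetical bookkeeping: the default partition keeps the parent's warps,
    and each worker partition gets additional warps stacked after it.
    Contiguous numbering is an assumption made for illustration only."""
    if num_warps <= 0 or any(w <= 0 for w in worker_num_warps):
        raise ValueError("warp counts must be positive")
    layout = [PartitionLayout("default", 0, num_warps)]
    next_warp = num_warps
    for i, w in enumerate(worker_num_warps):
        layout.append(PartitionLayout(f"worker{i}", next_warp, w))
        next_warp += w
    return layout


def total_warps(layout: List[PartitionLayout]) -> int:
    # Parent warps plus every worker partition's additional warps.
    return sum(p.num_warps for p in layout)
```

For example, plan_warp_layout(4, [1, 2]) describes 7 warps in total: 4 for the default partition plus 1 and 2 for the two workers. The extra warps only do useful work inside the warp-specialized region.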

Note that calling warp_specialize recursively is not supported.

Parameters:

functions_and_args (List[Tuple[Callable, Any]]) – List of (function, arguments) pairs, one per partition. The first entry is the default partition; the remaining entries are the worker partitions.
worker_num_warps (List[int]) – Number of warps used by each worker partition, one entry per worker partition.
worker_num_regs (List[int], optional) – Number of registers for each worker partition. If provided, the backend uses these values for dynamic register reallocation.

Returns:

Results from the default partition.

Return type:

Tuple[Any, …]
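The calling convention described by the parameters and return value can be modeled in plain Python. The sketch below is a hypothetical stand-in named mock_warp_specialize, not the Gluon implementation. It runs partitions one after another with no concurrency and no GPU. It assumes the "Any" in each pair is a tuple of positional arguments, checks that worker_num_warps (and worker_num_regs, if given) has one entry per worker partition, discards worker return values, and returns only the default partition's results. Wrapping a non-tuple result in a 1-tuple is this sketch's own choice, made to match the documented Tuple[Any, …] return type.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def mock_warp_specialize(
    functions_and_args: Sequence[Tuple[Callable[..., Any], Tuple[Any, ...]]],
    worker_num_warps: List[int],
    worker_num_regs: Optional[List[int]] = None,
) -> Tuple[Any, ...]:
    """Hypothetical pure-Python model: runs partitions sequentially and
    returns only the default partition's results."""
    if not functions_and_args:
        raise ValueError("need at least the default partition")
    num_workers = len(functions_and_args) - 1
    if len(worker_num_warps) != num_workers:
        raise ValueError("worker_num_warps needs one entry per worker partition")
    if worker_num_regs is not None and len(worker_num_regs) != num_workers:
        raise ValueError("worker_num_regs needs one entry per worker partition")

    # The first entry is the default partition; the rest are workers.
    default_fn, default_args = functions_and_args[0]
    for worker_fn, worker_args in functions_and_args[1:]:
        worker_fn(*worker_args)  # return values of workers are discarded
    result = default_fn(*default_args)

    # Normalize to a tuple (an assumption of this sketch).
    if result is None:
        return ()
    return result if isinstance(result, tuple) else (result,)
```

With a default partition returning (x + y, x * y) for arguments (3, 4) and one worker configured with worker_num_warps=[1], the call yields (7, 12). Passing a worker_num_warps list whose length does not match the number of workers raises ValueError.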