triton.Config

class triton.Config(self, kwargs, num_warps=4, num_stages=2, num_ctas=1, enable_warp_specialization=False, pre_hook=None)

An object that represents a possible kernel configuration for the auto-tuner to try.

Variables:
  • kwargs – a dictionary of meta-parameters to pass to the kernel as keyword arguments.

  • num_warps – the number of warps to use for the kernel when compiled for GPUs. For example, if num_warps=8, then each kernel instance will be automatically parallelized to cooperatively execute using 8 * 32 = 256 threads.

  • num_stages – the number of stages that the compiler should use when software-pipelining loops. Mostly useful for matrix multiplication workloads on SM80+ GPUs.

  • num_ctas – the number of thread blocks (CTAs) in a thread block cluster. SM90+ only.

  • enable_warp_specialization – whether to enable warp specialization (spatial partitioning). See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#spatial-partitioning-also-known-as-warp-specialization

  • pre_hook – a function that will be called before the kernel is called. The hook receives the kernel's arguments.
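
The following is a minimal sketch of how a small search space of configurations might be assembled, not a recommended set of values. The helper names candidate_settings and threads_per_instance, the meta-parameter name BLOCK_SIZE, and the chosen block sizes, warp counts, and stage counts are all assumptions made for illustration. The Triton import is guarded so that the plain-Python part still runs where Triton is not installed.

```python
import itertools

WARP_SIZE = 32  # threads per warp, matching the 8 * 32 = 256 example above


def threads_per_instance(num_warps):
    """Threads that cooperate on one kernel instance for a given num_warps."""
    return num_warps * WARP_SIZE


def candidate_settings(block_sizes, warp_counts, stage_counts):
    """Hypothetical helper: one dict of triton.Config arguments per candidate."""
    return [
        {"kwargs": {"BLOCK_SIZE": block}, "num_warps": warps, "num_stages": stages}
        for block, warps, stages in itertools.product(
            block_sizes, warp_counts, stage_counts
        )
    ]


# Illustrative values only; a real search space depends on the kernel.
settings = candidate_settings([128, 256], [4, 8], [2])
for s in settings:
    print(s, "->", threads_per_instance(s["num_warps"]), "threads")

try:
    import triton

    # Each dict maps directly onto the constructor signature shown above.
    configs = [triton.Config(**s) for s in settings]
except ImportError:
    configs = []  # Triton is not installed; the plain dicts are still usable
```

With Triton available, configs is the kind of list that triton.autotune accepts as its configs argument. For the meta-parameters to take effect, BLOCK_SIZE would have to be a tl.constexpr parameter of the decorated kernel.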

__init__(self, kwargs, num_warps=4, num_stages=2, num_ctas=1, enable_warp_specialization=False, pre_hook=None)

Methods

__init__(self, kwargs[, num_warps, ...])