triton.testing.do_bench_proton

triton.testing.do_bench_proton(fn, warmup=25, rep=100, grad_to_none=None, quantiles=None, return_mode='mean')

Benchmark the runtime of kernels invoked by the provided function using the Proton profiler.

The measured runtime is generally more accurate than that of do_bench for short kernels, whose timings are otherwise dominated by CPU overhead. Compared to do_bench, this function has two constraints:

  • It measures kernels only; other GPU operations, such as memory copies and synchronization, are not included.

  • It supports only AMD and NVIDIA GPUs.
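A minimal usage sketch follows. It assumes PyTorch is installed, a supported GPU is visible, and the installed Triton build exposes do_bench_proton with the signature shown above. The helper names proton_bench_available and bench_vector_add are hypothetical and exist only for this illustration.

```python
def proton_bench_available() -> bool:
    """Return True only if torch and triton import, a GPU is visible,
    and this Triton build provides do_bench_proton."""
    try:
        import torch
        import triton.testing
    except ImportError:
        return False
    return bool(torch.cuda.is_available()) and hasattr(triton.testing, "do_bench_proton")


def bench_vector_add(n: int = 1 << 20):
    """Benchmark an elementwise add; only the kernels it launches are timed."""
    import torch
    import triton.testing

    # ROCm builds of PyTorch also address the GPU through the "cuda" device name.
    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")

    # Arguments follow the signature documented on this page.
    return triton.testing.do_bench_proton(
        lambda: x + y, warmup=25, rep=100, return_mode="median"
    )


if __name__ == "__main__":
    if proton_bench_available():
        print("median kernel runtime:", bench_vector_add())
    else:
        print("skipped: torch, triton, a GPU, or do_bench_proton is unavailable")
```

Because only kernels are measured, any host-to-device copies performed inside fn would not contribute to the reported runtime.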

Parameters:
  • fn (Callable) – Function to benchmark.

  • warmup (int) – Warmup time (in ms).

  • rep (int) – Repetition time (in ms).

  • grad_to_none (torch.Tensor, optional) – Reset the gradient of the provided tensor(s) to None.

  • quantiles (list[float], optional) – Performance percentiles to return in addition to the median.

  • return_mode (str) – The statistical measure to return. Options are “min”, “max”, “mean”, “median”, or “all”. Default is “mean”.
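The return_mode options can be pictured as reducers applied to the list of per-run timings. The sketch below is an illustration in plain Python, not Triton's implementation: the helpers summarize and quantile_values are hypothetical, the linear-interpolation rule for quantiles is an assumption, and how quantiles and return_mode interact inside do_bench_proton is not specified on this page.

```python
import math
import statistics


def summarize(times, return_mode="mean"):
    """Reduce a list of timings according to return_mode (hypothetical helper)."""
    if return_mode == "all":
        return list(times)
    reducers = {
        "min": min,
        "max": max,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    if return_mode not in reducers:
        raise ValueError(f"unknown return_mode: {return_mode!r}")
    return reducers[return_mode](times)


def quantile_values(times, quantiles):
    """Linearly interpolated quantiles of the timings (assumed rule)."""
    ordered = sorted(times)
    last = len(ordered) - 1
    values = []
    for q in quantiles:
        pos = q * last                # fractional index into the sorted timings
        lo = math.floor(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        values.append(ordered[lo] * (1 - frac) + ordered[hi] * frac)
    return values


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    print(summarize(samples, "mean"))
    print(quantile_values(samples, [0.2, 0.5, 0.8]))
```

With samples such as these, "mean" and "median" coincide; on real timings with occasional slow outliers, "median" or "min" is typically less sensitive to noise than "mean".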