triton.testing.do_bench_proton

triton.testing.do_bench_proton(fn, warmup=25, rep=100, grad_to_none=None, quantiles=None, return_mode='mean')

Benchmark the runtime of kernels invoked by the provided function using the Proton profiler.

The measured runtime is generally more accurate than do_bench for short kernels that are affected by CPU overhead. Note that this function has several constraints compared to do_bench:

- It does not measure GPU operations other than kernels (e.g., memory copies, synchronization, etc.).
- It supports only AMD and NVIDIA GPUs.

Parameters:

fn (Callable) – Function to benchmark.
warmup (int) – Warmup time (in ms).
rep (int) – Repetition time (in ms).
grad_to_none (torch.Tensor, optional) – Reset the gradient of the provided tensor(s) to None.
quantiles (list[float], optional) – Performance percentiles to return in addition to the median.
return_mode (str) – The statistical measure to return. Options are “min”, “max”, “mean”, “median”, or “all”. Default is “mean”.
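
To make the return_mode options concrete, the sketch below reduces a list of per-run timings to each statistic using only the Python standard library. It is an illustration, not Triton's implementation: the helper name summarize and the sample timings are hypothetical, and quantiles handling is deliberately left out.

```python
import statistics


def summarize(times_ms, return_mode="mean"):
    """Hypothetical helper: reduce per-run timings (ms) as return_mode describes."""
    if return_mode == "all":
        # "all" keeps every sample instead of reducing to one number.
        return list(times_ms)
    reducers = {
        "min": min,
        "max": max,
        "mean": statistics.mean,
        "median": statistics.median,
    }
    if return_mode not in reducers:
        raise ValueError(f"unsupported return_mode: {return_mode!r}")
    return reducers[return_mode](times_ms)


# Made-up timings in milliseconds, for illustration only.
samples = [0.5, 0.25, 1.0, 0.25]
for mode in ("min", "max", "mean", "median", "all"):
    print(mode, summarize(samples, mode))
```

Which statistic to prefer depends on the workload: "min" and "median" are less affected by occasional slow runs than "mean".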

Returns:

The runtime(s) in milliseconds: a single float for a scalar return_mode, or a list of floats if quantiles is set or return_mode="all".

Return type:

float | list[float]
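
Example:

A minimal usage sketch, assuming PyTorch and a Triton build that provides do_bench_proton are installed and a supported GPU is present. The helper name bench_vector_add and the element-wise add workload are illustrative choices, not part of the Triton API; the helper returns None rather than failing when the GPU stack is unavailable.

```python
def bench_vector_add(n=1 << 20):
    """Hypothetical helper: mean kernel time in ms, or None without a GPU stack."""
    try:
        import torch
        from triton.testing import do_bench_proton
    except ImportError:
        # PyTorch or this Triton API is not installed.
        return None
    if not torch.cuda.is_available():
        # No usable GPU device is visible to PyTorch.
        return None

    x = torch.rand(n, device="cuda")
    y = torch.rand(n, device="cuda")

    # Only the kernels launched inside the callable are timed; the
    # allocations above happen outside the measured function.
    return do_bench_proton(lambda: x + y, warmup=25, rep=100, return_mode="mean")


result = bench_vector_add()
print("no supported GPU stack found" if result is None else f"{result:.4f} ms")
```

Allocating the inputs outside the callable keeps the measured function limited to kernel launches, which matches the constraint that non-kernel GPU operations are not measured.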