triton.testing.do_bench_proton
- triton.testing.do_bench_proton(fn, warmup=25, rep=100, grad_to_none=None, quantiles=None, return_mode='mean')
Benchmark the runtime of kernels invoked by the provided function using the Proton profiler.
The measured runtime is generally more accurate than do_bench for short kernels that are affected by CPU overhead. Note that this function has several constraints compared to do_bench:
- It does not measure GPU operations other than kernels (e.g., memory copies, synchronization, etc.).
- It supports only AMD and NVIDIA GPUs.
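A minimal usage sketch, based only on the signature shown above. The wrapper name bench_vector_add, the tensor size, and the choice of an elementwise add as the workload are illustrative assumptions. Running it needs a supported GPU, PyTorch, and a Triton build that includes the Proton profiler.

```python
def bench_vector_add(n=1 << 20):
    """Hypothetical helper: time an elementwise add with do_bench_proton."""
    # Imports are kept inside the function so that defining it does not
    # require a GPU or a Triton installation.
    import torch
    from triton.testing import do_bench_proton

    # Assumption: "cuda" is the device string; PyTorch uses the same string
    # on ROCm builds for AMD GPUs.
    x = torch.randn(n, device="cuda")
    y = torch.randn(n, device="cuda")

    # fn takes no arguments, so the workload is wrapped in a lambda.
    # warmup and rep are the documented defaults, written out for clarity.
    return do_bench_proton(lambda: x + y, warmup=25, rep=100, return_mode="median")
```

Calling bench_vector_add() on a machine with a supported GPU returns the median of the measured kernel runtimes. Host-to-device copies issued inside fn would not show up in the result, since only kernels are measured.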
- Parameters:
fn (Callable) – Function to benchmark.
warmup (int) – Warmup time (in ms).
rep (int) – Repetition time (in ms).
grad_to_none (torch.Tensor, optional) – Reset the gradient of the provided tensor(s) to None.
quantiles (list[float], optional) – Performance percentiles to return in addition to the median.
return_mode (str) – The statistical measure to return. Options are “min”, “max”, “mean”, “median”, or “all”. Default is “mean”.
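To make the return_mode options concrete, the sketch below reduces a list of per-run timings in plain Python. The summarize helper and the sample values are hypothetical and are not Triton's implementation. It also assumes that "all" returns every collected sample unchanged.

```python
import statistics


def summarize(times_ms, return_mode="mean"):
    """Hypothetical stand-in for how return_mode could reduce timings."""
    # Assumption: "all" hands back every sample without reducing them.
    if return_mode == "all":
        return list(times_ms)

    reducers = {
        "min": min,
        "max": max,
        "mean": statistics.fmean,
        "median": statistics.median,
    }
    if return_mode not in reducers:
        raise ValueError(f"unsupported return_mode: {return_mode!r}")
    return reducers[return_mode](times_ms)


# Made-up timings with one slow outlier.
samples = [1.0, 2.0, 3.0, 10.0]
for mode in ("min", "max", "mean", "median", "all"):
    print(mode, summarize(samples, mode))
```

With these samples the mean is 4.0 and the median is 2.5, which shows why "median" or "min" can be a better choice than the default "mean" when a few runs are outliers.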