triton.experimental.gluon.language.nvidia.blackwell.tma.store_wait

triton.experimental.gluon.language.nvidia.blackwell.tma.store_wait(pendings, read_only=True, _semantic=None)

Wait until at most pendings TMA stores remain pending.

Parameters:
  • pendings (int | ttgl.constexpr) – Maximum number of TMA stores allowed to remain pending.

  • read_only (bool | ttgl.constexpr) – If true, wait only until the stores have finished reading their shared-memory sources; the written data may not yet be visible in HBM. If false, also wait for the writes themselves to complete. Defaults to True.
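
The pendings threshold can be pictured with a small pure-Python model. This is a sketch for intuition only, not Gluon code: the StoreQueue class, its methods, and the double_buffered_stores helper are hypothetical, and the model assumes that outstanding stores retire oldest-first.

```python
from collections import deque


class StoreQueue:
    """Hypothetical model of in-flight TMA stores (not part of Gluon)."""

    def __init__(self):
        self.in_flight = deque()  # oldest store sits on the left

    def issue(self, buffer_name):
        # Models issuing one asynchronous shared-to-global TMA store.
        self.in_flight.append(buffer_name)

    def store_wait(self, pendings):
        # Models tma.store_wait(pendings): return once at most `pendings`
        # stores remain, retiring the oldest first (an assumption here).
        retired = []
        while len(self.in_flight) > pendings:
            retired.append(self.in_flight.popleft())
        return retired


def double_buffered_stores(num_tiles):
    """Return, per wait, the source buffers that became safe to reuse."""
    queue = StoreQueue()
    freed = []
    for tile in range(num_tiles):
        queue.issue(f"buf{tile % 2}")
        # Leave one store in flight so the next tile can overlap with it.
        freed.append(queue.store_wait(pendings=1))
    # Drain every remaining store before the kernel would exit.
    freed.append(queue.store_wait(pendings=0))
    return freed
```

In this model, double_buffered_stores(3) frees buf0 just before the third tile needs it again, which is why a two-buffer loop would pass pendings=1 inside the loop and pendings=0 at the end.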

Notes

By default, tma.store_wait waits only for the TMA store to finish reading from shared memory. This does not mean that the write has been fully flushed to HBM. If your kernel uses TMA stores to pass messages between CTAs or between NVLink-connected devices, call tma.store_wait with read_only=False before any release operation.
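
The difference between the two modes can be sketched with a toy pure-Python model of what each wait guarantees. Nothing here is Gluon API: ModelStore, model_store_wait, and publish are hypothetical names, and the model tracks only guarantees, not real hardware timing.

```python
class ModelStore:
    """Hypothetical record of what is guaranteed about one TMA store."""

    def __init__(self):
        self.source_read = False       # shared-memory source fully read
        self.globally_visible = False  # written data visible in HBM


def model_store_wait(stores, pendings, read_only=True):
    # Models the guarantees of tma.store_wait over a list of stores kept
    # in issue order; the `pendings` most recent stores are not waited on.
    waited = stores[:max(len(stores) - pendings, 0)]
    for store in waited:
        store.source_read = True
        if not read_only:
            store.globally_visible = True


def publish(read_only):
    """Writer side of a flag-based handoff in the model.

    Returns True if the payload is guaranteed visible at the point where
    a release operation on a ready flag would be issued.
    """
    payload = ModelStore()
    model_store_wait([payload], pendings=0, read_only=read_only)
    # A release store to a ready flag would go here.
    return payload.globally_visible
```

In this model, publish(read_only=True) returns False: the source buffer may be reused, but a consumer that observes the flag is not guaranteed to see the payload. publish(read_only=False) returns True, matching the guidance above.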