.. SPDX-FileCopyrightText: 2025 ETH Zurich and University of Bologna
..
.. SPDX-License-Identifier: Apache-2.0

Microbenchmark
==============

Pass ``--profileMicrobenchmark`` to any PULPOpen runner (``testMVP.py``,
``generateNetwork.py``, ``deeployRunner_*.py``) to wrap each layer in
``RunNetwork`` with PULP performance counters. Off by default; zero overhead
when unused.

The flag flows through
:py:attr:`Deeploy.DeeployTypes.CodeGenVerbosity.microbenchmarkProfiling` into
:py:class:`Deeploy.Targets.PULPOpen.CodeTransformationPasses.PULPMicrobenchmark.PULPMicrobenchmark`,
which is registered last in the PULPOpen ``ForkTransformer`` and
``ClusterTransformer`` chains so it covers the full per-layer body (tiling,
DMA, memory management). The C-side helpers live in
``TargetLibraries/PULPOpen/inc/perf_utils.h``.

Each layer prints one block on ``core 0``:

.. code-block:: text

   === Performance Statistics: Add_0 ===
   Cycles: 1442
   Instructions: 149
   IPC: 0.103
   Loads / Stores / Branches / Taken Branches / RVC
   Load Stalls / Jump Stalls / I-cache Misses / TCDM Contentions
   External Loads / Stores and their cycle counts

External-memory and TCDM-contention counters are zero when the wrapped region
has no L2/L3 traffic or no bank conflicts (e.g. small untiled kernels that fit
in L1). Some events may not be modelled by GVSoC; verify on a tiled test before
assuming a counter is broken.
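
For comparing layers across runs, the printed blocks can be collected from a
captured log. The following is a minimal sketch, not part of Deeploy:
``parse_perf_blocks`` is a hypothetical helper, and it assumes each counter is
printed on its own line as ``Label: value``, as the ``Cycles``,
``Instructions`` and ``IPC`` lines above are. The exact layout of the remaining
counters is not specified here, so adjust the pattern to the real output.

.. code-block:: python

   import re

   # Assumed layout: a "=== Performance Statistics: <layer> ===" header
   # followed by "Label: value" lines. Only the header and the three
   # labelled lines are taken from the example block; the rest is a guess.
   HEADER = re.compile(r"=== Performance Statistics: (\S+) ===")
   FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z /-]*?):\s*(\d+(?:\.\d+)?)\s*$")


   def parse_perf_blocks(log_text):
       """Return {layer_name: {label: number}} for each block in log_text."""
       layers = {}
       current = None
       for line in log_text.splitlines():
           header = HEADER.search(line)
           if header:
               # Start collecting counters for a new layer.
               current = layers.setdefault(header.group(1), {})
               continue
           field = FIELD.match(line)
           if field and current is not None:
               label, raw = field.groups()
               current[label] = float(raw) if "." in raw else int(raw)
       return layers


   sample = """\
   === Performance Statistics: Add_0 ===
   Cycles: 1442
   Instructions: 149
   IPC: 0.103
   """

   stats = parse_perf_blocks(sample)["Add_0"]
   # IPC is instructions divided by cycles, so it can be cross-checked.
   recomputed_ipc = stats["Instructions"] / stats["Cycles"]
   print(f"Add_0: {stats['Cycles']} cycles, IPC {recomputed_ipc:.3f}")

Recomputing IPC from the two raw counters is a cheap sanity check that a log
was parsed correctly before the numbers are compared between configurations.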