Debugging
=========

Printing Tensor Values
----------------------

Deeploy provides two primary approaches for printing the input and output tensors of computational kernels:

1. **Topology Pass**: This approach modifies the graph's topology by inserting print nodes. These nodes can be added before or after a specified operator, which is selected using a regular expression.
2. **Code Transformation**: This approach inserts print statements into the generated code to output tensor values at various stages of execution. Deeploy also offers memory-aware variants that print values at a specific memory level (e.g., per tile).

Topology Pass
~~~~~~~~~~~~~

.. currentmodule:: Deeploy.CommonExtensions.OptimizationPasses.TopologyOptimizationPasses.DebugPasses

The :py:class:`DebugPrintPass` topology pass modifies the graph by inserting ``DebugPrint`` nodes either before or after the specified operators. The target operator(s) are selected with a regular expression pattern.

To enable it, add :py:class:`DebugPrintPass` to your platform's topology optimization passes. For example, you can extend the ``GenericOptimizer`` in ``Deeploy/Targets/Generic/Platform.py`` as follows:

.. code-block:: python

   from Deeploy.CommonExtensions.OptimizationPasses.TopologyOptimizationPasses.DebugPasses import DebugPrintPass

   GenericOptimizer = TopologyOptimizer([
       # ... existing passes ...
       # Insert a print node after every operator matching the pattern
       DebugPrintPass(r'.*[Mm]at[Mm]ul.*', position='after'),
   ])

**Important:** Your platform must provide a valid implementation and mapping for the ``DebugPrint`` node.

Code Transformation
~~~~~~~~~~~~~~~~~~~

.. currentmodule:: Deeploy.CommonExtensions.CodeTransformationPasses.PrintInputs

The :py:class:`PrintInputGeneration` and :py:class:`PrintOutputGeneration` code transformations insert print statements directly into the generated code, so that the input and output tensors of a kernel are logged when it executes. Unlike the topology pass, they do not change the graph, which makes them convenient for in-depth debugging.
For cases where the memory layout matters, such as debugging tiled execution, Deeploy also provides memory-aware variants: :py:class:`MemoryAwarePrintInputGeneration` and :py:class:`MemoryAwarePrintOutputGeneration`.

To use these transformations, add them to the code transformation pipeline in your target's bindings. For example, you can extend the ``BasicTransformer`` in ``Deeploy/Targets/Generic/Bindings.py``:

.. code-block:: python

   from Deeploy.CommonExtensions.CodeTransformationPasses.PrintInputs import PrintInputGeneration, \
       PrintOutputGeneration

   BasicTransformer = CodeTransformation([
       # ... existing passes ...
       PrintInputGeneration(),
       PrintOutputGeneration(),
   ])

For memory-aware platforms, use the memory-aware transformations instead. For example, extend ``ForkTransformer`` in ``Deeploy/Targets/PULPOpen/Platform.py``:

.. code-block:: python

   from Deeploy.CommonExtensions.CodeTransformationPasses.PrintInputs import MemoryAwarePrintInputGeneration, \
       MemoryAwarePrintOutputGeneration

   ForkTransformer = CodeTransformation([
       # ... existing passes ...
       # Print tensor values at the L1 memory level
       MemoryAwarePrintInputGeneration("L1"),
       MemoryAwarePrintOutputGeneration("L1"),
   ])

To apply these code transformations to all bindings at once, refer to the test implementation in ``DeeployTest/testPrintInputOutputTransformation.py``.

The print statements write to the standard output stream. For example, the printed input of an ``Add`` node looks like this; the header line shows the node name, the buffer name, the element type, the shape, and the buffer address:

.. code-block::

   Add_0 DeeployNetwork_input_0: int8_t, [1, 5, 5, 5], 0x2062b0
   [[[[ -63, -76,-116, -22, -35,],
      [-105, -69, -51, -95, -69,],
      [-104,  -6, -37, -12, -63,],
      [ -32, -10,  -8, -29, -15,],
      [-111, -18,-120,-106, -50,],
     ],
     [[ -62, -22, -60,-109, -13,],
      [ -78, -52, -42,-104,-100,],
      [-115,-105,-119,-104, -62,],
      [ -57, -81,-104, -39, -13,],
      [ -51, -47, -18, -14,-123,],
     ],
     [[-111, -92, -91, -84,-121,],
      [ -41,-118,-128,-109,  -7,],
      [-120, -66,  -9, -66, -55,],
      [ -96, -34, -47,-105, -91,],
      [ -94,-106,  -3,-121, -55,],
     ],
     [[ -15, -65,-126, -11,-101,],
      [ -85,  -6, -98, -46, -84,],
      [ -55,  -8, -53, -99, -79,],
      [ -72, -12,  -2,-103, -55,],
      [ -95, -48, -16, -78, -95,],
     ],
     [[ -18, -40, -43,-104, -78,],
      [ -33,-102, -46,-110, -40,],
      [-128, -24, -68, -70,-113,],
      [ -73,-123,-114, -51,  -3,],
      [ -69, -68, -66, -26,-124,],
     ],
    ],
   ],
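When comparing device output against a reference model, it can help to parse these dumps back into Python. The sketch below is a minimal, hypothetical helper (``parse_tensor_dump`` and ``shape_of`` are not part of Deeploy). It assumes every dump starts with a header line in the format of the example output, ``<node> <buffer>: <type>, [<shape>], <address>``, followed by a nested list that uses trailing commas. The example dump inside the block is a shortened, made-up tensor in that format.

.. code-block:: python

   import ast
   import re

   # Hypothetical helper, not part of Deeploy. Assumes the header format seen
   # in the example output: "<node> <buffer>: <type>, [<shape>], <address>".
   HEADER_RE = re.compile(
       r"^(?P<node>\S+) (?P<buffer>\S+): (?P<dtype>\w+), "
       r"\[(?P<shape>[\d, ]+)\], (?P<addr>0x[0-9a-fA-F]+)$"
   )


   def shape_of(nested):
       """Return the shape of a regular nested list, e.g. [[1, 2, 3]] -> [1, 3]."""
       shape = []
       while isinstance(nested, list):
           shape.append(len(nested))
           nested = nested[0] if nested else None
       return shape


   def parse_tensor_dump(text):
       """Parse one printed tensor: a header line followed by nested values."""
       header, _, body = text.strip().partition("\n")
       match = HEADER_RE.match(header.strip())
       if match is None:
           raise ValueError(f"unrecognized header: {header!r}")
       shape = [int(dim) for dim in match["shape"].split(",")]
       # Every closing bracket is followed by a comma; drop the last one so that
       # literal_eval returns a list instead of a one-element tuple.
       values = ast.literal_eval(body.strip().rstrip(","))
       if shape_of(values) != shape:
           raise ValueError(f"expected shape {shape}, got {shape_of(values)}")
       return {
           "node": match["node"],
           "buffer": match["buffer"],
           "dtype": match["dtype"],
           "shape": shape,
           "address": int(match["addr"], 16),
           "values": values,
       }


   # Shortened, made-up dump in the same format (shape [1, 2, 2, 3]).
   dump = """Add_0 DeeployNetwork_input_0: int8_t, [1, 2, 2, 3], 0x2062b0
   [[[[ -63, -76,-116,],
      [-105, -69, -51,],
     ],
     [[ -62, -22, -60,],
      [ -78, -52, -42,],
     ],
    ],
   ],"""

   info = parse_tensor_dump(dump)
   print(info["node"], info["dtype"], info["shape"], hex(info["address"]))

Because ``values`` is a plain nested list, it can be passed to ``numpy.array(info["values"], dtype=numpy.int8)`` for an element-wise comparison against a reference implementation. The shape check guards against truncated or interleaved output, which is easy to get when several kernels print to the same stream.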