Debugging

Printing Tensor Values

Deeploy provides two primary approaches for printing the input and output tensors of computational kernels:

Topology Pass: This approach modifies the graph’s topology by inserting print nodes. These nodes can be added before or after a specified operator, which can be selected using a regular expression.
Code Transformation: This approach inserts code-level print statements to output tensor values at various stages of execution. Deeploy also offers memory-aware versions that allow printing values at specific memory levels (e.g., per tile).

Topology Pass

The DebugPrint topology pass modifies the graph by inserting print nodes either before or after specified operators. The target operator(s) can be selected using a regular expression pattern.

To enable this, extend the optimization passes by adding DebugPrintPass. For example, to modify the GenericOptimizer in Deeploy/Targets/Generic/Platform.py, you can add:

GenericOptimizer = TopologyOptimizer([
    # ... existing passes ...
    DebugPrintPass(r'.*[Mm]at[Mm]ul.*', position='after'),
])

Ensure that your platform provides a valid implementation and mapping for the ``DebugPrint`` node.

Code Transformation

The PrintInputGeneration and PrintOutputGeneration code transformations offer a flexible way to insert print statements directly into the generated code. These transformations allow you to log tensor values at any point during execution, making them useful for in-depth debugging. For cases where memory layout is important—such as debugging tiled execution—Deeploy also provides memory-aware variants: MemoryAwarePrintInputGeneration and MemoryAwarePrintOutputGeneration.

To use these transformations, add them to the code transformation pipeline in your target bindings. For example, you can extend the BasicTransformer in Deeploy/Targets/Generic/Bindings.py:

BasicTransformer = CodeTransformation([
    # ... existing passes ...
    PrintInputGeneration(),
    PrintOutputGeneration()
])

For memory-aware platforms, use the memory-aware transformations instead. For example, extend ForkTransformer in Deeploy/Targets/PULPOpen/Platform.py:

ForkTransformer = CodeTransformation([
    # ... existing passes ...
    MemoryAwarePrintInputGeneration("L1"),
    MemoryAwarePrintOutputGeneration("L1")
])

To apply these code transformations across all bindings, refer to the test implementation in DeeployTest/testPrintInputOutputTransformation.py.

The output of the print statements will be directed to the standard output stream. .. code-block:

Add_0 DeeployNetwork_input_0: int8_t, [1, 5, 5, 5], 0x2062b0
[[[[ -63, -76,-116, -22, -35,],
[-105, -69, -51, -95, -69,],
[-104,  -6, -37, -12, -63,],
[ -32, -10,  -8, -29, -15,],
[-111, -18,-120,-106, -50,],
],
[[ -62, -22, -60,-109, -13,],
[ -78, -52, -42,-104,-100,],
[-115,-105,-119,-104, -62,],
[ -57, -81,-104, -39, -13,],
[ -51, -47, -18, -14,-123,],
],
[[-111, -92, -91, -84,-121,],
[ -41,-118,-128,-109,  -7,],
[-120, -66,  -9, -66, -55,],
[ -96, -34, -47,-105, -91,],
[ -94,-106,  -3,-121, -55,],
],
[[ -15, -65,-126, -11,-101,],
[ -85,  -6, -98, -46, -84,],
[ -55,  -8, -53, -99, -79,],
[ -72, -12,  -2,-103, -55,],
[ -95, -48, -16, -78, -95,],
],
[[ -18, -40, -43,-104, -78,],
[ -33,-102, -46,-110, -40,],
[-128, -24, -68, -70,-113,],
[ -73,-123,-114, -51,  -3,],
[ -69, -68, -66, -26,-124,],
],
],
],