Debugging

Printing Tensor Values

Deeploy provides two primary approaches for printing the input and output tensors of computational kernels:

  1. Topology Pass: This approach modifies the graph’s topology by inserting print nodes. These nodes can be added before or after a specified operator, which can be selected using a regular expression.

  2. Code Transformation: This approach inserts code-level print statements to output tensor values at various stages of execution. Deeploy also offers memory-aware versions that allow printing values at specific memory levels (e.g., per tile).

Topology Pass

The DebugPrint topology pass modifies the graph by inserting print nodes either before or after specified operators. The target operator(s) can be selected using a regular expression pattern.

To enable this, extend the optimization passes by adding DebugPrintPass. For example, to modify the GenericOptimizer in Deeploy/Targets/Generic/Platform.py, you can add:

GenericOptimizer = TopologyOptimizer([
    # ... existing passes ...
    DebugPrintPass(r'.*[Mm]at[Mm]ul.*', position='after'),
])

Ensure that your platform provides a valid implementation and mapping for the DebugPrint node.
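To check which operators a pattern would select before rebuilding the network, the regular expression can be tried on a list of node names with Python's standard re module. This is a minimal sketch: the node names are hypothetical, and how DebugPrintPass applies the pattern internally (match versus search) is an assumption. For a pattern wrapped in .* on both sides, as in the example above, the distinction makes no difference.

```python
import re

# Same pattern as in the DebugPrintPass example above.
pattern = re.compile(r'.*[Mm]at[Mm]ul.*')

# Hypothetical node names, for illustration only.
node_names = ["MatMul_0", "Add_0", "/encoder/matmul_3", "Gemm_1", "Matmul_qkv"]

# Keep only the names the pattern matches.
selected = [name for name in node_names if pattern.match(name)]
print(selected)
```

Only the names containing MatMul, Matmul, matMul, or matmul are kept; Add_0 and Gemm_1 are skipped.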

Code Transformation

The PrintInputGeneration and PrintOutputGeneration code transformations offer a flexible way to insert print statements directly into the generated code. These transformations allow you to log tensor values at any point during execution, making them useful for in-depth debugging. For cases where memory layout is important—such as debugging tiled execution—Deeploy also provides memory-aware variants: MemoryAwarePrintInputGeneration and MemoryAwarePrintOutputGeneration.
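Conceptually, a code-level print transformation wraps a generated kernel call with print statements. The sketch below only illustrates that idea and is not Deeploy's implementation: wrap_with_prints, print_tensor, and add_kernel are hypothetical names, and printing inputs before the call and outputs after it is an assumption about the ordering.

```python
# Illustrative only: none of these names are Deeploy APIs.
def wrap_with_prints(kernel_call, inputs, outputs):
    """Return C-like source lines: print inputs, run the kernel, print outputs."""
    # Assumed ordering: input tensors are printed before the kernel runs.
    lines = [f'print_tensor("{name}", {name});' for name in inputs]
    lines.append(kernel_call)
    # Assumed ordering: output tensors are printed after the kernel returns.
    lines += [f'print_tensor("{name}", {name});' for name in outputs]
    return lines


generated = wrap_with_prints("add_kernel(a, b, out);", ["a", "b"], ["out"])
print("\n".join(generated))
```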

To use these transformations, add them to the code transformation pipeline in your target bindings. For example, you can extend the BasicTransformer in Deeploy/Targets/Generic/Bindings.py:

BasicTransformer = CodeTransformation([
    # ... existing passes ...
    PrintInputGeneration(),
    PrintOutputGeneration()
])

For memory-aware platforms, use the memory-aware transformations instead. For example, extend ForkTransformer in Deeploy/Targets/PULPOpen/Platform.py:

ForkTransformer = CodeTransformation([
    # ... existing passes ...
    MemoryAwarePrintInputGeneration("L1"),
    MemoryAwarePrintOutputGeneration("L1")
])

To apply these code transformations across all bindings, refer to the test implementation in DeeployTest/testPrintInputOutputTransformation.py.

The print statements write to the standard output stream. Example output:

Add_0 DeeployNetwork_input_0: int8_t, [1, 5, 5, 5], 0x2062b0
[[[[ -63, -76,-116, -22, -35,],
[-105, -69, -51, -95, -69,],
[-104,  -6, -37, -12, -63,],
[ -32, -10,  -8, -29, -15,],
[-111, -18,-120,-106, -50,],
],
[[ -62, -22, -60,-109, -13,],
[ -78, -52, -42,-104,-100,],
[-115,-105,-119,-104, -62,],
[ -57, -81,-104, -39, -13,],
[ -51, -47, -18, -14,-123,],
],
[[-111, -92, -91, -84,-121,],
[ -41,-118,-128,-109,  -7,],
[-120, -66,  -9, -66, -55,],
[ -96, -34, -47,-105, -91,],
[ -94,-106,  -3,-121, -55,],
],
[[ -15, -65,-126, -11,-101,],
[ -85,  -6, -98, -46, -84,],
[ -55,  -8, -53, -99, -79,],
[ -72, -12,  -2,-103, -55,],
[ -95, -48, -16, -78, -95,],
],
[[ -18, -40, -43,-104, -78,],
[ -33,-102, -46,-110, -40,],
[-128, -24, -68, -70,-113,],
[ -73,-123,-114, -51,  -3,],
[ -69, -68, -66, -26,-124,],
],
],
],
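When comparing dumps across runs, it can help to parse the header line of each printed tensor. The following is a minimal sketch, assuming the header fields are, in order, the node name, the tensor name, the element type, the shape, and what appears to be the buffer address. That field interpretation is inferred from the sample above, not from a documented format, and parse_header is a hypothetical helper.

```python
import math
import re

# Header layout inferred from the sample output above (an assumption).
HEADER_RE = re.compile(
    r'^(?P<node>\S+) (?P<tensor>\S+): (?P<dtype>\w+), '
    r'\[(?P<shape>[\d, ]*)\], (?P<addr>0x[0-9a-fA-F]+)$'
)


def parse_header(line):
    """Split a tensor header line into its fields, or return None if it does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    shape = [int(d) for d in m.group("shape").split(",") if d.strip()]
    return {
        "node": m.group("node"),
        "tensor": m.group("tensor"),
        "dtype": m.group("dtype"),
        "shape": shape,
        "address": int(m.group("addr"), 16),
        "num_elements": math.prod(shape),
    }


info = parse_header("Add_0 DeeployNetwork_input_0: int8_t, [1, 5, 5, 5], 0x2062b0")
print(info)
```

For the sample header, the shape [1, 5, 5, 5] gives 125 elements, which agrees with the 125 values printed in the dump.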