Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper

class Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)

Bases: NetworkDeployerWrapper

Wrapper for network deployers that adds tiling capabilities.

Extends NetworkDeployerWrapper to provide automatic tiling and memory management for neural network deployment on memory-constrained hardware.

Variables:

tiler (Tiler) – The tiler instance used for memory optimization.

Raises:

AssertionError – If the platform is not a MemoryPlatform or MemoryPlatformWrapper.

Notes

The wrapper automatically handles tiling setup, constraint solving, and memory allocation during the binding process.

Methods

__init__(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)

Initialize the tiler deployer wrapper.

Parameters:
  • deployer (Union[MemoryLevelAwareDeployer, MemoryDeployerWrapper]) – The base deployer to wrap.

  • tilerCls (Type[Tiler], optional) – The tiler class to instantiate, by default Tiler.

  • testName (Optional[str], optional) – Optional name for the test case, used for file naming. Defaults to None.

  • workDir (Optional[str], optional) – Optional working directory for temporary files. Defaults to None.
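The wrapper delegates to the wrapped deployer for any behavior it does not override. As a self-contained illustration of this delegation pattern (not Deeploy's actual implementation; `DeployerWrapper` and `ToyDeployer` below are hypothetical stand-ins):

```python
class DeployerWrapper:
    """Minimal sketch of a deployer wrapper: unknown attributes
    are delegated to the wrapped deployer instance."""

    def __init__(self, deployer):
        # Store the wrapped deployer; the real wrapper additionally
        # validates the platform (see the AssertionError documented above).
        self._deployer = deployer

    def __getattr__(self, name):
        # __getattr__ fires only when normal lookup fails, so any
        # method the wrapper overrides (e.g. bind) takes precedence.
        return getattr(self._deployer, name)


class ToyDeployer:
    def parse(self):
        return True


wrapped = DeployerWrapper(ToyDeployer())
print(wrapped.parse())  # delegated to ToyDeployer.parse
```

Overridden methods such as bind() and tile() replace the wrapped deployer's behavior, while everything else falls through unchanged.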

__init__(deployer, tilerCls, testName, workDir)

Initialize the tiler deployer wrapper.

backEnd([verbose])

API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.

bind()

Bind the network with automatic tiling.

codeTransform([verbose])

Apply code transformations on every layer's execution block

exportDeeployState(folderPath, fileName)

Export compressed network context and neural network graph

frontEnd()

API hook to prepare the graph to be deployed and build the initial NetworkContext

generateBufferAllocationCode()

Generates code to allocate space for the global input and output buffer of the network

generateBufferDeAllocationCode()

Generates code to deallocate all global buffers

generateBufferInitializationCode()

Generates code for all forward-declaration of buffers used during inference

generateEngineInitializationCode()

Generate initialization code for all compute engines

generateFunction([verbose])

Helper function to prepare deployment and return generated function code

generateGlobalDefinitionCode()

Generate all global definition code for inference

generateIOBufferInitializationCode()

Generate initialization code for global network inputs and outputs

generateIncludeString()

Generate code to include platform-dependent includes

generateInferenceCode()

Generate the actual inference function for the entire network

generateInferenceInitializationCode()

Generate initialization code, including static memory allocation and other setup tasks

importDeeployState(folderPath, fileName)

Override this container's graph and context with loaded compressed artifacts

inputs()

Return a list of all VariableBuffers that are also global inputs of the network

lower(graph)

Apply the lowering optimization passes to the graph

midEnd()

API hook used after kernel selection is finalized; hoists transient buffers and performs low-level code optimizations (e.g. tiling and static memory allocation).

numberOfOps(verbose)

Returns the total number of operations per network inference

outputs()

Return a list of all VariableBuffers that are also global outputs of the network

parse([default_channels_first])

Parses the full network by iteratively exploring mapping and binding options with backtracking

prepare([verbose])

API hook to perform the entire deployment process to the point where generated code may be extracted

tile([tilingSolution, memoryMap])

Perform tiling and memory allocation for the network.

Attributes

bound

parsed

prepared

transformed

worstCaseBufferSize

Get the worst-case buffer sizes including inputs and outputs.

property worstCaseBufferSize

Get the worst-case buffer sizes including inputs and outputs.

Computes the total worst-case memory requirements including both tiled buffers and input/output buffers.

Returns:

Dictionary mapping memory level names to their total worst-case buffer sizes in bytes.

Return type:

Dict[str, int]

Notes

Extends the tiler’s worst-case buffer size calculation by adding the memory requirements of input and output buffers.
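The aggregation described in the note can be illustrated with a toy computation. The memory-level names and byte counts below are hypothetical; only the rule (tiled worst case plus IO buffer sizes, per memory level) comes from the description above:

```python
def worst_case_with_io(tiled_worst_case, io_buffer_sizes):
    """Add input/output buffer sizes to the tiler's per-level
    worst-case sizes, returning a Dict[str, int] in bytes."""
    total = dict(tiled_worst_case)
    for level, size in io_buffer_sizes.items():
        total[level] = total.get(level, 0) + size
    return total


# Hypothetical memory levels and sizes (bytes)
tiled = {"L1": 48_000, "L2": 320_000}
io = {"L2": 154_528}  # e.g. one input and one output tensor in L2
print(worst_case_with_io(tiled, io))
```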

tile(tilingSolution: List[PatternMemoryConstraints] | None = None, memoryMap: Dict[str, List[List[MemoryBlock]]] | None = None)

Perform tiling and memory allocation for the network.

Executes the complete tiling process including constraint setup, optimization, memory allocation, and code generation updates.

Parameters:
  • tilingSolution (Optional[TilingSolution], optional) – Pre-computed tiling solution to use instead of computing one. If None, the solution will be computed automatically.

  • memoryMap (Optional[MemoryMap], optional) – Pre-computed memory map to use instead of computing one. If None, the memory map will be computed automatically.

Raises:

AssertionError – If only one of tilingSolution or memoryMap is provided; if MiniMalloc is used with non-layer-wise tiling; or if tensors are not uniformly allocated when using MiniMalloc.

Notes

When using the MiniMalloc memory allocation strategy, additional constraints apply:
  • Only layer-wise execution is supported
  • All tensors must be in the default memory level

The method performs validation of the computed solutions and updates the execution blocks with tiling information.
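The first documented assertion (tilingSolution and memoryMap must be given together or both omitted) can be sketched as follows; this mirrors only the documented check, not Deeploy's internals:

```python
def check_tile_args(tilingSolution=None, memoryMap=None):
    """Both arguments must be provided together or both omitted,
    matching the AssertionError documented for tile()."""
    assert (tilingSolution is None) == (memoryMap is None), \
        "Provide both tilingSolution and memoryMap, or neither"


check_tile_args()                         # ok: both computed automatically
check_tile_args([object()], {"L1": []})   # ok: both precomputed
try:
    check_tile_args(tilingSolution=[object()])  # only one given
except AssertionError as e:
    print(e)
```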

bind()

Bind the network with automatic tiling.

Performs the complete binding process including layer binding and automatic tiling optimization.

Returns:

True if binding was successful, False otherwise.

Return type:

bool

Notes

Calls the parent bind() method first, then performs tiling if the initial binding was successful.
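The control flow in the note (parent binding first, tiling only on success) can be sketched generically; the class and method names below are illustrative stand-ins, not Deeploy code:

```python
class TilingBindSketch:
    """Sketch of the documented bind() flow: run the parent binding
    step first and tile only when it succeeded."""

    def bind(self):
        if not self.parent_bind():   # stand-in for the parent class's bind()
            return False             # propagate the binding failure
        self.tile()                  # automatic tiling after successful binding
        return True


class Toy(TilingBindSketch):
    def __init__(self, ok):
        self.ok = ok
        self.tiled = False

    def parent_bind(self):
        return self.ok

    def tile(self):
        self.tiled = True


good, bad = Toy(True), Toy(False)
print(good.bind(), good.tiled)  # True True
print(bad.bind(), bad.tiled)    # False False
```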

backEnd(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.

Parameters:

verbose (CodeGenVerbosity) – Control verbosity of generated code

codeTransform(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

Apply code transformations on every layer’s execution block

Parameters:

verbose (CodeGenVerbosity) – Control code generation verbosity

Raises:

RuntimeError – Raises a RuntimeError if the entire network is not bound

exportDeeployState(folderPath: str, fileName: str)

Export compressed network context and neural network graph

Parameters:
  • folderPath (str) – path to directory where to save context and graph

  • fileName (str) – prefix to use when saving artifacts

frontEnd()

API hook to prepare the graph to be deployed and build the initial NetworkContext

generateBufferAllocationCode() str

Generates code to allocate space for the global input and output buffer of the network

Returns:

Allocation code for global IO buffers

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateBufferDeAllocationCode() str

Generates code to deallocate all global buffers

Returns:

Code to deallocate buffers

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateBufferInitializationCode() str

Generates code for all forward-declaration of buffers used during inference

Returns:

Returns forward-declaration code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateEngineInitializationCode() str

Generate initialization code for all compute engines

Returns:

Initialization code for all engines

Return type:

str

generateFunction(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None)) str

Helper function to prepare deployment and return generated function code

generateGlobalDefinitionCode() str

Generate all global definition code for inference

Returns:

Global Definition code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateIOBufferInitializationCode() str

Generate initialization code for global network inputs and outputs

Returns:

Initialization code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateIncludeString() str

Generate code to include platform-dependent includes

Returns:

Include code

Return type:

str

generateInferenceCode() str

Generate the actual inference function for the entire network

Returns:

The full inference method

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateInferenceInitializationCode() str

Generate initialization code, including static memory allocation and other setup tasks

Returns:

Initialization code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

importDeeployState(folderPath: str, fileName: str)

Override this container’s graph and context with loaded compressed artifacts

Parameters:
  • folderPath (str) – Path to the artifact directory

  • fileName (str) – prefix of the saved artifacts

inputs() List[VariableBuffer]

Return a list of all VariableBuffers that are also global inputs of the network

Returns:

Global inputs

Return type:

List[VariableBuffer]

lower(graph: Graph) Graph

Apply the lowering optimization passes to the graph

Parameters:

graph (gs.Graph) – Unmodified input neural network graph

Returns:

Neural network graph that is deployable with the DeploymentPlatform’s Mapping

Return type:

gs.Graph

midEnd()

API hook used after kernel selection is finalized; hoists transient buffers and performs low-level code optimizations (e.g. tiling and static memory allocation)

numberOfOps(verbose: bool) int

Returns the total number of operations per network inference

Parameters:

verbose (bool) – Control whether the number of operations is printed to STDOUT for each operator

Returns:

Number of operations (1 MAC = 2 Ops) per network inference

Return type:

int

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound
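Using the documented convention that 1 MAC = 2 Ops, the per-layer count for, say, a standard convolution can be estimated as below. This is a self-contained illustration of the counting convention, not Deeploy's implementation:

```python
def conv_ops(out_h, out_w, out_c, in_c, k_h, k_w):
    """MACs for a standard convolution, doubled per the
    1 MAC = 2 Ops convention documented above."""
    macs = out_h * out_w * out_c * in_c * k_h * k_w
    return 2 * macs


# Hypothetical layer: 3x3 conv, 16 -> 32 channels, 28x28 output
print(conv_ops(28, 28, 32, 16, 3, 3))
```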

outputs() List[VariableBuffer]

Return a list of all VariableBuffers that are also global outputs of the network

Returns:

Global outputs

Return type:

List[VariableBuffer]

parse(default_channels_first: bool = True) bool

Parses the full network by iteratively exploring mapping and binding options with backtracking

Parameters:

default_channels_first (bool) – Whether the default data layout is channels-first (CxHxW) or channels-last (HxWxC)

Returns:

Returns a boolean to indicate whether parsing was successful

Return type:

bool

Raises:

RuntimeError – Raises a RuntimeError if backtracking was exhausted without finding a mapping solution
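The iterative mapping exploration with backtracking described above follows a standard search pattern. A generic sketch with toy candidate sets (the layers, candidates, and validity predicate are hypothetical, as is the raised message):

```python
def map_network(layers, candidates, is_valid):
    """Assign one candidate per layer, backtracking when a partial
    assignment fails validation; raises when the search is exhausted."""
    assignment = []

    def backtrack(i):
        if i == len(layers):
            return True  # every layer mapped successfully
        for cand in candidates[layers[i]]:
            assignment.append(cand)
            if is_valid(assignment) and backtrack(i + 1):
                return True
            assignment.pop()  # undo and try the next candidate
        return False

    if not backtrack(0):
        raise RuntimeError("Backtracking exhausted without a mapping solution")
    return assignment


# Toy example: no two layers may share the same kernel binding
layers = ["conv", "relu"]
cands = {"conv": ["k1", "k2"], "relu": ["k1", "k3"]}
valid = lambda a: len(set(a)) == len(a)
print(map_network(layers, cands, valid))  # ['k1', 'k3']
```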

prepare(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

API hook to perform the entire deployment process to the point where generated code may be extracted

Parameters:

verbose (CodeGenVerbosity) – Control verbosity of generated code
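prepare() orchestrates the staged hooks documented above (frontEnd, then midEnd, then backEnd). The ordering can be sketched as a minimal pipeline; the stage bodies here only record their names instead of doing real work:

```python
class PipelineSketch:
    """Illustrative ordering of the deployment hooks; each stage
    appends its name so the sequence can be inspected."""

    def __init__(self):
        self.trace = []

    def frontEnd(self):
        self.trace.append("frontEnd")   # prepare graph, build NetworkContext

    def midEnd(self):
        self.trace.append("midEnd")     # tiling, static memory allocation

    def backEnd(self):
        self.trace.append("backEnd")    # code generation

    def prepare(self):
        self.frontEnd()
        self.midEnd()
        self.backEnd()
        return self.trace


print(PipelineSketch().prepare())  # ['frontEnd', 'midEnd', 'backEnd']
```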