Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper

class Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)

Bases: NetworkDeployerWrapper

Wrapper for network deployers that adds tiling capabilities.

Extends NetworkDeployerWrapper to provide automatic tiling and memory management for neural network deployment on memory-constrained hardware.

Variables:

tiler (Tiler) – The tiler instance used for memory optimization.

Raises:

AssertionError – If the platform is not a MemoryPlatform or MemoryPlatformWrapper.

Notes

The wrapper automatically handles tiling setup, constraint solving, and memory allocation during the binding process.

Methods

__init__(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)

Initialize the tiler deployer wrapper.

Parameters:
  • deployer (Union[MemoryLevelAwareDeployer, MemoryDeployerWrapper]) – The base deployer to wrap.

  • tilerCls (Type[Tiler], optional) – The tiler class to instantiate, by default Tiler.

  • testName (Optional[str], optional) – Optional name for the test case, used for file naming. Defaults to None.

  • workDir (Optional[str], optional) – Optional working directory for temporary files. Defaults to None.
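The wrapper delegates to the wrapped deployer for any behavior it does not override. As a self-contained illustration of this delegation pattern (not Deeploy's actual implementation; `DeployerWrapper` and `ToyDeployer` below are hypothetical stand-ins):

```python
class DeployerWrapper:
    """Minimal sketch of a deployer wrapper: unknown attributes
    are delegated to the wrapped deployer instance."""

    def __init__(self, deployer):
        # Store the wrapped deployer; the real wrapper additionally
        # validates the platform (see the AssertionError documented above).
        self._deployer = deployer

    def __getattr__(self, name):
        # __getattr__ fires only when normal lookup fails, so any
        # method the wrapper overrides (e.g. bind) takes precedence.
        return getattr(self._deployer, name)


class ToyDeployer:
    def parse(self):
        return True


wrapped = DeployerWrapper(ToyDeployer())
print(wrapped.parse())  # delegated to ToyDeployer.parse
```

Overridden methods such as bind() and tile() replace the wrapped deployer's behavior, while everything else falls through unchanged.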

__init__(deployer, tilerCls, testName, workDir)

Initialize the tiler deployer wrapper.

backEnd([verbose])

API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.

bind()

Bind the network with automatic tiling.

codeTransform([verbose])

Apply code transformations on every layer's execution block

exportDeeployState(folderPath, fileName)

Export compressed network context and neural network graph

frontEnd()

API hook to prepare the graph to be deployed and build the initial NetworkContext

generateBufferAllocationCode()

Generates code to allocate space for the global input and output buffer of the network

generateBufferDeAllocationCode()

Generates code to deallocate all global buffers

generateBufferInitializationCode()

Generates code for all forward-declaration of buffers used during inference

generateEngineInitializationCode()

Generate initialization code for all compute engines

generateFunction([verbose])

Helper function to prepare deployment and return generated function code

generateGlobalDefinitionCode()

Generate all global definition code for inference

generateIOBufferInitializationCode()

Generate initialization code for global network inputs and outputs

generateIncludeString()

Generate code to include platform-dependent includes

generateInferenceCode()

Generate the actual inference function for the entire network

generateInferenceInitializationCode()

Generate initialization code, including static memory allocation and other setup tasks

importDeeployState(folderPath, fileName)

Override this container's graph and context with loaded compressed artifacts

inputs()

Return a list of all VariableBuffers that are also global inputs of the network

lower(graph)

Apply the lowering optimization passes to the graph

midEnd()

API hook used after kernel selection is finalized; hoists transient buffers and performs low-level code optimizations (e.g. tiling and static memory allocation).

numberOfOps(verbose)

Returns the total number of operations per network inference

outputs()

Return a list of all VariableBuffers that are also global outputs of the network

parse([default_channels_first])

Parses the full network by iteratively exploring mapping and binding options with backtracking

prepare([verbose])

API hook to perform the entire deployment process to the point where generated code may be extracted

tile([tilingSolution, memoryMap])

Perform tiling and memory allocation for the network.

Attributes

bound

parsed

prepared

transformed

worstCaseBufferSize

Get the worst-case buffer sizes including inputs and outputs.

property worstCaseBufferSize

Get the worst-case buffer sizes including inputs and outputs.

Computes the total worst-case memory requirements including both tiled buffers and input/output buffers.

Returns:

Dictionary mapping memory level names to their total worst-case buffer sizes in bytes.

Return type:

Dict[str, int]

Notes

Extends the tiler’s worst-case buffer size calculation by adding the memory requirements of input and output buffers.
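The aggregation described in the note can be illustrated with a toy computation. The memory-level names and byte counts below are hypothetical; only the rule (tiled worst case plus IO buffer sizes, per memory level) comes from the description above:

```python
def worst_case_with_io(tiled_worst_case, io_buffer_sizes):
    """Add input/output buffer sizes to the tiler's per-level
    worst-case sizes, returning a Dict[str, int] in bytes."""
    total = dict(tiled_worst_case)
    for level, size in io_buffer_sizes.items():
        total[level] = total.get(level, 0) + size
    return total


# Hypothetical memory levels and sizes (bytes)
tiled = {"L1": 48_000, "L2": 320_000}
io = {"L2": 154_528}  # e.g. one input and one output tensor in L2
print(worst_case_with_io(tiled, io))
```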

tile(tilingSolution: List[PatternMemoryConstraints] | None = None, memoryMap: Dict[str, List[List[MemoryBlock]]] | None = None)

Perform tiling and memory allocation for the network.

Executes the complete tiling process including constraint setup, optimization, memory allocation, and code generation updates.

Parameters:
  • tilingSolution (Optional[TilingSolution], optional) – Pre-computed tiling solution to use instead of computing one. If None, the solution will be computed automatically.

  • memoryMap (Optional[MemoryMap], optional) – Pre-computed memory map to use instead of computing one. If None, the memory map will be computed automatically.

Raises:

AssertionError – If only one of tilingSolution or memoryMap is provided; if MiniMalloc is used with non-layer-wise tiling; or if tensors are not uniformly allocated when using MiniMalloc.

Notes

When using the MiniMalloc memory allocation strategy, additional constraints apply:
  • Only layer-wise execution is supported
  • All tensors must be in the default memory level

The method performs validation of the computed solutions and updates the execution blocks with tiling information.
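The first documented assertion (tilingSolution and memoryMap must be given together or both omitted) can be sketched as follows; this mirrors only the documented check, not Deeploy's internals:

```python
def check_tile_args(tilingSolution=None, memoryMap=None):
    """Both arguments must be provided together or both omitted,
    matching the AssertionError documented for tile()."""
    assert (tilingSolution is None) == (memoryMap is None), \
        "Provide both tilingSolution and memoryMap, or neither"


check_tile_args()                         # ok: both computed automatically
check_tile_args([object()], {"L1": []})   # ok: both precomputed
try:
    check_tile_args(tilingSolution=[object()])  # only one given
except AssertionError as e:
    print(e)
```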

bind()

Bind the network with automatic tiling.

Performs the complete binding process including layer binding and automatic tiling optimization.

Returns:

True if binding was successful, False otherwise.

Return type:

bool

Notes

Calls the parent bind() method first, then performs tiling if the initial binding was successful.
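The control flow in the note (parent binding first, tiling only on success) can be sketched generically; the class and method names below are illustrative stand-ins, not Deeploy code:

```python
class TilingBindSketch:
    """Sketch of the documented bind() flow: run the parent binding
    step first and tile only when it succeeded."""

    def bind(self):
        if not self.parent_bind():   # stand-in for the parent class's bind()
            return False             # propagate the binding failure
        self.tile()                  # automatic tiling after successful binding
        return True


class Toy(TilingBindSketch):
    def __init__(self, ok):
        self.ok = ok
        self.tiled = False

    def parent_bind(self):
        return self.ok

    def tile(self):
        self.tiled = True


good, bad = Toy(True), Toy(False)
print(good.bind(), good.tiled)  # True True
print(bad.bind(), bad.tiled)    # False False
```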

backEnd(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.

Parameters:

verbose (CodeGenVerbosity) – Control verbosity of generated code

codeTransform(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

Apply code transformations on every layer’s execution block

Parameters:

verbose (CodeGenVerbosity) – Control code generation verbosity

Raises:

RuntimeError – Raises a RuntimeError if the entire network is not bound

exportDeeployState(folderPath: str, fileName: str)

Export compressed network context and neural network graph

Parameters:
  • folderPath (str) – path to directory where to save context and graph

  • fileName (str) – prefix to use when saving artifacts

frontEnd()

API hook to prepare the graph to be deployed and build the initial NetworkContext

generateBufferAllocationCode() str

Generates code to allocate space for the global input and output buffer of the network

Returns:

Allocation code for global IO buffers

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateBufferDeAllocationCode() str

Generates code to deallocate all global buffers

Returns:

Code to deallocate buffers

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateBufferInitializationCode() str

Generates code for all forward-declaration of buffers used during inference

Returns:

Returns forward-declaration code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateEngineInitializationCode() str

Generate initialization code for all compute engines

Returns:

Initialization code for all engines

Return type:

str

generateFunction(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None)) str

Helper function to prepare deployment and return generated function code

generateGlobalDefinitionCode() str

Generate all global definition code for inference

Returns:

Global Definition code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateIOBufferInitializationCode() str

Generate initialization code for global network inputs and outputs

Returns:

Initialization code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateIncludeString() str

Generate code to include platform-dependent includes

Returns:

Include code

Return type:

str

generateInferenceCode() str

Generate the actual inference function for the entire network

Returns:

The full inference method

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

generateInferenceInitializationCode() str

Generate initialization code, including static memory allocation and other setup tasks

Returns:

Initialization code

Return type:

str

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound

importDeeployState(folderPath: str, fileName: str)

Override this container’s graph and context with loaded compressed artifacts

Parameters:
  • folderPath (str) – Path to the artifact directory

  • fileName (str) – prefix of the saved artifacts

inputs() List[VariableBuffer]

Return a list of all VariableBuffers that are also global inputs of the network

Returns:

Global inputs

Return type:

List[VariableBuffer]

lower(graph: Graph) Graph

Apply the lowering optimization passes to the graph

Parameters:

graph (gs.Graph) – Unmodified input neural network graph

Returns:

Neural network graph that is deployable with the DeploymentPlatform’s Mapping

Return type:

gs.Graph

midEnd()

API hook used after kernel selection is finalized; hoists transient buffers and performs low-level code optimizations (e.g. tiling and static memory allocation)

numberOfOps(verbose: bool) int

Returns the total number of operations per network inference

Parameters:

verbose (bool) – Control whether the number of operations is printed to STDOUT for each operator

Returns:

Number of operations (1 MAC = 2 Ops) per network inference

Return type:

int

Raises:

RuntimeError – Raises a RuntimeError if network is not parsed and bound
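Using the documented convention that 1 MAC = 2 Ops, the per-layer count for, say, a standard convolution can be estimated as below. This is a self-contained illustration of the counting convention, not Deeploy's implementation:

```python
def conv_ops(out_h, out_w, out_c, in_c, k_h, k_w):
    """MACs for a standard convolution, doubled per the
    1 MAC = 2 Ops convention documented above."""
    macs = out_h * out_w * out_c * in_c * k_h * k_w
    return 2 * macs


# Hypothetical layer: 3x3 conv, 16 -> 32 channels, 28x28 output
print(conv_ops(28, 28, 32, 16, 3, 3))
```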

outputs() List[VariableBuffer]

Return a list of all VariableBuffers that are also global outputs of the network

Returns:

Global outputs

Return type:

List[VariableBuffer]

parse(default_channels_first: bool = True) bool

Parses the full network by iteratively exploring mapping and binding options with backtracking

Parameters:

default_channels_first (bool) – Whether the default data layout is channels-first (CxHxW) or channels-last (HxWxC)

Returns:

Returns a boolean to indicate whether parsing was successful

Return type:

bool

Raises:

RuntimeError – Raises a RuntimeError if backtracking was exhausted without finding a mapping solution
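The iterative mapping exploration with backtracking described above follows a standard search pattern. A generic sketch with toy candidate sets (the layers, candidates, and validity predicate are hypothetical, as is the raised message):

```python
def map_network(layers, candidates, is_valid):
    """Assign one candidate per layer, backtracking when a partial
    assignment fails validation; raises when the search is exhausted."""
    assignment = []

    def backtrack(i):
        if i == len(layers):
            return True  # every layer mapped successfully
        for cand in candidates[layers[i]]:
            assignment.append(cand)
            if is_valid(assignment) and backtrack(i + 1):
                return True
            assignment.pop()  # undo and try the next candidate
        return False

    if not backtrack(0):
        raise RuntimeError("Backtracking exhausted without a mapping solution")
    return assignment


# Toy example: no two layers may share the same kernel binding
layers = ["conv", "relu"]
cands = {"conv": ["k1", "k2"], "relu": ["k1", "k3"]}
valid = lambda a: len(set(a)) == len(a)
print(map_network(layers, cands, valid))  # ['k1', 'k3']
```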

prepare(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))

API hook to perform the entire deployment process to the point where generated code may be extracted

Parameters:

verbose (CodeGenVerbosity) – Control verbosity of generated code
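prepare() orchestrates the staged hooks documented above (frontEnd, then midEnd, then backEnd). The ordering can be sketched as a minimal pipeline; the stage bodies here only record their names instead of doing real work:

```python
class PipelineSketch:
    """Illustrative ordering of the deployment hooks; each stage
    appends its name so the sequence can be inspected."""

    def __init__(self):
        self.trace = []

    def frontEnd(self):
        self.trace.append("frontEnd")   # prepare graph, build NetworkContext

    def midEnd(self):
        self.trace.append("midEnd")     # tiling, static memory allocation

    def backEnd(self):
        self.trace.append("backEnd")    # code generation

    def prepare(self):
        self.frontEnd()
        self.midEnd()
        self.backEnd()
        return self.trace


print(PipelineSketch().prepare())  # ['frontEnd', 'midEnd', 'backEnd']
```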