Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper
- class Deeploy.TilingExtension.TilerExtension.TilerDeployerWrapper(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)
Bases: NetworkDeployerWrapper
Wrapper for network deployers that adds tiling capabilities.
Extends NetworkDeployerWrapper to provide automatic tiling and memory management for neural network deployment on memory-constrained hardware.
- Parameters:
deployer (Union[MemoryLevelAwareDeployer, MemoryDeployerWrapper]) – The base deployer to wrap with tiling capabilities.
tilerCls (Type[Tiler], optional) – The tiler class to use, by default Tiler.
- Variables:
tiler (Tiler) – The tiler instance used for memory optimization.
- Raises:
AssertionError – If the platform is not a MemoryPlatform or MemoryPlatformWrapper.
Notes
The wrapper automatically handles tiling setup, constraint solving, and memory allocation during the binding process.
Methods
- __init__(deployer: MemoryLevelAwareDeployer | MemoryDeployerWrapper, tilerCls: Type[Tiler] = <class 'Deeploy.TilingExtension.TilerExtension.Tiler'>, testName: str | None = None, workDir: str | None = None)
Initialize the tiler deployer wrapper.
- Parameters:
deployer (Union[MemoryLevelAwareDeployer, MemoryDeployerWrapper]) – The base deployer to wrap.
tilerCls (Type[Tiler], optional) – The tiler class to instantiate, by default Tiler.
testName (Optional[str], optional) – Optional name for the test case, used for file naming. Defaults to None.
workDir (Optional[str], optional) – Optional working directory for temporary files. Defaults to None.
__init__(deployer, tilerCls, testName, workDir) – Initialize the tiler deployer wrapper.
backEnd([verbose]) – API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.
bind() – Bind the network with automatic tiling.
codeTransform([verbose]) – Apply code transformations on every layer's execution block.
exportDeeployState(folderPath, fileName) – Export compressed network context and neural network graph.
frontEnd() – API hook to prepare the graph to be deployed and build the initial NetworkContext.
generateBufferAllocationCode() – Generates code to allocate space for the global input and output buffer of the network.
generateBufferDeAllocationCode() – Generates code to deallocate all global buffers.
generateBufferInitializationCode() – Generates code for all forward-declarations of buffers used during inference.
generateEngineInitializationCode() – Generate initialization code for all compute engines.
generateFunction([verbose]) – Helper function to prepare deployment and return generated function code.
generateGlobalDefinitionCode() – Generate all global definition code for inference.
generateIOBufferInitializationCode() – Generate initialization code for global network inputs and outputs.
generateIncludeString() – Generate code to include platform-dependent includes.
generateInferenceCode() – Generate the actual inference function for the entire network.
generateInferenceInitializationCode() – Generate initialization code, including static memory allocation and other setup tasks.
importDeeployState(folderPath, fileName) – Override this container's graph and context with loaded compressed artifacts.
inputs() – Return a list of all VariableBuffers that are also global inputs of the network.
lower(graph) – Apply the lowering optimization passes to the graph.
midEnd() – API hook to be used after finalizing kernel selection; hoist transient buffers and perform low-level code optimizations (e.g. tiling and static memory allocation).
numberOfOps(verbose) – Returns the total number of operations per network inference.
outputs() – Return a list of all VariableBuffers that are also global outputs of the network.
parse([default_channels_first]) – Parses the full network by iteratively exploring mapping and binding options with backtracking.
prepare([verbose]) – API hook to perform the entire deployment process to the point where generated code may be extracted.
tile([tilingSolution, memoryMap]) – Perform tiling and memory allocation for the network.
Attributes
bound
parsed
prepared
transformed
worstCaseBufferSize – Get the worst-case buffer sizes including inputs and outputs.
- property worstCaseBufferSize
Get the worst-case buffer sizes including inputs and outputs.
Computes the total worst-case memory requirements including both tiled buffers and input/output buffers.
- Returns:
Dictionary mapping memory level names to their total worst-case buffer sizes in bytes.
- Return type:
Dict[str, int]
Notes
Extends the tiler’s worst-case buffer size calculation by adding the memory requirements of input and output buffers.
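The aggregation described above can be sketched in plain Python. This is an illustrative sketch, not the Deeploy implementation; the function and argument names are hypothetical, standing in for the tiler's per-memory-level sizes and the network's I/O buffer sizes.

```python
from typing import Dict

def worst_case_with_io(tiled_sizes: Dict[str, int],
                       io_buffer_bytes: Dict[str, int]) -> Dict[str, int]:
    """Add input/output buffer requirements on top of the tiled worst-case sizes.

    Both arguments map memory level names to sizes in bytes; the result is the
    per-level total, as described for worstCaseBufferSize.
    """
    total = dict(tiled_sizes)
    for level, size in io_buffer_bytes.items():
        total[level] = total.get(level, 0) + size
    return total

# e.g. tiled buffers plus a 150,528-byte input/output allocation in L2:
sizes = worst_case_with_io({"L1": 64_000, "L2": 512_000}, {"L2": 150_528})
```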
- tile(tilingSolution: List[PatternMemoryConstraints] | None = None, memoryMap: Dict[str, List[List[MemoryBlock]]] | None = None)
Perform tiling and memory allocation for the network.
Executes the complete tiling process including constraint setup, optimization, memory allocation, and code generation updates.
- Parameters:
tilingSolution (Optional[TilingSolution], optional) – Pre-computed tiling solution to use instead of computing one. If None, the solution will be computed automatically.
memoryMap (Optional[MemoryMap], optional) – Pre-computed memory map to use instead of computing one. If None, the memory map will be computed automatically.
- Raises:
AssertionError – If only one of tilingSolution or memoryMap is provided, if MiniMalloc is used with non-layer-wise tiling, or if tensors are not uniformly allocated when using MiniMalloc.
Notes
When using the MiniMalloc memory allocation strategy, additional constraints apply:
- Only layer-wise execution is supported
- All tensors must be in the default memory level
The method performs validation of the computed solutions and updates the execution blocks with tiling information.
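The "both or neither" precondition on tilingSolution and memoryMap can be restated as a small check. This is a hypothetical re-statement of the documented contract, not Deeploy's code; the function name is made up for illustration.

```python
from typing import Optional

def check_tile_args(tilingSolution: Optional[list],
                    memoryMap: Optional[dict]) -> None:
    # Documented precondition of tile(): either both a pre-computed tiling
    # solution and memory map are supplied, or neither (both are computed).
    assert (tilingSolution is None) == (memoryMap is None), \
        "tilingSolution and memoryMap must be provided together"
```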
- bind()
Bind the network with automatic tiling.
Performs the complete binding process including layer binding and automatic tiling optimization.
- Returns:
True if binding was successful, False otherwise.
- Return type:
bool
Notes
Calls the parent bind() method first, then performs tiling if the initial binding was successful.
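The control flow described in the note (parent binding first, tiling only on success) can be sketched with stand-in classes. These are mock classes for illustration only, not Deeploy's actual class hierarchy.

```python
class BaseDeployerSketch:
    """Stand-in for the wrapped deployer's layer binding."""
    def bind(self) -> bool:
        return True  # placeholder for the real binding result

class TilerDeployerSketch(BaseDeployerSketch):
    """Hypothetical mock mirroring the documented bind() flow."""
    def __init__(self) -> None:
        self.tiled = False

    def tile(self) -> None:
        self.tiled = True  # placeholder for the tiling pass

    def bind(self) -> bool:
        ok = super().bind()  # parent bind() runs first
        if ok:
            self.tile()      # tiling only if binding succeeded
        return ok
```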
- backEnd(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))
API hook to generate code once kernel implementations are picked and tiling, memory allocation, and other low-level optimizations have been done.
- Parameters:
verbose (CodeGenVerbosity) – Control verbosity of generated code
- codeTransform(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))
Apply code transformations on every layer’s execution block
- Parameters:
verbose (CodeGenVerbosity) – Control code generation verbosity
- Raises:
RuntimeError – Raises a RuntimeError if the entire network is not bound
- exportDeeployState(folderPath: str, fileName: str)
Export compressed network context and neural network graph
- Parameters:
folderPath (str) – path to directory where to save context and graph
fileName (str) – prefix to use when saving artifacts
- frontEnd()
API hook to prepare the graph to be deployed and build the initial NetworkContext
- generateBufferAllocationCode() str
Generates code to allocate space for the global input and output buffer of the network
- Returns:
Allocation code for global IO buffers
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateBufferDeAllocationCode() str
Generates code to deallocate all global buffers
- Returns:
Code to deallocate buffers
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateBufferInitializationCode() str
Generates code for all forward-declaration of buffers used during inference
- Returns:
Returns forward-declaration code
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateEngineInitializationCode() str
Generate initialization code for all compute engines
- Returns:
Initialization code for all engines
- Return type:
str
- generateFunction(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None)) str
Helper function to prepare deployment and return generated function code
- generateGlobalDefinitionCode() str
Generate all global definition code for inference
- Returns:
Global Definition code
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateIOBufferInitializationCode() str
Generate initialization code for global network inputs and outputs
- Returns:
Initialization code
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateIncludeString() str
Generate code to include platform-dependent includes
- Returns:
Include code
- Return type:
str
- generateInferenceCode() str
Generate the actual inference function for the entire network
- Returns:
The full inference method
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- generateInferenceInitializationCode() str
Generate initialization code, including static memory allocation and other setup tasks
- Returns:
Initialization code
- Return type:
str
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
- importDeeployState(folderPath: str, fileName: str)
Override this container’s graph and context with loaded compressed artifacts
- Parameters:
folderPath (str) – Path to the artifact directory
fileName (str) – prefix of the saved artifacts
- inputs() List[VariableBuffer]
Return a list of all VariableBuffers that are also global inputs of the network
- Returns:
Global inputs
- Return type:
List[VariableBuffer]
- lower(graph: Graph) Graph
Apply the lowering optimization passes to the graph
- Parameters:
graph (gs.Graph) – Unmodified input neural network graph
- Returns:
Neural network graph that is deployable with the DeploymentPlatform’s Mapping
- Return type:
gs.Graph
- midEnd()
API hook to be used after finalizing kernel selection; hoist transient buffers, and perform low-level code optimizations (e.g. tiling and static memory allocation)
- numberOfOps(verbose: bool) int
Returns the total number of operations per network inference
- Parameters:
verbose (bool) – Control whether the number of operations is printed to STDOUT for each operator
- Returns:
Number of operations (1 MAC = 2 Ops) per network inference
- Return type:
int
- Raises:
RuntimeError – Raises a RuntimeError if network is not parsed and bound
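The 1 MAC = 2 Ops convention means every multiply-accumulate counts as one multiplication plus one addition. A worked example for a matrix multiplication (a hypothetical helper, not part of the Deeploy API):

```python
def matmul_ops(m: int, n: int, k: int) -> int:
    """Operation count for an (m x k) @ (k x n) matmul.

    Each output element needs k MACs, and there are m * n outputs;
    under the 1 MAC = 2 Ops convention each MAC counts as 2 operations.
    """
    macs = m * n * k
    return 2 * macs

# e.g. a 64x128 matrix times a 128x32 matrix:
ops = matmul_ops(64, 32, 128)  # 262,144 MACs -> 524,288 Ops
```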
- outputs() List[VariableBuffer]
Return a list of all VariableBuffers that are also global outputs of the network
- Returns:
Global outputs
- Return type:
List[VariableBuffer]
- parse(default_channels_first: bool = True) bool
Parses the full network by iteratively exploring mapping and binding options with backtracking
- Parameters:
default_channels_first (bool) – Whether the default data layout is CxHxW or HxWxC
- Returns:
Returns a boolean to indicate whether parsing was successful
- Return type:
bool
- Raises:
RuntimeError – Raises a RuntimeError if backtracking was exhausted without finding a mapping solution
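The "iteratively exploring mapping and binding options with backtracking" strategy can be sketched generically. This is a minimal illustration of the search pattern, not Deeploy's parser; the node names, candidate lists, and `binds` predicate are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

def backtrack_mapping(nodes: List[str],
                      candidates: Dict[str, List[str]],
                      binds: Callable[[str, str, Dict[str, str]], bool]
                      ) -> Optional[Dict[str, str]]:
    """Assign one candidate mapping per node, backtracking on binding failure.

    Returns a complete node -> candidate assignment, or None when the
    search space is exhausted without a solution.
    """
    def solve(i: int, chosen: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(nodes):
            return chosen  # every node mapped and bound
        node = nodes[i]
        for cand in candidates[node]:
            if binds(node, cand, chosen):
                chosen[node] = cand
                result = solve(i + 1, chosen)
                if result is not None:
                    return result
                del chosen[node]  # binding downstream failed: backtrack
        return None
    return solve(0, {})
```

Using it with a predicate that rejects one candidate shows the backtracking: the search discards the rejected option and continues with the next.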
- prepare(verbose: CodeGenVerbosity = CodeGenVerbosity(tilingProfiling=None, untiledProfiling=None))
API hook to perform the entire deployment process to the point where generated code may be extracted
- Parameters:
verbose (CodeGenVerbosity) – Control verbosity of generated code