# Applications

This subdirectory contains applications and benchmarks implemented and optimized for Snitch.
## Contents
- Data generation:
  - `data_gen.py`: script to generate data and expected results for various benchmarks
  - `data`: output folder of `data_gen.py`, which also contains the configuration used to generate the data
- `src`:
  - `kernels`: basic kernels; currently contains `GEMM`, `BatchNorm`, `Maxpool`, `Fusedconv`
  - `layers`: wraps a kernel to form a DNN layer; manages data movement, synchronization, double buffering, etc.
  - `utils`: helper functions for benchmarking, verification, and fast `memset`
  - `net_layer.c`: various ready-made tests to run layers
- `include`: includes the `layer` struct
## SW Testbenches
There are currently a few tests for various layer types. Some additional information about these tests is given below:
- `net_maxpool.c`: Naive implementation of a max-pooling layer; not optimized in any way since the kernel is memory-bound
- `net-batchnorm.c`: Implementation of a batchnorm layer with SSR streams (both read and write)
- `net-conv2d.c`: Implementation and tiling of a 2D convolution that can be distributed to multiple clusters. The convolution is implemented as an `im2col` transformation (performed by 2D DMA transfers) plus an optimized GEMM. The memory layout of the input and output feature maps is Height x Width x Channels. The convolution is globally parallelized over output channels; inside a cluster, the output pixels are distributed among the cores. There is an option to load the feature map from a different cluster instead of main memory by setting `cluster2cluster` in the layer struct to `1`. Currently only `fp64` is implemented, but the data movement for `fp32` or lower-precision SIMD would be analogous.
- `net-gemm.c`: Testbench to benchmark the optimized GEMM implementation for different memory layouts, dimensions, and precisions
- `net-fusedconv.c`: Implementation of a fused Conv2d + BatchNorm + ReLU kernel. The interface of the kernel is compatible with DORY. Parameters of a tile can be specified in `data/fusedconv_param.hjson`. Supported parameters are input/output dimensions, padding, kernel dimensions & stride, and flags for BatchNorm and ReLU. Furthermore, there are two additional specialized kernels: 1) a CHW kernel for input layers with very few input channels, whose output is again in the HWC layout, and 2) a depthwise kernel
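The `im2col` + GEMM scheme used by `net-conv2d.c` can be illustrated with a minimal pure-Python sketch. This is not the Snitch source: the function names (`im2col_hwc`, `conv_as_gemm`) and the simplifications (stride 1, no padding, flat Python lists instead of DMA transfers) are assumptions for illustration only.

```python
# Hedged sketch: turn a 2D convolution over an HWC feature map into a
# single matrix multiplication via im2col. Names are illustrative,
# not taken from the Snitch repository.

def im2col_hwc(ifmap, H, W, C, K):
    """Flatten each KxK input patch (HWC layout, no padding, stride 1)
    into one row of the im2col matrix of shape (OH*OW) x (K*K*C)."""
    rows = []
    for oh in range(H - K + 1):
        for ow in range(W - K + 1):
            row = []
            for kh in range(K):
                for kw in range(K):
                    for c in range(C):
                        row.append(ifmap[(oh + kh) * W * C + (ow + kw) * C + c])
            rows.append(row)
    return rows

def conv_as_gemm(ifmap, weights, H, W, C, K):
    """weights: one row of length K*K*C per output channel.
    Returns (OH*OW) x OC, i.e. the output feature map in HWC order."""
    cols = im2col_hwc(ifmap, H, W, C, K)
    return [[sum(r[i] * w[i] for i in range(len(r))) for w in weights]
            for r in cols]
```

On the Snitch cluster the im2col rows are materialized by 2D DMA transfers rather than loops, and the final matrix product is the optimized GEMM kernel.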
## Usage
To run a specific benchmark, first configure the dimensions and the desired precision in `data/app_params.hjson`:
```hjson
{
    kernel: "GEMM"
    M: 16,
    N: 16,
    K: 16,
    alpha: 0,
    transpose_A: false,
    transpose_B: true,
    prec: 16
}
```
The data file is automatically generated by a CMake macro and stored in `data/data_app.h`. The result is also verified against a golden model written in Python with the help of `torch`.
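The role of the golden model can be sketched in plain Python without `torch`. This is a hedged approximation, not the actual reference implementation: it assumes the common GEMM semantics `C = A @ B + alpha * C` with row-major flat arrays and optional transposition of `A` and `B`, matching the parameter names in the config above.

```python
# Hedged sketch of a GEMM golden model (assumed semantics:
# C = op(A) @ op(B) + alpha * C, row-major flat storage).
# The real reference in the repo is built on torch.

def gemm_golden(A, B, C, M, N, K, alpha, transpose_A=False, transpose_B=False):
    # Element accessors that honor the transpose flags.
    a = lambda m, k: A[k * M + m] if transpose_A else A[m * K + k]
    b = lambda k, n: B[n * K + k] if transpose_B else B[k * N + n]
    out = [0] * (M * N)
    for m in range(M):
        for n in range(N):
            acc = alpha * C[m * N + n]
            for k in range(K):
                acc += a(m, k) * b(k, n)
            out[m * N + n] = acc
    return out
```

With `alpha: 0` (as in the config above), the previous contents of `C` are ignored and the result is a plain matrix product.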
Compilation of the applications can be enabled by adding `add_subdirectory(${SNITCH_SOFTWARE_DIR}/applications)` to the `CMakeLists.txt` in the specific `sw` folder.
## Requirements

- `torch`
Updated on 2023-06-19 at 09:43:56 +0000