Snitch Runtime
This file provides functions to synchronize Snitch cores.
#include "../../deps/riscv-opcodes/encoding.h"
#include <math.h>
Macros | |
#define | SNRT_BROADCAST_MASK ((SNRT_CLUSTER_NUM - 1) * SNRT_CLUSTER_OFFSET) |
Functions | |
void | snrt_comm_init () |
Initialize the communicator functions. | |
void | snrt_comm_create (uint32_t size, snrt_comm_t *communicator) |
Creates a communicator object. | |
volatile uint32_t * | snrt_mutex () |
Get a pointer to a mutex variable. | |
void | snrt_mutex_acquire (volatile uint32_t *pmtx) |
Acquire a mutex, blocking. | |
void | snrt_mutex_ttas_acquire (volatile uint32_t *pmtx) |
Acquire a mutex, blocking, using a test and test-and-set (TTAS) strategy. | |
void | snrt_mutex_release (volatile uint32_t *pmtx) |
Release a previously-acquired mutex. | |
void | snrt_wake_clusters (uint32_t core_mask, snrt_comm_t comm=NULL) |
Wake the clusters belonging to a given communicator. | |
void | snrt_cluster_hw_barrier () |
Synchronize cores in a cluster with a hardware barrier, blocking. | |
static void | snrt_inter_cluster_barrier (snrt_comm_t comm=NULL) |
Synchronize one core from every cluster with the others. | |
void | snrt_global_barrier (snrt_comm_t comm) |
Synchronize all Snitch cores. | |
void | snrt_partial_barrier (snrt_barrier_t *barr, uint32_t n) |
Generic software barrier. | |
uint32_t | snrt_global_all_to_all_reduction (uint32_t value) |
Perform a global sum reduction, blocking. | |
template<typename T > | |
void | snrt_global_reduction_dma (T *dst_buffer, T *src_buffer, size_t len, snrt_comm_t comm=NULL) |
Perform a sum reduction among clusters, blocking. | |
void | snrt_wait_writeback (uint32_t val) |
Ensure value is written back to the register file. | |
void | snrt_enable_multicast (uint32_t mask) |
Enable LSU multicast. | |
void | snrt_disable_multicast () |
Disable LSU multicast. | |
Variables | |
__thread snrt_comm_info_t | snrt_comm_world_info |
__thread snrt_comm_t | snrt_comm_world |
void snrt_cluster_hw_barrier () | inline |
Synchronize cores in a cluster with a hardware barrier, blocking.
void snrt_comm_create (uint32_t size, snrt_comm_t *communicator) | inline |
Creates a communicator object.
The newly created communicator object includes the first size
clusters. All clusters, even those which are not part of the communicator, must invoke this function.
size | The number of clusters to include in the communicator. |
communicator | Pointer to the communicator object to be created. |
void snrt_comm_init () | inline |
Initialize the communicator functions.
This function initializes the L1 allocator by calculating the end address of the heap and setting the base, end, and next pointers of the allocator.
void snrt_disable_multicast () | inline |
Disable LSU multicast.
void snrt_enable_multicast (uint32_t mask) | inline |
Enable LSU multicast.
All stores performed after this call will be multicast to all addresses specified by the address and mask pair.
mask | Multicast mask value |
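As a rough illustration of what an "address and mask pair" can select, the sketch below enumerates the destination addresses of a single store. It assumes that every set bit in the mask marks a "don't care" address bit, which this page does not state explicitly. The helper `multicast_targets` and all numeric values are hypothetical and not part of the Snitch runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the Snitch runtime.
// ASSUMPTION: each set bit in `mask` is a "don't care" address bit, so a
// store to `addr` reaches every address that matches `addr` on all bits
// outside the mask.
inline std::vector<uint32_t> multicast_targets(uint32_t addr, uint32_t mask) {
    std::vector<uint32_t> targets;
    const uint32_t base = addr & ~mask;  // bits fixed by the store address
    uint32_t sub = 0;                    // current combination of masked bits
    do {
        targets.push_back(base | sub);
        sub = (sub - mask) & mask;       // next subset of `mask`, ascending
    } while (sub != 0);
    return targets;
}
```

With a hypothetical cluster offset of 0x40000 and four clusters, a mask built the same way as SNRT_BROADCAST_MASK evaluates to 3 * 0x40000 = 0xC0000. Under the assumption above, a store to 0x10000100 would then reach 0x10000100, 0x10040100, 0x10080100 and 0x100C0100.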
uint32_t snrt_global_all_to_all_reduction (uint32_t value) | inline |
Perform a global sum reduction, blocking.
All cores participate in the reduction and synchronize globally to wait for the reduction to complete. The synchronization is performed via snrt_global_barrier.
value | The value to be summed. |
void snrt_global_barrier (snrt_comm_t comm) | inline |
Synchronize all Snitch cores.
Synchronization is performed hierarchically. Within a cluster, cores are synchronized through a hardware barrier (see snrt_cluster_hw_barrier). Clusters are synchronized through a software barrier (see snrt_inter_cluster_barrier).
comm | The communicator determining which clusters synchronize. |
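template<typename T> void snrt_global_reduction_dma (T *dst_buffer, T *src_buffer, size_t len, snrt_comm_t comm=NULL) | inline |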
Perform a sum reduction among clusters, blocking.
The reduction is performed in a logarithmic fashion. In every level of the binary tree, half of the active clusters participate as senders and the other half as receivers. Senders use the DMA to send their data to the respective receiver's destination buffer. The receiver then reduces each element in its destination buffer with the respective element in its source buffer and stores the result in the source buffer. The receiver then proceeds to the next level in the binary tree.
dst_buffer | The pointer to the calling cluster's destination buffer. |
src_buffer | The pointer to the calling cluster's source buffer. |
len | The amount of data in each buffer. Only integer multiples of the number of compute cores are supported at the moment. |
comm | The communicator determining which clusters participate in the reduction. |
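The binary-tree scheme described above can be sketched as a host-side simulation. This is not the runtime's implementation: `tree_reduce_sim` is a hypothetical name, a plain vector copy stands in for the DMA transfer, and the choice of which cluster sends in each pair (the one with the higher index) and where the final result lands (cluster 0) are assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side simulation, not the runtime's implementation.
// `src[c]` models the source buffer of cluster c; all buffers are assumed
// to have the same length.
// ASSUMPTIONS: in each level the cluster with the lower index receives, the
// final result ends up in cluster 0, and a plain copy stands in for the DMA.
template <typename T>
std::vector<T> tree_reduce_sim(std::vector<std::vector<T>> src) {
    const std::size_t n = src.size();
    assert(n > 0);
    std::vector<std::vector<T>> dst(n);  // per-cluster destination buffers
    for (std::size_t stride = 1; stride < n; stride *= 2) {  // one tree level
        for (std::size_t recv = 0; recv + stride < n; recv += 2 * stride) {
            const std::size_t send = recv + stride;
            dst[recv] = src[send];  // sender "DMAs" its data to the receiver
            for (std::size_t i = 0; i < src[recv].size(); ++i) {
                src[recv][i] += dst[recv][i];  // result stays in the source buffer
            }
        }
    }
    return src[0];
}
```

Each level halves the number of clusters that still hold partial sums, so the number of levels grows with the logarithm of the cluster count.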
static void snrt_inter_cluster_barrier (snrt_comm_t comm=NULL) | inline static |
Synchronize one core from every cluster with the others.
comm | The communicator determining which clusters synchronize. |
Implemented as a software barrier.
volatile uint32_t * snrt_mutex () | inline |
Get a pointer to a mutex variable.
void snrt_mutex_acquire (volatile uint32_t *pmtx) | inline |
Acquire a mutex, blocking.
Test-and-set (TAS) implementation of a lock.
pmtx | A pointer to a variable which can be used as a mutex, i.e. one to which all cores have a reference and which resides at a memory location that supports atomic accesses. It can be declared e.g. as static volatile uint32_t mtx = 0; |
void snrt_mutex_release (volatile uint32_t *pmtx) | inline |
Release a previously-acquired mutex.
void snrt_mutex_ttas_acquire (volatile uint32_t *pmtx) | inline |
Acquire a mutex, blocking.
Same as snrt_mutex_acquire but acquires the lock using a test and test-and-set (TTAS) strategy.
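To make the difference between the two strategies concrete, here is a minimal host-side sketch using `std::atomic`. It is not the runtime's code: the names `lock_word_t`, `tas_try`, `tas_acquire`, `ttas_acquire` and `lock_release` are hypothetical, and the convention that 0 means free and 1 means held is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Host-side sketch with std::atomic, not the runtime's implementation.
// ASSUMPTION: 0 = free, 1 = held.
using lock_word_t = std::atomic<uint32_t>;

// One TAS attempt: atomically write 1 and report whether the lock was free.
inline bool tas_try(lock_word_t *pmtx) {
    return pmtx->exchange(1, std::memory_order_acquire) == 0;
}

// TAS: every spin iteration is an atomic read-modify-write.
inline void tas_acquire(lock_word_t *pmtx) {
    while (!tas_try(pmtx)) {
    }
}

// TTAS: spin on a plain load and only attempt the atomic swap once the
// lock looks free.
inline void ttas_acquire(lock_word_t *pmtx) {
    for (;;) {
        while (pmtx->load(std::memory_order_relaxed) != 0) {
        }
        if (tas_try(pmtx)) return;
    }
}

// Release: mark the lock as free again.
inline void lock_release(lock_word_t *pmtx) {
    pmtx->store(0, std::memory_order_release);
}
```

The TTAS variant issues the atomic swap only when a plain read suggests the lock is free, which generally reduces atomic traffic while many cores are waiting.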
void snrt_partial_barrier (snrt_barrier_t *barr, uint32_t n) | inline |
Generic software barrier.
barr | Pointer to a barrier variable. |
n | Number of harts that must enter the barrier before it is released. |
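One common way to build such a barrier is a shared arrival counter plus a round counter, sketched below on the host with `std::atomic`. The layout of `snrt_barrier_t` is not shown on this page, so `sw_barrier_t`, `sw_barrier_arrive` and `sw_partial_barrier` are hypothetical stand-ins and the scheme itself is an assumption, not a description of the runtime's code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical barrier state; the real snrt_barrier_t layout is not shown
// on this page.
struct sw_barrier_t {
    std::atomic<uint32_t> cnt{0};        // harts that arrived in this round
    std::atomic<uint32_t> iteration{0};  // bumped once per completed round
};

// Register one arrival. Returns true for the n-th (last) arrival, which
// resets the counter and releases the waiting harts.
inline bool sw_barrier_arrive(sw_barrier_t *barr, uint32_t n) {
    const uint32_t arrived =
        barr->cnt.fetch_add(1, std::memory_order_acq_rel) + 1;
    if (arrived != n) return false;
    barr->cnt.store(0, std::memory_order_relaxed);
    barr->iteration.fetch_add(1, std::memory_order_release);
    return true;
}

// Blocking barrier: wait until the round observed on entry has completed.
inline void sw_partial_barrier(sw_barrier_t *barr, uint32_t n) {
    const uint32_t round = barr->iteration.load(std::memory_order_acquire);
    if (sw_barrier_arrive(barr, n)) return;
    while (barr->iteration.load(std::memory_order_acquire) == round) {
    }
}
```

Reading the round counter before registering the arrival matters: a hart that read it afterwards could miss the release and wait for a round that has already completed.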
void snrt_wait_writeback (uint32_t val) | inline |
Ensure value is written back to the register file.
This function introduces a RAW dependency on val to stall the core until val is written back to the register file.
val | The variable we want to wait on. |
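A RAW dependency of this kind can be expressed with a single register-to-register move in extended inline assembly. The sketch below is an assumption about how such a helper might look, not the runtime's source: `wait_writeback_sketch` is a hypothetical name, the code needs a GCC- or Clang-compatible compiler, and the non-RISC-V branch exists only so the example builds on a host.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only; requires a GCC- or Clang-compatible compiler for the
// extended inline assembly.
inline void wait_writeback_sketch(uint32_t val) {
#if defined(__riscv)
    // Move the register onto itself: the move reads `val`, so the core is
    // expected to stall until the instruction producing `val` has written
    // its result back to the register file.
    asm volatile("mv %0, %0" : "+r"(val));
#else
    // Host fallback: only forces the compiler to materialize `val` in a
    // register; it emits no instruction and does not stall anything.
    asm volatile("" : "+r"(val));
#endif
}
```

The `volatile` qualifier keeps the compiler from dropping the statement even though its result is never used.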
void snrt_wake_clusters (uint32_t core_mask, snrt_comm_t comm=NULL) | inline |
Wake the clusters belonging to a given communicator.
comm | The communicator determining which clusters to wake up. |
|
extern |