This file provides functions to synchronize Snitch cores.
#include <math.h>
◆ snrt_cluster_hw_barrier()
inline void snrt_cluster_hw_barrier()
Synchronize cores in a cluster with a hardware barrier, blocking.
- Note
- Synchronizes all (both DM and compute) cores. All cores must invoke this function, or the calling cores will stall indefinitely.
◆ snrt_global_all_to_all_reduction()
inline uint32_t snrt_global_all_to_all_reduction(uint32_t value)
Perform a global sum reduction, blocking.
All cores participate in the reduction and synchronize globally to wait for the reduction to complete. The synchronization is performed via snrt_global_barrier.
- Parameters
- value: The value to be summed.
- Returns
- The result of the sum reduction.
- Note
- Every Snitch core must invoke this function, or the calling cores will stall indefinitely.
◆ snrt_global_barrier()
inline void snrt_global_barrier()
Synchronize all Snitch cores.
Synchronization is performed hierarchically. Within a cluster, cores are synchronized through a hardware barrier (see snrt_cluster_hw_barrier). Clusters are synchronized through a software barrier (see snrt_inter_cluster_barrier).
- Note
- Every Snitch core must invoke this function, or the calling cores will stall indefinitely.
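The hierarchical scheme described above can be pictured with a small host-side model. This is only a sketch under assumptions: the two stub functions merely record the order in which they are called and do not synchronize anything, the `is_representative` flag is a hypothetical stand-in for "the one core per cluster that joins the software barrier", and the second intra-cluster barrier (used to release the remaining cores) is an assumed ordering that the text above does not spell out.

```c
#include <assert.h>

/* Call trace used only to make the ordering observable in this model. */
static char trace[8];
static int n_trace = 0;

/* Hypothetical stub for the intra-cluster hardware barrier. */
static void cluster_hw_barrier_stub(void) {
    assert(n_trace < 8);
    trace[n_trace++] = 'C';
}

/* Hypothetical stub for the inter-cluster software barrier. */
static void inter_cluster_barrier_stub(void) {
    assert(n_trace < 8);
    trace[n_trace++] = 'I';
}

/* Assumed ordering: gather the cluster, let one core per cluster
 * synchronize across clusters, then release the rest of the cluster. */
static void global_barrier_model(int is_representative) {
    cluster_hw_barrier_stub();
    if (is_representative) {
        inter_cluster_barrier_stub();
    }
    cluster_hw_barrier_stub();
}
```

In this model a representative core passes through three synchronization points and every other core through two, which is why a single missing caller would leave the remaining cores waiting.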
◆ snrt_global_reduction_dma()
inline void snrt_global_reduction_dma(double *dst_buffer, double *src_buffer, size_t len)
Perform a sum reduction among clusters, blocking.
The reduction proceeds in a logarithmic number of steps, following a binary tree. At every level of the tree, half of the active clusters participate as senders and the other half as receivers. Senders use the DMA to send their data to the respective receiver's destination buffer. Each receiver then reduces every element in its destination buffer with the corresponding element in its source buffer, and proceeds to the next level of the tree.
- Parameters
- dst_buffer: The pointer to the calling cluster's destination buffer.
- src_buffer: The pointer to the calling cluster's source buffer.
- len: The amount of data in each buffer.
- Note
- The destination buffers must lie at the same offset in every cluster's TCDM.
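The binary-tree scheme described above can be sketched as a sequential host-side model. It is not the runtime's implementation: `tree_reduce_model` is a hypothetical name, `memcpy` stands in for the DMA transfer, the arrays `src[c]` and `dst[c]` stand in for cluster c's buffers, and `len` is assumed to count `double` elements. The model also assumes that the receiver accumulates into its source buffer, so that the partial sum can be sent at the next level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sequential model of a binary-tree sum reduction among n_clusters clusters.
 * After the call, src[0] holds the element-wise sum of all source buffers. */
static void tree_reduce_model(double **src, double **dst,
                              size_t n_clusters, size_t len) {
    assert(n_clusters > 0);
    /* stride doubles at every level: 1, 2, 4, ... */
    for (size_t stride = 1; stride < n_clusters; stride *= 2) {
        /* Receivers are the multiples of 2*stride; each one's sender,
         * if it exists, sits stride clusters further up. */
        for (size_t recv = 0; recv + stride < n_clusters; recv += 2 * stride) {
            size_t send = recv + stride;
            /* "DMA": sender writes its data into the receiver's dst buffer. */
            memcpy(dst[recv], src[send], len * sizeof(double));
            /* Receiver reduces dst into src (assumed accumulation target). */
            for (size_t i = 0; i < len; i++) {
                src[recv][i] += dst[recv][i];
            }
        }
    }
}
```

With four clusters the model runs two levels: clusters 1 and 3 send to clusters 0 and 2, then cluster 2 sends to cluster 0. A receiver without a matching sender (possible when the cluster count is not a power of two) simply skips that level.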
◆ snrt_inter_cluster_barrier()
inline void snrt_inter_cluster_barrier()
Synchronize one core from every cluster with the others.
Implemented as a software barrier.
- Note
- One core per cluster must invoke this function, or the calling cores will stall indefinitely.
◆ snrt_mutex_acquire()
inline void snrt_mutex_acquire(volatile uint32_t *pmtx)
Acquire a mutex, blocking.
Test-and-set (TAS) implementation of a lock.
- Parameters
- pmtx: A pointer to a variable that can be used as a mutex, i.e. one that all cores can reference and that resides at a memory location supporting atomic accesses. It can be declared, for example, as: static volatile uint32_t mtx = 0;
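To illustrate the test-and-set idea, here is a minimal host-side sketch using C11 atomics. It is not the runtime's code: `tas_acquire` and `tas_release` are hypothetical names, the lock is modelled as an `atomic_uint` rather than a `volatile uint32_t`, and the release function is an assumption added only so the example is complete.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Test-and-set acquire: atomically swap in 1 and retry until the value
 * swapped out was 0, meaning the lock was free and is now ours. */
static inline void tas_acquire(atomic_uint *pmtx) {
    assert(pmtx != NULL);
    while (atomic_exchange_explicit(pmtx, 1u, memory_order_acquire) != 0u) {
        /* Busy-wait; every iteration issues an atomic read-modify-write. */
    }
}

/* Hypothetical release: store 0 so that the next swap observes a free lock. */
static inline void tas_release(atomic_uint *pmtx) {
    assert(pmtx != NULL);
    atomic_store_explicit(pmtx, 0u, memory_order_release);
}
```

A critical section would then be bracketed by `tas_acquire(&mtx)` and `tas_release(&mtx)`. Because every failed attempt is itself an atomic write, heavy contention on a plain TAS lock generates a lot of traffic to the lock's memory location.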
◆ snrt_mutex_ttas_acquire()
inline void snrt_mutex_ttas_acquire(volatile uint32_t *pmtx)
Acquire a mutex, blocking.
Same as snrt_mutex_acquire but acquires the lock using a test and test-and-set (TTAS) strategy.
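The difference from plain test-and-set can be sketched on the host with C11 atomics. As before, this is an illustrative model rather than the runtime's implementation: `ttas_acquire` is a hypothetical name and the lock is modelled as an `atomic_uint`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Test and test-and-set acquire: spin on ordinary loads while the lock
 * looks taken, and attempt the atomic swap only once it looks free. */
static inline void ttas_acquire(atomic_uint *pmtx) {
    assert(pmtx != NULL);
    for (;;) {
        /* Test: read-only spinning, no atomic writes are issued here. */
        while (atomic_load_explicit(pmtx, memory_order_relaxed) != 0u) {
        }
        /* Test-and-set: another core may have won the race in between,
         * in which case the swap returns 1 and we go back to spinning. */
        if (atomic_exchange_explicit(pmtx, 1u, memory_order_acquire) == 0u) {
            return;
        }
    }
}
```

The inner read-only loop is the design point: waiting cores poll with loads and fall back to the atomic swap only when the lock appears free.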
◆ snrt_partial_barrier()
Generic software barrier.
- Parameters
- barr: Pointer to a barrier variable.
- n: Number of harts that must enter the barrier before it is released.
- Note
- Exactly the specified number of harts must invoke this function, or the calling cores will stall indefinitely.
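One common way to build such a counting barrier is sketched below with C11 atomics. Everything here is an assumption made for illustration: the `barrier_model_t` type, its `cnt` and `iteration` fields, and the function name `partial_barrier_model` are hypothetical and do not describe the layout of the runtime's barrier variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical barrier state: arrival counter plus a generation number. */
typedef struct {
    atomic_uint cnt;       /* harts that have arrived in this round */
    atomic_uint iteration; /* incremented each time the barrier releases */
} barrier_model_t;

/* Block until n harts have called this function on the same barrier. */
static void partial_barrier_model(barrier_model_t *barr, unsigned n) {
    assert(barr != NULL && n > 0);
    /* Remember the current round before announcing our arrival. */
    unsigned my_iter = atomic_load(&barr->iteration);
    unsigned arrived = atomic_fetch_add(&barr->cnt, 1u) + 1u;
    if (arrived == n) {
        /* Last arriver: reset the counter, then release the waiters. */
        atomic_store(&barr->cnt, 0u);
        atomic_fetch_add(&barr->iteration, 1u);
    } else {
        /* Everyone else spins until the round number changes. */
        while (atomic_load(&barr->iteration) == my_iter) {
        }
    }
}
```

The sketch makes the note above concrete: if fewer than n harts arrive, `iteration` never changes and the waiters spin forever; if more than n arrive, the extra harts start a new round that may never complete.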