ara_system
: Integration of CVA6 and Ara
Overview
ara_system
instantiates and connects the scalar core CVA6 with the Ara vector accelerator. It builds the system-level interface between the two, handles AXI width conversion, invalidation signaling, and merges AXI traffic into a unified master port.
This module is the core of Ara’s architectural integration, enabling:
Vector instruction dispatch from CVA6 to Ara
Shared AXI access arbitration
Coherent invalidation handling for vector memory stores
Flexible benchmarking configurations through parameterization
CVA6 is a RV64GC RISC-V Linux-ready core. It features one L1 instruction cache and one L1 data cache connecting to the L2 memory.
CVA6 is the only “RISC-V core” properly speaking, meaning that it is the only one accessing the program’s instruction flow.
Ara is a tighlty-coupled accelerator plugged into CVA6, and receives vector instructions from CVA6.
Both Ara and CVA6 have a private AXI4-compliant Load-Store unit. Ara’s one directly connects to the L2 memory, bypassing CVA6’s L1 data cache.
This allows for potential coherence and memory operation ordering issues.
Memory Coherence and Consistency
Memory coherence is enforced through three main mechanisms:
Memory writes are serialized through a single memory bus.
CVA6 L1-D$ is write-through.
An invalidation filter snoops on Ara’s AXI AW memory bus and invalidates the potentially-stale sets in CVA6’s L1-D$.
Memory ordering is enforced by CVA6 and control signals between CVA6 and Ara. No memory operations are issued or started until it’s safe to do so. For example, pending vector stores prevent CVA6 from issuing scalar memory operations, and vector memory operations are not dispatched to Ara if there is a pending scalar store.
Virtual memory support
Ara uses CVA6’s MMU to translate virtual addresses into physical ones. This is done through the MMU interface.
Parameters
Ara + RVV Parameters
Name |
Description |
---|---|
|
Number of vector lanes |
|
Vector register length (in bits) |
|
Enables OS features |
|
Enabled FP precisions |
|
Support for |
|
Enable fixed-point ops |
|
Support for segmented memory ops |
CVA6 and AXI Interface
Name |
Description |
---|---|
|
CVA6 configuration record |
|
CVA6 exception type |
|
Accelerator interface (CVA6 → Ara) |
|
Accelerator interface (Ara → CVA6) |
|
MMU interface |
|
Packed request interface |
|
Packed response interface |
|
AXI bus widths |
AXI typedefs |
All channel and request/response types |
Ports
Port |
Direction |
Description |
---|---|---|
|
Input |
Clock signal |
|
Input |
Active-low reset |
|
Input |
Initial fetch address for CVA6 |
|
Input |
Hardware thread ID |
|
In/Out |
Scan chain (test) |
|
Output |
AXI master request (merged) |
|
Input |
AXI master response (merged) |
Internal Blocks
1. CVA6 Core
Scalar processor core
Interfaces to Ara via a dedicated accelerator port
Outputs standard AXI (
ariane_axi_req_t
) at narrow data width to L2 memory
2. Ara Accelerator
Fully parameterized vector unit
Receives CVA6 requests, returns results and exceptions
AXI master interface at wide data width (
32 * #Lanes
data width)
3. AXI Width Converter
axi_dw_converter
adjusts CVA6’s narrow AXI (e.g., 64-bit) to match Ara/system-wide bus width
4. AXI Invalidation Filter
Detects vector memory stores and emits invalidation signals
Ensures cache coherence with CVA6
Integrated with Ara’s AXI path
5. AXI Multiplexer
Merges CVA6 and Ara AXI requests
Handles arbitration, backpressure, and spill registers
Vector Interface Handling
The
acc_to_cva6_t
signal is extended to includeinval_valid
andinval_addr
, enabling memory coherence notification from Ara back to CVA6.Ara can assert
inval_valid
when performing stores that require CVA6 cache line invalidation.acc_cons_en
gate controls the invalidation path.
Alternative Configuration: IDEAL_DISPATCHER
The ideal dispatcher is just a tool to benchmark Ara’s performance with an ideal vector instruction dispatcher instantiated INSTEAD OF CVA6.
If IDEAL_DISPATCHER
is defined:
CVA6 is replaced with a perfect dispatcher (
accel_dispatcher_ideal
), i.e., a FIFO containing the dynamic instruction trace of the program plus the correct register file valuesUseful for functional validation/benchmarking or micro-benchmarking Ara in isolation