`ara_system`: Integration of CVA6 and Ara

Overview

`ara_system` instantiates the scalar core CVA6 and the Ara vector accelerator and connects them. It builds the system-level interface between the two, handles AXI data-width conversion and invalidation signaling, and merges the AXI traffic of both into a single master port.

This module is the core of Ara’s architectural integration, enabling:

Vector instruction dispatch from CVA6 to Ara
Shared AXI access arbitration
Coherent invalidation handling for vector memory stores
Flexible benchmarking configurations through parameterization

CVA6 is an RV64GC, Linux-capable RISC-V core. It features one L1 instruction cache and one L1 data cache, both connected to the L2 memory.

CVA6 is the only RISC-V core in the strict sense: it is the only unit that fetches the program's instruction stream.

Ara is a tightly coupled accelerator plugged into CVA6, from which it receives vector instructions.

Both Ara and CVA6 have a private AXI4-compliant load-store unit. Ara's load-store unit connects directly to the L2 memory, bypassing CVA6's L1 data cache.

This can cause coherence and memory-ordering issues, which the mechanisms described below address.

Memory Coherence and Consistency

Memory coherence is enforced through three main mechanisms:

Memory writes are serialized through a single memory bus.
CVA6 L1-D$ is write-through.
An invalidation filter snoops the write-address (AW) channel of Ara's AXI bus and invalidates the potentially stale sets in CVA6's L1-D$.
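
To illustrate what the invalidation filter has to work out, the sketch below computes which L1-D$ sets a single AXI INCR write burst on Ara's AW channel can touch. The cache geometry (`LINE_BYTES`, `NUM_SETS`) and the function name are hypothetical, and only INCR bursts are modeled; the actual filter in `ara_system` may compute this differently.

```python
# Hypothetical L1-D$ geometry (assumption, not taken from the CVA6 configuration).
LINE_BYTES = 16   # bytes per cache line
NUM_SETS = 256    # number of sets in the L1-D$


def sets_to_invalidate(awaddr: int, awlen: int, awsize: int) -> list[int]:
    """Return the L1-D$ set indices covered by one AXI INCR write burst.

    awaddr: start address of the burst
    awlen:  AXI AxLEN field (number of beats minus one)
    awsize: AXI AxSIZE field (log2 of the bytes per beat)
    """
    bytes_per_beat = 1 << awsize
    # In an INCR burst only the first beat may be unaligned; the burst then
    # ends at the aligned start plus (awlen + 1) full beats.
    aligned_start = awaddr & ~(bytes_per_beat - 1)
    last_byte = aligned_start + (awlen + 1) * bytes_per_beat - 1
    first_line = awaddr // LINE_BYTES
    last_line = last_byte // LINE_BYTES
    # Each overlapped cache line maps to one set; keep them in order, without duplicates.
    sets: list[int] = []
    for line in range(first_line, last_line + 1):
        index = line % NUM_SETS
        if index not in sets:
            sets.append(index)
    return sets


# A 4-beat, 8-byte-per-beat store at 0x1000 covers two 16-byte lines.
print(sets_to_invalidate(0x1000, awlen=3, awsize=3))
```

Invalidating whole sets rather than exact lines is conservative: it may drop some valid data from the L1-D$, but it never leaves a stale copy behind.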

Memory ordering is enforced by CVA6 together with control signals exchanged between CVA6 and Ara: a memory operation is not issued until it is safe to do so. For example, pending vector stores prevent CVA6 from issuing scalar memory operations, and vector memory operations are not dispatched to Ara while a scalar store is pending.
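
The ordering rule above reduces to two stall conditions. The following is a behavioral model only, assuming the "pending" state can be summarized as two counters of in-flight stores; the class and field names are hypothetical and do not correspond to actual CVA6 or Ara signals.

```python
from dataclasses import dataclass


@dataclass
class OrderingState:
    # Hypothetical counters of in-flight stores (not actual RTL signal names).
    pending_vector_stores: int = 0
    pending_scalar_stores: int = 0

    def can_issue_scalar_mem_op(self) -> bool:
        # Scalar loads/stores wait until every pending vector store has completed.
        return self.pending_vector_stores == 0

    def can_dispatch_vector_mem_op(self) -> bool:
        # Vector memory instructions are held back while a scalar store is pending.
        return self.pending_scalar_stores == 0


state = OrderingState(pending_vector_stores=1)
print(state.can_issue_scalar_mem_op(), state.can_dispatch_vector_mem_op())
```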

Virtual memory support

Ara uses CVA6’s MMU to translate virtual addresses into physical ones. This is done through the MMU interface.

Parameters

Ara + RVV Parameters

Name	Description
`NrLanes`	Number of vector lanes
`VLEN`	Vector register length (in bits)
`OSSupport`	Enable operating-system support features
`FPUSupport`	Enabled FP precisions
`FPExtSupport`	Enable the `vfrec7` and `vfrsqrt7` instructions
`FixPtSupport`	Enable fixed-point operations
`SegSupport`	Enable segment memory operations

CVA6 and AXI Interface

Name	Description
`CVA6Cfg`	CVA6 configuration record
`exception_t`	CVA6 exception type
`accelerator_req_t`	Accelerator interface (CVA6 → Ara)
`accelerator_resp_t`	Accelerator interface (Ara → CVA6)
`acc_mmu_{req,resp}`	MMU interface
`cva6_to_acc_t`	Packed request interface
`acc_to_cva6_t`	Packed response interface
`Axi*Width`	AXI bus widths
AXI typedefs	All channel and request/response types

Ports

Port	Direction	Description
`clk_i`	Input	Clock signal
`rst_ni`	Input	Active-low reset
`boot_addr_i`	Input	Initial fetch address for CVA6
`hart_id_i`	Input	Hardware thread ID
`scan_*`	In/Out	Scan chain (test)
`axi_req_o`	Output	AXI master request (merged)
`axi_resp_i`	Input	AXI master response (merged)

Internal Blocks

1. CVA6 Core

Scalar processor core
Interfaces to Ara via a dedicated accelerator port
Issues standard AXI requests (`ariane_axi_req_t`) at its narrow data width; they are upsized and merged with Ara's traffic before reaching the L2 memory

2. Ara Accelerator

Fully parameterized vector unit
Receives CVA6 requests, returns results and exceptions
AXI master interface with a wide data bus of 32 × `NrLanes` bits
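
The wide AXI data width follows directly from the lane count. A quick sketch of the arithmetic (the helper name is hypothetical, and the lane counts in the loop are just examples): with `NrLanes = 4`, Ara's AXI data bus is 32 × 4 = 128 bits, i.e. 16 bytes per beat.

```python
def ara_axi_data_width(nr_lanes: int) -> int:
    """AXI data width of Ara in bits: 32 bits per lane."""
    return 32 * nr_lanes


for lanes in (2, 4, 8, 16):
    width = ara_axi_data_width(lanes)
    print(f"NrLanes={lanes:2d}: {width:3d}-bit AXI data bus, {width // 8:2d} bytes per beat")
```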

3. AXI Width Converter

`axi_dw_converter` upsizes CVA6's narrow AXI bus (e.g., 64-bit) to the wide data width used by Ara and the rest of the system
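
When a narrow write beat travels on a wider AXI bus, its bytes occupy the byte lanes selected by the low address bits, and the write strobe marks which lanes are valid. The sketch below shows this placement for a single beat. It assumes a 64-bit narrow bus and a 256-bit wide bus (e.g., Ara with 8 lanes), and it deliberately ignores the burst and response handling that a real converter such as `axi_dw_converter` must also perform.

```python
NARROW_BYTES = 8   # assumed 64-bit CVA6 data bus
WIDE_BYTES = 32    # assumed 256-bit wide bus (e.g., 8 lanes x 32 bits)


def upsize_write_beat(addr: int, wdata: int, wstrb: int) -> tuple[int, int]:
    """Place one narrow write beat onto the wide bus.

    Returns (wide_wdata, wide_wstrb). The narrow word is moved to the wide
    byte lanes given by the beat address modulo WIDE_BYTES, aligned down to
    the narrow bus width.
    """
    assert 0 <= wdata < (1 << (8 * NARROW_BYTES)), "wdata wider than the narrow bus"
    assert 0 <= wstrb < (1 << NARROW_BYTES), "wstrb wider than the narrow bus"
    # Byte offset of the narrow word inside the wide data word.
    lane_offset = (addr % WIDE_BYTES) & ~(NARROW_BYTES - 1)
    wide_wdata = wdata << (8 * lane_offset)
    wide_wstrb = wstrb << lane_offset
    return wide_wdata, wide_wstrb


# A full 64-bit store to 0x48 lands in bytes 8..15 of the 32-byte wide word.
data, strb = upsize_write_beat(0x48, 0x1122334455667788, 0xFF)
print(hex(strb))
```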

4. AXI Invalidation Filter

Snoops Ara's AXI write requests to detect vector memory stores and emits invalidation signals
Keeps CVA6's L1-D$ coherent with memory written by Ara
Integrated with Ara’s AXI path

5. AXI Multiplexer

Merges CVA6 and Ara AXI requests
Arbitrates between the two masters, propagates backpressure, and uses spill registers to cut timing paths
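
A common way to merge two AXI masters, used for instance by the PULP `axi_mux`, is to arbitrate between the incoming requests and extend each transaction ID with the index of the originating port, so that responses can be routed back. The sketch below models only that ID handling, together with a simple round-robin arbiter. The ID width, the port numbering (0 for CVA6, 1 for Ara), and all names are assumptions; per-channel ordering rules and spill registers are not modeled.

```python
from typing import Optional

SLV_ID_WIDTH = 4  # assumed ID width on each input port


class RoundRobinArbiter:
    """Grant one of n requesters, rotating priority after every grant."""

    def __init__(self, n: int):
        self.n = n
        self.next_prio = 0

    def grant(self, requests: list[bool]) -> Optional[int]:
        for i in range(self.n):
            port = (self.next_prio + i) % self.n
            if requests[port]:
                # The winner gets the lowest priority in the next round.
                self.next_prio = (port + 1) % self.n
                return port
        return None  # no request this cycle


def prepend_port(port: int, axi_id: int) -> int:
    """Extend an input-port ID with the port index as its upper bits."""
    return (port << SLV_ID_WIDTH) | axi_id


def route_response(mst_id: int) -> tuple[int, int]:
    """Recover (port, original ID) from a response ID on the merged master port."""
    return mst_id >> SLV_ID_WIDTH, mst_id & ((1 << SLV_ID_WIDTH) - 1)


arb = RoundRobinArbiter(2)
winners = [arb.grant([True, True]) for _ in range(3)]
print(winners, route_response(prepend_port(1, 3)))
```

Because the port index travels in the ID, the response path needs no extra bookkeeping: the upper ID bits alone select which master receives each R or B beat.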

Vector Interface Handling

The `acc_to_cva6_t` response type is extended with `inval_valid` and `inval_addr` fields, which let Ara notify CVA6 of memory writes that may require invalidation.
Ara can assert `inval_valid` when performing stores that require a CVA6 cache-line invalidation.
The `acc_cons_en` signal gates the invalidation path.
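
A behavioral sketch of the extended response and its gating. The field names `inval_valid` and `inval_addr` and the signal `acc_cons_en` come from the text above; the dataclass, the `forward_invalidation` function, and the assumption that the gate is a simple AND on the valid bit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccToCva6:
    # Only the invalidation-related fields of acc_to_cva6_t are modeled here.
    inval_valid: bool = False
    inval_addr: int = 0


def forward_invalidation(resp: AccToCva6, acc_cons_en: bool) -> AccToCva6:
    """Return the response seen by CVA6; invalidations are dropped when acc_cons_en is low."""
    return AccToCva6(inval_valid=resp.inval_valid and acc_cons_en,
                     inval_addr=resp.inval_addr)


req = AccToCva6(inval_valid=True, inval_addr=0x8000_1040)
print(forward_invalidation(req, acc_cons_en=True).inval_valid,
      forward_invalidation(req, acc_cons_en=False).inval_valid)
```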

Alternative Configuration: `IDEAL_DISPATCHER`

The ideal dispatcher is a benchmarking tool: it measures Ara's performance with an ideal vector instruction dispatcher instantiated instead of CVA6.

If `IDEAL_DISPATCHER` is defined:

CVA6 is replaced with an ideal dispatcher (`accel_dispatcher_ideal`), i.e., a FIFO containing the program's dynamic instruction trace together with the correct register-file values
Useful for functional validation and for benchmarking or micro-benchmarking Ara in isolation
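
Conceptually, the ideal dispatcher behaves like the FIFO sketched below: each entry pairs a traced instruction with the register-file values it needs, and one entry is popped whenever Ara is ready to accept a request. The entry fields, the class names, and the placeholder instruction words are hypothetical; the real `accel_dispatcher_ideal` may store its trace in a different format.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEntry:
    # Hypothetical trace format: instruction word plus the scalar operands it reads.
    insn: int
    rs1_value: int
    rs2_value: int


class IdealDispatcher:
    """Replay a pre-recorded dynamic instruction trace, one entry per accepted request."""

    def __init__(self, trace):
        self.fifo = deque(trace)

    def dispatch(self, ara_ready: bool) -> Optional[TraceEntry]:
        # No fetch, decode, or scalar-dependency stalls: the next entry is
        # available immediately, so only Ara's backpressure limits throughput.
        if ara_ready and self.fifo:
            return self.fifo.popleft()
        return None

    def done(self) -> bool:
        return not self.fifo


# Placeholder instruction words, not real encodings.
dispatcher = IdealDispatcher([TraceEntry(0x57, 1, 2), TraceEntry(0x57, 3, 4)])
while not dispatcher.done():
    entry = dispatcher.dispatch(ara_ready=True)
    print(entry)
```

Since the register values are precomputed, the measured throughput reflects Ara alone, which is what makes this configuration useful for micro-benchmarking.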
