ara_system: Integration of CVA6 and Ara

Overview

ara_system instantiates and connects the scalar core CVA6 with the Ara vector accelerator. It builds the system-level interface between the two, handles AXI width conversion and invalidation signaling, and merges the AXI traffic of both units into a single master port.

This module is the core of Ara’s architectural integration, enabling:

  • Vector instruction dispatch from CVA6 to Ara

  • Shared AXI access arbitration

  • Coherent invalidation handling for vector memory stores

  • Flexible benchmarking configurations through parameterization

CVA6 is an RV64GC, Linux-capable RISC-V core. It has one L1 instruction cache and one L1 data cache, both connected to the L2 memory.

CVA6 is the only RISC-V core in the strict sense: it is the only unit that fetches the program's instruction stream.

Ara is a tightly-coupled accelerator attached to CVA6. It receives vector instructions from CVA6.

Ara and CVA6 each have a private AXI4-compliant load-store unit. Ara's load-store unit connects directly to the L2 memory and bypasses CVA6's L1 data cache.

This bypass can cause coherence and memory-ordering problems.


Memory Coherence and Consistency

Memory coherence is enforced through three main mechanisms:

  • Memory writes are serialized through a single memory bus.

  • CVA6's L1-D$ is write-through, so every scalar store also updates the L2 memory.

  • An invalidation filter snoops Ara's AXI write-address (AW) channel and invalidates the potentially stale sets in CVA6's L1-D$.
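The following is a minimal behavioral sketch of why these mechanisms are needed. It assumes a write-through, no-write-allocate L1 in front of a shared L2, and a 16-byte line size (LINE_BYTES). The class CoherenceModel and all its methods are hypothetical and are not part of ara_system. A vector store updates L2 directly. Without the invalidation filter, a later scalar load would hit a stale L1 copy.

```python
# Hypothetical sketch: LINE_BYTES is an assumed line size, not a CVA6 default.
LINE_BYTES = 16


def line_of(addr: int) -> int:
    """Line-aligned address of the cache line containing addr."""
    return addr - addr % LINE_BYTES


class CoherenceModel:
    def __init__(self, filter_enabled: bool = True) -> None:
        self.l2: dict[int, int] = {}  # shared L2 memory: address -> value
        self.l1: dict[int, int] = {}  # CVA6 L1-D$ copies: address -> value
        self.filter_enabled = filter_enabled

    def scalar_load(self, addr: int) -> int:
        if addr not in self.l1:  # miss: refill from L2
            self.l1[addr] = self.l2.get(addr, 0)
        return self.l1[addr]

    def scalar_store(self, addr: int, value: int) -> None:
        # Write-through: update a cached copy if present, always update L2.
        if addr in self.l1:
            self.l1[addr] = value
        self.l2[addr] = value

    def vector_store(self, addr: int, value: int) -> None:
        # Ara's store bypasses the L1-D$ and goes straight to L2 ...
        self.l2[addr] = value
        # ... so the filter must drop every L1 copy in the affected line.
        if self.filter_enabled:
            stale = [a for a in self.l1 if line_of(a) == line_of(addr)]
            for a in stale:
                del self.l1[a]
```

With filter_enabled=False, a scalar load after a vector store to the same address returns the old cached value, even though L2 already holds the new one.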

Memory ordering is enforced by CVA6 together with control signals exchanged between CVA6 and Ara. No memory operation is issued until it is safe to do so. For example, pending vector stores prevent CVA6 from issuing scalar memory operations, and vector memory operations are not dispatched to Ara while a scalar store is pending.
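The two ordering rules can be sketched as a pair of pending-store counters. This is a simplified, hypothetical model: MemOrderGate and its methods are illustrative names. The real CVA6/Ara handshake tracks more cases than the two shown here.

```python
class MemOrderGate:
    """Hypothetical counters for the two ordering rules described above."""

    def __init__(self) -> None:
        self.pending_scalar_stores = 0
        self.pending_vector_stores = 0

    def can_issue_scalar_mem(self) -> bool:
        # Pending vector stores block scalar loads and stores.
        return self.pending_vector_stores == 0

    def can_dispatch_vector_mem(self) -> bool:
        # A pending scalar store blocks dispatch of vector memory operations.
        return self.pending_scalar_stores == 0

    def issue_scalar_store(self) -> None:
        if not self.can_issue_scalar_mem():
            raise RuntimeError("scalar store must wait for vector stores")
        self.pending_scalar_stores += 1

    def complete_scalar_store(self) -> None:
        assert self.pending_scalar_stores > 0
        self.pending_scalar_stores -= 1

    def dispatch_vector_store(self) -> None:
        if not self.can_dispatch_vector_mem():
            raise RuntimeError("vector store must wait for scalar stores")
        self.pending_vector_stores += 1

    def complete_vector_store(self) -> None:
        assert self.pending_vector_stores > 0
        self.pending_vector_stores -= 1
```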


Virtual Memory Support

Ara uses CVA6's MMU to translate virtual addresses into physical ones. Translation requests and responses travel over the acc_mmu_{req,resp} interface.
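As an illustration of the request/response exchange, the sketch below sends a virtual address and gets back either a physical address or a page-fault cause. It assumes a flat, single-level page table with 4 KiB pages. The real MMU walks multi-level page tables. AccMmuReq, AccMmuResp and ToyMmu are hypothetical stand-ins, not the actual acc_mmu_{req,resp} struct fields.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_BYTES = 4096  # RISC-V base page size


@dataclass
class AccMmuReq:  # hypothetical request: what Ara sends
    vaddr: int
    is_store: bool


@dataclass
class AccMmuResp:  # hypothetical response: what the MMU returns
    paddr: Optional[int]
    exception: Optional[str]


class ToyMmu:
    def __init__(self, page_table: dict[int, int]) -> None:
        self.page_table = page_table  # flat map: VPN -> PPN

    def handle(self, req: AccMmuReq) -> AccMmuResp:
        vpn, offset = divmod(req.vaddr, PAGE_BYTES)
        ppn = self.page_table.get(vpn)
        if ppn is None:
            # Unmapped page: report a fault instead of a physical address.
            cause = "store_page_fault" if req.is_store else "load_page_fault"
            return AccMmuResp(paddr=None, exception=cause)
        return AccMmuResp(paddr=ppn * PAGE_BYTES + offset, exception=None)
```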


Parameters

Ara + RVV Parameters

Name            Description
--------------  ----------------------------------
NrLanes         Number of vector lanes
VLEN            Vector register length (in bits)
OSSupport       Enables OS features
FPUSupport      Enabled FP precisions
FPExtSupport    Support for vfrec7, vfrsqrt7
FixPtSupport    Enable fixed-point ops
SegSupport      Support for segmented memory ops

CVA6 and AXI Interface

Name                  Description
--------------------  -------------------------------------------
CVA6Cfg               CVA6 configuration record
exception_t           CVA6 exception type
accelerator_req_t     Accelerator request interface (CVA6 → Ara)
accelerator_resp_t    Accelerator response interface (Ara → CVA6)
acc_mmu_{req,resp}    MMU interface
cva6_to_acc_t         Packed request interface
acc_to_cva6_t         Packed response interface
Axi*Width             AXI bus widths
AXI typedefs          All AXI channel and request/response types


Ports

Port           Direction   Description
-------------  ----------  ------------------------------
clk_i          Input       Clock signal
rst_ni         Input       Active-low reset
boot_addr_i    Input       Initial fetch address for CVA6
hart_id_i      Input       Hardware thread ID
scan_*         In/Out      Scan chain (test)
axi_req_o      Output      AXI master request (merged)
axi_resp_i     Input       AXI master response (merged)


Internal Blocks

1. CVA6 Core

  • Scalar processor core

  • Interfaces to Ara via a dedicated accelerator port

  • Outputs standard AXI (ariane_axi_req_t) at narrow data width to L2 memory

2. Ara Accelerator

  • Fully parameterized vector unit

  • Receives CVA6 requests, returns results and exceptions

  • AXI master interface at a wide data width of 32 * NrLanes bits

3. AXI Width Converter

  • axi_dw_converter upsizes CVA6's narrow AXI port (e.g., 64 bits) to the wide bus width shared by Ara and the rest of the system
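The width relationship and the core job of an upsizer can be shown with a little arithmetic. Ara's bus is 32 * NrLanes bits wide. A narrow beat placed on the wide bus occupies the byte lanes selected by its address. The sketch assumes a full (all-bytes-valid) narrow beat and a wide width that is a multiple of the narrow width. The function names are hypothetical and do not come from axi_dw_converter.

```python
def ara_axi_data_width(nr_lanes: int) -> int:
    """Ara's AXI data width in bits: 32 * NrLanes."""
    return 32 * nr_lanes


def upsized_wstrb(addr: int, narrow_bits: int, wide_bits: int) -> int:
    """Wide-bus write strobe for one full narrow beat at addr (hypothetical)."""
    narrow_bytes = narrow_bits // 8
    wide_bytes = wide_bits // 8
    assert wide_bytes % narrow_bytes == 0, "wide width must be a multiple"
    # Byte-lane offset of the narrow-aligned beat within the wide data word.
    offset = (addr - addr % narrow_bytes) % wide_bytes
    return ((1 << narrow_bytes) - 1) << offset
```

With NrLanes = 4, the wide bus is 128 bits. A 64-bit CVA6 write to 0x1008 therefore lands in the upper eight byte lanes, giving strobe 0xFF00.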

4. AXI Invalidation Filter

  • Detects vector memory stores and emits invalidation signals

  • Ensures cache coherence with CVA6

  • Integrated with Ara’s AXI path
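To invalidate the right lines, the filter must derive from each AW beat the range of bytes a burst will write. The sketch below shows that address arithmetic for AXI INCR bursts. In INCR bursts only the first beat may be unaligned. The line size and set count are assumed parameters, and both functions are hypothetical, not the RTL's interface.

```python
def aw_invalidation_lines(addr: int, length: int, size: int,
                          line_bytes: int) -> list[int]:
    """Line-aligned addresses written by an AXI INCR burst.

    addr: AxADDR, length: AxLEN (beats - 1), size: AxSIZE (log2 bytes/beat).
    """
    beat_bytes = 1 << size
    beats = length + 1
    # The burst ends at the beat-aligned start plus beats * beat_bytes.
    end = (addr - addr % beat_bytes) + beats * beat_bytes
    first_line = addr - addr % line_bytes
    return list(range(first_line, end, line_bytes))


def set_index(line_addr: int, line_bytes: int, num_sets: int) -> int:
    """Cache set that a line maps to (assumed simple modulo indexing)."""
    return (line_addr // line_bytes) % num_sets
```

For example, a burst of four 8-byte beats starting at 0x1000 touches two 16-byte lines, 0x1000 and 0x1010, so two invalidations are needed.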

5. AXI Multiplexer

  • Merges CVA6 and Ara AXI requests

  • Handles arbitration and backpressure, and can insert spill registers on the AXI channels
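A multiplexer that merges two AXI masters must do two things. It picks which request goes next, and it routes each response back to the master that issued it. The sketch below shows one common way to do both: round-robin arbitration, plus tagging each request ID with the index of its source port. Both are assumptions for illustration. AxiMuxModel is hypothetical and is not the configuration used in ara_system. Port 0 stands for CVA6 and port 1 for Ara.

```python
from typing import Optional


class AxiMuxModel:
    """Hypothetical round-robin mux with ID-prefix response routing."""

    def __init__(self, id_width: int, num_ports: int = 2) -> None:
        self.id_width = id_width
        self.num_ports = num_ports
        self.last_grant = num_ports - 1  # port 0 wins the first tie

    def arbitrate(self, requests: list[bool]) -> Optional[int]:
        # Search starting after the last granted port (round-robin).
        for i in range(1, self.num_ports + 1):
            port = (self.last_grant + i) % self.num_ports
            if requests[port]:
                self.last_grant = port
                return port
        return None  # no requests this cycle

    def tag_id(self, port: int, axi_id: int) -> int:
        # Prepend the source port index above the original ID bits.
        return (port << self.id_width) | axi_id

    def route_response(self, tagged_id: int) -> tuple[int, int]:
        # Split a response ID back into (source port, original ID).
        port = tagged_id >> self.id_width
        return port, tagged_id & ((1 << self.id_width) - 1)
```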


Vector Interface Handling

  • The acc_to_cva6_t response type is extended with inval_valid and inval_addr, so that Ara can send memory-coherence notifications back to CVA6.

  • Ara asserts inval_valid when one of its stores requires invalidating a line in CVA6's cache.

  • The acc_cons_en signal gates the invalidation path.
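The gating can be pictured as follows. Each line the filter flags becomes one (inval_valid, inval_addr) notification, but only while acc_cons_en is asserted. This assumes one notification per cycle and a plain enable. InvalPort and inval_stream are hypothetical names, and any ready/acknowledge handshake is left out because the text does not describe one.

```python
from dataclasses import dataclass


@dataclass
class InvalPort:
    """Hypothetical view of the invalidation fields in acc_to_cva6_t."""
    inval_valid: bool = False
    inval_addr: int = 0


def inval_stream(acc_cons_en: bool, line_addrs: list[int]) -> list[InvalPort]:
    """One notification per flagged line, suppressed when the gate is off."""
    if not acc_cons_en:
        return []
    return [InvalPort(inval_valid=True, inval_addr=a) for a in line_addrs]
```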


Alternative Configuration: IDEAL_DISPATCHER

The ideal dispatcher is a benchmarking tool. It measures Ara's performance by instantiating an ideal vector instruction dispatcher INSTEAD OF CVA6.

If IDEAL_DISPATCHER is defined:

  • CVA6 is replaced with a perfect dispatcher (accel_dispatcher_ideal), i.e., a FIFO containing the dynamic instruction trace of the program plus the correct register file values

  • Useful for functional validation and for (micro-)benchmarking Ara in isolation
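Conceptually, the ideal dispatcher replays a FIFO of pre-recorded entries. Each entry holds a vector instruction together with the scalar operand values it needs. One entry is issued per cycle whenever Ara can accept it, so the scalar core never stalls dispatch. The sketch below is a hypothetical software analogue of accel_dispatcher_ideal. TraceEntry, IdealDispatcher and the field names are assumptions, and the instruction values are placeholders, not real encodings.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TraceEntry:
    insn: int       # vector instruction (placeholder value)
    rs1_value: int  # scalar operand value recorded in the trace
    rs2_value: int


class IdealDispatcher:
    def __init__(self, trace: Iterable[TraceEntry]) -> None:
        self.fifo = deque(trace)  # dynamic instruction trace, in order

    def tick(self, ara_ready: bool) -> Optional[TraceEntry]:
        # Issue one entry per cycle when Ara is ready; otherwise stall.
        if ara_ready and self.fifo:
            return self.fifo.popleft()
        return None

    def done(self) -> bool:
        return not self.fifo
```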