`ara`: Top-Level Vector Unit

Overview

The ara module is the top-level vector processing unit that implements the RISC-V Vector 1.0 Extension (RVV). It interfaces directly with the CVA6 scalar core and contains all vector-specific sub-units required to execute RVV instructions, including:

A dispatcher (decodes vector instructions, keeps the vector CSR state, injects special micro-operations)
A sequencer (controls instruction issue to the units, enforce instruction dependencies, collect results)
Lanes (each lane contains a slice of the vector register file plus the arithmetic units)
Vector Load/Store Unit (VLSU, it initiates AXI AR/AW transactions and handle the dataflow from/to memory)
Slide Unit (SLDU, runs vector slides and byte layout reshuffling)
Mask Unit (MASKU, assembles and distributes mask (predication) bits, executes bit-level mask instructions, and runs more complex bit-level permutations plus vrgather)

The design is modular and scalable, with configurable parameters for lane count, VLEN, data types, and extended features like segmentation or MMU support.

The most stable configurations are with a power-of-2 number of lanes from 2 to 16, with a VLEN derived from considering a VLEN-per-lane of 1024 bit/lane.

Parameters

Name	Description
`NrLanes`	Number of parallel vector lanes
`VLEN`	Vector length (in bits)
`OSSupport`	Enables MMU and fault-only-first logic
`FPUSupport`	Enables FP16/32/64 support
`FPExtSupport`	Enables `vfrec7`, `vfrsqrt7`
`FixPtSupport`	Enables fixed-point support
`SegSupport`	Enables segmented memory operations
`CVA6Cfg`	CVA6 configuration record
`Axi*Width`	AXI bus widths
`axi_*`	AXI channel and bundle typedefs
`exception_t`, `accelerator_`, `acc_mmu_`	Types for accelerator/MMU interfacing

Ports

Port	Dir	Description
`clk_i`, `rst_ni`	In	Clock and reset
`scan_*`	In/Out	Scan chain (test)
`acc_req_i`, `acc_resp_o`	In/Out	Vector accelerator interface (with CVA6)
`axi_req_o`, `axi_resp_i`	Out/In	AXI memory interface

Internal Units

1. Dispatcher (`ara_dispatcher`)

Fully decodes RVV instructions
Keeps the vector CSR state
Handles register reshuffling for EW mismatches
Injects special vector micro-ops (e.g., reshuffle slides, segment memory micro-ops)
Tracks active instructions and completion

2. Sequencer (`ara_sequencer`)

Manages vector instruction lifecycle
Tracks instruction dependencies with a scoreboard
Handles scalar responses and exception tracking
Broadcast work to lanes and special units

3. Vector Lanes (`lane`)

Keeps a slice of the Vector Register File and multiple functional units (FPU, ALU, Multiplier)
Executes arithmetic and logic vector operations
Feed Ara’s units through the VRF, and receive memory operands from the load unit

4. Vector Load/Store Unit (`vlsu`)

Handles vector memory access
Interfaces with CVA6’s MMU (virtual address translation through dedicated interface) and memory (through AXI4)
Manages address generation and exception detection for memory operations

5. Slide Unit (`sldu`)

Performs byte-reshuffling and slide ops
Optimized for power-of-two strides
All-to-all lane connectivity

6. Mask Unit (`masku`)

Centralized logic for mask generation and bit-level access
Handles mask combination ops (e.g., vfirst, vcpop)
Shares scalar result lines with the sequencer

MMU Support

When OSSupport = 1, the module:

Interfaces with CVA6’s MMU (via SV39)
Performs virtual-to-physical address translation
Supports fault-only-first loads and address exceptions

Interconnect & Communication

The sequencer broadcasts instructions to all the units in parallel (VLSU, SLDU, MASKU, Lanes). Also, the all-to-all connected units (VLSU, SLDU, MASKU) are connected to every lane in parallel.

Each lane works on a 64-bit datapath, and every all-to-all unit receives at least one bus of 64-bit from every lane.

ara: Top-Level Vector Unit

Overview

Parameters

Ports

Internal Units

1. Dispatcher (ara_dispatcher)

2. Sequencer (ara_sequencer)

3. Vector Lanes (lane)

4. Vector Load/Store Unit (vlsu)

5. Slide Unit (sldu)

6. Mask Unit (masku)