vector_fus_stage
Module Documentation
Module Name: vector_fus_stage
Authors: Matheus Cavalcante
Description: This is Ara’s vector execution stage. It instantiates and connects the vector functional units (VFUs) for each lane, specifically the Vector ALU (VALU) and the Vector Multiplier/FPU (VMFPU), enabling SIMD-style parallel vector operations.
1. Overview
This module coordinates the operation of all vector functional units for one lane in the Ara vector processor. It interfaces with:
- The Dispatcher for configuration (e.g., rounding/saturation).
- The Lane Sequencer for operation dispatch and handshaking.
- The Vector Register File (VRF) for read/write data movement.
- The Slide Unit, to handle reduction and data forwarding.
- The Mask Unit, to apply selective operation masking.
2. Parameters
| Parameter | Type | Description |
|---|---|---|
| | | Number of lanes in the vector processor |
| | | Vector register length |
| | Struct | Configuration of the CVA6 processor |
| | Enum | Enable/disable support for FP16, FP32, FP64 |
| | Enum | Enable external FP operations (like vfrec7, vfrsqrt7) |
| | Enum | Support for fixed-point arithmetic |
| | Type | Type used to address vector elements |
| | Type | Type representing vector functional unit operations |
3. Submodule Instantiations
3.1 VALU - Vector ALU
Handles all integer and fixed-point arithmetic operations.
Inputs:
- Operands (`alu_operand_i`)
- Operation type (`vfu_operation_i`)
- Control flags (`alu_vxrm_i`, mask info)
Outputs:
- Result back to VRF (`alu_result_wdata_o`, `alu_result_addr_o`)
- Done signal per instruction (`alu_vinsn_done_o`)
- Reductions (`alu_red_complete_o`)
- Saturation flag (`alu_vxsat`)
Handshake signals:
- `alu_ready_o`, `alu_result_gnt_i`, mask signals
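To make the request/grant style of the writeback handshake concrete, here is a minimal sketch of a one-entry result buffer that holds a result until the VRF grants the write. It is not taken from the Ara sources: the module name `result_handshake_sketch`, the `DataWidth` parameter, and every port name are assumptions chosen for illustration, and only the `_gnt_i` naming convention mirrors `alu_result_gnt_i` above.

```systemverilog
// Hypothetical sketch of a request/grant writeback handshake.
// All names here are illustrative assumptions, not Ara's actual ports.
module result_handshake_sketch #(
  parameter int unsigned DataWidth = 64
) (
  input  logic                 clk_i,
  input  logic                 rst_ni,
  // Producer side (functional unit datapath)
  input  logic                 result_valid_i,
  input  logic [DataWidth-1:0] result_data_i,
  output logic                 result_ready_o,
  // Consumer side (VRF write port)
  output logic                 result_req_o,
  output logic [DataWidth-1:0] result_wdata_o,
  input  logic                 result_gnt_i
);
  logic                 full_q;
  logic [DataWidth-1:0] data_q;

  // A new result can be accepted when the buffer is empty, or when the
  // buffered entry is being granted in this same cycle.
  assign result_ready_o = !full_q || result_gnt_i;
  assign result_req_o   = full_q;
  assign result_wdata_o = data_q;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      full_q <= 1'b0;
      data_q <= '0;
    end else begin
      if (result_valid_i && result_ready_o) begin
        // Accept a new result, possibly replacing one granted this cycle
        full_q <= 1'b1;
        data_q <= result_data_i;
      end else if (result_gnt_i) begin
        // Entry drained and nothing new arrived
        full_q <= 1'b0;
      end
    end
  end
endmodule
```

The point of the sketch is that the request stays asserted and the data stays stable until the grant arrives, so a result is never dropped while the VRF port is busy.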
3.2 VMFPU - Vector Multiplier/FPU
Handles:
- Integer multiply and multiply-accumulate
- All floating-point operations (add, mul, div, sqrt, etc.)
- Optional external FP instructions
- Fixed-point arithmetic extensions
Inputs:
- 3 operands (`mfpu_operand_i`)
- Operation type (`vfu_operation_i`)
Outputs:
- Result and writeback signals (`mfpu_result_*`)
- Exception flags (`fflags_ex_o`)
- Completion tracking (`mfpu_vinsn_done_o`, `fpu_red_complete_o`)
Handshake and masking signals are similar to those of the VALU.
4. Interface Summary
4.1 Input/Output Overview
Inputs from Dispatcher:
- `alu_vxrm_i`: Rounding mode
- `vfu_operation_i`: Operation type
- `vfu_operation_valid_i`: Validity
Operand Queues:
- ALU: `alu_operand_i[1:0]`, `alu_operand_valid_i`
- MFPU: `mfpu_operand_i[2:0]`, `mfpu_operand_valid_i`
VRF Writeback:
- ALU: `alu_result_*`
- MFPU: `mfpu_result_*`
Mask Interface:
- Shared `mask_i`, `mask_valid_i`
- Split readiness signals: `alu_mask_ready`, `mfpu_mask_ready`
Slide Unit:
- Slide operands (`sldu_operand_i`)
- Slide handshake per unit
- Reduction request/ack (`sldu_*_req_valid_o`, `sldu_*_gnt_i`)
Saturation:
- `vxsat_flag_o`: Indicates whether saturation occurred
FPU Exceptions:
- `fflags_ex_o`, `fflags_ex_valid_o`
5. Control Logic and Signal Routing
5.1 Masking Coordination
- Shared input mask: `mask_i`, `mask_valid_i`
- Readiness: `mask_ready_o = alu_mask_ready | mfpu_mask_ready`
- The mask is broadcast to both units, so it needs tagged handling if the instruction queue depth is greater than 1.
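The broadcast described above can be sketched as a small combinational module. This is an illustration under assumptions rather than the actual Ara code: the module name `mask_broadcast_sketch`, the `MaskWidth` parameter, and the per-unit output ports are hypothetical; only `mask_i`, `mask_valid_i`, `mask_ready_o`, `alu_mask_ready`, and `mfpu_mask_ready` come from the text.

```systemverilog
// Illustrative sketch of a shared mask fanned out to two units.
// Per-unit output port names and MaskWidth are assumptions.
module mask_broadcast_sketch #(
  parameter int unsigned MaskWidth = 8
) (
  // Shared mask from the Mask Unit
  input  logic [MaskWidth-1:0] mask_i,
  input  logic                 mask_valid_i,
  output logic                 mask_ready_o,
  // View seen by the ALU
  output logic [MaskWidth-1:0] alu_mask_o,
  output logic                 alu_mask_valid_o,
  input  logic                 alu_mask_ready,
  // View seen by the MFPU
  output logic [MaskWidth-1:0] mfpu_mask_o,
  output logic                 mfpu_mask_valid_o,
  input  logic                 mfpu_mask_ready
);
  // Both units observe the same mask bits and the same valid flag.
  assign alu_mask_o        = mask_i;
  assign alu_mask_valid_o  = mask_valid_i;
  assign mfpu_mask_o       = mask_i;
  assign mfpu_mask_valid_o = mask_valid_i;

  // The Mask Unit sees one ready: the mask counts as consumed as soon as
  // either unit accepts it.
  assign mask_ready_o = alu_mask_ready | mfpu_mask_ready;
endmodule
```

Because the OR cannot tell which unit the mask was meant for, this scheme is only safe when at most one unit is waiting for a given mask, which is why deeper instruction queues would need the mask to carry a tag.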
5.2 Saturation Flag Aggregation
Both units can set a saturation flag:
assign vxsat_flag_o = mfpu_vxsat | alu_vxsat;
6. Lane-Level Operation and Modular Structure
This module is fully parameterized per-lane, enabling reuse across multiple vector lanes. Each lane runs its own instance of this module, and each instance manages:
- One ALU (`i_valu`)
- One MFPU (`i_vmfpu`)
The control strategy is uniform:
- Operand queues → functional units
- Results → VRF
- Status → dispatcher/sequencer
- Mask/Slide → helpers
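As a rough structural sketch of the per-lane arrangement, together with the saturation OR from section 5.2, the following replicates a simplified stage once per lane. Everything except `i_valu`, `i_vmfpu`, `alu_vxsat`, `mfpu_vxsat`, and `vxsat_flag_o` is hypothetical: `unit_stub`, `stage_sketch`, `lanes_sketch`, `NumLanes`, and the `*_sat_i` inputs exist only for illustration, and the use of a generate loop for the lanes is an assumption. The real units have far wider interfaces.

```systemverilog
// Hypothetical stand-in for one functional unit; only the saturation
// output is modeled.
module unit_stub (
  input  logic sat_i,
  output logic vxsat_o
);
  assign vxsat_o = sat_i;
endmodule

// Hypothetical per-lane stage: one ALU-like and one MFPU-like unit,
// with their saturation flags ORed as in section 5.2.
module stage_sketch (
  input  logic alu_sat_i,
  input  logic mfpu_sat_i,
  output logic vxsat_flag_o
);
  logic alu_vxsat;
  logic mfpu_vxsat;

  unit_stub i_valu  (.sat_i(alu_sat_i),  .vxsat_o(alu_vxsat));
  unit_stub i_vmfpu (.sat_i(mfpu_sat_i), .vxsat_o(mfpu_vxsat));

  assign vxsat_flag_o = mfpu_vxsat | alu_vxsat;
endmodule

// One stage instance per lane; NumLanes is an assumed parameter name.
module lanes_sketch #(
  parameter int unsigned NumLanes = 4
) (
  input  logic [NumLanes-1:0] alu_sat_i,
  input  logic [NumLanes-1:0] mfpu_sat_i,
  output logic [NumLanes-1:0] vxsat_flag_o
);
  for (genvar lane = 0; lane < NumLanes; lane++) begin : gen_lanes
    stage_sketch i_stage (
      .alu_sat_i   (alu_sat_i[lane]),
      .mfpu_sat_i  (mfpu_sat_i[lane]),
      .vxsat_flag_o(vxsat_flag_o[lane])
    );
  end
endmodule
```

Each lane's instance is independent of the others, so changing the lane count only changes how many copies are generated.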
7. Design Notes
- Uses the `ara_pkg` and `rvv_pkg` definitions for consistency across Ara.
- Designed to be flexible with regard to precision, operation type, and lane width.
- Follows strict handshake protocols to avoid hazards.
- Clean separation of logic for ALU and MFPU makes it easy to extend or adapt.
- Interfacing with the Mask and Slide Units is modular and scalable.