vmfpu
— Instantiate in-lane SIMD FPU, SIMD multiplier, and SIMD divider (pipelined or multi-cycle)
Module Name: vmfpu
Summary
The vmfpu module is part of the Ara RISC-V vector processor and implements the Vector Multiply and Floating-Point Unit (VMFPU), which executes multiplication, division, and floating-point operations. It operates on elements read from the vector register file, is controlled by the vector instruction sequencer, and receives its operands through operand queues.
This module receives instructions, manages several operand queues, schedules operations on its sub-units (such as the FP multipliers and ALUs), and routes results back to the vector register file through the operand requester. Because it is an in-lane unit, every lane has its own vmfpu, so multiplications and floating-point operations on different vector elements proceed in parallel across lanes.
Interface Description
Clocking and Reset
| Signal | Direction | Description |
|---|---|---|
| | Input | Clock signal |
| | Input | Active-low synchronous reset |
Operand Queue Interface
| Signal | Direction | Description |
|---|---|---|
| | Input | One-hot encoding for operand queue validity |
| | Input | Operands input from the vector register file |
| | Output | Ready handshake for each operand queue |
Result Interface
| Signal | Direction | Description |
|---|---|---|
| | Output | Valid signal to request writing result |
| | Output | ID of the instruction producing the result |
| | Output | VRF write address |
| | Output | Result data to be written |
| | Output | Byte enable signal |
| | Input | Grant from operand requester |
| | Input | Final commit acknowledgment from operand requester |
Functional Blocks Overview
1. Operand Queue Decoding
The module reads from several operand queues concurrently and decodes operand availability. A one-hot signal (operand_valid_i) triggers internal FSMs to accept and latch operand data into internal registers.
This mechanism ensures that the pipeline only proceeds when all necessary operands are ready.
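The latching behavior can be sketched as one register slot per operand queue. This is a minimal, hypothetical module: operand_latch, required_i, consume_i, and all_latched_o are illustrative names, not Ara's, and the sketch assumes three operand queues, 64-bit operands, and the synchronous active-low reset listed in the interface table. A slot raises its ready only while empty, and the instruction may proceed once every slot it needs is full.

```systemverilog
// Hypothetical operand latch: one slot per operand queue (not Ara's code).
module operand_latch #(
  parameter int unsigned NrQueues = 3,  // assumed number of operand queues
  parameter int unsigned Width    = 64  // assumed operand width in bits
) (
  input  logic                           clk_i,
  input  logic                           rst_ni,          // active-low, synchronous
  input  logic [NrQueues-1:0]            operand_valid_i, // per-queue valid
  input  logic [NrQueues-1:0][Width-1:0] operand_i,       // per-queue data
  output logic [NrQueues-1:0]            operand_ready_o, // per-queue ready
  input  logic [NrQueues-1:0]            required_i,      // queues this instruction reads
  input  logic                           consume_i,       // backend takes the operands
  output logic                           all_latched_o,   // every required slot is full
  output logic [NrQueues-1:0][Width-1:0] operand_q_o      // latched operands
);
  logic [NrQueues-1:0] full_q, full_d;

  // A slot accepts data only while it is empty.
  assign operand_ready_o = ~full_q;
  // Slots that are not required count as satisfied.
  assign all_latched_o = &(full_q | ~required_i);

  always_comb begin
    // Mark newly accepted slots as full...
    full_d = full_q | (operand_valid_i & operand_ready_o);
    // ...and free all required slots together when the backend consumes them.
    if (consume_i && all_latched_o) full_d = full_d & ~required_i;
  end

  always_ff @(posedge clk_i) begin
    if (!rst_ni) begin
      full_q      <= '0;
      operand_q_o <= '0;
    end else begin
      full_q <= full_d;
      for (int i = 0; i < NrQueues; i++) begin
        if (operand_valid_i[i] && operand_ready_o[i]) operand_q_o[i] <= operand_i[i];
      end
    end
  end
endmodule
```

Freeing all required slots in the same cycle the backend consumes them keeps the operands of one instruction from mixing with those of the next.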
2. FSM (Finite State Machine) Control Logic
The FSM implements four states:
IDLE: Waits for operands to be valid.
WAIT: Waits for all operands to be latched and for a backend (such as the FPU or multiplier) to be free.
EXECUTE: Passes the operands to the compute unit and asserts a valid signal.
DONE: Waits for the result grant handshake.
Transitions depend on operand readiness, backend availability, and completion handshakes, so each instruction advances through this lifecycle one step at a time and leaves DONE only after its result has been granted.
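A minimal sketch of the four-state controller described above, using the common _q (register) and _d (next-state) split. The module name, the input names, and the single-cycle EXECUTE state are assumptions made for illustration; the real vmfpu control logic may be organized differently.

```systemverilog
// Hypothetical IDLE/WAIT/EXECUTE/DONE controller (illustrative, not Ara's code).
module vmfpu_fsm_sketch (
  input  logic       clk_i,
  input  logic       rst_ni,          // active-low, synchronous
  input  logic       operand_valid_i, // some operand queue holds data (assumed)
  input  logic       all_latched_i,   // every required operand is latched (assumed)
  input  logic       backend_free_i,  // selected compute unit can accept (assumed)
  input  logic       result_gnt_i,    // grant for the produced result
  output logic       backend_valid_o, // valid towards the compute unit
  output logic [1:0] state_o          // current state, exposed for observation
);
  typedef enum logic [1:0] { IDLE, WAIT, EXECUTE, DONE } state_e;
  state_e state_q, state_d;

  // Next-state and output logic.
  always_comb begin
    state_d         = state_q;
    backend_valid_o = 1'b0;
    case (state_q)
      IDLE:    if (operand_valid_i) state_d = WAIT;
      WAIT:    if (all_latched_i && backend_free_i) state_d = EXECUTE;
      EXECUTE: begin
        backend_valid_o = 1'b1; // hand the operands to the compute unit
        state_d         = DONE;
      end
      DONE:    if (result_gnt_i) state_d = IDLE;
      default: state_d = IDLE;
    endcase
  end

  // State register.
  always_ff @(posedge clk_i) begin
    if (!rst_ni) state_q <= IDLE;
    else         state_q <= state_d;
  end

  assign state_o = state_q;
endmodule
```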
3. Backend Compute Unit Selection
Depending on the instruction:
Integer multiplication is routed to the Mul unit.
Floating-point computation uses the FP ALU, FP MUL, or FP FMA unit.
These units are submodules instantiated by vmfpu and are connected to its control logic through operand and result interfaces. The unit is selected from the target_fu field and the operation code embedded in the vector instruction.
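The unit selection can be written as a pure decode function. The opcodes, unit names, and package name below are hypothetical stand-ins (Ara's own operation enumeration is richer), chosen only to show integer multiplies going to the multiplier and floating-point operations being split among the FP ALU, FP MUL, and FP FMA.

```systemverilog
// Hypothetical opcode-to-unit decode (illustrative names only).
package vmfpu_sketch_pkg;
  // Assumed subset of operations.
  typedef enum logic [2:0] { VMUL, VMACC, VFADD, VFSUB, VFMUL, VFMACC } op_e;
  // Backends named after the units in the text.
  typedef enum logic [1:0] { UNIT_MUL, UNIT_FP_ALU, UNIT_FP_MUL, UNIT_FP_FMA } unit_e;

  function automatic unit_e select_unit(op_e op);
    case (op)
      VMUL, VMACC:  return UNIT_MUL;    // integer multiply and multiply-accumulate
      VFADD, VFSUB: return UNIT_FP_ALU; // floating-point add/subtract
      VFMUL:        return UNIT_FP_MUL; // floating-point multiply
      default:      return UNIT_FP_FMA; // fused multiply-add (VFMACC)
    endcase
  endfunction
endpackage
```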
4. Stream Registers
Each result-producing unit feeds a stream register, which buffers a result until the downstream side acknowledges it (result_gnt_i), so a compute unit is not stalled by backpressure as long as its register has room.
These registers decouple execution from result storage and help sustain pipeline throughput.
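A one-entry valid/ready register captures the idea. This is a hypothetical sketch in the spirit of the stream_register instance shown in the code walkthrough, not a copy of it; the module name and payload width are assumptions, and ready_i plays the role of the result grant. The compute unit sees backpressure only while the slot is full and the downstream side is not draining it.

```systemverilog
// Hypothetical one-entry result buffer with valid/ready handshakes.
module result_stream_reg #(
  parameter int unsigned Width = 64 // assumed payload width, e.g. {id, addr, result}
) (
  input  logic             clk_i,
  input  logic             rst_ni,  // active-low, synchronous
  input  logic             valid_i, // result from the compute unit
  output logic             ready_o, // buffer can take a result
  input  logic [Width-1:0] data_i,
  output logic             valid_o, // result request towards the VRF side
  input  logic             ready_i, // e.g. the result grant
  output logic [Width-1:0] data_o
);
  // Accept when the slot is empty or is being drained in this cycle.
  assign ready_o = ready_i || !valid_o;

  always_ff @(posedge clk_i) begin
    if (!rst_ni) begin
      valid_o <= 1'b0;
      data_o  <= '0;
    end else if (ready_o) begin
      valid_o <= valid_i;
      if (valid_i) data_o <= data_i;
    end
  end
endmodule
```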
Code Walkthrough
Module Declaration and Parameters
module vmfpu import ara_pkg::*; ...
The module imports Ara and RVV packages, ensuring access to operand types, configuration constants, and hardware definitions.
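It uses several parameters, some of them type parameters:
NrLanes, VLEN, and vaddr_t: define structural hardware constraints (the number of lanes, the vector register length in bits, and the VRF address type).
operand_queue_cmd_t and operand_request_cmd_t: encapsulate control signals between the scheduler and the operand queues.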
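To make the structural constraints concrete, the sketch below derives a per-lane VRF address type from NrLanes and VLEN. It assumes that the 32 RVV vector registers are split evenly across the lanes and stored in 64-bit words; the module name is hypothetical, and Ara's actual vaddr_t definition may differ.

```systemverilog
// Hypothetical derivation of a per-lane VRF address type.
module vrf_addr_sketch #(
  parameter int unsigned NrLanes = 4,   // number of lanes
  parameter int unsigned VLEN    = 1024 // vector register length in bits
) (
  output logic [31:0] addr_width_o      // resulting address width, for inspection
);
  localparam int unsigned NrVRegs  = 32; // RVV architectural vector registers
  localparam int unsigned WordBits = 64; // assumed VRF word width per lane
  // Each lane stores VLEN / NrLanes bits of every register.
  localparam int unsigned WordsPerLane = NrVRegs * VLEN / (NrLanes * WordBits);

  typedef logic [$clog2(WordsPerLane)-1:0] vaddr_t;

  assign addr_width_o = $bits(vaddr_t);
endmodule
```

Under these assumptions, VLEN = 1024 with four lanes gives 32 × 1024 / (4 × 64) = 128 words per lane, so the address needs 7 bits.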
Registers and Internal Signals
Internal signals such as fsm_state_q, operand_valid, and stream_reg_payload track FSM state, operand readiness, and result data, respectively.
These are updated in sequential logic blocks and control the combinational data paths.
Operand Gathering and Arbitration
Each operand queue’s validity is polled, and the module arbitrates between them using one-hot logic.
When valid operands are detected, the FSM moves from IDLE to WAIT and, once all operands are latched and the backend is free, on to EXECUTE.
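One common way to build one-hot arbitration is a fixed-priority arbiter that isolates the lowest set bit of the valid vector. The module below is a hypothetical illustration of that technique, with queue 0 given the highest priority; the text does not say which priority scheme vmfpu actually uses.

```systemverilog
// Hypothetical fixed-priority one-hot arbiter (lowest index wins).
module onehot_arbiter #(
  parameter int unsigned N = 3 // assumed number of operand queues
) (
  input  logic [N-1:0] valid_i, // per-queue valid
  output logic [N-1:0] gnt_o    // one-hot grant, all zero when nothing is valid
);
  // x & (~x + 1) keeps only the lowest set bit of x.
  assign gnt_o = valid_i & (~valid_i + 1'b1);
endmodule
```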
Computation Dispatch
Operations are dispatched to the correct backend depending on instruction decoding logic:
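case (target_fu)
  FPU: begin
    // Route the operands to the floating-point backend
    ...
  end
  MUL: begin
    // Route the operands to the integer multiplier
    ...
  end
  default: ; // no other target is handled here
endcase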
Here, target_fu determines which functional unit receives the operands.
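The routing that this case statement performs can be shown with a self-contained, hypothetical version: valid is forwarded only to the selected unit, and that unit's ready is returned to the issuing logic. The module name, the one-bit target_fu_i encoding, and the port names are assumptions for illustration.

```systemverilog
// Hypothetical valid/ready demultiplexer keyed on the target unit.
module fu_dispatch_sketch (
  input  logic target_fu_i, // assumed encoding: FU_MUL = 0, FU_FPU = 1
  input  logic valid_i,     // operands ready to issue
  output logic ready_o,     // selected unit accepted the operands
  output logic fpu_valid_o,
  input  logic fpu_ready_i,
  output logic mul_valid_o,
  input  logic mul_ready_i
);
  localparam logic FU_MUL = 1'b0;
  localparam logic FU_FPU = 1'b1;

  always_comb begin
    // Default: nothing is issued.
    fpu_valid_o = 1'b0;
    mul_valid_o = 1'b0;
    ready_o     = 1'b0;
    case (target_fu_i)
      FU_FPU: begin
        fpu_valid_o = valid_i;
        ready_o     = fpu_ready_i;
      end
      FU_MUL: begin
        mul_valid_o = valid_i;
        ready_o     = mul_ready_i;
      end
      default: ;
    endcase
  end
endmodule
```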
Result Capture and Output
Results are written to output ports using stream registers:
stream_register #(...) i_fpu_stream_reg (
  ...
  .valid_i (fpu_result_valid),
  .data_i  ({id, addr, result}),
  ...
);
Once the result is accepted by the downstream module, the FSM returns to IDLE.