simd_div: Ara's in-lane SIMD divider
Overview
The `simd_div` module implements Ara's serial divider, which executes vector division and remainder operations element by element. The unit supports signed and unsigned division for every vector element width (8, 16, 32, and 64 bits). It operates serially, processing one element at a time with a backend divider (`serdiv`). This serial design limits intra-vector parallelism but reduces design complexity and area.
This module is parameterized with CVA6 configuration settings and is tailored for use within Ara’s scalar/vector datapath execution pipeline.
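To illustrate what "one element at a time" means for a packed 64-bit operand, the Python sketch below divides the unsigned elements of one word in a loop, as a single shared divider would. It is a behavioural model under stated assumptions, not Ara's RTL: element 0 is assumed to sit in the least-significant bits, the function name `serial_divu` is hypothetical, and division by zero follows the RISC-V rule that an unsigned quotient becomes all ones.

```python
def serial_divu(opa: int, opb: int, ew: int) -> tuple[int, int]:
    """Hypothetical model: unsigned SIMD division performed one element at a time.

    opa, opb: 64-bit words holding 64 // ew packed elements (element 0 in the LSBs).
    Returns (packed quotient word, number of divider invocations).
    """
    assert ew in (8, 16, 32, 64)
    mask = (1 << ew) - 1
    result, calls = 0, 0
    for i in range(64 // ew):           # one element per step: the divider is shared
        a = (opa >> (i * ew)) & mask
        b = (opb >> (i * ew)) & mask
        q = mask if b == 0 else a // b  # RISC-V: unsigned x / 0 yields all ones
        result |= q << (i * ew)
        calls += 1
    return result, calls


# Bytes 0 and 1 compute 10 / 2 and 30 / 3; the other six bytes compute 0 / 1.
q, n = serial_divu(0x1E0A, 0x0101010101010302, 8)
print(hex(q), n)  # 0xa05 8
```

With EW8 the loop runs eight times per word, and with EW64 only once, which is why narrow element widths cost the most divider invocations in a serial design.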
Interface Description
| Port | Width (bits) | Direction | Description |
|---|---|---|---|
| | 1 | input | Clock signal |
| | 1 | input | Active-low synchronous reset |
| | | input | Dividend operand vector element |
| | | input | Divisor operand vector element |
| | | input | Mask bits for each byte |
| | | input | Operation type: VDIV, VREM, VDIVU, VREMU |
| | | input | Byte-enable per vector element |
| | | input | Vector element width (VEW): EW8, EW16, etc. |
| | | output | Final division result vector element |
| | | output | Output mask signal |
| | 1 | input | New valid input transaction |
| | 1 | output | Module ready to accept input |
| | 1 | input | Downstream ready for result |
| `valid_o` | 1 | output | Output is valid |
Module Structure and Key Components
FSMs: Issue and Commit Control Units
- `issue_state_q/d` (FSM): accepts operands from upstream, tracks the issued bytes, and serially sends one operand pair at a time to the divider.
- `commit_state_q/d` (FSM): collects results from the divider, buffers and shifts them into the output, and drives `valid_o` when the entire result is ready.
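The cooperation between the two FSMs can be sketched as a cycle-level Python model. Everything here is an assumption for illustration: the state names `IDLE`, `BUSY`, `COLLECT`, and `VALID`, the function `cycles_to_valid`, and the fixed divider latency are hypothetical and are not Ara's actual encoding or `serdiv` timing. The model shows only the structure described above: one operand pair in flight, the issue side waiting for the divider to free up, and the commit side raising the valid flag once every element has returned.

```python
from enum import Enum, auto


class IssueState(Enum):      # hypothetical names, not Ara's actual encoding
    IDLE = auto()            # ready_o = 1: a new request may be accepted
    BUSY = auto()            # operands latched; elements issued one at a time


class CommitState(Enum):     # hypothetical names
    COLLECT = auto()         # gathering divider results
    VALID = auto()           # valid_o = 1: full result available


def cycles_to_valid(n_elems: int, div_latency: int) -> int:
    """Cycle in which valid_o is first high for a request presented in cycle 0,
    assuming a fixed-latency divider that holds one element at a time."""
    assert n_elems > 0 and div_latency > 0
    issue, commit = IssueState.IDLE, CommitState.COLLECT
    issue_cnt = commit_cnt = 0
    div_timer = 0            # cycles left on the in-flight element; 0 = divider free
    cycle = 0
    while commit is not CommitState.VALID:
        if issue is IssueState.IDLE:
            # valid_i is high in cycle 0: latch the request and load both counters
            issue, issue_cnt, commit_cnt = IssueState.BUSY, n_elems, n_elems
        else:
            if div_timer > 0:
                div_timer -= 1
                if div_timer == 0:          # result returned: commit one element
                    commit_cnt -= 1
                    if commit_cnt == 0:
                        commit = CommitState.VALID
            if div_timer == 0 and issue_cnt > 0:
                issue_cnt -= 1              # divider free: issue the next operand pair
                div_timer = div_latency
        cycle += 1
    return cycle


print(cycles_to_valid(8, 3))  # 26
```

In this model the latency grows linearly with the element count (2 + n·L cycles for n elements and divider latency L), which is the price of sharing a single divider. The real cycle count depends on `serdiv` and on the actual FSM implementation.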
Operands and Control Buffers
- Input operands: `opa_q` and `opb_q`, together with their next-state versions `opa_d` and `opb_d`.
- Opcode and vector element width: held in `op_q` and `vew_q`.
Counters
- `issue_cnt_q/d`: number of elements still to be issued.
- `commit_cnt_q/d`: number of results still to be committed.

Both counters decrement as each element is processed or skipped (masked off).
Divider Core
Uses the `serdiv` instance, a serial divider supporting signed and unsigned division and remainder. Supported opcodes:

- VDIV: signed division
- VDIVU: unsigned division
- VREM: signed remainder
- VREMU: unsigned remainder
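For reference, the sketch below models the per-element results that the RISC-V "V" extension specifies for these four opcodes. The vector spec states that they match the scalar M-extension divides, including the extreme cases: division by zero gives an all-ones quotient and a remainder equal to the dividend, and signed overflow (most negative value divided by -1) gives the dividend as quotient and zero as remainder. The function names are hypothetical, and this document does not say whether these cases are handled inside `serdiv` or in the surrounding logic.

```python
def to_signed(x: int, w: int) -> int:
    """Interpret the low w bits of x as a two's-complement number."""
    x &= (1 << w) - 1
    return x - (1 << w) if x >> (w - 1) else x


def rvv_divrem(op: str, a: int, b: int, sew: int) -> int:
    """Reference result of VDIV/VDIVU/VREM/VREMU on one SEW-bit element.
    a, b and the return value are raw SEW-bit patterns."""
    mask = (1 << sew) - 1
    a &= mask
    b &= mask
    if op in ("VDIVU", "VREMU"):
        if b == 0:
            return mask if op == "VDIVU" else a   # x / 0 = all ones, x % 0 = x
        return a // b if op == "VDIVU" else a % b
    if op not in ("VDIV", "VREM"):
        raise ValueError(f"unsupported opcode {op}")
    sa, sb = to_signed(a, sew), to_signed(b, sew)
    if sb == 0:
        q, r = -1, sa                             # divide by zero
    elif sa == -(1 << (sew - 1)) and sb == -1:
        q, r = sa, 0                              # signed overflow: MIN / -1
    else:
        q = abs(sa) // abs(sb)                    # truncate toward zero
        if (sa < 0) != (sb < 0):
            q = -q
        r = sa - q * sb                           # remainder takes the dividend's sign
    return (q if op == "VDIV" else r) & mask


# The same 8-bit pattern 0xF9 divided by 2: -7 / 2 signed versus 249 / 2 unsigned.
print(hex(rvv_divrem("VDIV", 0xF9, 2, 8)), hex(rvv_divrem("VDIVU", 0xF9, 2, 8)))  # 0xfd 0x7c
```

The example shows why the opcode must select both the extension mode and the divider mode: identical operand bits produce different quotients.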
Operand Width Handling
Each element width (VEW) has its own operand-unpacking logic:

- EW8: 8-bit element, extended to 64 bits.
- EW16: 16-bit element, extended to 64 bits.
- EW32: 32-bit element, extended to 64 bits.
- EW64: already 64 bits wide; no extension needed.

The elements are extracted from the operand unions and sign-extended for signed operations (VDIV, VREM) or zero-extended for unsigned operations (VDIVU, VREMU).
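The extraction and extension step can be sketched as a small Python function. It assumes element 0 occupies the least-significant bits of the 64-bit operand word; the function name `unpack` is hypothetical.

```python
def unpack(word: int, idx: int, ew: int, signed: bool) -> int:
    """Extract element idx (element 0 in the LSBs) of width ew from a 64-bit word
    and extend it to a 64-bit pattern: sign-extend if signed, else zero-extend."""
    assert ew in (8, 16, 32, 64) and 0 <= idx < 64 // ew
    elem = (word >> (idx * ew)) & ((1 << ew) - 1)
    if signed and elem >> (ew - 1):               # MSB set: replicate it upward
        elem |= ((1 << 64) - 1) ^ ((1 << ew) - 1)
    return elem


print(hex(unpack(0xF9, 0, 8, signed=True)))   # 0xfffffffffffffff9
print(hex(unpack(0xF9, 0, 8, signed=False)))  # 0xf9
```

For EW64 the extension mask is zero, so the element passes through unchanged, matching the "already 64 bits wide" case.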
Output Construction
Partial results are shifted into the final `result_q`. The results are then masked and merged based on the current element width and byte enables.
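A possible shape for this step is sketched in Python below. Two details are assumptions rather than facts from this document: each new result is inserted at the top of the register while the register shifts right (so the first committed element ends up in the least-significant bits), and bytes whose enable bit is clear are taken from a separate `fallback` word. The names `shift_in`, `merge_bytes`, and `fallback` are hypothetical.

```python
def shift_in(result: int, elem: int, ew: int) -> int:
    """Shift the 64-bit register right by ew and insert the new ew-bit result at
    the top. After 64 // ew commits, the first committed element sits in the LSBs."""
    return (result >> ew) | ((elem & ((1 << ew) - 1)) << (64 - ew))


def merge_bytes(result: int, fallback: int, be: int) -> int:
    """Keep result bytes whose byte-enable bit is set; take the rest from fallback."""
    byte_mask = 0
    for i in range(8):
        if (be >> i) & 1:
            byte_mask |= 0xFF << (8 * i)
    return (result & byte_mask) | (fallback & ~byte_mask & ((1 << 64) - 1))


r = 0
for e in (0x1111, 0x2222, 0x3333, 0x4444):   # four EW16 results, element 0 first
    r = shift_in(r, e, 16)
print(hex(r))                          # 0x4444333322221111
print(hex(merge_bytes(r, 0, 0x0F)))    # 0x22221111
```

Shifting in from the top lets the commit side use one fixed insertion point for every element width; only the shift amount depends on EW.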
Timing and Pipeline
- Fully serialized pipeline.
- Accepts new input only when the previous result has been fully committed.
- Keeps the FSM state, operands, and context registers stable for the duration of an operation.