simd_mul: Ara’s in-lane SIMD multiplier
Module Name: simd_mul
Function: This module implements a SIMD vector multiplier for Ara.
Overview
The `simd_mul` module is Ara’s SIMD multiplier. It supports 8-, 16-, 32-, and 64-bit elements, as well as fixed-point multiplication with rounding and saturation. The datapath is pipelined and produces one 64-bit result per cycle when kept full. It implements several RISC-V vector arithmetic operations (`VMUL`, `VMULH`, `VMULHU`, `VSMUL`, `VMADD`, `VMACC`, etc.).
Parameters
| Parameter | Type | Description |
|---|---|---|
| | | Enables fixed-point support and saturation logic. |
| `NumPipeRegs` | | Number of pipeline stages. |
| | | Specifies element width (EW8, EW16, EW32, EW64). |
Interface
| Port | Direction | Type | Description |
|---|---|---|---|
| | Input | | Clock. |
| | Input | | Active-low reset. |
| `operand_a_i`, `operand_b_i`, `operand_c_i` | Input | | |
| | Input | | Element-level mask. |
| | Input | | Operation selector. |
| | Input | | Rounding mode. |
| | Input | | Input valid handshake. |
| | Output | | Input ready. |
| | Output | | Computed result. |
| | Output | | Mask forwarded to next stage. |
| | Output | | Saturation flags. |
| | Output | | Output valid. |
| | Input | | Output ready handshake. |
Key Internal Structures
Operand Packing (`mul_operand_t`)
A union that exposes each 64-bit operand as 64-, 32-, 16-, or 8-bit chunks, so the SIMD logic can address individual elements directly.
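The multiple views can be illustrated with a small behavioural model. This is only a sketch in Python, not the SystemVerilog union itself; the helper names `split_elements` and `join_elements` are hypothetical, and the least-significant-first element ordering is an assumption.

```python
def split_elements(word: int, ew_bits: int) -> list[int]:
    """View a 64-bit word as 64 // ew_bits unsigned elements, least-significant first."""
    if ew_bits not in (8, 16, 32, 64):
        raise ValueError("element width must be 8, 16, 32, or 64")
    mask = (1 << ew_bits) - 1
    return [(word >> (i * ew_bits)) & mask for i in range(64 // ew_bits)]


def join_elements(elems: list[int], ew_bits: int) -> int:
    """Pack elements (least-significant first) back into one 64-bit word."""
    mask = (1 << ew_bits) - 1
    word = 0
    for i, elem in enumerate(elems):
        word |= (elem & mask) << (i * ew_bits)
    return word
```

Splitting and re-joining at any supported width returns the original word, which is the property a union provides for free in hardware: the bits never move, only their interpretation changes.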
Pipeline Buffers
A multi-stage pipeline built from shift registers, with its depth set by `NumPipeRegs`. All operands, opcodes, masks, and valid bits are pipelined.
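As a rough illustration of the shift-register behaviour, the Python model below delays every item by a fixed number of cycles. The class name `PipeRegs` and its `step` method are hypothetical, and back-pressure (stalling on a not-ready consumer) is deliberately left out, so this is a simplification rather than a description of the actual RTL.

```python
from collections import deque


class PipeRegs:
    """Shift-register model: an item pushed now emerges num_regs cycles later."""

    def __init__(self, num_regs: int):
        # With zero registers the path is treated as purely combinational.
        self.stages = deque([None] * num_regs, maxlen=num_regs) if num_regs else None

    def step(self, item):
        """Advance one cycle: push `item` (None models a bubble), return what leaves."""
        if self.stages is None:
            return item
        out = self.stages[0]        # oldest entry leaves the pipeline
        self.stages.append(item)    # maxlen drops the oldest entry on append
        return out
```

Each pushed item would in practice bundle the operands, opcode, mask, and valid bit so that they stay aligned as they move through the stages.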
Ready/Valid Interface
Flow-control logic propagates data through the pipeline using a ready/valid handshake.
Functional Description
Multiply Logic
Supports various operations:
`VMUL`, `VMULHU`, `VMULHSU`, `VMULH` → Integer multiply
`VSMUL` → Fixed-point multiply with rounding/saturation
`VMACC`, `VMADD`, `VNMSAC`, `VNMSUB` → Multiply-accumulate and multiply-subtract
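The arithmetic behind these operation groups can be sketched on a single element of width `sew` bits. The Python functions below follow the RISC-V vector specification's definitions; the function names, the argument order, and the choice of the first argument as the signed operand in `vmulhsu` are assumptions made for illustration, not a statement about how the hardware maps its operands.

```python
def to_signed(x: int, sew: int) -> int:
    """Interpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def vmul(a: int, b: int, sew: int) -> int:
    """Low half of the product (identical for signed and unsigned inputs)."""
    return (a * b) & ((1 << sew) - 1)


def vmulh(a: int, b: int, sew: int) -> int:
    """High half of the signed x signed product."""
    return ((to_signed(a, sew) * to_signed(b, sew)) >> sew) & ((1 << sew) - 1)


def vmulhu(a: int, b: int, sew: int) -> int:
    """High half of the unsigned x unsigned product."""
    mask = (1 << sew) - 1
    return ((a & mask) * (b & mask)) >> sew


def vmulhsu(a: int, b: int, sew: int) -> int:
    """High half of the signed(a) x unsigned(b) product."""
    mask = (1 << sew) - 1
    return ((to_signed(a, sew) * (b & mask)) >> sew) & mask


def macc(a: int, b: int, c: int, sew: int) -> int:
    """Multiply-accumulate, a * b + c, truncated to sew bits."""
    return (a * b + c) & ((1 << sew) - 1)


def nmsac(a: int, b: int, c: int, sew: int) -> int:
    """Multiply-subtract, c - a * b, truncated to sew bits."""
    return (c - a * b) & ((1 << sew) - 1)
```

In the specification, `VMACC`/`VNMSAC` and `VMADD`/`VNMSUB` compute the same two expressions and differ only in which register supplies the addend and which is overwritten, so `macc` and `nmsac` cover all four.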
Signedness
Sign-control logic derives `signed_a` and `signed_b` from the opcode to select signed or unsigned interpretation of the operands.
Result Rounding
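Rounding logic for `VSMUL` is driven by the `vxrm` mode:
00 (rnu): round to nearest, ties rounded up
01 (rne): round to nearest, ties to even
10 (rdn): round down (truncate)
11 (rod): round to odd (the result LSB is set if any discarded bit is non-zero)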
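The `vxrm` rounding increment defined in the RISC-V vector specification can be written out as a short Python sketch. Here `v` is the full-precision value and `d` the number of low bits being discarded; the function names are hypothetical and the code models the specification's `roundoff` formula, not the RTL.

```python
def rounding_increment(v: int, d: int, vxrm: int) -> int:
    """Return the increment r so that the rounded result is (v >> d) + r."""
    if d == 0:
        return 0
    bit = lambda i: (v >> i) & 1
    discarded = v & ((1 << d) - 1)            # v[d-1:0]
    if vxrm == 0b00:                          # rnu: round to nearest, ties up
        return bit(d - 1)
    if vxrm == 0b01:                          # rne: round to nearest, ties to even
        below_half = v & ((1 << (d - 1)) - 1)  # v[d-2:0]
        return bit(d - 1) & int(below_half != 0 or bit(d) == 1)
    if vxrm == 0b10:                          # rdn: truncate
        return 0
    if vxrm == 0b11:                          # rod: force LSB to 1 if anything was lost
        return int(bit(d) == 0 and discarded != 0)
    raise ValueError("vxrm must be a 2-bit value")


def roundoff(v: int, d: int, vxrm: int) -> int:
    """Shift v right by d bits (arithmetic shift) and apply the vxrm rounding."""
    return (v >> d) + rounding_increment(v, d, vxrm)
```

For example, shifting 10 right by 2 (that is, 2.5) gives 3 under rnu, 2 under rne, 2 under rdn, and 3 under rod.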
Saturation (`vxsat`)
When enabled (`FixedPointEnable`), the logic detects overflow of the fixed-point result, clamps it, and sets the corresponding saturation bits.
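To make the overflow case concrete, the sketch below models a single-element `VSMUL` as the RISC-V vector specification defines it: a signed product shifted right by `sew - 1` bits, rounded, then clipped to the signed range. For brevity it fixes the rounding mode to rnu (`vxrm = 00`); the function names and the returned `(result, sat)` pair are assumptions for illustration.

```python
def to_signed(x: int, sew: int) -> int:
    """Interpret the low `sew` bits of x as a two's-complement value."""
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x


def vsmul_rnu(a: int, b: int, sew: int) -> tuple[int, int]:
    """Signed fractional multiply with rnu rounding; returns (result, sat flag)."""
    prod = to_signed(a, sew) * to_signed(b, sew)
    d = sew - 1
    res = (prod >> d) + ((prod >> (d - 1)) & 1)   # rnu: add the first discarded bit
    max_pos = (1 << (sew - 1)) - 1
    min_neg = -(1 << (sew - 1))
    sat = 0
    if res > max_pos:          # clip and flag overflow
        res, sat = max_pos, 1
    elif res < min_neg:
        res, sat = min_neg, 1
    return res & ((1 << sew) - 1), sat
```

With 8-bit elements, `vsmul_rnu(0x80, 0x80, 8)` multiplies -128 by -128; the exact result, +128, does not fit in a signed byte, so the output is clamped to `0x7F` with the saturation flag set. This is the only operand pair that overflows, because every other product stays within range after the shift.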
Element Width Handling
For each supported width, dedicated combinational blocks handle:
Partial multiplications
Optional fixed-point rounding/saturation
Result assembly and output formatting
EW64
1 × 64-bit operation
Uses full 128-bit product
EW32
2 × 32-bit elements
Operates on upper/lower halves
EW16
4 × 16-bit elements
Operates on 16-bit lanes
EW8
8 × 8-bit elements
Operates on each byte independently
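The per-width behaviour can be summarised with a behavioural Python sketch of a plain `VMUL` over one 64-bit word. The function name `simd_vmul` is hypothetical, masking is ignored, and the loop stands in for what the hardware does in parallel with dedicated combinational logic.

```python
def simd_vmul(a_word: int, b_word: int, ew_bits: int) -> int:
    """Multiply two 64-bit words element-wise, keeping the low ew_bits of each product."""
    if ew_bits not in (8, 16, 32, 64):
        raise ValueError("element width must be 8, 16, 32, or 64")
    mask = (1 << ew_bits) - 1
    result = 0
    for i in range(64 // ew_bits):
        shift = i * ew_bits
        a = (a_word >> shift) & mask
        b = (b_word >> shift) & mask
        # Each product is truncated to its own element; nothing carries into a neighbour.
        result |= ((a * b) & mask) << shift
    return result
```

The same two input words give different results at different widths, since element boundaries decide where carries are cut off.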