simd_alu - Ara’s in-lane SIMD ALU (simd_alu)
This document provides an in-depth technical explanation of the simd_alu module in the Ara vector processor. The simd_alu (Single Instruction, Multiple Data Arithmetic Logic Unit) performs element-wise ALU operations on 64-bit operand words, each holding 8-, 16-, 32-, or 64-bit vector elements. It supports fixed-point arithmetic, saturating arithmetic, logical and comparison operations, shift instructions, and narrowing, rounding, and merge instructions.
Summary
Module Name: simd_alu
Source: simd_alu.sv
Author: Matheus Cavalcante
License: Solderpad Hardware License, Version 0.51
Purpose: Implements vector ALU functionality supporting element-wise operations, fixed-point saturation, rounding, comparisons, and shifts for Ara’s 64-bit SIMD vector datapath.
Inputs
| Signal | Type | Description |
|---|---|---|
| | | First operand for the ALU operation |
| | | Second operand |
| | | Enables processing of a new instruction |
| vm_i | | Vector mask enable |
| mask_i | | Byte-level mask controlling predicate effects |
| | | Select for narrowing results |
| op_i | | ALU operation code |
| vew_i | | Vector element width selector (EW8, EW16, etc.) |
| | | Rounding mode (used in fixed-point ops) |
| vxrm_i | | Fixed-point rounding mode (VXRM) |
Outputs
| Signal | Type | Description |
|---|---|---|
| | | Final result after SIMD ALU computation |
| | | Overflow saturation flags per lane |
Internal Types
alu_operand_t: Unions allowing the interpretation of a 64-bit value as 8/16/32/64-bit elements.
alu_sat_operand_t: Extended-width unions for saturation detection.
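To make the union idea concrete, here is a minimal C sketch of a type that views one 64-bit word at four element widths. It is an analogue, not the SystemVerilog definition: the field name w16 follows this document's pseudocode, while w8, w32, w64 and the helper get_element are assumed names. Element order inside the word depends on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C analogue of alu_operand_t: one 64-bit word viewed as
 * 8x8, 4x16, 2x32 or 1x64-bit elements. Only w16 is named in the
 * document; the other field names are assumptions. */
typedef union {
    uint8_t  w8[8];
    uint16_t w16[4];
    uint32_t w32[2];
    uint64_t w64[1];
} alu_operand_t;

/* Compile-time size check, in the spirit of DataWidth == $bits(alu_operand_t). */
static_assert(sizeof(alu_operand_t) * 8 == 64, "operand must be 64 bits wide");

/* Read element idx at the given width (8, 16, 32 or 64 bits), zero-extended. */
static uint64_t get_element(alu_operand_t op, unsigned bits, unsigned idx) {
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    assert(idx < 64 / bits);
    switch (bits) {
    case 8:  return op.w8[idx];
    case 16: return op.w16[idx];
    case 32: return op.w32[idx];
    default: return op.w64[idx];
    }
}
```

Writing through one view and reading through another is well defined for unions in C, which is what makes this a reasonable stand-in for the hardware type.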
Main Features and Functionality
1. Vector Element Width Awareness
Operations are performed on lanes as defined by vew_i:
EW8: 8x 8-bit operations
EW16: 4x 16-bit
EW32: 2x 32-bit
EW64: 1x 64-bit
Each operation adapts to the selected width by unpacking the input operands accordingly.
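The width-to-lane-count relationship listed above can be sketched in C as follows. The numeric encoding of the enum (EW8 = 0 through EW64 = 3) and the placement of element 0 in the least significant bits are assumptions made for illustration; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: EW8 = 0 ... EW64 = 3, so the element width is 8 << vew. */
typedef enum { EW8 = 0, EW16 = 1, EW32 = 2, EW64 = 3 } vew_e;

static unsigned element_bits(vew_e vew)   { return 8u << vew; }
static unsigned lanes_per_word(vew_e vew) { return 64u / element_bits(vew); }

/* Unpack element idx from a 64-bit word, assuming element 0 sits in
 * the least significant bits. */
static uint64_t unpack_element(uint64_t word, vew_e vew, unsigned idx) {
    unsigned bits = element_bits(vew);
    assert(idx < lanes_per_word(vew));
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return (word >> (idx * bits)) & mask;
}
```

For example, under these assumptions `unpack_element(0x1122334455667788, EW16, 0)` yields `0x7788` and element 3 yields `0x1122`.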
2. ALU Operation Decoding
The module uses a large case statement on op_i to implement logic/arithmetic/comparison instructions. Many instructions use nested case statements based on vew_i.
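The two-level decode can be illustrated with a small C sketch: an outer switch on the operation and, where the element width matters, an inner switch on the width. The operation names OP_VAND, OP_VXOR and OP_VADD are a hypothetical subset chosen for illustration, and add_lanes is an assumed helper, not a function from the source.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { EW8, EW16, EW32, EW64 } vew_e;         /* width names from the text */
typedef enum { OP_VAND, OP_VXOR, OP_VADD } alu_op_e;  /* hypothetical subset */

/* Lane-wise add: carries are cut at every lane boundary. */
static uint64_t add_lanes(uint64_t a, uint64_t b, unsigned bits) {
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    uint64_t res = 0;
    for (unsigned sh = 0; sh < 64; sh += bits) {
        uint64_t sum = ((a >> sh) & mask) + ((b >> sh) & mask);
        res |= (sum & mask) << sh;
    }
    return res;
}

static uint64_t alu(alu_op_e op, vew_e vew, uint64_t a, uint64_t b) {
    switch (op) {
    case OP_VAND: return a & b;  /* bitwise: independent of element width */
    case OP_VXOR: return a ^ b;
    case OP_VADD:
        switch (vew) {           /* nested decode on element width */
        case EW8:  return add_lanes(a, b, 8);
        case EW16: return add_lanes(a, b, 16);
        case EW32: return add_lanes(a, b, 32);
        case EW64: return add_lanes(a, b, 64);
        }
        break;
    }
    assert(0 && "unsupported op/vew combination");
    return 0;
}
```

Bitwise operations need no inner switch because their result is the same at every width, whereas an add must stop carries at lane boundaries: `0xFF + 0x01` gives `0x00` in the low lane at EW8 but `0x0100` at EW16.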
3. Saturation and Fixed-Point Handling
Fixed-point operations (e.g., VSADD, VASUB, VNCLIP) are included conditionally, depending on FixPtSupport. Overflow is detected by inspecting the upper bits of an extended-width intermediate result (see alu_sat_operand_t), and the corresponding flags are set in vxsat.
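As an illustration of overflow detection through a wider intermediate, the following C sketch performs a signed saturating add on 8-bit elements. It is a software model under stated assumptions, not the RTL: the function names are hypothetical, and reporting one flag bit per 8-bit lane is an assumed layout for the saturation flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Signed saturating add of one 8-bit element. The sum is formed in a
 * wider type so that overflow is visible before clamping. */
static int8_t sat_add_i8(int8_t a, int8_t b, bool *sat) {
    assert(sat != NULL);
    int16_t wide = (int16_t)((int16_t)a + (int16_t)b);
    if (wide > INT8_MAX) { *sat = true; return INT8_MAX; }
    if (wide < INT8_MIN) { *sat = true; return INT8_MIN; }
    *sat = false;
    return (int8_t)wide;
}

/* Apply the add to all eight 8-bit lanes of a 64-bit word. Bit i of
 * *vxsat is set when lane i saturated (assumed flag layout). */
static uint64_t vsadd_ew8(uint64_t a, uint64_t b, uint8_t *vxsat) {
    assert(vxsat != NULL);
    uint64_t res = 0;
    *vxsat = 0;
    for (unsigned i = 0; i < 8; i++) {
        bool sat = false;
        int8_t ea = (int8_t)(uint8_t)(a >> (8 * i));
        int8_t eb = (int8_t)(uint8_t)(b >> (8 * i));
        int8_t r = sat_add_i8(ea, eb, &sat);
        res |= (uint64_t)(uint8_t)r << (8 * i);
        if (sat) *vxsat |= (uint8_t)(1u << i);
    }
    return res;
}
```

With these definitions, adding `0x7F` and `0x01` in lane 0 clamps to `0x7F` and raises flag bit 0, while lanes that do not overflow leave their flag bits clear.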
4. Mask Logic & Merging
The mask signal (mask_i) interacts with vm_i and is embedded in certain instruction results (e.g., comparisons). Merge and scalar move operations use the mask to choose between operands.
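A byte-granular merge can be sketched in C as below. The polarity is an assumption made for illustration: when vm is set the operation is treated as unmasked and operand A is taken everywhere, otherwise bit i of the mask selects operand A for byte i and operand B for the rest. The function name merge_bytes is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte-wise merge of two 64-bit words.
 * Assumed polarity: vm == true means "unmasked" (always take a);
 * otherwise mask bit i picks a for byte i, and b when the bit is clear. */
static uint64_t merge_bytes(uint64_t a, uint64_t b, uint8_t mask, bool vm) {
    uint64_t res = 0;
    for (unsigned i = 0; i < 8; i++) {
        bool take_a = vm || (((mask >> i) & 1u) != 0);
        uint64_t byte_sel = UINT64_C(0xFF) << (8 * i);
        res |= (take_a ? a : b) & byte_sel;
    }
    return res;
}
```

Because the mask has one bit per byte, a wider element is covered by several mask bits, so a 16-bit element is merged as a unit only when both of its mask bits agree.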
5. Shift & Narrowing Operations
Includes support for:
Logical/arithmetic shifts (VSLL, VSRL, VSRA)
Narrowing shifts with optional rounding (VNSRL, VNSRA)
Clip instructions (VNCLIP, VNCLIPU) with saturation
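The difference between logical and arithmetic right shifts, and the saturation step of an unsigned narrowing clip, can be sketched in C for 16-bit source elements. All function names are hypothetical, and the clip shown here truncates instead of rounding to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Logical right shift: vacated bits are filled with zeros. */
static uint16_t srl16(uint16_t x, unsigned sh) {
    assert(sh < 16);
    return (uint16_t)(x >> sh);
}

/* Arithmetic right shift: vacated bits are filled with the sign bit. */
static uint16_t sra16(uint16_t x, unsigned sh) {
    assert(sh < 16);
    uint16_t logical = (uint16_t)(x >> sh);
    uint16_t sign_fill = (x & 0x8000u) ? (uint16_t)~(0xFFFFu >> sh) : 0;
    return (uint16_t)(logical | sign_fill);
}

/* Unsigned narrowing clip from 16 to 8 bits: shift right, then clamp
 * to the 8-bit range. *sat reports whether clamping occurred. */
static uint8_t nclipu_16_to_8(uint16_t src, unsigned sh, bool *sat) {
    assert(sh < 16 && sat != NULL);
    uint16_t shifted = srl16(src, sh);
    if (shifted > UINT8_MAX) { *sat = true; return UINT8_MAX; }
    *sat = false;
    return (uint8_t)shifted;
}
```

A plain narrowing shift would simply keep the low 8 bits of the shifted value; the clip differs in that an out-of-range value is clamped to `0xFF` and flagged.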
6. Rounding Modes (VXRM)
Rounding behavior for fixed-point arithmetic and narrowing instructions is selected via vxrm_i, which chooses among the four rounding modes defined by the RISC-V vector extension: round-to-nearest-up, round-to-nearest-even, round-down (truncate), and round-to-odd.
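The rounding step can be modeled in C following the roundoff definition in the RISC-V vector specification, where a value v shifted right by d bits receives a rounding increment r that depends on the mode. This is a software sketch, not the module's logic; the enum and function names are hypothetical, and the numeric encoding (rnu = 0, rne = 1, rdn = 2, rod = 3) is taken from the vector specification.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding modes with the vxrm encoding from the RISC-V vector spec. */
typedef enum { RNU = 0, RNE = 1, RDN = 2, ROD = 3 } vxrm_e;

/* Unsigned roundoff: (v >> d) + r, with r chosen by the rounding mode. */
static uint64_t roundoff_u64(uint64_t v, unsigned d, vxrm_e vxrm) {
    assert(d < 64);
    if (d == 0) return v;                                  /* nothing is shifted out */
    uint64_t lsb  = (v >> d) & 1u;                         /* v[d]     */
    uint64_t half = (v >> (d - 1)) & 1u;                   /* v[d-1]   */
    uint64_t rest = v & ((UINT64_C(1) << (d - 1)) - 1);    /* v[d-2:0] */
    uint64_t r = 0;
    switch (vxrm) {
    case RNU: r = half; break;                             /* nearest, ties up       */
    case RNE: r = half & ((rest != 0) | lsb); break;       /* nearest, ties to even  */
    case RDN: r = 0; break;                                /* truncate               */
    case ROD: r = (lsb == 0) & ((half | (rest != 0)) != 0); break; /* jam into LSB   */
    }
    return (v >> d) + r;
}
```

For example, shifting 6 right by 2 (1.5 in real terms) gives 2 under RNU and RNE, and 1 under RDN and ROD, while shifting 2 right by 2 (0.5) gives 1 under RNU and ROD but 0 under RNE and RDN.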
Assertions
The final assertion checks that DataWidth == $bits(alu_operand_t) to ensure 64-bit operation compatibility.
Instruction Categories
Instructions include but are not limited to:
| Category | Examples |
|---|---|
| Logical | |
| Arithmetic | VADD |
| Comparison | |
| Saturating | VSADD |
| Fixed-point | VASUB, VNCLIP, VNCLIPU |
| Merging/Masking | |
| Shift Operations | VSLL, VSRL, VSRA, VNSRL, VNSRA |
Design Considerations
Efficiency: Optimized for combinational output with modular per-lane calculations.
Flexibility: Supports varied element widths and rounding behavior.
Masking Support: Integrated mask control for conditional computation.
Saturation Awareness: vxsat flags make it suitable for overflow-sensitive ops.
RISC-V RVV Compatible: Aligns with vector instruction format and control conventions.
Example Behavior (Pseudocode)
// VADD with EW16: four independent 16-bit additions per 64-bit word
for (int i = 0; i < 4; i++) {
  res.w16[i] = opa.w16[i] + opb.w16[i];  // each sum wraps modulo 2^16
}