simd_alu - Ara’s in-lane SIMD ALU (simd_alu)
This document provides an in-depth technical explanation of the simd_alu module in the Ara vector processor. The simd_alu (Single Instruction, Multiple Data Arithmetic Logic Unit) performs element-wise ALU operations on a 64-bit datapath word, which holds between one and eight vector elements depending on the selected element width. It supports fixed-point arithmetic, saturating arithmetic, logical and comparison operations, shift instructions, and narrowing, rounding, and merge instructions.
Summary
Module Name: simd_alu
Source: simd_alu.sv
Author: Matheus Cavalcante
License: Solderpad Hardware License, Version 0.51
Purpose: Implements vector ALU functionality supporting element-wise operations, fixed-point saturation, rounding, comparisons, and shifts for Ara’s 64-bit SIMD vector datapath.
Inputs
| Signal | Type | Description |
|---|---|---|
| | | First operand for the ALU operation |
| | | Second operand |
| | | Enables processing of a new instruction |
| | | Vector mask enable |
| | | Byte-level mask controlling predicate effects |
| | | Select for narrowing results |
| | | ALU operation code |
| | | Vector element width selector (EW8, EW16, etc.) |
| | | Rounding mode (used in fixed-point ops) |
| | | Fixed-point rounding mode (VXRM) |
Outputs
| Signal | Type | Description |
|---|---|---|
| | | Final result after SIMD ALU computation |
| | | Overflow saturation flags per lane |
Internal Types
alu_operand_t: Unions allowing the interpretation of a 64-bit value as 8/16/32/64-bit elements.
alu_sat_operand_t: Extended width unions for saturation detection.
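As a rough illustration of the operand union described above, the following C sketch views one 64-bit word as 8-, 16-, 32-, or 64-bit elements. The type name `alu_operand_model_t` and the helper `as_operand` are hypothetical names chosen for this example; only the `w16`-style field naming is taken from the pseudocode in this document. It is a software analogue, not the SystemVerilog definition.

```c
#include <stdint.h>

// Hypothetical C stand-in for alu_operand_t: every member aliases the
// same 64 bits, so the word can be read at any element width.
typedef union {
    uint8_t  w8[8];   // EW8:  eight 8-bit elements
    uint16_t w16[4];  // EW16: four 16-bit elements
    uint32_t w32[2];  // EW32: two 32-bit elements
    uint64_t w64;     // EW64: one 64-bit element
} alu_operand_model_t;

// Wrap a raw 64-bit word so it can be read through any of the views.
alu_operand_model_t as_operand(uint64_t raw) {
    alu_operand_model_t op;
    op.w64 = raw;
    return op;
}
```

In this C model the position of element 0 inside the word depends on the host's byte order, which is a property of the sketch and not of the hardware.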
Main Features and Functionality
1. Vector Element Width Awareness
Operations are performed on lanes as defined by vew_i:
EW8: 8x 8-bit operations
EW16: 4x 16-bit operations
EW32: 2x 32-bit operations
EW64: 1x 64-bit operation
Each operation adapts to the selected width via unpacking the input operands accordingly.
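The relation between the width selector and the number of elements per 64-bit word can be sketched as follows. The enum `vew_model_e` and its numeric encoding (EW8 = 0 up to EW64 = 3) are assumptions made for this example, as are the helper names.

```c
#include <stdint.h>

// Assumed encoding of the element width selector: each step doubles the width.
typedef enum { EW8 = 0, EW16 = 1, EW32 = 2, EW64 = 3 } vew_model_e;

// Width of one element in bits: 8, 16, 32, or 64.
unsigned elem_bits(vew_model_e vew) {
    return 8u << vew;
}

// Number of elements processed per 64-bit word: 8, 4, 2, or 1.
unsigned elems_per_word(vew_model_e vew) {
    return 64u / elem_bits(vew);
}
```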
2. ALU Operation Decoding
The module uses a large case statement on op_i to implement logic/arithmetic/comparison instructions. Many instructions use nested case statements based on vew_i.
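The decoding structure can be pictured with a small behavioral model in C: an outer `switch` on the operation and, where the width matters, an inner `switch` on the element width. The two-operation subset, the enum names, and the function names are hypothetical; the real module covers far more operations.

```c
#include <stdint.h>

typedef enum { OP_VAND, OP_VADD } op_model_e;        // hypothetical two-op subset
typedef enum { EW8, EW16, EW32, EW64 } vew_model_e;  // assumed width selector

// Add independent lanes of `bits` width (8, 16, or 32) packed in 64-bit words.
static uint64_t add_lanes(uint64_t a, uint64_t b, unsigned bits) {
    const uint64_t lane_mask = (1ULL << bits) - 1;
    uint64_t res = 0;
    for (unsigned sh = 0; sh < 64; sh += bits) {
        uint64_t sum = ((a >> sh) & lane_mask) + ((b >> sh) & lane_mask);
        res |= (sum & lane_mask) << sh;  // drop the carry so it never reaches the next lane
    }
    return res;
}

uint64_t simd_alu_model(op_model_e op, vew_model_e vew, uint64_t a, uint64_t b) {
    switch (op) {
    case OP_VAND:
        return a & b;  // bitwise operations do not depend on the element width
    case OP_VADD:
        switch (vew) { // nested decode: one arm per element width
        case EW8:  return add_lanes(a, b, 8);
        case EW16: return add_lanes(a, b, 16);
        case EW32: return add_lanes(a, b, 32);
        case EW64: return a + b;
        }
        break;
    }
    return 0;  // unknown opcode in this sketch
}
```

For example, adding `0xFF` and `0x01` yields `0x00` at EW8, because the carry is discarded at the lane boundary, but `0x100` at EW16.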
3. Saturation and Fixed-Point Handling
Fixed-point operations (e.g., VSADD, VASUB, VNCLIP) are implemented only when the FixPtSupport parameter enables them. Overflow is detected by checking the high-order bits of the result, and the corresponding flags are set in vxsat.
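A minimal sketch of the saturation idea for a single signed 8-bit element: the sum is computed in a wider type, clamped to the element range, and a flag is raised when clamping occurs. The function name `sadd8_model` and the pointer-based flag are assumptions for illustration; the module itself works on all elements of the 64-bit word at once.

```c
#include <stdbool.h>
#include <stdint.h>

// Signed saturating add of one 8-bit element.
// The 16-bit intermediate plays the role of an extended-width operand:
// its extra high bits reveal whether the 8-bit result would overflow.
// *vxsat is only ever set here, never cleared (sticky flag).
int8_t sadd8_model(int8_t a, int8_t b, bool *vxsat) {
    int16_t wide = (int16_t)a + (int16_t)b;
    if (wide > INT8_MAX) {
        *vxsat = true;
        return INT8_MAX;  // clamp to the largest representable value
    }
    if (wide < INT8_MIN) {
        *vxsat = true;
        return INT8_MIN;  // clamp to the smallest representable value
    }
    return (int8_t)wide;  // in range: no saturation
}
```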
4. Mask Logic & Merging
The mask signal (mask_i) interacts with vm_i and is embedded in certain instruction results (e.g., comparisons). Merge and scalar move operations use the mask to choose between operands.
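The merge behavior can be approximated with a per-byte select, sketched below under the assumption that each of the eight mask bits governs one byte of the 64-bit word. The function and parameter names are hypothetical, and the sketch deliberately does not model how vm_i gates the mask or which operand the hardware picks for a set bit.

```c
#include <stdint.h>

// Per-byte merge: bit b of byte_mask chooses the source of byte b.
// A set bit takes the byte from when_set, a clear bit from when_clear.
uint64_t merge_bytes_model(uint64_t when_set, uint64_t when_clear, uint8_t byte_mask) {
    uint64_t res = 0;
    for (unsigned b = 0; b < 8; b++) {
        uint64_t src = ((byte_mask >> b) & 1u) ? when_set : when_clear;
        res |= src & (0xFFULL << (8 * b));  // keep only byte b of the chosen source
    }
    return res;
}
```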
5. Shift & Narrowing Operations
Includes support for:
Logical/arithmetic shifts (VSLL, VSRL, VSRA)
Narrowing shift with optional rounding (VNSRL, VNSRA)
Clip instructions (VNCLIP, VNCLIPU) with saturation
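The difference between a plain narrowing shift and a clip can be sketched for one 32-bit source element narrowed to 16 bits. Both functions are hypothetical single-element models: the shift amount is taken modulo the source width, as the RISC-V vector specification prescribes for narrowing shifts, and the clip omits the rounding step (it behaves as if truncation were selected) to keep the example short.

```c
#include <stdbool.h>
#include <stdint.h>

// Narrowing logical shift: shift the wide source, then keep the low 16 bits.
uint16_t vnsrl32_model(uint32_t src, unsigned shamt) {
    return (uint16_t)(src >> (shamt & 31u));
}

// Unsigned narrowing clip: shift, then saturate to the 16-bit range
// instead of discarding the upper bits. Rounding is omitted in this sketch.
uint16_t vnclipu32_model(uint32_t src, unsigned shamt, bool *vxsat) {
    uint32_t shifted = src >> (shamt & 31u);
    if (shifted > UINT16_MAX) {
        *vxsat = true;       // the value did not fit: report saturation
        return UINT16_MAX;
    }
    return (uint16_t)shifted;
}
```

With source `0x12345678` and a shift of 8, the narrowing shift returns `0x3456` (upper bits dropped), while the clip returns `0xFFFF` and raises the saturation flag.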
6. Rounding Modes (VXRM)
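Rounding behavior for fixed-point arithmetic and narrowing instructions is selected via vxrm_i, which chooses one of the four rounding modes defined by the RISC-V vector specification: round-to-nearest-up (rnu), round-to-nearest-even (rne), round-down, i.e. truncate (rdn), and round-to-odd (rod).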
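The four modes differ only in the increment added after the right shift. The sketch below follows the rounding-increment definitions of the RISC-V vector specification for an unsigned value `v` shifted right by `d` bits; the enum name and its constants are hypothetical, and the code is a software model, not the module's logic.

```c
#include <stdint.h>

typedef enum { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 } vxrm_model_e;

// Shift v right by d bits (0 <= d <= 63) and add the rounding increment r.
uint64_t roundoff_unsigned(uint64_t v, unsigned d, vxrm_model_e vxrm) {
    if (d == 0) return v;                          // nothing is discarded
    uint64_t lsb  = (v >> d) & 1;                  // v[d]: LSB of the shifted result
    uint64_t half = (v >> (d - 1)) & 1;            // v[d-1]: most significant discarded bit
    uint64_t rest = (v & ((1ULL << (d - 1)) - 1)) != 0;  // v[d-2:0] != 0
    uint64_t r = 0;
    switch (vxrm) {
    case VXRM_RNU: r = half; break;                    // round to nearest, ties up
    case VXRM_RNE: r = half & (rest | lsb); break;     // round to nearest, ties to even
    case VXRM_RDN: r = 0; break;                       // truncate
    case VXRM_ROD: r = !lsb & (half | rest); break;    // force LSB to 1 if anything was lost
    }
    return (v >> d) + r;
}
```

For `v = 6` and `d = 2` (the exact value 1.5), rnu and rne both give 2, rdn gives 1, and rod gives 1 because the result is already odd.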
Assertions
The final assertion checks that DataWidth == $bits(alu_operand_t) to ensure 64-bit operation compatibility.
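A compile-time check in the same spirit can be written in C11 with `_Static_assert`. The union and the `DATA_WIDTH` constant below are hypothetical stand-ins for the SystemVerilog type and parameter; the point is only that a width mismatch is rejected at build time and never reaches simulation.

```c
#include <limits.h>
#include <stdint.h>

// Hypothetical stand-in for the operand union.
typedef union {
    uint8_t  w8[8];
    uint16_t w16[4];
    uint32_t w32[2];
    uint64_t w64;
} alu_operand_model_t;

enum { DATA_WIDTH = 64 };  // assumed datapath width in bits

// Fails to compile if the union does not cover exactly DATA_WIDTH bits.
_Static_assert(DATA_WIDTH == sizeof(alu_operand_model_t) * CHAR_BIT,
               "operand union must match the datapath width");
```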
Instruction Categories
Instructions include but are not limited to:
| Category | Examples |
|---|---|
| Logical | |
| Arithmetic | |
| Comparison | |
| Saturating | |
| Fixed-point | |
| Merging/Masking | |
| Shift Operations | |
Design Considerations
Efficiency: Optimized for combinational output with modular per-lane calculations.
Flexibility: Supports varied element widths and rounding behavior.
Masking Support: Integrated mask control for conditional computation.
Saturation Awareness: vxsat flags make it suitable for overflow-sensitive ops.
RISC-V RVV Compatible: Aligns with vector instruction format and control conventions.
Example Behavior (Pseudocode)
// VADD with EW16: four independent 16-bit additions on one 64-bit word
for (int i = 0; i < 4; i++) {
    res.w16[i] = opa.w16[i] + opb.w16[i];  // each element wraps within its own 16 bits
}