# `simd_alu` - Ara's in-lane SIMD ALU (`simd_alu`) This document provides an in-depth technical explanation of the `simd_alu` module in the Ara vector processor. The `simd_alu` (Single Instruction, Multiple Data Arithmetic Logic Unit) is responsible for element-wise ALU operations on 64-bit vector elements. It supports fixed-point arithmetic, saturating arithmetic, logical and comparison operations, shift instructions, and narrowing/rounding/merge instructions. --- ## Summary - **Module Name:** `simd_alu` - **Source:** `simd_alu.sv` - **Author:** Matheus Cavalcante - **License:** Solderpad Hardware License, Version 0.51 - **Purpose:** Implements vector ALU functionality supporting element-wise operations, fixed-point saturation, rounding, comparisons, and shifts for Ara’s 64-bit SIMD vector datapath. --- ## Inputs | Signal | Type | Description | |----------------------|----------------------------|-------------| | `operand_a_i` | `elen_t` (64-bit) | First operand for the ALU operation | | `operand_b_i` | `elen_t` (64-bit) | Second operand | | `valid_i` | `logic` | Enables processing of a new instruction | | `vm_i` | `logic` | Vector mask enable | | `mask_i` | `strb_t` (byte-wide mask) | Byte-level mask controlling predicate effects | | `narrowing_select_i` | `logic` | Select for narrowing results | | `op_i` | `ara_op_e` | ALU operation code | | `vew_i` | `vew_e` | Vector element width selector (EW8, EW16, etc.) | | `rm` | `strb_t` | Rounding mode (used in fixed-point ops) | | `vxrm_i` | `vxrm_t` | Fixed-point rounding mode (VXRM) | ## Outputs | Signal | Type | Description | |--------------|---------------|-------------| | `result_o` | `elen_t` | Final result after SIMD ALU computation | | `vxsat_o` | `vxsat_t` | Overflow saturation flags per lane | --- ## Internal Types - `alu_operand_t`: Unions allowing the interpretation of a 64-bit value as 8/16/32/64-bit elements. - `alu_sat_operand_t`: Extended width unions for saturation detection. --- ## Main Features and Functionality ### 1. **Vector Element Width Awareness** Operations are performed on lanes as defined by `vew_i`: - `EW8`: 8x 8-bit operations - `EW16`: 4x 16-bit - `EW32`: 2x 32-bit - `EW64`: 1x 64-bit Each operation adapts to the selected width via unpacking the input operands accordingly. ### 2. **ALU Operation Decoding** The module uses a large `case` statement on `op_i` to implement logic/arithmetic/comparison instructions. Many instructions use nested `case` statements based on `vew_i`. ### 3. **Saturation and Fixed-Point Handling** Fixed-point operations (e.g., `VSADD`, `VASUB`, `VNCLIP`) are handled conditionally using `FixPtSupport`. Overflow checks are done by checking high bits and flags are set in `vxsat`. ### 4. **Mask Logic & Merging** The mask signal (`mask_i`) interacts with `vm_i` and is embedded in certain instruction results (e.g., comparisons). Merge and scalar move operations use the mask to choose between operands. ### 5. **Shift & Narrowing Operations** Includes support for: - Logical/arithmetic shifts (`VSLL`, `VSRL`, `VSRA`) - Narrowing shift with optional rounding (`VNSRL`, `VNSRA`) - Clip instructions (`VNCLIP`, `VNCLIPU`) with saturation ### 6. **Rounding Modes (VXRM)** Rounding behavior for fixed-point arithmetic and narrowing instructions is selected via `vxrm_i`, using 4 defined rounding modes (e.g., round to nearest even, zero, etc.). --- ## Assertions - The final assertion checks that `DataWidth == $bits(alu_operand_t)` to ensure 64-bit operation compatibility. --- ## Instruction Categories Instructions include but are not limited to: | Category | Examples | |-------------------|----------| | Logical | `VAND`, `VOR`, `VXOR` | | Arithmetic | `VADD`, `VSUB`, `VSADDU`, `VSADD` | | Comparison | `VMSEQ`, `VMSLT`, `VMAX`, `VMIN`, etc. | | Saturating | `VSSUB`, `VSSUBU`, `VSADDU` | | Fixed-point | `VASUB`, `VNCLIP`, `VSSRA`, `VSSRL` | | Merging/Masking | `VMERGE`, `VMXOR`, `VMXNOR`, etc. | | Shift Operations | `VSLL`, `VSRA`, `VNSRA`, etc. | --- ## Design Considerations - **Efficiency:** Optimized for combinational output with modular per-lane calculations. - **Flexibility:** Supports varied element widths and rounding behavior. - **Masking Support:** Integrated mask control for conditional computation. - **Saturation Awareness:** vxsat flags make it suitable for overflow-sensitive ops. - **RISC-V RVV Compatible:** Aligns with vector instruction format and control conventions. --- ## Example Behavior (Pseudocode) ```systemverilog // VADD with EW16 and two operands for (int i = 0; i < 4; i++) { res.w16[i] = opa.w16[i] + opb.w16[i]; } ```