# `valu` - Instantiate the in-lane SIMD ALU (unpipelined) ## Overview The `valu` module is a central component of Ara's vector processing pipeline. It acts as a SIMD (Single Instruction, Multiple Data) integer Arithmetic Logic Unit, capable of executing vector instructions over 64-bit wide data lanes. Its primary role is to execute integer operations across multiple vector lanes in parallel and to manage operations including fixed-point arithmetic, scalar replication, vector reductions, narrowing operations, and interaction with the mask unit and slide unit. This documentation serves as a **golden reference**, explaining every functional aspect of the `valu` module in detail. --- ## Table of Contents 1. [Module Parameters](#module-parameters) 2. [Top-Level Interface](#top-level-interface) 3. [Vector Instruction Queue](#vector-instruction-queue) 4. [Result Queue](#result-queue) 5. [Scalar Operand Processing](#scalar-operand-processing) 6. [Mask Operand Handling](#mask-operand-handling) 7. [Reduction Support](#reduction-support) 8. [Narrowing Instructions](#narrowing-instructions) 9. [SIMD ALU Execution](#simd-alu-execution) 10. [Fixed-Point Rounding and Saturation](#fixed-point-rounding-and-saturation) 11. [Control and State Machines](#control-and-state-machines) 12. [Instruction Lifecycle](#instruction-lifecycle) 13. [Commit and Writeback](#commit-and-writeback) 14. [Summary](#summary) --- ## Module Parameters ```systemverilog parameter int unsigned NrLanes; parameter int unsigned VLEN; parameter fixpt_support_e FixPtSupport; parameter type vaddr_t; parameter type vfu_operation_t; ``` - `NrLanes`: Number of vector lanes. - `VLEN`: Vector register length. - `FixPtSupport`: Enable or disable fixed-point support. - `vaddr_t`, `vfu_operation_t`: Type definitions for vector addresses and operations. --- ## Top-Level Interface The `valu` module interfaces with several components: - **Dispatcher**: Provides new vector operations (`vfu_operation_i`) and receives `vxsat_flag_o`. - **Lane sequencer**: Coordinates the execution flow. - **Operand queues**: Deliver source operands (`alu_operand_i`). - **VRF**: Accepts results for writeback. - **Slide Unit**: Exchanges operands and results for inter-lane reduction. - **Mask Unit**: Manages masking for selective operations. --- ## Vector Instruction Queue The queue tracks in-flight instructions and separates their execution phases: - `accept_pnt`: Points to the next free entry for acceptance. - `issue_pnt`: Instruction to be executed next. - `commit_pnt`: Instruction whose results are being written back. Each instruction is described by a `vfu_operation_t` struct and includes the vector operation, destination ID, and control fields like `vm` and `vsew`. --- ## Result Queue This FIFO queue temporarily stores computation results, including: - `wdata`: Resulting word. - `id`: Instruction ID. - `addr`: Target vector address. - `be`: Byte enable. - `mask`: Whether this result is to be forwarded to the mask unit. Two pointers (`write_pnt`, `read_pnt`) and a count (`result_queue_cnt_q`) track the queue status. --- ## Scalar Operand Processing Scalar values from instructions are **replicated** to 64-bit words depending on `vsew`. For instance: - `EW8`: replicated 8× - `EW16`: replicated 4× - etc. This ensures compatibility with SIMD-wide operations. --- ## Mask Operand Handling When vector masking is enabled (`vm == 0`), a `spill_register` sends result data to the mask unit, filtered via the `mask_operand_ready_i` signal. --- ## Reduction Support Reduction operations are divided into: - **Intra-lane**: Sequential accumulation within a single lane. - **Inter-lane**: Exchange of partial sums between lanes via the Slide Unit. The ALU manages: - Internal counters (`reduction_rx_cnt_q`) - Inter-lane state transitions (e.g., `INTER_LANES_REDUCTION_TX`) - Final SIMD reduction in lane 0 --- ## Narrowing Instructions Instructions like `VNSRA`, `VNCLIP` produce only **half the normal element width per cycle**. The module uses a toggle (`narrowing_select_q`) to alternate writing high/low halves. --- ## SIMD ALU Execution The core computation is handled by `simd_alu`, parameterized by fixed-point support and rounding. It takes: - `alu_operand_a` and `alu_operand_b` - The operation (`op_i`) - The mask (`mask_i`) - A rounding modifier (`rm`) Results are fed to the result queue and selectively masked. --- ## Fixed-Point Rounding and Saturation When enabled, rounding and saturation are implemented via `fixed_p_rounding` and internal saturation flags (`alu_vxsat_q`). The saturation flag `vxsat_flag_o` is updated during commit. --- ## Control and State Machines ### ALU State (`alu_state_q`) Defines execution phase: - `NO_REDUCTION` - `INTRA_LANE_REDUCTION` - `INTER_LANES_REDUCTION_TX` - `INTER_LANES_REDUCTION_RX` - `SIMD_REDUCTION` - `LN0_REDUCTION_COMMIT` The FSM governs transitions based on instruction type, operand readiness, and SLDU handshake. --- ## Instruction Lifecycle 1. **Acceptance**: New vector instructions are stored in the queue. 2. **Issue**: Instructions are fetched and executed if operands are valid. 3. **Execution**: Results are generated (immediately or over multiple cycles). 4. **Commit**: Data is written to the vector register file or sent to the mask unit. --- ## Commit and Writeback - Writeback to VRF is gated by `alu_result_gnt_i`. - Results for masking go through `mask_operand_o`. - Queue counters and state pointers are updated accordingly.