# `fixed_p_rounding` - Set up fixed-point arithmetic rounding information

## Overview

The `fixed_p_rounding` module implements the fixed-point rounding logic for Ara's vector execution stage. It operates on 64-bit operands, each holding one or more elements depending on the vector element width (VEW), and generates one rounding bit per element according to the rounding mode.

---

## Ports

### Inputs

| Name          | Type       | Description |
|---------------|------------|-------------|
| `operand_a_i` | `elen_t`   | Encodes the shift amount `j` for each sub-element. |
| `operand_b_i` | `elen_t`   | Source operand from which the rounding bits are extracted. |
| `valid_i`     | `logic`    | Enables the rounding logic when high. |
| `op_i`        | `ara_op_e` | Operation type; rounding occurs only for selected operations. |
| `vew_i`       | `vew_e`    | Vector element width: `EW8`, `EW16`, `EW32`, or `EW64`. |
| `vxrm_i`      | `vxrm_t`   | RISC-V vector rounding mode: `00`, `01`, `10`, or `11`. |

### Output

| Name  | Type     | Description |
|-------|----------|-------------|
| `r_o` | `strb_t` | Rounding bit for each element. |

---

## Internal Structures

### `rounding_args`

A `union packed` type that allows a 64-bit value to be interpreted as any of:

- 8 x 8-bit (`w8`)
- 4 x 16-bit (`w16`)
- 2 x 32-bit (`w32`)
- 1 x 64-bit (`w64`)

---

## Functional Behavior

The rounding logic is active only when `valid_i` is high and `op_i` is one of:

- `VSSRA`
- `VSSRL`
- `VNCLIP`
- `VNCLIPU`

The rounding bit is then computed according to `vew_i` and `vxrm_i`.

### Rounding Modes (`vxrm_i`)

- `00` (RNU): Round to nearest, ties up.
- `01` (RNE): Round to nearest, ties to even.
- `10` (RDN): Round down (truncate); no rounding increment is added.
- `11` (ROD): Round to odd; any non-zero discarded bits are ORed into the least significant bit of the result.

### Element Width Handling

Each `vew_i` value (`EW8`, `EW16`, `EW32`, `EW64`) selects a different slicing of the input operands: the shift amount `j` is read from the matching sub-element of `operand_a_i`, and the rounding bits are taken from the matching sub-element of `operand_b_i`.

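
The slicing can be pictured with a packed union shaped like the `rounding_args` type described under Internal Structures. The following is a minimal sketch, not the Ara source: the member names `w8` to `w64` come from this document, while the exact array dimensions, the module name `rounding_args_sketch`, and the test value are assumptions made for illustration.

```systemverilog
// Illustrative sketch only; field dimensions are assumed, not copied from Ara.
module rounding_args_sketch;

  // All members are 64 bits wide, so they alias the same storage.
  typedef union packed {
    logic [7:0][7:0]  w8;   // 8 x 8-bit elements
    logic [3:0][15:0] w16;  // 4 x 16-bit elements
    logic [1:0][31:0] w32;  // 2 x 32-bit elements
    logic [0:0][63:0] w64;  // 1 x 64-bit element
  } rounding_args;

  rounding_args opb;

  initial begin
    opb.w64[0] = 64'h0123_4567_89AB_CDEF;
    // Index 0 of a packed array is the least significant slice.
    assert (opb.w8[0]  == 8'hEF);
    assert (opb.w16[1] == 16'h89AB);
    assert (opb.w32[1] == 32'h0123_4567);
  end

endmodule
```

With a layout like this, element `i` at any width is a plain index into the corresponding view, which is the access pattern the per-width loops rely on.
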
#### Example for `EW8`, `vxrm = 00`:

```verilog
for (int i = 0; i < 8; i++) begin
  j      = opa.w8[i];          // shift amount for element i
  r_o[i] = opb.w8[i][j-1];     // most significant discarded bit
end
```

#### Example for `EW32`, `vxrm = 01`:

```verilog
for (int i = 0; i < 2; i++) begin
  j      = opa.w32[i];                          // shift amount for element i
  r_o[i] = opb.w32[i][j-1] & opb.w32[i][j];     // discarded MSB AND result LSB
end
```

#### For `vxrm = 10`, no rounding is performed:

```verilog
r_o = '0;
```

#### For `vxrm = 11` (ROD), more logic is applied:

- Inverts bit `j` of the source element.
- ORs together the remaining lower bits, masked with an entry looked up in `bit_select`.

---

## Lookup Table: `bit_select`

This constant table provides the masks used to isolate and test the lower bits during rounding:

```verilog
bit_select = {
  64'h0000000000000000,
  64'h0000000000000001,
  64'h0000000000000003,
  ...
  64'h7FFFFFFFFFFFFFFF
};
```

Entry `bit_select[j]` has its `j` lowest bits set. ANDing it with a source element isolates the bits below position `j`, so the rounding logic can check whether any of them are non-zero.
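
To show how a mask of this shape feeds the rounding decision, here is a minimal sketch for a single 64-bit element. It follows the rounding-increment definitions in the RISC-V vector specification's `vxrm` table rather than Ara's RTL, so it may differ in detail from the module itself. The names `rounding_increment_sketch`, `low_bits_mask`, and `rounding_increment` are hypothetical, and the mask is computed arithmetically instead of being read from a table.

```systemverilog
// Illustrative sketch only; names are hypothetical and not taken from Ara.
module rounding_increment_sketch;

  // Ones in bit positions [j-1:0]: 0, 1, 3, ..., 64'h7FFF_FFFF_FFFF_FFFF.
  function automatic logic [63:0] low_bits_mask(input logic [5:0] j);
    return (64'd1 << j) - 64'd1;
  endfunction

  // Rounding increment for shifting v right by d bits under mode vxrm.
  function automatic logic rounding_increment(
    input logic [63:0] v,
    input logic [5:0]  d,
    input logic [1:0]  vxrm
  );
    logic guard;    // v[d-1], the most significant discarded bit
    logic sticky;   // OR of v[d-2:0]
    logic any_low;  // OR of v[d-1:0]
    if (d == 0) return 1'b0;  // no bits are discarded
    guard   = v[d-1];
    any_low = |(v & low_bits_mask(d));
    sticky  = |(v & low_bits_mask(d - 6'd1));
    case (vxrm)
      2'b00:   return guard;                    // round to nearest, ties up
      2'b01:   return guard & (sticky | v[d]);  // round to nearest, ties to even
      2'b11:   return ~v[d] & any_low;          // round to odd
      default: return 1'b0;                     // 2'b10: truncate
    endcase
  endfunction

  initial begin
    assert (low_bits_mask(6'd3)  == 64'h7);
    assert (low_bits_mask(6'd63) == 64'h7FFF_FFFF_FFFF_FFFF);
    // v = 4'b0110, d = 2: discarded bits are 2'b10 (a tie), result LSB is 1.
    assert (rounding_increment(64'b0110, 6'd2, 2'b00) == 1'b1);
    assert (rounding_increment(64'b0110, 6'd2, 2'b01) == 1'b1);
    assert (rounding_increment(64'b0110, 6'd2, 2'b10) == 1'b0);
    assert (rounding_increment(64'b0110, 6'd2, 2'b11) == 1'b0);
    // v = 4'b1010, d = 2: same tie, but the result LSB is 0.
    assert (rounding_increment(64'b1010, 6'd2, 2'b01) == 1'b0);
    assert (rounding_increment(64'b1010, 6'd2, 2'b11) == 1'b1);
  end

endmodule
```

The module described here applies the same kind of decision once per sub-element, with the element count and bit positions set by `vew_i`.
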