`sldu` — Ara’s slide unit, for permutations, shuffles, and slides

Overview

The Slide Unit (sldu) in Ara’s vector processor implements the vector slide instructions specified in the RISC-V Vector Extension (RVV). These instructions move elements up or down within a vector register (toward higher or lower element indices) by a configurable stride, and the unit supports varying effective element widths (EEWs). The design is modular and consists of three components:

- sldu: The top-level Slide Unit module
- sldu_op_dp: The datapath handling element reshuffling and shifting
- p2_stride_gen: A utility module that generates power-of-two strides

The unit sits between the operands coming from the lanes and the result queues. It handles the valid/ready handshakes and the internal reshuffling required to follow the RVV specification.

1. `sldu`: Top-Level Slide Unit

Purpose

The sldu module serves as the interface and coordinator for the entire slide operation. It connects the operand input/output ports, manages the slide operation control logic, and integrates the datapath (sldu_op_dp) and the stride generator (p2_stride_gen).

Key Interfaces

Clock and Reset
- clk_i, rst_ni: Clock and active-low reset.
Operands
- sldu_operand_i [NrLanes-1:0]: Operand vector from the lanes.
- sldu_operand_valid_i, sldu_operand_ready_o: Valid/ready handshake.
- sldu_result_o, sldu_result_valid_o, sldu_result_ready_i: Slide result vector and handshake signals.
Control
- vinsn_issue_i: Vector instruction information (EEW, SEW, etc.).
- stride_valid_i, stride_update_i: Control inputs for stride progression.
- stride_i: Incoming stride value.
Utility
- stride_valid_o: Asserted if the stride value is a power of two.
- popcount_o: Population count of the stride vector, i.e., the number of bits set to 1.

Functionality

Integrates:
- Datapath (sldu_op_dp) for reshuffling and sliding
- Stride Generator (p2_stride_gen) for power-of-two stride sequence generation
Responds to stride updates and dynamically loads new strides.

The slide unit’s datapath handles only power-of-two strides. A non-power-of-two stride is broken down into a sequence of power-of-two strides, one per set bit of the stride value, which are applied one after another. This keeps the interconnect in the datapath lightweight while keeping the common case fast; non-power-of-two slides are expected to be rare.
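
The decomposition can be illustrated with a small behavioral sketch in Python. It is not taken from the RTL: the function names are hypothetical, the slide is modeled on a plain list of elements, and vacated positions are filled with zero. The order in which the hardware issues the partial strides is also an assumption here (lowest bit first); for this model the final result does not depend on that order.

```python
from typing import List


def power_of_two_strides(stride: int) -> List[int]:
    """Split a non-negative stride into power-of-two parts, one per set bit."""
    if stride < 0:
        raise ValueError("stride must be non-negative")
    parts = []
    bit = 1
    while bit <= stride:
        if stride & bit:
            parts.append(bit)
        bit <<= 1
    return parts


def slide_down(elements: List[int], amount: int, fill: int = 0) -> List[int]:
    """Result element i takes input element i + amount, or `fill` past the end."""
    n = len(elements)
    return [elements[i + amount] if i + amount < n else fill for i in range(n)]


def slide_down_in_steps(elements: List[int], stride: int, fill: int = 0) -> List[int]:
    """Apply an arbitrary slide as a sequence of power-of-two slides."""
    for step in power_of_two_strides(stride):
        elements = slide_down(elements, step, fill)
    return elements
```

For example, a slide by 5 is performed as a slide by 1 followed by a slide by 4, and gives the same result as a direct slide by 5.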

The slide unit can also reshuffle a vector, i.e., perform a slide-by-zero in which the input and output element widths differ. This is used to change the byte layout of a vector register in the vector register file.
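
To make the idea of a reshuffle concrete, the sketch below models one row of a vector register spread across the lanes. The byte mapping is a deliberately simplified assumption, not Ara's actual layout: element i is placed in lane i mod NrLanes, and elements are packed contiguously inside each 8-byte lane word. All function names are hypothetical. Under that assumption, a reshuffle is a deshuffle with the source element width followed by a shuffle with the destination element width.

```python
LANE_BYTES = 8  # assumed lane word width: 64 bits


def byte_position(byte_idx, eew_bytes, nr_lanes):
    """Map a byte of the in-order vector to (lane, offset) in the assumed layout."""
    elem, within = divmod(byte_idx, eew_bytes)
    lane = elem % nr_lanes   # elements are distributed round-robin over the lanes
    slot = elem // nr_lanes  # position of the element inside its lane word
    return lane, slot * eew_bytes + within


def shuffle(arch_bytes, eew_bytes, nr_lanes):
    """Place an in-order byte sequence into per-lane words for a given EEW."""
    assert len(arch_bytes) == LANE_BYTES * nr_lanes
    lanes = [bytearray(LANE_BYTES) for _ in range(nr_lanes)]
    for idx, value in enumerate(arch_bytes):
        lane, off = byte_position(idx, eew_bytes, nr_lanes)
        lanes[lane][off] = value
    return lanes


def deshuffle(lanes, eew_bytes, nr_lanes):
    """Recover the in-order byte sequence from per-lane words."""
    out = bytearray(LANE_BYTES * nr_lanes)
    for idx in range(len(out)):
        lane, off = byte_position(idx, eew_bytes, nr_lanes)
        out[idx] = lanes[lane][off]
    return bytes(out)


def reshuffle(lanes, eew_src_bytes, eew_dst_bytes, nr_lanes):
    """Slide-by-zero: same data, re-encoded from the source to the destination EEW."""
    return shuffle(deshuffle(lanes, eew_src_bytes, nr_lanes), eew_dst_bytes, nr_lanes)
```

The data itself does not change; only the position of each byte within the lanes does.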

2. `sldu_op_dp`: Slide Operand Datapath

Purpose

This module implements the actual sliding logic of the operands, depending on:

- Source and destination EEW (eew_src_i, eew_dst_i)
- Direction (dir_i)
- Slide amount (slamt_i)

It operates with flattened vectors (op_i_flat, op_o_flat) for simplified internal manipulation.

Operation

Uses a large unique case block over {eew_src_i, eew_dst_i, slamt_i, dir_i} to pattern-match operations.
For each case, byte-wise manipulation (via +: 8 slices) rearranges bytes between source and destination.
The result is assigned to op_o_flat, which is then returned through the module interface.
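
The byte-wise manipulation can be pictured with the following Python sketch, which treats the flattened vector as one wide integer and mimics a `+: 8` slice with a shift and mask. It is a simplified model under stated assumptions: the bytes are taken to be in plain element order (no lane shuffling), vacated bytes are zeroed, the direction labels "up" and "down" are hypothetical stand-ins for dir_i, and a generic loop replaces the hardware's explicit case block.

```python
def get_byte(flat: int, index: int) -> int:
    """Python counterpart of the slice flat[8*index +: 8]."""
    return (flat >> (8 * index)) & 0xFF


def slide_bytes(flat: int, num_bytes: int, slamt: int, direction: str) -> int:
    """Slide a flat vector of num_bytes bytes by a power-of-two (or zero) amount."""
    if slamt < 0 or (slamt & (slamt - 1)) != 0:
        raise ValueError("slamt must be zero or a power of two")
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    out = 0
    for dst in range(num_bytes):
        # "down" moves bytes toward index 0, "up" toward higher indices.
        src = dst + slamt if direction == "down" else dst - slamt
        if 0 <= src < num_bytes:
            out |= get_byte(flat, src) << (8 * dst)
    return out
```

A usage example: with an 8-byte vector holding the bytes 1 to 8, `slide_bytes(flat, 8, 2, "down")` moves byte 2 to position 0 and clears the top two bytes.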

Notable Features

- Handles conversions across EEWs (e.g., EW8 → EW16)
- To keep the datapath simple, it does not slide and reshuffle in the same cycle

3. `p2_stride_gen`: Power-of-Two Stride Generator

Purpose

This utility module generates stride vectors where exactly one bit is high (i.e., a power-of-two stride), and can sequentially generate the next stride on update_i.

Interfaces

Input
- stride_i: A stride vector to load.
- valid_i: Load enable.
- update_i: Trigger to generate the next stride.
Output
- stride_p2_o: Power-of-two stride vector
- valid_o: Indicates if a valid (non-zero) stride is present
- popc_o: Population count of stride bits

Functionality

Uses:
- popcount module to count active bits in stride_i
- lzc module to detect the first active bit
Computes the next stride by XORing the current stride with the power-of-two stride that was just issued, which clears that bit
Asserts valid_o while a non-zero stride remains
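
The behavior described above can be modeled in a few lines of Python. This is a behavioral sketch, not a translation of the RTL: the class and attribute names are hypothetical, the choice of the lowest set bit as the "first active bit" is an assumption, and the population count is taken once at load time.

```python
class P2StrideGen:
    """Behavioral model of a power-of-two stride generator."""

    def __init__(self):
        self.remaining = 0  # stride bits that have not been issued yet
        self.popc = 0       # population count of the stride that was loaded

    def load(self, stride: int) -> None:
        """Models valid_i: load a new stride."""
        if stride < 0:
            raise ValueError("stride must be non-negative")
        self.remaining = stride
        self.popc = bin(stride).count("1")

    @property
    def stride_p2(self) -> int:
        """Current one-hot stride (assumed: lowest set bit first), 0 if none is left."""
        return self.remaining & -self.remaining

    @property
    def valid(self) -> bool:
        """Models valid_o: a non-zero stride is still pending."""
        return self.remaining != 0

    def update(self) -> None:
        """Models update_i: clear the bit just issued by XORing it away."""
        self.remaining ^= self.stride_p2
```

Loading a stride of 6 and calling `update()` while `valid` is true yields the strides 2 and then 4, after which `valid` is false.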

Signal Behavior Across Modules

vinsn_issue_i is propagated across modules to control EEW behaviors and operand reshuffling.
sldu_op_dp interprets the sliding direction (dir_i) and index (slamt_i) to select the output permutation.
stride_p2_o controls which element is selected during a stride-slide.
All data vectors (op_i, op_o) are organized as elen_t [NrLanes-1:0], allowing lane-based parallel operation.
