sldu — Ara’s slide unit, for permutations, shuffles, and slides

Overview

The Slide Unit (sldu) in Ara’s vector processor is responsible for implementing vector slide instructions as specified in the RISC-V Vector Extension (RVV). These instructions shift elements within vector registers, either left or right, potentially with a configurable stride, and can support varying effective element widths (EEWs). The design is modular and consists of three components:

  • sldu: The top-level Slide Unit module

  • sldu_op_dp: The datapath handling element reshuffling and shifting

  • p2_stride_gen: A utility module that generates power-of-two strides

This unit supports seamless data flow between the operand lanes and result queues, handling valid/ready handshakes and internal reshuffling, aligning with the RVV specification.


1. sldu: Top-Level Slide Unit

Purpose

The sldu module serves as the interface and coordinator for the entire slide operation. It connects the operand input/output ports, manages the slide operation control logic, and integrates the datapath (sldu_op_dp) and the stride generator (p2_stride_gen).

Key Interfaces

  • Clock and Reset

    • clk_i, rst_ni: Standard synchronous design signals.

  • Operands

    • sldu_operand_i [NrLanes-1:0]: Operand vector from the lanes.

    • sldu_operand_valid_i, sldu_operand_ready_o: Valid/ready handshake.

    • sldu_result_o, sldu_result_valid_o, sldu_result_ready_i: Slide result vector and handshake signals.

  • Control

    • vinsn_issue_i: Vector instruction information (EEW, SEW, etc.).

    • stride_valid_i, stride_update_i: Control inputs for stride progression.

    • stride_i: Incoming stride value.

  • Utility

    • stride_valid_o: Asserted if the stride value is a power of two.

    • popcount_o: Output of 1-bit population count on stride vector.

Functionality

  • Integrates:

    • Datapath (sldu_op_dp) for reshuffling and sliding

    • Stride Generator (p2_stride_gen) for power-of-two stride sequence generation

  • Responds to stride updates and dynamically loads new strides.

The slide unit’s datapath can only handle power-of-two strides. Every non-power-of-two stride is broken down into power-of-two strides. This ensures a lightweight interconnect datapath in the slide unit while accelerating the common case. Non-power-of-2 slides are extremely rare.

The slide unit can also reshuffle, i.e., perform a slide-by-zero with different input and output data widths. This is used to change the byte layout of a vector register file.


2. sldu_op_dp: Slide Operand Datapath

Purpose

This module implements the actual sliding logic of the operands, depending on:

  • Source and destination EEW (eew_src_i, eew_dst_i)

  • Direction (dir_i)

  • Slide amount (slamt_i)

It operates with flattened vectors (op_i_flat, op_o_flat) for simplified internal manipulation.

Operation

  • Uses a large unique case block over {eew_src_i, eew_dst_i, slamt_i, dir_i} to pattern-match operations.

  • For each case, byte-wise manipulation (via +: 8 slices) rearranges bytes between source and destination.

  • The result is assigned back to the op_o_flat register, which is then returned to the module interface.

Notable Features

  • Handles conversions across EEWs (e.g., EW8 → EW16)

  • To have a simpler datapath, it cannot slide and reshuffle in the same cycle


3. p2_stride_gen: Power-of-Two Stride Generator

Purpose

This utility module generates stride vectors where exactly one bit is high (i.e., a power-of-two stride), and can sequentially generate the next stride on update_i.

Interfaces

  • Input

    • stride_i: A stride vector to load.

    • valid_i: Load enable.

    • update_i: Trigger to generate the next stride.

  • Output

    • stride_p2_o: Power-of-two stride vector

    • valid_o: Indicates if a valid (non-zero) stride is present

    • popc_o: Population count of stride bits

Functionality

  • Uses:

    • popcount module to count active bits in stride_i

    • lzc module to detect the first active bit

  • Computes the next stride by XORing the current with the last stride

  • Asserts valid_o if the stride is valid


Signal Behavior Across Modules

  • vinsn_issue_i is propagated across modules to control EEW behaviors and operand reshuffling.

  • sldu_op_dp interprets the sliding direction (dir_i) and index (slamt_i) to select the output permutation.

  • stride_p2_o controls which element is selected during a stride-slide.

  • All data vectors (op_i, op_o) are organized as elen_t [NrLanes-1:0], allowing lane-based parallel operation.