lane_sequencer — Set up the in-lane operations

Overview

The lane_sequencer module in Ara coordinates the execution of vector instructions within a single lane. It acts as a local micro-sequencer: it interprets commands from the main sequencer, dispatches operand requests, manages the operand request queues, and issues operations to functional units such as the ALU, MFPU, and Mask Unit.

This document breaks down the functionality into sections, each focusing on a specific group of responsibilities or sub-components of the module.


Parameters

  • NrLanes: Number of lanes in the vector unit.

  • pe_req_t, pe_resp_t: Packed types for communication with the main sequencer.

  • operand_request_cmd_t: Structure for operand requests.

  • vfu_operation_t: Packed structure for vector functional unit operation control.


1. Main Sequencer Interface

The interface with the main sequencer involves:

  • Handshake mechanism (pe_req_valid_i, pe_req_ready_o)

  • A mechanism to avoid re-sampling an already seen instruction using a combination of:

    • last_id_q

    • en_sync_mask_q

Key Functionality

  • Instruction ID tracking to prevent double-sampling

  • Register-based handshake with fall_through_register instance

  • Conditional request masking based on synchronization and instruction validity
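
The double-sampling guard described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the RTL: the class name RequestSampler, the step() method, and the exact rule for releasing the mask (cleared when the main sequencer drops valid) are hypothetical. Only the names last_id_q and en_sync_mask_q come from the module.

```python
class RequestSampler:
    """Hypothetical behavioral model of ID-based request masking."""

    def __init__(self):
        self.last_id = 0            # stands in for last_id_q
        self.en_sync_mask = False   # stands in for en_sync_mask_q

    def step(self, valid, insn_id, ready=True):
        """Advance one cycle; return True if the request is sampled."""
        # A request is visible only if it carries a new ID or masking is off.
        masked_valid = valid and (insn_id != self.last_id or not self.en_sync_mask)
        sampled = masked_valid and ready
        if sampled:
            self.last_id = insn_id
            self.en_sync_mask = True
        elif not valid:
            # Assumption: the mask is released once valid is deasserted.
            self.en_sync_mask = False
        return sampled


sampler = RequestSampler()
trace = [
    sampler.step(True, 3),   # new ID: sampled
    sampler.step(True, 3),   # same ID still asserted: masked
    sampler.step(True, 4),   # different ID: sampled
    sampler.step(False, 4),  # valid dropped: mask released
    sampler.step(True, 4),   # same ID again, but mask is off: sampled
]
```

In this model a request that stays asserted for several cycles is taken once, while the same ID can be accepted again after the valid signal has been dropped.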


2. Operand Request Queues

Each lane manages several operand request queues, which are not simple FIFOs due to the need for:

  • Hazard tracking

  • Fine-grained operand reuse or bypassing

Mechanism

  • A set of operand_request and operand_request_valid signals per operand queue

  • Update logic that resets, flushes, or pushes requests as needed

Notable Logic

  • On a memory exception (lsu_ex_flush_o), selected operand request queues are flushed.
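
A minimal sketch of one such request slot, assuming a single entry guarded by a valid flag. The class OperandRequestSlot, its update() method, and the priority order (consume, then push, then flush) are assumptions made for illustration. The real module keeps one request/valid pair per operand queue and has additional hazard logic that is not modeled here.

```python
class OperandRequestSlot:
    """Hypothetical single-entry request slot with a valid flag."""

    def __init__(self):
        self.valid = False
        self.request = None

    def update(self, push=None, ready=False, flush=False):
        """Advance one cycle. Returns True if a pushed request was accepted."""
        accepted = False
        # The downstream requester accepted the pending request.
        if self.valid and ready:
            self.valid = False
            self.request = None
        # A new request can be stored only when the slot is free.
        if push is not None and not self.valid:
            self.request = push
            self.valid = True
            accepted = True
        # Assumption: a flush overrides everything else in the same cycle.
        if flush:
            self.valid = False
            self.request = None
            accepted = False
        return accepted


slot = OperandRequestSlot()
first = slot.update(push={"vl": 4, "eew": 32})          # stored
blocked = slot.update(push={"vl": 2, "eew": 32})         # slot busy, rejected
slot.update(flush=True)                                  # exception: dropped
after_flush = slot.valid
replaced = slot.update(push={"vl": 2, "eew": 32})        # free again, stored
```

The flush path shows why a plain FIFO would not be enough in this model: the pending request is discarded without ever being handed to the requester.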


3. VRGATHER FSM (Finite State Machine)

Manages VRGATHER/VCOMPRESS operand scheduling using:

  • spill_register for buffering incoming vrgat_req_t transactions

  • FSM with two states: IDLE and REQUESTING

  • Counter vrgat_cmd_req_cnt_q to track the number of outstanding requests

Coordination

The FSM ensures that the MaskB operand queue is not double-booked, and it services requests only when that queue is not full.
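
The two-state structure can be illustrated with a simplified model. Everything except the state names and the role of the counter is an assumption: the class GatherFsm, the inputs start, req_valid, maskb_busy, and done, and the transition conditions are hypothetical. The counter here only counts forwarded requests, which is a simplification of the outstanding-request tracking done by vrgat_cmd_req_cnt_q.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    REQUESTING = 1


class GatherFsm:
    """Hypothetical two-state model of the gather request scheduler."""

    def __init__(self):
        self.state = State.IDLE
        self.req_cnt = 0  # loosely stands in for vrgat_cmd_req_cnt_q

    def step(self, start=False, req_valid=False, maskb_busy=False, done=False):
        """One cycle. Returns True when a request is forwarded to MaskB."""
        issued = False
        if self.state is State.IDLE:
            if start:
                self.state = State.REQUESTING
                self.req_cnt = 0
        else:
            # Forward only when the MaskB request slot is free.
            if req_valid and not maskb_busy:
                issued = True
                self.req_cnt += 1
            if done:
                self.state = State.IDLE
        return issued


fsm = GatherFsm()
results = [
    fsm.step(req_valid=True),                   # IDLE: request ignored
    fsm.step(start=True),                       # IDLE -> REQUESTING
    fsm.step(req_valid=True, maskb_busy=True),  # MaskB busy: stalled
    fsm.step(req_valid=True),                   # forwarded
    fsm.step(req_valid=True, done=True),        # forwarded, back to IDLE
]
```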


4. Operand Request Dispatch Logic

A large combinational block prepares the operand requests for the current operation. Vector instructions are categorized by the VFU that executes them:

  • ALU operations

  • Floating point via MFPU

  • Load/Store

  • Slide operations

  • Mask logic or comparisons

  • Special (non-standard) requests

Each VFU type causes specific requests to be sent to matching operand queues with:

  • Proper vector element width (eew)

  • Vector length (vl)

  • Start index (vstart)

  • Hazard flags

This block also defines how VL is distributed across the lanes, and it balances the load when VL is not divisible by the lane count.
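
One common way to balance the elements is to give every lane the integer quotient and hand one extra element to the lowest-numbered lanes until the remainder is used up. The sketch below shows that scheme. The function name lane_vl is hypothetical, and the assumption that the extra elements go to the lanes with the smallest lane index should be checked against the RTL.

```python
def lane_vl(vl, nr_lanes, lane_id):
    """Number of elements handled by lane `lane_id` for a vector of length `vl`.

    Assumed scheme: every lane gets vl // nr_lanes elements, and the first
    vl % nr_lanes lanes get one more.
    """
    base, remainder = divmod(vl, nr_lanes)
    return base + (1 if lane_id < remainder else 0)


# Example: 10 elements over 4 lanes.
per_lane = [lane_vl(10, 4, lane) for lane in range(4)]
```

With 10 elements and 4 lanes, lanes 0 and 1 process three elements each and lanes 2 and 3 process two, so no lane differs from another by more than one element.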


5. VFUs Operation Dispatch

Issues operations to the appropriate VFU (ALU, MFPU, Mask Unit) using vfu_operation_o and vfu_operation_valid_o.

Highlights

  • Determines correct VFU based on instruction type

  • Ensures the work is balanced across lanes by using the per-lane vector length

  • Prevents spurious operations by checking VL and the vector enable masks before asserting vfu_operation_valid_o


6. Instruction Bookkeeping

Bookkeeping logic tracks which instructions are running (vinsn_running_q) and which are completed (vinsn_done_q).

Responsibilities

  • Ensures instructions are only started once

  • Marks completion using signals from ALU and MFPU

  • Notifies the main sequencer via pe_resp_o.vinsn_done
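
The bookkeeping can be pictured as two bit vectors indexed by instruction ID. The model below is a sketch under assumptions: the class InsnBookkeeping, its methods, and the choice to report completion for a single step are hypothetical. Only vinsn_running_q and vinsn_done_q are names from the module.

```python
class InsnBookkeeping:
    """Hypothetical bit-vector model of running/done instruction tracking."""

    def __init__(self):
        self.running = 0  # stands in for vinsn_running_q
        self.done = 0     # stands in for vinsn_done_q

    def start(self, insn_id):
        """Mark an instruction as running. Returns False if it already is."""
        bit = 1 << insn_id
        if self.running & bit:
            return False
        self.running |= bit
        return True

    def complete(self, alu_done_mask=0, mfpu_done_mask=0):
        """Merge the done masks of both units and retire those instructions."""
        finished = (alu_done_mask | mfpu_done_mask) & self.running
        self.running &= ~finished
        self.done = finished  # what would be reported to the main sequencer
        return self.done


book = InsnBookkeeping()
started = book.start(2)
started_again = book.start(2)                 # already running: refused
reported = book.complete(alu_done_mask=0b100)  # ALU finishes instruction 2
restarted = book.start(2)                      # the ID can be reused now
```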


7. Synchronous and Asynchronous State Updates

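All state (FSM state, request valid flags, instruction ID tracking, and the bookkeeping vectors) is held in registers that update on the clock edge and are cleared by reset. Next-state values are computed combinationally and committed together, which keeps the FSM and the bookkeeping consistent from cycle to cycle.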


Conclusion

The lane_sequencer is a central piece of the Ara vector processor responsible for:

  • Correctly dispatching instruction executions

  • Coordinating operand fetching

  • Managing control signals between sequencer, operand queues, and VFUs

  • Supporting special masking and vector gathering behavior

The design is modular and can be extended with additional VFUs or operand features.