ara_sequencer: Instruction sequencer and macro dependency check
Overview
The ara_sequencer is a central control module in Ara’s vector processor that manages instruction dispatching and execution synchronization across its parallel processing elements (PEs). It ensures correct ordering and dependency resolution for vector instructions, tracks the state of each instruction in-flight, and handles hazards and stalling due to resource constraints.
Key Features
Tracks running vector instructions and their mapping to PEs
Maintains a global hazard table for dependency management
Calculates start and end lanes for operand access
Arbitrates instruction issuance based on operand readiness and structural hazards
Interfaces with CVA6 via a tokenized valid/ready protocol
Handles load/store sequencing and exception propagation
Supports masked vector operations and precise scalar forwarding
Interface Description
Inputs
clk_i, rst_ni: Standard clock/reset
ara_req_i, ara_req_valid_i: Instruction request from dispatcher
pe_req_ready_i, pe_resp_i: PE readiness and response signals
alu_vinsn_done_i, mfpu_vinsn_done_i: Completion signals from specific FU types
addrgen_ack_i, addrgen_exception_i, addrgen_exception_vstart_i, addrgen_fof_exception_i: Address generator and exception interfaces
pe_scalar_resp_i, pe_scalar_resp_valid_i: Scalar value return for scalar-result instructions
Outputs
ara_req_ready_o, ara_resp_o, ara_resp_valid_o: Request response handshake
pe_req_o, pe_req_valid_o: Instruction issued to PEs
global_hazard_table_o: Dependency matrix broadcast to operand requesters
ara_idle_o: High when no instruction is in-flight
pe_scalar_resp_ready_o: Ready signal for scalar result
Main Components
Instruction State Tracking
pe_vinsn_running_q: Bitmap showing which PE is executing which instruction
vinsn_running_q: Aggregated bitmap indicating if any instruction is live. This signal is extremely useful for debug
vinsn_id_n: Allocated ID for the next instruction using LZC
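The bookkeeping above can be pictured with a small behavioural model. This is only a sketch, not the RTL: the function names are hypothetical, each per-PE bitmap is modelled as a Python integer with bit i set when that PE runs instruction i, and the LZC is assumed to pick the lowest-numbered free ID.

```python
def aggregate_running(pe_vinsn_running):
    """OR-reduce the per-PE bitmaps into one bitmap of live instruction IDs."""
    agg = 0
    for bitmap in pe_vinsn_running:
        agg |= bitmap
    return agg


def lowest_free_id(vinsn_running, nr_vinsn):
    """Software stand-in for the LZC: lowest ID whose bit is clear, or None."""
    free = ~vinsn_running & ((1 << nr_vinsn) - 1)
    if free == 0:
        return None  # every ID is in use
    # (free & -free) isolates the lowest set bit; bit_length() - 1 is its index.
    return (free & -free).bit_length() - 1


# Three PEs; IDs 0 and 2 are live somewhere, so ID 1 is the next one allocated.
live = aggregate_running([0b0001, 0b0100, 0b0000])
next_id = lowest_free_id(live, nr_vinsn=8)
```

Returning None when every ID is taken models the case in which the sequencer cannot accept a new instruction.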
Hazard Management
RAW, WAR, WAW hazards computed against read_list_q and write_list_q
global_hazard_table_o updated with current hazard vectors
Enforces correct serialization and prevents premature execution
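As an illustration of the hazard check, the sketch below computes RAW, WAR and WAW bit vectors for a new instruction. It is a simplified model under stated assumptions: read_list and write_list are represented as dictionaries from vector register number to the ID of the in-flight instruction that last read or wrote that register, only one reader per register is tracked, and the function name compute_hazards is hypothetical.

```python
def compute_hazards(vs1, vs2, vd, read_list, write_list):
    """Return (raw, war, waw), each a bit vector indexed by in-flight instruction ID.

    vs1, vs2, vd are vector register numbers, or None when the operand is unused.
    """
    raw = war = waw = 0
    # RAW: a source register is still being written by an earlier instruction.
    for src in (vs1, vs2):
        if src is not None and src in write_list:
            raw |= 1 << write_list[src]
    if vd is not None:
        # WAR: the destination is still being read by an earlier instruction.
        if vd in read_list:
            war |= 1 << read_list[vd]
        # WAW: the destination is still being written by an earlier instruction.
        if vd in write_list:
            waw |= 1 << write_list[vd]
    return raw, war, waw


# Instruction 0 writes v0; a following "vadd v1, v0, v0" has a RAW hazard on ID 0.
raw, war, waw = compute_hazards(vs1=0, vs2=0, vd=1, read_list={}, write_list={0: 0})
```

The union of the three vectors is what a row of the hazard table would hold for the new instruction in this model.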
Start/End Lane Calculation
Derives which lanes will produce the first and last valid elements
Based on vstart, vl, and vsew
Important for operand alignment and masking
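A rough way to picture the calculation is shown below. It is a deliberately simplified sketch: it assumes element i lives in lane i mod NrLanes and ignores vsew, which the real logic also takes into account. The function name and parameters are hypothetical.

```python
def start_end_lane(vstart, vl, nr_lanes):
    """Lanes holding the first and last active elements, or None if none are active.

    Assumption: element i is mapped to lane i % nr_lanes (vsew is ignored here).
    """
    if vstart >= vl:
        return None  # no element in [vstart, vl) to process
    start_lane = vstart % nr_lanes
    end_lane = (vl - 1) % nr_lanes
    return start_lane, end_lane


# 10 elements over 4 lanes: the first element is in lane 0, the last (index 9) in lane 1.
lanes = start_end_lane(vstart=0, vl=10, nr_lanes=4)
```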
Issuance Arbitration
FSM with IDLE and WAIT states
Uses counters per VFU to throttle instruction dispatch
“Gold ticket” system ensures stalled-but-accounted instructions are not blocked
Functional Unit Interface
Identifies target VFU for each instruction
Uses target_vfus() function to map to ALU, MFPU, SLDU, MASKU, etc.
Only issues when operand requesters and FU queues are ready
Special Features
Slide unit constraints handled to avoid chaining issues
Handles scalar results with mask unit coordination
Exception signaling for burst and address-related faults
Provides synchronization to CVA6 (via token and response logic)
Instruction Flow
Dispatcher issues instruction to sequencer.
Sequencer:
Allocates ID
Checks for hazards
Builds request (pe_req_d)
Calculates start/end lanes
Evaluates VFU counters
If resources available:
Issues request
Updates global hazard table and instruction trackers
Enters WAIT if instruction needs scalar return or memory ack
Once response is received or exception detected, returns to IDLE.
FSM States
IDLE: Default state; waits for instruction or handles stalls from the lanes’ operand requesters.
WAIT: Holding state for memory/scalar responses.
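The two-state behaviour can be summarised as a next-state function. This is a minimal sketch of the transitions described above, not the RTL; the argument names are hypothetical and stand for conditions the sequencer derives from its inputs.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    WAIT = 1


def next_state(state, issued, needs_response, response_or_exception):
    """Next FSM state.

    issued: an instruction was issued to the PEs this cycle.
    needs_response: that instruction needs a scalar return or a memory ack.
    response_or_exception: the awaited response arrived or an exception was detected.
    """
    if state is State.IDLE:
        # Only instructions that must report back hold the sequencer in WAIT.
        return State.WAIT if (issued and needs_response) else State.IDLE
    # WAIT: hold until the response or an exception releases the sequencer.
    return State.IDLE if response_or_exception else State.WAIT
```

For example, an issued load moves the model from IDLE to WAIT, and it stays there until the acknowledgement or an exception is seen.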
Dependency tracking and chaining
Dependencies are tracked per instruction, so that chaining can be implemented at vector-element level.
The sequencer only knows which instruction depends on which other instruction, and assigns special “hazard” signals to each instruction before issuing it to the units. Every instruction keeps hazard metadata per operand register, so that it is clear upon which instruction every operand register depends.
Chaining is implemented in each lane, during operand fetch. Every dependency (RAW, WAR, WAW) on a specific register will throttle the source operand fetch from the VRF. This throttling is controlled by the write throughput of the instruction that generated the dependency.
RAW example:
vld v0, addr
vadd v1, v0, v0
When executing the vadd (vld is executing in parallel), a lane will fetch the next element from v0 only if vld has written one element first. This control is a credit-based system with a depth of one element only. Therefore, if vld writes 5 elements, the vadd only registers one credit for a read.
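The depth-one credit behaviour in the RAW example can be modelled in a few lines. This is an illustrative sketch only; the class and method names are hypothetical and do not correspond to signals in the design.

```python
class DepthOneCredit:
    """Credit between a producer (e.g. vld) and a dependent consumer (e.g. vadd)."""

    def __init__(self):
        self.credit = False

    def producer_write(self):
        # Any number of writes leaves at most one credit: the depth is one.
        self.credit = True

    def consumer_try_read(self):
        """Consume the credit if there is one; return whether the read may proceed."""
        if self.credit:
            self.credit = False
            return True
        return False


chain = DepthOneCredit()
for _ in range(5):          # the producer writes 5 elements back to back
    chain.producer_write()
first = chain.consumer_try_read()   # allowed: one credit was registered
second = chain.consumer_try_read()  # throttled until the producer writes again
```

Because the credit saturates at one, the consumer's fetch rate is bounded by the producer's write rate rather than by how far ahead the producer has run.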
WAR and WAW hazards are handled in the same way.
WAR example:
vmul v2, v1, v1
vadd v1, v0, v0
Also in this case, vadd will be able to fetch from v0 only when vmul has written into v2. This works because if source operands are chained, destination operands are also correctly ordered.
As soon as an instruction that causes a dependency completes execution, the scoreboard is cleared and the dependent instruction is allowed to fetch operands without restrictions.
This works as long as:
The second instruction reads its source operands from the VRF. Loads do not, so they cannot chain with this mechanism: a WAR or WAW hazard stalls a load.
The first instruction actually writes something into the VRF. Stores do not, so a WAR hazard on a store stalls the second instruction until the store has completed.
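The scoreboard clearing mentioned above can be sketched as operations on a square hazard table, with one row per instruction ID and one bit per ID it depends on. This is an assumption-laden model: the helper names are hypothetical, and the exact update logic in the RTL may differ.

```python
def set_row(table, new_id, hazard_bits):
    """On issue: record which in-flight IDs the new instruction depends on."""
    table[new_id] = hazard_bits


def clear_completed(table, done_id):
    """On completion: remove done_id from every row and free its own row."""
    mask = ~(1 << done_id)
    for i in range(len(table)):
        table[i] &= mask
    table[done_id] = 0


table = [0] * 4                 # 4 instruction IDs, no dependencies yet
set_row(table, 1, 0b0001)       # ID 1 depends on ID 0
set_row(table, 2, 0b0011)       # ID 2 depends on IDs 0 and 1
clear_completed(table, 0)       # ID 0 finishes: nobody waits on it any more
```

After the call, instruction 1 has an empty row and can fetch operands freely, while instruction 2 is still throttled by instruction 1.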
Instruction Issue
The sequencer keeps an instruction counter per functional unit to track how many instructions are in flight, and stalls instruction issue whenever the target functional unit’s instruction queue is already full.
A new instruction bumps up the respective counter, and a completed instruction bumps it down.
Since, for timing reasons, instructions flow into the sequencer and bump the respective counter without waiting to be issued, a counter can exceed its maximum capacity for one cycle. This event is recorded through a gold ticket assigned to the instruction, which indicates that the instruction has already been counted by the respective counter. As soon as the counter returns to its maximum capacity (which happens when an instruction finishes execution in the respective unit), the gold ticket allows the stalled instruction to proceed.
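The counter and gold-ticket interplay can be illustrated with a small model. It is a sketch under assumptions, not the actual logic: the class name, method names and the capacity of 2 are hypothetical, and the one-cycle timing of the overshoot is not modelled.

```python
class VfuCounter:
    """Per-functional-unit in-flight counter with gold-ticket semantics."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.count = 0

    def arrive(self):
        """Count a new instruction immediately, without waiting for issue.

        Returns True if the counter overshot, i.e. the instruction holds a gold ticket.
        """
        self.count += 1
        return self.count > self.capacity

    def complete(self):
        """An instruction finished executing in this unit."""
        self.count -= 1

    def ticket_holder_may_issue(self):
        # The ticket holder is already included in count, so it may proceed
        # as soon as the counter is back at (or below) capacity.
        return self.count <= self.capacity


alu = VfuCounter(capacity=2)
tickets = [alu.arrive() for _ in range(3)]   # the third arrival overshoots
stalled = not alu.ticket_holder_may_issue()  # it must wait for now
alu.complete()                               # one ALU instruction finishes
released = alu.ticket_holder_may_issue()     # the gold ticket lets it proceed
```

The ticket matters because the stalled instruction must not be counted a second time, nor blocked by a counter value that already includes it.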
Physical Considerations
vinsn_queue_ready: Derived from counter depth per FU
stall_lanes_desynch: Ensures lane-0 aligned counters for ALU/MFPU
global_hazard_table_d: Matrix [NrVInsn][NrVInsn] with sparse update logic
Careful pipeline management to support exception-aware issuing