vstu: Ara Vector Store Unit
The vstu module implements the vector store unit of the Ara RISC-V vector processor. It is responsible for generating and issuing memory write (store) operations via the AXI W and B channels. It consumes vector operands from the vector lanes, aligns and masks them, and dispatches them as AXI bursts to memory. It supports vector masking, vector start (vstart) handling, and exception processing in case of illegal transactions or MMU faults.
Contents
Module Parameters
Parameter |
Description |
|---|---|
|
Number of vector lanes. |
|
Maximum vector length (elements). |
|
Addressing type for vector register file. |
|
Types for PE request/response. |
|
AXI write data width (W channel). |
|
AXI address width. |
|
AXI W and B channel types. |
Interface Overview
Inputs
pe_req_i,pe_req_valid_i,pe_vinsn_running_i: Vector instruction from the PE.axi_addrgen_req_i,axi_addrgen_req_valid_i: AXI address and burst info from address generator.stu_operand_i,stu_operand_valid_i: Operand data and valid signals from each lane.mask_i,mask_valid_i: Byte-wise mask from the Mask Unit.axi_w_ready_i,axi_b_valid_i,axi_b_i: AXI handshake and write response signals.
Outputs
axi_w_o,axi_w_valid_o: Write payload to AXI W channel.axi_b_ready_o: Write response acknowledgment.stu_operand_ready_o: Ready signals to lanes for operand acceptance.mask_ready_o: Mask consumption indicator.store_pending_o,store_complete_o: Store state indicators.axi_addrgen_req_ready_o: Handshake with address generator.stu_current_burst_exception_o: Store exception notifier.
Key Functional Blocks
The VSTU can change the byte layout of the vector registers on-the-fly and does not usually require reshuffles.
Vector Instruction Queue
A FIFO queue stores incoming instructions and tracks three execution stages:
Accept: Instruction is accepted and stored.
Issue: Instruction is issued to AXI and operand lanes.
Commit: Instruction waits for AXI
bresponse.
Pointers and counters (accept_pnt, issue_pnt, commit_pnt, issue_cnt, commit_cnt) track instruction flow through these phases.
Operand Registers
Each lane has a spill register to buffer operand data. Flushable on lsu_ex_flush_i.
Mask Registers
Byte-wise mask signals are buffered using flushable registers. Used for element-wise masking when vm=0.
AXI Write Logic
Operand Check: Ensures all operands and masks are valid.
Byte Mapping: Converts vector lane data into AXI word-aligned format using
shuffle_index.AXI Beat Formation: Constructs and sends AXI W payloads with
strbindicating valid bytes.Beat Completion: Monitors burst length and prepares next instruction or beat.
Byte Validity Logic
Determines the effective byte count from:
vstart offset
mask enablement
lane alignment
AXI burst alignment
Instruction Issuing
Handles issuing multiple micro-ops per vector store using:
issue_cnt_bytes_q: Tracks bytes left.axi_len_q,vrf_pnt_q: AXI and VRF pointers.vinsn_running_q: Tracks active instructions.
B Channel Handling
Upon receiving a valid response (axi_b_valid_i), the unit:
Acknowledges the AXI response
Updates commit pointer and counters
Signals store completion to dispatcher
Exception Handling
Catches and handles:
Address generation faults
MMU-related exceptions
Store permission violations
Flushes affected instructions if they’re the only pending ones and transitions the state accordingly.