ara_dispatcher — Vector Instruction Decoder and Issuer

The ara_dispatcher is the central instruction decoder and legality checker for Ara’s RISC-V vector unit. It receives instructions from the scalar core (CVA6) via the acc_req_i interface and dispatches well-formed vector requests to Ara’s backend using the ara_req_o interface.


Role in Ara

  • Decodes RISC-V vector (RVV) instructions received from the scalar core

  • Validates legality based on LMUL, SEW, CSR state, segment load/store constraints, and fflags support

  • Manages control and status registers (CSRs) for VL, VTYPE, VSTART, VXRM, and VXSAT

  • Handles load/store reshuffling to maintain consistent EEW across register groups

  • Issues vector requests via ara_req_o and coordinates responses via ara_resp_valid


Interface

Input Ports

Signal            Width   Description
clk_i             1       Clock input
rst_ni            1       Active-low reset
acc_req_i         struct  Incoming request from scalar core
ara_req_ready_i   1       Back-end ready to receive vector request
ara_resp_valid    1       Back-end has completed a request
ara_resp          struct  Response metadata from Ara
ara_idle_i        1       Ara is idle, ready to accept new instructions
load_complete_i   1       Vector load completed
store_complete_i  1       Vector store completed

Output Ports

Signal                Width   Description
acc_resp_o            struct  Response back to scalar core
ara_req_valid_o       1       Ara request is valid
ara_req_o             struct  Decoded vector request
pending_seg_mem_op_o  1       Pending segment memory operation tracker


FSM States

  • IDLE — Waiting for valid vector instructions

  • WAIT_IDLE — Waiting for Ara to become idle (CSR ops)

  • WAIT_IDLE_FLUSH — Flushes vector state after exceptions

  • RESHUFFLE — Triggers register reshuffling before execution
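
The states above can be read as a small transition function. The Python sketch below is a behavioral illustration only: the condition names (`is_csr_op`, `exception`, `needs_reshuffle`, `reshuffle_done`) and the priority between them are assumptions made for this example, not taken from the RTL, and the real dispatcher may qualify these transitions differently.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_IDLE = auto()
    WAIT_IDLE_FLUSH = auto()
    RESHUFFLE = auto()


def next_state(state, *, ara_idle=False, is_csr_op=False, exception=False,
               needs_reshuffle=False, reshuffle_done=False):
    """Return the next FSM state under the assumed transition conditions."""
    if state is State.IDLE:
        if exception:
            return State.WAIT_IDLE_FLUSH   # flush vector state first
        if needs_reshuffle:
            return State.RESHUFFLE         # fix byte layout before issuing
        if is_csr_op and not ara_idle:
            return State.WAIT_IDLE         # CSR ops wait for an idle back-end
        return State.IDLE
    if state in (State.WAIT_IDLE, State.WAIT_IDLE_FLUSH):
        return State.IDLE if ara_idle else state
    if state is State.RESHUFFLE:
        return State.IDLE if reshuffle_done else State.RESHUFFLE
    raise ValueError(state)
```

For instance, `next_state(State.IDLE, is_csr_op=True)` yields `State.WAIT_IDLE` in this model, and the FSM returns to `State.IDLE` once `ara_idle` is observed.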


Internal Concepts

CSR Registers

  • csr_vl_q, csr_vtype_q, csr_vstart_q — Active state of vector CSRs

  • csr_vxrm_q, csr_vxsat_q — Fixed-point rounding/saturation

EEW Tracking

In Ara, every vector register is encoded with a byte layout that places consecutive vector elements in consecutive lanes (i.e., element 0 in lane 0, element 1 in lane 1, and so on). As a result, a vector that is reinterpreted with a different element width requires a byte-layout reshuffle so that consecutive elements still sit in consecutive lanes.

  • eew_q[0..31] stores the effective element width (EEW) of each vector register, which also serves as the encoding of that register’s byte layout

  • Updated upon successful dispatch of instructions
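
To see why a change of element width implies data movement, the following Python sketch applies the rule above (element i lives in lane i mod NrLanes) to the bytes of one register. It is a simplified model: the lane count `NR_LANES = 4` and the helpers `lane_of_byte` and `layout` are assumptions for illustration, and the arrangement of bytes inside each lane is not modelled.

```python
NR_LANES = 4  # assumed lane count for this example


def lane_of_byte(byte_idx, eew_bytes, nr_lanes=NR_LANES):
    """Lane holding register byte `byte_idx` when elements are `eew_bytes` wide."""
    element = byte_idx // eew_bytes   # element that the byte belongs to
    return element % nr_lanes         # consecutive elements go to consecutive lanes


def layout(eew_bytes, n_bytes=16):
    """Lane index for each of the first `n_bytes` bytes of a register."""
    return [lane_of_byte(b, eew_bytes) for b in range(n_bytes)]


print(layout(4))  # 32-bit elements
print(layout(8))  # 64-bit elements
```

With 32-bit elements, bytes 4 to 7 belong to element 1 and sit in lane 1; with 64-bit elements the same bytes belong to element 0 and sit in lane 0. Reading the register with the other width therefore requires moving bytes between lanes.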

Reshuffling

When a vector register needs to be reinterpreted with a different byte encoding, Ara’s dispatcher injects slide micro-operations that reshuffle the register’s byte layout.

  • Needed when the same register is used with a different EEW

  • Controlled by reshuffle_req_d[2:0] for vs1, vs2, vd

  • Buffering via eew_old_buffer_d, eew_new_buffer_d, etc.
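
The bookkeeping described above can be sketched as a small Python model. `EewTracker` and its methods are hypothetical names for this example; the model assumes every operand slot uses the same EEW (widening and narrowing instructions would need one EEW per slot) and does not reproduce the buffering signals of the RTL.

```python
class EewTracker:
    """Toy model of eew_q: one recorded element width (in bits) per vector register."""

    def __init__(self, default_eew=8):
        self.eew = [default_eew] * 32

    def reshuffle_req(self, operands, new_eew):
        """Map each operand slot to True if its register needs a reshuffle.

        `operands` maps a slot name ('vs1', 'vs2', 'vd') to a register index.
        A register that appears in several slots is requested only once.
        """
        req, seen = {}, set()
        for slot, reg in operands.items():
            mismatch = self.eew[reg] != new_eew
            req[slot] = mismatch and reg not in seen
            seen.add(reg)
        return req

    def commit(self, vd, new_eew):
        """Record the destination's new layout after a successful dispatch."""
        self.eew[vd] = new_eew


tracker = EewTracker()
tracker.commit(3, 32)  # v3 was last written with 32-bit elements
print(tracker.reshuffle_req({"vs1": 3, "vs2": 3, "vd": 3}, 64))
```

In this model, v3 is requested once even though it occupies all three slots, so only the `vs1` entry is set.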


Interface with CVA6

Vector instructions are dispatched from CVA6 to Ara once they reach the top of CVA6’s scoreboard, i.e., when they are no longer speculative and can be committed from CVA6’s perspective.

Ara’s dispatcher handshakes the request (and returns a response) if exceptions cannot happen for that instruction or if exceptions are immediately raised during decoding.

For example, arithmetic instructions can raise exceptions only during decoding, so the dispatcher answers CVA6 within a single cycle.

Memory operations can raise errors on the memory bus or exceptions during virtual-to-physical translation. Therefore, memory instructions freeze the dispatcher until the VLSU has reported back an exception or the absence of it. This process requires more than 1 cycle.


Instruction Decoding

Instructions are decoded based on RVV encoding using extracted fields:

  • vmem_type, varith_type, etc.

  • mop, nf, vm, rs1, rs2, rd, mew, width
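
As an illustration of the field extraction, the Python sketch below splits a 32-bit vector load/store instruction word according to the bit positions defined by the RVV 1.0 encoding. The function name `decode_vmem_fields` is an assumption for this example; the dispatcher itself performs the equivalent slicing in hardware.

```python
def decode_vmem_fields(insn):
    """Split a 32-bit RVV load/store instruction word into its fields."""
    return {
        "opcode": insn & 0x7F,          # bits  6:0
        "rd":     (insn >> 7) & 0x1F,   # bits 11:7  (vd for loads, vs3 for stores)
        "width":  (insn >> 12) & 0x7,   # bits 14:12
        "rs1":    (insn >> 15) & 0x1F,  # bits 19:15 (base address register)
        "rs2":    (insn >> 20) & 0x1F,  # bits 24:20 (lumop/sumop, rs2, or vs2)
        "vm":     (insn >> 25) & 0x1,   # bit  25    (0 = masked)
        "mop":    (insn >> 26) & 0x3,   # bits 27:26 (addressing mode)
        "mew":    (insn >> 28) & 0x1,   # bit  28
        "nf":     (insn >> 29) & 0x7,   # bits 31:29 (number of fields minus one)
    }


fields = decode_vmem_fields(0x02056087)  # vle32.v v1, (a0)
print(fields)
```

For this unit-stride load, `mop` and `nf` are both zero, `rs1` is 10 (a0), and `rd` is 1 (v1).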


Memory Operation Handling

  • Load Types: VLE, VLSE, VLXE, VLVX

  • Store Types: VSE, VSSE, VSXE, VSVX

  • Addressing modes: unit-stride, strided, indexed, and whole-register

  • Segment operations detected if nf != 0


Illegal Instruction Checks

Illegal cases include:

  • Illegal operand registers given the current SEW, LMUL state

  • EMUL × NF > 8

  • Access beyond register 31

  • Inconsistent EEW across a register group

  • Disallowed CSR writes or invalid opcodes

  • Fixed-point ops without hardware support

  • Floating-point ops (e.g., VFREC7) without FPExt support
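
Two of the checks above, the EMUL × NF limit and the register-file bound, can be sketched in Python as follows. This is a minimal model under stated assumptions: EMUL is computed as (EEW / SEW) × LMUL and NF is taken as NFIELDS = nf + 1, both following the RVV specification, and the function name `segment_access_is_legal` is hypothetical. The remaining checks in the list are not modelled.

```python
from fractions import Fraction


def segment_access_is_legal(vd, nf, eew, sew, lmul):
    """Check two of the listed conditions for a segment load/store.

    vd: first register of the group; nf: raw 3-bit nf field (NFIELDS = nf + 1);
    eew, sew: element widths in bits; lmul: Fraction, e.g. Fraction(1, 2).
    """
    nfields = nf + 1
    emul = Fraction(eew, sew) * lmul
    if not Fraction(1, 8) <= emul <= 8:
        return False                      # EMUL outside the range RVV allows
    if emul * nfields > 8:
        return False                      # EMUL x NF > 8
    regs_per_field = max(1, int(emul))    # a fractional EMUL still occupies one register
    if vd + nfields * regs_per_field > 32:
        return False                      # group would run past v31
    return True


print(segment_access_is_legal(0, 2, 32, 32, Fraction(4)))   # 3 fields at EMUL = 4
print(segment_access_is_legal(28, 3, 32, 32, Fraction(1)))  # v28..v31
```

The first call is rejected because 3 × 4 exceeds 8; the second is accepted because the four fields end exactly at v31.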


Reshuffling Flow

  • Triggered by EEW mismatch for reused vector registers

  • Duplicate requests are masked out when the same register appears in multiple operand slots

  • FSM state switches to RESHUFFLE, issues internal reshuffle ops

  • Once reshuffling is complete, instruction is re-issued


CSR Handling

All CSR access instructions (e.g., csrrw, csrrs, csrrc, and immediate variants) are handled.

  • Only vstart, vxrm, vxsat are writable

  • vl, vtype, vlenb are read-only

  • Illegal accesses raise an exception
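
The access rules above can be sketched as a small Python model. The CSR addresses are those assigned by the RVV specification; the names `csr_access` and `IllegalCsrAccess`, the dictionary-based CSR file, and the treatment of any other address as illegal are assumptions for this example.

```python
# CSR addresses from the RVV specification.
VSTART, VXSAT, VXRM = 0x008, 0x009, 0x00A
VL, VTYPE, VLENB = 0xC20, 0xC21, 0xC22

WRITABLE = {VSTART, VXSAT, VXRM}
READ_ONLY = {VL, VTYPE, VLENB}


class IllegalCsrAccess(Exception):
    """Stands in for the exception reported back to the scalar core."""


def csr_access(csrs, addr, write_value=None):
    """Return the old CSR value and optionally write a new one."""
    if addr not in WRITABLE | READ_ONLY:
        raise IllegalCsrAccess(hex(addr))   # not a CSR handled in this sketch
    old = csrs[addr]
    if write_value is not None:
        if addr in READ_ONLY:
            raise IllegalCsrAccess(hex(addr))  # vl, vtype, vlenb cannot be written
        csrs[addr] = write_value
    return old


csrs = {VSTART: 0, VXSAT: 0, VXRM: 0, VL: 16, VTYPE: 0, VLENB: 512}
print(csr_access(csrs, VXRM, write_value=2))  # write vxrm, returns the old value
print(csr_access(csrs, VL))                   # read-only access to vl
```

A write to `VL` through this model raises `IllegalCsrAccess`, mirroring the read-only rule.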


Zero VL Behavior

If vl = 0, most instructions are treated as NOPs.

  • A few instructions are exempt (whole-register operations and other special instructions)

  • Response is generated with req_ready and resp_valid set

  • Ensures scalar pipeline doesn’t stall