simd_div — Ara’s in-lane SIMD divider

Overview

The simd_div module implements Ara’s Serial Divider, designed to execute vector division and remainder operations element-by-element. The unit supports signed and unsigned divisions for different vector element widths (8, 16, 32, 64 bits) but operates serially — processing one element at a time using a backend divider (serdiv). Its serial nature restricts intra-vector parallelism but simplifies design complexity and area.

This module is parameterized with CVA6 configuration settings and is tailored for use within Ara’s scalar/vector datapath execution pipeline.


Interface Description

Port

Width

Direction

Description

clk_i

1

input

Clock signal

rst_ni

1

input

Active-low synchronous reset

operand_a_i

elen_t

input

Dividend operand vector element

operand_b_i

elen_t

input

Divisor operand vector element

mask_i

strb_t

input

Mask bits for each byte

op_i

ara_op_e

input

Operation type: VDIV, VREM, VDIVU, VREMU

be_i

strb_t

input

Byte-enable per vector element

vew_i

vew_e

input

Vector element width (VEW): EW8, EW16, etc.

result_o

elen_t

output

Final division result vector element

mask_o

strb_t

output

Output mask signal

valid_i

1

input

New valid input transaction

ready_o

1

output

Module ready to accept input

ready_i

1

input

Downstream ready for result

valid_o

1

output

Output is valid


Module Structure and Key Components

FSMs: Issue and Commit Control Units

  • issue_state_q/d (FSM):

    • Accepts operands from upstream.

    • Tracks issued bytes.

    • Serially sends one operand pair to the divider.

  • commit_state_q/d (FSM):

    • Collects results from the divider.

    • Buffers and shifts them into output.

    • Drives valid_o when the entire result is ready.

Operands and Control Buffers

  • Input operands opa_q, opb_q and their staging versions opa_d, opb_d.

  • Opcode and vector element width held in op_q, vew_q.

Counters

  • issue_cnt_q/d: How many elements still to be issued.

  • commit_cnt_q/d: How many results still to be committed.

  • Both counters decrement as each element is processed or skipped (masked off).

Divider Core

  • Uses the serdiv instance, a serial divider supporting signed/unsigned division and remainder.

  • Supported Opcodes:

    • VDIV – signed division

    • VDIVU – unsigned division

    • VREM – signed remainder

    • VREMU – unsigned remainder

Operand Width Handling

Each element width (VEW) has a specialized operand unpacking logic:

  • EW8: 8-bit → Sign-extended to 64-bit.

  • EW16: 16-bit → Sign-extended to 64-bit.

  • EW32: 32-bit → Sign-extended to 64-bit.

  • EW64: Already native 64-bit.

These are extracted from the operand unions and sign-extended (for signed ops).

Output Construction

  • Partial results are shifted into the final result_q.

  • Results are masked and merged based on the current element width and byte enables.


Timing and Pipeline

  • Fully serialized pipeline.

  • Accepts new input only when the previous result is fully committed.

  • Maintains FSM state and stable operand/context throughout.