Instruction Set Extensions

For efficient execution we have defined a number of custom instructions. This document gives a brief overview of their encoding.

"Xssr" Extension for Stream Semantic Registers

The "Xssr" extension assigns stream semantics to a handful of the processor's registers. If enabled, reading and writing these registers translates into corresponding memory reads and writes. The addresses for these memory accesses are derived from a hardware address generator.

Configuration Register Operations

| imm[11:5] | imm[4:0] | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| reg | ssr | 00000 | 001 | dest | OP-CUSTOM1 | SCFGRI |
| reg | ssr | value | 010 | 00000 | OP-CUSTOM1 | SCFGWI |

SCFGRI reads a value from an SSR configuration register, and SCFGWI writes a value to one. The immediate argument reg specifies the index of the register, and ssr specifies which SSR is accessed. SCFGRI places the read value in rd. SCFGWI moves the value in rs1 to the selected SSR configuration register.
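As an illustration of the layout above, the following minimal sketch assembles the 32-bit instruction words for SCFGRI and SCFGWI. It assumes that OP-CUSTOM1 maps to the standard RISC-V custom-1 major opcode (0b0101011), which this document does not state explicitly; the function names are hypothetical.

```python
# Assumption: OP-CUSTOM1 is the standard RISC-V custom-1 major opcode.
OP_CUSTOM1 = 0b0101011


def _check(reg: int, ssr: int, gpr: int) -> None:
    """Reject field values that do not fit their bit widths."""
    if not (0 <= reg < 128 and 0 <= ssr < 32 and 0 <= gpr < 32):
        raise ValueError("field out of range")


def encode_scfgri(reg: int, ssr: int, rd: int) -> int:
    """SCFGRI: imm[11:5]=reg, imm[4:0]=ssr, rs1=00000, funct3=001, rd=dest."""
    _check(reg, ssr, rd)
    imm = (reg << 5) | ssr
    return (imm << 20) | (0 << 15) | (0b001 << 12) | (rd << 7) | OP_CUSTOM1


def encode_scfgwi(reg: int, ssr: int, rs1: int) -> int:
    """SCFGWI: imm[11:5]=reg, imm[4:0]=ssr, rs1=value, funct3=010, rd=00000."""
    _check(reg, ssr, rs1)
    imm = (reg << 5) | ssr
    return (imm << 20) | (rs1 << 15) | (0b010 << 12) | (0 << 7) | OP_CUSTOM1


if __name__ == "__main__":
    # Read configuration register 2 of SSR 1 into x10.
    print(hex(encode_scfgri(reg=2, ssr=1, rd=10)))
```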

| funct7 | rs2 | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| 0000000 | addr | 00001 | 001 | dest | OP-CUSTOM1 | SCFGR |
| 0000000 | addr | value | 010 | 00001 | OP-CUSTOM1 | SCFGW |

SCFGR reads a value from an SSR configuration register, and SCFGW writes a value to one. The value in register rs2 specifies the address of the register as follows: bits 4 to 0 correspond to ssr and indicate the SSR to be used, and bits 11 to 5 correspond to reg and indicate the index of the register. SCFGR places the read value in rd. SCFGW moves the value in rs1 to the selected SSR configuration register.
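The address layout described above can be sketched as a pair of helpers that pack and unpack the value passed in rs2. The function names are hypothetical and only illustrate the bit positions given in the text.

```python
def ssr_cfg_addr(reg: int, ssr: int) -> int:
    """Build the rs2 value for SCFGR/SCFGW: bits 11:5 = reg, bits 4:0 = ssr."""
    if not (0 <= reg < 128 and 0 <= ssr < 32):
        raise ValueError("field out of range")
    return (reg << 5) | ssr


def split_ssr_cfg_addr(addr: int) -> tuple[int, int]:
    """Recover (reg, ssr) from the low 12 bits of an address value."""
    return (addr >> 5) & 0x7F, addr & 0x1F


if __name__ == "__main__":
    addr = ssr_cfg_addr(reg=2, ssr=1)
    print(addr, split_ssr_cfg_addr(addr))
```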

"Xfrep" Extension for Floating-Point Repetition

With the "Xfrep" extension we can automatically repeat a sequence of instructions without the need to manage a loop in software, issuing the instructions from Snitch's FPU sequencer buffer. This has a number of benefits, including alleviating the pressure on the I$ bandwidth. Furthermore, after the first iteration the instructions can be fetched from the FPU sequencer, which has a lower energy access cost than the L1 I$.

The FREP instruction has the following signature:

| imm1 | rs1 | imm2 | imm3 | is_outer | opcode | operation |
|---|---|---|---|---|---|---|
| 12 | 5 | 3 | 4 | 1 | 7 | |
| max_inst | max_rpt | stagger_max | stagger_mask | 0 | OP-CUSTOM1 | FREP.I |
| max_inst | max_rpt | stagger_max | stagger_mask | 1 | OP-CUSTOM1 | FREP.O |

FREP.I and FREP.O repeat the max_inst + 1 instructions following the FREP instruction max_rpt + 1 times, where max_rpt is the value in register rs1. The FREP.I instruction (I stands for inner) repeats each instruction the specified number of times before moving on to execute and repeat the next one. The FREP.O instruction (O stands for outer) repeats the whole sequence of instructions max_rpt + 1 times. Register staggering can be enabled and configured via the stagger_mask and stagger_max immediates. A detailed explanation of their use can be found in the Snitch paper.
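The difference between the inner and outer variants is easiest to see in the resulting issue order. The sketch below is a simplified software model, not a description of the sequencer hardware: it ignores register staggering, and the function name and the string placeholders for instructions are hypothetical.

```python
def frep_issue_order(body: list[str], max_rpt: int, is_outer: bool) -> list[str]:
    """Return the order in which the instructions of an FREP body are issued.

    `body` holds the max_inst + 1 instructions following the FREP instruction.
    Register staggering is not modeled.
    """
    reps = max_rpt + 1
    if is_outer:
        # FREP.O: the whole sequence is repeated.
        return [inst for _ in range(reps) for inst in body]
    # FREP.I: each instruction is repeated before the next one starts.
    return [inst for inst in body for _ in range(reps)]


if __name__ == "__main__":
    body = ["fmul", "fadd"]
    print(frep_issue_order(body, max_rpt=2, is_outer=True))
    print(frep_issue_order(body, max_rpt=2, is_outer=False))
```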

The assembly instruction signature follows:

frep.i   rs1, imm1, imm2, imm3

"Xdma" Extension for Asynchronous Data Movement

The "Xdma" extension provides custom instructions to control an asynchronous data movement engine tightly coupled to the processor core.

Address Operations

| funct7 | rs2 | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| 0000000 | ptrhi | ptrlo | 000 | 00000 | OP-CUSTOM1 | DMSRC |
| 0000001 | ptrhi | ptrlo | 000 | 00000 | OP-CUSTOM1 | DMDST |

DMSRC and DMDST specify the source and destination address pointers for the next data movement operation. The arguments ptrhi and ptrlo are each truncated to 32 bits and concatenated to form a 64-bit value, which is then truncated to PLEN.
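The pointer formation described above can be sketched as follows. The default PLEN of 48 bits is an assumption chosen for illustration only; the actual value depends on the system configuration. The function name is hypothetical.

```python
def dma_pointer(ptrhi: int, ptrlo: int, plen: int = 48) -> int:
    """Form the DMA address from the rs2 (ptrhi) and rs1 (ptrlo) values.

    plen=48 is an illustrative assumption, not a value taken from this document.
    """
    mask32 = 0xFFFF_FFFF
    # Truncate each half to 32 bits, then concatenate to a 64-bit value.
    full = ((ptrhi & mask32) << 32) | (ptrlo & mask32)
    # Truncate the result to PLEN bits.
    return full & ((1 << plen) - 1)


if __name__ == "__main__":
    print(hex(dma_pointer(0x1, 0x8000_0000)))
```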

Stride Operations

| funct7 | rs2 | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| 0000110 | dststrd | srcstrd | 000 | 00000 | OP-CUSTOM1 | DMSTR |
| 0000111 | 00000 | reps | 000 | 00000 | OP-CUSTOM1 | DMREP |

DMSTR configures the strides for two-dimensional transfers. The values in registers rs1 and rs2 are sign-extended to PLEN and configured as the source and destination stride, respectively. After each transfer of the innermost dimension, the strides are added to the respective address pointers.

DMREP configures the value in register rs1 as the size of the outer dimension for two-dimensional transfers.
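To make the interaction of strides and repetitions concrete, the following sketch lists the inner transfers that a two-dimensional operation would perform. It is a simplified software model under stated assumptions: strides are given as signed Python integers (standing in for the sign-extended register values), addresses wrap at an assumed PLEN of 48 bits, and size is the innermost transfer length supplied when the copy is started. The function name is hypothetical.

```python
def transfer_2d_rows(src: int, dst: int, size: int,
                     src_stride: int, dst_stride: int,
                     reps: int, plen: int = 48) -> list[tuple[int, int, int]]:
    """List (src_addr, dst_addr, size) for each inner transfer of a 2D copy.

    plen=48 is an illustrative assumption; negative strides are allowed.
    """
    mask = (1 << plen) - 1
    rows = []
    for _ in range(reps):
        rows.append((src, dst, size))
        # After each inner transfer the strides are added to the pointers.
        src = (src + src_stride) & mask
        dst = (dst + dst_stride) & mask
    return rows


if __name__ == "__main__":
    # Gather three 64-byte rows of a 256-byte-wide source into a packed buffer.
    for row in transfer_2d_rows(0x1000, 0x2000, 64, 256, 64, 3):
        print([hex(x) for x in row])
```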

Control Operations

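| funct7 | rs2 | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| 0000011 | config | size | 000 | dest | OP-CUSTOM1 | DMCPY |
| 0000101 | status | 00000 | 000 | dest | OP-CUSTOM1 | DMSTAT |

| funct7 | imm5 | rs1 | funct3 | rd | opcode | operation |
|---|---|---|---|---|---|---|
| 7 | 5 | 5 | 3 | 5 | 7 | |
| 0000010 | config | size | 000 | dest | OP-CUSTOM1 | DMCPYI |
| 0000100 | status | 00000 | 000 | dest | OP-CUSTOM1 | DMSTATI |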

DMCPY and DMCPYI initiate an asynchronous data movement with the parameters configured by the previous DM instructions. A transfer id is placed in register rd; it is needed later to check for transfer completion. size contains the number of consecutive bytes to transfer. For multi-dimensional transfers this is the size of the innermost dimension. config determines the following parameters of the transfer:

| Bits | Value | Description |
|---|---|---|
| config[0] | decouple_rw | Decouple the handshakes of the read and write channels |
| config[1] | enable_2d | Enable two-dimensional transfer |
| config[4:2] | channel_sel | Selects the DMA backend if a multi-channel DMA is used |
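The config fields listed above can be packed into the 5-bit value as in the sketch below. The function name and keyword arguments are hypothetical; only the bit positions come from the table.

```python
def dma_config(decouple_rw: bool = False, enable_2d: bool = False,
               channel_sel: int = 0) -> int:
    """Pack the DMCPY/DMCPYI config fields into a 5-bit value."""
    if not 0 <= channel_sel < 8:
        raise ValueError("channel_sel must fit in 3 bits")
    return int(decouple_rw) | (int(enable_2d) << 1) | (channel_sel << 2)


if __name__ == "__main__":
    # Two-dimensional transfer on backend 0.
    print(bin(dma_config(enable_2d=True)))
```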

DMSTAT and DMSTATI place the selected status flag of the DMA into register rd. The following status flags are supported:

| status | Name | Description |
|---|---|---|
| 0 | completed_id | Id of last completed transfer |
| 1 | next_id | Id allocated to the next transfer |
| 2 | busy | At least one transfer in progress |
| 3 | would_block | Next DMCPY[I] blocks (transfer queue full) |

The DMSTATI instruction can be used to implement a blocking wait for the completion of a specific DMA transfer:

    dmcpyi a0, ...
1:  dmstati t0, 0
    bltu t0, a0, 1b

Similarly, a blocking wait for the completion of all DMA transfers can be implemented as:

1:  dmstati t0, 2
    bnez t0, 1b