Instruction Set Extensions

For efficient execution we have defined a number of custom instructions. This document gives a brief overview of their encoding.

"Xssr" Extension for Stream Semantic Registers

The "Xssr" extension assigns stream semantics to a handful of the processor's registers. If enabled, reading and writing these registers translates into corresponding memory reads and writes. The addresses for these memory accesses are derived from a hardware address generator.

Configuration Register Operations

imm[11:5]	imm[4:0]	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
reg	ssr	00000	001	dest	OP-CUSTOM1	SCFGRI
reg	ssr	value	010	00000	OP-CUSTOM1	SCFGWI

SCFGRI and SCFGWI read and write an SSR configuration register, respectively. The immediate field reg specifies the index of the configuration register, and ssr specifies which SSR is accessed. SCFGRI places the value read in rd. SCFGWI writes the value in rs1 to the selected SSR configuration register.
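To illustrate the encoding table above, the following Python sketch assembles the 32-bit instruction words for SCFGRI and SCFGWI. It assumes the fields sit at the standard RISC-V I-type bit positions and that OP-CUSTOM1 is the standard custom-1 major opcode 0b0101011; the helper names are hypothetical and not part of any Snitch toolchain.

```python
OP_CUSTOM1 = 0b0101011  # assumption: standard RISC-V custom-1 major opcode


def _ssr_imm(reg, ssr):
    """Pack reg into imm[11:5] and ssr into imm[4:0]."""
    if not 0 <= reg < 128:
        raise ValueError("reg must fit in 7 bits")
    if not 0 <= ssr < 32:
        raise ValueError("ssr must fit in 5 bits")
    return (reg << 5) | ssr


def _check_gpr(index):
    """Reject anything that is not a 5-bit register index."""
    if not 0 <= index < 32:
        raise ValueError("register index must fit in 5 bits")
    return index


def encode_scfgri(rd, reg, ssr):
    """SCFGRI: rs1 = 00000, funct3 = 001, the value read goes to rd."""
    return (_ssr_imm(reg, ssr) << 20) | (0b001 << 12) | (_check_gpr(rd) << 7) | OP_CUSTOM1


def encode_scfgwi(rs1, reg, ssr):
    """SCFGWI: rd = 00000, funct3 = 010, the value written comes from rs1."""
    return (_ssr_imm(reg, ssr) << 20) | (_check_gpr(rs1) << 15) | (0b010 << 12) | OP_CUSTOM1
```

For example, `encode_scfgri(rd=10, reg=2, ssr=1)` packs the immediate as `(2 << 5) | 1` and places it in the upper 12 bits of the word.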

funct7	rs2	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
0000000	addr	00001	001	dest	OP-CUSTOM1	SCFGR
0000000	addr	value	010	00001	OP-CUSTOM1	SCFGW

SCFGR and SCFGW read and write an SSR configuration register, respectively. The value in register rs2 specifies the address of the configuration register as follows: bits 4 to 0 correspond to ssr and indicate the SSR to be used, and bits 11 to 5 correspond to reg and indicate the index of the register. SCFGR places the value read in rd. SCFGW writes the value in rs1 to the selected SSR configuration register.
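The address layout expected in rs2 can be sketched with two small Python helpers. The function names are hypothetical, and the sketch assumes that bits above bit 11 of the register value are ignored, which the text above does not state explicitly.

```python
def ssr_cfg_addr(reg, ssr):
    """Build the rs2 value: reg in bits 11:5, ssr in bits 4:0."""
    if not 0 <= reg < 128:
        raise ValueError("reg must fit in 7 bits")
    if not 0 <= ssr < 32:
        raise ValueError("ssr must fit in 5 bits")
    return (reg << 5) | ssr


def split_ssr_cfg_addr(addr):
    """Recover (reg, ssr) from bits 11:0 of an rs2 value.

    Assumption: bits above bit 11 are ignored.
    """
    return (addr >> 5) & 0x7F, addr & 0x1F
```

With these helpers, software would load `ssr_cfg_addr(reg, ssr)` into a register and pass that register as rs2 to SCFGR or SCFGW.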

"Xfrep" Extension for Floating-Point Repetition

The "Xfrep" extension automatically repeats a sequence of instructions without a software-managed loop by issuing the instructions from Snitch's FPU sequencer buffer. This has a number of benefits, including relieving pressure on the I$ bandwidth. Furthermore, after the first iteration the instructions are fetched from the FPU sequencer, which has a lower access energy cost than the L1 I$.

The FREP instruction has the following signature:

imm1	rs1	imm2	imm3	is_outer	opcode	operation
12	5	3	4	1	7
max_inst	max_rpt	stagger_max	stagger_mask	0	OP-CUSTOM1	FREP.I
max_inst	max_rpt	stagger_max	stagger_mask	1	OP-CUSTOM1	FREP.O

FREP.I and FREP.O repeat the max_inst + 1 instructions following the FREP instruction max_rpt + 1 times. The FREP.I instruction (I stands for inner) repeats each instruction the specified number of times before moving on to the next. The FREP.O instruction (O stands for outer) repeats the whole sequence of instructions max_rpt + 1 times. Register staggering can be enabled and configured via the stagger_mask and stagger_max immediates. A detailed explanation of their use can be found in the Snitch paper.
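The difference between the inner and outer variants is easiest to see in the order in which instructions are issued. The following Python sketch models only that order; it ignores register staggering, and the function name and the use of strings for instructions are purely illustrative.

```python
def frep_issue_order(body, max_inst, max_rpt, is_outer):
    """Return the issue order of the instructions covered by an FREP.

    body     -- instructions following the FREP, in program order
    max_inst -- FREP covers max_inst + 1 instructions
    max_rpt  -- each repetition count is max_rpt + 1
    is_outer -- True models FREP.O, False models FREP.I
    """
    block = list(body[: max_inst + 1])
    if len(block) != max_inst + 1:
        raise ValueError("body is shorter than max_inst + 1 instructions")
    reps = max_rpt + 1
    if is_outer:
        # FREP.O: repeat the whole sequence reps times.
        return [inst for _ in range(reps) for inst in block]
    # FREP.I: repeat each instruction reps times, then move to the next.
    return [inst for inst in block for _ in range(reps)]
```

For a two-instruction body `["a", "b"]` with `max_inst=1` and `max_rpt=2`, the outer variant yields `a b a b a b`, while the inner variant yields `a a a b b b`.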

The assembly instruction signature follows:

frep.i   rs1, imm1, imm2, imm3
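As a sketch of how the operands map onto the instruction word, the Python helper below packs the fields in the order and widths given in the FREP table, from the most significant bit down. It assumes the table columns are listed MSB first, as in the other encoding tables, and that OP-CUSTOM1 is the standard RISC-V custom-1 major opcode 0b0101011; `encode_frep` is a hypothetical name.

```python
OP_CUSTOM1 = 0b0101011  # assumption: standard RISC-V custom-1 major opcode


def encode_frep(rs1, max_inst, stagger_max, stagger_mask, is_outer,
                opcode=OP_CUSTOM1):
    """Pack an FREP.I (is_outer=False) or FREP.O (is_outer=True) word."""
    fields = (
        (max_inst, 12),       # imm1
        (rs1, 5),             # register holding max_rpt
        (stagger_max, 3),     # imm2
        (stagger_mask, 4),    # imm3
        (int(is_outer), 1),   # is_outer
        (opcode, 7),          # opcode
    )
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word
```

The field widths sum to 32 bits, so the loop produces exactly one instruction word.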

"Xdma" Extension for Asynchronous Data Movement

The "Xdma" extension provides custom instructions to control an asynchronous data movement engine tightly coupled to the processor core.

Address Operations

funct7	rs2	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
0000000	ptrhi	ptrlo	000	00000	OP-CUSTOM1	DMSRC
0000001	ptrhi	ptrlo	000	00000	OP-CUSTOM1	DMDST

DMSRC and DMDST specify the source and destination address pointers for the next data movement operation. The arguments ptrhi and ptrlo are each truncated to 32 bits and concatenated to form a 64-bit value, with ptrhi as the upper half; the result is then truncated to PLEN.
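The pointer composition described above can be sketched in a few lines of Python. The function name is hypothetical, and the default PLEN of 48 bits is only an example value, not a statement about any particular Snitch configuration.

```python
def dma_pointer(ptrhi, ptrlo, plen=48):
    """Combine two register values into a PLEN-bit DMA address.

    plen=48 is an illustrative default, not a documented value.
    """
    lo = ptrlo & 0xFFFF_FFFF          # truncate ptrlo to 32 bits
    hi = ptrhi & 0xFFFF_FFFF          # truncate ptrhi to 32 bits
    full = (hi << 32) | lo            # concatenate into a 64-bit value
    return full & ((1 << plen) - 1)   # truncate to PLEN
```

On a core with 32-bit registers and addresses that fit in 32 bits, ptrhi would simply be zero.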

Stride Operations

funct7	rs2	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
0000110	dststrd	srcstrd	000	00000	OP-CUSTOM1	DMSTR
0000111	00000	reps	000	00000	OP-CUSTOM1	DMREP

DMSTR configures the strides for two-dimensional transfers. The values in registers rs1 and rs2 are sign-extended to PLEN and configured as the source and destination stride, respectively. After each transfer of the innermost dimension, the strides are added to the respective address pointers.

DMREP configures the value in register rs1 as the size of the outer dimension for two-dimensional transfers.
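A behavioral sketch in Python may help to picture how the strides and the repetition count interact in a two-dimensional transfer. It makes several assumptions that are not stated above: memory is modeled as a flat bytearray, the stride registers are 32 bits wide before sign extension, and the outer dimension is interpreted as the number of inner transfers. All names are hypothetical.

```python
def sign_extend(value, bits=32):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dma_copy_2d(mem, src, dst, size, src_stride, dst_stride, reps):
    """Model a 2D transfer on a bytearray.

    size       -- bytes per inner transfer
    src_stride -- raw source stride register value
    dst_stride -- raw destination stride register value
    reps       -- number of inner transfers (outer dimension)
    """
    src_step = sign_extend(src_stride)
    dst_step = sign_extend(dst_stride)
    for _ in range(reps):
        mem[dst:dst + size] = mem[src:src + size]  # innermost dimension
        src += src_step  # strides are added after each inner transfer
        dst += dst_step
```

For instance, copying 2 bytes per row with a source stride of 4 and a destination stride of 2 gathers a strided column block into a dense buffer.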

Control Operations

funct7	rs2	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
0000011	config	size	000	dest	OP-CUSTOM1	DMCPY
0000101	status	00000	000	dest	OP-CUSTOM1	DMSTAT

funct7	imm5	rs1	funct3	rd	opcode	operation
7	5	5	3	5	7
0000010	config	size	000	dest	OP-CUSTOM1	DMCPYI
0000100	status	00000	000	dest	OP-CUSTOM1	DMSTATI

DMCPY and DMCPYI initiate an asynchronous data movement with the parameters configured by the preceding DM instructions. A transfer id is placed in register rd; it is needed later to check for transfer completion. size contains the number of consecutive bytes to transfer; for multi-dimensional transfers this is the size of the innermost dimension. config determines the following parameters of the transfer:

Bits	Value	Description
config[0]	decouple_rw	Decouple the handshakes of the read and write channels
config[1]	enable_2d	Enable two-dimensional transfer
config[4:2]	channel_sel	Selects the DMA backend if a multi-channel DMA is used
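The config bit layout in the table above can be sketched as a small Python helper that packs the three fields into a 5-bit value. The function name and keyword arguments are hypothetical; only the bit positions come from the table.

```python
def dma_config(decouple_rw=False, enable_2d=False, channel_sel=0):
    """Pack the DMCPY/DMCPYI config field (5 bits)."""
    if not 0 <= channel_sel < 8:
        raise ValueError("channel_sel must fit in 3 bits")
    return (
        int(bool(decouple_rw))            # config[0]
        | (int(bool(enable_2d)) << 1)     # config[1]
        | (channel_sel << 2)              # config[4:2]
    )
```

A two-dimensional transfer on the default backend would therefore use a config value of `dma_config(enable_2d=True)`, which sets only bit 1.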

DMSTAT and DMSTATI place the selected status flag of the DMA into register rd. The following status flags are supported:

status	Name	Description
0	completed_id	Id of last completed transfer
1	next_id	Id allocated to the next transfer
2	busy	At least one transfer in progress
3	would_block	Next DMCPY[I] blocks (transfer queue full)

The DMSTATI instruction can be used to implement a blocking wait for the completion of a specific DMA transfer:

    dmcpyi a0, ...
1:  dmstati t0, 0
    bltu t0, a0, 1b

Similarly, a blocking wait for the completion of all DMA transfers can be implemented as follows:

1:  dmstati t0, 2
    bnez t0, 1b