Vector Dot Product Kernels
Module: Basic Math Functions / Vector Dot Product
Functions
Return type | Function |
---|---|
void | plp_dot_prod_f32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit float vectors kernel for XPULPV2 extension. |
void | plp_dot_prod_f32s_rv32im(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, float32_t *__restrict__ pRes) Glue code for dot product of 32-bit float vectors. |
void | plp_dot_prod_f32s_xpulpv2(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, float32_t *__restrict__ pRes) Glue code for dot product of 32-bit float vectors. |
void | plp_dot_prod_i16s_rv32im(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 16-bit integer vectors kernel for RV32IM extension. |
void | plp_dot_prod_i16s_xpulpv2(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Vectorized dot product of 16-bit integer vectors singlecore kernel for XPULPV2 extension. |
void | plp_dot_prod_i32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension. |
void | plp_dot_prod_i32s_rv32im(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 32-bit integer vectors kernel for RV32IM extension. |
void | plp_dot_prod_i32s_xpulpv2(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension. |
void | plp_dot_prod_i8s_rv32im(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 8-bit integer vectors kernel for RV32IM extension. |
void | plp_dot_prod_i8s_xpulpv2(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Vectorized dot product of 8-bit integer vectors singlecore kernel for XPULPV2 extension. |
void | plp_dot_prod_q16s_rv32im(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension. |
void | plp_dot_prod_q16s_xpulpv2(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Vectorized dot product of 16-bit fixed point vectors singlecore kernel for XPULPV2 extension. |
void | plp_dot_prod_q32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension. |
void | plp_dot_prod_q32s_rv32im(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension. |
void | plp_dot_prod_q32s_xpulpv2(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension. |
void | plp_dot_prod_q8s_rv32im(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension. |
void | plp_dot_prod_q8s_xpulpv2(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 8-bit fixed point vectors singlecore kernel for XPULPV2 extension. |
Detailed Description
Computes the scalar dot product of two vectors. The vectors are multiplied element by element and the products are then summed:
sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]
There are separate functions for floating-point, int8, int16, and int32 data types. For lower-precision integers (int8, int16), functions exploiting SIMD instructions are provided.
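The sum above can be written as a plain C loop. The following is a minimal reference sketch for 32-bit integers, not the library's kernel: the name dot_prod_i32_ref is hypothetical, and the parameter list only mirrors the documented single-core prototypes.

```c
#include <stdint.h>

/* Reference dot product: sum of the element-wise products.
 * dot_prod_i32_ref is a hypothetical name, not a library function. */
void dot_prod_i32_ref(const int32_t *restrict pSrcA,
                      const int32_t *restrict pSrcB,
                      uint32_t blockSize,
                      int32_t *restrict pRes)
{
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        /* 32-bit accumulator: the caller must keep the sum in range */
        sum += pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```

For example, with pSrcA = {1, 2, 3}, pSrcB = {4, 5, 6} and blockSize = 3, the result written to pRes is 1*4 + 2*5 + 3*6 = 32.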
Function names follow the pattern plp_<function name>_<data type><precision><method>_<isa extension> (for example plp_dot_prod_i32s_rv32im), where:
- data type = {f, i, q} for floats, integers, and fixed points, respectively
- precision = {32, 16, 8} bits
- method = {s, p} for a single-core or a parallel multicore implementation, respectively
- isa extension = rv32im, xpulpv2, etc., of which rv32im is the most general one.
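To make the naming scheme concrete, the sketch below splits a kernel name into its fields. It is purely illustrative: parse_kernel_name and kernel_name_t are hypothetical helpers that are not part of the library, and the sketch assumes every name starts with the plp_ prefix seen in the function list.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a kernel name. */
typedef struct {
    char function[32]; /* e.g. "dot_prod" */
    char type;         /* 'f', 'i' or 'q' */
    int precision;     /* 32, 16 or 8 */
    char method;       /* 's' or 'p' */
    char isa[16];      /* e.g. "rv32im" */
} kernel_name_t;

/* Splits plp_<function>_<type><precision><method>_<isa>.
 * Returns 0 on success, -1 if the name does not match the pattern. */
int parse_kernel_name(const char *name, kernel_name_t *out)
{
    if (strncmp(name, "plp_", 4) != 0) return -1;
    const char *body = name + 4;
    const char *isa = strrchr(body, '_');   /* last '_' precedes the ISA */
    if (isa == NULL || isa == body) return -1;

    /* Walk back to the '_' in front of the type/precision/method token. */
    const char *tok = isa - 1;
    while (tok > body && *tok != '_') tok--;
    if (*tok != '_') return -1;

    size_t fn_len = (size_t)(tok - body);
    size_t isa_len = strlen(isa + 1);
    if (fn_len == 0 || fn_len >= sizeof out->function) return -1;
    if (isa_len == 0 || isa_len >= sizeof out->isa) return -1;

    /* The token looks like "i32s": type letter, digits, method letter. */
    int consumed = 0;
    if (sscanf(tok + 1, "%c%d%c%n", &out->type, &out->precision,
               &out->method, &consumed) != 3) return -1;
    if (tok + 1 + consumed != isa) return -1;
    if (strchr("fiq", out->type) == NULL) return -1;
    if (strchr("sp", out->method) == NULL) return -1;
    if (out->precision != 8 && out->precision != 16 && out->precision != 32)
        return -1;

    memcpy(out->function, body, fn_len);
    out->function[fn_len] = '\0';
    memcpy(out->isa, isa + 1, isa_len + 1); /* includes the terminator */
    return 0;
}
```

Applied to plp_dot_prod_i32s_rv32im, the sketch yields function "dot_prod", data type i, precision 32, method s, and ISA extension "rv32im".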
Functions Documentation
function plp_dot_prod_f32p_xpulpv2
void plp_dot_prod_f32p_xpulpv2(
void * S
)
Parallel dot product with interleaved access of 32-bit float vectors kernel for XPULPV2 extension.
Parameters:
- S points to the instance structure for float parallel dot product
Return: none
function plp_dot_prod_f32s_rv32im
void plp_dot_prod_f32s_rv32im(
const float32_t *__restrict__ pSrcA,
const float32_t *__restrict__ pSrcB,
uint32_t blockSize,
float32_t *__restrict__ pRes
)
Glue code for dot product of 32-bit float vectors.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- pRes output result returned here
Return: none
function plp_dot_prod_f32s_xpulpv2
void plp_dot_prod_f32s_xpulpv2(
const float32_t *__restrict__ pSrcA,
const float32_t *__restrict__ pSrcB,
uint32_t blockSize,
float32_t *__restrict__ pRes
)
Glue code for dot product of 32-bit float vectors.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- pRes output result returned here
Return: none
function plp_dot_prod_i16s_rv32im
void plp_dot_prod_i16s_rv32im(
const int16_t *__restrict__ pSrcA,
const int16_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Scalar dot product of 16-bit integer vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector [16 bit]
- pSrcB points to the second input vector [16 bit]
- blockSize number of samples in each vector
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator. RV32IM doesn't support SIMD. For SIMD, see other ISA extensions (e.g. XPULPV2).
function plp_dot_prod_i16s_xpulpv2
void plp_dot_prod_i16s_xpulpv2(
const int16_t *__restrict__ pSrcA,
const int16_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Vectorized dot product of 16-bit integer vectors singlecore kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector [16 bit]
- pSrcB points to the second input vector [16 bit]
- blockSize number of samples in each vector
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
The 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator.
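The packing described above can be emulated in portable C. This is a sketch under stated assumptions, not the XPULPV2 kernel: v2s_emul, sumdotp2_emul and dot_prod_i16_packed_sketch are hypothetical names, the struct stands in for a packed 32-bit vector, and the handling of an odd blockSize is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-bit vector holding two 16-bit lanes (hypothetical). */
typedef struct { int16_t lane[2]; } v2s_emul;

/* Emulates one packed step: both lane products are added to acc.
 * On SIMD hardware this would be a single operation. */
static int32_t sumdotp2_emul(v2s_emul a, v2s_emul b, int32_t acc)
{
    return acc + a.lane[0] * b.lane[0] + a.lane[1] * b.lane[1];
}

void dot_prod_i16_packed_sketch(const int16_t *pSrcA, const int16_t *pSrcB,
                                uint32_t blockSize, int32_t *pRes)
{
    int32_t sum = 0;
    uint32_t pairs = blockSize / 2;

    for (uint32_t i = 0; i < pairs; i++) {
        v2s_emul va, vb;
        /* Load two consecutive 16-bit samples as one 32-bit vector. */
        memcpy(&va, &pSrcA[2 * i], sizeof va);
        memcpy(&vb, &pSrcB[2 * i], sizeof vb);
        sum = sumdotp2_emul(va, vb, sum);
    }
    /* Assumption of this sketch: a leftover sample is handled scalar. */
    if (blockSize & 1u) {
        sum += pSrcA[blockSize - 1] * pSrcB[blockSize - 1];
    }
    *pRes = sum;
}
```

Each loop iteration consumes two samples from each input, so the loop runs blockSize / 2 times instead of blockSize times.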
function plp_dot_prod_i32p_xpulpv2
void plp_dot_prod_i32p_xpulpv2(
void * S
)
Parallel dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension.
Parameters:
- S points to the instance structure for integer parallel dot product
Return: none
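The instance structure is not described on this page, so the following sketch only illustrates what interleaved access means: core c processes elements c, c + nPE, c + 2*nPE, and so on, and the per-core partial sums are added at the end. Everything here is an assumption made for illustration: the structure dot_prod_inst_sketch and its field names are hypothetical, the core index is passed explicitly (a real kernel taking only S would obtain it from the runtime), and the cores are simulated by a sequential loop.

```c
#include <stdint.h>

/* Hypothetical instance structure; the field names are assumptions. */
typedef struct {
    const int32_t *pSrcA;
    const int32_t *pSrcB;
    uint32_t blockSize;  /* total number of samples */
    uint32_t nPE;        /* number of cores sharing the work */
    int32_t *resBuffer;  /* one partial sum per core */
} dot_prod_inst_sketch;

/* Work of a single core: elements core_id, core_id + nPE, ... */
void dot_prod_i32_core_sketch(void *S, uint32_t core_id)
{
    dot_prod_inst_sketch *args = (dot_prod_inst_sketch *)S;
    int32_t sum = 0;

    for (uint32_t i = core_id; i < args->blockSize; i += args->nPE) {
        sum += args->pSrcA[i] * args->pSrcB[i];
    }
    args->resBuffer[core_id] = sum;
}

/* Sequential stand-in for forking the cores, then reducing the results. */
int32_t dot_prod_i32_parallel_sketch(dot_prod_inst_sketch *S)
{
    int32_t total = 0;

    for (uint32_t c = 0; c < S->nPE; c++) {
        dot_prod_i32_core_sketch(S, c);
    }
    for (uint32_t c = 0; c < S->nPE; c++) {
        total += S->resBuffer[c];
    }
    return total;
}
```

With 8 samples and nPE = 3, core 0 handles indices 0, 3, 6; core 1 handles 1, 4, 7; and core 2 handles 2, 5.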
function plp_dot_prod_i32s_rv32im
void plp_dot_prod_i32s_rv32im(
const int32_t *__restrict__ pSrcA,
const int32_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Scalar dot product of 32-bit integer vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- pRes output result returned here
Return: none
function plp_dot_prod_i32s_xpulpv2
void plp_dot_prod_i32s_xpulpv2(
const int32_t *__restrict__ pSrcA,
const int32_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- pRes output result returned here
Return: none
function plp_dot_prod_i8s_rv32im
void plp_dot_prod_i8s_rv32im(
const int8_t *__restrict__ pSrcA,
const int8_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Scalar dot product of 8-bit integer vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector [8 bit]
- pSrcB points to the second input vector [8 bit]
- blockSize number of samples in each vector
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator. RV32IM doesn't support SIMD. For SIMD, see other ISA extensions (e.g. XPULPV2).
function plp_dot_prod_i8s_xpulpv2
void plp_dot_prod_i8s_xpulpv2(
const int8_t *__restrict__ pSrcA,
const int8_t *__restrict__ pSrcB,
uint32_t blockSize,
int32_t *__restrict__ pRes
)
Vectorized dot product of 8-bit integer vectors singlecore kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector [8 bit]
- pSrcB points to the second input vector [8 bit]
- blockSize number of samples in each vector
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
The 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed on the 32-bit vectors and added to a 32-bit accumulator.
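For 8-bit data the same idea applies with four lanes per 32-bit vector. The sketch below is a portable emulation, not the XPULPV2 kernel: v4s_emul, sumdotp4_emul and dot_prod_i8_packed_sketch are hypothetical names, and the scalar loop for the last blockSize % 4 samples is an assumption of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a 32-bit vector holding four 8-bit lanes (hypothetical). */
typedef struct { int8_t lane[4]; } v4s_emul;

/* Emulates one packed step: all four lane products are added to acc. */
static int32_t sumdotp4_emul(v4s_emul a, v4s_emul b, int32_t acc)
{
    for (int k = 0; k < 4; k++) {
        acc += a.lane[k] * b.lane[k];
    }
    return acc;
}

void dot_prod_i8_packed_sketch(const int8_t *pSrcA, const int8_t *pSrcB,
                               uint32_t blockSize, int32_t *pRes)
{
    int32_t sum = 0;
    uint32_t quads = blockSize / 4;

    for (uint32_t i = 0; i < quads; i++) {
        v4s_emul va, vb;
        /* Load four consecutive 8-bit samples as one 32-bit vector. */
        memcpy(&va, &pSrcA[4 * i], sizeof va);
        memcpy(&vb, &pSrcB[4 * i], sizeof vb);
        sum = sumdotp4_emul(va, vb, sum);
    }
    /* Assumption of this sketch: remaining samples are handled scalar. */
    for (uint32_t i = 4 * quads; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```

Even in the extreme case where every sample is -128, each product is 16384, so the 32-bit accumulator has room for many thousands of samples before it can overflow.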
function plp_dot_prod_q16s_rv32im
void plp_dot_prod_q16s_rv32im(
const int16_t *__restrict__ pSrcA,
const int16_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector [16 bit]
- pSrcB points to the second input vector [16 bit]
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator. RV32IM doesn't support SIMD. For SIMD, see other ISA extensions (e.g. XPULPV2).
function plp_dot_prod_q16s_xpulpv2
void plp_dot_prod_q16s_xpulpv2(
const int16_t *__restrict__ pSrcA,
const int16_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Vectorized dot product of 16-bit fixed point vectors singlecore kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector [16 bit]
- pSrcB points to the second input vector [16 bit]
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
The 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator.
function plp_dot_prod_q32p_xpulpv2
void plp_dot_prod_q32p_xpulpv2(
void * S
)
Parallel dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension.
Parameters:
- S points to the instance structure for fixed point parallel dot product
Return: none
function plp_dot_prod_q32s_rv32im
void plp_dot_prod_q32s_rv32im(
const int32_t *__restrict__ pSrcA,
const int32_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here
Return: none
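The role of deciPoint can be illustrated with a small fixed-point sketch. This page does not say where the right shift is applied, so the code below assumes it is applied to each product before accumulation; the actual kernel may shift at a different point. The name dot_prod_q32_sketch is hypothetical, and the 64-bit intermediate product is a choice made here to keep the multiplication from overflowing.

```c
#include <stdint.h>

/* Fixed-point dot product sketch (hypothetical, not the library kernel).
 * Inputs have deciPoint fractional bits; so does the result. */
void dot_prod_q32_sketch(const int32_t *restrict pSrcA,
                         const int32_t *restrict pSrcB,
                         uint32_t blockSize,
                         uint32_t deciPoint, /* must be < 64 here */
                         int32_t *restrict pRes)
{
    int32_t sum = 0;

    for (uint32_t i = 0; i < blockSize; i++) {
        /* The product has 2*deciPoint fractional bits. */
        int64_t prod = (int64_t)pSrcA[i] * pSrcB[i];
        /* Assumption: shift each product back to deciPoint fractional
         * bits. Right-shifting a negative value is implementation-defined
         * in C (arithmetic shift on common compilers). */
        sum += (int32_t)(prod >> deciPoint);
    }
    *pRes = sum;
}
```

With deciPoint = 8, the value 1.5 is stored as 384 and 2.0 as 512. Their product 196608 shifted right by 8 gives 768, which is 3.0 in the same format. Adding 0.5 * 4.0 (128 and 1024, giving 512) yields 1280, that is 5.0.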
function plp_dot_prod_q32s_xpulpv2
void plp_dot_prod_q32s_xpulpv2(
const int32_t *__restrict__ pSrcA,
const int32_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector
- pSrcB points to the second input vector
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here
Return: none
function plp_dot_prod_q8s_rv32im
void plp_dot_prod_q8s_rv32im(
const int8_t *__restrict__ pSrcA,
const int8_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension.
Parameters:
- pSrcA points to the first input vector [8 bit]
- pSrcB points to the second input vector [8 bit]
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed simultaneously on the 32-bit vectors and added to a 32-bit accumulator. RV32IM doesn't support SIMD. For SIMD, see other ISA extensions (e.g. XPULPV2).
function plp_dot_prod_q8s_xpulpv2
void plp_dot_prod_q8s_xpulpv2(
const int8_t *__restrict__ pSrcA,
const int8_t *__restrict__ pSrcB,
uint32_t blockSize,
uint32_t deciPoint,
int32_t *__restrict__ pRes
)
Scalar dot product of 8-bit fixed point vectors singlecore kernel for XPULPV2 extension.
Parameters:
- pSrcA points to the first input vector [8 bit]
- pSrcB points to the second input vector [8 bit]
- blockSize number of samples in each vector
- deciPoint decimal point for right shift
- pRes output result returned here [32 bit]
Return: none
Par: Exploiting SIMD instructions
The 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed on the 32-bit vectors and added to a 32-bit accumulator.
Updated on 2023-03-01 at 16:16:32 +0000