Vector Dot Product Kernels

Module: Basic Math Functions / Vector Dot Product

Functions

	Name
void	plp_dot_prod_f32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit float vectors kernel for XPULPV2 extension.
void	plp_dot_prod_f32s_rv32im(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, float32_t *__restrict__ pRes) Glue code for dot product of 32-bit float vectors.
void	plp_dot_prod_f32s_xpulpv2(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, float32_t *__restrict__ pRes) Glue code for dot product of 32-bit float vectors.
void	plp_dot_prod_i16s_rv32im(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 16-bit integer vectors kernel for RV32IM extension.
void	plp_dot_prod_i16s_xpulpv2(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Vectorized dot product of 16-bit integer vectors singlecore kernel for XPULPV2 extension.
void	plp_dot_prod_i32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension.
void	plp_dot_prod_i32s_rv32im(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 32-bit integer vectors kernel for RV32IM extension.
void	plp_dot_prod_i32s_xpulpv2(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension.
void	plp_dot_prod_i8s_rv32im(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Scalar dot product of 8-bit integer vectors kernel for RV32IM extension.
void	plp_dot_prod_i8s_xpulpv2(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes) Vectorized dot product of 8-bit integer vectors singlecore kernel for XPULPV2 extension.
void	plp_dot_prod_q16s_rv32im(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension.
void	plp_dot_prod_q16s_xpulpv2(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Vectorized dot product of 16-bit fixed point vectors singlecore kernel for XPULPV2 extension.
void	plp_dot_prod_q32p_xpulpv2(void * S) Parallel dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension.
void	plp_dot_prod_q32s_rv32im(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension.
void	plp_dot_prod_q32s_xpulpv2(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension.
void	plp_dot_prod_q8s_rv32im(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension.
void	plp_dot_prod_q8s_xpulpv2(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes) Scalar dot product of 8-bit fixed point vectors singlecore kernel for XPULPV2 extension.

Detailed Description

Computes the scalar dot product of two vectors. The vectors are multiplied element-by-element and the products are summed:

    sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]

There are separate functions for floating-point, integer (int8, int16, int32), and fixed-point data types. For lower-precision integers (int8, int16), functions exploiting SIMD instructions are provided.
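As a plain reference for the summation above, the loop below is a minimal sketch for 32-bit integers. The name dot_prod_i32_ref is hypothetical and not part of the library; it only mirrors the parameter order of the scalar kernels and makes no attempt at loop unrolling or hardware loops.

```c
#include <stdint.h>

/* Reference loop for sum = pSrcA[0]*pSrcB[0] + ... ; hypothetical helper,
   not the library's optimized kernel. */
void dot_prod_i32_ref(const int32_t *restrict pSrcA,
                      const int32_t *restrict pSrcB,
                      uint32_t blockSize,
                      int32_t *restrict pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i]; /* multiply element-by-element, accumulate */
    }
    *pRes = sum; /* result is written through pRes, nothing is returned */
}
```

For pSrcA = {1, 2, 3} and pSrcB = {4, 5, 6} the value written to pRes is 32.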

The function names follow this pattern (for example plp_dot_prod_i32s_rv32im): plp_<function name>_<data type><precision><method>_<isa extension>, with

data type = {f, i, q} respectively for floats, integers, fixed points

precision = {32, 16, 8} bits

method = {s, p} respectively meaning single core or parallel multicore implementation.

isa extension = rv32im, xpulpv2, etc. of which rv32im is the most general one.

Functions Documentation

function plp_dot_prod_f32p_xpulpv2

void plp_dot_prod_f32p_xpulpv2(
    void * S
)

Parallel dot product with interleaved access of 32-bit float vectors kernel for XPULPV2 extension.

Parameters:

S points to the instance structure for float parallel dot product

Return: none
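The sketch below illustrates what interleaved access can look like: core c handles elements c, c + nPE, c + 2*nPE, and so on, and the partial sums are reduced afterwards. The structure dot_prod_args_f32, its field names, and both functions are assumptions made for this illustration; the actual instance structure behind S is not shown on this page and may differ. The cores are simulated with a serial loop here, whereas on a cluster each core would run its share concurrently.

```c
#include <stdint.h>

/* Hypothetical instance layout; the library's real structure may differ. */
typedef struct {
    const float *pSrcA;
    const float *pSrcB;
    uint32_t blockSize; /* total number of samples */
    uint32_t nPE;       /* number of cores sharing the work */
    float *resBuffer;   /* one partial sum per core */
} dot_prod_args_f32;

/* Share of one core: elements core_id, core_id + nPE, core_id + 2*nPE, ... */
void dot_prod_f32_interleaved(const dot_prod_args_f32 *args, uint32_t core_id) {
    float sum = 0.0f;
    for (uint32_t i = core_id; i < args->blockSize; i += args->nPE) {
        sum += args->pSrcA[i] * args->pSrcB[i];
    }
    args->resBuffer[core_id] = sum;
}

/* Serial stand-in for the cluster: run every core's share, then reduce. */
float dot_prod_f32_parallel_sim(const dot_prod_args_f32 *args) {
    float total = 0.0f;
    for (uint32_t c = 0; c < args->nPE; c++) {
        dot_prod_f32_interleaved(args, c);
        total += args->resBuffer[c];
    }
    return total;
}
```

Interleaving keeps neighbouring cores on neighbouring addresses, which is one common way to spread accesses across memory banks; whether that is the motivation in this library is not stated here.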

function plp_dot_prod_f32s_rv32im

void plp_dot_prod_f32s_rv32im(
    const float32_t *__restrict__ pSrcA,
    const float32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    float32_t *__restrict__ pRes
)

Glue code for dot product of 32-bit float vectors.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
pRes output result returned here

Return: none

function plp_dot_prod_f32s_xpulpv2

void plp_dot_prod_f32s_xpulpv2(
    const float32_t *__restrict__ pSrcA,
    const float32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    float32_t *__restrict__ pRes
)

Glue code for dot product of 32-bit float vectors.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
pRes output result returned here

Return: none

function plp_dot_prod_i16s_rv32im

void plp_dot_prod_i16s_rv32im(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Scalar dot product of 16-bit integer vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector [16 bit]
pSrcB points to the second input vector [16 bit]
blockSize number of samples in each vector
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports SIMD, the 16-bit values are packed two by two into 32-bit vectors, and the two multiplications are performed simultaneously on these vectors with a 32-bit accumulator. RV32IM does not support SIMD; for SIMD kernels, see other ISA extensions (e.g. XPULPV2).

function plp_dot_prod_i16s_xpulpv2

void plp_dot_prod_i16s_xpulpv2(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Vectorized dot product of 16-bit integer vectors singlecore kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector [16 bit]
pSrcB points to the second input vector [16 bit]
blockSize number of samples in each vector
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

The 16 bit values are packed two by two into 32 bit vectors and then the two dot products are performed simultaneously on 32 bit vectors, with 32 bit accumulator.
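The packing described above can be emulated in portable C. This is only a sketch: sdotp2_emul and dot_prod_i16_packed are hypothetical names, and the helper stands in for a packed dot-product instruction rather than calling any real XPULPV2 builtin. An odd trailing sample is handled with a scalar multiply, which is an assumption about tail handling, not a description of the library kernel.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates a 2-lane 16-bit multiply-accumulate
   on two packed 32-bit words, with a 32-bit accumulator. */
static int32_t sdotp2_emul(uint32_t a, uint32_t b, int32_t acc) {
    int16_t va[2], vb[2];
    memcpy(va, &a, sizeof va); /* unpack the two 16-bit lanes */
    memcpy(vb, &b, sizeof vb);
    return acc + (int32_t)va[0] * vb[0] + (int32_t)va[1] * vb[1];
}

void dot_prod_i16_packed(const int16_t *pSrcA, const int16_t *pSrcB,
                         uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    uint32_t nPairs = blockSize >> 1; /* full pairs of samples */
    uint32_t i;
    for (i = 0; i < nPairs * 2; i += 2) {
        uint32_t wa, wb;
        memcpy(&wa, &pSrcA[i], sizeof wa); /* two 16-bit samples per word */
        memcpy(&wb, &pSrcB[i], sizeof wb);
        sum = sdotp2_emul(wa, wb, sum);
    }
    if (i < blockSize) { /* odd leftover sample, scalar fallback */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```

Using memcpy for both packing and unpacking keeps the emulation independent of byte order and avoids unaligned or aliasing-unsafe pointer casts.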

function plp_dot_prod_i32p_xpulpv2

void plp_dot_prod_i32p_xpulpv2(
    void * S
)

Parallel dot product with interleaved access of 32-bit integer vectors kernel for XPULPV2 extension.

Parameters:

S points to the instance structure for integer parallel dot product

Return: none

function plp_dot_prod_i32s_rv32im

void plp_dot_prod_i32s_rv32im(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Scalar dot product of 32-bit integer vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
pRes output result returned here

Return: none

function plp_dot_prod_i32s_xpulpv2

void plp_dot_prod_i32s_xpulpv2(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Scalar dot product of 32-bit integer vectors kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
pRes output result returned here

Return: none

function plp_dot_prod_i8s_rv32im

void plp_dot_prod_i8s_rv32im(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Scalar dot product of 8-bit integer vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector [8 bit]
pSrcB points to the second input vector [8 bit]
blockSize number of samples in each vector
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports SIMD, the 8-bit values are packed four by four into 32-bit vectors, and the four multiplications are performed simultaneously on these vectors with a 32-bit accumulator. RV32IM does not support SIMD; for SIMD kernels, see other ISA extensions (e.g. XPULPV2).

function plp_dot_prod_i8s_xpulpv2

void plp_dot_prod_i8s_xpulpv2(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Vectorized dot product of 8-bit integer vectors singlecore kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector [8 bit]
pSrcB points to the second input vector [8 bit]
blockSize number of samples in each vector
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

The 8 bit values are packed four by four into 32 bit vectors and then the four dot products are performed on 32 bit vectors, with 32 bit accumulator.
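For 8-bit data the same idea uses four lanes per 32-bit word. The sketch below is a portable emulation under stated assumptions: sdotp4_emul and dot_prod_i8_packed are hypothetical names, no real SIMD builtin is called, and the up to three leftover samples are processed with a scalar loop.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: emulates a 4-lane 8-bit multiply-accumulate
   on two packed 32-bit words, with a 32-bit accumulator. */
static int32_t sdotp4_emul(uint32_t a, uint32_t b, int32_t acc) {
    int8_t va[4], vb[4];
    memcpy(va, &a, sizeof va); /* unpack the four 8-bit lanes */
    memcpy(vb, &b, sizeof vb);
    for (int k = 0; k < 4; k++) {
        acc += (int32_t)va[k] * vb[k];
    }
    return acc;
}

void dot_prod_i8_packed(const int8_t *pSrcA, const int8_t *pSrcB,
                        uint32_t blockSize, int32_t *pRes) {
    int32_t sum = 0;
    uint32_t nWords = blockSize >> 2; /* full groups of four samples */
    uint32_t i;
    for (i = 0; i < nWords * 4; i += 4) {
        uint32_t wa, wb;
        memcpy(&wa, &pSrcA[i], sizeof wa); /* four 8-bit samples per word */
        memcpy(&wb, &pSrcB[i], sizeof wb);
        sum = sdotp4_emul(wa, wb, sum);
    }
    for (; i < blockSize; i++) { /* 0 to 3 leftover samples */
        sum += (int32_t)pSrcA[i] * pSrcB[i];
    }
    *pRes = sum;
}
```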

function plp_dot_prod_q16s_rv32im

void plp_dot_prod_q16s_rv32im(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Scalar dot product of 16-bit fixed point vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector [16 bit]
pSrcB points to the second input vector [16 bit]
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports SIMD, the 16-bit values are packed two by two into 32-bit vectors, and the two multiplications are performed simultaneously on these vectors with a 32-bit accumulator. RV32IM does not support SIMD; for SIMD kernels, see other ISA extensions (e.g. XPULPV2).

function plp_dot_prod_q16s_xpulpv2

void plp_dot_prod_q16s_xpulpv2(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Vectorized dot product of 16-bit fixed point vectors singlecore kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector [16 bit]
pSrcB points to the second input vector [16 bit]
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

The 16 bit values are packed two by two into 32 bit vectors and then the two dot products are performed simultaneously on 32 bit vectors, with 32 bit accumulator.

function plp_dot_prod_q32p_xpulpv2

void plp_dot_prod_q32p_xpulpv2(
    void * S
)

Parallel dot product with interleaved access of 32-bit fixed point vectors kernel for XPULPV2 extension.

Parameters:

S points to the instance structure for fixed point parallel dot product

Return: none

function plp_dot_prod_q32s_rv32im

void plp_dot_prod_q32s_rv32im(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Scalar dot product of 32-bit fixed point vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here

Return: none
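To show how deciPoint might enter the computation, here is a minimal fixed-point sketch. It assumes that each product is shifted right by deciPoint before being accumulated; this page does not say whether the kernel shifts each product or the final sum, so treat the placement as an assumption. The name dot_prod_q32_ref and the 64-bit intermediate product are choices made for this illustration, not taken from the library.

```c
#include <stdint.h>

/* Hypothetical reference: fixed-point dot product with deciPoint
   fractional bits. Assumes the shift is applied per product. */
void dot_prod_q32_ref(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t blockSize, uint32_t deciPoint,
                      int32_t *pRes) {
    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        /* Widen so the raw product cannot overflow before the shift. */
        int64_t prod = (int64_t)pSrcA[i] * pSrcB[i];
        /* Rescale back to deciPoint fractional bits, then accumulate. */
        sum += (int32_t)(prod >> deciPoint);
    }
    *pRes = sum;
}
```

With deciPoint = 8, the inputs {384, 128} and {512, 1024} represent {1.5, 0.5} and {2.0, 4.0}; the result is 1280, which is 5.0 in the same format. Right-shifting negative values is implementation-defined in C, so negative products depend on the compiler performing an arithmetic shift.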

function plp_dot_prod_q32s_xpulpv2

void plp_dot_prod_q32s_xpulpv2(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Scalar dot product of 32-bit fixed point vectors kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector
pSrcB points to the second input vector
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here

Return: none

function plp_dot_prod_q8s_rv32im

void plp_dot_prod_q8s_rv32im(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Scalar dot product of 8-bit fixed point vectors kernel for RV32IM extension.

Parameters:

pSrcA points to the first input vector [8 bit]
pSrcB points to the second input vector [8 bit]
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports SIMD, the 8-bit values are packed four by four into 32-bit vectors, and the four multiplications are performed simultaneously on these vectors with a 32-bit accumulator. RV32IM does not support SIMD; for SIMD kernels, see other ISA extensions (e.g. XPULPV2).

function plp_dot_prod_q8s_xpulpv2

void plp_dot_prod_q8s_xpulpv2(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Scalar dot product of 8-bit fixed point vectors singlecore kernel for XPULPV2 extension.

Parameters:

pSrcA points to the first input vector [8 bit]
pSrcB points to the second input vector [8 bit]
blockSize number of samples in each vector
deciPoint decimal point for right shift
pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

The 8 bit values are packed four by four into 32 bit vectors and then the four dot products are performed on 32 bit vectors, with 32 bit accumulator.

Updated on 2023-03-01 at 16:16:32 +0000