Vector Dot Product

Module: Basic Math Functions

Modules

Name
Vector Dot Product Kernels

Functions

Name
void plp_dot_prod_f32(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, float32_t *__restrict__ pRes)
Glue code for dot product of 32-bit float vectors.
void plp_dot_prod_f32_parallel(const float32_t *__restrict__ pSrcA, const float32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t nPE, float32_t *__restrict__ pRes)
Glue code for parallel dot product of 32-bit float vectors.
void plp_dot_prod_i16(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
Glue code for dot product of 16-bit integer vectors.
void plp_dot_prod_i32(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
Glue code for dot product of 32-bit integer vectors.
void plp_dot_prod_i32_parallel(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t nPE, int32_t *__restrict__ pRes)
Glue code for parallel dot product of 32-bit integer vectors.
void plp_dot_prod_i8(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, int32_t *__restrict__ pRes)
Glue code for dot product of 8-bit integer vectors.
void plp_dot_prod_q16(const int16_t *__restrict__ pSrcA, const int16_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
Glue code for dot product of 16-bit fixed point vectors.
void plp_dot_prod_q32(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
Glue code for dot product of 32-bit fixed point vectors.
void plp_dot_prod_q32_parallel(const int32_t *__restrict__ pSrcA, const int32_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, uint32_t nPE, int32_t *__restrict__ pRes)
Glue code for parallel dot product of 32-bit fixed point vectors.
void plp_dot_prod_q8(const int8_t *__restrict__ pSrcA, const int8_t *__restrict__ pSrcB, uint32_t blockSize, uint32_t deciPoint, int32_t *__restrict__ pRes)
Glue code for dot product of 8-bit fixed point vectors.

Detailed Description

This module contains the glue code for Vector Dot Product. The kernels are in the module Vector Dot Product Kernels.

The Vector Dot Product computes the dot product of two vectors. The vectors are multiplied element by element and the products are summed:

    sum = pSrcA[0]*pSrcB[0] + pSrcA[1]*pSrcB[1] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]

There are separate functions for floating-point, integer, and fixed-point data types at 32-, 16-, and 8-bit precision. For lower-precision integers (16- and 8-bit), functions exploiting SIMD instructions are provided.
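The sum above can be written as a plain scalar loop. The following is a minimal reference sketch for 32-bit integers; ref_dot_prod_i32 is a hypothetical name that only mirrors the documented parameter list, it is not the library's kernel and does no overflow handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference implementation, not a library function.
   Computes sum = pSrcA[0]*pSrcB[0] + ... + pSrcA[blockSize-1]*pSrcB[blockSize-1]. */
void ref_dot_prod_i32(const int32_t *pSrcA, const int32_t *pSrcB,
                      uint32_t blockSize, int32_t *pRes)
{
    assert(pSrcA && pSrcB && pRes);

    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        sum += pSrcA[i] * pSrcB[i]; /* multiply element by element, then accumulate */
    }
    *pRes = sum; /* the result is returned through pRes, as in the documented API */
}
```

For example, with inputs {1, 2, 3, 4} and {5, 6, 7, 8} the value stored in pRes is 1*5 + 2*6 + 3*7 + 4*8 = 70.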

The function names follow this pattern (for example plp_dot_prod_i32s):

    plp_<function name>_<data type><precision><method>_<isa extension>

with

data type = {f, i, q} respectively for floats, integers, fixed points

precision = {32, 16, 8} bits

method = {s, p} respectively meaning single core or parallel multicore implementation.

isa extension = rv32im, xpulpv2, etc. of which rv32im is the most general one.
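As an illustration of the naming scheme, the sketch below assembles a name from its parts. It assumes the pattern is plp_<function name>_<data type><precision><method>_<isa extension>; compose_kernel_name is a hypothetical helper, not part of the library, and a composed string is no guarantee that a function of that name exists.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: builds a name following the assumed pattern
   plp_<function name>_<data type><precision><method>_<isa extension>.
   Returns the value of snprintf (length of the full name). */
int compose_kernel_name(char *out, size_t outSize, const char *function,
                        char dataType, unsigned precision, char method,
                        const char *isa)
{
    assert(out && function && isa);
    return snprintf(out, outSize, "plp_%s_%c%u%c_%s",
                    function, dataType, precision, method, isa);
}
```

Calling it with "dot_prod", 'i', 32, 's' and "rv32im" yields the string plp_dot_prod_i32s_rv32im, that is, the example name from the text followed by an ISA suffix.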

Functions Documentation

function plp_dot_prod_f32

void plp_dot_prod_f32(
    const float32_t *__restrict__ pSrcA,
    const float32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    float32_t *__restrict__ pRes
)

Glue code for dot product of 32-bit float vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • pRes output result returned here

Return: none

function plp_dot_prod_f32_parallel

void plp_dot_prod_f32_parallel(
    const float32_t *__restrict__ pSrcA,
    const float32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t nPE,
    float32_t *__restrict__ pRes
)

Glue code for parallel dot product of 32-bit float vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • nPE number of parallel processing units
  • pRes output result returned here

Return: none

function plp_dot_prod_i16

void plp_dot_prod_i16(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 16-bit integer vectors.

Parameters:

  • pSrcA points to the first input vector [16 bit]
  • pSrcB points to the second input vector [16 bit]
  • blockSize number of samples in each vector
  • pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on these 32-bit vectors and added to a 32-bit accumulator.
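The packing described above can be emulated in portable C. This is only a sketch of the idea: sumdot2_emul and sketch_dot_prod_i16 are hypothetical names, and on real hardware the body of sumdot2_emul would correspond to a single SIMD instruction rather than two scalar multiplications.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical emulation of one packed step: each 32-bit word holds two
   16-bit lanes; both lane products are added to a 32-bit accumulator. */
static int32_t sumdot2_emul(uint32_t a, uint32_t b, int32_t acc)
{
    int16_t la[2], lb[2];
    memcpy(la, &a, sizeof la); /* unpack the two 16-bit lanes */
    memcpy(lb, &b, sizeof lb);
    return acc + (int32_t)la[0] * lb[0] + (int32_t)la[1] * lb[1];
}

/* Hypothetical sketch, not the library kernel. */
void sketch_dot_prod_i16(const int16_t *pSrcA, const int16_t *pSrcB,
                         uint32_t blockSize, int32_t *pRes)
{
    assert(pSrcA && pSrcB && pRes);

    int32_t sum = 0;
    uint32_t i = 0;
    for (; blockSize - i >= 2; i += 2) {
        uint32_t a, b;
        memcpy(&a, &pSrcA[i], sizeof a); /* pack two 16-bit values into one word */
        memcpy(&b, &pSrcB[i], sizeof b);
        sum = sumdot2_emul(a, b, sum);
    }
    if (i < blockSize) {
        sum += (int32_t)pSrcA[i] * pSrcB[i]; /* leftover element for odd blockSize */
    }
    *pRes = sum;
}
```

Because packing and unpacking both go through memcpy, the emulation gives the same result regardless of byte order.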

function plp_dot_prod_i32

void plp_dot_prod_i32(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 32-bit integer vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • pRes output result returned here

Return: none

function plp_dot_prod_i32_parallel

void plp_dot_prod_i32_parallel(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t nPE,
    int32_t *__restrict__ pRes
)

Glue code for parallel dot product of 32-bit integer vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • nPE number of parallel processing units
  • pRes output result returned here

Return: none
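To give an idea of what nPE means for the parallel variants, the sketch below splits the vectors into nPE contiguous chunks and sums one partial result per chunk. This is an assumption about the general approach, not the library's actual partitioning, and sketch_dot_prod_i32_split is a hypothetical name. The outer loop runs sequentially here; on a multicore cluster each iteration would run on its own core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: each iteration of the outer loop stands in for one
   of the nPE processing units working on its own chunk. */
void sketch_dot_prod_i32_split(const int32_t *pSrcA, const int32_t *pSrcB,
                               uint32_t blockSize, uint32_t nPE, int32_t *pRes)
{
    assert(pSrcA && pSrcB && pRes && nPE > 0);

    uint32_t chunk = (blockSize + nPE - 1) / nPE; /* ceil(blockSize / nPE) */
    int32_t total = 0;

    for (uint32_t core = 0; core < nPE; core++) {
        uint32_t start = core * chunk;
        uint32_t end = start + chunk;
        if (start > blockSize) start = blockSize; /* cores without work get an empty range */
        if (end > blockSize) end = blockSize;

        int32_t partial = 0;
        for (uint32_t i = start; i < end; i++) {
            partial += pSrcA[i] * pSrcB[i];
        }
        total += partial; /* combine the per-core partial sums */
    }
    *pRes = total;
}
```

The result does not depend on nPE for integer data; with floating-point data, a different summation order can change the rounding of the result.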

function plp_dot_prod_i8

void plp_dot_prod_i8(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 8-bit integer vectors.

Parameters:

  • pSrcA points to the first input vector [8 bit]
  • pSrcB points to the second input vector [8 bit]
  • blockSize number of samples in each vector
  • pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed simultaneously on these 32-bit vectors and added to a 32-bit accumulator.
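Since the inputs are narrow but the accumulator is 32 bits wide, it can be useful to know how many worst-case products fit before the accumulator could overflow. The helper below is a hypothetical back-of-the-envelope calculation, not a library function; it assumes signed inputs and takes the largest possible product magnitude, (-2^(bits-1))^2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest number of worst-case products of signed
   `bits`-bit values whose sum is guaranteed to fit in an int32_t. */
uint32_t max_safe_block_size(unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    int64_t worst = (int64_t)1 << (2 * (bits - 1)); /* (-2^(bits-1))^2 */
    return (uint32_t)(INT32_MAX / worst);
}
```

For 8-bit inputs the worst-case product is 2^14 = 16384, so up to 131071 elements are safe in the worst case; for 16-bit inputs the worst-case product is 2^30, which leaves room for only one such product.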

function plp_dot_prod_q16

void plp_dot_prod_q16(
    const int16_t *__restrict__ pSrcA,
    const int16_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 16-bit fixed point vectors.

Parameters:

  • pSrcA points to the first input vector [16 bit]
  • pSrcB points to the second input vector [16 bit]
  • blockSize number of samples in each vector
  • deciPoint position of the decimal point, used as the right-shift amount
  • pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports it, the 16-bit values are packed two by two into 32-bit vectors, and the two products are then computed simultaneously on these 32-bit vectors and added to a 32-bit accumulator.
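The role of deciPoint can be illustrated with a scalar fixed-point loop. The sketch below assumes that each product is shifted right by deciPoint before it is accumulated; the actual kernels may instead shift the accumulated sum, which rounds differently. sketch_dot_prod_q16 is a hypothetical name, not the library function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, assuming a per-product right shift by deciPoint.
   Right-shifting a negative value is implementation-defined in C; common
   compilers perform an arithmetic shift. */
void sketch_dot_prod_q16(const int16_t *pSrcA, const int16_t *pSrcB,
                         uint32_t blockSize, uint32_t deciPoint, int32_t *pRes)
{
    assert(pSrcA && pSrcB && pRes && deciPoint < 32);

    int32_t sum = 0;
    for (uint32_t i = 0; i < blockSize; i++) {
        int32_t prod = (int32_t)pSrcA[i] * pSrcB[i]; /* has 2*deciPoint fractional bits */
        sum += prod >> deciPoint;                    /* back to deciPoint fractional bits */
    }
    *pRes = sum;
}
```

With deciPoint = 8, the values 1.5, 0.5, 2.0 and 4.0 are stored as 384, 128, 512 and 1024. The dot product of {384, 128} and {512, 1024} is then 768 + 512 = 1280, which represents 1.5*2.0 + 0.5*4.0 = 5.0.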

function plp_dot_prod_q32

void plp_dot_prod_q32(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 32-bit fixed point vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • deciPoint position of the decimal point, used as the right-shift amount
  • pRes output result returned here

Return: none

function plp_dot_prod_q32_parallel

void plp_dot_prod_q32_parallel(
    const int32_t *__restrict__ pSrcA,
    const int32_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    uint32_t nPE,
    int32_t *__restrict__ pRes
)

Glue code for parallel dot product of 32-bit fixed point vectors.

Parameters:

  • pSrcA points to the first input vector
  • pSrcB points to the second input vector
  • blockSize number of samples in each vector
  • deciPoint position of the decimal point, used as the right-shift amount
  • nPE number of parallel processing units
  • pRes output result returned here

Return: none

function plp_dot_prod_q8

void plp_dot_prod_q8(
    const int8_t *__restrict__ pSrcA,
    const int8_t *__restrict__ pSrcB,
    uint32_t blockSize,
    uint32_t deciPoint,
    int32_t *__restrict__ pRes
)

Glue code for dot product of 8-bit fixed point vectors.

Parameters:

  • pSrcA points to the first input vector [8 bit]
  • pSrcB points to the second input vector [8 bit]
  • blockSize number of samples in each vector
  • deciPoint position of the decimal point, used as the right-shift amount
  • pRes output result returned here [32 bit]

Return: none

Par: Exploiting SIMD instructions

When the ISA supports it, the 8-bit values are packed four by four into 32-bit vectors, and the four products are then computed simultaneously on these 32-bit vectors and added to a 32-bit accumulator.


Updated on 2023-03-01 at 16:16:32 +0000