init

2025-12-11 09:43:42 +08:00
commit d8b2974133
1822 changed files with 280037 additions and 0 deletions
--- a/lib_xcore_math/doc/rst/lib_xcore_math.rst
+++ b/lib_xcore_math/doc/rst/lib_xcore_math.rst
@@ -0,0 +1,18 @@
+
+####################################
+lib_xcore_math: xcore optimised math
+####################################
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   src/introduction
+
+   src/getting_started
+   src/bfp_background
+   src/reference/reference_index
+   src/examples
+   src/tests
+
+
--- a/lib_xcore_math/doc/rst/src/bfp_background.rst
+++ b/lib_xcore_math/doc/rst/src/bfp_background.rst
@@ -0,0 +1,117 @@
+.. _bfp_background:
+
+*******************************
+Block Floating-Point background
+*******************************
+
+Block Floating-Point vectors
+============================
+
+A standard (IEEE) floating-point object can exist either as a scalar, e.g.
+
+
+.. code-block:: c
+
+    //Single IEEE floating-point variable
+    float foo;
+
+
+or as a vector, e.g.
+
+.. code-block:: c
+
+    //Array of IEEE floating-point variables
+    float foo[20];
+
+
+Standard floating-point values carry both a mantissa :math:`m` and an exponent :math:`p`, such that
+the logical value represented by such a variable is :math:`m\cdot2^p`. When you have a vector of
+standard floating-point values, each element of the vector carries its own mantissa and its own
+exponent: :math:`m[k]\cdot2^{p[k]}`.
+
+.. image:: images/bfp_bg_fig1.png
+
+By contrast, block floating-point objects have a vector of mantissas :math:`\bar{m}` which all share
+the same exponent :math:`p`, such that the logical value of the element at index :math:`k` is
+:math:`m[k]\cdot2^p`.
+
+.. code-block:: c
+
+    struct {
+        // Array of mantissas
+        int32_t mant[20];
+        // Shared exponent
+        int32_t exp;
+    } bfp_vect;
+
+.. image:: images/bfp_bg_fig2.png
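The relationship between mantissas, shared exponent and logical values can be sketched in plain C. This is only an illustration: ``bfp_vect_sketch_t``, ``VECT_LEN``, ``scale_by_pow2()`` and ``bfp_logical_value()`` are hypothetical names invented here and are not part of ``lib_xcore_math``.

```c
#include <stdint.h>

#define VECT_LEN 4  /* hypothetical length, chosen only for illustration */

/* Same idea as the struct above: a mantissa array plus one shared exponent. */
typedef struct {
    int32_t mant[VECT_LEN];
    int32_t exp;
} bfp_vect_sketch_t;

/* Multiply x by 2^p using exact repeated doubling or halving. */
static double scale_by_pow2(double x, int32_t p)
{
    for (; p > 0; p--) x *= 2.0;
    for (; p < 0; p++) x *= 0.5;
    return x;
}

/* Logical value of element k: mant[k] * 2^exp. */
static double bfp_logical_value(const bfp_vect_sketch_t *v, unsigned k)
{
    return scale_by_pow2((double)v->mant[k], v->exp);
}
```

With mantissas ``{1, -3, 255, 1024}`` and an exponent of ``-2``, the logical values are 0.25, -0.75, 63.75 and 256.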
+
+
+.. _headroom_intro:
+
+Headroom
+========
+
+With a given exponent, :math:`p`, the largest value that can be represented by a 32-bit BFP vector
+is given by a maximal mantissa (:math:`2^{31}-1`), for a logical value of
+:math:`(2^{31}-1)\cdot2^p`. The smallest non-zero value that an element can represent is
+:math:`1\cdot2^p`.
+
+Because all elements must share a single exponent, in order to avoid overflow or saturation of the
+largest magnitude values, the exponent of a BFP vector is constrained by the element with the
+largest (logical) value. The drawback to this is that when the elements of a BFP vector represent a
+large dynamic range -- that is, where the largest magnitude element is many, many times larger than
+the smallest (non-zero) magnitude element -- the smaller magnitude elements effectively have fewer
+bits of precision.
+
+Consider a 2-element BFP vector intended to carry the values :math:`2^{20}` and :math:`255 \cdot
+2^{-10}`. One way this vector can be represented is to use an exponent of :math:`0`.
+
+.. code-block:: c
+
+    struct {
+        int32_t mant[2];
+        int32_t exp;
+    } vect = { { (1<<20), (0xFF >> 10) }, 0 };
+
+.. image:: images/bfp_bg_fig3.png
+
+In the diagram above, the fractional bits (shown in red text) are discarded, as the mantissa is only
+32 bits. Then, with :math:`0` as the exponent, ``mant[1]`` underflows to :math:`0`. Meanwhile, the
+11 most significant bits of ``mant[0]`` are all zeros.
+
+The headroom of a signed integer is the number of *redundant* leading sign bits. Equivalently, it is
+the number of bits that a mantissa can be left-shifted without losing any information. In the
+diagram, the bits corresponding to headroom are shown in green text. Here ``mant[0]`` has 10 bits of
+headroom and ``mant[1]`` has a full 32 bits of headroom. (``mant[0]`` does not have 11 bits of
+headroom because in two's complement the MSb serves as a sign bit.) The headroom for a BFP vector is
+the *minimum* of the headroom of its elements; in this case, 10 bits.
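A minimal sketch of how headroom could be counted in portable C follows. The function names are hypothetical, and reporting 32 bits for a zero mantissa is an assumption made to match the text above; a real implementation may use a dedicated instruction and a different convention for zero.

```c
#include <stdint.h>

/* Headroom of one 32-bit mantissa: the number of redundant leading sign bits.
   Zero is reported as 32 to match the text above (an assumption of this sketch). */
static unsigned headroom_s32_sketch(int32_t x)
{
    if (x == 0) return 32;
    /* For negative values, inverting the bits turns leading ones into leading zeros. */
    uint32_t u = (x < 0) ? ~(uint32_t)x : (uint32_t)x;
    unsigned hr = 0;
    /* Bit 31 is the sign bit itself, so count zeros downwards from bit 30. */
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == 0; bit--) hr++;
    return hr;
}

/* Headroom of a vector: the minimum headroom over all of its elements. */
static unsigned vect_headroom_s32_sketch(const int32_t mant[], unsigned length)
{
    unsigned hr = 32;
    for (unsigned k = 0; k < length; k++) {
        unsigned h = headroom_s32_sketch(mant[k]);
        if (h < hr) hr = h;
    }
    return hr;
}
```

Applied to the mantissas ``{ (1<<20), 0 }`` from the example, the per-element results are 10 and 32, so the vector headroom is 10.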
+
+If we remove headroom from one mantissa of a BFP vector, all other mantissas must shift by the same
+number of bits, and the vector's exponent must be adjusted accordingly. A left-shift of one bit
+corresponds to reducing the exponent by 1, because a single bit left-shift corresponds to
+multiplication by 2.
+
+In this case, if we remove 10 bits of headroom and subtract 10 from the exponent we get the
+following:
+
+.. code-block:: c
+
+    struct {
+        int32_t mant[2];
+        int32_t exp;
+    } vect = { { (1<<30), (0xFF >> 0) }, -10 };
+
+.. image:: images/bfp_bg_fig4.png
+
+Now, no information is lost in either element. One of the main goals of BFP arithmetic is to keep
+the headroom in BFP vectors to the minimum necessary (equivalently, keeping the exponent as small as
+possible). That allows for maximum effective precision of the elements in the vector.
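The adjustment described above (shift every mantissa left, lower the exponent by the same amount) can be sketched as follows. ``vect2_sketch_t`` and ``remove_headroom_sketch()`` are hypothetical names, and the caller is assumed to pass a shift no larger than the vector's headroom. Bits that were already discarded before the shift cannot be recovered this way.

```c
#include <stdint.h>

/* Hypothetical 2-element BFP vector, shaped like the example above. */
typedef struct {
    int32_t mant[2];
    int32_t exp;
} vect2_sketch_t;

/* Left-shift every mantissa by `shl` bits and lower the exponent to compensate,
   so that the logical values mant[k] * 2^exp are unchanged.
   Assumes shl is less than 32 and does not exceed the vector's headroom. */
static void remove_headroom_sketch(vect2_sketch_t *v, unsigned shl)
{
    for (unsigned k = 0; k < 2; k++) {
        /* Shift via uint32_t so that negative mantissas avoid undefined behaviour. */
        v->mant[k] = (int32_t)((uint32_t)v->mant[k] << shl);
    }
    v->exp -= (int32_t)shl;
}
```

Starting from mantissas ``{ (1<<20), 3 }`` with exponent ``0``, removing 10 bits of headroom gives ``{ (1<<30), 3072 }`` with exponent ``-10``.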
+
+Note that the headroom of a vector also tells you something about the size of the largest magnitude
+mantissa in the vector. That information (in conjunction with exponents) can be used to determine
+the largest possible output of an operation without having to look at the mantissas.
+
+For this reason, the BFP vectors in ``lib_xcore_math`` carry a field which tracks their current
+headroom. The functions in the BFP API use this property to make determinations about how best to
+preserve precision.
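As a rough illustration of why headroom is useful to track: a 32-bit mantissa with ``hr`` bits of headroom satisfies :math:`|m| \le 2^{31-hr}`, so every element of the vector has magnitude at most :math:`2^{31-hr+p}`. The sketch below uses hypothetical function names and is not taken from the library; it only shows how such a bound can be derived without reading any mantissa.

```c
#include <stdint.h>

/* log2 of an upper bound on |mant * 2^exp| for 32-bit mantissas with `hr` bits
   of headroom: |mant| <= 2^(31 - hr), hence |value| <= 2^(31 - hr + exp). */
static int32_t log2_bound_s32_sketch(int32_t exp, unsigned hr)
{
    return 31 - (int32_t)hr + exp;
}

/* Bound on an element-wise product of two vectors: the log2 bounds simply add. */
static int32_t log2_product_bound_sketch(int32_t b_exp, unsigned b_hr,
                                         int32_t c_exp, unsigned c_hr)
{
    return log2_bound_s32_sketch(b_exp, b_hr) + log2_bound_s32_sketch(c_exp, c_hr);
}
```

For the example vector, exponent ``0`` with 10 bits of headroom and exponent ``-10`` with no headroom both give the same bound of :math:`2^{21}`, as expected since they describe the same logical values.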
--- a/lib_xcore_math/doc/rst/src/examples.rst
+++ b/lib_xcore_math/doc/rst/src/examples.rst
@@ -0,0 +1,105 @@
+.. _examples:
+
+********************
+Example Applications
+********************
+
+Several example applications are offered to demonstrate use of the ``lib_xcore_math`` APIs through
+simple code examples.
+
+  * ``app_bfp_demo`` - Demonstration of the block floating-point arithmetic API
+  * ``app_vect_demo`` - Demonstration of the low-level vectorized arithmetic API
+  * ``app_fft_demo`` - Demonstration of the Fast Fourier Transform API
+  * ``app_filter_demo`` - Demonstration of the filtering API
+
+This section assumes you have downloaded and installed the `XMOS XTC tools <https://www.xmos.com/software-tools/>`_
+(see `README` for required version).
+Installation instructions can be found `here <https://xmos.com/xtc-install-guide>`_.
+
+Particular attention should be paid to the section `Installation of required third-party tools
+<https://www.xmos.com/documentation/XM-014363-PC-10/html/installation/install-configure/install-tools/install_prerequisites.html>`_.
+
+The application examples use the `xcommon-cmake <https://www.xmos.com/file/xcommon-cmake-documentation/?version=latest>`_
+build system as bundled with the XTC tools.
+
+Building Examples
+=================
+
+To build the applications, from an XTC command prompt run the following commands in the
+`lib_xcore_math/examples` directory::
+
+    cmake -B build -G "Unix Makefiles"
+    xmake -C build
+
+Individual examples can be built using a command similar to the following::
+
+    xmake -C build EXAMPLE_NAME
+
+where ``EXAMPLE_NAME`` is the example to build.
+
+Running Examples
+================
+
+Once built, the example ``EXAMPLE_NAME`` can be run on the `XK-EVK-XU316` board using the following
+command::
+
+    xrun --xscope examples/EXAMPLE_NAME/bin/EXAMPLE_NAME.xe
+
+For instance, to run the ``app_bfp_demo`` example, use::
+
+    xrun --xscope examples/app_bfp_demo/bin/app_bfp_demo.xe
+
+To run the example using the ``xcore`` simulator instead, use::
+
+    xsim examples/EXAMPLE_NAME/bin/EXAMPLE_NAME.xe
+
+app_bfp_demo
+=============
+
+The purpose of this example application is to demonstrate how the arithmetic functions of
+``lib_xcore_math``'s block floating-point API may be used.
+
+In it, three 32-bit BFP vectors are allocated, initialized and filled with random data. Then several
+BFP operations are applied using those vectors as inputs and/or outputs.
+
+The example only demonstrates the real 32-bit arithmetic BFP functions (that is, functions with
+names ``bfp_s32_*``). The real 16-bit (``bfp_s16_*``), complex 32-bit (``bfp_complex_s32_*``) and
+complex 16-bit (``bfp_complex_s16_*``) functions all use similar naming conventions.
+
+app_vect_demo
+=============
+
+The purpose of this example application is to demonstrate how the arithmetic functions of
+``lib_xcore_math``'s lower-level vector API may be used.
+
+In general, the low-level arithmetic API comprises the functions in this library whose names begin with
+``vect_*``, such as :c:func:`vect_s32_mul()` for element-wise multiplication of 32-bit vectors, and
+:c:func:`vect_complex_s16_scale()` for multiplying a complex 16-bit vector by a complex scalar.
+
+We assume that where the low-level API is being used it is because some behavior other than the
+default behavior of the high-level block floating-point API is required. Given that, rather than
+showcasing the breadth of operations available, this example examines first how to achieve
+comparable behavior to the BFP API, and then ways in which that behavior can be modified.
+
+app_fft_demo
+============
+
+The purpose of this example application is to demonstrate how the FFT functions of
+``lib_xcore_math``'s block floating-point API may be used.
+
+In this example we demonstrate each of the offered forward and inverse FFTs of the BFP API.
+
+app_filter_demo
+===============
+
+The purpose of this example application is to demonstrate how the functions of
+``lib_xcore_math``'s filtering vector API may be used.
+
+The filtering API currently supports three different filter types:
+
+  * 32-bit FIR Filter
+  * 16-bit FIR Filter
+  * 32-bit Biquad Filter
+
+This example application presents simple demonstrations of how to use each of these filter types.
+
--- a/lib_xcore_math/doc/rst/src/getting_started.rst
+++ b/lib_xcore_math/doc/rst/src/getting_started.rst
@@ -0,0 +1,227 @@
+.. _getting_started:
+
+***************
+Getting Started
+***************
+
+Overview
+========
+
+``lib_xcore_math`` is a library containing efficient implementations of various mathematical
+operations that may be required in an embedded application.  In particular, this library is geared
+towards operations which work on vectors or arrays of data, including vectorized arithmetic,
+linear filtering, and fast Fourier transforms.
+
+This library comprises several sub-APIs.  Grouping of operations into sub-APIs is a matter of
+conceptual convenience.  In general, functions from a given API share a common prefix indicating
+which API the function comes from, or the type of object on which it acts.  Additionally, there is
+some interdependence between these APIs.
+
+These APIs are:
+
+* :ref:`Block floating-point (BFP) API <bfp_api>` -- High-level API providing operations on BFP
+  vectors. See :ref:`bfp_background` for an introduction to block floating-point. These functions
+  manage the exponents and headroom of input and output BFP vectors to avoid overflow and underflow
+  conditions.
+
+* :ref:`Vector/Array API <vect_api>` -- Lower-level API which is used heavily by the BFP API.
+  As such, the operations available in this API are similar to those in the BFP API, but the user
+  will have to manage exponents and headroom on their own. Many of these routines are implemented
+  directly in optimized assembly to use the hardware as efficiently as possible.
+
+* :ref:`Scalar API <scalar_api>` -- Provides various operations on scalar objects. In particular,
+  these operations focus on simple arithmetic operations applied to non-IEEE 754 floating-point
+  objects, as well as optimized operations which are applied to IEEE 754 ``floats``.
+
+* :ref:`Filtering API <filter_api>` -- Provides access to linear filtering operations, including
+  16- and 32-bit FIR filters and 32-bit biquad filters.
+
+* :ref:`Fast Fourier Transform (FFT) API <fft_api>` -- Provides both low-level and block
+  floating-point FFT implementations.  Optimized FFT implementations are provided for real signals,
+  pairs of real signals, and for complex signals.
+
+* :ref:`Discrete Cosine Transform (DCT) API <dct_api>` -- Provides functions which implement the
+  `type-II <https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-II>`_ ('forward') and
+  `type-III <https://en.wikipedia.org/wiki/Discrete_cosine_transform#DCT-III>`_ ('inverse') DCT for
+  a variety of block lengths. Also provides a fast 8x8 two dimensional forward and inverse DCT.
+
+All APIs are accessed by including the single header file:
+
+.. code-block:: c
+
+    #include "xcore_math.h"
+
+Usage
+=====
+
+The following sections are intended to give the reader a general sense of how to use the API.
+
+BFP API
+-------
+
+In the BFP API the BFP vectors are C structures such as ``bfp_s16_t``, ``bfp_s32_t``, or
+``bfp_complex_s32_t``, backed by a memory buffer. These objects contain a pointer to the data
+carrying the content (mantissas) of the vector, as well as information about the length, headroom
+and exponent of the BFP vector.
+
+Below is the definition of :c:struct:`bfp_s32_t` from ``xmath/types.h``.
+
+.. code-block:: c
+
+    C_TYPE
+    typedef struct {
+        /** Pointer to the underlying element buffer.*/
+        int32_t* data;
+        /** Exponent associated with the vector. */
+        exponent_t exp;
+        /** Current headroom in the ``data[]`` */
+        headroom_t hr;
+        /** Current size of ``data[]``, expressed in elements */
+        unsigned length;
+        /** BFP vector flags. Users should not normally modify these manually. */
+        bfp_flags_e flags;
+    } bfp_s32_t;
+
+The :ref:`32-bit BFP functions <bfp_s32>` take :c:struct:`bfp_s32_t` pointers as input and output
+parameters.
+
+Functions in the BFP API generally are prefixed with ``bfp_``. More specifically, functions where
+the 'main' operands are 32-bit BFP vectors are prefixed with ``bfp_s32_``, whereas functions where
+the 'main' operands are complex 16-bit BFP vectors are prefixed with ``bfp_complex_s16_``, and so
+on for the other BFP vector types.
+
+Initializing BFP Vectors
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Before calling these functions, the BFP vectors represented by the arguments must be initialized.
+For :c:struct:`bfp_s32_t` this is accomplished with :c:func:`bfp_s32_init()`.  Initialization
+requires that a buffer of sufficient size be provided to store the mantissa vector, as well as an
+initial exponent. If the first usage of a BFP vector is as an output, then the exponent will not
+matter, but the object must still be initialized before use.  Additionally, the headroom of the
+vector may be computed upon initialization; otherwise it is set to ``0``.
+
+Here is an example of a 32-bit BFP vector being initialized.
+
+.. code-block:: c
+
+    #define LEN (20)
+
+    //The object representing the BFP vector
+    bfp_s32_t bfp_vect;
+
+    // buffer backing bfp_vect
+    int32_t data_buffer[LEN];
+    for(int i = 0; i < LEN; i++) data_buffer[i] = i;
+
+    // The initial exponent associated with bfp_vect
+    exponent_t initial_exponent = 0;
+
+    // If non-zero, `bfp_s32_init()` will compute headroom currently present in data_buffer.
+    // Otherwise, headroom is initialized to 0 (which is always safe but may not be optimal)
+    unsigned calculate_headroom = 1;
+
+    // Initialize the vector object
+    bfp_s32_init(&bfp_vect, data_buffer, initial_exponent, LEN, calculate_headroom);
+
+    // Go do stuff with bfp_vect
+    ...
+
+
+Once initialized, the exponent and mantissas of the vector can be accessed by ``bfp_vect.exp`` and
+``bfp_vect.data[]`` respectively, with the logical (floating-point) value of element ``k`` being
+given by :math:`\mathtt{bfp\_vect.data[k]}\cdot2^{\mathtt{bfp\_vect.exp}}`.
+
+BFP Arithmetic Functions
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following snippet shows a function ``foo()`` which takes 3 BFP vectors, ``a``, ``b`` and ``c``,
+as arguments. It multiplies together ``a`` and ``b`` element-wise, and then subtracts ``c`` from the
+product. In this example both operations are performed in-place on ``a``. (See
+:c:func:`bfp_s32_mul()` and :c:func:`bfp_s32_sub()` for more information about those functions.)
+
+.. code-block:: c
+
+    void foo(bfp_s32_t* a, const bfp_s32_t* b, const bfp_s32_t* c)
+    {
+        // Multiply together a and b, updating a with the result.
+        bfp_s32_mul(a, a, b);
+
+        // Subtract c from the product, again updating a with the result.
+        bfp_s32_sub(a, a, c);
+    }
+
+
+The caller of ``foo()`` can then access the results through ``a``. Note that the pointer ``a->data``
+was not modified during this call.
+
+Vector API
+----------
+
+The functions in the lower-level vector API are optimized for performance. They do very little to
+protect the user from mangling their data by arithmetic saturation/overflows or underflows (although
+they do provide the means to prevent this).
+
+Functions in the vector API are generally prefixed with ``vect_``. For example, functions which
+operate primarily on 16-bit vectors are prefixed with ``vect_s16_``.
+
+Some functions are prefixed with ``chunk_`` instead of ``vect_``.  A "chunk" is just a vector with a
+fixed memory footprint (currently 32 bytes, or 8 32-bit elements) meant to match the width of the
+architecture's vector registers.
+
+As an example of a function from the vector API, see :c:func:`vect_s32_mul()` (from
+``vect_s32.h``), which multiplies together two ``int32_t`` vectors element by element.
+
+.. code-block:: c
+
+    C_API
+    headroom_t vect_s32_mul(
+        int32_t a[],
+        const int32_t b[],
+        const int32_t c[],
+        const unsigned length,
+        const right_shift_t b_shr,
+        const right_shift_t c_shr);
+
+This function takes two ``int32_t`` arrays, ``b`` and ``c``, as inputs and one ``int32_t`` array,
+``a``, as output (in the case of :c:func:`vect_s32_mul()`, it is safe to have ``a`` point to the
+same buffer as ``b`` or ``c``, computing the result in-place). ``length`` indicates the number of
+elements in each array. The final two parameters, ``b_shr`` and ``c_shr``, are the arithmetic
+right-shifts applied to each element of ``b`` and ``c`` before they are multiplied together.
+
+Why the right-shifts? In the case of 32-bit multiplication, the largest possible product is
+:math:`2^{62}`, which will not fit in the 32-bit output vector. Applying positive arithmetic
+right-shifts to the input vectors reduces the largest possible product. So, the shifts are there to
+manage the headroom/size of the resulting product in order to maximize precision while avoiding
+overflow or saturation.
+
+Contrast this with :c:func:`vect_s16_mul()`:
+
+.. code-block:: c
+
+    C_API
+    headroom_t vect_s16_mul(
+        int16_t a[],
+        const int16_t b[],
+        const int16_t c[],
+        const unsigned length,
+        const right_shift_t a_shr);
+
+The parameters are similar here, but instead of ``b_shr`` and ``c_shr``, there's only an ``a_shr``.
+Here the arithmetic right-shift ``a_shr`` is applied to the *products* of ``b`` and ``c``. The
+right-shift is also *unsigned* -- it can only be used to reduce the size of the product.
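The output-shift behaviour described above can be modelled in plain C. This is a sketch under stated assumptions, not the library's implementation: ``vect_s16_mul_model()`` and ``sat_s16_model()`` are hypothetical names, the rounding behaviour of the real function is not modelled, and results are assumed to saturate to the symmetric 16-bit range.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the symmetric 16-bit range [-INT16_MAX, INT16_MAX]. */
static int16_t sat_s16_model(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < -INT16_MAX) return -INT16_MAX;
    return (int16_t)x;
}

/* Element-wise 16-bit multiply where a single unsigned right-shift is applied
   to each 32-bit product before it is written to the 16-bit output. */
static void vect_s16_mul_model(int16_t a[], const int16_t b[], const int16_t c[],
                               unsigned length, unsigned a_shr)
{
    for (unsigned k = 0; k < length; k++) {
        int32_t product = (int32_t)b[k] * (int32_t)c[k];
        /* a_shr must be below 32. Right-shifting a negative value is
           implementation-defined in C; an arithmetic shift is assumed. */
        a[k] = sat_s16_model(product >> a_shr);
    }
}
```

For instance, ``0x4000 * 0x4000`` is :math:`2^{28}`, which saturates with ``a_shr = 0`` but fits as ``0x4000`` with ``a_shr = 14``.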
+
+Shifts like those in these two examples are very common in the vector API, as they are the main
+mechanism for managing exponents and headroom.  Whether the shifts are applied to inputs, outputs,
+both, or only one input will depend on a number of factors.  In the case of :c:func:`vect_s32_mul()`
+they are applied to inputs because the XS3 VPU includes a compulsory (hardware) right-shift of 30
+bits on all products of 32-bit numbers, and so often inputs may need to be *left*-shifted (negative
+shift) in order to avoid underflows.  In the case of :c:func:`vect_s16_mul()`, this is unnecessary
+because no compulsory shift is included in 16-bit multiply-accumulates.
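To make the role of the input shifts concrete, here is a simplified plain-C model of a 32-bit element-wise multiply with a fixed 30-bit right-shift on each product. All names are hypothetical, rounding is not modelled, and symmetric saturation of results is an assumption; the sketch is not a substitute for the behaviour documented for :c:func:`vect_s32_mul()`.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the symmetric 32-bit range [-INT32_MAX, INT32_MAX]. */
static int32_t sat_s32_model(int64_t x)
{
    if (x > INT32_MAX) return INT32_MAX;
    if (x < -(int64_t)INT32_MAX) return -INT32_MAX;
    return (int32_t)x;
}

/* Signed shift of one input element: shr > 0 shifts right, shr < 0 shifts left. */
static int32_t shift_input_model(int32_t x, int shr)
{
    if (shr >= 0) {
        /* Right-shifting a negative value is implementation-defined in C;
           an arithmetic shift is assumed here. */
        return x >> (shr > 31 ? 31 : shr);
    }
    int shl = (-shr > 31) ? 31 : -shr;
    /* Left shift performed as a 64-bit multiply, then saturated. */
    return sat_s32_model((int64_t)x * ((int64_t)1 << shl));
}

/* Shift both inputs, multiply in 64 bits, then apply the fixed 30-bit right-shift. */
static void vect_s32_mul_model(int32_t a[], const int32_t b[], const int32_t c[],
                               unsigned length, int b_shr, int c_shr)
{
    for (unsigned k = 0; k < length; k++) {
        int64_t product = (int64_t)shift_input_model(b[k], b_shr)
                        * (int64_t)shift_input_model(c[k], c_shr);
        a[k] = sat_s32_model(product >> 30);
    }
}
```

With no input shifts, ``3 * 5`` underflows to ``0`` after the 30-bit shift. With ``b_shr = -20`` and ``c_shr = -10`` the inputs are left-shifted by a total of 30 bits and the result is ``15``.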
+
+Both :c:func:`vect_s32_mul()` and :c:func:`vect_s16_mul()` return the headroom of the output
+vector ``a``.
+
+Functions in the vector API are in many cases closely tied to the instruction set architecture
+for XS3. As such, if more efficient algorithms are found to perform an operation, these low-level
+API functions are more likely to change in future versions.
--- a/lib_xcore_math/doc/rst/src/images/bfp_bg_fig1.png
+++ b/lib_xcore_math/doc/rst/src/images/bfp_bg_fig1.png
--- a/lib_xcore_math/doc/rst/src/images/bfp_bg_fig2.png
+++ b/lib_xcore_math/doc/rst/src/images/bfp_bg_fig2.png
--- a/lib_xcore_math/doc/rst/src/images/bfp_bg_fig3.png
+++ b/lib_xcore_math/doc/rst/src/images/bfp_bg_fig3.png
--- a/lib_xcore_math/doc/rst/src/images/bfp_bg_fig4.png
+++ b/lib_xcore_math/doc/rst/src/images/bfp_bg_fig4.png
--- a/lib_xcore_math/doc/rst/src/images/getting_started_fig1.png
+++ b/lib_xcore_math/doc/rst/src/images/getting_started_fig1.png
--- a/lib_xcore_math/doc/rst/src/introduction.rst
+++ b/lib_xcore_math/doc/rst/src/introduction.rst
@@ -0,0 +1,52 @@
+
+************
+Introduction
+************
+
+``lib_xcore_math`` is a library of optimised math functions for taking advantage of the vector
+processing unit (VPU) of the `XMOS` XS3 architecture (i.e. `xcore.ai`).
+Included in the library are functions for block floating-point arithmetic, fast Fourier transforms,
+linear algebra, discrete cosine transforms, linear filtering and more.
+
+Repository structure
+====================
+
+* */lib_xcore_math/*
+
+  * *api/* - Headers containing the public API.
+  * *script/* - Scripts used for source generation.
+  * *src/* - Library source code.
+
+* */doc/* - documentation source.
+* */examples/* - Example applications.
+* */tests/*  - Unit test projects.
+
+API structure
+=============
+
+This library is organised around several sub-APIs.  These APIs collect the provided operations into
+coherent groups based on the kind of operation or the types of object being acted upon.
+
+The current APIs are:
+
+  * Block Floating-Point Vector API
+  * Vector/Array API
+  * Scalar API
+  * Linear Filtering API
+  * Fast Fourier Transform API
+  * Discrete Cosine Transform API
+
+Using ``lib_xcore_math``
+========================
+
+``lib_xcore_math`` is intended to be used with `XCommon CMake <https://www.xmos.com/file/xcommon-cmake-documentation/?version=latest>`_,
+the `XMOS` application build and dependency management system.
+
+``lib_xcore_math`` can be compiled for both x86 platforms and XS3 based processors.
+
+On x86 platforms you can develop DSP algorithms and test them for functional correctness;
+this is an optional step before porting the library to an `xcore` device.
+
+To use this module, include ``lib_xcore_math`` in the application's ``APP_DEPENDENT_MODULES`` list and
+include the ``xcore_math.h`` header file.
+
--- a/lib_xcore_math/doc/rst/src/notes.md
+++ b/lib_xcore_math/doc/rst/src/notes.md
@@ -0,0 +1,159 @@
+
+Notes for lib_xcore_math                          {#notes}
+========================
+
+## &nbsp;
+
+## Vector Alignment ##                        {#vector_alignment}
+
+This library makes use of the XMOS architecture's vector processing unit (VPU). In the XS3 version
+of the architecture, all loads and stores to and from the XS3 VPU have the requirement that
+the loaded/stored addresses must be aligned to a 4-byte boundary (word-aligned).
+
+In the current version of the API, this means that most API functions require vectors (or the data
+backing a BFP vector) to begin at word-aligned addresses. Vectors are *not* required, however, to
+have a size (in bytes) that is a multiple of 4.
+
+Some functions also make use of instructions which require data to be 8-byte-aligned.
+
+### Writing Alignment-safe Code ###
+
+The alignment requirement is ultimately always on the data that backs a vector. This applies to all
+but the scalar API. For the BFP API, this applies to the memory to which the `data` field (or the
+`real` and `imag` fields in the case of `bfp_complex_s16_t`) points, specified when the BFP vector
+is initialized. A similar constraint applies when initializing filters. For the other APIs, this
+will apply to the pointers that get passed into the API functions.
+
+Arrays of type `int32_t` and `complex_s32_t` will normally be guaranteed to be word-aligned by the
+compiler. However, if the user manually specifies the beginning of an `int32_t` array, as in the
+following..
+
+\code{.c}
+    uint8_t byte_buffer[100];
+    int32_t* integer_array = (int32_t*) &byte_buffer[1];
+\endcode
+
+.. the vector may not be word-aligned. It is the responsibility of the user to ensure proper
+alignment of data.
+
+For `int16_t` arrays, the compiler does not by default guarantee that the array starts on a
+word-aligned address. To force word-alignment on arrays of this type, use 
+`__attribute__((aligned (4)))` in the variable definition, as in the following.
+
+\code{.c}
+    int16_t __attribute__((aligned (4))) data[100];
+\endcode
+
+Occasionally, 8-byte (double word) alignment is required. In this case, neither `int32_t` nor
+`int16_t` is necessarily guaranteed to align as required. Similar to the above, this can be hinted
+to the compiler as in the following.
+
+\code{.c}
+    int32_t __attribute__((aligned (8))) data[100];
+\endcode
+
+This library also provides the macros `WORD_ALIGNED` and `DWORD_ALIGNED` which force 4- and 8-byte
+alignment respectively as above.
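When a pointer's origin is uncertain, its alignment can be checked at run time before it is handed to an API function. The helper below is a hypothetical sketch (`is_aligned_sketch()` is not provided by this library) and assumes the alignment argument is a power of two.

```c
#include <stdint.h>

/* Returns non-zero if `ptr` is aligned to `alignment` bytes.
   `alignment` must be a power of two (e.g. 4 for word, 8 for double word). */
static int is_aligned_sketch(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

For example, a buffer declared with `__attribute__((aligned (4)))` passes the 4-byte check, while an address two bytes into that buffer does not.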
+
+---------
+## Symmetrically Saturating Arithmetic ##     {#saturation}
+
+With ordinary integer arithmetic the block floating-point logic chooses exponents and operand shifts
+to prevent integer overflow with worst-case input values. However, the XS3 VPU uses symmetrically
+saturating integer arithmetic.
+
+Saturating arithmetic is that where partial results of the applied operation use a bit depth greater
+than the output bit depth, and values that can't be properly expressed with the output bit depth are
+set to the nearest expressible value. 
+
+For example, a function written in ordinary C which multiplies two 32-bit integers may internally
+compute the full 64-bit product and then clamp values to the range `[INT32_MIN, INT32_MAX]` before
+returning a 32-bit result.
+
+Symmetrically saturating arithmetic also includes the property that the lower bound of the
+expressible range is the negative of the upper bound of the expressible range.
+
+One of the major troubles with non-saturating integer arithmetic is that in a two's complement
+encoding, there exists a non-zero integer value @f$x@f$ (e.g. `INT16_MIN` in 16-bit two's complement
+arithmetic) for which  @f$-1 \cdot x = x@f$. Serious arithmetic errors can result when this case
+is not accounted for.
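The difference can be illustrated with 16-bit negation. This is a plain-C sketch with hypothetical function names, not code from the library: the first function keeps only the low 16 bits, as wrapping two's complement arithmetic would, and the second clamps to the symmetric range.

```c
#include <stdint.h>

/* Negation that keeps only the low 16 bits of the result (wrapping behaviour).
   The final conversion to int16_t is implementation-defined in C, but wraps
   on mainstream compilers. */
static int16_t negate_wrapping_s16(int16_t x)
{
    return (int16_t)(uint16_t)(0u - (uint16_t)x);
}

/* Negation with symmetric saturation: results lie in [-INT16_MAX, INT16_MAX]. */
static int16_t negate_symmetric_sat_s16(int16_t x)
{
    int32_t r = -(int32_t)x;
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < -INT16_MAX) r = -INT16_MAX;
    return (int16_t)r;
}
```

Negating `INT16_MIN` with the wrapping version returns `INT16_MIN` again, whereas the saturating version returns `INT16_MAX`, which is off by 1 LSb from the ideal value but has the correct sign.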
+
+One of the results of _symmetric_ saturation, on the other hand, is that there is a corner case
+where (using the same exponent and shift logic as non-saturating arithmetic) saturation may occur
+for a particular combination of input mantissas. The corner case is different for different
+operations.
+
+When the corner case occurs, the minimum (and largest magnitude) value of the resulting vector is 1
+LSb greater than its ideal value (e.g. `-0x3FFF` instead of `-0x4000` for 16-bit arithmetic). The
+error in this output element's mantissa is then 1 LSb, or @f$2^p@f$, where @f$p@f$ is the exponent
+of the resulting BFP vector.
+
+Of course, the very nature of BFP arithmetic routinely involves errors of this magnitude.
+
+---------
+## Spectrum Packing ##              {#spectrum_packing}
+
+In its general form, the @math{N}-point Discrete Fourier Transform is an operation applied to a
+complex @math{N}-point signal @math{x[n]} to produce a complex spectrum @math{X[f]}. Any spectrum
+@math{X[f]} which is the result of a @math{N}-point DFT has the property that @math{X[f+N] = X[f]}.
+Thus, the complete representation of the @math{N}-point DFT of @math{x[n]} requires @math{N} complex
+elements.
+
+### Complex DFT and IDFT ###
+
+In this library, when performing a complex DFT (e.g. using fft_bfp_forward_complex()), the spectral
+representation that results is a straight-forward mapping:
+
+`X[f]` @math{\longleftarrow X[f]} for @math{0 \le f < N}
+
+where `X` is an @math{N}-element array of `complex_s32_t`, where the real part of @math{X[f]} is in
+`X[f].re` and the imaginary part in `X[f].im`.
+
+Likewise, when performing an @math{N}-point complex inverse DFT, that is also the representation
+that is expected.
+
+### Real DFT and IDFT ###
+
+Oftentimes we instead wish to compute the DFT of real signals. In addition to the periodicity
+property (@math{X[f+N] = X[f]}), the DFT of a real signal also has a complex conjugate symmetry such
+that @math{X[-f] = X^*[f]}, where @math{X^*[f]} is the complex conjugate of @math{X[f]}. This
+symmetry makes it redundant (and thus undesirable) to store such symmetric pairs of elements. This
+would allow us to get away with only explicitly storing @math{X[f]} for @math{0 \le f \le N/2} in
+@math{(N/2)+1} complex elements.
+
+Unfortunately, using such a representation has the undesirable property that the DFT of an
+@math{N}-point real signal cannot be computed in-place, as the representation requires more memory
+than we started with.
+
+However, if we take the periodicity and complex conjugate symmetry properties together:
+
+\f[
+    X[0] = X^*[0] \rightarrow Imag\{X[0]\} = 0 \\
+
+    X[-(N/2) + N] = X[N/2] \\
+
+    X[-N/2] = X^*[N/2] \rightarrow X[N/2] = X^*[N/2] \rightarrow Imag \{ X[N/2] \} = 0
+\f]
+
+Because both @math{X[0]} and @math{X[N/2]} are guaranteed to be real, we can recover the benefit of
+in-place computation in our representation by packing the real part of @math{X[N/2]} into the
+imaginary part of @math{X[0]}.
+
+Therefore, the functions in this library that produce the spectra of real signals (such as
+fft_bfp_forward_mono() and fft_bfp_forward_stereo()) will pack the spectra in a slightly less
+straight-forward manner (as compared with the complex DFTs):
+
+
+`X[f]` @math{\longleftarrow X[f]} for @math{1 \le f < N/2}
+
+`X[0]` @math{\longleftarrow X[0] + j X[N/2]}
+
+where `X` is an @math{N/2}-element array of `complex_s32_t`.
+
+Likewise, this is the encoding expected when computing the @math{N}-point inverse DFT, such as by
+fft_bfp_inverse_mono() or fft_bfp_inverse_stereo().
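The packing rule above can be sketched in plain C. The type `complex_s32_sketch_t` is a hypothetical stand-in for `complex_s32_t`, and the two helper functions are invented for illustration; they only rearrange already-computed bins and are not part of the library's FFT API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's complex_s32_t. */
typedef struct {
    int32_t re;
    int32_t im;
} complex_s32_sketch_t;

/* Pack bins 0..N/2 of a real signal's spectrum (N/2 + 1 elements, with X[0]
   and X[N/2] purely real) into N/2 elements: Re{X[N/2]} goes into Im of bin 0. */
static void pack_real_spectrum_sketch(complex_s32_sketch_t packed[],
                                      const complex_s32_sketch_t full[],
                                      unsigned n)
{
    for (unsigned f = 1; f < n / 2; f++) packed[f] = full[f];
    packed[0].re = full[0].re;
    packed[0].im = full[n / 2].re;
}

/* Inverse of the packing: recover bins 0..N/2 from the N/2 packed elements. */
static void unpack_real_spectrum_sketch(complex_s32_sketch_t full[],
                                        const complex_s32_sketch_t packed[],
                                        unsigned n)
{
    for (unsigned f = 1; f < n / 2; f++) full[f] = packed[f];
    full[0].re = packed[0].re;
    full[0].im = 0;
    full[n / 2].re = packed[0].im;
    full[n / 2].im = 0;
}
```

For an 8-point real signal, the five bins `X[0]` to `X[4]` fit into four packed elements, which is the same amount of memory as the eight real input samples.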
+
+@note When performing a stereo DFT or inverse DFT, so as to preserve the
+in-place computation of the result, the spectra of the two signals will be encoded into adjacent
+blocks of memory, with the second spectrum (i.e. associated with 'channel b') occupying the higher
+memory address.
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s16.rst
@@ -0,0 +1,6 @@
+.. _bfp_complex_s16:
+
+Complex 16-bit Block Floating-Point API
+---------------------------------------
+
+.. doxygengroup:: bfp_complex_s16_api
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_complex_s32.rst
@@ -0,0 +1,6 @@
+.. _bfp_complex_s32:
+
+Complex 32-bit Block Floating-Point API
+---------------------------------------
+
+.. doxygengroup:: bfp_complex_s32_api
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_index.rst
@@ -0,0 +1,12 @@
+.. _bfp_api:
+
+Block Floating-Point API
+========================
+
+.. toctree::
+
+    bfp_quickref
+    bfp_s16
+    bfp_s32
+    bfp_complex_s16
+    bfp_complex_s32
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_quickref.rst
@@ -0,0 +1,111 @@
+
+BFP API quick reference
+-----------------------
+
+The tables below list the functions of the block floating-point API. The "EW" column indicates
+whether the operation acts element-wise.
+
+The "Signature" column is intended as a hint which quickly conveys the kind of the conceptual inputs
+to and outputs from the operation.  The signatures are only intended to convey how many (conceptual)
+inputs and outputs there are, and their dimensionality.
+
+The functions themselves will typically take more arguments than these signatures indicate.  Check
+the function's full documentation to get more detailed information.
+
+The following symbols are used in the signatures:
+
+.. table::
+    :widths: 40 60
+    :class: longtable
+
+    +--------------------------------------+---------------------------------------------+
+    |  Symbol                              | Description                                 |
+    +======================================+=============================================+
+    | :math:`\mathbb{S}`                   | A scalar input or output value.             |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\mathbb{V}`                   | A vector-valued input or output.            |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\mathbb{M}`                   | A matrix-valued input or output.            |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\varnothing`                  | Placeholder indicating no input or output.  |
+    +--------------------------------------+---------------------------------------------+
+
+For example, the operation signature :math:`(\mathbb{V \times V \times S}) \to \mathbb{V}` indicates
+the operation takes two vector inputs and a scalar input, and the output is a vector.
+
+* `32-Bit BFP Ops <bfp32_api_>`_
+* `16-Bit BFP Ops <bfp16_api_>`_
+* `Complex 32-Bit BFP Ops <bfp32_complex_api_>`_
+* `Complex 16-Bit BFP Ops <bfp16_complex_api_>`_
+
+|newpage|
+
+32-Bit BFP API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _bfp32_api:
+
+|beginfullwidth|
+
+.. csv-table:: 32-Bit BFP API - quick reference
+    :file: csv/32bit_bfp_quickref.csv
+    :widths: 42, 5, 20, 33
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+|newpage|
+
+16-Bit BFP API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _bfp16_api:
+
+|beginfullwidth|
+
+.. csv-table:: 16-Bit BFP API - quick reference
+    :file: csv/16bit_bfp_quickref.csv
+    :widths: 42, 5, 20, 33
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+|newpage|
+
+Complex 32-bit BFP API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _bfp32_complex_api:
+
+|beginfullwidth|
+
+.. csv-table:: Complex 32-Bit BFP API - quick reference
+    :file: csv/complex_32bit_bfp_quickref.csv
+    :widths: 42, 5, 20, 33
+    :header-rows: 1
+    :class: longtable
+
+
+|endfullwidth|
+
+|newpage|
+
+Complex 16-bit BFP API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _bfp16_complex_api:
+
+|beginfullwidth|
+
+.. csv-table:: Complex 16-Bit BFP API - quick reference
+    :file: csv/complex_16bit_bfp_quickref.csv
+    :widths: 42, 5, 20, 33
+    :header-rows: 1
+    :class: longtable
+
+
+|endfullwidth|
+
+|newpage|
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s16.rst
@@ -0,0 +1,6 @@
+.. _bfp_s16:
+
+16-bit Block Floating-Point API
+-------------------------------
+
+.. doxygengroup:: bfp_s16_api
--- a/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/bfp_s32.rst
@@ -0,0 +1,6 @@
+.. _bfp_s32:
+
+32-bit Block Floating-Point API
+-------------------------------
+
+.. doxygengroup:: bfp_s32_api
--- a/lib_xcore_math/doc/rst/src/reference/bfp/csv/16bit_bfp_quickref.csv
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/csv/16bit_bfp_quickref.csv
@@ -0,0 +1,34 @@
+Function,EW,Signature,Brief
+:c:func:`bfp_s16_init()`               ,  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Initialize (static)
+:c:func:`bfp_s16_alloc()`              ,  , ":math:`\varnothing \to \mathbb{V}`                        ",  Initialize (dynamic)
+:c:func:`bfp_s16_dealloc()`            ,  , ":math:`\mathbb{V} \to \mathbb{\varnothing}`               ",  Deinitialize
+:c:func:`bfp_s16_set()`                , x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Set All Elements
+:c:func:`bfp_s16_use_exponent()`       ,  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Force Exponent
+:c:func:`bfp_s16_headroom()`           ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Get Headroom
+:c:func:`bfp_s16_shl()`                , x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Shift Mantissas
+:c:func:`bfp_s16_add()`                , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Add Vector
+:c:func:`bfp_s16_add_scalar()`         ,  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Add Scalar
+:c:func:`bfp_s16_sub()`                , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Subtract Vector
+:c:func:`bfp_s16_mul()`                , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Multiply Vector
+:c:func:`bfp_s16_macc()`               , x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Multiply-Accumulate
+:c:func:`bfp_s16_nmacc()`              , x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Negated Multiply-Accumulate
+:c:func:`bfp_s16_scale()`              ,  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Multiply Scalar
+:c:func:`bfp_s16_abs()`                , x, ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Absolute Values
+:c:func:`bfp_s16_sum()`                ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Sum Elements
+:c:func:`bfp_s16_dot()`                ,  , ":math:`(\mathbb{V \times V}) \to \mathbb{S}`              ",  Inner Product
+:c:func:`bfp_s16_clip()`               , x, ":math:`(\mathbb{V \times S \times S}) \to \mathbb{V}`     ",  Clip Bounds
+:c:func:`bfp_s16_rect()`               , x, ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Rectify Elements
+:c:func:`bfp_s16_to_bfp_s32()`         , x, ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Convert to 32-bit
+:c:func:`bfp_s16_sqrt()`               , x, ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Square Root
+:c:func:`bfp_s16_inverse()`            , x, ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Multiplicative Inverse
+:c:func:`bfp_s16_abs_sum()`            ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Absolute Sum Elements
+:c:func:`bfp_s16_mean()`               ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Mean Value
+:c:func:`bfp_s16_energy()`             ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Energy
+:c:func:`bfp_s16_rms()`                ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector RMS Value
+:c:func:`bfp_s16_max()`                ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Max Element
+:c:func:`bfp_s16_min()`                ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Min Element
+:c:func:`bfp_s16_max_elementwise()`    , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Elementwise Max
+:c:func:`bfp_s16_min_elementwise()`    , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Elementwise Min
+:c:func:`bfp_s16_argmax()`             ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Max Element Index
+:c:func:`bfp_s16_argmin()`             ,  , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Min Element Index
+:c:func:`bfp_s16_accumulate()`         , x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Elementwise Accumulate
--- a/lib_xcore_math/doc/rst/src/reference/bfp/csv/32bit_bfp_quickref.csv
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/csv/32bit_bfp_quickref.csv
@@ -0,0 +1,35 @@
+Function,EW,Signature,Brief
+:c:func:`bfp_s32_init()`            ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Initialize (static)"
+:c:func:`bfp_s32_alloc()`           ,   , ":math:`\varnothing \to \mathbb{V}`                     ", "Initialize (dynamic)"
+:c:func:`bfp_s32_dealloc()`         ,   , ":math:`\mathbb{V} \to \mathbb{\varnothing}`            ", "Deinitialize"
+:c:func:`bfp_s32_set()`             ,  x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Set All Elements"
+:c:func:`bfp_s32_use_exponent()`    ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Force Exponent"
+:c:func:`bfp_s32_headroom()`        ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Get Headroom"
+:c:func:`bfp_s32_shl()`             ,  x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Shift Mantissas"
+:c:func:`bfp_s32_add()`             ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Add Vector"
+:c:func:`bfp_s32_add_scalar()`      ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Add Scalar"
+:c:func:`bfp_s32_sub()`             ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Subtract Vector"
+:c:func:`bfp_s32_mul()`             ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Multiply Vector"
+:c:func:`bfp_s32_macc()`            ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`  ", "Multiply-Accumulate"
+:c:func:`bfp_s32_nmacc()`           ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`  ", "Negated Multiply-Accumulate"
+:c:func:`bfp_s32_scale()`           ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`           ", "Multiply Scalar"
+:c:func:`bfp_s32_abs()`             ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                      ", "Absolute Values"
+:c:func:`bfp_s32_sum()`             ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Sum Elements"
+:c:func:`bfp_s32_dot()`             ,   , ":math:`(\mathbb{V \times V}) \to \mathbb{S}`           ", "Inner Product"
+:c:func:`bfp_s32_clip()`            ,  x, ":math:`(\mathbb{V \times S \times S}) \to \mathbb{V}`  ", "Clip Bounds"
+:c:func:`bfp_s32_rect()`            ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                      ", "Rectify Elements"
+:c:func:`bfp_s32_to_bfp_s16()`      ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                      ", "Convert to 16-bit"
+:c:func:`bfp_s32_sqrt()`            ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                      ", "Square Root"
+:c:func:`bfp_s32_inverse()`         ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                      ", "Multiplicative Inverse"
+:c:func:`bfp_s32_abs_sum()`         ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Absolute Sum Elements"
+:c:func:`bfp_s32_mean()`            ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Vector Mean Value"
+:c:func:`bfp_s32_energy()`          ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Vector Energy"
+:c:func:`bfp_s32_rms()`             ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Vector RMS Value"
+:c:func:`bfp_s32_max()`             ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Vector Max Element"
+:c:func:`bfp_s32_min()`             ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Vector Min Element"
+:c:func:`bfp_s32_max_elementwise()` ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Elementwise Max"
+:c:func:`bfp_s32_min_elementwise()` ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Elementwise Min"
+:c:func:`bfp_s32_argmax()`          ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Max Element Index"
+:c:func:`bfp_s32_argmin()`          ,   , ":math:`\mathbb{V} \to \mathbb{S}`                      ", "Min Element Index"
+:c:func:`bfp_s32_convolve_valid()`  ,   , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Convolve With Kernel (Valid mode)"
+:c:func:`bfp_s32_convolve_same()`   ,   , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`           ", "Convolve With Kernel (Same mode)"
--- a/lib_xcore_math/doc/rst/src/reference/bfp/csv/complex_16bit_bfp_quickref.csv
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/csv/complex_16bit_bfp_quickref.csv
@@ -0,0 +1,26 @@
+Function                                       , EW  ,  Signature                                                  ,  Brief
+:c:func:`bfp_complex_s16_init()`               ,     , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Initialize (static)
+:c:func:`bfp_complex_s16_alloc()`              ,     , ":math:`\varnothing \to \mathbb{V}`                        ",  Initialize (dynamic)
+:c:func:`bfp_complex_s16_dealloc()`            ,     , ":math:`\mathbb{V} \to \mathbb{\varnothing}`               ",  Deinitialize
+:c:func:`bfp_complex_s16_set()`                ,  x  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Set All Elements
+:c:func:`bfp_complex_s16_use_exponent()`       ,     , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Force Exponent
+:c:func:`bfp_complex_s16_headroom()`           ,     , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Get Headroom
+:c:func:`bfp_complex_s16_shl()`                ,  x  , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Shift Mantissas
+:c:func:`bfp_complex_s16_real_mul()`           ,  x  , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Real Vector Multiply
+:c:func:`bfp_complex_s16_mul()`                ,  x  , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Complex Vector Multiply
+:c:func:`bfp_complex_s16_conj_mul()`           ,  x  , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Complex Vector Conjugate Multiply
+:c:func:`bfp_complex_s16_macc()`               ,  x  , ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Complex Vector Multiply-Accumulate
+:c:func:`bfp_complex_s16_nmacc()`              ,  x  , ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Complex Vector Negated Multiply-Accumulate
+:c:func:`bfp_complex_s16_conj_macc()`          ,  x  , ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Complex Vector Conjugate Multiply-Accumulate
+:c:func:`bfp_complex_s16_conj_nmacc()`         ,  x  , ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ",  Complex Vector Negated Conjugate Multiply-Accumulate
+:c:func:`bfp_complex_s16_real_scale()`         ,     , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Real Scalar Multiply
+:c:func:`bfp_complex_s16_scale()`              ,     , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Complex Scalar Multiply
+:c:func:`bfp_complex_s16_add()`                ,  x  , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Complex Vector Add
+:c:func:`bfp_complex_s16_add_scalar()`         ,     , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ",  Complex Scalar Add
+:c:func:`bfp_complex_s16_sub()`                ,  x  , ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ",  Complex Vector Subtract
+:c:func:`bfp_complex_s16_to_bfp_complex_s32()` ,  x  , ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Convert to 32-bit
+:c:func:`bfp_complex_s16_squared_mag()`        ,  x  , ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Squared Magnitude
+:c:func:`bfp_complex_s16_sum()`                ,     , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Sum
+:c:func:`bfp_complex_s16_mag()`                ,  x  , ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Magnitude
+:c:func:`bfp_complex_s16_conjugate()`          ,  x  , ":math:`\mathbb{V} \to \mathbb{V}`                         ",  Complex Conjugate
+:c:func:`bfp_complex_s16_energy()`             ,     , ":math:`\mathbb{V} \to \mathbb{S}`                         ",  Vector Energy
--- a/lib_xcore_math/doc/rst/src/reference/bfp/csv/complex_32bit_bfp_quickref.csv
+++ b/lib_xcore_math/doc/rst/src/reference/bfp/csv/complex_32bit_bfp_quickref.csv
@@ -0,0 +1,29 @@
+Function,EW,Signature,Brief
+:c:func:`bfp_complex_s32_init()`               ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Initialize (static)
+:c:func:`bfp_complex_s32_alloc()`              ,   , ":math:`\varnothing \to \mathbb{V}`                        ", Initialize (dynamic)
+:c:func:`bfp_complex_s32_dealloc()`            ,   , ":math:`\mathbb{V} \to \mathbb{\varnothing}`               ", Deinitialize
+:c:func:`bfp_complex_s32_set()`                ,  x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Set All Elements
+:c:func:`bfp_complex_s32_use_exponent()`       ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Force Exponent
+:c:func:`bfp_complex_s32_headroom()`           ,   , ":math:`\mathbb{V} \to \mathbb{S}`                         ", Get Headroom
+:c:func:`bfp_complex_s32_shl()`                ,  x, ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Shift Mantissas
+:c:func:`bfp_complex_s32_real_mul()`           ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Real Vector Multiply
+:c:func:`bfp_complex_s32_mul()`                ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Complex Vector Multiply
+:c:func:`bfp_complex_s32_conj_mul()`           ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Complex Vector Conjugate Multiply
+:c:func:`bfp_complex_s32_macc()`               ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ", Complex Vector Multiply-Accumulate
+:c:func:`bfp_complex_s32_nmacc()`              ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ", Complex Vector Negated Multiply-Accumulate
+:c:func:`bfp_complex_s32_conj_macc()`          ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ", Complex Vector Conjugate Multiply-Accumulate
+:c:func:`bfp_complex_s32_conj_nmacc()`         ,  x, ":math:`(\mathbb{V \times V \times V}) \to \mathbb{V}`     ", Complex Vector Negated Conjugate Multiply-Accumulate
+:c:func:`bfp_complex_s32_real_scale()`         ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Real Scalar Multiply
+:c:func:`bfp_complex_s32_scale()`              ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Complex Scalar Multiply
+:c:func:`bfp_complex_s32_add()`                ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Complex Vector Add
+:c:func:`bfp_complex_s32_add_scalar()`         ,   , ":math:`(\mathbb{V \times S}) \to \mathbb{V}`              ", Complex Scalar Add
+:c:func:`bfp_complex_s32_sub()`                ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Complex Vector Subtract
+:c:func:`bfp_complex_s32_to_bfp_complex_s16()` ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Convert to 16-bit
+:c:func:`bfp_complex_s32_squared_mag()`        ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Squared Magnitude
+:c:func:`bfp_complex_s32_mag()`                ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Magnitude
+:c:func:`bfp_complex_s32_sum()`                ,   , ":math:`\mathbb{V} \to \mathbb{S}`                         ", Vector Sum
+:c:func:`bfp_complex_s32_conjugate()`          ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Complex Conjugate
+:c:func:`bfp_complex_s32_energy()`             ,   , ":math:`\mathbb{V} \to \mathbb{S}`                         ", Vector Energy
+:c:func:`bfp_complex_s32_make()`               ,  x, ":math:`(\mathbb{V \times V}) \to \mathbb{V}`              ", Construct Complex From Real and Imaginary
+:c:func:`bfp_complex_s32_real_part()`          ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Real Part
+:c:func:`bfp_complex_s32_imag_part()`          ,  x, ":math:`\mathbb{V} \to \mathbb{V}`                         ", Imaginary Part
--- a/lib_xcore_math/doc/rst/src/reference/config_options.rst
+++ b/lib_xcore_math/doc/rst/src/reference/config_options.rst
@@ -0,0 +1,9 @@
+.. _compile_time_opts:
+
+Library Configuration
+=====================
+
+Configuration Options
+---------------------
+
+.. doxygengroup:: config_options
--- a/lib_xcore_math/doc/rst/src/reference/csv/common_scalar_types.csv
+++ b/lib_xcore_math/doc/rst/src/reference/csv/common_scalar_types.csv
@@ -0,0 +1,18 @@
+Prefix                , Object Type                   ,  Notes
+``s32``               , ``int32_t``                   ,  "32-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value."
+``s16``               , ``int16_t``                   ,  "16-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value."
+``s8``                , ``int8_t``                    ,  "8-bit signed integer. May be a simple integer, a fixed-point value or the mantissa of a floating-point value."
+``complex_s32``       , :c:type:`complex_s32_t`       ,  "Signed complex integer with 32-bit real and 32-bit imaginary parts."
+``complex_s16``       , :c:type:`complex_s16_t`       ,  "Signed complex integer with 16-bit real and 16-bit imaginary parts."
+``float_s64``         , :c:type:`float_s64_t`         ,  "Non-standard floating-point scalar with exponent and 64-bit mantissa."
+``float_s32``         , :c:type:`float_s32_t`         ,  "Non-standard floating-point scalar with exponent and 32-bit mantissa."
+``qXX``               , ``int32_t``                   ,  "32-bit fixed-point value with ``XX`` fractional bits (i.e. exponent of  ``-XX``)."
+``f32``               , ``float``                     ,  "Standard IEEE 754 single-precision ``float``."
+``f64``               , ``double``                    ,  "Standard IEEE 754 double-precision ``double``."
+``float_complex_s64`` , :c:type:`float_complex_s64_t` ,  "Floating-point value with exponent and complex mantissa with 64-bit real and imaginary parts."
+``float_complex_s32`` , :c:type:`float_complex_s32_t` ,  "Floating-point value with exponent and complex mantissa with 32-bit real and imaginary parts."
+``float_complex_s16`` , :c:type:`float_complex_s16_t` ,  "Floating-point value with exponent and complex mantissa with 16-bit real and imaginary parts."
+N/A                   , :c:type:`exponent_t`          ,  "Represents an exponent :math:`p` as in :math:`2^p`. Unless otherwise specified, exponents are always assumed to have a base of :math:`2`."
+N/A                   , :c:type:`headroom_t`          ,  "The headroom of a scalar or vector. See :ref:`headroom_intro` for more information."
+N/A                   , :c:type:`right_shift_t`       ,  "Represents a rightward bit-shift of a certain number of bits. Care should be taken, as sometimes this is treated as unsigned."
+N/A                   , :c:type:`left_shift_t`        ,  "Represents a leftward bit-shift of a certain number of bits. Care should be taken, as sometimes this is treated as unsigned."
--- a/lib_xcore_math/doc/rst/src/reference/csv/common_vector_types.csv
+++ b/lib_xcore_math/doc/rst/src/reference/csv/common_vector_types.csv
@@ -0,0 +1,14 @@
+Prefix,Object Type,Notes
+``vect_s32``         , "``int32_t[]``                      ", "Raw vector of signed 32-bit integers."
+``vect_s16``         , "``int16_t[]``                      ", "Raw vector of signed 16-bit integers."
+``vect_s8``          , "``int8_t[]``                       ", "Raw vector of signed 8-bit integers."
+``vect_complex_s32`` , ":c:type:`complex_s32_t`\ ``[]``    ", "Raw vector of complex 32-bit integers."
+``vect_complex_s16`` , "(``int16_t[]``, ``int16_t[]``)     ", "Complex 16-bit vectors are usually represented as a pair of 16-bit vectors. This is an optimization due to the word-alignment requirement when loading data into the VPU's vector registers."
+``chunk_s32``        , "``int32_t[8]``                     ", "A 'chunk' is a fixed size vector corresponding to the size of the VPU vector registers."
+``vect_qXX``         , "``int32_t[]``                      ", "When used in an API function name, the ``XX`` will be an actual number (e.g. :c:func:`vect_q30_exp_small()`) indicating the fixed-point interpretation used by that function."
+``vect_f32``         , "``float[]``                        ", "Raw vector of standard IEEE ``float`` values."
+``vect_float_s32``   , ":c:type:`float_s32_t`\ ``[]``      ", "Vector of non-standard 32-bit floating-point scalars."
+``bfp_s32``          , ":c:type:`bfp_s32_t`                ", "Block floating-point vector containing 32-bit mantissas."
+``bfp_s16``          , ":c:type:`bfp_s16_t`                ", "Block floating-point vector containing 16-bit mantissas."
+``bfp_complex_s32``  , ":c:type:`bfp_complex_s32_t`        ", "Block floating-point vector containing complex 32-bit mantissas."
+``bfp_complex_s16``  , ":c:type:`bfp_complex_s16_t`        ", "Block floating-point vector containing complex 16-bit mantissas."
--- a/lib_xcore_math/doc/rst/src/reference/dct/dct_functions.csv
+++ b/lib_xcore_math/doc/rst/src/reference/dct/dct_functions.csv
@@ -0,0 +1,10 @@
+Brief                              , Forward Function           , Inverse Function
+6-point DCT                        , :c:func:`dct6_forward()`   , :c:func:`dct6_inverse()`
+8-point DCT                        , :c:func:`dct8_forward()`   , :c:func:`dct8_inverse()`
+12-point DCT                       , :c:func:`dct12_forward()`  , :c:func:`dct12_inverse()`
+16-point DCT                       , :c:func:`dct16_forward()`  , :c:func:`dct16_inverse()`
+24-point DCT                       , :c:func:`dct24_forward()`  , :c:func:`dct24_inverse()`
+32-point DCT                       , :c:func:`dct32_forward()`  , :c:func:`dct32_inverse()`
+48-point DCT                       , :c:func:`dct48_forward()`  , :c:func:`dct48_inverse()`
+64-point DCT                       , :c:func:`dct64_forward()`  , :c:func:`dct64_inverse()`
+8-by-8 2-dimensional DCT           , :c:func:`dct8x8_forward()` , :c:func:`dct8x8_inverse()`
--- a/lib_xcore_math/doc/rst/src/reference/dct/dct_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/dct/dct_index.rst
@@ -0,0 +1,22 @@
+.. _dct_api:
+
+Discrete Cosine Transform API
+-----------------------------
+
+DCT API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Note: The forward DCTs are Type-II.  The inverse of the Type-II DCT is the Type-III DCT, so Type-II
+and Type-III are supported here.
+
+.. csv-table:: DCT functions - quick reference
+    :file: dct_functions.csv
+    :widths: 40, 30, 30
+    :header-rows: 1
+    :class: longtable
+
+|newpage|
+
+
+.. doxygengroup:: dct_api
+
--- a/lib_xcore_math/doc/rst/src/reference/fft/fft_functions.csv
+++ b/lib_xcore_math/doc/rst/src/reference/fft/fft_functions.csv
@@ -0,0 +1,8 @@
+Brief                                 , Forward Function                    , Inverse Function
+BFP FFT on single real signal         , :c:func:`bfp_fft_forward_mono()`    , :c:func:`bfp_fft_inverse_mono()`   
+BFP FFT on single complex signal      , :c:func:`bfp_fft_forward_complex()` , :c:func:`bfp_fft_inverse_complex()`
+BFP FFT on pair of real signals       , :c:func:`bfp_fft_forward_stereo()`  , :c:func:`bfp_fft_inverse_stereo()` 
+BFP spectrum packing                  , :c:func:`bfp_fft_unpack_mono()`     , :c:func:`bfp_fft_pack_mono()`      
+Low-level decimation-in-time FFT      , :c:func:`fft_dit_forward()`         , :c:func:`fft_dit_inverse()`        
+Low-level decimation-in-frequency FFT , :c:func:`fft_dif_forward()`         , :c:func:`fft_dif_inverse()`        
+FFT on real signal of ``float``       , :c:func:`fft_f32_forward()`         , :c:func:`fft_f32_inverse()`        
--- a/lib_xcore_math/doc/rst/src/reference/fft/fft_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/fft/fft_index.rst
@@ -0,0 +1,22 @@
+.. _fft_api:
+
+Fast Fourier Transform API
+--------------------------
+
+FFT API quick reference
+^^^^^^^^^^^^^^^^^^^^^^^
+
+|beginfullwidth|
+
+.. csv-table:: FFT functions - quick reference
+    :file: fft_functions.csv
+    :widths: 40, 30, 30
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+|newpage|
+
+.. doxygengroup:: fft_api
+
--- a/lib_xcore_math/doc/rst/src/reference/filter/filter_functions.csv
+++ b/lib_xcore_math/doc/rst/src/reference/filter/filter_functions.csv
@@ -0,0 +1,9 @@
+Filter,Function,Brief
+32-bit FIR       , :c:func:`filter_fir_s32_init()`                 , Initialize filter                      
+32-bit FIR       , :c:func:`filter_fir_s32_add_sample()`           , Add sample (without computing output)  
+32-bit FIR       , :c:func:`filter_fir_s32()`                      , Process next sample                    
+16-bit FIR       , :c:func:`filter_fir_s16_init()`                 , Initialize filter                      
+16-bit FIR       , :c:func:`filter_fir_s16_add_sample()`           , Add sample (without computing output)  
+16-bit FIR       , :c:func:`filter_fir_s16()`                      , Process next sample                    
+32-bit Biquad    , :c:func:`filter_biquad_s32()`                   , Process next sample (single block)     
+32-bit Biquad    , :c:func:`filter_biquads_s32()`                  , Process next sample (multi block)      
--- a/lib_xcore_math/doc/rst/src/reference/filter/filter_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/filter/filter_index.rst
@@ -0,0 +1,18 @@
+.. _filter_api:
+
+Filtering API
+-------------
+
+|beginfullwidth|
+
+.. csv-table:: Filtering API - quick reference
+    :file: filter_functions.csv
+    :widths: 15,40,45
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+|newpage|
+
+.. doxygengroup:: filter_api
--- a/lib_xcore_math/doc/rst/src/reference/notes.h
+++ b/lib_xcore_math/doc/rst/src/reference/notes.h
@@ -0,0 +1,261 @@
+// Copyright 2021-2024 XMOS LIMITED.
+// This Software is subject to the terms of the XMOS Public Licence: Version 1.
+
+// This file exists as a compatibility work-around between vanilla Doxygen and
+// sphinx+breathe+doxygen.
+// If these notes are written in an .md file, sphinx can't interpret them. If they're
+// written in an .rst file, Doxygen can't interpret them and can't build link references
+// to them.
+
+/**
+ * @page note_vector_alignment Note: Vector Alignment
+ *
+ *
+ * This library makes use of the XMOS architecture's vector processing unit (VPU). All loads and
+ * stores to and from the XS3 VPU require that the loaded/stored addresses be aligned to a 4-byte
+ * boundary (word-aligned).
+ *
+ * In the current version of the API, most API functions therefore require vectors (or the data
+ * backing a BFP vector) to begin at word-aligned addresses. Vectors are *not* required,
+ * however, to have a size (in bytes) that is a multiple of 4.
+ *
+ * @par Writing Alignment-safe Code
+ * @parblock
+ *
+ * The alignment requirement is ultimately always on the data that backs a vector. For the
+ * low-level API, that is the pointers passed to the functions themselves. For the high-level
+ * API, that is the memory to which the `data` field (or the `real` and `imag` fields in the
+ * case of `bfp_complex_s16_t`) points, specified when the BFP vector is initialized.
+ *
+ * Arrays of type `int32_t` and `complex_s32_t` will normally be guaranteed to be word-aligned
+ * by the compiler. However, if the user manually specifies the beginning of an `int32_t` array,
+ * as in the following..
+ *
+ * @code{.c}
+ *  uint8_t byte_buffer[100];
+ *  int32_t* integer_array = (int32_t*) &byte_buffer[1];
+ * @endcode
+ *
+ * .. the vector may not be word-aligned. It is the responsibility of the user to ensure proper
+ * alignment of data.
+ *
+ * For `int16_t` arrays, the compiler does not by default guarantee that the array starts on a
+ * word-aligned address. To force word-alignment on arrays of this type, use
+ * `__attribute__((aligned (4)))` in the variable definition, as in the following.
+ *
+ * @code{.c}
+ *     int16_t __attribute__((aligned (4))) data[100];
+ * @endcode
+ *
+ * Occasionally, 8-byte (double word) alignment is required. In this case, neither `int32_t` nor
+ * `int16_t` is necessarily guaranteed to align as required. Similar to the above, this can be
+ * hinted to the compiler as in the following.
+ *
+ * @code{.c}
+ *     int32_t __attribute__((aligned (8))) data[100];
+ * @endcode
+ *
+ * This library also provides the macros `WORD_ALIGNED` and `DWORD_ALIGNED` which force 4- and
+ * 8-byte alignment respectively as above.
+ *
+ * @endparblock
+ */
+
+
+
+/**
+ * @page note_symmetric_saturation Note: Symmetrically Saturating Arithmetic
+ *
+ * With ordinary integer arithmetic the block floating-point logic chooses exponents and operand
+ * shifts to prevent integer overflow with worst-case input values. However, the XS3 VPU uses
+ * symmetrically saturating integer arithmetic.
+ *
+ * Saturating arithmetic is that where partial results of the applied operation use a bit depth
+ * greater than the output bit depth, and values that can't be properly expressed with the output
+ * bit depth are set to the nearest expressible value.
+ *
+ * For example, in ordinary C integer arithmetic, a function which multiplies two 32-bit integers
+ * may internally compute the full 64-bit product and then clamp values to the range `[INT32_MIN,
+ * INT32_MAX]` before returning a 32-bit result.
+ *
+ * Symmetrically saturating arithmetic also includes the property that the lower bound of the
+ * expressible range is the negative of the upper bound of the expressible range.
+ *
+ * One of the major troubles with non-saturating integer arithmetic is that in a two's complement
+ * encoding, there exists a non-zero integer value @math{x} (e.g. `INT16_MIN` in 16-bit two's
+ * complement arithmetic) for which @math{-1 \cdot x = x}. Serious arithmetic errors can result
+ * when this case is not accounted for.
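+ *
+ * As a worked illustration (16-bit case, included only to make the wrap-around concrete), negating
+ * `INT16_MIN` @math{= -2^{15}} is computed modulo @math{2^{16}}:
+ *
+ * @f[ -(-2^{15}) = 2^{15} \equiv 2^{15} - 2^{16} = -2^{15} \pmod{2^{16}} @f]
+ *
+ * With a symmetric range of @math{[-(2^{15}-1),\ 2^{15}-1]}, by contrast, every representable
+ * value has a representable negation.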
+ *
+ * One of the results of _symmetric_ saturation, on the other hand, is that there is a corner case
+ * where (using the same exponent and shift logic as non-saturating arithmetic) saturation may occur
+ * for a particular combination of input mantissas. The corner case is different for different
+ * operations.
+ *
+ * When the corner case occurs, the minimum (and largest magnitude) value of the resulting vector is
+ * 1 LSb greater than its ideal value (e.g. `-0x3FFF` instead of `-0x4000` for 16-bit arithmetic).
+ * The error in this output element's mantissa is then 1 LSb, or
+ * @math{2^p}, where @math{p} is the exponent of the resulting BFP vector.
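+ *
+ * As a worked illustration, assume (hypothetically) a 16-bit result vector with exponent
+ * @math{p = -15}. The ideal element `-0x4000` represents @math{-16384 \cdot 2^{-15} = -0.5}, while
+ * the saturated element `-0x3FFF` represents @math{-16383 \cdot 2^{-15} \approx -0.49997}. The
+ * difference is @math{2^{-15} \approx 3.05 \times 10^{-5}}.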
+ *
+ * Of course, the very nature of BFP arithmetic routinely involves errors of this magnitude.
+ */
+
+
+/**
+ * @page note_spectrum_packing Note: Spectrum Packing
+ *
+ *
+ * In its general form, the @math{N}-point Discrete Fourier Transform is an operation applied
+ * to a complex @math{N}-point signal @math{x[n]} to produce a complex spectrum @math{X[f]}.
+ * Any spectrum @math{X[f]} which is the result of a @math{N}-point DFT has the property that
+ * @math{X[f+N] = X[f]}. Thus, the complete representation of the @math{N}-point DFT of
+ * @math{x[n]} requires @math{N} complex elements.
+ *
+ * @par Complex DFT and IDFT
+ * @parblock
+ *
+ * In this library, when performing a complex DFT (e.g. using fft_bfp_forward_complex()), the
+ * spectral representation that results is a straight-forward mapping:
+ *
+ * `X[f]` @math{\longleftarrow X[f]} for @math{0 \le f < N}
+ *
+ * where `X` is an @math{N}-element array of `complex_s32_t`, where the real part of @math{X[f]}
+ * is in `X[f].re` and the imaginary part in `X[f].im`.
+ *
+ * Likewise, when performing an @math{N}-point complex inverse DFT, that is also the
+ * representation that is expected.
+ * @endparblock
+ *
+ * @par Real DFT and IDFT
+ * @parblock
+ *
+ * Oftentimes we instead wish to compute the DFT of real signals. In addition to the periodicity
+ * property (@math{X[f+N] = X[f]}), the DFT of a real signal also has a complex conjugate symmetry
+ * such that @math{X[-f] = X^*[f]}, where @math{X^*[f]} is the complex conjugate of @math{X[f]}.
+ * This symmetry makes it redundant (and thus undesirable) to store such symmetric pairs of elements.
+ * This would allow us to get away with only explicitly storing @math{X[f]} for
+ * @math{0 \le f \le N/2} in @math{(N/2)+1} complex elements.
+ *
+ * Unfortunately, using such a representation has the undesirable property that the DFT of an
+ * @math{N}-point real signal cannot be computed in-place, as the representation requires more
+ * memory than we started with.
+ *
+ * However, if we take the periodicity and complex conjugate symmetry properties together:
+ *
+ * @f[
+ * &    X[0] = X^*[0] \rightarrow Imag\{X[0]\} = 0      \\
+ * &    X[-(N/2) + N] = X[N/2]                          \\
+ * &    X[-N/2] = X^*[N/2] \rightarrow X[N/2] = X^*[N/2] \rightarrow Imag \{ X[N/2] \} = 0
+ * @f]
+ *
+ * Because both @math{X[0]} and @math{X[N/2]} are guaranteed to be real, we can recover the benefit
+ * of in-place computation in our representation by packing the real part of @math{X[N/2]} into the
+ * imaginary part of @math{X[0]}.
+ *
+ * Therefore, the functions in this library that produce the spectra of real signals (such as
+ * fft_bfp_forward_mono() and fft_bfp_forward_stereo()) will pack the spectra in a slightly less
+ * straight-forward manner (as compared with the complex DFTs):
+ *
+ * `X[f]` @math{\longleftarrow X[f]} for @math{1 \le f < N/2}
+ *
+ * `X[0]` @math{\longleftarrow X[0] + j X[N/2]}
+ *
+ * where `X` is an @math{N/2}-element array of `complex_s32_t`.
+ *
+ * Likewise, this is the encoding expected when computing the @math{N}-point inverse DFT, such as by
+ * fft_bfp_inverse_mono() or fft_bfp_inverse_stereo().
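+ *
+ * As a worked illustration for @math{N = 8} (a length chosen only for the example), `X` is a
+ * 4-element array of `complex_s32_t` holding:
+ *
+ * `X[0]` @math{\longleftarrow X[0] + j X[4]}
+ *
+ * `X[1]` @math{\longleftarrow X[1]}, `X[2]` @math{\longleftarrow X[2]}, `X[3]` @math{\longleftarrow X[3]}
+ *
+ * The elements that are not stored follow from the periodicity and conjugate symmetry properties:
+ * @math{X[N-f] = X[-f] = X^*[f]}, so @math{X[5] = X^*[3]}, @math{X[6] = X^*[2]} and
+ * @math{X[7] = X^*[1]}. The four complex elements occupy the same amount of memory as the eight
+ * real input samples, assuming 32-bit samples.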
+ *
+ * @note When performing a stereo DFT or inverse DFT, so as to preserve the
+ * in-place computation of the result, the spectra of the two signals will be encoded into adjacent
+ * blocks of memory, with the second spectrum (i.e. associated with 'channel b') occupying the
+ * higher memory address.
+ *
+ * @endparblock
+ */
+
+
+
+/**
+ * @page fft_length_support Note: Library FFT Length Support
+ *
+ * When computing DFTs this library relies on one or both of a pair of look-up tables which contain
+ * portions of the Discrete Fourier Transform matrix.  Longer FFT lengths require larger look-up
+ * tables.  When building using CMake, the maximum FFT length can be specified as a CMake option,
+ * and these tables are auto-generated at build time.
+ *
+ * If not using CMake, you can manually generate these files using a python script included with the
+ * library. The script is located at `lib_xcore_math/script/gen_fft_table.py`. If generated
+ * manually, you must add the generated .c file as a source, and the directory containing
+ * `xmath_fft_lut.h` must be added as an include directory when compiling the library's files.
+ *
+ * Note that the header file must be named `xmath_fft_lut.h` as it is included via
+ * `#include "xmath_fft_lut.h"`.
+ *
+ * By default the tables contain the coefficients necessary to perform forward or inverse DFTs of up
+ * to 1024 points. If larger DFTs are required, or if the maximum required DFT size is known to be
+ * less than 1024 points, the `MAX_FFT_LEN_LOG2` CMake option can be modified from its default value
+ * of `10`.
+ *
+ * The two look-up tables correspond to the decimation-in-time and decimation-in-frequency FFT
+ * algorithms, and the run-time symbols for the tables are `xmath_dit_fft_lut` and
+ * `xmath_dif_fft_lut` respectively. Each table contains @math{N-4} complex 32-bit values, where
+ * @math{N} is the maximum supported FFT length, with a size of @math{8\cdot (N-4)} bytes each.
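+ *
+ * As a worked illustration, reading @math{N} as the maximum supported FFT length (the reading
+ * these figures assume), the default @math{N = 2^{10} = 1024} gives
+ *
+ * @f[ N - 4 = 1020 \text{ values}, \qquad 8 \cdot 1020 = 8160 \text{ bytes per table} @f]
+ *
+ * and @math{N = 2^{14} = 16384} gives
+ *
+ * @f[ N - 4 = 16380 \text{ values}, \qquad 8 \cdot 16380 = 131040 \text{ bytes per table} @f]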
+ *
+ * To manually regenerate the tables for a maximum FFT length of @math{16384} (@math{=2^{14}}),
+ * supporting only the decimation-in-time algorithm, for example, use the following:
+ *
+ * @code{.c}
+ *     python lib_xcore_math/script/gen_fft_table.py --dit --max_fft_log2 14
+ * @endcode
+ *
+ * Use the `--help` flag with `gen_fft_table.py` for a more detailed description of its syntax and
+ * parameters.
+ */
+
+
+
+/**
+ * @page filter_conversion Note: Digital Filter Conversion
+ *
+ * This library supports optimized implementations of 16- and 32-bit FIR filters, as well as
+ * cascaded 32-bit biquad filters.  Each of these filter implementations requires that the
+ * filter coefficients be represented in a compatible form.
+ *
+ * To assist with that, several python scripts are distributed with this library which can be
+ * used to convert existing floating-point filter coefficients into code that is easily
+ * callable from within an xCore application.
+ *
+ * Each script reads in floating-point filter coefficients from a file and computes a new
+ * representation for the filter with coefficients which attempt to maximize precision and are
+ * compatible with the `lib_xcore_math` filtering API.
+ *
+ * Each script outputs two files which can be included in your own xCore application. The first
+ * output is a C source (`.c`) file containing the computed filter parameters and
+ * several function definitions for initializing and executing the generated filter.  The second
+ * output is a C header (`.h`) file which can be `#include`d into your own application to
+ * give access to those functions.
+ *
+ * Additionally, each script also takes a user-provided filter name as an input parameter.  The
+ * output files (as well as the function names within) include the filter name so that more than
+ * one filter can be generated and executed using this mechanism.
+ *
+ * As an example, take the following command to generate a 32-bit FIR filter:
+ *
+ *    python lib_xcore_math/script/gen_fir_filter_s32.py MyFilter filter_coefs.txt
+ *
+ * This command creates a filter named "MyFilter", with coefficients taken from a file
+ * `filter_coefs.txt`.  Two output files will be generated, `MyFilter.c` and `MyFilter.h`.
+ * Including `MyFilter.h` provides access to 3 functions, `MyFilter_init()`,
+ * `MyFilter_add_sample()`, and `MyFilter()` which correspond to the library functions
+ * `filter_fir_s32_init()`, `filter_fir_s32_add_sample()` and `filter_fir_s32()`
+ * respectively.
+ *
+ * Use the `--help` flag with the scripts for more detailed descriptions of inputs and other
+ * options.
+ *
+ * |  Filter Type   | Script                                           |
+ * |  -----------   | ------                                           |
+ * | 32-bit FIR     | `lib_xcore_math/script/gen_fir_filter_s32.py`    |
+ * | 16-bit FIR     | `lib_xcore_math/script/gen_fir_filter_s16.py`    |
+ * | 32-bit Biquad  | `lib_xcore_math/script/gen_biquad_filter_s32.py` |
+ *
+ */
--- a/lib_xcore_math/doc/rst/src/reference/notes.rst
+++ b/lib_xcore_math/doc/rst/src/reference/notes.rst
@@ -0,0 +1,37 @@
+.. _notes_page:
+
+#############
+Library Notes
+#############
+
+Note: Vector Alignment
+======================
+
+.. doxygenpage:: note_vector_alignment
+
+
+Note: Symmetrically Saturating Arithmetic
+=========================================
+
+.. doxygenpage:: note_symmetric_saturation
+
+
+Note: Spectrum Packing
+======================
+
+.. doxygenpage:: note_spectrum_packing
+
+
+Note: Library FFT Length Support
+================================
+
+.. doxygenpage:: fft_length_support
+
+
+
+
+Note: Digital Filter Conversion
+================================
+
+.. doxygenpage:: filter_conversion
+
--- a/lib_xcore_math/doc/rst/src/reference/q_format.rst
+++ b/lib_xcore_math/doc/rst/src/reference/q_format.rst
@@ -0,0 +1,7 @@
+
+Q-format macros
+---------------
+
+.. doxygengroup:: qfmt_macros
+    :members:
+
--- a/lib_xcore_math/doc/rst/src/reference/reference_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/reference_index.rst
@@ -0,0 +1,27 @@
+
+*************
+API Reference
+*************
+
+.. toctree::
+    :maxdepth: 2
+
+    types
+
+.. toctree::
+    :maxdepth: 1
+
+    bfp/bfp_index
+    dct/dct_index
+    fft/fft_index
+    filter/filter_index
+    scalar/scalar_index
+    vect/vect_index
+    q_format
+    utils
+    config_options
+
+.. toctree::
+    :maxdepth: 2
+
+    notes
--- a/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_complex_float_ops.csv
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_complex_float_ops.csv
@@ -0,0 +1,7 @@
+Function, Brief
+:c:func:`float_complex_s16_mul()`    , ":math:`x \times y`           "
+:c:func:`float_complex_s16_add()`    , ":math:`x + y`                "
+:c:func:`float_complex_s16_sub()`    , ":math:`x - y`                "
+:c:func:`float_complex_s32_mul()`    , ":math:`x \times y`           "
+:c:func:`float_complex_s32_add()`    , ":math:`x + y`                "
+:c:func:`float_complex_s32_sub()`    , ":math:`x - y`                "
--- a/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_f32_ops.csv
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_f32_ops.csv
@@ -0,0 +1,6 @@
+Function, Brief
+:c:func:`f32_sin()`          ,  :math:`\sin(x)`
+:c:func:`f32_cos()`          ,  :math:`\cos(x)`
+:c:func:`f32_log2()`         ,  :math:`\log_2(x)`
+:c:func:`f32_power_series()` ,  Evaluate Power Series
+:c:func:`f32_normA()`        ,  Normalized Form A
--- a/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_fixed_point_ops.csv
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_fixed_point_ops.csv
@@ -0,0 +1,13 @@
+Function,Input Depth, Fractional Bits, Brief
+:c:func:`s16_inverse()`       ,  16  ,  0   , ":math:`x^{-1}`                "
+:c:func:`s32_inverse()`       ,  32  ,  0   , ":math:`x^{-1}`                "
+:c:func:`sbrad_sin()`         ,  32  ,  31  , ":math:`\sin(x)`               "
+:c:func:`sbrad_tan()`         ,  32  ,  31  , ":math:`\tan(x)`               "
+:c:func:`q24_sin()`           ,  32  ,  24  , ":math:`\sin(x)`               "
+:c:func:`q24_cos()`           ,  32  ,  24  , ":math:`\cos(x)`               "
+:c:func:`q24_tan()`           ,  32  ,  24  , ":math:`\tan(x)`               "
+:c:func:`q30_exp_small()`     ,  32  ,  30  , ":math:`\exp(x)`               "
+:c:func:`q24_logistic()`      ,  32  ,  24  , ":math:`\frac{1}{1+e^{-x}}`    "
+:c:func:`q24_logistic_fast()` ,  32  ,  24  , ":math:`\frac{1}{1+e^{-x}}`    "
+:c:func:`q30_powers()`        ,  32  ,  30  , ":math:`(0,x,x^2,x^3,\dots)`   "
+:c:func:`u32_ceil_log2()`     ,  32  ,  0   , ":math:`\lceil\log_2(x)\rceil` "
--- a/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_float_ops.csv
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_float_ops.csv
@@ -0,0 +1,15 @@
+Function, Brief
+:c:func:`float_s32_mul()`      , ":math:`x \times y`                   "
+:c:func:`float_s32_add()`      , ":math:`x + y`                        "
+:c:func:`float_s32_sub()`      , ":math:`x - y`                        "
+:c:func:`float_s32_div()`      , ":math:`\frac{x}{y}`                  "
+:c:func:`float_s32_abs()`      , ":math:`\left|x\right|`               "
+:c:func:`float_s32_gt()`       , ":math:`x > y`                        "
+:c:func:`float_s32_gte()`      , ":math:`x \ge y`                      "
+:c:func:`float_s32_ema()`      , ":math:`\alpha x + (1 - \alpha) y`    "
+:c:func:`float_s32_sqrt()`     , ":math:`\sqrt{x}`                     "
+:c:func:`float_s32_exp()`      , ":math:`\exp(x)`                      "
+:c:func:`s16_mul()`            , ":math:`x \times y`                   "
+:c:func:`s32_sqrt()`           , ":math:`\sqrt{x}`                     "
+:c:func:`s32_mul()`            , ":math:`x \times y`                   "
+:c:func:`s32_odd_powers()`     , ":math:`x, x^3, x^5, x^7, \dots`      "
--- a/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_type_conversion.csv
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/csv/scalar_type_conversion.csv
@@ -0,0 +1,15 @@
+Function,Type In, Type Out
+:c:func:`f32_unpack()`             , "``float``                              ", "``int32_t``, :c:type:`exponent_t`"
+:c:func:`f32_unpack_s16()`         , "``float``                              ", "``int16_t``, :c:type:`exponent_t`  "
+:c:func:`f32_to_float_s32()`       , "``float``                              ", ":c:type:`float_s32_t`   "
+:c:func:`f64_to_float_s32()`       , "``double``                             ", ":c:type:`float_s32_t`   "
+:c:func:`float_s32_to_float_s64()` , ":c:type:`float_s32_t`                  ", ":c:type:`float_s64_t`   "
+:c:func:`float_s32_to_float()`     , ":c:type:`float_s32_t`                  ", "``float``               "
+:c:func:`float_s32_to_double()`    , ":c:type:`float_s32_t`                  ", "``double``              "
+:c:func:`s16_to_s32()`             , "``int16_t``, :c:type:`exponent_t`      ", "``int32_t``, :c:type:`exponent_t`"
+:c:func:`s32_to_s16()`             , "``int32_t``, :c:type:`exponent_t`      ", "``int16_t``, :c:type:`exponent_t`"
+:c:func:`s64_to_s32()`             , "``int64_t``, :c:type:`exponent_t`      ", "``int32_t``, :c:type:`exponent_t`"
+:c:func:`s32_to_f32()`             , "``int32_t``, :c:type:`exponent_t`      ", "``float``"
+:c:func:`radians_to_sbrads()`      , ":c:type:`radian_q24_t`                 ", ":c:type:`sbrad_t`"
+:c:func:`s32_to_chunk_s32()`       , "``int32_t``                            ", "``int32_t[8]``"
+:c:func:`float_s64_to_float_s32()` , ":c:type:`float_s64_t`                  ", ":c:type:`float_s32_t`"
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_f32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_f32.rst
@@ -0,0 +1,6 @@
+
+Scalar IEEE 754 float API
+-------------------------
+
+.. doxygengroup:: scalar_f32_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s16.rst
@@ -0,0 +1,6 @@
+
+16-bit complex scalar floating-point API
+----------------------------------------
+
+.. doxygengroup:: float_complex_s16_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_complex_s32.rst
@@ -0,0 +1,6 @@
+
+32-bit complex scalar floating-point API
+----------------------------------------
+
+.. doxygengroup:: float_complex_s32_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_float_s32.rst
@@ -0,0 +1,7 @@
+
+32-bit scalar float API
+-----------------------
+
+.. doxygengroup:: float_s32_api
+    :members:
+
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_index.rst
@@ -0,0 +1,17 @@
+.. _scalar_api:
+
+Scalar API
+----------
+
+.. toctree::
+    :maxdepth: 1
+
+    scalar_quickref
+
+    scalar_s16
+    scalar_s32
+    scalar_f32
+    scalar_float_s32
+    scalar_float_complex_s16
+    scalar_float_complex_s32
+    scalar_misc
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_misc.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_misc.rst
@@ -0,0 +1,6 @@
+
+Miscellaneous scalar API
+------------------------
+
+.. doxygengroup:: scalar_misc_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_quickref.rst
@@ -0,0 +1,70 @@
+
+Scalar API quick reference
+--------------------------
+
+* `Scalar Type Conversion <scalar_type_conversion_>`_
+* `Fixed-Point Scalar Ops <scalar_fixed_point_ops_>`_
+* `IEEE 754 Float Scalar Ops <scalar_f32_ops_>`_
+* `Non-standard Float Scalar Ops <scalar_float_ops_>`_
+* `Non-standard Complex Float Scalar Ops <scalar_complex_float_ops_>`_
+
+Scalar type conversion
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. _scalar_type_conversion:
+
+|beginfullwidth|
+
+.. csv-table:: Scalar type conversion
+    :file: csv/scalar_type_conversion.csv
+    :widths: 40,30,30
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+Fixed-point scalar ops
+^^^^^^^^^^^^^^^^^^^^^^
+
+.. _scalar_fixed_point_ops:
+
+.. csv-table:: Fixed-point scalar ops
+    :file: csv/scalar_fixed_point_ops.csv
+    :widths: 35,15,15,35
+    :header-rows: 1
+    :class: longtable
+
+
+IEEE 754 float ops
+^^^^^^^^^^^^^^^^^^
+
+.. _scalar_f32_ops:
+
+.. csv-table:: IEEE 754 float ops
+    :file: csv/scalar_f32_ops.csv
+    :widths: 50,50
+    :header-rows: 1
+    :class: longtable
+
+Non-standard scalar float ops
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _scalar_float_ops:
+
+.. csv-table:: Non-standard scalar float ops
+    :file: csv/scalar_float_ops.csv
+    :widths: 50,50
+    :header-rows: 1
+    :class: longtable
+
+Non-standard complex scalar float ops
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. _scalar_complex_float_ops:
+
+.. csv-table:: Non-standard complex scalar float ops
+    :file: csv/scalar_complex_float_ops.csv
+    :widths: 50,50
+    :header-rows: 1
+    :class: longtable
+
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s16.rst
@@ -0,0 +1,6 @@
+
+16-bit scalar API
+-----------------
+
+.. doxygengroup:: scalar_s16_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/scalar/scalar_s32.rst
@@ -0,0 +1,6 @@
+
+32-bit scalar API
+-----------------
+
+.. doxygengroup:: scalar_s32_api
+    :members:
--- a/lib_xcore_math/doc/rst/src/reference/types.rst
+++ b/lib_xcore_math/doc/rst/src/reference/types.rst
@@ -0,0 +1,76 @@
+
+XMath Types
+===========
+
+Each of the main operand types used in this library has a short-hand which is used as a prefix in 
+the naming of API operations. The following tables can be used for reference.
+
+
+Common Vector Types
+-------------------
+
+The following table indicates the types and abbreviations associated with various common vector 
+types.
+
+|beginfullwidth|
+
+.. csv-table:: Common Vector Types
+    :file: csv/common_vector_types.csv
+    :widths: 25, 25, 50
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+Common Scalar Types
+-------------------
+
+The following table indicates the types and abbreviations associated with various common scalar
+types.
+
+|beginfullwidth|
+
+.. csv-table:: Common Scalar Types
+    :file: csv/common_scalar_types.csv
+    :widths: 25, 25, 50
+    :header-rows: 1
+    :class: longtable
+
+|endfullwidth|
+
+|newpage|
+ 
+
+Block Floating-Point Types
+--------------------------
+
+.. doxygengroup:: type_bfp
+  :members:
+
+
+Scalar Types (Integer)
+----------------------
+
+.. doxygengroup:: type_scalar_int
+    :members:
+
+Scalar Types (Floating-Point)
+-----------------------------
+
+.. doxygengroup:: type_scalar_float
+    :members:
+
+
+Scalar Types (Fixed-Point)
+--------------------------
+
+.. doxygengroup:: type_scalar_fixed
+    :members:
+
+
+Misc Types
+----------
+
+.. doxygengroup:: type_misc
+  :members:
+  
--- a/lib_xcore_math/doc/rst/src/reference/utils.rst
+++ b/lib_xcore_math/doc/rst/src/reference/utils.rst
@@ -0,0 +1,9 @@
+
+Util functions and macros
+-------------------------
+
+.. doxygengroup:: util_macros
+    :members:
+
+
+
--- a/lib_xcore_math/doc/rst/src/reference/vect/chunk_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/chunk_s32.rst
@@ -0,0 +1,5 @@
+
+32-Bit Vector Chunk (8-Element) API
+===================================
+
+.. doxygengroup:: chunk32_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16.rst
@@ -0,0 +1,5 @@
+
+Complex 16-bit vector API
+-------------------------
+
+.. doxygengroup:: vect_complex_s16_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16_prepare.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s16_prepare.rst
@@ -0,0 +1,5 @@
+
+16-Bit complex vector prepare functions
+---------------------------------------
+
+.. doxygengroup:: vect_complex_s16_prepare_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32.rst
@@ -0,0 +1,5 @@
+
+Complex 32-bit vector API
+-------------------------
+
+.. doxygengroup:: vect_complex_s32_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32_prepare.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_complex_s32_prepare.rst
@@ -0,0 +1,5 @@
+
+32-Bit complex vector prepare functions
+---------------------------------------
+
+.. doxygengroup:: vect_complex_s32_prepare_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_f32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_f32.rst
@@ -0,0 +1,5 @@
+
+32-bit IEEE 754 float API
+-------------------------
+
+.. doxygengroup:: vect_f32_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_index.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_index.rst
@@ -0,0 +1,26 @@
+.. _vect_api:
+
+Vector API
+==========
+
+.. toctree::
+  vect_quickref
+
+.. toctree::
+
+  vect_s8
+  vect_s16
+  vect_s32
+  vect_f32
+  vect_complex_s16
+  vect_complex_s32
+  vect_mixed
+
+.. toctree::
+  vect_s16_prepare
+  vect_s32_prepare
+  vect_complex_s16_prepare
+  vect_complex_s32_prepare
+
+.. toctree::
+  chunk_s32
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_mixed.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_mixed.rst
@@ -0,0 +1,5 @@
+
+Mixed-precision vector API
+--------------------------
+
+.. doxygengroup:: vect_mixed_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_quickref.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_quickref.rst
@@ -0,0 +1,570 @@
+
+Vector API quick reference
+--------------------------
+
+The tables below list the functions of the vector API. The "EW" column indicates whether the
+operation acts element-wise.
+
+The "Signature" column is a hint that quickly conveys the kinds of conceptual inputs to and outputs
+from each operation.  The signatures are only intended to convey how many (conceptual) inputs and
+outputs there are, and their dimensionality.
+
+The functions themselves will typically take more arguments than these signatures indicate.  For
+example, most functions take vector lengths as input, and many take shift values which are used to
+control growth of element bit-depth.  Check the function's full documentation to get more detailed
+information.
+
+The following symbols are used in the signatures:
+
+.. table::
+    :widths: 30 70
+    :class: longtable
+
+    +--------------------------------------+---------------------------------------------+
+    |  Symbol                              | Description                                 |
+    +======================================+=============================================+
+    | :math:`\mathbb{S}`                   | A scalar input or output value.             |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\mathbb{V}`                   | A vector-valued input or output.            |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\mathbb{M}`                   | A matrix-valued input or output.            |
+    +--------------------------------------+---------------------------------------------+
+    | :math:`\varnothing`                  | Placeholder indicating no input or output.  |
+    +--------------------------------------+---------------------------------------------+
+
+For example, the operation signature :math:`(\mathbb{V \times V \times S}) \to \mathbb{V}` indicates
+the operation takes two vector inputs and a scalar input, and the output is a vector.
+
+
+* `32-Bit Vector Ops <vect32_api_>`_
+* `16-Bit Vector Ops <vect16_api_>`_
+* `8-Bit Vector Ops <vect8_api_>`_
+* `Complex 32-Bit Vector Ops <vect32_complex_api_>`_
+* `Complex 16-Bit Vector Ops <vect16_complex_api_>`_
+* `Fixed-Point Vector Ops <vect_fixed_point_api_>`_
+* `Floating-Point Vector Ops <vect_float_api_>`_
+* `Other Vector Ops <vect_other_api_>`_
+* `Vector Type Conversions <vect_conversion_api_>`_
+
+
+.. _vect32_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **32-bit Vector Ops**                                                                            |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_s32_copy()`                       |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_abs()`                        |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_abs_sum()`                    |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_add()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_add_scalar()`                 |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_argmax()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_argmin()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_clip()`                       |  x  | :math:`(\mathbb{V \times S \times S})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_dot()`                        |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_energy()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_headroom()`                   |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_inverse()`                    |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_max()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_max_elementwise()`            |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_min()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_min_elementwise()`            |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_mul()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_macc()`                       |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_nmacc()`                      |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_rect()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_scale()`                      |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_set()`                        |  x  | :math:`\mathbb{S}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_shl()`                        |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_shr()`                        |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_sqrt()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_sub()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_sum()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_zip()`                        |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_unzip()`                      |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to (\mathbb{V \times V})`        |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_convolve_valid()`             |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_convolve_same()`              |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_log_base()`                   |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_log()`                        |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_log2()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s32_log10()`                      |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`chunk_s32_dot()`                       |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`chunk_s32_log()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+.. _vect16_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **16-bit Vector Ops**                                                                            |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_s16_abs()`                        |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_abs_sum()`                    |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_add()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_add_scalar()`                 |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_argmax()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_argmin()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_clip()`                       |  x  | :math:`(\mathbb{V \times S \times S})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_dot()`                        |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_energy()`                     |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_headroom()`                   |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_inverse()`                    |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_max()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_max_elementwise()`            |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_min()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_min_elementwise()`            |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_mul()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_macc()`                       |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_nmacc()`                      |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_rect()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_scale()`                      |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_set()`                        |  x  | :math:`\mathbb{S}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_shl()`                        |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_shr()`                        |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_sqrt()`                       |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_sub()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_sum()`                        |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_extract_high_byte()`          |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_s16_extract_low_byte()`           |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+.. _vect8_api:
+
+.. table::
+    :widths: 40 10 20 30
+    :class: longtable
+
+    +---------------------------------------------------------------------------------------------------------------+
+    | **8-bit Vector Ops**                                                                                          |
+    +---------------------------------+-----+-----------------------------------------------+-----------------------+
+    | Function                        | EW  |  Signature                                    | Brief                 |
+    +=================================+=====+===============================================+=======================+
+    | :c:func:`vect_s8_is_negative()` |  x  | :math:`\mathbb{V}`                            | Identify negative     |
+    |                                 |     | :math:`\to \mathbb{V}`                        | elements              |
+    +---------------------------------+-----+-----------------------------------------------+-----------------------+
+
+
+.. _vect32_complex_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **32-bit Complex Vector Ops**                                                                    |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_complex_s32_add()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_add_scalar()`         |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_conj_macc()`          |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_conj_mul()`           |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_conj_nmacc()`         |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_conjugate()`          |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_headroom()`           |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_macc()`               |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_mag()`                |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_mul()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_nmacc()`              |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_real_mul()`           |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_real_scale()`         |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_scale()`              |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_set()`                |  x  | :math:`\mathbb{S}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_shl()`                |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_shr()`                |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_squared_mag()`        |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_sub()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_sum()`                |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s32_tail_reverse()`       |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+.. _vect16_complex_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **16-bit Complex Vector Ops**                                                                    |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_complex_s16_add()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_add_scalar()`         |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_conj_mul()`           |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_conj_macc()`          |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_conj_nmacc()`         |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_headroom()`           |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_macc()`               |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_mag()`                |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_mul()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_nmacc()`              |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_real_mul()`           |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_real_scale()`         |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_scale()`              |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_set()`                |  x  | :math:`\mathbb{S}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_shl()`                |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_shr()`                |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_squared_mag()`        |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_sub()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_s16_sum()`                |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+.. _vect_fixed_point_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **Fixed-Point Vector Ops**                                                                       |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_q30_power_series()`               |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_q30_exp_small()`                  |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`chunk_q30_power_series()`              |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`chunk_q30_exp_small()`                 |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+.. _vect_float_api:
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------------------------+
+    | **Floating-Point Vector Ops**                                                                    |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | Function                                        | EW  |  Signature                               |
+    +=================================================+=====+==========================================+
+    | :c:func:`vect_f32_max_exponent()`               |     | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_f32_dot()`                        |     | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{S}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_f32_add()`                        |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_float_s32_log_base()`             |  x  | :math:`(\mathbb{V \times S})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_float_s32_log()`                  |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_float_s32_log2()`                 |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_float_s32_log10()`                |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`chunk_float_s32_log()`                 |  x  | :math:`\mathbb{V}`                       |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_f32_add()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_f32_mul()`                |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_f32_conj_mul()`           |  x  | :math:`(\mathbb{V \times V})`            |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_f32_macc()`               |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+    | :c:func:`vect_complex_f32_conj_macc()`          |  x  | :math:`(\mathbb{V \times V \times V})`   |
+    |                                                 |     | :math:`\to \mathbb{V}`                   |
+    +-------------------------------------------------+-----+------------------------------------------+
+
+
+.. _vect_other_api:
+
+Note that several of the functions below take vectors of the :c:struct:`split_acc_s32_t` type. This
+is a 32-bit vector type used for accumulating results of 8- or 16-bit operations in a manner
+optimized for the XS3 VPU.
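+
+The idea behind a split accumulator can be pictured with a short portable C sketch. It assumes,
+for illustration only, that each 32-bit accumulator is stored as a signed upper 16-bit half and an
+unsigned lower 16-bit half held in separate arrays. ``demo_split_acc_t``, ``DEMO_ACC_COUNT`` and
+the helper functions are hypothetical names; the real field names and chunk length are those
+declared for :c:struct:`split_acc_s32_t` in the library headers.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    #define DEMO_ACC_COUNT 16  /* hypothetical chunk length, for illustration */
+
+    /* Hypothetical split layout: upper and lower halves in separate arrays. */
+    typedef struct {
+        int16_t  hi[DEMO_ACC_COUNT];  /* signed upper 16 bits */
+        uint16_t lo[DEMO_ACC_COUNT];  /* unsigned lower 16 bits */
+    } demo_split_acc_t;
+
+    /* Split ordinary 32-bit accumulators into the two-array layout. */
+    static void demo_split(demo_split_acc_t *dst, const int32_t src[DEMO_ACC_COUNT])
+    {
+        for (int k = 0; k < DEMO_ACC_COUNT; k++) {
+            uint16_t lo = (uint16_t)((uint32_t)src[k] & 0xFFFFu);
+            /* src[k] - lo is an exact multiple of 65536, so the division is exact. */
+            dst->hi[k] = (int16_t)((src[k] - (int32_t)lo) / 65536);
+            dst->lo[k] = lo;
+        }
+    }
+
+    /* Merge the two halves back into ordinary 32-bit accumulators. */
+    static void demo_merge(int32_t dst[DEMO_ACC_COUNT], const demo_split_acc_t *src)
+    {
+        for (int k = 0; k < DEMO_ACC_COUNT; k++) {
+            dst[k] = (int32_t)src->hi[k] * 65536 + (int32_t)src->lo[k];
+        }
+    }
+
+    /* Returns 1 if splitting and then merging reproduces the input exactly. */
+    static int demo_roundtrip_ok(const int32_t in[DEMO_ACC_COUNT])
+    {
+        demo_split_acc_t split;
+        int32_t out[DEMO_ACC_COUNT];
+        demo_split(&split, in);
+        demo_merge(out, &split);
+        for (int k = 0; k < DEMO_ACC_COUNT; k++) {
+            if (out[k] != in[k]) return 0;
+        }
+        return 1;
+    }
+
+Under this assumed layout, splitting and merging are exact inverses, which is the relationship the
+merge and split entries in the table below suggest.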
+
+
+.. table::
+    :widths: 50 10 35
+    :class: longtable
+
+    +--------------------------------------------------------------------------------+
+    | **Other Vector Ops**                                                           |
+    +----------------------------------------+---+-----------------------------------+
+    | Function                               |EW |  Signature                        |
+    +========================================+===+===================================+
+    | :c:func:`vect_split_acc_s32_shr()`     | x | :math:`(\mathbb{V \times S})`     |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+    | :c:func:`vect_s32_merge_accs()`        | x | :math:`\mathbb{V}`                |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+    | :c:func:`vect_s32_split_accs()`        | x | :math:`\mathbb{V}`                |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+    | :c:func:`chunk_s16_accumulate()`       | x | :math:`\mathbb{V}`                |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+    | :c:func:`mat_mul_s8_x_s8_yield_s32()`  |   | :math:`(\mathbb{M \times V})`     |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+    | :c:func:`mat_mul_s8_x_s16_yield_s32()` |   | :math:`(\mathbb{M \times V})`     |
+    |                                        |   | :math:`\to \mathbb{V}`            |
+    +----------------------------------------+---+-----------------------------------+
+
+
+.. _vect_conversion_api:
+
+|beginfullwidth|
+
+.. table::
+    :widths: 50 25 25
+    :class: longtable
+
+    +----------------------------------------------------------------------------------------------------------+
+    | **Vector Type Conversion Ops**                                                                           |
+    +--------------------------------------------------+-------------------------------------------------------+
+    | Function                                         | Array Element Type                                    |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    |                                                  | Input                     | Output                    |
+    +==================================================+===========================+===========================+
+    | :c:func:`vect_s16_to_vect_s32()`                 | ``int16_t``               | ``int32_t``               |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    | :c:func:`vect_s32_to_vect_s16()`                 | ``int32_t``               | ``int16_t``               |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    | :c:func:`vect_s32_to_vect_f32()`                 | ``int32_t``               | ``float``                 |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    | :c:func:`vect_f32_to_vect_s32()`                 | ``float``                 | ``int32_t``               |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    | :c:func:`vect_complex_s16_to_vect_complex_s32()` | :c:struct:`complex_s16_t` | :c:struct:`complex_s32_t` |
+    +--------------------------------------------------+---------------------------+---------------------------+
+    | :c:func:`vect_complex_s32_to_vect_complex_s16()` | :c:struct:`complex_s32_t` | :c:struct:`complex_s16_t` |
+    +--------------------------------------------------+---------------------------+---------------------------+
+
+
+|endfullwidth|
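+
+As a rough picture of what an element type conversion involves, the following portable C sketch
+widens and narrows integer vectors. The function names, parameters, truncating division and
+clamping behaviour are assumptions made for illustration; the parameters, rounding and saturation
+of the library functions listed above are defined in the API reference and may differ.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    /* Widening: every int16_t value is exactly representable as int32_t. */
+    static void demo_s16_to_s32(int32_t out[], const int16_t in[], unsigned length)
+    {
+        for (unsigned k = 0; k < length; k++) {
+            out[k] = (int32_t)in[k];
+        }
+    }
+
+    /* Narrowing: scale down by 2^shr (0 <= shr <= 30), then clamp to int16_t. */
+    static void demo_s32_to_s16(int16_t out[], const int32_t in[],
+                                unsigned length, unsigned shr)
+    {
+        const int32_t divisor = (int32_t)1 << shr;
+        for (unsigned k = 0; k < length; k++) {
+            int32_t v = in[k] / divisor;  /* truncates toward zero */
+            if (v > INT16_MAX) v = INT16_MAX;
+            if (v < INT16_MIN) v = INT16_MIN;
+            out[k] = (int16_t)v;
+        }
+    }
+
+Narrowing takes a scale factor because a 32-bit mantissa generally does not fit in 16 bits. In a
+block floating-point setting, the shift would be paired with a matching adjustment to the exponent.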
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_s16.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_s16.rst
@@ -0,0 +1,5 @@
+
+16-bit vector API
+-----------------
+
+.. doxygengroup:: vect_s16_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_s16_prepare.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_s16_prepare.rst
@@ -0,0 +1,5 @@
+
+16-bit vector prepare functions
+-------------------------------
+
+.. doxygengroup:: vect_s16_prepare_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_s32.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_s32.rst
@@ -0,0 +1,5 @@
+
+32-bit vector API
+-----------------
+
+.. doxygengroup:: vect_s32_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_s32_prepare.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_s32_prepare.rst
@@ -0,0 +1,5 @@
+
+32-bit vector prepare functions
+-------------------------------
+
+.. doxygengroup:: vect_s32_prepare_api
--- a/lib_xcore_math/doc/rst/src/reference/vect/vect_s8.rst
+++ b/lib_xcore_math/doc/rst/src/reference/vect/vect_s8.rst
@@ -0,0 +1,5 @@
+
+8-bit vector API
+----------------
+
+.. doxygengroup:: vect_s8_api
--- a/lib_xcore_math/doc/rst/src/tests.rst
+++ b/lib_xcore_math/doc/rst/src/tests.rst
@@ -0,0 +1,65 @@
+
+**********
+Unit tests
+**********
+
+This project uses `XCommon CMake` to build the unit tests in a similar fashion to the examples.
+
+Unit tests target the `XK-EVK-XU316` board and x86 platforms.
+All unit tests are located in the */tests/* directory:
+
+* `/tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/>`_ - Unit test projects for ``lib_xcore_math``:
+
+  * `bfp_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/bfp_tests/>`_ - BFP unit tests
+  * `dct_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/dct_tests/>`_ - DCT unit tests
+  * `filter_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/filter_tests/>`_ - Filtering unit tests
+  * `fft_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/fft_tests/>`_ - FFT unit tests
+  * `scalar_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/scalar_tests/>`_ - Scalar op unit tests
+  * `vect_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/vect_tests/>`_ - Vector op unit tests
+  * `xs3_tests/ <https://github.com/xmos/lib_xcore_math/tree/develop/tests/xs3_tests/>`_ - XS3-specific unit tests
+
+All unit tests and examples are built and executed in a similar manner. The following shows how to do this with
+the BFP unit tests.
+
+BFP unit tests
+==============
+
+This application runs unit tests for the various 16- and 32-bit BFP vectorized arithmetic functions.
+This application is located at `/tests/bfp_tests/
+<https://github.com/xmos/lib_xcore_math/tree/develop/tests/bfp_tests>`_.
+
+To build the tests, run the following commands from an XTC command prompt in the
+`lib_xcore_math/tests/bfp_tests` directory::
+
+    cmake -B build -G "Unix Makefiles"
+    xmake -C build
+
+To execute the BFP unit tests on the `XK-EVK-XU316`, first ensure that the hardware is
+connected and the drivers are properly installed, then run: ::
+
+    xrun --xscope bin/bfp_tests.xe
+
+Or, to run the unit tests in the software simulator: ::
+
+    xsim bin/bfp_tests.xe
+
+.. warning::
+
+    Running the unit tests in the simulator may be *very* slow.
+
+To build the BFP unit tests for an x86 host platform, configure the build with the
+``BUILD_NATIVE`` option: ::
+
+    cmake -B build_x86 -G "Unix Makefiles" -D BUILD_NATIVE=TRUE
+    xmake -C build_x86
+
+On Linux and macOS, run the tests as follows: ::
+
+    bin/bfp_tests/bfp_tests -v
+
+and on Windows: ::
+
+    bin\bfp_tests\bfp_tests.exe -v
+
+where ``-v`` is an optional argument to increase verbosity.
+