lib_xcore_math change log
=========================

2.4.0
-----

  * CHANGED: Documentation updated
  * CHANGED: Renamed examples to app_
  * FIXED:   Added missing xcore_math.h
  * REMOVED: xmos_cmake_toolchain submodule

2.3.0
-----

  * CHANGED: Examples and tests to build using XCommon CMake

2.2.0
-----

  * ADDED:   `filter_biquad_sat_s32()` - Apply a biquad filter to a 32-bit
    signal with saturation.
  * ADDED:   XCommon CMake build system support.
  * CHANGED: Removed deprecated `numpy` types in `xmath_script.py` (issue #139).
  * CHANGED: Made FFT documentation more clear (issue #169).
  * FIXED:   Bug (issue #170) in `xs3_memcpy()`.
  * FIXED:   Saturation in `filter_fir_s32()` when `shift <= 0`.

2.1.3
-----

  * Fixes bug (issue #147) in `s16_to_s32()`.
  * Fixes bug (issue #146) in `bfp_s32_macc()` and `bfp_s32_nmacc()`.
  * Fixes bug with the `vect_s32_prepare_api` not appearing in the
    documentation.
  * Fixes bug in `bfp_s32_mean()` and `bfp_s16_mean()` when hitting a corner
    case scenario.
  * Cleans up internal functions.
  * Allows compiling and running demos and tests on Windows Native x86
    platforms.
  * Removes several warnings.

2.1.2
-----

  * Optimisation fix (issue #128) for `filter_fir_s32()`
  * Documentation improvements.

2.1.1
-----

  * Fixes bug (issue #116) in `vect_packed_complex_s32_macc()`.
  * Fixes bug (issue #119) in `filter_fir_s32()`.
  * Adds `--scale` option to the filter conversion script
    `gen_biquad_filter_s32.py`
  * If internal filter states (outputs from internal biquad sections) grow too
    large integer overflows may occur. Using the `--scale` option can help avoid
    this by effectively applying a gain factor to all coefficients.
  * Fixes bug (mentioned in issue #119) in the `gen_fir_filter_s32.py` and
    `gen_fir_filter_s16.py` filter conversion scripts where in a certain corner
    case filter coefficients can overflow.

2.1.0
-----

  * Adds several new operations for IEEE float vectors.
  * Corrects `module_build_info` for legacy build tools.
  * Fixes potential issue with include paths.

2.0.2
-----

  * Updated CMake configuration to support Darwin platform.

2.0.1
-----

  * Bugfix: Fixed issue with including ``xmath/xmath.h`` from XC files.
  * Doc Update: Corrected instructions for configuring CMake using XS3
    toolchain.


Legacy release history
======================

2.0.0
-----

Major, backwards compatibility-breaking update to the library.

Background
**********

This update does not add new features (compared to 1.1.0) to the library.  Instead, it is a
major refactoring of the library in a few different ways.  The overarching purpose of this
refactoring is to generalize the library so that when any future xcore ISAs are released, support
for those architectures can be included within this library in a minimally disruptive way.  The
xcore XS3 architecture currently remains the only xcore ISA officially supported by the library.

Major Changes
*************

As mentioned, the update from 1.1.0 to 2.0.0 is does not add, remove or functionally change
supported features.  Rather, the changes are primarily related to organization and the conventions
used.  However, these changes do break backwards compatibility; hence the major version increment.

The relevant major changes are:

* **Library name has changed from `lib_xs3_math` to `lib_xcore_math`**

  * This is to reflect that this library is not intended to be deprecated when future xcore ISA
    versions are released.

* **Library re-organized into multiple sub-APIs**

  * The organization of the library into the 'high-level' (BFP) and 'low-level' (non-BFP) APIs has
    grown cumbersome and no longer cleanly fits the content of the library. To this end, the
    operations made available by this library have been re-organized into several sub-APIs. These
    APIs are all versioned together and are in many cases interdependent. This new grouping is
    primarily for conceptual simplicity.

    * BFP API -- Mostly unchanged from the previous high-level API.
    * Vector API -- Corresponds to most of the functionality in the previous low-level API, less the
      operations that were moved to the new APIs.
    * Filtering API -- Operations related to supported linear filters (FIR and Biquad).
    * FFT API -- Collects the FFT-related functions (both low-level and high-level) from the
      previous API and groups them together.
    * DCT API -- Like the FFT API, collected DCT-related functions.
    * Scalar API -- A small API for implementations of scalar operations, with particular emphasis
      on the non-IEEE 754 floating-point scalars provided (but also includes some ``float``
      operations).

* **Many functions renamed**

  * Previous versions of this library used a function naming convention whereby the majority of
    low-level functions (and some types) had names prefixed with ``xs3_``, even where those
    functions were written in C (rather than XS3 assembly) and were not intimately related to the
    specifics of the XS3 hardware (i.e. where the same C implementation could plausibly be used in
    future ISA versions).
  * To that end, most functions prefixed with ``xs3_`` have had that prefix removed. This way, when
    a future ISA is released (and once support is added to this library), applications can be
    retargeted at the newer architecture without the unnecessary effort of going through the code to
    rename all the function calls.

    * e.g.  ``xs3_vect_s32_mul() --> vect_s32_mul()``
    * Note that most BFP function names remain unchanged.

  * The intention going forward is that the public API should avoid ISA version-specific naming when
    the object being named is not conceptually specific to a particular ISA (except possibly where
    optimizing different ISA versions necessitates mutually incompatible implementations).


1.1.0
-----

Major Changes
*************

* Support for channel-pair related types and operations has been dropped. These were considered to
  be too narrowly focused on making use of a single optimization (stereo FFT).

  * This is a backwards compatibility-breaking change, requiring a major version increment.

* Added various scalar arithmetic functions for `float_s32_t` type.

* Adds Discrete Cosine Transform API

* Adds various trig and exponential functions.

Bugfixes
********

* Fixed bug in `bfp_fft_inverse_stereo()` where length of output BFP vector was half of correct
  length.

New Functions
*************
* BFP API

  * FFT spectrum unpacking

    * `bfp_fft_unpack_mono()` -- Used to expand the output spectrum from `bfp_fft_forward_mono()`
      from `FFT_N/2` elements (with the Nyquist component packed into the DC component) to
      `FFT_N/2 + 1` elements. This is useful as many complex operations behave undesirably on the
      packed representation.
    * `bfp_fft_pack_mono()` -- Opposite of `bfp_fft_unpack_mono()`. Used to repack the spectrum into
      a form suitable for calling `bfp_fft_inverse_mono()`.

  * Dynamic BFP vector allocation

    * Functions for allocating and deallocating BFP vectors dynamically from the heap.
    * `bfp_sXX_alloc()`, `bfp_complex_sXX_alloc()`
    * `bfp_sXX_dealloc()`, `bfp_complex_sXX_dealloc()`

  * Multiply-accumulate functions

    * A handful of element-wise multiply-accumulate functions have been added for both 16-bit and
      32-bit, and both real and complex vector types. e.g...

    * `bfp_sXX_macc()` -- Element-wise multiply accumulate for real 16/32-bit vectors
    * `bfp_sXX_nmacc()` -- Element-wise negated multiply accumulate (i.e. multiply-subtract) for
      real vectors
    * `bfp_complex_sXX_macc()` -- Element-wise multiply accumulate for complex vectors.
    * `bfp_complex_sXX_conj_macc()` -- Element-wise conjugate multiply accumulate for complex
      vectors.
    * (and various others)

  * `bfp_complex_sXX_conjugate()` -- Get the complex conjugate of a vector
  * `bfp_complex_sXX_energy()` -- Compute the sum of a complex vector's elements' squared
    magnitudes.
  * `bfp_sXX_use_exponent()` / `bfp_complex_sXX_use_exponent()` -- Force BFP vector to encode
    mantissas using specified exponent (i.e. convert to specified Q-format)
  * `bfp_s32_convolve_valid()` / `bfp_complex_s32_convolve_same()` -- Filter a 32-bit signal using a
    short convolution kernel. Both "valid" and "same" padding modes are supported.
  * `xs3_vect_sXX_add_scalar()` / `xs3_vect_complex_sXX_add_scalar()` -- Functions to add scalar to
    a vector (16/32-bit real/complex)


* Vector API

  * Functions supporting mixed-depth operations

    * `xs3_mat_mul_s8_x_s8_yield_s32()` -- Multiply-accumulate an 8-bit vector by an 8-bit matrix
      into 32-bit accumulators.
    * `xs3_mat_mul_s8_x_s16_yield_s32()` -- Multiply a 16-bit vector by an 8-bit matrix for a 32-bit
      result.
    * `xs3_vect_s8_is_negative()` -- Determine whether each element of an 8-bit vector is negative.
    * `xs3_vect_s16_extract_high_byte()` -- Extract the most significant byte of each element of a
      16-bit vector.
    * `xs3_vect_s16_extract_low_byte()` -- Extract the least significant byte of each element of a
      16-bit vector.

  * Memory ops

    * `xs3_vect_s32_zip()` -- Interleave elements from two `int32_t` vectors.
    * `xs3_vect_s32_unzip()` -- De-interleave elements from a `int32_t` vector.
    * `xs3_vect_s32_copy()` -- Copy an `int32_t` vector.
    * `xs3_memcpy()` -- Quickly copy word-aligned vector to another word-aligned vector.
  * Various low-level functions used in the implementation of the high-level multiply-accumulate
    functions (e.g. `xs3_vect_s32_macc()`).
  * `xs2_vect_s32_convolve_valid()` / `xs3_vect_complex_s32_convolve_same()` -- Filter a 32-bit
    signal using a short convolution kernel. Both "valid" and "same" padding modes are supported.
  * `xs3_vect_sXX_add_scalar()` / `xs3_vect_complex_sXX_add_scalar()` -- Add a scalar to a 16- or
    32-bit real or complex vector.

  * IEEE754 single-precision float vector functions

    * `xs3_vect_f32_fft_forward()` / `xs3_vect_f32_fft_inverse()` -- Forward/Inverse FFT functions
      for vectors of floats.
    * `xs3_vect_f32_max_exponent()` -- Get maximum exponent from vector of floats.
    * `xs3_vect_f32_to_s32()` / `xs3_vect_s32_to_f32()` -- Convert between float vector and BFP
      vector.
    * `xs3_vect_f32_dot()` -- Inner product between two float vectors.

  * `xs3_vect_sXX_max_elementwise()` / `xs3_vect_sXX_min_elementwise()` -- Element-wise maximum and
    minimum between two 16-/32-bit vectors.

* DCT API

  * `dctXX_forward()` / `dctXX_inverse()` -- Forward (type-II) and inverse (type-III) `XX`-point DCT
    implementations.

    * Current sizes supported are `6`, `8`, `12`, `16`, `24`, `32`, `48` and `64`

  * `dct8x8_forward()` / `dct8x8_inverse()` -- Fast 2D 8-by-8 forward and inverse DCTs.


Miscellaneous
*************

* Unit tests have been refactored to make use of Unity fixtures.
* Added example apps: `vect_demo`, `bfp_demo`, `fft_demo` and `filter_demo`
* Removed configuration support for `XS3_MATH_VECTOR_TAIL_SUPPORT`
* Added `QXX()` and `FXX()` macros (e.g. `Q24()`; taken from `lib_dsp`) for converting (constants)
  between floating-point and fixed-point values.
* Added python scripts to generate code for filters

  * `lib_xs3_math/script/gen_fir_filter_s16.py`
  * `lib_xs3_math/script/gen_fir_filter_s32.py`
  * `lib_xs3_math/script/gen_biquad_filter_s32.py`

* Changed low-level API so that each function `foo()` that has an associated 'prepare' function (to
  calculate shifts or output exponents) can be prepared with `foo_prepare()`. This makes the
  low-level API more consistent.
* Separated filtering-related unit tests into a separate unit test application.
* Various improvements to CMake project files.

  * Includes automatic fetching of Unity repository during build

1.0.0
-----

  * Initial version