Computational science and all its applied forms (statistics, Deep Learning, computational finance, computational biology, etc.) depend on a discrete model of equations that describe the system under study and some operator to extract information from that model. The vast majority of these operators are expressed in linear algebra form, and the dominant sub-operation in these operators is a dot product between two vectors. With the advent of big data and Deep Learning, tensor expressions have become widespread, but the basic operation remains squarely the dot product, as can be observed in the key operator that DL accelerators like the Google TPU implement in hardware.

Let's take a closer look at the numerical properties of that dot product. Let:

*vector* **a** = ( 3.2e8, 1.0, -1.0, 8.0e7)

*vector* **b** = ( 4.0e7, 1.0, -1.0, -1.6e8)

What would be the value of *dot*(**a**, **b**)? Simple inspection confirms that **a**[0]**b**[0] cancels **a**[3]**b**[3] and thus the answer is **2**.

Interestingly enough, neither 32-bit floats nor 64-bit double IEEE floating point produce the correct answer when calculating this dot product in order. In contrast, even a very small posit, a *16-bit* posit with 2 exponent bits, is able to produce the correct result. What gives?

The individual element values of the vectors are relatively ordinary numbers. But in dot products it is the dynamic range of products that the number system needs to capture, and this is where IEEE floating point fails us. IEEE floating point rounds after each product and thus the order of the computation can change the result. This is particularly troubling for concurrent systems, such as multi-core and many-core, where the programmer typically gives up control.
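The in-order failure is easy to reproduce. The sketch below is an illustration added here, not code from the original post: the helper names `to_f32` and `dot_in_order` are hypothetical, and single precision is emulated by rounding each intermediate through the `struct` module's 32-bit float format.

```python
import struct

def to_f32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_in_order(a, b, rnd=lambda x: x):
    """Left-to-right dot product that rounds after every multiply and add.

    rnd is the rounding applied to each intermediate; the default leaves
    values as IEEE doubles, to_f32 emulates IEEE single precision.
    """
    total = rnd(0.0)
    for x, y in zip(a, b):
        total = rnd(total + rnd(rnd(x) * rnd(y)))
    return total

a = [3.2e8, 1.0, -1.0, 8.0e7]
b = [4.0e7, 1.0, -1.0, -1.6e8]

double_result = dot_in_order(a, b)          # 64-bit IEEE
single_result = dot_in_order(a, b, to_f32)  # emulated 32-bit IEEE
print(double_result, single_result)         # 0.0 0.0, not the exact answer 2
```

Both runs lose the two unit-sized products: each is absorbed when it is added to the running sum of about 1.28e16, and the final product then cancels that sum to zero.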

In our particular example, the products **a**[0]**b**[0] and **a**[3]**b**[3] represent 1.28e16 and -1.28e16 respectively. Representing the intermediate sum 1.28e16 + 1 exactly requires 53 bits of fraction, which 64-bit IEEE doubles do not have (they provide 52). The small terms are therefore rounded away before the two large products cancel, and the final sum is incorrect.

Posits, on the other hand, have access to a *fused* dot product, which uses a *quire*, which can be thought of as a super accumulator that enables the computation to defer rounding until the end of the dot product. This makes it possible for a 16-bit posit to beat a 64-bit IEEE double.

In general, posits, with their finer control over precision and dynamic range, present an improvement over IEEE floating point for computational science applications. In particular, business intelligence and decision support systems benefit from posits and their quires, as statistics algorithms tend to aggregate lots of data to render an assessment on which a decision depends. As the example above demonstrates, that assessment can be very wrong when using the wrong approach.
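The deferred rounding that a quire provides can be imitated in plain Python. The sketch below is only an analogy: it uses `fractions.Fraction` as a stand-in for the exact accumulator, which is an assumption made for illustration and not how posit hardware or any posit library implements a quire. The function name `fused_dot` is hypothetical.

```python
from fractions import Fraction

def fused_dot(a, b):
    """Dot product with exact accumulation and a single final rounding.

    Fraction plays the role of the quire here (an illustrative
    assumption): products and sums are kept exact, and rounding
    happens only once, when the result is converted back to a float.
    """
    acc = Fraction(0)                      # exact "super accumulator"
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)   # no intermediate rounding
    return float(acc)                      # round once, at the very end

a = [3.2e8, 1.0, -1.0, 8.0e7]
b = [4.0e7, 1.0, -1.0, -1.6e8]

fused_result = fused_dot(a, b)
print(fused_result)   # 2.0
```

Because nothing is rounded until the accumulation is complete, the two unit-sized products survive next to the large ones, and the result no longer depends on the order in which the terms are added.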
