Let's take a closer look at the numerical properties of that dot product. Let:
what would be the value of dot(a, b)? Simple inspection confirms that ab cancels ab and thus the answer is 2.
Interestingly enough, neither 32-bit floats nor 64-bit double IEEE floating point produce the correct answer when calculating this dot product in order. In contrast, even a very small posit, a 16-bit posit with 2 exponent bits, is able to produce the correct result. What gives?
The individual element values of the vector are relatively ordinary numbers. But in dot products it is the dynamic range of products that the number system needs to capture, and this is where IEEE floating point fails us. IEEE floating point rounds after each product and thus the order of the computation can change the result. This is particularly troubling for concurrent systems, such as multi-core and many-core, where the programmer typically gives up control.
In our particular example, the products ab and ab represent 1.28e16 and -1.28e16 respectively, and to be able to represent the sum of 1.28e16 + 1 requires 53bits of fraction, which 64-bit IEEE doubles do not have, creating cancellation and in the end an incorrect sum.
Posits, on the other hand, have access to a fused dot product, which uses a quire, which can be thought of as a super accumulator that enables the computation to defer rounding till the end of the dot product. This makes it possible for a 16-bit posit to beat a 64-bit IEEE double.,
In general, posits with their finer control over precision and dynamic range, present an improvement over IEEE floating point for computational science applications. In particular, business intelligence and decision support systems benefit from posits and their quires, as statistics algorithms tend to aggregate lots of data to render an assessment on which a decision depends. As the example above demonstrates, that assessment can be very wrong when using the wrong approach.