Errors in numerical computation

Goto Computer Systems Notes main page

Contents

Relative and Absolute Errors

Inherent, truncation and round off errors

Overflow and underflow

Many computer users, who are not aware of how numbers are represented and processed within computer systems, tend to assume that the results of programs are accurate. For example, if a decimal number is printed as 123.45678 all the figures are assumed to be an accurate representation of the result of whatever calculation was performed.

When computer programs are being implemented and tested the errors which can occur include:
 

  1. Errors in the syntax of the program language, i.e. keywords misspelt, operators missing, etc. The compiler  will find such errors and present a more or less meaningful error message indicating the problem.
  2. Errors in coding the program specification or algorithms into the programming language, i.e. the program does not do what the user had specified. This type of error is mainly due to inexperience and a lack of understanding of the language.
  3. Errors in the program specification or algorithms. The set of rules and/or formulas and/or algebraic steps that the user has specified to solve the problem are incorrect, e.g. a mathematician has made a mistake in the that the user has specified to solve the problem are incorrect, e.g. a mathematician has made a mistake in the development of the equations. It is up to the user to ensure that algorithms being used are correct to the extent required.
  4. Errors due to the storage and processing of numerical data, i.e. the accuracy of the hardware used to perform numerical calculations is finite. For example, real number data (with fractional components) is stored in floating point format with a typical accuracy between 8 and 20 significant decimal figures. Thus the results of programs using such floating point representation cannot be exact. In addition many mathematical functions are infinite series which have to be truncated This set of notes considers the type of error listed in 4 above.

Relative and Absolute Errors

Absolute error

The error in the evaluation of an algorithm can be defined to be the true value minus the approximate value as calculated. Usually in practice only the approximate value (as calculated by the program) is known. In general, however, it is possible to determine something about the error without knowing it precisely. For example, using the technique of Interval Halving to find the root of:
                   
       4 - 9x3 - 2x2 + 120x - 130

The iterations could stop at the point when a root was between -3.60016 and 3.60013 and it is not necessary to halve and try again. In this case the maximum error would be 0.00003 and the root was -3.600145 + 0.000015. Thus, although the value of the error is not known it can be stated that it does not exceed 1.5 * 10-5. This error is called an absolute error and one common notation is to write a bar over the symbol to indicate an approximation and to write an e with subscript to stand for the error. Thus, if x is the true value:

       x = x  + ex

Thus the absolute error is the difference between the true value and the approximation.
 

Relative error

Another way to define the error is using the relative error form which is defined as the absolute error divided by the approximation. It may seem more reasonable to divide the absolute error by the true value but usually this is not known. So long as the error is small the above definition using approximate value should have no sizable bearing on the numerical value of the relative error.

Comparison of using absolute and relative error

For numbers close to 1 the absolute and relative errors are nearly equal but for large and small numbers the error values can be very different. For example, if we have a true value of 0.00006 and an approximation of 0.00005, the absolute error is only 10-5but the relative error is 0.2 or 20%. On the other hand, if the true value is 100,500 and approximation is 100,100 the absolute error is 500 but the relative error is only 0.005, or 0.5%. Thus in general relative errors are the more useful way of specifying errors.

One extension of the above when writing general purpose software, for example subroutines to find roots, is to use relative error tests to determine if convergence has occurred to within a certain error value. The same routines can then be used for small and large magnitude numbers. For example, to find if convergence has occurred between successive iterations Xn and Xn+1 the relative test ABS((Xn - Xn+1) / Xn) < error will work well in general. The value 'error' would then be the relative error of the result (assuming that successive iterations converge smoothly).
 

Real numbers

Real number types are required in many mathematical, scientific and engineering applications to hold numeric values that may not be represented in integer (whole number) form. For example, numeric values tha may:

  1. have fractional components: n has the value 3.141592653589793238462643,e = 2.718281828459045235360287 the base of natural logarithms),
  2. be less than one: the permittivity of free space Eo = 8.854*10-12 and the permeability of free  space  uo = 4n * 10 -7,
  3. be very large: the resistivity of mica is of the order of 5.0 * 10 11    ohm metres.
The notation for numbers such as 8.854 * 10-12 or  5.0  * 10 11   is 8.854e-12  and  5.0e11 respectively (called exponential or scientific notation where e represents the power of 10).

Within the computer hardware real numbers are implemented in floating point format which has two important attributes:

The C/C++ programming languages have three real number types: The accuracy of real number calculations

The accuracy to which real number calculations can be carried out using a particular real type is determined by the precision, i.e. the maximum number of decimal places that the floating value carries. In practice, the number of bits used to represent real types is implementation dependent so long as the minimum precision is maintained and each type must give at least the same range and precision as the previous type. It is therefore possible for the real types to be three distinct sizes or all the same length (float is typically 32-bit, double 64-bit and many compilers implement long double as double).

The standard header file <float.h> provides details on the range and precision of real numbers enabling programs to determine if the specific implementation can support the application. Table 1 shows the range and precision of the real types under Turbo C++ V3.00 running on an IBM PC compatible.
 
float double long double
size in bytes                 4          8        10
range minimum 1.175494e-38  2.2250738585e-308  3.3621031431e-4932
range maximum 3.402823e+38 1.7976931348e+308 1.1897314953e+4932
precision digits                 6         15        19
precision epsilon 1.192092e-07 2.2204460493e-16  1.0842021725e-19

Table 1 Range and precision of the real types under Turbo C++ V3.00

In addition to the number of digits of precision for each real type <float.h> contains a value epsilon which is an indicator of the precision to which calculations can be carried out (see Table 1). Epsilon is specified as (!= means not equal):

        1.0 + epsilon != 1.0

The value of epsilon is the smallest number which when added to 1.0 changes the value, i.e. adding any smaller value to 1.0 still gives 1.0. The higher the precision of the real type the smaller epsilon and programs can check this value to see if calculations can be carried out to the required precision, i.e. using floating point numbers with a precision of 4 digits epsilon would be 0.001. For example, consider calculating the roots of the equation:
 
         x2 + 0.4002x + 0.00008 = 0

Using the formula
 
          x =  - b + ((b2 - 4ac))1/2
                       2a

The true root is - 0.0002. However, when using four-digit arithmetic the root would be - 0.0001625 (an error of 25%), i.e.:

using 8 digit arithmetic

            x =   - 0.4002 + ((0.16016004 - 0.00032))1/2    = - 0.0002
                             2.0

using 4 digit arithmetic

                 x =    - 0.4002 + ((0.1602 - 0.0003))1/2   = - 0.0001625
                                      2.0

The four-digit arithmetic assumed that numbers were rounded to the nearest digit.

The type chosen in practice is application dependent, i.e. on the precision and range of the data to be processed. For example, there is no point in sampling experimental data accurate to 3 decimal digits (e.g. measurements made using a good analogue meter) and then processing it using double variables and printing the results to 10 or 12 digits. In fact results so presented Consider the Taylor series for the sine function:

          sin x = x -  x3 +   x5  -  x7 + .......
                       3!     5!     7!

This series is, in theory, valid for any finite angle. The truncation error committed by stopping the summation after a finite number of terms is less than the absolute value of the first term neglected. This is true if there were some way to keep an infinite number of digits when calculating each arithmetic result. For example, using six-digit arithmetic to calculate the sine of 1470oand terminating the summation when a term is less than 10-8 gives
sine (1470o) = 891.799. Even if sixteen-digit arithmetic is used sine(2550o) = 158. Thus serious errors are occurring due to limitations in the accuracy of computing the terms in the series.

Below is shown a summary of results of a C program which calculates and prints the sine for various angles (using GNU C++ V2.6.3 under DOS). It can be seen that when the angle is below 5.0 radians the value calculated is the same as that returned by the maths function sin(x) to approximately six figures. When the angle is 10 radians the accuracy of the sum is approximately 2 to 3 figures and when the angle is above 20 radians the results are complete nonsense ! The series is valid for all angles, assuming that the calculations are performed to infinite accuracy, and the problem with the results is due to rounding errors in floating point calculations (which will be discussed in the next section).

        sin( 0.50) =               0.47942552,         library =         0.47942554
        sin( 1.00) =               0.84147096,         library =         0.84147098
        sin( 2.00) =               0.90929741,         library =         0.90929743
        sin( 5.00) =             - 0.95892441,         library =       - 0.95892427
        sin(10.00) =            - 0.54420567,         library =       - 0.54402111
        sin(15.00) =               0.66512936,        library =         0.65028784
       sin(20.00) =               1.52592909,        library =         0.91294525
        sin(25.00) =             - 5.36588192,        library =       - 0.13235175
        sin(30.00) =    - 24048.72656250,         library =       - 0.98803162
        sin(35.00) =- 2340664.25000000,         library =      - 0.42818267
 
 

Inherent, truncation and round off errors

Inherent, truncation and round off are the three basic types of error which occur in numerical computation. Each of the three can be expressed in absolute or relative form.

Inherent Errors

Inherent errors are errors in the values of data. For example, caused by uncertainty in measurements, by outright blunders, or by the necessary approximate nature of representing some number by a finite number of digits when that number cannot be represented exactly by that number of digits, e.g.SQRT(2.0).

When physical data is being processed errors of measurement are bound to occur. When measuring a voltage with an analogue meter (meter with a dial and indicator pointer) it is difficult to take readings with an accuracy greater than 2 or 3 significant figures. This can be improved by using a digital voltmeter (reading presented on a digital readout) where an accuracy of 4 to 8 significant can be achieved depending upon the meter. Thus physical values are usually presented + some error.

Mistakes in taking measurements are common, varying from a simple misreading of a meter scale to copying data down incorrectly on to paper. Computer systems that can sample information directly using analogue to digital converters can be a great aid in this area in addition to being able to sample far more data in a shorter time than a human operator.

Many numbers cannot be represented exactly in a given number of decimal digits. For example, n can be written 3.14, 3.14159265 or 3.141592653589793. In any case we have no exact value for which is a transcendental number and therefore has no exact decimal representation. When using such numbers in computation one would give the number to the maximum number of significant figures that the arithmetic works to. For example, if the computer uses 8 figure arithmetic it is no use specifying it to 20 figures, the computer would not store the rightmost digits. An additional problem is that computers tend to work with the binary number system and numbers that can be represented in decimal exactly have no exact equivalent in binary. One example is the number 0.1 decimal whose binary equivalent 0.0001100110011001100 ...., which is a non terminating binary number. Thus, forming the sum of 10 numbers, each of which uses the binary equivalent of 0.1 , will not give the result 1.0 but some approximation.

Note that transcendental and irrational numbers do not have a terminating representation in any number base. Rational numbers, however, may or may not have a terminating representation depending upon the number base.

Truncation Errors

Many mathematical processes are infinite (e.g. the sine series), which in practice have to be terminated at some point. The terms omitted (which are infinite in number) introduce an error into the result. This error is called the truncation error, since it is caused by the truncation of an infinite process. Many of the algorithms used in processing data and calculating results use infinite series or processes so this error is of great importance.

The decision to be taken is at what point to terminate the process. This can be a simple examination of the relative change in the result between two terms of the process or even when a term falls below a particular number, e.g. in computing sine we could ignore all terms after 10-8. This is usually fine if the series is well behaved and converges smoothly but if the terms can vary in size making the convergence not so smooth some other test must be applied. For example, the series could be truncated if the change in the result was below a certain value over ten terms (or even more).

Rounding errors

The above discussion showed how four and eight digit real number arithmetic affected the determination of the roots of a quadratic equation. In programming languages such as Fortran, C/C++ real numbers are stored in floating point form, i.e. a fractional part and an exponent. Decimal floating point numbers are in the form f*10e where f is the fraction in the range 0.1 < f < 1.0 and e is the exponent, e.g. 7392.0 would be 0.7392 * 104 and 0.0007392 would be 0.7392 * 10-3. Note, however, that computer systems use base 2 or 16 for floating point numbers, e.g. f * 2e, but the following discussion still applies.

When computation is being carried out the fractional part of a decimal floating point must be within the range
0.1 < f < 1 and if at any time the leading digit becomes 0 then the fractional part is shifted one place left and the exponent decremented, e.g. 0.0023 *104 becomes 0.23 * 102. This process of getting the fractional part in the correct range is called normalisation.

Now suppose that two floating point numbers which are accurate to four significant figures are to be added:
                                                     
        z = 1.246 + 0.03290 = 0.1246*101 + 0.3290*10-1

Before the add can take place the floating point number exponents must be aligned, i.e. one of the numbers unnormalized:
                             
        z = 0.1246*101  +  0.00329*101

This gives the result 0.1278 * 101 assuming four figure accuracy. Note that the digits of the fraction shifted out of the capacity of the system have been lost. Furthermore, no rounding was carried out when the digits were lost. Many computer systems do not round in this situation so it is a likely source of error in the program. Now consider subtraction:
                                                
        z = 26.31 - 19.76 = 0.2631*102 - 0.1976*102 = 0.0655*102

this has resulted in a leading zero so it will be shifted:
                  
        z = 0.6550*101

Notice two things:
 

  1. The last digit of the result has no significance whatever. The zero has been inserted in the shifting process because something has to be placed in that position in storage.
  2. There can easily be more than one leading zero in such sums. For example if 0.3471*105  is substracted from 0.3472*105  the answer is 0.0001*105 which when normalised is 0.1000*102 The last three digits have no meaning.
Thus when subtracting it is possible to end up with numbers with far less significant figures than the calculation started with. However, succeeding operations will act on these results as though all the figures were significant. This is a major source of run-time errors.

Now consider the evaluation of the sine series. As the signs of the terms alternate rounding errors can result from the subtraction process (as described above). In addition, if x is large the value of xn can become a very large number. Below is a run of a program evaluating the sine of an angle of 25 radians. The final value of the sine of an angle must be in the range -1.0 to 1.0 but the run shows values as large as 5726042112.0. As the float calculations are only carried out to 6 or 7 figures of accuracy the last four figures of the number are meaningless. We are therefore attempting to get a result in the range -1.0 to 1.0 using values that are not accurate to hundreds or even thousands.

        n                                                term_n                                                     sin_x

  5.000000000                       -2604.166748047                                      -2579.166748047
  7.000000000                     81380.210937500                                     78801.046875000
  9.000000000                -1211015.000000000                                -1132214.000000000
11.000000000              10512283.000000000                                 9380069.000000000
13.000000000             -59728880.000000000                             -50348812.000000000
15.000000000            239298400.000000000                            188949584.000000000
17.000000000           -712197632.000000000                           -523248064.000000000
19.000000000          1636483584.000000000                         1113235456.000000000
21.000000000         -2990649856.000000000                        -1877414400.000000000
23.000000000          4450371584.000000000                          2572957184.000000000
25.000000000         -5497000448.000000000                         -2924043264.000000000
27.000000000          5726042112.000000000                          2801998848.000000000
29.000000000        -5097971712.000000000                          -2295972864.000000000
31.000000000         3923931392.000000000                            1627958528.000000000
33.000000000        -2637050624.000000000                          -1009092096.000000000
35.000000000         1560754432.000000000                             551662336.000000000
37.000000000         -819723968.000000000                             -268061632.000000000
39.000000000          384630240.000000000                              116568608.000000000
41.000000000         -162209104.000000000                               -45640496.000000000
43.000000000            61817492.000000000                                 16176996.000000000
45.000000000           -21393096.000000000                                  -5216100.000000000
47.000000000               6752871.000000000                                   1536771.000000000
49.000000000              -1952148.125000000                                    -415377.125000000
51.000000000                 518746.843750000                                      103369.718750000
53.000000000                -127143.835937500                                       -23774.117187500
55.000000000                   28833.417968750                                           5059.300781250
57.000000000                    -6067.638671875                                          -1008.337890625
59.000000000                     1188.055786133                                              179.717895508
61.000000000                      -216.988571167                                               -37.270675659
63.000000000                         37.054058075                                                 -0.216617584
65.000000000                          -5.929028988                                                 -6.145646572
67.000000000                           0.890779614                                                 -5.254867077
69.000000000                          -0.125901684                                                 -5.380768776
71.000000000                           0.016770791                                                 -5.363997936
73.000000000                          -0.002109003                                                 -5.366106987
75.000000000                           0.000250785                                                 -5.365856171
77.000000000                          -0.000028242                                                 -5.365884304
79.000000000                           0.000003016                                                 -5.365881443
81.000000000                          -0.000000306                                                 -5.365881920
83.000000000                           0.000000030                                                 -5.365881920
85.000000000                          -0.000000003                                                -5.365881920

          sin(25.00) = -5.36588192, library = -0.13235175

Using higher precision types to overcome rounding errors

One way to reduce the effect of rounding errors is to increase the precision and range of the variables and constants used. This, however, has an overhead in that the extra computation involved increases the run-time of the program. In many applications there is often a trade-off between the accuracy desired and the run-time acceptable.

Increasing precision, however, only reduces the problem, it does not eliminate it. Consider the following results produced using float, double and long double versions of the sin(x) function (under GNU C++). It may be seen that even long double (19 decimal digits of precision) is only effective up to about 30 radians.
 
   angle float double long double library
1
2.00
5.00
10.00
20.00
30.00
40.00
50.00
60.00
0.841471
0.909297
-0.958924
-0.544206
1.525929
-24048.726562
 523443136.000000
-8887603298304.000001
-22202444742131712.000689
 0.841471
0.909297
-0.958924
-0.544021
0.912945
-0.988053
-0.605842
9991.828290
27872935.154430
 0.841471
0.909297
-0.958924
-0.544021
0.912945
-0.988032
0.744634
13.111824
147617.705161
 0.841471
0.909297
-0.958924
-0.544021
0.912945
-0.988032
0.745113
-0.262375
-0.304811

Modification of the algorithm to overcome rounding errors

A change to the algorithm may be possible, e.g.:
 

  1.  Modify the way that the mathematical equation is mapped into program code. For example, try to avoid situations where numbers with simila  values are subtracted.
  2.  Use different mathematical equations and/or algorithms over different ranges of the input values (see the  problem at the end of this chapter  which discusses the evaluation of the exponential integral).
For example, because the sin(x) function is periodic, a simple approach is to subtract 2n from the angle until its value is in the range 0 to 2n (for angles greater than 0), e.g. assuming an angle x in radians and two_pi equals 2n:

        x = x - two_pi  *  int(x  /  two_pi);        // get x in range 0 to 2pi

The expression int(x / two_pi) evaluates, as an integer, how many times 2n goes into the angle x. This integer value is then multiplied by 2n and subtracted from the angle x the result being an angle in the range 0 to 2n. The problem is that casting the floating result of x / two_pi to an int may result in integer overflow. This can be overcome by using the maths library function floor:

      double floor(double x);           // returns the largest integer below x

which returns as a double function result the largest integer below x (the effect is similar to an int cast but avoids the danger of overflow). Hence the statement can be rewritten (and it also works for negative angles):

      x = x - two_pi * floor(x  /  two_pi);     // get x in range 0 to 2pi

The table below shows results produced using float, double and long double versions of the updated sine function. The double and long double results are as accurate as the maths library function to the six figures displayed. The error in the float is due to rounding errors in the evaluation of the angle in the range 0 to 2n.

The above discussion has necessarily been very brief; refer to a text on numerical methods for a full discussion of such problems and the techniques used to overcome them (Press et al. 1994).
 
angle float double long double library
       1 
         10.00
        100.00
       1000.00
      10000.00
     100000.00
    1000000.00
   10000000.00
  100000000.00
 1000000000.00
10000000000.00
  0.841471
 -0.544021
 -0.506368
  0.826864
 -0.305349 
  0.038530
 -0.375923
  0.653595
 -0.744683
 -0.854296 
 -0.729213
  0.841471
 -0.544021
 -0.506368
  0.826864
 -0.305615
  0.035749
 -0.349992 
  0.420548
  0.931639
  0.545843
 -0.487507
  0.841471
 -0.544021
 -0.506368
  0.826864
 -0.305615
  0.035749
 -0.349992 
  0.420548
  0.931639
  0.545843
 -0.487505
  0.841471
 -0.544021
 -0.506368
  0.826864
 -0.305614
  0.035749
 -0.349994
  0.420548
  0.931639
  0.545843
 -0.487506

 

Overflow and underflow

Two other things that should be mentioned are floating underflow and overflow. In a computer system only a certain amount of storage is allocated to the exponent so its maximum absolute size is limited. For example, we can consider that the maximum exponent is 99 and this has two results:

1     If the result of a calculation is less than 0.1*10-99 the number is lost and usually treated as zero. This can
        lead to fantastic results if this result is used for further calculations. This is underflow.

2     If the result of an operation produces an exponent of a 100 or greater this is larger than the system can
        store and floating overflow has occurred. This is generally a fatal run time error and the program stops.

Use of guard digits

In the arithmetic/logic unit in some systems (e.g. the IBM 360/370 series) there is a guard digit position. This may be thought of as standing to the immediate right of the fraction of the number with the smaller exponent, into which digits are moved during the shifting process required to align the exponents, i.e. during unnormalization. Then if the sum turns out not to be normalised, that one digit can be brought back during normalisation.

For example:
                                                  
        z = 0.1064*10 - 0.3173*102 = 0.1064*103 - 0.0317(3)*103

note the extra digit position. What happens is that 0.03173 is subtracted from 0.10640 to give 0.07467. This can now be normalised to give four figures:0.7467*102. Without the guard digit the result would have been
using four figure arithmetic 0.7470*102.

This use of a guard digit can give greater accuracy in computation when such unnormalization/normalisation processing. In addition the guard digit can be used to make provision for rounding but this is not usually done.

Other number bases

In general computer arithmetic is carried out in binary:
         
        x = f * 2e            where 1 / 2  <  | f |  < 1

or hexadecimal:
                   
        x = f * 16e            where 1 / 16  <  | f |  <  1

but the arguments applied above still apply.
 
 

Round off Errors Continued

Consider the addition using four digit arithmetic of:
 
        z = 0.124610  + 0.3290*10-1  =  0.12789*101

but seeing how we work to four digit accuracy this will be chopped to 0.1278*101 (no rounding) giving an error of 0.00009*101. We can write this as:
                              
        Z = 0.1278*10  +  0.9*10-3

notice that the exponent of the second term is four less than the exponent of the first term using our four figure arithmetic.

In general we can consider that arithmetic operations will produce a result that can be broken into halves which we can denote (before rounding):
 
        y =  fy * 10 +  gy * 10e-t

where t is the number of digits in the mantissa of the floating point number. Now the range of fy as it is normalised is:

            1 / 10  <  | f |  < 1

But, gy being the error part, may not be normalised and could be zero:

              0   <   | gy |   <  1

The question is 'what can we do to fy, our result, taking into account the value of gy'. Many computer systems just ignore it and this process is usually called chopping. In fact if no guard digits are kept this is the only option. However, if guard digits are kept the value of gy could be used to modify fy in some way to reduce the error.

A bound on the maximum relative error in a chopped arithmetic result is easily found. The maximum error occurs when gy is large and fy is small, i.e. the maximum value of gy is less than 1.0 and the minimum size of fy is 0.1. The absolute value of the relative error is therefore:
                                                       
       | ey |     | gy * 10e-t  |     1.0 * 10e-t 
       |----|  =  |-------------|  < -------------  = 10-t+1
       | y  |     | fy * 1-e    |      0.1*10e

Remembering that t is the number of digits in the mantissa or fractional part of any floating point number we have the result that the maximum relative rounding error in the result of floating point arithmetic operations does not depend upon the size of the numbers. For example using four figure arithmetic the maximum relative rounding error is 10-3.

Now consider the more usual form of rounding called symmetric rounding. This can be expressed as follows: given the two parts of a result, as above, let the rounded approximation to y to be given by:

                 
                 | fy| * 10e          if | gy | < 1/2
        | y | =
                 | fy | 10e + 10e-t   if | gy | > 1/2
 

where y- has the same sign as fy. The addition of the term 10e-t in the second line corresponds to adding 1 into the last digit position retained if the first figure dropped is 5 or greater.

Thus if gy < 1 the absolute error is:
 
        ey  = gy * 10e-t

if  gy > 1/2 it is:
        
        ey  = 1 - gy * 10e-t
 

In either case we have 10e-t multiplied by a factor whose absolute value is no greater than 1. Thus the absolute value of the absolute error is therefore:

 
        ey * 10 e-t

and the absolute value of the relative error is therefore:
                                 
        |  ey  |   |    1 * 10   |e-t 
        | ---  | = | ----------- |     < 5 * 10-t = 1 * 10-t+1
        | _    |   |             |
        | y    |   | 0.1 * 10 -e |

For example, of the difference between the two rounding rules consider:
 
        y =0.7324 * 103 + 0.8261 * 10-1

                          
using chopping
          _   
          y = 0.7324 * 103 and
 
        | ey |   | 0.8261 * 10-1|
        | -- | = | ------------ | ~ 1.1 * 10-4
        | _  |   |              |
        | y  |   | 0.7324 * 103 |
 

whereas for symmetric rounding
      _
      y = 0.7325*103 and ey = -0.1739 * 10-1 giving:

        | ey |
        | -- | = 0.24 * 10-4
        | _  |
        | y  |

Thus one can see that the error from rounding is considerably less than that from chopping. In fact the rounding error never exceeds the chopping error and is less about half the time.