Tuesday, June 12, 2012

TCL - Computers and numbers


If you are new to programming, then this lesson may contain some surprising information. But even if you are used to writing programs, computers can do unexpected things with numbers. The purpose of this lesson is to shed some light on some of the mysteries and quirks you can encounter.
These mysteries exist independently of the programming language, though one programming language may be better at isolating you from them than another. The problem is that computers can not deal with the numbers we are used to in the way we are used to.
For instance (*):
# Compute 1 million times 1 million
% puts [expr {1000000*1000000}]
-727379968
To most people's surprise, the result is negative! Instead of 1000000000000, a negative number is returned.
Important note: I used Tcl 8.4.1 for all examples. In Tcl 8.5 the results will hopefully be more intuitive, as a result of adding so-called big integers. Nevertheless, the general theme remains the same.
Now consider the following example. It is almost the same, with the exception of a decimal dot:
# Compute 1 million times 1 million
% puts [expr {1000000*1000000.}]
1e+012
The reason is simple, at least once you know a little about the background of computer arithmetic:
  • In the first example we multiplied two integer numbers, or integers for short. While we are used to these numbers ranging from -infinity to +infinity, computers can not deal with that range (at least not that easily). So, instead, computers deal with a subset of the actual mathematical integer numbers. They deal with numbers from -2^31 to 2^31-1 (in general) - that is, with numbers from -2147483648 to 2147483647.

    Numbers outside that range can not be dealt with that easily. This is also true of the results of a computation (or the intermediate results of a computation, even if the final result does fit).
  • In the second example we multiplied an integer with a floating-point number, in common parlance a real number or real (though there are very significant differences between the mathematical real numbers and the computer's real numbers, and these are at the heart of another group of mysteries and quirks). Floating-point numbers have a much larger range and they can be used to deal with such numbers as 1.2 and 3.1415926.

    The typical range for floating-point numbers is roughly: -1.0e+300 to -1.0e-300, 0.0 and 1.0e-300 to 1.0e+300. Floating-point numbers have a limited precision: a typical 64-bit floating-point number carries about 15 to 17 significant decimal digits, of which Tcl 8.4 prints 12 by default.
    What this means in practical terms is that a floating-point number can be as large as a 1 followed by 300 zeros or as small as 0.0000...1 (where "..." stands for 295 zeros).
    Because the range is so much bigger, in the second example the result falls within the limits and we get the answer we expected.
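
The wrap-around in the first example can be reproduced on purpose. The sketch below assumes Tcl 8.5 or later, where integers no longer overflow, and uses a hypothetical helper wrap32 (not a Tcl built-in) to fold the exact product back into the 32-bit range described above.

```tcl
# wrap32 is a hypothetical helper, not part of Tcl.
# It folds an exact integer into the signed 32-bit range
# -2147483648 .. 2147483647, mimicking 32-bit integer hardware.
# Assumes Tcl 8.5 or later (integers of arbitrary size).
proc wrap32 {n} {
    expr {(($n + 0x80000000) & 0xFFFFFFFF) - 0x80000000}
}

set exact   [expr {1000000 * 1000000}]
set wrapped [wrap32 $exact]
puts "exact result:  $exact"
puts "32-bit result: $wrapped"
```

Under that assumption the second line should show the same -727379968 as the Tcl 8.4 session above: the exact product is still computed, but only its lowest 32 bits survive.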

Tcl's strategy

Tcl uses a simple but efficient strategy to decide what kind of numbers to use for the computations:
  • If you add, subtract, multiply or divide two integer numbers, then the result is an integer. If the result fits within the range, you have the exact answer. If not, you end up with something that appears to be completely wrong. (Note: not too long ago, floating-point computations were much more time-consuming than integer computations. And most computers do not warn about integer results outside the range, because checking would be too time-consuming as well: a computer typically performs lots of such operations, most of which do fit into the designated range.)
  • If you add, subtract, multiply or divide an integer number and a floating-point number, then the integer number is first converted to a floating-point number with the same value and then the computation is done, resulting in a floating-point number.

    Floating-point computations are quite complex, and the current (IEEE) standard prescribes what should happen in minute detail. One such detail is that results outside the proper ranges are reported. Tcl catches these and displays a warning:
    # Compute 1.0e+300/1.0e-300
    % puts [expr {1.0e300/1.0e-300}]
    floating-point value too large to represent
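
The two rules can be checked directly with small values, so that nothing overflows. This is only an illustrative sketch; the numbers 7 and 2 are arbitrary.

```tcl
# Integer op integer -> integer (the fraction is discarded)
puts "7/2         is [expr {7/2}]"         ;# 3
puts "7*2         is [expr {7*2}]"         ;# 14

# Integer op floating-point -> the integer is converted first
puts "7/2.0       is [expr {7/2.0}]"       ;# 3.5
puts "7*2.0       is [expr {7*2.0}]"       ;# 14.0

# double() forces the conversion explicitly
puts "double(7)/2 is [expr {double(7)/2}]" ;# 3.5
```

The trailing ".0" in 14.0 is how Tcl marks a result as floating-point even when its value happens to be a whole number.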

What are those mysteries and quirks?

Now for some of the mysteries you may run into. Run the following scripts:
#
# Division
#
puts "1/2 is [expr {1/2}]"
puts "-1/2 is [expr {-1/2}]"
puts "1/3 is [expr {1./3}]"
puts "1/2 is [expr {1./2}]"
puts "1/3 is [expr {double(1)/3}]"
The first two computations have the surprising result: 0 and -1. That is because the result is an integer number and the mathematically exact results 1/2 and -1/2 are not.
If you are interested in the details of how Tcl works, the outcome q of the integer division a/b is determined as follows:
a = q * b + r
0 <= |r| < |b|
r has the same sign as b
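
The relation between quotient and remainder can be verified with the / and % operators of expr. The sketch below is an illustration with arbitrarily chosen values; it loops over all four sign combinations.

```tcl
# Check a = q*b + r and |r| < |b| for every sign combination.
foreach {a b} {7 2  -7 2  7 -2  -7 -2} {
    set q [expr {$a / $b}]
    set r [expr {$a % $b}]
    set ok [expr {$a == $q*$b + $r && abs($r) < abs($b)}]
    puts "a=$a b=$b  q=$q r=$r  relation holds: $ok"
}
```

On Tcl 8.5 and later this should report q = 3, -4, -4, 3 and r = 1, 1, -1, -1: the quotient is rounded towards minus infinity, and in each case the remainder carries the sign of the divisor b.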
Here are some examples with floating-point numbers:
set tcl_precision 17 ;# One of Tcl's few magic variables:
                     ;# Show all decimals needed to exactly
                     ;# reproduce a particular number
puts "1/2 is [expr {1./2}]"
puts "1/3 is [expr {1./3}]"

set a [expr {1.0/3.0}]
puts "3*(1/3) is [expr {3.0*$a}]"

set b [expr {10.0/3.0}]
puts "3*(10/3) is [expr {3.0*$b}]"

set c [expr {10.0/3.0}]
set d [expr {2.0/3.0}]
puts "(10.0/3.0) / (2.0/3.0) is [expr {$c/$d}]"

set e [expr {1.0/10.0}]
puts "1.2 / 0.1 is [expr {1.2/$e}]"
Many of the above computations give the result you would expect, but look closely at the last decimals: the last two do not give exactly 5 and 12! This is because computers can only deal with numbers with a limited precision: floating-point numbers are not our mathematical real numbers.
Somewhat unexpectedly, 1/10 also gives problems. 1.2/0.1 results in 11.999999999999998, not 12. That is an example of a very nasty aspect of most computers and programming languages today: they do not work with ordinary decimal fractions, but with binary fractions. So, 0.5 can be represented exactly, but 0.1 can not.
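
Because of this, testing floating-point results for exact equality is fragile. One common workaround, sketched here with a hypothetical helper nearlyEqual (not a Tcl built-in) and an arbitrarily chosen tolerance, is to accept a small relative error.

```tcl
# nearlyEqual is a hypothetical helper, not part of Tcl.
# The default tolerance 1e-9 is an arbitrary choice for illustration.
proc nearlyEqual {x y {tol 1e-9}} {
    # Scale the tolerance by the larger of the two magnitudes
    set scale [expr {abs($x) > abs($y) ? abs($x) : abs($y)}]
    expr {abs($x - $y) <= $tol * $scale}
}

set q [expr {1.2/0.1}]
puts "exact comparison:    [expr {$q == 12.0}]"
puts "tolerant comparison: [nearlyEqual $q 12.0]"
```

The exact comparison yields 0 (false) because the quotient is 11.999999999999998, while the tolerant comparison yields 1. What tolerance is appropriate depends on the computation at hand.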

Some practical consequences

The fact that floating-point numbers are not ordinary decimal or real numbers, together with the actual way computers deal with them, has a number of consequences:
  • Results obtained on one computer may not exactly match the results on another computer. Usually the differences are small, but if you have a lot of computations, they can add up!
  • Whenever you convert from floating-point numbers to integer numbers, for instance when determining the labels for a graph (the range is 0 to 1.2 and you want a stepsize of 0.1), you need to be careful:

    #
    # The wrong way
    #
    set number [expr {int(1.2/0.1)}] ;# Force an integer -
                                     ;# accidentally number = 11
    for { set i 0 } { $i <= $number } { incr i } {
        set x [expr {$i*0.1}]
        ... create label $x
    }

    #
    # A right way - note the limit
    #
    set x 0.0
    set delta 0.1
    while { $x < 1.2+0.5*$delta } {
        ... create label $x
        set x [expr {$x + $delta}]
    }
  • If you want to do financial computations, take care: there are specific standards for doing such computations that unfortunately depend on the country where they are used - the US standard is slightly different from the European standard.
  • Transcendental functions, like sin() and exp(), are not standardised at all. The outcome could differ in one or more decimals from one computer to the next. So, if you want to be absolutely certain that π (pi) is a specific value, use that value and do not rely on formulae like these:
    #
    # Two different estimates of "pi" - on Windows 98
    #
    set pi1 [expr {4.0*atan(1.0)}]
    set pi2 [expr {6.0*asin(0.5)}]
    puts [expr {$pi1-$pi2}]
    -4.4408920985006262e-016
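
Returning to the graph labels: another way to avoid both the truncation and the accumulated error of repeated addition is to round the number of steps and compute each label from the integer counter. This is only a sketch; the variable names and the use of format to tidy the output are assumptions, not part of the lesson's own code.

```tcl
set xmax  1.2
set delta 0.1

# round() picks the nearest integer, so 11.999999999999998 becomes 12
set number [expr {round($xmax/$delta)}]

set labels {}
for { set i 0 } { $i <= $number } { incr i } {
    # Each label is computed from i, so errors do not accumulate
    set x [expr {$i*$delta}]
    lappend labels [format "%.1f" $x]
}
puts $labels
```

This produces 13 labels, from 0.0 up to and including 1.2. The format call hides stray digits such as 0.30000000000000004 that would otherwise end up in the label text.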
