Representation of Integers in Computers

This document is about representing numbers in computers. We will first explore the concept of binary numbers. Then, we will learn how integers are stored in the computer. Finally, we will learn about floating point representation of real numbers.

1. Troubles with integers and real numbers

We will first look at two sample programs with bugs. These bugs are examples of unpleasant surprises that can occur if we do not properly understand how the numbers are stored. Once we learn how the numbers are represented in the computer, we will learn the limitations and tradeoffs that must be made in designing the computers. Then we will be able to write the code that will not have unwanted behavior due to vulnerabilities resulting from representations of integers and real numbers.

Example 1: Trouble with integers
#include<iostream>
int main(){
    int x=1 ;
    for(int i=0;i<35;++i){
        std::cout<<i<<"\t"<<x<<"\n";
        x = x*2; 
    }
    return 0;
}

The program prints the powers of \(2\). Everything goes well with the exponents \(0\), \(1\), \(\dots\), \(30\). We get \(2^0=1\), \(2^1=2\), \(2^2=4\), \(2^3=8\), \(2^4=16\), \(2^5=32\), \(\dots\), \(2^{30}=1073741824\). However, this is where it gets surprising, and, mathematically incorrect: The computer thinks that

\[2^{31}=-2147483648,\mbox{ and } 2^{32}=2^{33}=\cdots = 0.\]

We will learn that this is due to the fact that int is stored as a \(32\)-bit number whose first bit is the sign. We will learn even more than that. We will learn how the storage is implemented using technique called Two's complement.

Example 2: Trouble with real numbers
#include<iostream>
int main(){
    double a=1.0, b=0.1, c=0.0;
    for(int i=0;i<10;++i){ c = c+b; }
    std::cout<<"a="<<a<<"\n"<<"c="<<c<<"\n";
    if(a==c){
        std::cout<<a<<"=="<<c<<"\n";
    }
    else{
        std::cout<<a<<"!="<<c<<"\n";
    }
    return 0;
}

The above code prints the following:

a=1
c=1
1!=1

The two values \(1\) stored in the variables a and c are not the same for our computer. The first is obtained by directly writing \(1\) into the variable a. The second value \(1\) is obtained by adding the number \(0.1\) ten times to the variable c inside the for loop.

Once we learn how the real numbers are stored, we will see that when the number \(0.1\) is stored in binary system, its representation has infinitely many digits. In decimal system, there is only one digit after the decimal point. In binary, it will be infinitely many. Thus, \(0.1\) has to be rounded in the computer. When we add this rounded value ten times, the rounding error becomes significant. The computer sees the result as a number different from \(1\). When the computer displays the results, it rounds it again and prints \(1\) on the output. That is how we end up with the confusing message 1!=1.

2. Numbers in base \(b\) and binary numbers

Until 12th century, the Europeans did not use the decimal system to represent numbers. They wrote I, II, III, IV, and V instead of \(1\), \(2\), \(3\), \(4\), and \(5\). Chinese did almost the same. The characters for the first five numbers in Chinese alphabet are , , , , and .

These old traditional number systems need infinitely many symbols to support infinitely many numbers. The decimal system needs only \(10\) symbols: \(0\), \(1\), \(2\), \(\dots\), \(9\). Let us consider a positive integer \(n\) and let us assume that it has \(k\) digits. We will denote the digits from right to left by \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\). To avoid confusion with multiplication in the case when digits are labeled as variables, we will write a horizontal line over the digits and have an equation of the form \[ n=\overline{d_{k-1}\dots d_1d_0}.\] For example, if \(n=4529\), then we will write \(k=4\), \(d_0=9\), \(d_1=2\), \(d_2=5\), and \(d_3=4\). We can write \(n=\overline{4529}\) as well, but the horizontal line is not necessary because all digits are known and none of them is represented with a variable.

The following equation holds: \[\overline{d_{k-1}\dots d_1d_0}=d_{k-1}\cdot 10^{k-1}+d_{k-2}\cdot 10^{k-2}+\cdots+ d_1\cdot 10^1+d_0\cdot 10^0.\]

We are very used to this representation of numbers. The system is called decimal system and the number \(10\) is called the base of the decimal system. It is believed that \(10\) is chosen as the base for the system because humans have \(10\) fingers on their hands. The universe doesn't treat the number \(10\) as a special one. We can easily replace \(10\) with any other base and have a perfectly well functioning mathematics.

We will now define the representation of integers in an arbitrary base \(b\), where \(b\) is an integer bigger than \(1\).

We will now assume that \(b >1\) and that \(n\) is a positive integer.

Definition The sequence of numbers \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) is called representation of the number \(n\) in base \(b\) if \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) are integers from \(\{0\), \(1\), \(\dots\), \(b-1\}\) that satisfy \[n=d_{k-1}\cdot b^{k-1}+d_{k-2}\cdot b^{k-2}+\cdots + d_1\cdot b^1+d_0\cdot b^0.\] The numbers \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) are also called digits of \(n\) in base \(b\).

Theorem If \(n\) and \(b\) are integers such that \(n\geq 1\) and \(b >1\), then there exists a representation of the integer \(n\) in base \(b\). If we require the first digit \(d_{k-1}\) to be non-zero, then the representation is unique.

Problem 1. If the representation of the number \(n\) in base \(8\) is \(\overline{37105}_8\), what is the decimal representation of \(n\)?

Problem 2. If the representation of the number \(n\) in base \(5\) is \(\overline{2144}_5\), what is the decimal representation of \(n\)?

Problem 3. What is the representation of the number \(352\) in base \(5\)?

Definition The representation in base \(2\) is also called a binary representation.

Problem 4. Determine the integer \(n\) whose binary representation is \(\overline{1101011}_2\).

Problem 5. Determine binary representation of the number \(n=150\).

Problem 6. Create the program that reads a number from the user input and prints its binary representation.

Problem 7. Create a program that reads the string s consisting of digits \(0\) and \(1\) only and determines the integer \(n\) whose binary representation is s.

3. Unsigned integers

Non-negative integers are the easiest to handle. In computer science they are called unsigned integers because we do not need to worry about the sign. We can assume it is \(+\). The binary digits are the only elements that have to be stored.

Unsigned integers were more common in past, but not so much today. The computers had very small memory in past. If your program needed both positive and negative integers, then you would be kind of sad. You would need to devote one entire bit for the sign. If you knew that your program does not need negative numbers, you would feel happy and lucky. You would use unsigned integers. That extra bit would be an extra digit. The possible range for the data would be bigger than if one of the bits were reserved for the sign.

In modern C++ there are two major data types for unsigned integers. The first one is unsigned int and it occupies \(32\) bits. The second type is unsigned long and it takes \(64\) bits. They are rarely used any more.

4. Signed integers

Let us denote by \(l\) the total number of bits that we are allowed to use for storing our integer. For most C++ compilers, the type int has \(l=32\) while the type long has \(l=64\).

We will treat \(l\) as a general number and in our examples we will keep things simple by using \(l\) that is much smaller than \(32\) and \(64\). We will often use \(l=8\).

The maximal positive number that we will be able to store is \(2^{l-1}-1\). The minimal negative number will be \(-2^{l-1}\). Thus, the range of all integers will be \([-2^{l-1},2^{l-1}-1]\). There are exactly \(2^l\) numbers in this range. Instead of storing the number \(x\), the computer memory will contain Two's complement of \(x\). This is the formal definition of two's complement.

Definition For \(x\in[-2^{l-1},2^{l-1}-1]\) the two's complement of \(x\) is the number consisting of the last \(l\) binary digits of \(2^l+x\).

Problem 8. Assume that \(l=8\). Determine the two's complement of numbers \(0\), \(10\), \(35\), \(-35\), \(-90\), \(127\), \(-128\).

From the previous exercise we can observe that for positive \(x\) that satisfies \(x\leq 2^{l-1}-1\), the two's complement is always equal to the binary expansion of \(x\). The number \(2^l+x\) will have digit \(1\) at the position \(l+1\) which gets discarded. Moreover, the left-most digit of the two's complement is equal to \(0\). However, if \(x\) is a negative number that satisfies \(x\geq -2^{l-1}\), then the two's complement will have exactly \(l\) digits. There won't be any digit to discard. The left-most digit of the two's complement will be \(1\).

Thus, the left-most digit of the two's complement is \(0\) if the number is positive, and \(1\) if the number is negative. The left-most digit is the same as the sign.

The arithmetic operations with signed integers can be performed in the same way as if the integers are unsigned. If there is any carryover that would occupy more than \(l\) bits, that carryover should just be ignored.

We are now ready to analyze the code from the Example 1.

Problem 9. What does the following code print?
#include<iostream>
int main(){
    int x=1 ;
    for(int i=0;i<35;++i){
        std::cout<<i<<"\t"<<x<<"\n";
        x = x*2; 
    }
    return 0;
}

Problem 10.

Create the function addOneInBase8. The arguments of the functions are the pointer to the first element of the sequence and its length. The sequence represents the number n in the base 8. The function should update the sequence so it represents the number n+1 in base 8.

Your code should replace the text // ??? // below.

void addOneInBase8(long* b, long M){
   // ??? //
}