Representation of Integers and Shift Instructions

1. Introduction

We have to develop a good theoretical understanding of binary representation of numbers. The focus of this section is on integers. In later sections, we will learn how fractions and real numbers are represented in computers.

We will start with introducing the representation of integers and binary representation. Then, we will learn about two's complement, which is the technique used to store integer in modern computers. In the end we will learn about shift instructions LSL, LSR, and ASR in the assembly language. The shift instruction LSL is in certain situations much faster than the multiplication instruction MUL. The shift instructions LSR and ASR can be used to implement most basic divisions of integers.

2. Numbers in base \(b\) and binary numbers

Until 12th century, the Europeans did not use the decimal system to represent numbers. They wrote I, II, III, IV, and V instead of \(1\), \(2\), \(3\), \(4\), and \(5\). Chinese did almost the same. The characters for the first five numbers in Chinese alphabet are , , , , and .

These old traditional number systems need infinitely many symbols to support infinitely many numbers. The decimal system needs only \(10\) symbols: \(0\), \(1\), \(2\), \(\dots\), \(9\). Let us consider a positive integer \(n\) and let us assume that it has \(k\) digits. We will denote the digits from right to left by \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\). To avoid confusion with multiplication in the case when digits are labeled as variables, we will write a horizontal line over the digits and have an equation of the form \[ n=\overline{d_{k-1}\dots d_1d_0}.\] For example, if \(n=4529\), then we will write \(k=4\), \(d_0=9\), \(d_1=2\), \(d_2=5\), and \(d_3=4\). We can write \(n=\overline{4529}\) as well, but the horizontal line is not necessary because all digits are known and none of them is represented with a variable.

The following equation holds: \[\overline{d_{k-1}\dots d_1d_0}=d_{k-1}\cdot 10^{k-1}+d_{k-2}\cdot 10^{k-2}+\cdots+ d_1\cdot 10^1+d_0\cdot 10^0.\]

We are very used to this representation of numbers. The system is called decimal system and the number \(10\) is called the base of the decimal system. It is believed that \(10\) is chosen as the base for the system because humans have \(10\) fingers on their hands. The universe doesn't treat the number \(10\) as a special one. We can easily replace \(10\) with any other base and have a perfectly well functioning mathematics.

We will now define the representation of integers in an arbitrary base \(b\), where \(b\) is an integer bigger than \(1\).

We will now assume that \(b> 1\) and that \(n\) is a positive integer.

Definition The sequence of numbers \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) is called representation of the number \(n\) in base \(b\) if \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) are integers from \(\{0\), \(1\), \(\dots\), \(b-1\}\) that satisfy \[n=d_{k-1}\cdot b^{k-1}+d_{k-2}\cdot b^{k-2}+\cdots + d_1\cdot b^1+d_0\cdot b^0.\] The numbers \(d_0\), \(d_1\), \(\dots\), \(d_{k-1}\) are also called digits of \(n\) in base \(b\).

Theorem If \(n\) and \(b\) are integers such that \(n\geq 1\) and \(b > 1\), then there exists a representation of the integer \(n\) in base \(b\). If we require the first digit \(d_{k-1}\) to be non-zero, then the representation is unique.

Problem 1. If the representation of the number \(n\) in base \(8\) is \(\overline{37105}_8\), what is the decimal representation of \(n\)?

Problem 2. If the representation of the number \(n\) in base \(5\) is \(\overline{2144}_5\), what is the decimal representation of \(n\)?

Problem 3. What is the representation of the number \(352\) in base \(5\)?

Definition The representation in base \(2\) is also called a binary representation.

Problem 4. Determine the integer \(n\) whose binary representation is \(\overline{1101011}_2\).

Problem 5. Determine binary representation of the number \(n=150\).

Problem 6. Create the program that reads a number from the user input and prints its binary representation.

Problem 7. Create a program that reads the string s consisting of digits \(0\) and \(1\) only and determines the integer \(n\) whose binary representation is s.

3. Unsigned integers

Non-negative integers are the easiest to handle. In computer science they are called unsigned integers because we do not need to worry about the sign. We can assume it is \(+\). The binary digits are the only elements that have to be stored.

Unsigned integers were more common in past, but not so much today. The computers had very small memory in past. If your program needed both positive and negative integers, then you would be kind of sad. You would need to devote one entire bit for the sign. If you knew that your program does not need negative numbers, you would feel happy and lucky. You would use unsigned integers. That extra bit would be an extra digit. The possible range for the data would be bigger than if one of the bits were reserved for the sign.

In modern C++ there are two major data types for unsigned integers. The first one is unsigned int and it occupies \(32\) bits. The second type is unsigned long and it takes \(64\) bits. They are rarely used any more.

4. Signed integers

Let us denote by \(l\) the total number of bits that we are allowed to use for storing our integer. For most C++ compilers, the type int has \(l=32\) while the type long has \(l=64\).

We will treat \(l\) as a general number and in our examples we will keep things simple by using \(l\) that is much smaller than \(32\) and \(64\). We will often use \(l=8\).

The maximal positive number that we will be able to store is \(2^{l-1}-1\). The minimal negative number will be \(-2^{l-1}\). Thus, the range of all integers will be \([-2^{l-1},2^{l-1}-1]\). There are exactly \(2^l\) numbers in this range. Instead of storing the number \(x\), the computer memory will contain Two's complement of \(x\). This is the formal definition of two's complement.

Definition For \(x\in[-2^{l-1},2^{l-1}-1]\) the two's complement of \(x\) is the number consisting of the last \(l\) binary digits of \(2^l+x\).

Problem 8. Assume that \(l=8\). Determine the two's complement of numbers \(0\), \(10\), \(35\), \(-35\), \(-90\), \(127\), \(-128\).

From the previous exercise we can observe that for positive \(x\) that satisfies \(x\leq 2^{l-1}-1\), the two's complement is always equal to the binary expansion of \(x\). The number \(2^l+x\) will have digit \(1\) at the position \(l+1\) which gets discarded. Moreover, the left-most digit of the two's complement is equal to \(0\). However, if \(x\) is a negative number that satisfies \(x\geq -2^{l-1}\), then the two's complement will have exactly \(l\) digits. There won't be any digit to discard. The left-most digit of the two's complement will be \(1\).

Thus, the left-most digit of the two's complement is \(0\) if the number is positive, and \(1\) if the number is negative. The left-most digit is the same as the sign.

The arithmetic operations with signed integers can be performed in the same way as if the integers are unsigned. If there is any carryover that would occupy more than \(l\) bits, that carryover should just be ignored.

We are now ready to analyze the code from the Example 1.

Problem 9. What does the following code print?
#include<iostream>
int main(){
    int x=1 ;
    for(int i=0;i<35;++i){
        std::cout<<i<<"\t"<<x<<"\n";
        x = x*2; 
    }
    return 0;
}

5. Left shift

From very young age, all humans learn that it is the easiest to multiply by numbers such as 10, 100, and 1000. The multiplication by \(10^k\) in decimal system consists of adding \(k\) zeroes to the right end of the number.

Similarly, multiplication by \(2^k\) in binary is equivalent to adding \(k\) zeroes at the end of the number. This operation is called left shift or logical left shift.

If we want to multiply the content of the register X9 by the number by \(2^5\) and store the result in X10, we would need to give the following instruction.

LSL     X10,     X9,     5

Mathematically, the instruction is equivalent to MUL X10, X9, 32. However, MUL is much slower. Hence, if we are multiplying the number by a power of \(2\), then it is much faster to use the shift instruction instead of the multiplication.

The shift operator can be a faster alternative even to multiplications by certain numbers that are not exactly powers of two. Here is one example: Multiplication by 17 can be achieved with

MUL     X10,     X9,     17

The faster way to multiply by 17 is

LSL     X8,      X9,     4      /* X8 = X9 * 2^4 = 16 * X9 */
ADD     X10,     X9,     X8     /* X10 = X9 + X8 = 17 * X9 */
Problem 10.

Assume that the register X1 contain the integer \(a\). Use the appropriate shift instruction to evaluate \(512\cdot a\) and store the result in the register X2. Do not use the instruction MUL.

Problem 11.

Assume that the register X1 contain the integer \(a\). Evaluate \(520\cdot a\) and store the result in the register X2. Do not use the instruction MUL.

6 Right shifts

There are two right shift instructions: arithmetic right shift ASR, and logical right shift LSR.

The logical right shift always adds zeroes to the left. The arithmetic right shift copies the first bit and adds it to the left.

When the numbers are signed, they are expressed in two's complement, then the arithmetic right shift by \(k\) bits corresponds to the integer division by \(2^k\). The integer division is the operation that discards the remainder.

When the numbers are unsigned, then the logical right shift is the operation that corresponds to the integer division by \(2^k\).

If we want to make the arithmetic right shift of the register X9 by \(5\) places and store the result in X10, we would need to give the following instruction.

ASR     X10,     X9,     5

An analogous syntax is used for the logical right shift.

In the exercise below, the numbers are signed, and you should use the arithmetic right shift.

Problem 12.

Assume that the register X1 contain the integer \(a\). Use the appropriate shift instruction to evaluate the integer division \(a/64\) and store the result in the register X2.