You are on page 1of 102

Algorithms and Complexity, I

Anders Yeo

CS2860

September 2011

Department of Computer Science Egham, Surrey TW20 0EX, England

Abstract The aim of these notes is to provide the reader with a toolbox of useful algorithms and a comfortable familiarity with the kinds of data structures that they rely on. We do this by studying algorithms in isolation as small programs doing not-very-useful jobs. We establish some order within the presentation by looking at families of algorithms grouped rstly by their function (that is what they are supposed to do) and then by their strategy (that is how they do what they do). We introduce engineering techniques that provide analyses of running programs and measure the eciency of a particular implementation running with a particular input data set. We also develop mathematical tools that allow us to say something about the general behaviour of an algorithm, independently of the kind of computer it is running on and in terms of the amount of data to be processed. As well as the standard families of sorting and searching problems, we look at heuristic solutions to combinatorial search problems, and provide literature pointers to more advanced works dealing with topics such as probabilistic and parallel algorithms.

This document is c Anders Yeo 2005. Several parts of this document are inspired by the CS2490 blue book by Adrian Johnstone and Elizabeth Scott. Permission is given to freely distribute this document electronically and on paper. You may not change this document or incorporate parts of it in other documents: it must be distributed intact. Please send errata to the author at the address on the title page or electronically to anders@cs.rhul.ac.uk.

Contents

1 The 1.1 1.2 1.3 1.4 1.5 1.6

1.7

1.8 1.9

1.10 1.11 1.12 1.13

eciency of programs The speed of computers The study of algorithms Is an algorithm a program? The relationship between data structures and algorithms Pseudo-code and the telephone book structure Searching 1.6.1 Linear search 1.6.2 Binary search Sorting 1.7.1 Bubble sort 1.7.2 Insertion sort Algorithms based on indexable data structures compared to reference based structures Concrete implementations 1.9.1 Bubble sort on an array 1.9.2 Insertion sort on an array 1.9.3 Bubble sort on the phone book 1.9.4 Insertion sort on the phone book The performance of bubble sort Rates of growth and tractability Instrumenting a program counting operations Counting operations by hand

1 2 3 3 4 5 6 6 7 9 10 11 12 12 13 13 14 15 16 16 17 21 23 23 23 25 25 26 28 28 29 30 30 31

2 Divide and conquer? 2.1 Merge sorts 2.1.1 Sorting in two halves 2.1.2 Merge sort 0 2.1.3 Worst case analysis of merge sort 0 2.1.4 Merge sort 1 2.1.5 Worst case analysis of merge sort 1 2.1.6 Java code for merge sort 1 2.2 Quick sort 2.2.1 Partition algorithm 2.2.2 Worst case analysis of quick sort 2.2.3 Java code for quick sort

CONTENTS ii 2.2.4 2.2.5 Heap 2.3.1 2.3.2 2.3.3 2.3.4 2.3.5 Choosing the pivot value Worst case analysis sort Heaps Turning arrays into heaps Sorting heaps Complexity analysis of heap sort Java code for heap sort 32 35 35 35 36 39 39 41 43 43 44 44 45 46 46 48 48 48 49 50 50 51 52 52 54 54 54 55 55 55 56 56 58 58 58 59 60 61 63 66 66 66 67

2.3

3 So what do we mean by complexity? 3.1 Big-O notation - O(f (n)) 3.2 Omega notation - (f (n)) 3.3 Theta notation - (f (n)) 3.4 Small-o notation - o(f (n)) 3.5 A hierarchy of orders 3.6 Worst case analysis 3.7 Best case analysis 3.7.1 Best case analysis of bubble sort 3.7.2 Best case analysis of merge sort1 3.7.3 Best case analysis of quick sort 3.8 Average case analysis 3.8.1 Expected and typical values 3.8.2 Average case analysis of linear search 3.8.3 Average case analysis of merge sort 3.8.4 Average case analysis of quick sort 4 Searching algorithms 4.1 Overview 4.2 Doing better than linear search 4.2.1 Ordering the data? 4.2.2 Binary search 4.2.3 Interpolation search 4.3 Dealing with dynamic data structures 4.4 Hash coding 4.5 Binary search trees 4.5.1 Traversing binary trees 4.5.2 Structure of binary search trees 4.5.3 Optimum search trees 4.5.4 Balanced trees 4.5.5 Rotations in a binary search tree 4.5.6 Insertion in a balanced binary tree 4.5.7 Building balanced binary search trees 4.6 Multiway search trees 4.6.1 Insertion in multiway search trees 4.6.2 B-trees

CONTENTS iii 5 String matching 5.1 The naive string matching algorithm 5.2 The Rabin-Karp algorithm 6 Game Theory 6.1 Drop-down tic-tac-toe 6.2 Evaluation functions 6.3 Thoughts on min-max trees 6.4 Alpha-Beta pruning and pseudo codes 6.5 Pseudocode for the min-max algorithm 6.6 Alpha-Beta pruning. 6.6.1 Pseudocode for the Alpha-Beta pruning 6.6.2 General thoughts on Alpha-Beta pruning A CS2860 questions B Appendix B: Some useful inductive proofs B.1 Examples 69 70 72 76 76 80 83 86 86 87 88 90 93 96 96

Chapter 1 The eciency of programs


On rst acquaintance, computers seem impossibly fast. Forty years ago, nearly all accounting was done by hand. Armies of clerks used to process payroll data for large companies but within a very few years between 1955 and 1965 most of this activity was displaced by the arrival of computer-based payroll systems. These could calculate and print pay slips at a rate that would be astounding if it had not become so familiar to us1 . Neophyte computer users, that is people who are new to computing nd it hard to believe that we nd our machines are too slow. Amongst everyday computer users a dierent view prevails which we might summarise as therell be a faster one along in a moment syndrome. These technology bus believe that although they might nd their machines a bit sluggish sometimes, the hardware manufacturers will soon be able to sell them a machine that is twice as fast for no extra money. There is good reason to accept this view. The performance of computers in the 12,000 bracket has shown spectacular improvements over the last twenty years. Imagine how it would be if motor cars showed similar improvements. There is another commonly held view which is that no matter how much faster the hardware becomes, the software vendors will load more and more functionality into their products in a way that eectively nullies the improvement and forces everybody to continuously upgrade their computers. This is a natural thing for the computer industry to do: they exist to sell new machines and new software so it is important to them that products should become rapidly obsolescent. As academic computer scientists we need to recognise that there is some truth in the views of each of the neophytes, the technology bus and the cynics. However, our task is to study the underlying engineering and theoretical realities. We are not interested here in why large software companies do what they do, but in what could be done in an ideal world. In this course, we study fundamentals, not the eects of marketing. The purpose of this course is to study the art of the possible in computer programming. We do this by measuring the performance of computer programs and developing insights that allow us to design programs that are ecient. All
1

You can read the story of one of the pioneering commercial computers in [Bir94].

The speed of computers 2 real computer systems are nite: they have limited memory (both core and disk space); they have limited network bandwidth; and they have limited execution speed. Our aim is to write programs that deliver (provable) acceptable performance within those nite limitations.

1.1

The speed of computers


Real computer systems also cost real money, and at any given time a computer that is ten times as fast an another will cost considerably more than ten times as much. In practice, therefore, high performance computers have a high price/performance ratio compared to cheap, low performance computers and buying extra computing resource becomes increasingly expensive as the demands increase. To make matters worse, at any given time in the history of computing the range of available computers has rarely displayed more than a factor of a few hundred in relative performance. If we need a computer that is millions of times faster than presently available technology, we are unlikely to be able to buy it at any price. These facts of life have in recent years been obscured by the rapid improvement in technology with time. For many years during the 1950s to 1970s it turned out that execution speed and memory capacity of the best available computers tended to improve by roughly 2025% per year. During the late 1970s and 1980s the emergence of the microprocessor fuelled a growth rate of around 30% per year which was, incidentally, coupled to an extraordinary plummeting in the cost of systems. This increase in the rate of improvement and drop in cost was caused by the shift to highly integrated systems which could immediately benet from the improvements in technology provided by the integrated circuit processing engineers. In the late 1980s and through to the present day the adoption of new ways of designing computers have provided architectural improvements that have in 1994, for instance, generated a year-on-year performance improvement of 158%. (Source: [HP96, pp 13]) These engineering achievements, whilst impressive, still only yield a net performance improvement for the most powerful available computer of around 1,000 times over the last thirty years. The fact that prices have dropped at a much greater rate has had an enormous impact on society, in that computers have found many new cost-eective applications such as word-processors and spreadsheets. However, the absolute limit on what is feasible to computerise, as opposed to what is cost-eective to computerise, is not much aected by a factor of 1,000 performance increase. The harsh reality of life to the Computer Scientist is that many interesting and deceptively simple problems, such as nding the shortest route between fty towns, require such astounding amounts of computation that they are infeasible on any realistic computer2 .
2 All is not lost! If we are prepared to accept a short route between fty towns rather than insisting on nding the guaranteed shortest route then we do know how to write a program that nishes in a reasonable time. This will be discussed more in the second term

The study of algorithms 3

1.2

The study of algorithms


In the pioneering days of computing (a period between 1945 and the arrival of the rst commercial computers in the mid-1950s) programmers and designers were for the most part wholly concerned with working out how to program solutions to individual practical problems rather than philosophising on the generalised aspects of computing. Eciency was certainly a great concern given the low performance of those machines but problems and their solutions were considered together as an amalgam of a particular technique and a particular machine. The generalised study of methods by which problems might be attacked came later after considerable experience had been built up of real programs. These methods are called algorithms to distinguish them from particular programs that implement the method specied by the algorithm. By the early 1960s books on the design and analysis of algorithms were beginning to appear which oered three things: 1. a compendium of algorithms for solving standard problems, 2. mathematical techniques for analysing the behaviour of algorithms in terms of their performance on input data sets of various sizes, 3. taxonomies (classications by family) of algorithms that reveal their underlying strategies and illuminate the issues involved in the design of novel algorithms. We will in this course consider all of these three topics.

1.3

Is an algorithm a program?
In short, no, although algorithms are the foundation of programs. A famous book by the designer of the Pascal language (Niklaus Wirth) is called Algorithms + data structures = programs which rather sums up the conventional view of computer programming. In this view, when programming a solution to a problem you rst select relevant data structures with which to model the elements of your problem inside the computer, then you think of a broad brush solution to your problem which may be rened into individual operations such as sorting a set of numbers, locating a particular record and so on. Finally, these basic data level operations are programmed using a particular selected algorithm that is known to be ecient on the chosen data structure. Becoming procient in this style of program design requires a familiarity with elementary data structures such as arrays, queues, lists, stacks, trees and graphs along with a working repertoire of ecient algorithms for completing common tasks. Real programs are often made up of a network of largely independent data structures and algorithms which are used as necessary to assemble an overall solution to some problem. A range of data structures and algorithms is deployed within each single program and the programmer needs to be familiar with a rather wide range of techniques if they are to be eective.

The relationship between data structures and algorithms 4 One of the aims of this course, therefore, is to equip you with such a toolbox of algorithms and a comfortable familiarity with the kinds of data structures that you have already encountered as part of your elementary programming courses. We will do this by studying algorithms in isolation as small programs doing not-very-useful jobs. We will establish some order within the presentation by looking at families of algorithms grouped rstly by their function (that is what they are supposed to do) and then by their strategy (that is how they do what they do). Along the way, we will introduce engineering techniques that allow us to analyse running programs and measure the eciency of a particular implementation running with a particular input data set, and we will also develop mathematical tools that allow us to say something about the general behaviour of an algorithm, independently of the kind of computer it is running on and in terms of the amount of data to be processed. Occasionally, we shall stray away from the conventional doctrine of algorithms and data structures by looking at approaches to programming that represent the problem to be solved as a generalised cost that must be manipulated to nd a solution in a way that may involve some statistically random behaviour on the part of the computer. At the end of the course we will have a very brief look at some of the more intriguing aspects of program design that involve other unconventional approaches and give some pointers into the relevant computing literature.

1.4

The relationship between data structures and algorithms


The independent selection of (a) data structures to represent the quantities in a program and (b) algorithms to operate on those structures is rarely helpful. The key observation is that dierent ways of representing data make explicit dierent relationships within the data. If the structure makes explicit a useful property of the data, then algorithms working with that property will be more ecient. Perhaps the simplest example is the subscriber telephone directory. This is a data structure comprising a list of telephone users and their telephone numbers in alphabetical order of the subscribers name. To nd the number for a particular subscriber we look up the subscribers name, which can be done in a variety of reasonably ecient ways, as we shall see. Humans tend to use a rather informal algorithm which we might call icking through the pages. We make a guess at where in the directory the name might be, based on our subscribers surname. If we nd we have opened the directory too far in we jump backwards and try again. If we have found a page that comes before the one we are interested in then we jump forwards and try again. This informal algorithm, which is based on triangulating down onto the result is the basis of a very ecient search algorithm called binary search which we shall examine in section 1.6.2. Imagine, on the other hand, that we had been asked to nd the subscriber belonging to a particular telephone number. Since telephone numbers are allocated to names in a way that is for all intents and purposes random, the best

Pseudo-code and the telephone book structure 5 we can do with a conventional telephone directory is to read it entry by entry searching for the right number. The order in which we search doesnt matter, but we often call this kind of searching linear or sequential searching because we cant do any better than looking at the numbers one-by-one starting at the beginning. In the worst case (being given a telephone number that does not appear in the particular directory we are searching) we shall have to check every single telephone number before getting a result. Doing this manually would be so time consuming that we might consider it intractable. If, on the other hand, we had a telephone directory ordered by number, the lookup-by-number problem becomes essentially identical to the lookup-by-name problem and may be solved as easily. We can tell, as soon as we see a number greater than the one we are looking for that we have gone too far. On the other hand, searching by name has become intractable unless we keep both copies of the telephone directory to hand. A related problem is the location of an example of a particular kind of subscriber such bakers or car repair shops. The telephone companies do provide us with a data structure to make this kind of searching easier: it is called the Yellow pages and comprises a classied list of subscribers by business type. Within each class of business the subscribers are again arranged in alphabetical order, and it is normal to scan these in turn to nd a business which is in the right area. This is another linear search but it is a much less intimidating one because the number of subscribers in each class is small typically a few pages worth at most. The order in which data elements are held is only one aspect of the way in which choice of data structure aects the eciency of algorithms. It turns out that representing a list of records as a sorted array allows some searching operations to be performed more eciently than if the list is represented by a linked-list of records, regardless of whether they are sorted or not. On the other hand, putting new records into a linked-list representation is more ecient than putting new records into a large array of data. We shall have more to say on the distinction between data structures that allow direct access (indexable structures) such as arrays and data structures that force sequential access to records (non-indexable structures) such as linked lists in section 1.8. First, though, we shall look at some real algorithms.

1.5

Pseudo-code and the telephone book structure


We will use the metaphor of the telephone book to explore the properties of four simple algorithms. We shall illustrate most of the algorithms discussed in this section by implementing them for use with the telephone book, and we shall provide a copy of the telephone book and Java implementations of some of the algorithms for you to experiment with. We are going to use an informal pseudo-code to describe most of the algorithms, which looks basically like a simple mixture of C++, Java and Pascal. Later on, we shall also give some real implementations of these algorithms in Java. In general we shall use the Pascal symbol, :=, for assignment and the Java

Searching 6 symbol, ==, for equals. If an element, array say, is an array, we shall write array[i] for the entry at the ith index in the array, and as in Java, array indexes will start at 0. If an element is a record, record say, with elds name1, name2 etc, then we shall write record.name2 for the entry in eld name2 of record. We shall also use a generic print statement which takes as many arguments as you like, separated by commas. Arguments enclosed in quotes are treated as strings and printed literally, all other arguments are treated as identiers and their values will be printed. We shall declare identiers in C/Java style, for example

integer n integer a[6] phone_book_type { string name integer number } phone_book.name := "john"

/* declares an integer n */ /* declares an array of integers of size 6 */ /* declares a new type which has a record structure */

/* assigns the string john to the record */

1.6

Searching
Searching is perhaps the most common large-scale application for computer systems. All kinds of organisations now hold databases ranging in size from a few dozen to many millions of records. Of course, databases are not useful unless we can also provide ecient algorithms to retrieve information by searching for records with a particular property. We will now examine two algorithms with very dierent eciencies: the linear and binary searches.

1.6.1

Linear search

For our rst example, let us code up the steps that a human would have to go through if they had to perform the lookup of a person in a telephone directory by number. We will begin with perhaps the simplest search algorithm: the linear search. The idea is to simply scan up sequentially through the phone book records until we nd a name that matches the target. Since this is our rst algorithm, we include the declarations and initialisations described above for completeness.

Searching 7
1 2 3 4 5 6 7 phone_book_type { integer number, string name} integer phone_book_size := 60_000_000 phone_book_type phone_book[phone_book_size] integer index := 0 while (index < phone_book_size) && (phone_book[index].number != search_target.number) { index := index + 1 }

8 9 10 if index == phone_book_size 11 { print("Target ", target.number, " not found") } 12 else 13 { print("Target ", target.number, " belongs to ", phone_book[index].name) }

In line 5 a variable called index is declared and initialised to zero, the index value for the rst record in the array phone_book[]. (The code for loading this array has not been included.) The while loop in lines 78 then scans up through the list comparing the number in the target record with the phone numbers in the array. The while loop condition is in two parts, so there are two ways in which the loop might terminate: either a matching record might be found or the end of the array might be encountered. The if statement in lines 1013 checks to see whether the index has run o the end of the array, and if not it prints out the name of the corresponding subscriber. In the worst case (a missing number) this algorithm will compare the target against every record in the phone book, so its execution time will grow in proportion to the length of the list. Can we do any better? Well, if nothing is known about the ordering of the numbers in directory then no. If on the other hand we can preprocess the directory by sorting it into numerical order then a very ecient algorithm called binary search may be used.

1.6.2

Binary search

The binary search is a distillation of the algorithm that people seem to use when looking up a name in the telephone directory. In section 1.4 above we noted that people rst guess where in the telephone directory the name is, and then jump forwards or backwards as necessary, homing in on the right answer. The guesses that people make are based on their knowledge of the distribution of names in everyday life: it is pretty likely that a name beginning with the letter C will be near the front, so we guess a little way in. In general, when we are sorting arrays of data we do not know the distribution of the data, so we cannot exploit such knowledge. We want to balance the two probabilities that the record we want will be (a) before our rst guess and (b) after our rst guess. In the absence of any information on the distribution, the best we can do is to choose the record in the middle of the array. Over a large number of

Searching 8 random choices of target record, we would expect roughly half of them to be before our guess and half of them after. Having made our initial guess, we can look to see if we have found the right record (in which case we stop) or if we have to guess again. If we must guess again, we can narrow the choice down to one half of the array. This is a powerful technique: simply by making one comparison between the target and the record in the middle of the array we have managed to rule out half of the records in our array. For our second guess, we cannot, over the long run, do better than to guess the record that is in the middle of the half of the array containing our target. The guess may succeed, but if not we can simply throw away half of that segment and guess again in the middle of the other half of the segment. We continue in this way, discarding half of the remaining records at each step until either we nd a record that matches the target or we run out of records to check.
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 integer index binary_search(integer low_index, high_index) { if low_index > high_index { index := -1 } // search failed else { integer mid_index := (low_index + high_index) / 2 if phone_book[mid_index].number == target.number { index := mid_index } // search succeeded else { // search continues if target.number < phone_book[mid_index].number { binary_search(low_index, mid_index - 1) } else { binary_search(mid_index + 1, high_index) } } } } binary_search(0, phone_book_size - 1) if index == -1 { print("Target ", target.number, " not found") } else { print("Target ", target.number, " belongs to ", phone_book[index].name) }

As in our linear search algorithm, we declare a variable called index in line 5, although in this case it is only used to hold the nal result of the search rather than playing any rle in the search algorithm itself. The printing of the o results in lines 2629 is made a little easier because this algorithm sets the value of index to -1 (an illegal array index value) if the search fails. The body of the code in lines 722 denes a recursive function called binary_search that rst checks the middle record, and if that is not the record

Sorting 9 sought, calls itself again on one half of the array. The whole sort process is started by a call to the function binary_search in line 24 (note that 0 is the rst index of the array and phone_book_size - 1 is the last index of the array to be sorted). You will recall that the linear search required, in the worst case, every record to be compared against the target and so execution times rose in proportion to the number of records in the phone book. The worst case for searching algorithms is in general triggered by being asked to search for a record that is not in the set being searched. What is the worst case for binary search? Well, at each stage of the process we discard (approximately) half of the remaining records. Eventually we no records left, at which point the process must terminate. The worst case performance for n phone book records therefore amounts to the number of times we can apply the divide-by-two operation to n, ignoring fractions, until we have less than one left. This number is the logarithm to base 2 of n, written log2 n. In this course, when we speak of logarithms they are usually to base 2, so we will simply write log n. We will insert an explicit base if necessary. Let us compare the number of comparisons required by the linear and binary search algorithms for dierent sized phone books. Linear search (n) 16 256 65536 1,048,576 1,073,741,824 Binary search approximately (log n) 4 8 16 20 30

Note that the above values are approximate, and we will compute more accurate values later in this blue book. We can store a thousand-million records (one billion in the US) and search them all in only 30 steps a truly remarkable result. To achieve this, all we need to do is pre-sort the records by the eld that we are interested in. Let us now look at some sorting algorithms.

1.7

Sorting
Watching people sort can give some insights. Given a stack of cards to sort, most people perform something close to what we call an insertion sort. They pick a card o the top and then try and place it within the body of the cards in a sequence with its neighbours. The choice of cards, and the places they are inserted into the deck seem to depend on the cards as they present themselves along with the whims of the (human) sorter. We shall look at a more methodical way of performing insertion sorts in the next section. First we look at what many people nd to be the conceptually easiest sorting algorithm: the bubble sort.

Sorting 10

1.7.1

Bubble sort

Imagine that our phone book is in random order and we want to sort into order by number. We can examine rst two entries. If the lower entry rightfully belongs above the upper entry then we can exchange them, and they will now be in correct relative order. Now let us examine entries two and three. Entry two will be the result of our last comparison, but it might need to be above entry three in which case we exchange again. We keep doing this, checking a pair, optionally exchanging and then using the highest result as part of the next pair until we have worked our way up to the top of the array. After this pass we can guarantee that the highest entry in the phone book really is the highest (largest) phone number. We say that this number has bubbled up to the top, hence the name. This is all very well, but we can only guarantee that the highest element is now in its correctly sorted position. We need to repeat the process again looking at every remaining pair of elements up to but not including the last one to get the next sorted record in place. In fact, we need to make as many passes as there are records in the phone, less one (why? ) Here is a worked example for a set of six telephone numbers.

index 5 4 3 2 1 0

Pass 1 Pass 2 Pass 3 Pass 4 Pass 5 4601007 7843425 7843425 7843425 7843425 7843425 1737592 4601007 6297732 6297732 6297732 6297732 5641863 1737592 4601007 5641863 5641863 5641863 6297732 5641863 1737592 4601007 4601007 4601007 3772980 6297732 5641863 1737592 3772980 3772980 7843425 3772980 3772980 3772980 1737592 1737592

Here is some pseudo-code to bubble sort our pre-loaded phone_book array.


5 6 7 8 9 10 11 12 13 14 15 16 17 18 integer pass for pass from 1 to phone_book_size - 1 do { integer element for element from 0 to (phone_book_size - 1) - pass do { if (phone_book[element].number > phone_book[element + 1].number) { phone_book_record_type temp := phone_book[element] phone_book[element] := phone_book[element + 1] phone_book[element + 1] := temp } } }

The bubble sort comprises two nested for loops, one based on the induction variable pass which scans across the entire phone book and an inner one based on element which scans across the unsorted part of the array. For each value of element, we check the book entries phone_book[element] and phone_book[element + 1]. If they are out-of-order then we interchange through a temporary record called temp.

Sorting 11 It is easy to see that a bubble sort of n records requires n1 comparisons on the rst pass, n 2 on the second and so on down for a total of n 1 passes. Its execution time is therefore proportional to the square of the number of records. We will discuss this more later in this chapter.

1.7.2

Insertion sort

Bubble sort is a type of exchange (or interchange) sort, so called because the dominant action is the swapping over of two elements. Exchanging is a natural and easy operation to implement because it is localised: when we exchange two elements the rest of the array is unchanged. As we have already noted, people tend to use a dierent style of sorting in which they make a small sorted sequence and then insert elements into it. The problem with this from our point of view as computer programmers is that the only way we can insert an element into an array is to make a gap at the required place by moving part of the array up one slot. We will now look at a version of the insertion sort which attempts to minimise this disruption to the array by inserting in a very methodical fashion. A na implementation of insertion sort might start searching our alreadyve sorted section of the array at the rst record and scan up until it nds an element that is greater than the target element, then continue the scan up to the end of the array, moving elements up so as to make a space for the element to be inserted. Actually, we can do a little better than this by recognising that we are going to have to move elements up anyway, so we might as well combine the searching and moving operations into a single scan of the top part of the array in which elements are checked and moved at the same time. Here is an algorithm which takes an unsorted array phone_book[] and sorts it in place, using a single temporary phone book record.
5 6 7 8 9 10 11 12 13 14 15 16 17 integer pass for pass from 1 to phone_book_size-1 do { integer count phone_book_type temp := phone_book[pass] count :=pass - 1 while (temp.number < phone_book[count].number) && (count>=0) do { phone_book[count + 1] := phone_book[count] count := count -1 } phone_book[count + 1] := temp //insert }

This algorithm makes n 1 passes over the phone book, controlled by the for pass ... do loop which starts at line 7. On each pass, it starts at the record indexed by pass which it copies to a temporary variable temp. The while loop in lines 1215 then runs down the array moving each element up by one until it nds an element that is less than or equal to temp. At this point,

Algorithms based on indexable data structures compared to reference based structures 12 all of the elements that are greater than temp have been moved up by one slot, and there is a gap beneath them which in line 16 is lled by inserting temp. Is this algorithm any faster than bubble sort? Well, in the worst case we might have to check every record on each pass, so there would be 1 + 2 + 3 + . . . + (n 2) + (n 1) + n comparisons altogether. This is the same as for the bubble sort, so in this worst case there seems no advantage. We shall look at the algorithms behaviour versus bubble sort again later and try them on partially sorted data. It turns out that it is much easier to analyse the worst case behaviour of an algorithm than any form of average case because algorithm performance is usually data dependent, and so we need to know how well the data is ordered before we start sorting to say anything useful. These two algorithms tend to perform better on partially sorted data. Remarkably, one of the highperformance sorting algorithms that we shall look at later actually performs worse on ready-sorted data than on random data!

1.8

Algorithms based on indexable data structures compared to reference based structures


We have already seen that the order in which data is stored critically aects the eciency of algorithms. Another fundamental aspect of our data structures is indexing. An indexable data structure such as our array of telephone book records can be interrogated by record number. A non-indexable structure, such as a linked list, does not allow this direct (or random) access. If we put all of our records into a linked list then we have to access them sequentially, but not all non-indexable structures are as inecient as that: if we were to load the records as leaves of a binary tree in some predened order then we might be able to access an individual record in fewer steps by tracing down tree nodes. We will examine this idea in detail in Chapter 4. For now we note that some algorithms require indexable access (binary search, for instance) and so are unsuitable for implementation on a non-indexable structure. Some other algorithms only make sequential access to the data (such as insertion sort), and these can be implemented on both indexable and nonindexable structures. We also note that linked structures allow insertion in constant time, whereas arrays require all records to be moved during an insertion in the worst case. Finally, linked structures impose a space overhead which will be at least proportional to the number of records, and might be greater in the case of a tree structure.

1.9

Concrete implementations
We use pseudo-code so as to suppress some of the detail of our implementations, but we also need implementations in real languages. In this section we give a representative implementation of bubble sort and insertion sort in Java.

Concrete implementations 13

1.9.1

Bubble sort on an array

In the following example Java program we declare a four element array of integers and write a Java function called Bubble() that performs the bubble sort.

// Bubble sort running on a 4-element integer array public class Bubble{ int s[] = {4, 3, 2, 1}; private int n, i, j, temp, pass;

public Bubble() { n=4; System.out.println("The unsorted array: "); for (i = 0; i < n; i++) System.out.println(s[i]); for (pass=1; pass<n; pass++) for(i = 0; i < n-pass; i++) if(s[i] > s[i+1]) { temp = s[i]; s[i] = s[i+1]; s[i+1] = temp; } System.out.println("The sorted array: "); for (j = 0; j<n; j++) System.out.println(s[j]); } public static void main(String[] arg) { Bubble b = new Bubble(); } }

In this case, the translation between pseudo-code and Java is rather straightforward. Sometimes it is not quite as simple.

1.9.2

Insertion sort on an array

In the following example Java program we declare a four element array of integers and write a Java function called Insertion() that performs the insertion sort.

Concrete implementations 14

// Insertion sort running on a 4-element integer array public class Insertion{ int s[] = {4, 3, 2, 1}; private int n, i, j, temp, flag; public Insertion() { n=4; System.out.println("The unsorted array: "); for (i = 0; i < n; i++) System.out.println(s[i]); for(i = 1; i < n; i++) { temp = s[i]; for(j = i-1; j >= 0 && temp < s[j]; j--) s[j+1] = s[j]; s[j+1] = temp; } System.out.println("The sorted array: "); for (i = 0; i<n; i++) System.out.println(s[i]); } public static void main(String[] arg) { Insertion b = new Insertion(); } }

In this case, the translation between pseudo-code and Java is again straightforward.

1.9.3

Bubble sort on the phone book

The Java functions above only works on arrays of integers. In practice, we are usually sorting arrays of compound records, and we may be sorting on names, numbers or even combinations of the two. Before giving the Java code for this type of bubble sort, we need to dene the phone book as follows.
public class PhoneBook { String number, name; public PhoneBook(String n, String s) { number = n; name = s; } public String getNumber() { return number; }

Concrete implementations 15
public String getName() { return name; } }

If the above code is saved in a le called PhoneBook.java and the below code is saved in a le called Bubble phone.java, then the rst 1000 entries in the le phone.ran will be sorted (these les can be downloaded from the course webpage). The output will appear in the le phone.bubble.
// ===================== CS2860 example of Bubble sort on a phone book =========== import java.io.*; import java.util.*; class Bubble_phone { public static void main (String[] arg) throws IOException, FileNotFoundException { int size_of_phonebook=1000; int i,k; PhoneBook temp; PhoneBook[] array = new PhoneBook [size_of_phonebook]; String s, temp1, temp2; // ============ Read in phone book from "phone.ran" =============================== BufferedReader inFile = new BufferedReader(new FileReader("phone.ran")); for (i=0; i<size_of_phonebook; i++) { s=inFile.readLine(); StringTokenizer stok=new StringTokenizer(s); temp1 = stok.nextToken(); temp2 = stok.nextToken(); array[i] = new PhoneBook(temp1, temp2); } // ============ Sort the phone book ============================================= for (i=1; i<size_of_phonebook; i++) { for (k=0; k<size_of_phonebook-i; k++) { if ((array[k].name).compareTo(array[k+1].name)>0) { temp = array[k]; array[k] = array[k+1]; array[k+1] = temp; } } } // ============== Save the sorted phone book in "phone.bubble" ==================== PrintWriter outFile=new PrintWriter(new BufferedWriter(new FileWriter("phone.bubble"))); for (i=0; i<size_of_phonebook; i++) outFile.println(array[i].getNumber() + " " + array[i].getName()); outFile.close(); } }

1.9.4

Insertion sort on the phone book

We will in this section give the Java code for insertion sort on the phone book. As in the previous section you will need the les PhoneBook.java and phone.ran. The following code (saved as Insertion phone.java) will now sort the phone book in phone.ran and output the result in phone.insertion.

The performance of bubble sort 16


// ===================== CS2860 example of Insertion sort on a phone book =========== import java.io.*; import java.util.*; class Insertion_phone { public static void main (String[] arg) throws IOException, FileNotFoundException { int size_of_phonebook=1000; int i,j; PhoneBook temp; PhoneBook[] array = new PhoneBook [size_of_phonebook]; String s, temp1, temp2; // ============ Read in phone book from "phone.ran" =============================== BufferedReader inFile = new BufferedReader(new FileReader("phone.ran")); for (i=0; i<size_of_phonebook; i++) { s=inFile.readLine(); StringTokenizer stok=new StringTokenizer(s); temp1 = stok.nextToken(); temp2 = stok.nextToken(); array[i] = new PhoneBook(temp1, temp2); } // ============ Sort the phone book ============================================= for (i=1; i<size_of_phonebook; i++) { temp = array[i]; for (j=i-1; j>=0 && (temp.name).compareTo(array[j].name)<0; j--) array[j+1] = array[j]; array[j+1] = temp; } // ============== Save the sorted phone book in "phone.insertion" ==================== PrintWriter outFile=new PrintWriter(new BufferedWriter(new FileWriter("phone.insertion"))); for (i=0; i<size_of_phonebook; i++) outFile.println(array[i].getNumber() + " " + array[i].getName()); outFile.close(); } }

1.10

The performance of bubble sort

It is instructive to look at the behaviour of algorithms in terms of the way their performance changes as we present larger and larger input data sets. Table 1.1 shows the performance of a bubble sort algorithm working on a decreasing array and on a random array. The leftmost column shows the number of elements being sorted in each case. In the body of the table we show the number of copy and comparisons the algorithm uses (this will be explained later). We also show the same data but normalised against n2 , that is divided by n n. From the table we can see that both our measures do grow roughly as n2 once we get a large enough number of records.

1.11

Rates of growth and tractability

Our sorting algorithms require work proportional to n2 , but the binary search algorithm requires computation proportional to log2 n. What does this mean

Instrumenting a program counting operations 17


number of elements 10000 6666 4444 2962 1974 1316 877 584 389 259 172 114 76 50 33 22 14 9 6 4 Decreasing sequence copy & compare (copy & comp)/n2 199980000 1.9998 88857780 1.9997 39489384 1.99955 17540964 1.99932 7789404 1.99899 3461080 1.99848 1536504 1.99772 680944 1.99658 301864 1.99486 133644 1.99228 58824 1.98837 25764 1.98246 11400 1.97368 4900 1.96 2112 1.93939 924 1.90909 364 1.85714 144 1.77778 60 1.66667 24 1.5 Random copy & compare 124678258 55199379 24742702 10969846 4857715 2150978 958627 424991 189112 85462 36148 17068 7476 2900 1335 514 217 96 40 12 sequence (copy & comp)/n2 1.24678 1.24223 1.25285 1.25035 1.24663 1.24201 1.24638 1.2461 1.24974 1.27401 1.22188 1.31333 1.29432 1.16 1.2259 1.06198 1.10714 1.18519 1.11111 0.75

Table 1.1 Performance of bubble sort in practice? Table 1.2 shows the growth of some functions for values between 10 and 100. These show us that functions such as n2 and n3 grow rapidly, but that they are completely swamped by the exponential and factorial functions. In practice, if an algorithm can be shown to be exponential or worse then we say that it is an infeasible algorithm, because even the fastest computer will quickly become overwhelmed by the required computation as we increase the size of the input data set. If we could show that a given problem admitted only exponential algorithms then we would say that the problem was intractable.

1.12

Instrumenting a program counting operations

How do we nd out the behaviour of an algorithm? One way is to work through the algorithm by hand counting the number of operations, we shall do this at the end of this section. Another method is to add some book-keeping code to the implementation and get the program to do the counting for us. We shall look at this approach below. It is also possible to get a debugger to step through an implementation or to run a proling program which reports the amount of time spent in the various sections of your code. If we wanted to decide how long time a program will take to complete its task, we would need to know exactly how long every possible command takes to execute, and then count how many times each command is performed. However this would normally be to complicated and take way to long to do, so therefore we simplify this by just counting important operations.

Instrumenting a program counting operations 18 n 10 20 30 40 50 60 70 80 90 100 log2 (n) 3.32 4.32 4.91 5.32 5.64 5.91 6.13 6.32 6.49 6.64 n log2 (n) 33.22 86.44 147.21 212.88 282.19 354.41 429.05 505.75 584.27 664.39 n2 100 400 900 1,600 2,500 3,600 4,900 6,400 8,100 10,000 n3 1,000 8,000 27,000 64,000 125,000 216,000 343,000 512,000 729,000 1,000,000 2n 1.02E+03 1.05E+06 1.07E+09 1.10E+12 1.13E+15 1.15E+18 1.18E+21 1.21E+24 1.24E+27 1.27E+30 n! 3.63E+06 2.43E+18 2.65E+32 8.16E+47 3.04E+64 8.32E+81 1.20E+100 7.16E+118 1.49E+138 9.33E+157

Table 1.2 Growth of functions An important operation can be operations that take extra long time to perform, or the operation that is performed the most times. If we choose the correct operation (or operations) to be important, and then just count the number of times they are performed, then this will give a very good indication of how long time a program will need to complete its task. We will for the remainder of this course consider the record comparisons and record copy operations as our important operations. For most searching, sorting and other data structures, this turns out to be the most important operations. There are several reasons for this, which we shall not go into here, but the most important reason is that copy and compare operations are the operations that are performed most times in most of our algorithms. A record comparison is any time we compare an element of our array (record) to some other value (which may also be an element of our array). A record copy operation is an operation where an element of our array is either assigned some value, or its value is assigned to some other variable. If we consider the array s[], then statements of the following form are record comparison. If (s[i] == 7) j = i; If (s[7] < s[j]) i = 10; If ((s[21] > 9) && (s[i] < s[j])) ... (This is in fact two compare operations). If (s[j]! = 11) s[4] = 11; (This is also counted as a record copy). for (i = 1; i < s[4]; i++)... (This is counted every time the check i < s[4] is made, which may be many times). The following statements are considered as record copy operations. s[4] = 11; s[i] = s[j];

Instrumenting a program counting operations 19 s[2] = s[8] s[11] + s[i]/2; i = 27 + 22 s[i]; j = s[3] + 1;

Even though several array entries are looked at in some of the above examples, they are (for simplicity) only considered as one operation. How many copy operations are performed in the below code? How many compare operations are performed?

s[3]=4; For (s[2]=2; s[2]<s[3]; j=s[2]) s[2]=s[2]+1;

The answer is that the following copy and compare operations are performed.

s[3]=4 s[2]=2 s[2]<s[3] s[2]=s[2]+1 j=s[2] s[2]<s[3] s[2]=s[2]+1 j=s[2] s[2]<s[3]

copy copy compare copy copy compare copy copy compare

(s[2] is now 3)

(s[2] is now 4)

So there is 6 copy operations and 3 compare operations. Now that we know what a copy and compare operation is, lets return to our simplied example, let us add some code to instrument the program. We are interested in the contents of the sort array at each step, and the number of record comparison or record copies. We add output statements to print the contents of the array, and use a local variable ops to count the number of basic operations:

Instrumenting a program counting operations 20


// Chatty Insertion sort running on a 4-element integer array public class Insertion_chatty{ int s[] = {4, 3, 2, 1}; private int n, i, j, k, temp, flag, ops; public Insertion_chatty() { ops=0; n=4; System.out.print("Initial: s={"); for (i = 0; i < n; i++) System.out.print(s[i]); System.out.println("}, ops=" + ops); for(i = 1; i < n; i++) { temp = s[i]; ops++; for(j = i-1; j >= 0 && temp<s[j]; j--) { ops++; // This is for the comparison temp<s[j] s[j+1] = s[j]; ops++; System.out.print("i=" + i + " j=" + j + ", s={"); for (k = 0; k < n; k++) System.out.print(s[k]); System.out.println("}, ops=" + ops); } if (j>=0) ops++;

// This is because every time the loop exits // and j>=0, then we did check temp<s[j]. s[j+1] = temp; ops++; } System.out.print("Final: s={"); for (i = 0; i < n; i++) System.out.print(s[i]); System.out.println("}, ops=" + ops); } public static void main(String[] arg) { Insertion_chatty ic = new Insertion_chatty(); } }

When we run the above program we get a good idea of how the algorithm works, as well as a count on the number of copy and compare operations. In fact if we run he above code we get the following output:

Counting operations by hand 21 Initial: s={4321}, ops=0 i=1 j=0, s={4421}, ops=2 i=2 j=1, s={3441}, ops=6 i=2 j=0, s={3341}, ops=8 i=3 j=2, s={2344}, ops=12 i=3 j=1, s={2334}, ops=14 i=3 j=0, s={2234}, ops=16 Final: s={1234}, ops=18

1.13

Counting operations by hand

In this section we are going to compare the bubble and insertion sorts by counting the numbers of compare and copy operations carried out in the worst case. Recall the Bubble sort algorithm. Bubble sort for i from 1 to n-1 do { for j from 0 to n-1-i do { if ( S[j] > S[j+1] ) { x := S[j] S[j] := S[j+1] S[j+1] := x } }} If we count the compare operations carried out when we input the array [n, n-1, ... , 2, 1], we get the following. Compare Operations n-1 n-2 n-3 ... ............ n-1 1, 2, 3, ..., n 1, n TOTAL = 1
n(n1) 2

i 1 2 3 ...

ARRAY S n, n 1, n 2, n 3, ..., 1 n 1, n 2, n 3, ..., 1, n n 2, n 3, ..., 1, n 1, n ............

Note that when i = 1 the variable j takes on the values 0, 1, ..., n 1 1 which is n 1 distinct values, and for each value we perform one compare operation (S[j] > S[j + 1]). When i = 2 the variable j takes on the values 0, 1, ..., n 1 2 which is n 2 distinct values. This continues as illustrated

Counting operations by hand 22 above. Therefore we have 1 + 2 + 3 + . . . + (n 1) compare operations. see that this is exactly n(n 1)/2 compare operations in class (see also Appendix B). So how many copy operations do we use? By considering how the algorithm works, we note that if we start with a decreasing sequence, then every time we perform the comparison S[j] > S[j +1] it will be true. So every time we perform a comparison we will perform 3 copy operations (x := S[j], S[j] := S[j + 1] and S[j + 1] := x). So the number of copy operations is 3n(n 1)/2. Therefore Bubble sort will use 2n(n 1) copy and compare operations on a decreasing sequence. How many operations does insertion sort use? Insertion sort for i from 1 to n-1 do { x := S[i] j := i- 1 while (j>=0) && (x < S[j]) do { S[j+1] := S[j] j:=j-1 } S[j+1] := x } If we count the operations carried on input array [n, n-1, ... , 2, 1] for this algorithm we get (1+2+1)+(1+4+1)+. . .+(1+2n2+1) = (n+2)(n1) copy and compare operations. We will in class see why the above equation is true, but can you already see why it holds? So in the worst case the insertion sort algorithm is slightly better than bubble sort.

Chapter 2 Divide and conquer?


In this section we shall begin by counting record copy and record compare operations separately, and consider the worst case performance of various algorithms according to these values. Recall that in the worst case, sorting n elements using a bubble sort can b take Copyn = 3n(n 1)/2 record copy operations and Compb = n(n 1)/2 n record compare operations (the superscript b stands for bubble sort). b b Note that Copy100 = 14850 and Compb = 4950, while Copy50 = 3675 100 and Compb = 1225. Sorting half of the list only needs about a quarter of the 50 number of operations, and sorting two lists of length 50 takes half as long as sorting one list of length 100. Generally, n n 1 n(n 2) 2 = Compb = 2 n/2 2 8
b Copyn/2 = 3 n 2

n 2

3n(n 2) 8

So sorting two equal halves takes about half the number of operations required for one full length list. Perhaps we can improve eciency by splitting the list into two parts, sorting each part and then recombining them. (This chapter only contains outline material. You will need to take notes in lectures and possibly read up about the algorithms discussed in text books.)

2.1 2.1.1

Merge sorts Sorting in two halves

We assume that we have an array S of size n with the elements up to the (n/2) 1 position sorted and the elements from the n/2 position separately sorted (e.g. [2, 5, 7, 11, 1, 8, 12, 14]). The following pseudocode will merge the two halfs into a sorted array, using a insertion-sort type approach.

Merge sorts 24 int k := n/2 for i from k to n-1 do { int temp := S[i] j:=i-1 while (j>=i-k) && (temp < S[j]) { S[j+1] := S[j] j:=j-1 } S[j+1] := temp } To calculate the order of this algorithm we use the following result (see appendix B, for a proof of this):

Theorem 1 For n 0, 1 + 2 + 3 + 4 + . . . + n = n(n + 1)/2. In worst case the algorithm uses the following number of record copy and compare operations (when n is even).
m Copyn = (1+k+1)(n1k+1) =

(n + 4)n 4

Compm = k(n1k+1) = n

n2 4

The above is computed by noting that the outer loop (ie for i from k to n-1) has n 1 k + 1 = n/2 iterations and the inner loop (i.e. while (j>=i-k) && (temp < S[j])) has at most k iterations. So if we sort the two halves of the array using bubble sort and then merge them using the above algorithm, the result is still of order n2 but the sort cost is (see the previous page for the number of copy and compare operations for bubble sort on an array of size n/2). n 3n(n 2) 3n(n 2) n(n + 4) + + = n2 8 8 4 2 n2 n n(n 2) n(n 2) n2 + + = 8 8 4 2

Copy operations =

Compare operations = giving a total of

3 operations, instead of

n2 n 2

2n2 2n copy and compare operations for bubble sort. Thus the constant of proportionality is lower and the sort will be faster.

Merge sorts 25

2.1.2

Merge sort 0

If it is worth splitting the array in half to sort then it is worth splitting each part to sort it. We end up with a recursive algorithm. We need to be able to merge part of the array, so we provide the rst and last index of the region whose entries are to be merged. merge0(int low, int high, int S[]){ int k := (high - low + 1)/2 for i from low+k to high do { int temp := S[i] j := i-1 while (j>=i-k) && (temp < S[j]) { S[j+1] := S[j] j:=j-1 } S[j+1] := temp } } which in worst case takes a total of n2 /2 + n record copy and compare operations (see previous page). We can write the sort routine recursively merge_sort0(int low, int high, int S[]){ if low < high { int k := (high - low + 1)/2 merge_sort0(low, low+k-1, S) merge_sort0(low+k, high, S) merge0(low, high, S) } } To save writing out two essentially equivalent calculations for each algorithm, from now on we shall count the total number of record compare and copy operations rather than counting the two numbers separately.

2.1.3

Worst case analysis of merge sort 0

We shall now show that in worst case merge sort 0 can take W (n) = n(n 1) + n log n

operations to sort an array of size n.

To calculate the order of merge sort 0 we use Theorem 2 1. 1 + 2 + 22 + . . . + 2r = 2r+1 1

2. 1 + 1/2 + 1/22 + . . . + 1/2r1 = (2r 1)/2r1

Suppose that merge_sort0 takes W(n) record copy and compare operations in the worst case. What is W(n)? If n = 1 then low = high and so W(1) = 0. We have already seen that merge0(l, l+n-1, S) takes n^2/2 + n operations in the worst case. So, if n is even (so high = n - 1 is odd),

W(n) = W(n/2) + W(n/2) + (n^2/2 + n) = 2W(n/2) + (n^2/2 + n).

If n/2 is even,

W(n/2) = 2W(n/4) + (n^2/8 + n/2)

and so

W(n) = 2^2 W(n/4) + (n^2/4 + n) + (n^2/2 + n).

If n = 2^r we have

W(n) = 2^r W(1) + (n^2/2^r + n) + ... + (n^2/4 + n) + (n^2/2 + n).

Recall that if n = 2^r then log2 n = r, and that log a^b = b log a and 2^(log2 n) = n. There are r = log2 n terms in the equation for W(n) above and W(1) = 0, so we have

W(n) = (n^2/2)(1/2^(r-1) + 1/2^(r-2) + ... + 1/2 + 1) + n log n

W(n) = (n^2/2)((2^r - 1)/2^(r-1)) + n log n

W(n) = n(n - 1) + n log n.

At the end of the day this sort is still of order n^2.

2.1.4

Merge sort 1

[TA, p.415] [N&N, p.52] Part of the problem with the merge part of the sort, as we have described it so far, is that when we find where to insert the current element we have to move all of the elements above that point up one place. In the worst case the top element in the first half of the array gets moved n/2 times. We now consider having a second array into which we put the elements once we have established their correct position (the details of this algorithm will be given in class).

int U[]
int k := n/2
int j := 0
int p := 0
int i := n/2
for q from 0 to n-1 { U[q] := S[q] }
while (j <= n/2-1) and (i <= n-1) do {
  if ( U[j] <= U[i] ) {
    S[p] := U[j]
    j := j+1
  }
  else {
    S[p] := U[i]
    i := i+1
  }
  p := p+1
}
if (j <= n/2-1) {
  for q from p to n-1 do {
    S[q] := U[j]
    j := j+1
  }
}

In the worst case this takes 3n - 1 record copy and compare operations. We now use the above merge as the basis for a recursive sort algorithm.

merge1(int low, int high, int S[], U[]) {
  int k := (high - low + 1)/2
  for q from low to high { U[q] := S[q] }
  int j := low
  int p := low
  int i := low + k
  while (j <= low + k - 1) and (i <= high) do {
    if ( U[j] <= U[i] ) {
      S[p] := U[j]
      j := j+1
    }
    else {
      S[p] := U[i]
      i := i+1
    }
    p := p+1
  }
  if (j <= low + k - 1) {
    for q from p to high do {
      S[q] := U[j]
      j := j+1
    }
  }
}

merge_sort1(int low, int high, int S[], U[]) {
  if low < high {
    int k := (high - low + 1)/2
    merge_sort1(low, low+k-1, S, U)
    merge_sort1(low+k, high, S, U)
    merge1(low, high, S, U)
  }
}

2.1.5

Worst case analysis of merge sort 1

Suppose that the worst case number of record copy and compare operations for merge sort 1 is W(n). Then we have W(n) = 2W(n/2) + 3n - 1. Using the same kind of calculations that we used in Section 2.1.3 (see also Appendix B) we can show that W(n) = 3n log2 n - n + 1. This is a genuine improvement over the bubble and insertion sorts, but at the cost of doubling the data space required.
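The recurrence and the closed form can be checked against each other for small powers of two. The following short Java sketch is ours (not part of the notes); it evaluates W(n) both ways and prints matching values.

public class MergeSort1Count {
  // W(n) from the recurrence W(1) = 0, W(n) = 2 W(n/2) + 3n - 1 (n a power of two)
  static long recurrence(long n) {
    if (n == 1) return 0;
    return 2 * recurrence(n / 2) + 3 * n - 1;
  }

  // Closed form W(n) = 3 n log2(n) - n + 1
  static long closedForm(long n) {
    long log2 = Long.numberOfTrailingZeros(n);   // exact for powers of two
    return 3 * n * log2 - n + 1;
  }

  public static void main(String[] args) {
    for (long n = 1; n <= 1024; n *= 2)
      System.out.println(n + ": " + recurrence(n) + " = " + closedForm(n));
  }
}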

2.1.6

Java code for merge sort 1

public class MergeSort1 {
  int S[] = {-1, 5, 1, 3, 2, 27, -3};
  int U[] = new int[7];
  private int i, j, k, p, q;

  public MergeSort1() {
    System.out.println("The original array: ");
    for (i = 0; i < 7; i++) System.out.println(S[i]);
    merge_sort1(0, 6, S, U);
    System.out.println("The sorted array: ");
    for (j = 0; j < 7; j++) System.out.println(S[j]);
  }

  public void merge1(int low, int high, int S[], int U[]) {
    k = (high - low + 1)/2;
    for (i = low; i <= high; i++) U[i] = S[i];
    j = low;
    p = low;
    i = low + k;
    while (j <= (low + k - 1) && i <= high) {
      if (U[j] <= U[i]) {
        S[p] = U[j];
        j = j + 1;
      } else {
        S[p] = U[i];
        i = i + 1;
      }
      p = p + 1;
    }
    if (j <= low + k - 1)
      for (q = p; q <= high; q++) {
        S[q] = U[j];
        j = j + 1;
      }
  }

  public void merge_sort1(int low, int high, int S[], int U[]) {
    int k;
    if (low < high) {
      k = (high - low + 1)/2;
      merge_sort1(low, low+k-1, S, U);
      merge_sort1(low+k, high, S, U);
      merge1(low, high, S, U);
    }
  }

  public static void main(String[] arg) {
    MergeSort1 m = new MergeSort1();
  }
}

2.2

Quick sort
[N&N p.59] Suppose that we split the array into two parts so that all the elements in one part are smaller than all of the elements in the other part. Then once the two parts were sorted they would not need to be merged; they would automatically be in the correct order. Thus we would save, in the worst case, the 3n - 1 merge operations.


2.2.1

Partition algorithm

The idea is to pick an element, then put all the smaller elements to its left and all the larger elements to its right. Initially we pick the first element.

int partition(int low, high, S[]) {
  int pivot := S[low]
  int k := low
  for i from low + 1 to high do {
    if (S[i] < pivot) {
      S[k] := S[i]
      S[i] := S[k+1]
      k := k+1
    }
  }
  S[k] := pivot
  return k
}

In the worst case on an n element array this takes W(n) = 1 + 3(n - 1) + 1 = 3n - 1 record copy and compare operations. We turn this into a sorting algorithm by partitioning and sorting the two parts. There is no need to merge afterwards because the two parts are already relatively ordered.

quick_sort(int low, high, S[]) {
  if (low < high) {
    int k := partition(low, high, S)
    quick_sort(low, k-1, S)
    quick_sort(k+1, high, S)
  }
}

The details of quick sort will be discussed in class.

2.2.2

Worst case analysis of quick sort

We shall show that in the worst case the number of operations required to perform a quick sort on an array of size n is 3n^2/2 + n/2 - 2. If W(n) is the worst case number of record compare and copy operations required for an n element array, then W(n) = (3n - 1) + W(n - 1 - k) + W(k).

The minimum number of partition steps that we will need is r where n = 2^r, so r = log2 n. But the maximum possible number of partitions is n - 1, which occurs when the array is already sorted. So in the worst case we have

W(n) = (3n - 1) + W(n - 1) + W(0) = (3n - 1) + W(n - 1)

W(n) = (3n - 1) + (3(n - 1) - 1) + W(n - 2) = ... = (3n - 1) + ... + (3*2 - 1) + W(1)

W(n) = 3n(n + 1)/2 - 3 - (n - 1) = 3n^2/2 + n/2 - 2

(which is actually the same order as bubble sort!).

2.2.3

Java code for quick sort

public class QuickSort {
  int n = 4;
  int S[] = {4, 3, 2, 1};
  private int i, j;

  public QuickSort() {
    System.out.println("The original array: ");
    for (i = 0; i < n; i++) System.out.println(S[i]);
    quick(0, n-1, S);
    System.out.println("The sorted array: ");
    for (j = 0; j < n; j++) System.out.println(S[j]);
  }

  public int partition(int low, int high, int S[]) {
    int pivot = S[low];
    int k = low;
    for (i = low+1; i <= high; i++)
      if (S[i] < pivot) {
        S[k] = S[i];
        S[i] = S[k+1];
        k = k + 1;
      }
    S[k] = pivot;
    return k;
  }

  public void quick(int low, int high, int S[]) {
    if (low < high) {
      int k = partition(low, high, S);
      quick(low, k-1, S);
      quick(k+1, high, S);
    }
  }

  public static void main(String[] arg) {
    QuickSort q = new QuickSort();
  }
}

2.2.4

Choosing the pivot value

After an array has been partitioned the pivot value ends up in its correct position. If we could choose the mid-value (median) in the array to be the pivot then the two partitions would have the same size. If, at each stage in the sort, the pivot can be chosen to be the median value we will thus minimise the number of partitions, and hence recursive calls, required during the sorting process. This in turn makes the sort more efficient. In general it is too expensive to calculate the median value in the array, but if the array is already sorted then the median will be at the mid point of the array. Thus, if we have reason to believe that the array is at least partially sorted then eventually some of the partitions will be fully sorted, so it may be more efficient to use the mid point of the array as the pivot. This also has the comfortable consequence that sorting an already sorted array has best rather than worst case time complexity. The following algorithm for partnMid() partitions an array on the basis of its mid point entry.

int partnMid(low, high, S) {
  int p := low + (high - low)/2
  int pivot := S[p]
  int k := p
  for i from p-1 to low do {
    if (S[i] > pivot) {
      S[k] := S[i]
      S[i] := S[k-1]
      k := k-1
    }
  }
  for i from p+1 to high do {
    if (S[i] < pivot) {
      S[k] := S[i]
      S[i] := S[k+1]
      k := k+1
    }
  }
  S[k] := pivot
  return k
}

For example, suppose we have a sorted array to which we then add another element [a0 a1 a2 a3 a4 a5 b]. Suppose that b is smaller than all the keys already in the array. We sort the new array using the version of quick sort which employs the mid-point partition algorithm. This works as follows.

S = [2 3 5 7 9 11 0]

quick sort (0,6,S)
------------------
low=0 high=6 p=3 pivot=S[3]=7
first loop  k=3
  i=2   2 3 5 7 9 11 0
  i=1   2 3 5 7 9 11 0
  i=0   2 3 5 7 9 11 0
second loop k=3
  i=4   2 3 5 7 9 11 0
  i=5   2 3 5 7 9 11 0
  i=6   2 3 5 0 9 11 9
        k=4   2 3 5 0 7 11 9

quick sort (0,3,S)
------------------
low=0 high=3 p=1 pivot=S[1]=3
first loop  k=1
  i=0   2 3 5 0 7 11 9
second loop
  i=2   2 3 5 0 7 11 9
  i=3   2 0 5 5 7 11 9
        k=2   2 0 3 5 7 11 9

quick sort (0,1,S)
------------------
low=0 high=1 p=0 pivot=S[0]=2
first loop
second loop k=0
  i=1   0 0 3 5 7 11 9
        k=1   0 2 3 5 7 11 9

quick sort (0,0,S)
------------------
quick sort (2,1,S)
------------------
quick sort (3,3,S)
------------------
quick sort (5,6,S)
------------------
low=5 high=6 p=5 pivot=S[5]=11
first loop
second loop k=5
  i=6   0 2 3 5 7 9 9
        k=6   0 2 3 5 7 9 11

quick sort (5,5,S)
------------------
quick sort (7,6,S)
------------------


Thus the array has been sorted using 27 record copy and compare operations.
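For reference, the mid-point partition and the quick sort that uses it can be written in Java in the same style as the QuickSort class in Section 2.2.3. The following is a sketch of ours (not the official course code); its main method runs the sort on the array traced above.

public class QuickSortMid {
  static int partnMid(int low, int high, int[] S) {
    int p = low + (high - low) / 2;
    int pivot = S[p];
    int k = p;
    for (int i = p - 1; i >= low; i--)      // move larger keys to the right of k
      if (S[i] > pivot) {
        S[k] = S[i];
        S[i] = S[k - 1];
        k = k - 1;
      }
    for (int i = p + 1; i <= high; i++)     // move smaller keys to the left of k
      if (S[i] < pivot) {
        S[k] = S[i];
        S[i] = S[k + 1];
        k = k + 1;
      }
    S[k] = pivot;                           // the pivot ends up in its final position
    return k;
  }

  static void quick(int low, int high, int[] S) {
    if (low < high) {
      int k = partnMid(low, high, S);
      quick(low, k - 1, S);
      quick(k + 1, high, S);
    }
  }

  public static void main(String[] args) {
    int[] S = {2, 3, 5, 7, 9, 11, 0};       // the example used in the trace above
    quick(0, S.length - 1, S);
    for (int x : S) System.out.println(x);
  }
}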

2.2.5

Worst case analysis

In the worst case partnMid takes 3n - 1 copy and compare operations. Thus, in this case, if W(n) is the worst case number of copy and compare operations required to sort an array with n elements using this partition algorithm, we have

W(n) = 3n - 1 + W(k) + W(n - 1 - k).

Thus, if we can produce equal partitions for the two subsorts, so that k = (n - 1)/2, then we have

WS(n) = 3n - 1 + 2WS((n - 1)/2).

(WS stands for "worst sorted" because it corresponds to the behaviour of quick sort on an already sorted array.) If sorting on each half again produces two equal partitions we get

WS(n) = 4WS((n - 3)/4) + (3(n - 1) - 2) + (3n - 1).

If we can produce equally sized partitions at every level of the sort then, using the same type of calculations as in Section 2.1.3, it can be shown that

WS(n) = 3(n + 1) log(n + 1) - 5n - 1.

However, in the worst case, where one of the partitions at each stage turns out to have size 1, the number of operations required is the same as for the original quick sort. So the full worst case for quick sort with the mid-point pivot is no better than that when the first element is used as the pivot.

2.3

Heap sort
Now we look at another type of sort which is of order n log n but which doesn't need to make a copy of the array. So far we have treated arrays in their natural order, dealing with each element in turn. In this section we view arrays as binary trees and deal with the elements in an order which is natural for trees.

2.3.1

Heaps

We think of an array as a binary tree by taking the first entry to be the root, the next two entries to be its children, the next four entries to be the grandchildren, and so on. So we think of [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10] as the tree

              a0
            /    \
          a1      a2
         /  \    /  \
       a3    a4 a5    a6
      /  \  /  \
    a7   a8 a9  a10

The top node is at index 0 in the array. The elements at indexes 1 and 2 are the next level nodes, the children of the node at index 0. The next 4 indexes are the nodes at the third level in the tree, and so on. It can be seen that the children of the node at index i are at indexes 2i + 1 and 2i + 2. Consider the index (n/2) - 1 in the array. If n is odd then n/2 = (n - 1)/2 and the children of (n/2) - 1 are at indexes n - 2 and n - 1. If n is even there is only one child, at index n - 1. Any children of nodes to the right of (n/2) - 1 would be outside the range of the array, so all the indexes in the second, right-hand, half of the array must be leaf nodes.

An array is a heap if, when it is written as a binary tree, every parent node is greater than or equal to each of its children. For example, [33 27 6 10 8 1 3 7 2 5 4]

              33
            /    \
          27      6
         /  \    / \
       10    8  1   3
      /  \  / \
     7    2 5  4

is a heap, but [33 27 2 10 8 3 1 7 6 5 4]

              33
            /    \
          27      2
         /  \    / \
       10    8  3   1
      /  \  / \
     7    6 5  4

is not a heap. We use this model as the basis for heap sort. The sort has two stages. First the array to be sorted is turned into a heap, then the heap is reordered so that the final array is in ascending order.
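The heap property translates directly into a small check. The following Java sketch (ours, not part of the course code) tests whether an array is a heap using the child indexes 2i+1 and 2i+2 described above.

public class HeapCheck {
  // Returns true if every parent S[i] is >= each of its children S[2i+1] and S[2i+2].
  static boolean isHeap(int[] S) {
    int n = S.length;
    for (int i = 0; i <= n / 2 - 1; i++) {           // nodes beyond (n/2)-1 are leaves
      if (2 * i + 1 < n && S[i] < S[2 * i + 1]) return false;
      if (2 * i + 2 < n && S[i] < S[2 * i + 2]) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isHeap(new int[]{33, 27, 6, 10, 8, 1, 3, 7, 2, 5, 4}));  // true
    System.out.println(isHeap(new int[]{33, 27, 2, 10, 8, 3, 1, 7, 6, 5, 4}));  // false
  }
}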

2.3.2

Turning arrays into heaps

In what follows we shall call the entries in the array keys.

We begin by describing the algorithm sift which takes an index p (for place) and an array with the property that the keys to the right of p are in heaps. The algorithm returns the array with the keys to the right of p - 1 in heaps.

sift(int p, size, S[]) {
  int siftkey := S[p]
  int parent := p
  bool notfound := true
  int largerch
  while ((2*parent+1 <= (size-1)) && (notfound)) {
    if (2*parent+1 < (size-1) && S[2*parent+1] < S[2*parent+2])
      largerch := 2*parent + 2
    else
      largerch := 2*parent + 1
    if (siftkey < S[largerch]) {
      S[parent] := S[largerch]
      parent := largerch
    }
    else notfound := false
  }
  S[parent] := siftkey
}

The input size is the number of elements in the input array and p is the current index to be put in heap order. This element is compared to its two children. If it is smaller than either of them then it is swapped with the larger of its children. This may have destroyed the heap order of the tree beneath the new position of the parent. So the process must be applied again until the parent is in the correct position. For example, suppose S is the following array

[2 5 11 8 10 1 4 3 7 9 6]

              2
            /    \
           5      11
         /  \    /  \
       8    10  1    4
      / \   / \
     3   7 9   6

which has the property that the elements to the right of index 1 are in heaps. Calling sift(1,11,S) has the following effect:

siftkey = 5, parent = 1
  S[3] < S[4], so largerch = 4; since siftkey < S[4], S[1] := S[4] and parent := 4

siftkey = 5, parent = 4
  S[9] > S[10], so largerch = 9; since siftkey < S[9], S[4] := S[9] and parent := 9

siftkey = 5, parent = 9
  2*parent+1 > 10, so S[9] := siftkey

(The diagrams in the original show the tree after each of these steps.)

We end up with the array [2 10 11 8 9 1 4 3 7 5 6] which has the property that entries to the right of index 0 form a forest of heaps. We noted in the previous section that all the nodes at indexes to the right of (n/2) - 1 are leaf nodes, so the right hand half of an array is automatically a forest of heaps. We use an algorithm called makeHeap which calls sift on the indexes from (n/2) - 1 to 0 in turn; at the end of each call the array will have one extra index in heap order. At the end of makeHeap the array will be a heap.

makeHeap(int size, S[]) {
  for i from ((size/2)-1) to 0 do {
    sift(i, size, S)
  }
}

For example, calling sift(0,11,S), where S is the array [2 10 11 8 9 1 4 3 7 5 6] from the previous example, will turn S into a heap [11 10 4 8 9 1 2 3 7 5 6]

             11
            /   \
          10     4
         /  \   / \
        8    9 1   2
       / \  / \
      3   7 5  6


2.3.3

Sorting heaps

To turn this heap based approach into a sorting algorithm we need to give an algorithm which takes a heap and returns the same elements sorted in increasing order. The basis of such an algorithm is the observation that the largest element of a heap is always at index 0. We swap the first and last elements of the heap, so that the largest element is now in its correct position. Using the example from the previous section we have

[11 10 4 8 9 1 2 3 7 5 6]  becomes  [6 10 4 8 9 1 2 3 7 5 11]

We then call sift(0,n-1,S) on the first n - 1 elements in the array, after which the second largest element of the array will be at index 0.

[6 10 4 8 9 1 2 3 7 5 11]  becomes  [10 9 4 8 6 1 2 3 7 5 11]

then we can put the key at index 0 in its correct place [10 9 4 8 6 1 2 3 7 5 11] becomes [5 9 4 8 6 1 2 3 7 10 11].

We carry on calling sift and then putting the key at index 0 into its correct place until the array is sorted. The following is the algorithm for heap sort.

heap(int size, S[]) {
  makeHeap(size, S)
  for i from size-1 to 1 do {
    int hold := S[i]
    S[i] := S[0]
    S[0] := hold
    sift(0, i, S)
  }
}

2.3.4

Complexity analysis of heap sort

Suppose that we have an array of size n, and that n = 2^r - 1 for some integer r. When p = 0, i.e. at depth 1, the while loop in sift(0,n,S) is executed for parent=0, then for parent=1 or 2, then for parent=3 or 4 or 5 or 6, etc. In other words the while loop is executed for one value of parent at each level in the tree except for the bottom level.

level 1:  parent=0
level 2:  parent=1  parent=2
level 3:  parent=3  parent=4  parent=5  parent=6
...
level r:  parent=2^(r-1)-1  .............  parent=2^r-2

In the worst case each iteration of the while loop in sift(p,n,S) takes 3 record copy and compare operations and, when p = 0, the loop is executed r - 1 times. Thus in the worst case sift(0,n,S) takes 2 + 3(r-1) record operations, where n = 2^r - 1. When p is at level 2, i.e. when p = 1 or p = 2, the while loop can be executed r - 2 times. So in the worst case sift(1,n,S) and sift(2,n,S) each take 2 + 3(r-2) record operations. In general, for the 2^(l-1) nodes p = 2^(l-1) - 1, 2^(l-1), 2^(l-1) + 1, ..., 2^l - 2 at level l, sift(p,n,S) takes 2 + 3(r-l) operations. For p = (n/2) - 1 = 2^(r-1) - 2, p is the last node at level r - 1, so in makeHeap, sift(p,n,S) is executed for all values of p in levels 1, 2, 3, ..., r - 1. Thus the worst case number of operations in makeHeap(n,S) is

M(n) = (2 + 3(r-1)) + 2(2 + 3(r-2)) + 4(2 + 3(r-3)) + ... + 2^(r-2)(2 + 3(r-(r-1)))

where n = 2^r - 1. We shall not give the calculation here, but it is possible to re-arrange the expression for M(n) and show that M(n) = 4n - 3 log(n + 1) - 1.

Finally, heap(n,S) executes makeHeap(n,S), then performs 3 record copies and executes sift(0,j,S) for every value of j from n - 1 down to 1. We have seen above that in the worst case the while loop in sift(0,j,S) executes for a node in all but the last level of the tree. If j is the size of the tree then its depth is s, where 2^(s-1) <= j <= 2^s - 1, and the worst case number of record operations in sift(0,j,S) is 2 + 3(s-1). Now, j starts at n - 1 = 2^r - 2, and for 2^(r-1) <= j <= 2^r - 2 sift(0,j,S) can execute 2 + 3(r-1) operations, giving (5 + 3(r-1))(2^(r-1) - 1) record operations in total. For 2^(r-2) <= j <= 2^(r-1) - 1 sift(0,j,S) can execute 2 + 3(r-2) operations, giving (5 + 3(r-2))2^(r-2) record operations. Carrying on in this way we get that heap takes

W(n) = M(n) + (5 + 3(r-1))(2^(r-1) - 1) + (5 + 3(r-2))2^(r-2) + ... + (5 + 3(1))2 + (5 + 3(0))

operations. This can then be re-arranged to get

W(n) = 3(n - 1) log(n + 1) + 3n - 3.


2.3.5

Java code for heap sort

public class HeapSort {
  final int size = 12;
  int S[] = {6, 10, 4, 8, 9, 1, 7, 3, 2, 5, 11, 5};
  private int i, j;

  public HeapSort() {
    System.out.println("The original array: ");
    for (i = 0; i < size; i++) System.out.println(S[i]);
    heap(size, S);
    System.out.println("The sorted array: ");
    for (j = 0; j < size; j++) System.out.println(S[j]);
  }

  public void sift(int p, int S[], int n) {
    int siftkey = S[p];
    int parent = p;
    int notfound = 1;
    int largerch;
    while (2*parent+1 <= (n-1) && notfound > 0) {
      if (2*parent+1 < (n-1) && S[2*parent+1] < S[2*parent+2])
        largerch = 2*parent + 2;
      else
        largerch = 2*parent + 1;
      if (siftkey < S[largerch]) {
        S[parent] = S[largerch];
        parent = largerch;
      }
      else notfound = 0;
    }
    S[parent] = siftkey;
  }

  public void makeHeap(int size, int S[]) {
    for (i = ((size/2)-1); i >= 0; i--)
      sift(i, S, size);
  }

  public void heap(int size, int S[]) {
    makeHeap(size, S);
    for (i = size-1; i >= 1; i--) {
      int hold = S[i];
      S[i] = S[0];
      S[0] = hold;
      sift(0, S, i);
    }
  }

  public static void main(String[] arg) {
    HeapSort h = new HeapSort();
  }
}

Chapter 3 So what do we mean by complexity?


So far we have analysed algorithms by counting the compare and copy operations which can arise in the worst case. Rather than counting the exact number of operations, we can get a reasonable idea of the performance of an algorithm by approximating the number of operations. For example, we may say that insertion sort takes approximately n^2 operations on an input array of size n. We will now define formally what is meant by "approximately" using the so-called big-O notation. Also, rather than looking just at how an algorithm behaves in the worst case, it is often useful to know what the expected performance is in an average (and hopefully typical) case.

3.1

Big-O notation - O(f (n))


We have seen in the first chapter that the time taken by an algorithm which performs n operations on input of size n increases much more slowly than one which performs n^2 operations. While n is small, 20n behaves more like n^2 than n, but once n gets large (over 40) 20n is closer to n than to n^2. The set of all functions which are eventually dominated by a constant multiple of f is called big-O of f and written O(f(n)). Because our functions are counting operations they never have negative values, so we are only interested in functions f with the property that if n ≥ 0 then f(n) ≥ 0. Thus we can use the following definition:

O(f(n)) = {g | for some c and N, g(i) ≤ cf(i), for all i ≥ N}.

For i ≥ 1 we have i ≤ i^2, so n ∈ O(n^2).

We have i ≤ 2^i and so log2 i ≤ log2 2^i = i log2 2 = i. Thus n log2 n ∈ O(n^2). If g ∈ O(f(n)) then for some N, c we have that for i ≥ N, g(i) ≤ c.f(i). So for any m we have m.g(i) ≤ m.c.f(i) and hence m.g(n) ∈ O(f(n)).

{n, 20n, 3n^2 + 5n + 8, n log2 n} ⊆ O(n^2)

For i ≥ 7 we have 3i^2 + 5i + 8 ≤ 4i^2 and so 3n^2 + 5n + 8 ∈ O(n^2).

For all i we have 379i^2 ≤ 379i^2, so 379n^2 ∈ O(n^2).

For i ≥ 1 we have 20i ≤ 20i^2, so 20n ∈ O(n^2).

If g, h ∈ O(f(n)) then for some M, d we have that for i ≥ M, h(i) ≤ d.f(i). So for i ≥ max(N, M), g(i) + h(i) ≤ (c + d)f(i) and hence g(n) + h(n) ∈ O(f(n)).

{n + 8, 79n - 4, 23n^2 - 6n + 11, 4n log2(n + 3)} ⊆ O(n^2)

If W(n) is the number of operations carried out in the worst case using an algorithm and if W(n) ∈ O(f(n)) then we say the algorithm is of order at most f(n). Since

3n^2/2 + n/2 - 2 ≤ 2n^2, for n ≥ 1,

we have that the version of quick sort described above is of order at most n^2.

3.2

Omega notation - Ω(f(n))


It is also useful to be able to talk about the set of functions which eventually become bigger than some function f. The set of all functions which eventually dominate a positive constant multiple of f is called omega of f and written Ω(f(n)).

Ω(f(n)) = {g | for some c > 0 and N, g(i) ≥ cf(i), for all i ≥ N}.

We have 5i^2 ≥ i^2, so 5n^2 ∈ Ω(n^2). For i ≥ 1 we have 5i^2 ≥ i and 5i^3 ≥ i^2, so 5n^2 ∈ Ω(n) and 5n^3 ∈ Ω(n^2).
Since i^2 - 10i ≥ (1/2)i^2 for i ≥ 20, we have n^2 - 10n ∈ Ω(n^2).

We also have n^2 ≥ (1/5)·5n^2, so n^2 ∈ Ω(5n^2).

{5n^2, 5n^3, n^2 - 10n, n!, 23n^2 - 6n + 11} ⊆ Ω(n^2)

Since i ≤ i^2 for any i ≥ 1, we have that n does not dominate n^2. Similarly, log2 i ≤ i for any i ≥ 4, so log2 n does not dominate n and hence n log2 n does not dominate n^2. If W(n) is the number of operations carried out in the worst case using an algorithm and if W(n) ∈ Ω(f(n)) then we say the algorithm is of order at least f(n). Since

3n^2/2 + n/2 - 2 ≥ n^2, for n ≥ 2,

we have that the version of quick sort described above is of order at least n^2.

3.3

Theta notation - Θ(f(n))


We have seen that some functions both dominate and are dominated by n^2. Such functions essentially behave like n^2; they don't grow very much faster or very much slower than n^2 as n increases.

So 5n^2, 23n^2 - 6n + 11 and 3n^2/2 + n/2 - 2 all dominate and are dominated by n^2. They are all also in both O(n^2) and Ω(n^2). The functions which both dominate and are dominated by f(n) are exactly the functions which are in both O(f(n)) and Ω(f(n)). We say that a function g(n) has order f(n) if it dominates and is dominated by f(n), and we call the set of functions of order f(n), Θ(f(n)). So

Θ(f(n)) = O(f(n)) ∩ Ω(f(n))

and {5n^2, 23n^2 - 6n + 11, 3n^2/2 + n/2 - 2} ⊆ Θ(n^2).

So we now have a formal definition of what we mean when we say that the quick sort algorithm given in the previous section is, in the worst case, of order n^2. Similarly, since 3n log n - n + 1 ∈ Θ(n log n), we have that the second version of the merge sort algorithm above is of order n log n. Of course, it is also true that 3n log n - n + 1 ∈ Θ(3n log n - n + 1) and that 3n log n - n + 1 ∈ Θ(3n log n), so that merge sort is also of order 3n log n - n + 1 and of order 3n log n. However, these are considered to be the same order. For a composite polynomial function the convention is to describe its order by its highest term, and to ignore any coefficients. So, for example, we usually say that 28n^4 - 6n + 28n^3 - 11 + n^(-1) has order n^4. We have that g(n) ∈ Θ(f(n)) if and only if there exist c, d > 0 and an integer N such that

c.f(i) ≤ g(i) ≤ d.f(i), for all i ≥ N.

3.4

Small-o notation - o(f (n))


The asymptotic bound provided by the big-O notation may or may not be asymptotically tight. The bound 2n^2 ∈ O(n^2) is asymptotically tight, but the bound 2n ∈ O(n^2) is not. We use o-notation to denote an upper bound that is not asymptotically tight. We formally define o(f(n)) as the set

o(f(n)) = {g | for any c > 0 there is an N such that g(i) < cf(i) for all i ≥ N}.

For example 2n ∈ o(n^2) but 2n^2 ∉ o(n^2).

The definitions of big-O and small-o are similar. The main difference is that if g(n) ∈ O(f(n)), the bound g(i) ≤ cf(i) holds for some constant c, but if g(n) ∈ o(f(n)), the bound g(i) < cf(i) holds for all constants c > 0. Intuitively, in the small-o notation the function g(n) becomes insignificant relative to f(n) as n approaches infinity; that is,

lim_{n→∞} g(n)/f(n) = 0    (when g ∈ o(f)).

Some authors use this limit as a definition of the small-o notation. Examples: 4n^3 + 2n^2 ∈ o(n^3 log(n)), 4n^4 + 2n^2 ∉ o(n^4), and 1/n ∈ o(1).

3.5

A hierarchy of orders
We have seen that 5n^2 ∈ Θ(n^2), so 5n^2 and n^2 are of the same order. We have also seen that n log n ∈ O(n^2) but n^2 ∉ O(n log n), so n log n and n^2 are not of the same order. In fact, n log n is of order strictly less than n^2. If g(n) ∈ O(f(n)) and f(n) ∈ O(h(n)) then there exist c, d, N, M such that g(i) ≤ c.f(i) for i ≥ N and f(i) ≤ d.h(i) for i ≥ M. So g(i) ≤ cd.h(i) for i ≥ max(N, M) and hence g(n) ∈ O(h(n)). Thus we have that if f(n) ∈ O(h(n)) then O(f(n)) ⊆ O(h(n)). We say that f(n) has order type less than or equal to h(n) if O(f(n)) ⊆ O(h(n)), and order type strictly less than h(n) if O(f(n)) ⊊ O(h(n)). An algorithm A1 is an improvement over an equivalent algorithm A2 if the order type of the worst case number of operations required for A1 is strictly less than the order type for A2.

3.6

Worst case analysis


In most cases the number of operations performed by an algorithm depends not only on the size of the input but also on the particular contents of that input. This is not always the case. For example, an algorithm which counted the number of equal entries in two arrays of size n would need n compare operations regardless of the contents of the arrays. In every case this algorithm has order n. For algorithms which use different numbers of operations on different inputs of the same size, we have concentrated on calculating the order in the worst case. To be sure that we have an exact answer we need to do two steps: first we analyse the algorithm to find an upper bound on the number of operations that can be carried out, then we find an example which actually takes this number of operations. For example, for the merge sort 0 algorithm we showed that there could be at most n(n - 1) + n log n operations, but we didn't show that there actually was an array which required this number of operations.

Consider [2, 1]. This takes 4 = 2(2 - 1) + 2 log 2 operations, so it is worst case for n = 2. Now let n = 2^r and suppose that merge sort 0 on [2^r', 2^r' - 1, ..., 2, 1] uses

S(2^r') = 2^r'(2^r' - 1) + 2^r' log 2^r'

operations for all r' < r. This implies that the algorithm run on [2^r, 2^r - 1, ..., 2, 1] uses

S(n/2) + S(n/2) + M(n)

operations, where M(n) is the number of merge0 operations. Calculating this number by stepping through the algorithm we get that M(n) = n^2/2 + n. So, adding up the formulae, we eventually see that [n, n - 1, ..., 2, 1] takes

S(n) = 2((n/2)(n/2 - 1) + (n/2) log(n/2)) + n^2/2 + n = n^2/2 - n + n((log n) - 1) + n^2/2 + n = n^2 - n + n log n

operations. So this is the worst case value for merge sort 0. If we look at merge sort1 on the array [2, 1] we find that this takes 5 = 3·2·log 2 - 2 + 1 operations. In general, we take the integers 2n, 2n - 2, ..., 2 and order them into an array [a0, ..., a_{n-1}] which takes the worst case number of operations to sort. Then we take the integers 2n - 1, 2n - 3, ..., 1 and order them into an array [b0, ..., b_{n-1}] which takes the worst case number of operations to sort. Then we form a new array [a0, ..., a_{n-1}, b0, ..., b_{n-1}], and assume by induction that merge sort1 takes 2(3n log n - n + 1) operations to sort the two halves of this array into the form [2, 4, ..., 2n, 1, 3, ..., 2n - 1], and then 3(2n) - 1 operations to merge this into the final sorted array. This gives a total of

W(2n) = 6n log n - 2n + 2 + 6n - 1 = 3(2n) log 2n - (2n) + 1

so again we get an example matching the worst case. This doesn't actually tell us what the worst case array looks like, but [4, 2, 3, 1] and [8, 4, 6, 2, 7, 3, 5, 1]

are worst case arrays for n = 4 and n = 8 respectively. The details of the above computations will be discussed in class.


3.7

Best case analysis


Looking at the number of operations required in an algorithm in the worst case does not give all the information that we may need. For example, it may be that the worst case only occurs very rarely. Also, how can we distinguish between two algorithms which perform in the same way on their worst cases? One thing that we can do is to also look at the best possible performance of the algorithm.

3.7.1

Best case analysis of bubble sort

The bubble sort algorithm that we looked at in the first chapter is

for i from 1 to n-1 do {
  for j from 0 to n-1-i do {
    if ( S[j] > S[j+1] ) {
      x := S[j]
      S[j] := S[j+1]
      S[j+1] := x
    }
  }
}

No matter what the actual values in the input array are, the outer and inner for loops are executed the full number of times. Also, in all cases the inner loop performs the test, so when this algorithm is executed there are at least

(n - 1) + (n - 2) + ... + 1 = n(n - 1)/2

copy and compare operations carried out. This gives a lower bound on the best case complexity of bubble sort. It is not hard to see that an already sorted array takes exactly this number of operations to sort (for example we can use induction). So it is in fact the best case. We often use B(n) to denote the best case number of operations required for input of size n. For bubble sort we have

B(n) = n(n - 1)/2.
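As a quick check, we can instrument bubble sort in the spirit of the operation counting from Chapter 1. The following Java sketch (ours, not from the notes) counts the compare and copy operations on an already sorted array of size 10 and prints 45 = 10*9/2.

public class BubbleCount {
  public static void main(String[] args) {
    int n = 10;
    int[] S = new int[n];
    for (int i = 0; i < n; i++) S[i] = i;        // already sorted input
    long compares = 0, copies = 0;
    for (int i = 1; i <= n - 1; i++)
      for (int j = 0; j <= n - 1 - i; j++) {
        compares++;
        if (S[j] > S[j + 1]) {                   // never true on sorted input
          int x = S[j]; S[j] = S[j + 1]; S[j + 1] = x;
          copies += 3;
        }
      }
    System.out.println(compares + copies);       // 45 = n(n-1)/2 for n = 10
  }
}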

3.7.2

Best case analysis of merge sort1

In all cases the while loop in merge1 is executed at least n/2 times, each time involving 2 compare and copy operations. Thus, for n ≥ 2, merge1 always executes at least n + 2(n/2) = 2n operations. Thus if B(m) denotes the best case number of operations carried out by merge sort1 on an array of size m we have B(n) ≥ 2B(n/2) + 2n.

Using the same techniques as for calculating the worst case, we get that B(n) ≥ 2n log n. It can be seen by induction that if S is an already sorted array of size n then merge sort1 executes 2n log n copy and compare operations. So the lower bound can be achieved and, for merge sort1, B(n) = 2n log n.

3.7.3

Best case analysis of quick sort

We analyse the quick sort algorithm described in Section 2.2.1 above. The partition aspect of the quick sort algorithm is most efficient on already sorted arrays. But in these cases it partitions the array into two unequal parts, requiring, over the full execution of the sort, more recursive calls to the sort routine. The minimum number of subcalls to quick sort occurs when the partition always produces two equally sized parts, i.e. returns k = (n-1)/2. To ensure that k = (n-1)/2 the if statement inside partition has to be entered (n-1)/2 times. Thus, in this case, partition executes

2 + 3(n-1)/2 + (n-1)/2 = 2 + 2(n-1) = 2n

copy and compare operations. Then a lower bound on the best case performance of quick sort is given by

B(n) ≥ 2n + 2B((n-1)/2).

We can calculate this value in the same way as for merge sort1 and show that it has order n log n. Actually running the quick sort algorithm on the array [2, 1, 3] we find that it takes 6 = 2(n + 1) log(n + 1) - 3n - 1 operations, where n = 3. So B(n) ≥ 2(n + 1) log(n + 1) - 3n - 1 when n = 3.

It is possible, using techniques similar to those in Chapter 2, to show, by induction, that B(n) ≥ 2(n + 1) log(n + 1) - 3n - 1, and so the best case for quick sort is at least of order n log n. Although we shall not show it here, it turns out that this lower bound is in fact the best case and B(n) = 2(n + 1) log(n + 1) - 3n - 1.


3.8 Average case analysis

3.8.1 Expected and typical values

Suppose that we have an algorithm with two classes, C1 and C7, of inputs of size n. Inputs from C1 cause the algorithm to execute n operations and then terminate, and inputs from C7 cause the algorithm to execute 7n operations and then terminate. Suppose also that we have a six sided fair die with 1 on five of its sides and 7 on the other side. We repeatedly run the algorithm using the die to determine which class to take the next input from. We select an input from C1 if we throw a 1 and from C7 if we throw a 7. Since the die is fair we expect to throw a 1 about five times as often as we throw a 7. Thus we have an experiment, or a trial, whose possible outcomes are that n operations are executed and that 7n operations are executed. Recall from CS110 that the probability of the die landing with a given side up is 1/6, and so the probability of throwing a 7 is 1/6 and the probability of throwing a 1 is 5/6. If a typical outcome of an experiment is one which is most likely to happen, then the typical number of operations executed by the algorithm is n. Unfortunately it is not always easy to determine the typical outcome of an experiment, and indeed there may not be one. For example, if the die had three 1s and three 7s then both outcomes would be equally likely. Rather than talking about a typical case we talk about the expected or average case. If we repeat the algorithm running experiment often enough we expect that in roughly 5/6ths of the cases n operations are executed and in one in six cases 7n operations are executed. Thus, if we run the algorithm 600 times we expect about 1200n operations to be executed. (Remember, this is only an expectation, not an estimate. It is possible that either 600n or 4200n operations will be executed.) If we have an experiment whose possible outcomes (sample space) are {n1, n2, ..., nk}, and if the probability of outcome ni is pi, then the expected or average value of the experiment is

n1 p1 + n2 p2 + ... + nk pk.

So in the above example, the average number of operations executed is

(5/6) n + (1/6) 7n = 2n.

Note, no actual execution of the algorithm will use 2n operations; this is just an expected average over many executions. When we talk about the average case complexity of an algorithm we shall mean the expected value, not a typical value. However, expected and typical values are similar if the behaviour of the algorithm is reasonably uniform.

For example, if we have three classes of input, C1, C4, and C7, which take n, 4n, and 7n operations respectively, and a die with 1, 4, 4, 4, 4, 7 on its sides, then the expected number of operations is given by

(1/6) n + (4/6) 4n + (1/6) 7n = 4n

which we would also say is the typical, or most likely, value.

3.8.2

Average case analysis of linear search

Suppose that we have an unordered array of n elements and suppose that we have an algorithm which starts at the first entry and searches the array for a particular given value.

int linear_search(int value, int S[]) {
  int i := 0
  boolean Found := false
  while (i <= n-1) && (!Found) do {
    if (S[i] == value) Found := true
    else i++
  }
  if (Found) return i
  else return -1
}

If the value being searched for is at index k then the algorithm must perform k + 1 record comparisons to find it. If the value is not in the array then the algorithm will perform n record comparisons. If the array is unordered we assume that the element we are looking for is equally likely to be in any position. If the element is in the array then the probability that it is at index k is 1/n. Thus the expected number of record comparisons performed by linear search in this case is

(1/n)·1 + (1/n)·2 + ... + (1/n)·n = (n + 1)/2.

If the element is not in the array then the number of record comparisons executed is n. Thus if the probability that the given element is in the array is p, then the expected (average) number of record comparisons carried out by linear search is

A(n) = p(n + 1)/2 + (1 - p)n = n(1 - p/2) + p/2.
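The expected value (n + 1)/2 for a successful search can also be observed experimentally. The following Java sketch (ours, not from the notes) runs many searches with uniformly random targets that are always present, and prints the empirical average number of comparisons, which should be close to (n + 1)/2.

public class LinearSearchAverage {
  static int comparisons;

  static int linearSearch(int[] S, int value) {
    comparisons = 0;
    for (int i = 0; i < S.length; i++) {
      comparisons++;
      if (S[i] == value) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    int n = 1000, trials = 100000;
    int[] S = new int[n];
    for (int i = 0; i < n; i++) S[i] = i;          // element k sits at index k
    java.util.Random rnd = new java.util.Random();
    long total = 0;
    for (int t = 0; t < trials; t++) {
      linearSearch(S, rnd.nextInt(n));             // target always present
      total += comparisons;
    }
    System.out.println((double) total / trials);   // close to (n + 1)/2 = 500.5
  }
}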


3.8.3

Average case analysis of merge sort

We shall not calculate the average case number of compare and copy operations carried out by merge sort1. But recall that in the best case

B(n) = 2n log n - 2(n - 1)

so merge sort1 is of order n log n in the best case. In the worst case

W(n) = 3n log n - n + 1

so merge sort 1 is of order n log n in the worst case. If A(n) is the average case number of copy and compare operations carried out by merge sort1 we have

2n log n - 2(n - 1) = B(n) ≤ A(n) ≤ W(n) = 3n log n - n + 1.

So A(n) ∈ Θ(n log n) and hence merge sort1 is of order n log n in the average case, and in fact in every case.

3.8.4

Average case analysis of quick sort

When we run the partition algorithm on an array of size n it will return an index k which is the correct position for the pivot value, the value originally at index 0. This pivot position is determined by the number of elements in the array that are smaller than the pivot value. If there are k elements in S which are smaller than the pivot value, S[0], then partition performs 2 + 3k + (n - 1 - k) = n + 2k + 1 copy and compare operations. There can be 0, or 1, or ..., or n - 1 elements in the array that are less than the pivot value. If the array is unordered then each of these numbers is equally likely, so, for each k between 0 and n - 1, the probability that there are k elements less than the pivot value is 1/n. Thus if A(n) is the average number of copy and compare operations carried out by quick sort then we have

A(n) = (1/n) Σ_{k=0}^{n-1} (2k + n + 1 + A(k) + A(n-1-k)).

This can be added up to give

A(n) = 2n + (2/n) Σ_{i=0}^{n-1} A(i).

Substituting n - 1 for n we have

A(n-1) = 2(n-1) + (2/(n-1)) Σ_{i=0}^{n-2} A(i)

and so

nA(n) - (n-1)A(n-1) = 4n - 2 + 2A(n-1)

nA(n) = 4n - 2 + (n+1)A(n-1)

A(n) = 4 - 2/n + ((n+1)/n) A(n-1)

A(n) = 4 - 2/n + ((n+1)/n)(4 - 2/(n-1)) + ((n+1)/(n-1)) A(n-2)

     = ...

A(n) = 4(n+1)(1/(n+1) + 1/n + ... + 1/3) - 2(n+1)(1/((n+1)n) + 1/(n(n-1)) + ... + 1/(3·2)) + ((n+1)/2) A(1)

Theorem 3

1/2 + 1/3 + ... + 1/n < loge n < 1 + 1/2 + ... + 1/(n-1)

Using this result and the fact that Θ(loge n) = Θ(log2 n) it can be shown that A(n) ∈ Θ(n log2 n). Thus quick sort is of order n log n in the average case.

Chapter 4 Searching algorithms


4.1 Overview
We shall assume that data items contain several data elds, one of which will be nominated as the key. In dierent contexts the same data may have dierent keys. For example, a telephone entry contains a name, an address and a telephone number. It is possible to use either the name or the number as the key. The data is searched by searching for the specied key and the type of search which can be deployed depends on whether the keys are sorted. In a telephone book entries are normally sorted by name. Thus if the name is taken as the key, so we are searching for an entry by name, then we can use an ecient binary search. However, if we attempt to search for any entry by number, so that number is being used as the key, a linear search is the only alternative. In this section, with the help of example code, we shall look at ways of structuring data so that ecient search algorithms can be used. This chapter contains only outline sketch notes to provide the basic motivation for this part of the course. You will need to make notes of your own from lectures and should also look in the recommended textbooks.

4.2

Doing better than linear search


[TA86, pp. 431 ] [NN96, pp. 4 ]

If the data to be searched is not ordered in any way then a linear search, in which each element in the data set is compared with the target, is the best we can do. A linear search has order Θ(n), where n is the size of the data set to be searched. If the data is ordered then we may be able to search more efficiently, provided the data is also structured in an appropriate way. We shall see below that if the data is ordered and structured appropriately then there is a search algorithm, binary search, which is of order Θ(log n). But first we consider whether we should try to do better than a linear search.


4.2.1

Ordering the data?

Linear searches can be used on any data set, and they are of order Θ(n). Specialised searches require the data to be sorted, and even a good sorting algorithm has order Θ(n log n). The best possible search algorithms have order Θ(log n), so if the data is only to be searched once then it is not worth sorting it first. However, the assumption is that the data will be searched a large number, N say, of times, and one Θ(n log n) operation followed by N Θ(log n) search operations will be more efficient than N Θ(n) search operations when N is large.

4.2.2

Binary search

[TA86, pp. 441 ] [NN96, pp. 1011 (iterative version)] [NN96, pp. 4752 (recursive version)]

This search assumes that the input is held in an array and sorted by key, with the lowest key first. The key to be found, target say, is compared with the key at the mid-point of the list. If they are the same then the entry has been found. If target is less than the mid key then the search is repeated on the lower half of the data; if target is greater than the mid key then the search is repeated on the upper half of the data. On a sorted array binary search has order Θ(log n). Given the above description, it is natural to implement binary search as a recursive algorithm, as was done in Chapter 1. However, with any recursive algorithm there is at least a theoretical possibility of running out of stack space. On small machines and large sets of data this theoretical possibility can become a reality. It is possible to implement a binary search algorithm which uses iteration rather than recursion. You should attempt to re-write the algorithm so that the recursion is replaced with iteration; if you get stuck you can find out how to do it in [NN96].

4.2.3

Interpolation search

[TA86, pp. 443 ] [NN96, pp. 318320]

Interpolation search is a modification of binary search which attempts to exploit knowledge of the distribution of the data set. If we assume that the keys are evenly distributed then instead of starting our search for the target at the midpoint of the data set we begin at a point close to where we expect the target to be. For example, suppose we believe that our phone book contains approximately the same number of names beginning with each letter of the alphabet. If the target key for which we are searching begins with B then instead of comparing first with the key in the middle of the set (which is likely to begin with J or K) we could begin by comparing with a key which is 1/13th of the way into the data set. (If you think about this, it is quite likely that this is what you actually do when you use a telephone directory.) An algorithm for interpolation search in which the start key depends on the value of the target key is given in [NN96].
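The published algorithm is in [NN96]; the following Java sketch is our own illustration of the idea, assuming integer keys in a sorted array. The probe position is estimated by linear interpolation between the smallest and largest keys in the current range.

public class InterpolationSearch {
  // Returns the index of target in the sorted array S, or -1 if it is absent.
  static int search(int[] S, int target) {
    int low = 0, high = S.length - 1;
    while (low <= high && target >= S[low] && target <= S[high]) {
      if (S[high] == S[low])                      // all keys in the range are equal
        return (S[low] == target) ? low : -1;
      // Estimate the position, assuming the keys are evenly distributed.
      int pos = low + (int) ((long) (target - S[low]) * (high - low) / (S[high] - S[low]));
      if (S[pos] == target) return pos;
      if (S[pos] < target) low = pos + 1;
      else high = pos - 1;
    }
    return -1;
  }

  public static void main(String[] args) {
    int[] S = {30, 47, 86, 95, 115, 130, 138, 159, 166, 184, 206, 212, 219, 224, 237};
    System.out.println(search(S, 138));   // 6
    System.out.println(search(S, 100));   // -1
  }
}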


4.3

Dealing with dynamic data structures


The discussion in the previous section applies realistically to telephone directories which are printed once and then used repeatedly. However, data structures used in programs often change as the program is executed. Thus we have to have insertion operations which allow entries to be added to the data set at run time without upsetting the order of the data. The order of such insertion operations has to be included in the overall order of the searching algorithm. The binary search algorithm as described above requires the data to be stored in a sorted array. If we wish to add or delete an entry from this array then we rst of all have to nd the insertion/deletion point and then move all the remaining entries up the array, for an insertion, or down the array for a deletion. This operation is ultimately linear, so, if the expected number of insertions and deletions is comparable with the expected number of searches this is worse than just using linear search. An alternative is to store the data in a linked list. In this case insertion and deletion are as ecient as searching because, once the insertion/deletion point has been found, the cost of inserting or deleting a link is constant. However, a binary search is not as ecient as linear search on a linked list because the list has to be traversed to nd its mid-point! In the next two sections we shall consider methods of structuring the data so that it can be dynamically modied but more ecient searching can still be be done. In the rst case, hash coding, we ultimately use a linear search on a linked list but we structure the data so that the length of the list is small. Hash coding was introduced in the course CS1211, so the material presented in this blue book shouldnt be completely new to you. In the second case, binary search trees, we structure the linked list into a tree to allow what is essentially a binary search to be carried out. Binary search trees were also introduced in CS1211, but will be explained in more detail here.

4.4

Hash coding
[TA86, pp. 521 ] [CLR90, pp. 219243] [NN96, pp. 326332] [Knu73, 506549]

Hash coding is a method of structuring data which is aimed at increasing the eciency of a search. Instead of a linked list of data we have several linked lists stored in an array (hash table). A hash function is used to determine which list a given data element is stored on, and only that list has to be searched for that element. Hash coded data is easier to maintain than a binary tree structure in that insertion and deletion of elements is reasonably straightforward but, in worst case, the order of a search on a hash coded structure is the same as for linear search. In practice, however, hash coding is usually much better than linear search and it is the method that most compilers use to keep track of the variables in a program. In this discussion we shall assume that our keys begin with letters, but it is clearly applicable to any situation. We make one linked list for each letter of the alphabet, and insert all the

keys that begin with a particular letter on a separate list. (The elements of the array which are the heads of each of these lists are often called buckets.)
(Diagram: an array of buckets, one per initial letter, each heading a linked list of keys - for example adrian and angle; bcount, beta and boing; count; delay, delta, dozy and drain.)

This might improve search times by a factor of 26, whilst keeping insertion time small. The catch is that the keys may not be evenly divided by initial letter. In the worst case all the keys may begin with the same letter (which is the case when the data is the list of all reserved identifier names in our compiler generator rdp) and there is no improvement in efficiency. So rather than assigning keys to buckets on the basis of the first letter of the key, we use a hash function. A hash function is simply a calculation on a key that yields a random-looking number. Hash functions are deterministic, that is they always yield the same number for a given key. Perhaps the simplest hash function for a key is to add together the ASCII values of all of the characters in the string, and then take the modulus of the result with the number of sub-lists available, giving a hash value of n say. If we are adding a new element we then insert it at the top of the list at index n, and if we are searching for an element then we perform a linear search on the list at index n. For example, if we have a hash table of size 7 and a hash function which takes the ASCII value of each character and adds them up modulo seven, then our hash table might look as follows.

index:  0     1     2          3     4     5     6
list:   NULL  bcb   bcc, cic   cid   NULL  aab   NULL

In the above hash table we note that the ASCII values are b = 98, c = 99 and i = 105, which implies that the hash function for cic gives us the value

c + i + c = 303 = 2 modulo 7. Therefore if we are searching for the string cic we only have to look at the linked list starting at index two. There are many other options for hash functions, some of which perform considerably better than the above. But we will not discuss these in this course.
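As an illustration of the scheme just described (our own sketch, not code from the course), the following Java class uses the ASCII-sum hash function with a table of 7 buckets, each heading a linked list.

public class SimpleHashTable {
  static final int SIZE = 7;

  static class Node {
    String key;
    Node next;
    Node(String key, Node next) { this.key = key; this.next = next; }
  }

  Node[] buckets = new Node[SIZE];

  // Sum of the character codes, modulo the table size, as described above.
  static int hash(String key) {
    int sum = 0;
    for (int i = 0; i < key.length(); i++) sum += key.charAt(i);
    return sum % SIZE;
  }

  void insert(String key) {
    int h = hash(key);
    buckets[h] = new Node(key, buckets[h]);      // insert at the head of the list
  }

  boolean search(String key) {
    for (Node n = buckets[hash(key)]; n != null; n = n.next)
      if (n.key.equals(key)) return true;        // linear search of one bucket only
    return false;
  }

  public static void main(String[] args) {
    SimpleHashTable t = new SimpleHashTable();
    for (String s : new String[]{"aab", "bcb", "bcc", "cid", "cic"}) t.insert(s);
    System.out.println(hash("cic"));             // 2, as computed above
    System.out.println(t.search("cic"));         // true
    System.out.println(t.search("xyz"));         // false
  }
}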

4.5

Binary search trees


[TA86, pp. 448 ] [CLR90, pp. 244262] [NN96, pp. 321 ] [Knu73, 422450]

In the case of hash coded structures the ultimate order of the searching algorithm is still Θ(n); all that is done is that the constant of proportionality is reduced. In order to use a binary search on a dynamic data structure, and hence get worst case Θ(log n) search time, we structure the data as a binary tree. Recall that a binary tree is a rooted tree in which each node has at most two children.

4.5.1

Traversing binary trees

The following are the three standard tree-traversal algorithms for binary trees:

preorder: visit the root, then traverse the left subtree in preorder, then traverse the right subtree in preorder;

inorder: traverse the left subtree in inorder, then visit the root, then traverse the right subtree in inorder;

postorder: traverse the left subtree in postorder, then traverse the right subtree in postorder, then visit the root.
(Diagram: two example binary trees, each rooted at A, with their traversals.)

First tree:   Preorder: ABDGCEHIF     Inorder: DGBAHEICF      Postorder: GDBHIEFCA
Second tree:  Preorder: ABCEIFJDGKHL  Inorder: EICFJBGKDHLA   Postorder: IEJFCKGLHDBA

4.5.2

Structure of binary search trees

In order to use a binary search, the keys in the binary tree have to be ordered.

In a binary search tree, all the left-hand descendants of a node with search key k have keys that are less than or equal to k, and all the right-hand descendants have keys that are greater than or equal to k. The inorder traversal of such a binary tree yields the records in ascending key order. The following are two binary search trees which both correspond to the ordered array

30 47 86 95 115 130 138 159 166 184 206 212 219 224 237 258 296 307 314
(Diagram: two binary search trees containing these keys - one roughly balanced with depth 5, the other essentially a chain with depth 19.)

To search for a key in a binary search tree the tree is traversed, starting at the root, and the target is compared with the current node key. If the target is equal to the current key then the search is complete; if it is less than the current key then the left child is selected next, otherwise the right child is selected. To delete a record with key k in the tree, we first find it, and then, if it has two children, we look for its inorder successor. By the definition of a binary search tree, its successor will be the next element found using an inorder traversal. The successor element cannot have a left subtree, since a left descendant would itself be an inorder successor of k. If the node containing k is a leaf (i.e. it has no children), it can be deleted immediately. If the node containing k has only one subtree, then its child node can be moved up to take its place. The process of inserting a node in the tree is similar.
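The search just described is only a short loop. The following Java sketch (ours; the Node class and method names are illustrative) builds a small binary search tree and searches it.

public class BSTSearch {
  static class Node {
    int key;
    Node left, right;
    Node(int key) { this.key = key; }
  }

  // Insert in the usual binary-search-tree way (smaller keys to the left).
  static Node insert(Node root, int key) {
    if (root == null) return new Node(key);
    if (key <= root.key) root.left = insert(root.left, key);
    else root.right = insert(root.right, key);
    return root;
  }

  // Walk down from the root, going left or right according to the comparison.
  static boolean search(Node root, int target) {
    Node current = root;
    while (current != null) {
      if (target == current.key) return true;
      current = (target < current.key) ? current.left : current.right;
    }
    return false;
  }

  public static void main(String[] args) {
    Node root = null;
    for (int k : new int[]{184, 95, 258, 47, 130, 219, 307}) root = insert(root, k);
    System.out.println(search(root, 130));   // true
    System.out.println(search(root, 100));   // false
  }
}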

4.5.3

Optimum search trees

[TA86, pp. 459 ]

The eciency of search on a binary search tree depends on the structure of the tree. The worst case number of comparison operations that will be required is the length of the longest branch of the tree. This is called the depth of the tree. The two examples in the previous section are binary search trees for the same data however, the maximum comparisons required is 5 for the rst tree and 19 for the second tree. In fact we can see that in worst case the binary search tree will just be a linked list and we will have made no search eciency improvement at all. We obviously want to construct the trees which allow for the most ecient search, i.e. the ones with the lowest depth. In the case where the probabilities of dierent keys being required are dierent, an optimum search tree is one which minimises the expected number of comparisons for a given set of keys and probabilities. We shall only consider the case where we

Binary search trees 60 assume that the probability of searching for each value is equally likely, but the techniques below can be extended to non-evenly distributed values by including the probability of each value in the balance calculations. For keys whose values are all equally likely, an optimum binary search tree has all of its branches essentially the same length. Of course, unless there are exactly a power of 2 data items then it will not be possible for the branches to be exactly the same length, but the branches could be within one of being the same length. Constructing optimum trees is hard (costly), the fastest known algorithm to construct such a tree has order (n2 ). But if we relax the requirement slightly and allow the branches to be close to optimum length then there are ecient algorithms for constructing and maintaining such trees.

4.5.4

Balanced trees

[TA86, pp. 461 ] [Knu73, pp. 451470]

The balance of a node in a binary tree is defined to be the depth of its left subtree minus the depth of its right subtree. A balanced binary tree is a binary tree in which the balance of every node is either 0, 1 or -1.
(Diagram: two example trees with the balance of each node marked - one balanced, one not balanced.)

We need to ensure that when a node is inserted into a balanced tree the resulting tree is still balanced. This will not necessarily be the case if a simple insertion algorithm is used. We can add 39 in its natural position to the above balanced tree without it becoming unbalanced, but not 4 or 101.
(Diagram: the balanced tree above after naive insertion of 39, 4 and 101, with the resulting balances marked; the insertions of 4 and 101 unbalance it.)

Fortunately there are efficient algorithms for inserting new nodes into a balanced tree without destroying the balance. These algorithms perform local rotations to ensure that the tree is structured correctly.

4.5.5

Rotations in a binary search tree

There are two rotations that we can perform which change the balances of the nodes but have the property that the new tree is still a binary search tree, so the nodes in a left subtree all have values which are less than the parent's value and the nodes in a right subtree all have values which are greater than the parent's.
(Diagram: a tree, and the trees produced from it by a right rotation and by a left rotation; the subtrees keep their left-to-right order, so each result is still a binary search tree.)

We shall assume that below A the tree is balanced, so all the balances are 0, 1 or -1, and that A is unbalanced by 1, so it has a balance of 2 or -2. We shall look at four cases where rotations can restore the balance of such a tree; these cases will be when A has balance 2 and B has balance 1 or -1, and when A has balance -2 and C has balance 1 or -1.

Case 1: A has balance 2 and B has balance 1. Since A has balance 2 the subtree from B has depth two greater than the subtree from C, so we assume that the depth from C is n and the depth from B is n + 2. Then the depth from A is n + 3. Since B has balance 1 the depth from D is one greater than the depth from E, and since B has depth n + 2, the depth from D must be n + 1.
(Diagram: the tree rooted at A before the rotation, with balance A = 2 and balance B = 1, and the tree rooted at B after the right rotation at A, with balance A = 0 and balance B = 0; the subtree depths are as described in the text.)

If we perform a right rotation on the node A then the depths of the trees from C, E and D remain unchanged, so the depth of the tree from A is now n + 1 and the depth of the whole tree (from B) is now one less, n + 2.

Case 2: A has balance 2 and B has balance -1. Again, since A has balance 2, we can assume that the depth from C is n and the depth from B is n + 2. Since B has balance -1, the depth from E must be n + 1 and the depth from D must be n. Since E has depth n + 1, one of the children of E must have depth n, and since E has balance 0, 1 or -1 the other child must have depth n or n - 1.



(Diagram: the tree with balance A = 2 and balance B = -1, the intermediate tree after a left rotation on B, and the final tree rooted at E after the right rotation on A; the children H and I of E have depths n or n - 1.)

If we perform a left rotation on the node B and then a right rotation on the node A, then the depths of the trees from C, D, H and I remain unchanged, so the depth of the tree from B is now n + 1, the depth from A is now n + 1 and the depth of the whole tree (from E) is again one less, n + 2. The balances of the nodes C, D, F, G, H and I remain unchanged, the balance of E becomes 0, the balance of B becomes 0 or 1 depending on the depth of H, and the balance of A becomes 0 or -1, depending on the depth of I.

Case 3: A has balance -2 and C has balance -1. Since A has balance -2 we can assume that the depth from C is n + 2 and the depth from B is n. Since C has balance -1 and depth n + 2, the depth from F must be n and the depth from G must be n + 1.
(Diagram: the tree with balance A = -2 and balance C = -1, and the tree rooted at C after the left rotation on A.)

If we perform a left rotation on the node A then the depths of the trees from B, F and G remain unchanged, so the depth of the tree from A is now n + 1 and the depth of the whole tree (from C) is now one less, n + 2.

Case 4: A has balance -2 and C has balance 1. Since A has balance -2, we can assume that the depth from C is n + 2 and the depth from B is n. Since C has balance 1, the depth from F must be n + 1 and the depth from G must be n. Since F has depth n + 1, one of the children of F must have depth n, and since F has balance 0, 1 or -1 the other child must have depth n or n - 1.
(Diagram: the tree with balance A = -2 and balance C = 1, the intermediate tree after a right rotation on C, and the final tree rooted at F after the left rotation on A; the children H and I of F have depths n or n - 1.)

If we perform a right rotation on the node C and then a left rotation on the node A, then the depths of the trees from B, G, H and I remain unchanged, so the depth of the tree from C is now n + 1, the depth from A is now n + 1 and the depth of the whole tree (from F) is again one less, n + 2. The balances of the nodes B, D, E, G, H and I remain unchanged, the balance of F becomes 0, the balance of C becomes 0 or -1 depending on the depth of I, and the balance of A becomes 0 or 1, depending on the depth of H.

4.5.6

Insertion in a balanced binary tree

We shall now show how to use the rotations described above as the basis of an insertion algorithm for balanced binary search trees. Begin with a balanced binary search tree, so all the nodes in the left subtree of a node have value less than that node, all the nodes in the right subtree have value higher than that node, and the balance of all the nodes is 0, 1 or -1. Suppose that we wish to add the key N. Traverse the tree until either a node labelled N is found, in which case the algorithm terminates, or until the insertion point for N is found, in which case make a new node labelled N and add it to the tree in the correct place. Re-calculate the balances of the nodes in the tree. If the tree is still balanced then the algorithm terminates. If not then rebalance the tree as described below. Before describing the balancing algorithm we consider the different ways in which the tree can become unbalanced, as these correspond to the special cases that the balancing algorithm has to deal with. Suppose that A is a node in what was a balanced binary tree which has become unbalanced due to the insertion of one new node, and that none of the nodes below A have become unbalanced. Suppose that A has left child B and right child C. For A to have become unbalanced the new node must have been added at the end of the subtree under B or the subtree under C. If the balance of A was 0 then the depths of its left and right subtrees were the same, and adding a node can only increase the depth by 1, so A would still be balanced. If the balance of A was 1 then the left subtree was deeper than the right one, so adding a node under C would not make A unbalanced. Similarly, if the balance of A was -1 then to make the node unbalanced we must add the node under C.
[Diagram: a node A of depth n+2 with children B and C, shown with balance A = 1 (left subtree deeper) and with balance A = -1 (right subtree deeper).]

First suppose that A had balance 1 and that the new node was added under B. In order for the balance of A to change the depth of B must change and hence the new node must be added to the deepest subtree of B. If one subtree of B was deeper than the other before the new node was added then adding the new node to the deepest tree would make B unbalanced. We assumed that all the nodes below A were still balanced after the insertion, thus the only way that A can become unbalanced by adding a node under B is if the subtrees under B were the same length, i.e. if the balance of B was originally 0. Thus there are two cases: the new node was added to the left subtree of B, the balance of B became 1, and the balance of A became 2; or the new node was added to the right subtree of B, the balance of B became -1, and the balance of A became 2.

[Diagrams: Case 1 - the new node was added under D, so balance B = 1; Case 2 - the new node was added under E, so balance B = -1. In both cases A has depth n+3 and balance 2.]

Now suppose that A had balance -1 and that the new node was added under C. As for the case when the node was added under B, the only way that A can become unbalanced by adding a node under C is if the balance of C was originally 0. Thus again there are two cases: the new node was added to the right subtree of C, the balance of C became -1, and the balance of A became -2; or the new node was added to the left subtree of C, the balance of C became 1, and the balance of A became -2.
[Diagrams: Case 3 - the new node was added under G, so balance C = -1; Case 4 - the new node was added under F, so balance C = 1. In both cases A has depth n+3 and balance -2.]

The four cases that we have identified correspond to the four types of rotation that we discussed in the previous section. If we are in case 1 after the new node has been added, so A has balance 2 and B has balance 1, we can apply a right rotation to A to get a new tree in which B has balance 0 and A has balance 0.

[Diagram: Case 1 before and after - originally A is the root of the subtree with balance 2; after the new node is added under D and a right rotation is applied to A, B is the root of a subtree of depth n+2 in which every node is balanced.]

The relative order of the elements in the new tree is still the same: the value of a node is greater than the values of the nodes in its left subtree and less than the values in its right subtree. So the new tree is still a binary search tree. All the nodes in the subtree beginning at B are now balanced, and the depth of this tree is n + 2, the same as the depth of the original subtree it replaces. Thus the balance of the rest of the surrounding tree is unchanged by this operation. If we are in case 2 after the new node has been added, so A has balance 2 and B has balance -1, then we can apply the rotations as in case 2, a left rotation on B and then a right rotation on A, to get a new subtree which has depth n + 2 and all of whose nodes are balanced. If A has balance -2 and C has balance -1 then we perform the rotation as in case 3 to balance the tree, and if A has balance -2 and C has balance 1 then we perform the rotations as in case 4 to balance the tree.
[Diagrams: the rebalanced subtrees for cases 2, 3 and 4; in each case the resulting subtree has depth n+2 and all of its nodes are balanced.]

This gives the following algorithm to re-balance a tree which has had one node added to it.

While (there are still unbalanced nodes in the tree) {
  1. Find a node A which is unbalanced but all of whose descendants are balanced.
  2. If A has balance 2 and its left child B has balance 1, perform a right rotation on A.
  3. If A has balance 2 and its left child B has balance -1, perform a left rotation on B and then perform a right rotation on A.
  4. If A has balance -2 and its right child C has balance -1, perform a left rotation on A.
  5. If A has balance -2 and its right child C has balance 1, perform a right rotation on C and then perform a left rotation on A.
}
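As a rough sketch (not from the notes), this case analysis might be expressed in Java on top of the Node and Rotations sketch given earlier; the height and balance helpers below are assumptions introduced only for the illustration.

    class Rebalance {
        static int height(Node n) {
            if (n == null) return 0;
            return 1 + Math.max(height(n.left), height(n.right));
        }

        // balance = depth of left subtree minus depth of right subtree
        static int balance(Node n) {
            return height(n.left) - height(n.right);
        }

        // Rebalance one unbalanced node A whose descendants are all balanced.
        static Node rebalance(Node a) {
            if (balance(a) == 2) {                    // left subtree two deeper
                if (balance(a.left) == -1)            // case 2: rotate the left child first
                    a.left = Rotations.rotateLeft(a.left);
                return Rotations.rotateRight(a);      // cases 1 and 2 finish with a right rotation
            }
            if (balance(a) == -2) {                   // right subtree two deeper
                if (balance(a.right) == 1)            // case 4: rotate the right child first
                    a.right = Rotations.rotateRight(a.right);
                return Rotations.rotateLeft(a);       // cases 3 and 4 finish with a left rotation
            }
            return a;                                 // already balanced
        }
    }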


4.5.7

Building balanced binary search trees

Of course, we can use the insertion algorithm to construct balanced binary search trees. The tree constructed will depend on the order in which the elements were added. The following two trees were constructed by inserting one element at a time and re-balancing if necessary. They both contain the same data, but in the first case the elements were added in the order 1, 2, 3, 4, 5, 6 and in the second case they were added in the order 6, 5, 4, 3, 2, 1.
[Diagrams: the balanced binary search tree obtained from the input 1, 2, 3, 4, 5, 6 (rooted at 4) and the tree obtained from the input 6, 5, 4, 3, 2, 1 (rooted at 3).]

4.6

Multiway search trees


[TA86, pp. 473 ff.] [Knu73, pp. 471-480]

In a multiway search tree the tree need not be binary, and several keys may be associated with the same node. If a node has m &gt;= 2 children then it will have m - 1 associated keys. There is a fixed upper bound on the number of keys which can be in a given node. Thus a multiway tree of degree 3 can have at most 3 keys in any given node.
[Diagram: a multiway search tree whose root holds the keys 4, 7, 21 and has children (1,2), (5), (10) and (23,50,61); the node (23,50,61) has a child (28,33).]

Such trees are searched by proceeding down the (m + 1)st child if the target is bigger than the mth key but smaller than the m + 1st key in the current node.
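As a rough sketch (not from the notes), that search might look as follows in Java, assuming a hypothetical MultiNode holding a sorted array of keys and an array of children with one more entry than there are keys.

    // Hedged sketch of searching a multiway search tree.
    class MultiNode {
        int[] keys;            // sorted keys in this node
        MultiNode[] children;  // keys.length + 1 children, or empty for a leaf
    }

    class MultiwaySearch {
        static boolean contains(MultiNode node, int target) {
            if (node == null) return false;
            int i = 0;
            while (i < node.keys.length && target > node.keys[i]) i++;   // first key >= target
            if (i < node.keys.length && node.keys[i] == target) return true;
            if (node.children == null || node.children.length == 0) return false; // leaf
            return contains(node.children[i], target);                   // descend into the i-th child
        }
    }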

4.6.1

Insertion in multiway search trees

A simple insertion algorithm for multiway trees is to search for a node whose smallest element is greater than or equal to the element to be inserted, or whose largest element is less than or equal to the element to be inserted, and which has less than its full complement of keys. The new element is then just inserted in this node. If there is no such node then a node with keys larger and/or smaller than the new entry is found and a new leaf is created with the new entry in it. We insert -3, 3 and 56 in the above 3-tree as follows.

[Diagrams: the tree after inserting -3 (the node (1,2) becomes (-3,1,2)), after inserting 3 (a new leaf (3) is created), and after inserting 56 (a new leaf (56) is created under (23,50,61)).]

The problem with this method is that the trees can become unbalanced and hence searching is not optimal.

4.6.2

B-trees

[TA86, pp. 486 ff.] [CLR90, pp. 381-399] [NN96, pp. 325-327]

B-trees are a particular form of multiway search tree for which there is an insertion algorithm which preserves the balance of the tree without having to rotate the nodes. Thus the levels of nodes in a tree are preserved after an insertion. In a B-tree there is a given integer m and all the nodes have between 1 and m entries in them. Every node apart from leaf nodes (and some nodes from the penultimate level if the number of elements is not a power of 2) has r + 1 children, where r is the number of elements in the node. We shall only look at 3,2-trees, B-trees in which the nodes can contain at most two elements, but the generalisation to m+1,m trees is not very different. A 3,2-tree is a tree, of depth d say, in which each node has one or two elements and every node of depth less than d - 1 has one more child than it has elements in it.
[Diagram: a 3,2-tree with root (20,30), whose children are (10), (24) and (50,60); their children are the leaves (1,2), (12), (21), (27), (45), (56) and (70,81).]

Insertion of an element k is carried out as follows: Traverse the tree until either a node containing k is found, in which case stop, or until a node v at the bottom of the tree is reached. If v is not a leaf node then we can make a new leaf node with label k and add it as a child of v and stop. If v is a leaf node with only one element then add k to v and stop. If v already contains two elements then add k to v and then split v as follows:

1. Suppose that v has elements k1 &lt; k2 &lt; k3 and parent u. Remove v from the tree and make two new nodes v1 and v2 which have elements k1 and k3 respectively. Make these children of u and add k2 to u.
[Diagram: the node (k1,k2,k3) with parent (h1,h2) is split into two nodes (k1) and (k3), and k2 moves up so that the parent becomes (h1,h2,k2).]

2. If the node u now has three elements l1 &lt; l2 &lt; l3 then we split this by creating two new nodes with labels l1 and l3; the first gets the first two subtrees of u and the second gets the other two subtrees, and l2 is added to the parent, w, of u.
[Diagram: the node u with elements l1, l2, l3 is split into (l1) and (l3), and l2 moves up to the parent w.]

3. If w now has three elements then repeat the splitting until all the tree nodes have at most two elements in them. The following shows the process of adding a key 85 to the example above.
[Diagrams: inserting 85 - the leaf (70,81) becomes (70,81,85) and is split, pushing 81 into (50,60); the node (50,60,81) is split, pushing 60 into the root; finally the root (20,30,60) is split, giving a new root (30) with children (20) and (60).]
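The key step in this insertion is splitting an over-full node. A rough Java sketch of that step (not from the notes) might look like this, assuming a hypothetical TwoThreeNode that temporarily holds three sorted keys and up to four children; it is not a full insertion routine.

    import java.util.ArrayList;
    import java.util.List;

    class TwoThreeNode {
        List<Integer> keys = new ArrayList<>();          // at most 2 keys, 3 while splitting
        List<TwoThreeNode> children = new ArrayList<>(); // keys.size() + 1 children, none for a leaf
    }

    class Split {
        static class Result {
            TwoThreeNode left, right;  // the two new nodes
            int middleKey;             // the key pushed up into the parent
        }

        // Split a node that temporarily holds three keys k1 < k2 < k3.
        static Result split(TwoThreeNode v) {
            Result r = new Result();
            r.left = new TwoThreeNode();
            r.right = new TwoThreeNode();
            r.left.keys.add(v.keys.get(0));    // k1 goes to the left node
            r.right.keys.add(v.keys.get(2));   // k3 goes to the right node
            r.middleKey = v.keys.get(1);       // k2 is pushed up to the parent
            if (!v.children.isEmpty()) {       // an internal node hands over its four subtrees
                r.left.children.add(v.children.get(0));
                r.left.children.add(v.children.get(1));
                r.right.children.add(v.children.get(2));
                r.right.children.add(v.children.get(3));
            }
            return r;
        }
    }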

You can read more about B-trees, and the slightly more efficient variations B*-trees and B+-trees, in [TA86, pp. 486 ff.].

Chapter 5 String matching


In this chapter we shall consider string matching problems. This normally means finding all (or the first) occurrences of a pattern in some text. For example, find all occurrences of the pattern "as" in the text below. No one would have believed in the last years of the nineteenth century that this world was being watched keenly and closely by intelligences greater than man's and yet as mortal as his own; that as men busied themselves about their various concerns they were scrutinised and studied, perhaps almost as narrowly as a man with a microscope might scrutinise the transient creatures that swarm and multiply in a drop of water. There are in fact seven occurrences of "as" in the above text. Other applications of string matching could be in word processors, finding patterns in DNA sequences, as well as in many other topics. We formally define the string-matching problem as follows. Let T be a text of length n, where the ith element of T is denoted by T[i]. For simplicity we will assume that each element in T is a text character. Let P denote the pattern which we are looking for in T. Let P have length m, and let the ith element in P be denoted by P[i]. We will often picture T and P as in Figure 5.1. Note that traditionally arrays considered in string matching start at index 1, not zero.

[Figure: the text T pictured as the array T[1], T[2], T[3], ..., T[n] and the pattern P as the array P[1], ..., P[m].]

Figure 5.1 An example of how we picture a pattern P and a text string T .



Figure 5.2 An example of a pattern P that occurs with shift equal to two in T. We will say that P occurs with shift s in T if, by deleting the first s elements of T, we would end up with a string starting with the pattern P. See an example of a string occurring with shift 2 in Figure 5.2. In other words, if P occurs with shift s in T then P = T[s+1..s+m], where T[s+1..s+m] denotes the string with the following characters: T[s+1], T[s+2], T[s+3], ..., T[s+m]. We will give three different methods of finding all occurrences of a pattern P in a text T. The naive string matching algorithm is the straightforward method that is easy to program and has a worst case time complexity of O(m(n-m+1)). The Rabin-Karp algorithm is similar to the naive string-matching algorithm and also has a worst case time complexity of O(m(n-m+1)); however, using an approach similar to hash tables, it turns out to be a very fast algorithm in most cases. The Knuth-Morris-Pratt algorithm is the fastest algorithm, with a time complexity of O(n+m), but it is also the most complicated algorithm to program. We will now describe each of the above algorithms in more detail.

5.1

The naive string matching algorithm


The naive string matching algorithm is the straightforward way of finding all occurrences of P in T. We simply check if P occurs in T with shift s for all possible values of s. This is done in the pseudocode below.
1   naive_string_matching (string T, string P) {
2
3     for s from 0 to n-m do {
4       bool match:=true
5       for i from 1 to m do
6         if P[i]!=T[i+s]
7           match=false
8       if (match)
9         print << "Pattern occurs with shift " << s
10    }
11  }

Note that the loop over s tries all possible shifts for P. Now convince yourself why s only goes up to n - m. For each value of s, we first set a boolean value to true, and we then change this value to false if any of the characters P[1], P[2], ..., P[m] do not match the corresponding character in T (i.e. T[s+1], T[s+2], ..., T[s+m]). So if the boolean value remains true after the loop in lines 5-7, then P must occur with shift s in T. We will not give the Java code for the naive string matching algorithm as it is quite simple to program. But look at Figure 5.3 for a very short illustration of how the algorithm works if T = abaabbaab and P = baa. Hopefully you can see exactly which steps the algorithm carries out. Of course once the boolean value match becomes false we do not have to continue comparing characters from T and P for that specific value of shift. So the pseudocode below is a slight improvement on the previous code.

[Diagram: the shifts 0-6 tried on T = abaabbaab and P = baa; shifts 1 and 5 give match=true, shifts 0, 2, 3, 4 and 6 give match=false.]

Figure 5.3 An example of naive_string_matching(T,P) on the strings T = abaabbaab and P = baa.
1   naive_string_matching2 (string T, string P) {
2
3     for s from 0 to n-m do {
4       bool match:=true
5       int i=1
6       while (match) and (i<=m) do {
7         if P[i]!=T[i+s]
8           match=false
9         i:=i+1
10      }
11      if (match)
12        print << "Pattern occurs with shift " << s
13    }
14  }
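Although the notes leave the Java version as an exercise, one possible rendering of naive_string_matching2 is sketched below (an assumption, not the notes' own code); it uses ordinary 0-indexed Java strings, so the printed shifts agree with the convention above.

    public class NaiveStringMatching {
        static void search(String t, String p) {
            int n = t.length(), m = p.length();
            for (int s = 0; s <= n - m; s++) {
                boolean match = true;
                int i = 0;
                while (match && i < m) {          // stop early once a mismatch is seen
                    if (p.charAt(i) != t.charAt(s + i)) match = false;
                    i++;
                }
                if (match) System.out.println("Pattern occurs with shift " + s);
            }
        }

        public static void main(String[] args) {
            search("abaabbaab", "baa");           // prints shifts 1 and 5
        }
    }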

Try running this algorithm on the above example (when T = abaabbaab and P = baa). No matter which version of the naive string matching algorithm we use, we note that if T contains n a's and P contains m a's then the inner loop will always have to compare m values from P with m values from T. As the outer loop goes from 0 to n-m this gives us a worst case analysis of O(m(n-m+1)). For our first implementation of the naive string matching algorithm we note that this is also our best case analysis, as the inner loop always compares m values from P with m values from T. However in our second implementation (i.e. naive_string_matching2) we note that if T contains n a's and P contains m b's, then the algorithm would only use O(n-m+1) time. Can you see why?

5.2

The Rabin-Karp algorithm


Rabin and Karp have come up with a string matching algorithm which performs well in practice, even though its worst case running time is O((n-m+1)m), just like the naive string matching algorithm. The algorithm uses a kind of hash function, just as we did for hash coding. To make things simpler, suppose that every character in both T and P belongs to the set {0,1,2,...,9}. We can then view each string as a number. The string 27172 corresponds to the number 27,172 (note that the string 0027172 also corresponds to 27,172). With this interpretation the pattern P corresponds to a number, say p. Furthermore the strings T[1..m], T[2..m+1], ..., T[n-m+1..n] all correspond to numbers, say t_1, t_2, ..., t_{n-m+1}, respectively. Now our problem is to decide which of the numbers t_1, t_2, ..., t_{n-m+1} are equal to p, as if p = t_i then P[1..m] = T[i..i+m-1].

The problem is that the numbers p, t_1, t_2, ..., t_{n-m+1} may be very large and cannot be stored as a number in the computer. However, if p = t_i then we know that p modulo q is equal to t_i modulo q, for any positive integer q. So if we could compute all of the following numbers, then we would perhaps be able to rule out some shifts (as if p modulo q is not equal to t_i modulo q then P does not occur in T with shift i - 1).

p' = p mod q

t'_1 = t_1 mod q
t'_2 = t_2 mod q
...
t'_{n-m+1} = t_{n-m+1} mod q

As an example let P = 0302 and let T = 4030201503022 and let q = 17. In this case we get the following values.

p    = 302        p'    = 13
t_1  = 4030       t'_1  = 1
t_2  = 302        t'_2  = 13
t_3  = 3020       t'_3  = 11
t_4  = 201        t'_4  = 14
t_5  = 2015       t'_5  = 9
t_6  = 150        t'_6  = 14
t_7  = 1503       t'_7  = 7
t_8  = 5030       t'_8  = 15
t_9  = 302        t'_9  = 13
t_10 = 3022       t'_10 = 13

We can now see that we only have to check t_2, t_9 and t_10 to see if they are actually matches. We see that t_2 and t_9 are matches, whereas t_10 is not. We say that t_10 is a spurious hit. So how is the above going to help us? Can we compute the above numbers quickly? YES! We can. We use an approach which is similar to Horner's rule for hash functions.

1   compute_modular_values (string P) {
2
3     p' := 0
4     for j from 1 to m do
5       p' := (10 p' + P[j]) mod q
6
7     t'_1 := 0
8     for i from 1 to m do
9       t'_1 := (10 t'_1 + T[i]) mod q
10
11    for i from 2 to n-m+1 do
12      t'_i := (10 (t'_{i-1} - 10^{m-1} T[i-1]) + T[i+m-1]) mod q
13  }

In lines 3-5 we compute p' by noting that p = 10^{m-1} P[1] + 10^{m-2} P[2] + ... + 10 P[m-1] + P[m] modulo q. This is equal to the following, where we may take modulo q at any intermediate calculation: 10(10(...(10 P[1] + P[2]) + P[3])...) + P[m] modulo q. In lines 7-9 we compute t'_1 in the same way as we computed p'. In lines 11-12 we compute t'_i for all i = 2, 3, ..., n-m+1 using the formula below.
t_i = 10^{m-1} T[i] + 10^{m-2} T[i+1] + ... + 10 T[i+m-2] + T[i+m-1]
    = 10 (10^{m-1} T[i-1] + 10^{m-2} T[i] + ... + T[i+m-2] - 10^{m-1} T[i-1]) + T[i+m-1]
    = 10 (t_{i-1} - 10^{m-1} T[i-1]) + T[i+m-1]

Note that the function compute_modular_values only uses time O(m + n), as

lines 3-5 take O(m) time while lines 7-12 take O(n) time. Now the Rabin-Karp algorithm can be written as follows:

1   rabin_karp (string T, string P) {
2
3     compute_modular_values(P)
4     for i from 1 to n-m+1 do {
5       if (p' == t'_i) {
6         bool match:=true
7         for j from 1 to m do
8           if (P[j] != T[i+j-1])
9             match:=false
10        if (match)
11          print << "Pattern occurs with shift " << i-1
12      }
13    }
14  }

The advantage of Rabin-Karp over the naive string matching algorithm is that if P does not occur in T with shift s, then p' should only be equal to t'_{s+1} with probability 1/q. So if q is a large prime number (why prime?) then there should be very few spurious hits. This means that we would expect to get the following time complexity if P occurs in T h times.

O(n + m) + h O(m) + (n - m + 1 - h) ( (1/q) O(m) + ((q-1)/q) 1 )

This is bounded above by the following formula

O(n + m + hm + m(n - m + 1)/q)

We note that if h is of the same order as n - m, which is the case if P = aaaaa..aaa and T = aaaa...aaaaa, then this is still O(m(n - m + 1)), so there is no improvement over the naive string matching algorithm. However if h = 0, then we would expect to use at most O(n + m + m(n - m + 1)/q) time, which is a lot better than the naive string matching algorithm when q is large. Note that this is only on average, and some strings T and patterns P may use more operations than this (if the number of spurious hits is considerably larger than expected). This completes the Rabin-Karp algorithm when all characters were in the set {0,1,...,9}. What happens if the characters can be any ASCII value (i.e. they have a value between 0 and 255)? Then the following algorithm will work. You should convince yourself of this.

1   rabin_karp_ascii (string T, string P) {
2
3     p' = 0
4     for j from 1 to m do
5       p' = (256 p' + P[j]) mod q
6
7     t'_1 = 0
8     for i from 1 to m do
9       t'_1 = (256 t'_1 + T[i]) mod q
10
11    for i from 2 to n-m+1 do
12      t'_i = (256 (t'_{i-1} - 256^{m-1} T[i-1]) + T[i+m-1]) mod q
13
14    for i from 1 to n-m+1 do
15      if (p' == t'_i) {
16        bool match=true
17        for j from 1 to m do
18          if (P[j] != T[i+j-1])
19            match=false
20        if (match)
21          print << "Pattern occurs with shift " << i-1
22      }
23    }
24  }

Note that the value 256^{m-1} should be pre-computed, so that we do not have to compute it from scratch every time line 12 is performed. For an algorithm with worst case time complexity O(n + m), see appendix C.
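As a rough Java sketch (not from the notes), the digit version of Rabin-Karp might be implemented as follows, using 0-indexed strings; the choice of modulus q is left to the caller, and q = 17 below simply reproduces the worked example.

    public class RabinKarp {
        static void search(String t, String p, int q) {
            int n = t.length(), m = p.length();
            if (m > n) return;
            long high = 1;                                // high = 10^(m-1) mod q
            for (int i = 0; i < m - 1; i++) high = (high * 10) % q;
            long pHash = 0, tHash = 0;
            for (int i = 0; i < m; i++) {                 // hashes of P and of the first window of T
                pHash = (10 * pHash + (p.charAt(i) - '0')) % q;
                tHash = (10 * tHash + (t.charAt(i) - '0')) % q;
            }
            for (int s = 0; s + m <= n; s++) {
                if (pHash == tHash && t.regionMatches(s, p, 0, m))   // verify to rule out spurious hits
                    System.out.println("Pattern occurs with shift " + s);
                if (s + m < n) {                          // roll the hash on to the next shift
                    long sub = (t.charAt(s) - '0') * high % q;
                    tHash = (10 * (tHash - sub + q) + (t.charAt(s + m) - '0')) % q;
                }
            }
        }

        public static void main(String[] args) {
            search("4030201503022", "0302", 17);          // prints shifts 1 and 8
        }
    }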

Chapter 6 Game Theory


Computers are getting better and better at playing games. The best chess computers can now compete with the best human players and there are hardly any common games for which you cannot buy a computer game. So how do you write programs that are good at playing games? We will look at the most common method, called the min-max algorithm, when considering games such as chess, tic-tac-toe, draughts, chinese checkers, reversi, 4-in-a-row, 5-in-a-row, and many many more. As speed is a very important part of this kind of game theory, we will also look at ways of speeding up the algorithms. The most common way of increasing the speed is to use the so-called alpha-beta pruning. In this chapter we restrict ourselves to two-person games, where the two players alternate making moves. We will furthermore assume that there are no elements of chance (as there are in poker, backgammon, etc). The first game we will look at is called drop-down tic-tac-toe.

6.1

Drop-down tic-tac-toe
Consider a version of tic-tac-toe where we can only place a piece in coordinate (i, j) (that is, in row i and column j) if i = 1 or if there is already a piece in coordinate (i - 1, j). See below for an example of how the game may develop.

[Diagrams: boards 1-8 of an example game, with x and o alternately adding pieces to the 3 x 3 grid.]

It is called drop-down tic-tac-toe as you may think of each piece having to drop all the way to the bottom of the 3 x 3 grid when dropped from the top of a column. The reason we consider this game instead of normal tic-tac-toe is that it actually requires (a little) more thought to play than tic-tac-toe, but more importantly there are fewer legal moves in each turn. Therefore we can actually analyse this game completely. Could x have won the match above, or got a draw, if he had played better? Before answering this question we will try to decide if x could have won the match above, or got a draw, if he had played better from board 2 onwards. We will give a value to each board according to the following rules.

 1 = o will win if both players play optimally.
 0 = it will end in a draw if both players play optimally.
-1 = x will win if both players play optimally.

It is easy to give a value to a board where the game is finished, as below.

[Diagrams: three finished boards with values 1, 0 and -1.]

But how do we decide the board value for other boards? If it is o's turn then we will assume that o makes the move that is best for o. So if we know the value for each board that can result after o's move then o will choose the move resulting in the maximum possible value. Similarly, if it is x's turn we will assume that x makes the move that is optimal for x. In other words, x will place a piece resulting in a board of minimum value. In Figure 6.1 we know the board values of the right-most boards, as they represent finished games. In the boards in the second right-most column the game is either finished or it is x's turn. If it is x's turn then he picks the move that minimizes the board value (the board value is the number to the right of the board). In the third right-most column the game is either finished or it is o's turn. If it is o's turn then he picks the move that maximizes the board value. By continuing this process we see that our left-most board gets the value 1. Therefore o will always win if he always chooses the move that maximizes the board value (no matter what x does). The tree in Figure 6.1 is called a min-max tree, as we alternately take minimum values and maximum values when we move through the tree from the bottom up. So if x always starts the game, do you think there is a winning strategy for x (that is, can x always win if he plays optimally)? Or is there a winning strategy for o? Or will the game always end in a draw if both players play optimally? In order to decide this we need to build a min-max tree as before, but starting from the empty board. See Figure 6.2 for the first part of the tree. We see that the board value of the empty board is 0, which implies that if both players play optimally then the game will always end in a draw. So the min-max tree tells us who will win the game (if both players play optimally) and what moves they should make at any given board. As games


Figure 6.1 A min-max tree for the drop-down tic-tac-toe.


Figure 6.2 First part of the min-max tree for the drop-down tic-tac-toe.

such as chess, draughts and chinese checkers can all be analysed using min-max trees, you may think that all of these games have been analysed in this way. However, to build the full min-max tree for chess we would need billions of years on the fastest computer in the world, so this is not feasible. However there are ways around the above time problems, which we will discuss in the following sections. We can speed up our algorithm using a technique called alpha-beta pruning, and we can use what is called an evaluation function instead of building the whole min-max tree.

6.2

Evaluation functions
Even though it would take billions of years to compute a move for the game of chess if we had to build the complete min-max tree, we can still use the min-max tree to decide on good moves in chess. We do this by not building the complete min-max tree, but only part of it. At the bottom of the tree we will then not have a finished game, but some unfinished board instead. So we will have to give a board value to such an unfinished board. For chess this is typically done by giving a value to each type of piece (e.g. pawn=1, bishop=5, knight=5, rook=10,...) and summing up the value of all the white pieces and subtracting the value of all the black pieces. If either side is in checkmate then the value should be either +∞ or -∞ depending on who has won. This is a very simple evaluation function for chess, and more complicated ones take other factors such as double-pawns into account. The whole purpose of an evaluation function is to give a value to the board which indicates who is winning. For chess this will typically be a value which gets larger if the board gets better for white, and will be small (i.e. have a large negative value) if the board is good for black. However, as chess is a very complicated game, we will illustrate the evaluation function using normal tic-tac-toe (not drop-down tic-tac-toe). We want to give a value to a board, such that the higher the value the better the board is for o and the smaller the value the better the board is for x. We will give an example of such a function below. Given a board in tic-tac-toe there are 3 rows, 3 columns and 2 diagonals. It is good for o if many of these rows/columns/diagonals have lots of o's and no x's. If a row/column/diagonal contains both o's and x's then this is equally good/bad for both o and x. Therefore we can assign a value to each row/column/diagonal using the table below.

Pieces in the row/column/diagonal     Value of that row/column/diagonal
one o and no x                         1
two o's and no x                       3
ooo                                   +∞
one x and no o                        -1
two x's and no o                      -3
xxx                                   -∞
All other                              0

The evaluation function now sums up the 8 values we get by using the above table on the 3 rows, 3 columns and 2 diagonals in our board. So what value will our evaluation function return when used on the board to the right?

[Diagram: the example board - column 1 contains o, o, x, column 2 contains x, x and a blank, and column 3 contains two blanks and an o.]

We get the following values (where diagonal 1 is from the bottom left to the top right).
Row   Pieces   Value     Column   Pieces   Value     Diagonal   Pieces   Value
1     xo       0         1        oox      0         1          xx       -3
2     ox       0         2        xx       -3        2          oxo      0
3     ox       0         3        o        1

So the value of the board is 0 + 0 + 0 + 0 - 3 + 1 - 3 + 0 = -5. This indicates that the board is good for x, which is true. Verify that the following boards have the following values.

[Diagrams: four boards with values +∞, 8, -∞ and 0 respectively.]

Note that the evaluation function is not always accurate, as whether a board is good or bad for o also depends on whose turn it is. The board below is very good for o if it is o's turn, even though it gets a negative value.

[Diagram: a board containing two o's and three x's which has value -4 but is winning for o if it is o's turn to move.]

However, as all the boards we will use the evaluation function on will have the same person to move, we will not incorporate whose move it is into the evaluation function. We also note that we are not claiming that the evaluation function always tells us accurately who is leading or who will win the game; it is only an indication of who seems to be leading. As we normally have to compute the evaluation function for thousands or millions of boards, the main priority for the evaluation function is that we can compute it quickly. We will discuss what makes good evaluation functions later. First we will see how we use the evaluation function together with the min-max tree. Instead of computing the whole min-max tree, which normally will be too time consuming, we will only compute the tree down to a given depth, denoted by D. The higher the depth the slower the program will run, but the better moves it will find. At the bottom of the tree (i.e. at depth D) we will then use our evaluation function. At all other depths we will use the normal min-max strategy, if the board has any children. If it has no children it means the game was finished and we use our evaluation function to determine who has won.
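As a rough illustration (not from the notes), the table-based evaluation above might be coded in Java as follows; representing the board as a 3 x 3 char array of 'o', 'x' and ' ' is an assumption, and the large constant INF merely stands in for the infinite win/loss values.

    public class TicTacToeEval {
        static final int INF = 1000;   // stands in for the +/- infinity values in the table

        // Value of one row/column/diagonal given its counts of o's and x's.
        static int lineValue(int os, int xs) {
            if (os > 0 && xs > 0) return 0;          // mixed lines are worth nothing
            if (os == 3) return  INF;
            if (xs == 3) return -INF;
            if (os == 2) return  3;
            if (xs == 2) return -3;
            if (os == 1) return  1;
            if (xs == 1) return -1;
            return 0;                                // empty line
        }

        static int evaluate(char[][] b) {
            int[][][] lines = {                      // 3 rows, 3 columns, 2 diagonals
                {{0,0},{0,1},{0,2}}, {{1,0},{1,1},{1,2}}, {{2,0},{2,1},{2,2}},
                {{0,0},{1,0},{2,0}}, {{0,1},{1,1},{2,1}}, {{0,2},{1,2},{2,2}},
                {{0,0},{1,1},{2,2}}, {{0,2},{1,1},{2,0}}
            };
            int total = 0;
            for (int[][] line : lines) {
                int os = 0, xs = 0;
                for (int[] cell : line) {
                    char c = b[cell[0]][cell[1]];
                    if (c == 'o') os++;
                    else if (c == 'x') xs++;
                }
                total += lineValue(os, xs);
            }
            return total;
        }
    }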

Figure 6.3 A min-max tree for tic-tac-toe, with depth D = 3.

See Figure 6.3 for an example of the min-max tree with depth D = 3, using the evaluation function above and starting at the board to the right. We see from the min-max tree in Figure 6.3 which move x should make as his first move! He should make a move which results in a board with value -∞. If he does that then no matter what move o makes it will result in a board of value -∞. Continuing this way x can make sure that the last board will have value -∞, which means x has won.

[Diagram: the starting board, which already contains one o and one x.]

See Figure 6.4 for the first part of the min-max tree with depth 3, when starting from the empty board. See Figure 6.5 for the first part of the min-max tree with depth 4, when starting from the empty board. In both cases we see that the best move x can make is to start in the middle of the board. The best move o can then make is to put a piece in one of the corners.

6.3

Thoughts on min-max trees


There are many decisions to be made when you want to use the min-max algorithm. We will discuss a few of the common ones here. What depth should you use? This normally depends on the speed of your computer, the complexity of your evaluation function, the patience of the user and many more factors. The deeper the depth the better the computer will play, but the slower it will be. In many games there are also more possible moves at the beginning of the game, compared to the end of the game, so some programs use different depths depending on the state of the game. Another option is to use a certain (small) depth and, if that went very quickly, try again with a depth one larger, continuing like this until it becomes too slow. What evaluation function should you use? This is often the most difficult part of producing a good game-playing computer program. It often requires a good knowledge of the game we are trying to program, so there is basically no easy answer to this question. This brings us to the next question. Should you have an advanced evaluation function, or should you search to a deeper depth? This is again a difficult question to answer. In chess there are often around 30 possible moves at a certain position (even though it varies a lot depending on the state of the game), so if the evaluation is made 30 times slower, we will have to search to a depth one less if the speed shouldn't suffer. Often experiments will be run to decide what is best. You may basically try the simple and fast evaluation function and then try the better but slower evaluation function, and see which performs best. You may even make the computer play itself using the two different strategies. What if two moves are equally good? If the min-max tree indicates that two moves are equally good we may do several different things. Often we just pick the first move we found. We may also choose to pick a random move, so the computer doesn't get too predictable. Or we may try to obtain more information on the two moves. There are many more decisions to be made when writing a games program. Many of these are also related to the alpha-beta pruning which we will discuss in the

next section.



Figure 6.4 Part of the min-max tree for tic-tac-toe, starting at the empty board, with depth D = 3.


Figure 6.5 Part of the min-max tree for tic-tac-toe, starting at the empty board, with depth D = 4.


6.4

Alpha-Beta pruning and pseudo codes


We will now give a short overview of the pseudocodes for the min-max algorithm and the Alpha-Beta pruning. These notes are only designed to give a rough overview of the area and you should be taking notes at the lectures.

6.5

Pseudocode for the min-max algorithm


The following is the pseudocode for the min-max algorithm.

MinMax(Board,depth,turn) {
  If (depth=0) or (the game is over)
    return value of board
  If (turn=o) {        // Max-step
    val=-M;
    for all possible moves, Q, that o has {
      Board= Board after move Q.
      val=max(val,MinMax(Board,depth-1,swap(turn)))
    }
  } else {             // Min-step
    val=M;
    for all possible moves, Q, that x has {
      Board= Board after move Q.
      val=min(val,MinMax(Board,depth-1,swap(turn)))
    }
  }
  return val
}

The input contains a Board, which is a representation of a given board for which we want to compute the value which the min-max tree gives us. It also contains a variable to indicate to what depth we shall search and a variable to indicate whose turn it is. We assume that o wants to maximize the value of the board and x wants to minimize the value of the board. If the depth is zero or if somebody has already won the game, then we just return the value which our evaluation function gives us. Otherwise we check whose turn it is. If it is o's turn then we need to find the maximum value of the children of the board we are considering. We use val to store the maximum value we have found so far. We then make recursive calls to all possible boards that can be obtained by placing an o, where each recursive call will be searched to a depth of one less and it will be the other player's turn. If it is x's turn we need to find the minimum value of the children of the board we are considering instead of the maximum. It is a good idea to step through the above algorithm on some of the examples that are given in class.
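As a rough Java sketch (not from the notes), the same algorithm could be written against a hypothetical Game interface; the interface, its method names and the constant M below are all assumptions made only for the illustration.

    import java.util.List;

    interface Game {
        boolean isOver();
        int evaluate();               // board value from o's point of view
        List<Game> moves(char turn);  // positions reachable by the player to move
    }

    class MinMax {
        static final int M = 1_000_000;   // larger than any evaluation value

        static int minMax(Game board, int depth, char turn) {
            if (depth == 0 || board.isOver()) return board.evaluate();
            int val = (turn == 'o') ? -M : M;
            for (Game next : board.moves(turn)) {
                int child = minMax(next, depth - 1, turn == 'o' ? 'x' : 'o');
                val = (turn == 'o') ? Math.max(val, child) : Math.min(val, child);
            }
            return val;
        }
    }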


6.6

Alpha-Beta pruning.
Alpha-Beta pruning is a method to delete large parts of the min-max tree, without losing any information. For example, consider the situation:

                 A              MAX
               /   \
              B     C           MIN
             / \   /|\
           10   8 6 ? ?

We note that B will get the value 8, as this is the minimum of 10 and 8. When we consider C the first child has the value 6. This means that the value of C is at most 6 (as C = min(6, ?, ?)) no matter what the values of the other children of C are. But this means that A will have the value 8 (as A = max(8, 6)), as the value of B is guaranteed to be larger than that of C (C <= 6 < 8 = B). This means that we do not have to consider the other children of C at all, as we already know the value of A. Alpha-Beta pruning deletes parts of the tree using the above approach. Of course we do not have to examine the nodes marked ? in the tree below either (if we examine nodes in a left-to-right order). You should convince yourself of this!

                 S                 MIN
           /     |      \
          U      V       W         MAX
          |     / \     / | \
         10   12   ?   8  14 ?

In order to write a program that deletes part of the tree according to the above approach we will for each board we consider have two variables Alpha and Beta. We will only be interested in values that are larger than Alpha and smaller than Beta. If the value of a node is smaller than Alpha our function will just return Alpha instead of the actual value. Analogously if the value of a node is greater than Beta then we will just return Beta instead of the actual value. For the example above we will have the following function calls (which is explained on the next page and will be further explained at the lecture):
MinMax(S): Alpha=-infinity, Beta=infinity
  MinMax(U)
    Alpha=-infinity, Beta=infinity
    MinMax(10) return 10
    Set Alpha=10
    return 10
  Set Beta=10
  MinMax(V)
    Alpha=-infinity, Beta=10
    MinMax(12) return 12
    As 12>Beta return 10
  MinMax(W)
    Alpha=-infinity, Beta=10
    MinMax(8) return 8
    Set Alpha=8
    MinMax(14) return 14
    As 14>Beta return 10
  Return 10
Return 10

So in order to compute the value of S we first compute the value of U, which is 10. Therefore we set Beta equal to 10, as for all the following children of S we are only interested in their value if it is less than 10 (i.e. greater than Alpha and less than Beta). If it is greater than 10 we will never use the value anyway! So as soon as a child of V has value greater than 10 we know that V will have value greater than 10, so we just return 10 (as we do not care what the actual value of V is). The same approach is followed when we consider W.

6.6.1

Pseudocode for the Alpha-Beta pruning

The following is the pseudocode for the min-max algorithm with Alpha-Beta pruning.

MinMax_AB(Board,depth,turn,A,B) {
  If (depth=0) or (the game is over)
    return value of board
  If (turn=o) {        // Max-step
    for all possible moves, Q, that o has {
      Board= Board after move Q.
      A=max(A,MinMax_AB(Board,depth-1,swap(turn),A,B))
      If A>=B return B
    }
    return A
  } else {             // Min-step
    for all possible moves, Q, that x has {
      Board= Board after move Q.
      B=min(B,MinMax_AB(Board,depth-1,swap(turn),A,B))
      If A>=B return A
    }
    return B
  }
}
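A rough Java counterpart of the pruned version, reusing the hypothetical Game interface from the earlier min-max sketch, might look like this:

    class MinMaxAB {
        static int minMaxAB(Game board, int depth, char turn, int a, int b) {
            if (depth == 0 || board.isOver()) return board.evaluate();
            if (turn == 'o') {                       // Max-step: o raises alpha
                for (Game next : board.moves('o')) {
                    a = Math.max(a, minMaxAB(next, depth - 1, 'x', a, b));
                    if (a >= b) return b;            // cut off: the parent will never pick this branch
                }
                return a;
            } else {                                 // Min-step: x lowers beta
                for (Game next : board.moves('x')) {
                    b = Math.min(b, minMaxAB(next, depth - 1, 'o', a, b));
                    if (a >= b) return a;
                }
                return b;
            }
        }
    }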

We will now show which steps are performed on the following example.

                 S                 MAX  o
           /     |      \
          U      V        W
          |     / \     / | \
          C    D   E   F  G  H         MIN  x
         14   27  22  20 10 40

MinMax_AB(S,2,o,-infinity,+infinity):
  A=-infinity, B=infinity
  Board = U
  A=max(A,MinMax_AB(U,1,x,-infinity,+infinity)):
    A=-infinity, B=infinity
    Board = C
    B=min(B,MinMax_AB(C,0,o,-infinity,+infinity)):
      return 14
    So B=min(infinity,14)=14
    return 14
  So A=max(-infinity,14)=14
  Board = V
  A=max(A,MinMax_AB(V,1,x,14,infinity)):
    A=14, B=infinity
    Board = D
    B=min(B,MinMax_AB(D,0,o,14,+infinity)):
      return 27
    So B=min(infinity,27)=27
    Board = E
    B=min(B,MinMax_AB(E,0,o,14,27)):
      return 22
    So B=min(27,22)=22
    return 22
  So A=max(14,22)=22
  Board = W
  A=max(A,MinMax_AB(W,1,x,22,infinity)):
    A=22, B=infinity
    Board = F
    B=min(B,MinMax_AB(F,0,o,22,+infinity)):
      return 20
    So B=min(infinity,20)=20
    As A=22>20=B we return 22
  So A=max(22,22)=22
  return A=22

6.6.2

General thoughts on Alpha-Beta pruning

Alpha-Beta pruning makes no difference to the speed of the algorithm if at each step of the algorithm we consider the worse moves before the better moves. This

is the case as we only delete a child of a node A from the tree if we know that a previous child of A is guaranteed to produce a better move. On the other hand, if we in every step of the algorithm consider the best move first, then Alpha-Beta pruning may delete a huge part of the tree. In fact, if the min-max tree has N nodes we may have only O(sqrt(N)) nodes in the tree if we use Alpha-Beta pruning. For example, if a chess game has approximately 35 legal moves at each stage of the game and we search to depth 8, then we will consider close to 2,300 billion nodes (i.e. 35^8 is about 2,300,000,000,000), whereas Alpha-Beta may (in the best case) reduce the tree to around 1.5 million nodes (i.e. sqrt(35^8) = 35^4 is about 1,500,000). So it is very important to try and consider better moves before worse moves! One common approach is to run the algorithm at a smaller depth in order to try and determine which moves are likely to be best. We could for example run the algorithm for depths 2, 3, 4, ... until we run out of time, but use the result of depth i - 1 in order to decide which order to visit nodes when we run the algorithm with depth i. There are other more advanced algorithms that can also speed up the min-max tree, but Alpha-Beta pruning is the easiest to implement and very effective. It is used in most games involving two players, where there is perfect information and no element of chance.

Bibliography
[Bir94] Peter J. Bird. LEO, the first business computer. Hasler Publishing, 1994.

[CLR90] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to algorithms. MIT Press, 1990.
[HP96] John Hennessy and David A. Patterson. Computer architecture, a quantitative approach. Morgan Kaufmann, second edition, 1996.

[Knu73] Donald Knuth. The art of computer programming. Volume 3: sorting and searching. Addison Wesley, 1973.
[NN96] Richard Neapolitan and Kumarss Naimipour. Foundations of algorithms. D. C. Heath and Co., 1996.
[TA86] Aaron M. Tenenbaum and Moshe J. Augenstein. Data structures using Pascal. Prentice Hall, second edition, 1986.

Appendix A CS2860 questions


1. Write out the insertion sort algorithm which sorts arrays of integer records into ascending order. Count the number of record copy and compare operations required to sort the array [2,1,3,4]. 2. Write out the bubble sort algorithm given in class which sorts arrays of integer records into ascending order. Show all the steps carried out when this algorithm is applied to the array [5,4,3,1,2] (draw the array after every iteration of the INNER loop). 3. Describe the binary search algorithm. Give the worst case number of compare operations required to nd a given element in an array containing 128 elements using binary search, justifying your answer. Give one reason why we might use (a) binary search (b) linear search 4. Write out the second version of the merge sort algorithm given in class, merge sort1, which uses a temporary array to hold values while the array is being sorted. What is the worst case order of this algorithm? Give an example of an array of four elements which requires the worst case number of operations to sort it using merge sort1. 5. Write a quick sort routine which sorts arrays of integers into increasing order, and whose pivot value is the value which is a third (1/3) of the way along the array. 6. Explain what is meant by a heap in the heap sort method discussed in class. Give a brief description of the sift algorithm, and explain how it is used to create the heap sort algorithm.

Use the makeHeap algorithm to turn the array [3,4,6,1,2,4,7] into a heap, showing the array at each step. 7. On the same axes, sketch the graphs of log x, x, x log2 x, x^2 and 2^x. Explain what is meant by an algorithm which has exponential order. Discuss why algorithms of exponential order are unacceptable for implementation on any computer. 8. Explain what is meant by the order of an algorithm, including in your explanation an explanation of O(f(n)), Omega(f(n)), Theta(f(n)) and o(f(n)). What are the worst case, best case and average case orders of (a) bubble sort (b) merge sort 1 (c) quick sort? You are given an array of 100 integers, randomly sorted. Suppose that you are given an integer n and told that there is a probability of 1/250 that the integer is in the array. Give the expected number of compare operations carried out by a linear search algorithm which returns the first, left-most, index which contains n, or -1 if no such index exists. (Justify your answer.) 9. What is a binary search tree? Write out an algorithm for searching a binary search tree for a given key. For the array [23,14,10,18,18,32,1,45,56] draw the binary search tree that results from adding the elements to the tree in strict left-to-right order. 10. What is a balanced binary tree? Give a definition of the balance of a tree and specify which balance values a balanced tree may exhibit. With the aid of a diagram, describe the left and right rotation operations used to maintain tree balance during insertion. Write a sketch of an insertion routine for balanced binary trees, including a description of the balancing algorithm and the special cases it has to deal with. 11. Briefly explain what is meant by (a) a multiway search tree (b) a B-tree. Describe the insertion algorithm for a 3,2-tree. Use the insertion algorithm to build a 3,2-tree from the input [1, 7, 9, 4, 36, 5, 6, 8], where the elements are input in strict left-to-right order so 1 is input first and 8 is input last, showing the trees constructed at each step.

12. Write out the naive string matching algorithm given in class. What is the worst case order of this algorithm? How many compare operations are used if the text, T, has length 20 and the pattern, P, has length 5? 13. Explain how the Rabin-Karp algorithm works. In which cases is Rabin-Karp considerably faster than the naive string matching algorithm? What is the worst case order of Rabin-Karp? 14. Define the prefix function used in the Knuth-Morris-Pratt algorithm. What is the prefix function for the pattern aabaaabaab? Show how the Knuth-Morris-Pratt algorithm would run if we are looking for occurrences of the pattern P = aabaaabaab in the text T = aabaaabaabaaabaab. 15. Explain in words what the min-max algorithm is. What is the main purpose of alpha-beta pruning? Explain in words what alpha-beta pruning is.

Appendix B Some useful inductive proofs


We will throughout the course be giving simple proofs using induction. So here we will describe what induction is, before giving a couple of examples of inductive proofs. If we want to prove that some statement holds for all integers n, then one way of doing this is to show that it holds for n = 1, and n = 2, and n = 3, etc. However, we would never finish! Therefore a better way is to show that it holds for n = 1 (or n = 0), and then show that if it holds for all smaller values of n then it also holds for n, where n is arbitrary. If we have shown the above, then clearly it holds for n = 1 (we have shown this), and for n = 2 (as we know it held for 1, which is all smaller values than 2), and for n = 3 (as we know it held for 1 and 2, which are all smaller values than 3), etc. So we have shown that it must hold for n = 1, 2, 3, 4, ..., which was the desired result.

B.1

Examples
Theorem 1: 1 + 2 + ... + k = k(k+1)/2

Proof, by induction on k: We consider the following steps.

Basis step, k = 1: 1 = (1 * 2)/2, so OK.

Induction step: Let k > 1. Assume that the statement holds for all k' in {1, 2, ..., k-1}. Prove that it holds for k.

1 + 2 + ... + k = (1 + 2 + ... + (k-1)) + k
                = (k-1)k/2 + k            (by our assumption)
                = (k^2 - k + 2k)/2
                = (k^2 + k)/2
                = k(k+1)/2

So the statement holds for all k.

Q.E.D.

Theorem 2: If W(n) = 2W(n/2) + 3n - 1 and W(1) = 0, then W(n) = 3n log(n) - n + 1, for all n = 2^r, for some integer r.

Proof, by induction on r: We consider the following steps.

Basis step, r = 0 (i.e. n = 1): W(n) = 0 = 3*1*0 - 1 + 1, so OK.

Induction step: Assume that the theorem is true for all r' with 0 < r' < r.

W(n) = 2W(n/2) + 3n - 1
     = 2(3(n/2) log(n/2) - n/2 + 1) + 3n - 1
     = 3n(log(n) - 1) - n + 2 + 3n - 1
     = 3n log(n) - n + 1

The above holds as n/2 = 2^{r-1} and n = 2^r, so we can use our inductive hypothesis on n/2, as r - 1 < r. So the statement holds for all n. Q.E.D.
