
UNIT V SORTING AND SEARCHING EC6301

Sorting

One of the fundamental problems of computer science is ordering a list of items. There is a plethora of
solutions to this problem, known as sorting algorithms. Some sorting algorithms are simple and intuitive,
such as the bubble sort. Others, such as the quick sort, are more complicated but produce lightning-fast
results.

Definitions:

Sorting algorithms are divided into two categories: internal and external sorts.

Internal Sort:

Any sort algorithm which uses main memory exclusively during the sort. This assumes
high-speed random access to all memory.

External Sort:

Any sort algorithm which uses external memory, such as tape or disk, during the sort.

Note: Algorithms may read the initial values from magnetic tape or write sorted values

to disk, but this is not using external memory during the sort. Note that even though
virtual memory may mask the use of disk, sorting sets of data much larger than main
memory may be much faster using an explicit external sort.

Stable Sort:

A sort algorithm is said to be stable if multiple items that compare as equal keep their
original relative order after the sort; e.g., two records with the same key appear in the
output in the same order as in the input.

Insertion Sort

Logic: Here, sorting takes place by inserting a particular element at its appropriate position, hence
the name insertion sort. In the first iteration, the second element A[1] is compared with the first
element A[0]. In the second iteration the third element is compared with the first and second elements. In
general, in every iteration an element is compared with all the elements before it. While comparing, if a
suitable position for the element is found, space is created for it by shifting the larger elements one
position to the right, and the element is inserted at that position. This procedure is repeated for all the
elements in the list.

If we reverse the comparison in the program below, it will give the array sorted in descending
order. Sorting can also be done by other methods, such as selection sort and bubble sort.

#include <stdio.h>

int main(void)
{
    int A[20], N, Temp, i, j;

    printf("\n\n\t ENTER THE NUMBER OF TERMS...: ");
    scanf("%d", &N);
    printf("\n\t ENTER THE ELEMENTS OF THE ARRAY...:");
    for(i = 0; i < N; i++)
        scanf("%d", &A[i]);

    for(i = 1; i < N; i++)
    {
        Temp = A[i];
        j = i - 1;
        /* Shift larger elements one place right; test j >= 0 first
           so that A[-1] is never read. */
        while(j >= 0 && Temp < A[j])
        {
            A[j+1] = A[j];
            j = j - 1;
        }
        A[j+1] = Temp;   /* insert the element at its position */
    }

    printf("\n\tTHE ASCENDING ORDER LIST IS...:\n");
    for(i = 0; i < N; i++)
        printf("\n\t\t\t%d", A[i]);
    return 0;
}

Heap Sort

The binary heap data structure is an array that can be viewed as a complete binary tree. Each
node of the binary tree corresponds to an element of the array. The array is completely filled on all levels
except possibly the lowest.

We represent heaps in level order, going from left to right. For example, the array corresponding
to the heap in the figure is [25, 13, 17, 5, 8, 3].

The root of the tree is A[1], and given the index i of a node, the indices of its parent, left child and
right child can be computed as follows:

PARENT (i)
    return floor(i/2)
LEFT (i)
    return 2i
RIGHT (i)
    return 2i + 1
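
These index computations translate directly into code. The following is a minimal sketch in C, assuming the 1-based indexing used in the text, with integer division supplying the floor:

/* Index arithmetic for a heap stored in an array with 1-based indices. */
int parent(int i) { return i / 2;     }   /* floor(i/2) by integer division */
int left(int i)   { return 2 * i;     }
int right(int i)  { return 2 * i + 1; }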

Let's try these out on a heap to make sure we believe they are correct. Take this heap,


which is represented by the array [20, 14, 17, 8, 6, 9, 4, 1].

We'll go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left
child, we calculate 1 * 2 = 2. This takes us (correctly) to the 14. Now, we go right, so we
calculate 2 * 2 + 1 = 5. This takes us (again, correctly) to the 6.

Now let's try going from the 4 to the 20. 4's index is 7. We want to go to the parent, so
we calculate 7 / 2 = 3, which takes us to the 17. Now, to get 17's parent, we calculate 3 / 2 =
1, which takes us to the 20.

Heap Property

In a heap, for every node i other than the root, the value of the node is at most the value of its
parent:

A[PARENT (i)] ≥ A[i]

Thus, the largest element in a heap is stored at the root.

Following is an example of a heap:

By the definition of a heap, all the tree levels are completely filled except possibly for the
lowest level, which is filled from the left up to a point. Clearly a heap of height h has the
minimum number of elements when it has just one node at the lowest level. The levels above
the lowest level form a complete binary tree of height h − 1 with 2^h − 1 nodes. Hence the minimum
number of nodes possible in a heap of height h is 2^h. Clearly a heap of height h has the
maximum number of elements when its lowest level is completely filled. In this case the heap is
a complete binary tree of height h and hence has 2^(h+1) − 1 nodes. For example, a heap of
height 2 has at least 2^2 = 4 and at most 2^3 − 1 = 7 nodes.

The following is not a heap: although it has the heap property, it is not a complete binary tree.
Recall that to be complete, a binary tree has to fill up all of its levels with the possible exception
of the last one, which must be filled in from the left side.

Height of a node

We define the height of a node in a tree to be the number of edges on the longest simple
downward path from the node to a leaf.

Height of a tree

The number of edges on a simple downward path from the root to a leaf. Note that the
height of a complete binary tree with n nodes is floor(lg n), which is Θ(lg n). This implies that an
n-element heap has height floor(lg n).

In order to show this, let the height of the n-element heap be h. From the bounds
obtained on the maximum and minimum number of elements in a heap, we get

2^h ≤ n ≤ 2^(h+1) − 1

where n is the number of elements in the heap, and therefore

2^h ≤ n < 2^(h+1)

Taking logarithms to the base 2,

h ≤ lg n < h + 1

It follows that h = floor(lg n).

We know from the above that the largest element resides in the root, A[1]. A natural question
to ask is where in a heap the smallest element might reside. Consider any path from the root of the
tree to a leaf. Because of the heap property, as we follow that path, the elements are either
decreasing or staying the same. If it happens to be the case that all elements in the heap are
distinct, then the above implies that the smallest is in a leaf of the tree. It could also be that an
entire subtree of the heap is the smallest element, or indeed that there is only one element in
the heap, which is then the smallest element, so the smallest element is everywhere. Note that
anything below the smallest element must equal the smallest element, so in general, only entire
subtrees of the heap can contain the smallest element.

Inserting Element in the Heap

Suppose we have a heap as follows

Let's suppose we want to add a node with key 15 to the heap. First, we add the node to
the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree
remains complete.


173
UNIT V SORTING AND SEARCHING EC6301
Now we do the same thing again, comparing the new node to its parent. Since 14 < 15,
we have to do another swap:

Now we are done, because 15 ≤ 20.
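
The swap-with-parent procedure just described can be written as a short routine. The following is a minimal sketch in C, assuming 1-based indexing and that the caller tracks the heap size and has room for one more element (heapInsert and its parameters are illustrative names, not part of the original text):

/* Insert 'key' into the max-heap A[1..*heapSize], growing it by one. */
void heapInsert(int A[], int *heapSize, int key)
{
    int i = ++(*heapSize);          /* next free spot at the lowest level */
    A[i] = key;
    /* Swap the new key with its parent until the heap property holds. */
    while (i > 1 && A[i / 2] < A[i])
    {
        int tmp = A[i];
        A[i] = A[i / 2];
        A[i / 2] = tmp;
        i = i / 2;                  /* move up to the parent */
    }
}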

Four basic procedures on heaps are:

1. Heapify, which runs in O(lg n) time.
2. Build-Heap, which runs in linear time.
3. Heap Sort, which runs in O(n lg n) time.
4. Extract-Max, which runs in O(lg n) time.

Maintaining the Heap Property

Heapify is a procedure for manipulating heap data structures. It is given an array A and
an index i into the array. The subtrees rooted at the children of A[i] are heaps, but node A[i] itself
may possibly violate the heap property, i.e., A[i] < A[2i] or A[i] < A[2i + 1]. The procedure
Heapify manipulates the tree rooted at A[i] so that it becomes a heap. In other words, Heapify
lets the value at A[i] "float down" in the heap so that the subtree rooted at index i becomes a heap.

Outline of Procedure Heapify

Heapify picks the largest child key and compares it to the parent key. If the parent key is
larger, Heapify quits; otherwise it swaps the parent key with the largest child key, so that
the parent now becomes larger than its children.

It is important to note that the swap may destroy the heap property of the subtree rooted at
the largest child node. If this is the case, Heapify calls itself again using the largest child node as
the new root.

Heapify (A, i)

l ← LEFT(i)
r ← RIGHT(i)
if l ≤ heap-size[A] and A[l] > A[i]
    then largest ← l
    else largest ← i
if r ≤ heap-size[A] and A[r] > A[largest]
    then largest ← r
if largest ≠ i
    then exchange A[i] ↔ A[largest]
         Heapify (A, largest)
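
A direct C transcription of this pseudocode might look as follows. This is a sketch assuming 1-based indexing and a variable heapSize standing in for heap-size[A] (both assumptions, not part of the original text):

int heapSize;                          /* stands in for heap-size[A] */

void heapify(int A[], int i)
{
    int l = 2 * i;                     /* LEFT(i)  */
    int r = 2 * i + 1;                 /* RIGHT(i) */
    int largest;

    if (l <= heapSize && A[l] > A[i])
        largest = l;
    else
        largest = i;
    if (r <= heapSize && A[r] > A[largest])
        largest = r;
    if (largest != i)
    {
        int tmp = A[i];                /* exchange A[i] with A[largest] */
        A[i] = A[largest];
        A[largest] = tmp;
        heapify(A, largest);           /* recurse on the affected subtree */
    }
}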

Analysis

If we put a value at the root that is less than every value in the left and right subtrees, then
Heapify will be called recursively until a leaf is reached. To make the recursive calls traverse the
longest path to a leaf, choose values that make Heapify always recurse on the left child. It
follows the left branch when the left child is greater than or equal to the right child, so putting 0 at
the root and 1 at all other nodes, for example, will accomplish this. With such values
Heapify will be called h times, where h is the heap height, so its running time will be Θ(h) (since
each call does Θ(1) work), which is Θ(lg n). Since we have a case in which Heapify's running
time is Ω(lg n), its worst-case running time is Θ(lg n).

Example of Heapify
Suppose we have a complete binary tree somewhere whose subtrees are heaps. In the
following complete binary tree, the subtrees of 6 are heaps:

The Heapify procedure alters the heap so that the tree rooted at 6's position is a heap.
Here's how it works. First, we look at the root of our tree and its two children.

We then determine which of the three nodes is the greatest. If it is the root, we are
done, because we have a heap. If not, we exchange the appropriate child with the root, and
continue recursively down the tree. In this case, we exchange 6 and 8, and continue.


Now, 7 is greater than 6, so we exchange them.

We are at the bottom of the tree, and can't continue, so we terminate.


Building a Heap

We can use the procedure Heapify in a bottom-up fashion to convert an array A[1 . . n] into a
heap. Since the elements in the subarray A[floor(n/2) + 1 . . n] are all leaves, the procedure BUILD_HEAP
goes through the remaining nodes of the tree and runs Heapify on each one. The bottom-up order of
processing nodes guarantees that the subtrees rooted at the children of a node are heaps before
Heapify is run at that node.

BUILD_HEAP (A)

heap-size[A] ← length[A]
for i ← floor(length[A]/2) downto 1 do
    Heapify (A, i)

We can build a heap from an unordered array in linear time.
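
Continuing the sketch from the Heapify section above (same 1-based indexing and heapSize assumptions), BUILD_HEAP becomes:

/* Build a max-heap from an unordered array A[1..n] in O(n) time. */
void buildHeap(int A[], int n)
{
    int i;

    heapSize = n;
    for (i = n / 2; i >= 1; i--)       /* A[n/2 + 1 .. n] are all leaves */
        heapify(A, i);
}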

Heap Sort Algorithm

The heap sort combines the best of both merge sort and insertion sort: like merge sort, the
worst-case time of heap sort is O(n log n), and like insertion sort, heap sort sorts in place. The heap sort
algorithm starts by using procedure BUILD_HEAP to build a heap on the input array A[1 . . n]. Since the
maximum element of the array is stored at the root A[1], it can be put into its correct final position by
exchanging it with A[n] (the last element in A). If we now discard node n from the heap, the
remaining elements can be made into a heap. Note that the new element at the root may violate the heap
property; all that is needed to restore it is one call to Heapify.

HEAPSORT (A)

BUILD_HEAP (A)
for i ← length[A] downto 2 do
    exchange A[1] ↔ A[i]
    heap-size[A] ← heap-size[A] − 1
    Heapify (A, 1)

The HEAPSORT procedure takes time O(n lg n), since the call to BUILD_HEAP takes time O(n) and
each of the n -1 calls to Heapify takes time O(lg n).

Now we show that there are at most ceil(n/2^(h+1)) nodes of height h in any n-element heap. We need
two observations to show this. The first is that the subtrees rooted at the nodes of height h are
disjoint; in other words, we cannot have two nodes of height h with one being an ancestor of the other.
The second is that all of these subtrees are complete binary trees except possibly for one. Let X_h be
the number of nodes of height h. Then X_h − 1 of these subtrees are full, so each contains exactly
2^(h+1) − 1 nodes. One of the height-h subtrees may not be full, but it contains at least 1 node at its
lowest level and hence has at least 2^h nodes (the exact minimum count is 1 + 2 + 4 + . . . +
2^(h−1) + 1 = 2^h). The remaining nodes have height strictly more than h; to connect all subtrees rooted at
nodes of height h, there must be at least X_h − 1 such nodes. The total number of nodes is therefore at least
(X_h − 1)(2^(h+1) − 1) + 2^h + X_h − 1, which is at most n.
Simplifying gives
X_h ≤ n/2^(h+1) + 1/2.

In conclusion, it is a property of binary trees that the number of nodes at any level is about half of
the total number of nodes up to and including that level. The number of leaves in a binary heap is n/2
when n, the total number of nodes in the tree, is even, and ceil(n/2) when n is odd. If these leaves are
removed, the number of new leaves will be about n/4. If this process is continued for h
levels, the number of leaves at that level will be at most ceil(n/2^(h+1)).

Implementation
void siftDown(int numbers[], int root, int bottom);

void heapSort(int numbers[], int array_size)
{
    int i, temp;

    /* Build the heap: sift down every non-leaf node, bottom-up. */
    for (i = (array_size / 2) - 1; i >= 0; i--)
        siftDown(numbers, i, array_size - 1);

    /* Repeatedly move the maximum to the end and restore the heap. */
    for (i = array_size - 1; i >= 1; i--)
    {
        temp = numbers[0];             /* swap root (maximum) with last */
        numbers[0] = numbers[i];
        numbers[i] = temp;
        siftDown(numbers, 0, i - 1);   /* re-heapify the reduced heap */
    }
}

/* Float the value at 'root' down until the heap property holds in
   numbers[root .. bottom] (0-based children: 2i+1 and 2i+2). */
void siftDown(int numbers[], int root, int bottom)
{
    int done = 0, maxChild, temp;

    while ((root * 2 + 1 <= bottom) && (!done))
    {
        if (root * 2 + 1 == bottom)                      /* only a left child */
            maxChild = root * 2 + 1;
        else if (numbers[root * 2 + 1] > numbers[root * 2 + 2])
            maxChild = root * 2 + 1;
        else
            maxChild = root * 2 + 2;

        if (numbers[root] < numbers[maxChild])
        {
            temp = numbers[root];
            numbers[root] = numbers[maxChild];
            numbers[maxChild] = temp;
            root = maxChild;
        }
        else
            done = 1;                                    /* heap property holds */
    }
}
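
As a quick usage sketch (the sample values are arbitrary), the heap from earlier in the section can be sorted like this:

#include <stdio.h>

int main(void)
{
    int data[] = {20, 14, 17, 8, 6, 9, 4, 1};
    int n = sizeof data / sizeof data[0];
    int i;

    heapSort(data, n);
    for (i = 0; i < n; i++)
        printf("%d ", data[i]);        /* prints: 1 4 6 8 9 14 17 20 */
    printf("\n");
    return 0;
}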

Merge Sort
Merge sort is based on the divide-and-conquer paradigm. Its worst-case running time has a
lower order of growth than insertion sort. Since we are dealing with subproblems, we state each
subproblem as sorting a subarray A[p .. r]. Initially, p = 1 and r = n, but these values change as
we recurse through subproblems.

To sort A[p .. r]:

1. Divide Step

If a given array A has zero or one element, simply return; it is already sorted. Otherwise, split
A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r], each containing about half of the elements
of A[p .. r]. That is, q is the halfway point of A[p .. r].

2. Conquer Step

Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].

3. Combine Step

Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and A[q +
1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure MERGE (A, p,
q, r).

Note that the recursion bottoms out when the subarray has just one element, so that it is trivially
sorted.

Algorithm: Merge Sort

To sort the entire sequence A[1 .. n], make the initial call to the procedure MERGE-SORT (A, 1,
n).

MERGE-SORT (A, p, r)

1. IF p < r                          // Check for base case
2.   THEN q = FLOOR[(p + r)/2]       // Divide step
3.     MERGE-SORT (A, p, q)          // Conquer step
4.     MERGE-SORT (A, q + 1, r)      // Conquer step
5.     MERGE (A, p, q, r)            // Combine step

Example: Bottom-up view of the above procedure for n = 8.


Merging

What remains is the MERGE procedure. The following is the input and output of the MERGE
procedure.

INPUT: Array A and indices p, q, r such that p ≤ q < r, subarray A[p .. q] is sorted and
subarray A[q + 1 .. r] is sorted. By the restrictions on p, q, r, neither subarray is empty.

OUTPUT: The two subarrays are merged into a single sorted subarray in A[p .. r].

We implement it so that it takes Θ(n) time, where n = r − p + 1, which is the number of elements
being merged.

Idea Behind Linear Time Merging

Think of two piles of cards. Each pile is sorted and placed face-up on a table with the smallest
cards on top. We will merge these into a single sorted pile, face-down on the table.

A basic step:

Choose the smaller of the two top cards.
Remove it from its pile, thereby exposing a new top card.
Place the chosen card face-down onto the output pile.

Repeatedly perform basic steps until one input pile is empty.
Once one input pile empties, just take the remaining input pile and place it face-down
onto the output pile.

Each basic step should take constant time, since we check just the two top cards. There are at
most n basic steps, since each basic step removes one card from the input piles, and we started
with n cards in the input piles. Therefore, this procedure should take Θ(n) time.

Now the question is: do we actually need to check whether a pile is empty before each basic step?

The answer is no, we do not. Put on the bottom of each input pile a special sentinel card. It
contains a special value that we use to simplify the code. We use ∞, since that's guaranteed to
lose to any other value. The only way that ∞ cannot lose is when both piles have ∞ exposed as
their top cards. But when that happens, all the nonsentinel cards have already been placed into
the output pile. We know in advance that there are exactly r − p + 1 nonsentinel cards, so we stop
once we have performed r − p + 1 basic steps. There is never a need to check for sentinels, since they will
always lose. Rather than even counting basic steps, just fill up the output array from index p up
through and including index r.

The pseudocode of the MERGE procedure is as follow:

MERGE (A, p, q, r)

n1 ← q − p + 1
n2 ← r − q
create arrays L[1 . . n1 + 1] and R[1 . . n2 + 1]
FOR i ← 1 TO n1
    DO L[i] ← A[p + i − 1]
FOR j ← 1 TO n2
    DO R[j] ← A[q + j]
L[n1 + 1] ← ∞
R[n2 + 1] ← ∞
i ← 1
j ← 1
FOR k ← p TO r
    DO IF L[i] ≤ R[j]
        THEN A[k] ← L[i]
             i ← i + 1
        ELSE A[k] ← R[j]
             j ← j + 1

Example [from CLRS, Figure 2.3]: A call of MERGE(A, 9, 12, 16). Read the following figure
row by row; that is how we did it in class.

The first part shows the arrays at the start of the "for k ← p to r" loop, where A[p . . q] is
copied into L[1 . . n1] and A[q + 1 . . r] is copied into R[1 . . n2].

Succeeding parts show the situation at the start of successive iterations.
Entries in A with slashes have had their values copied to either L or R and have not had a
value copied back in yet. Entries in L and R with slashes have been copied back into A.
The last part shows that the subarrays are merged back into A[p . . r], which is now
sorted, and that only the sentinels (∞) are exposed in the arrays L and R.

Running Time

The first two for loops (the loops that copy A into L and R) take Θ(n1 + n2) = Θ(n)
time. The last for loop (the loop over k) makes n iterations, each taking constant time,
for Θ(n) time. Therefore, the total running time is Θ(n).

Analyzing Merge Sort

For simplicity, assume that n is a power of 2 so that each divide step yields two subproblems,
both of size exactly n/2.

The base case occurs when n = 1.

When n ≥ 2, the time for the merge sort steps is:

Divide: Just compute q as the average of p and r, which takes constant time, i.e. Θ(1).
Conquer: Recursively solve 2 subproblems, each of size n/2, which is 2T(n/2).
Combine: MERGE on an n-element subarray takes Θ(n) time.

The divide and combine costs summed together give a function that is linear in n, which is Θ(n).
Therefore, the recurrence for the merge sort running time is

T(n) = Θ(1)             if n = 1,
T(n) = 2T(n/2) + Θ(n)   if n > 1.

Solving the Merge Sort Recurrence

By the master theorem (CLRS, Chapter 4, page 73), we can show that this recurrence has the
solution

T(n) = Θ(n lg n).

Reminder: lg n stands for log2 n.

Compared to insertion sort [Θ(n^2) worst-case time], merge sort is faster. Trading a factor of n for
a factor of lg n is a good deal. On small inputs, insertion sort may be faster. But for large enough
inputs, merge sort will always be faster, because its running time grows more slowly than
insertion sort's.

Recursion Tree

We can understand how to solve the merge-sort recurrence without the master theorem. There is a
drawing of a recursion tree on page 35 of CLRS, which shows successive expansions of the
recurrence.

The following figure (Figure 2.5b in CLRS) shows that for the original problem, we have a cost
of cn, plus the two subproblems, each costing T(n/2).

The following figure (Figure 2.5c in CLRS) shows that for each of the size-n/2 subproblems, we
have a cost of cn/2, plus two subproblems, each costing T(n/4).

The following figure (Figure 2.5d in CLRS) shows that we continue expanding until the problem
sizes get down to 1.


In the above recursion tree, each level has cost cn.

The top level has cost cn.
The next level down has 2 subproblems, each contributing cost cn/2.
The next level has 4 subproblems, each contributing cost cn/4.
Each time we go down one level, the number of subproblems doubles but the cost per
subproblem halves. Therefore, the cost per level stays the same.

The height of this recursion tree is lg n and there are lg n + 1 levels.

Mathematical Induction

We use induction on the size of a given subproblem n.

Base case: n = 1

Implies that there is 1 level, and lg 1 + 1 = 0 + 1 = 1.

Inductive Step

Our inductive hypothesis is that a tree for a problem size of 2^i has lg 2^i + 1 = i + 1 levels. Because
we assume that the problem size is a power of 2, the next problem size up after 2^i is 2^(i+1). A tree
for a problem size of 2^(i+1) has one more level than the size-2^i tree, implying i + 2 levels. Since
lg 2^(i+1) + 1 = i + 2, we are done with the inductive argument.

The total cost is the sum of the costs at each level of the tree. Since we have lg n + 1 levels, each
costing cn, the total cost is

cn lg n + cn.

Ignoring the low-order term cn and the constant coefficient c, we have

Θ(n lg n),

which is the desired result.

Implementation

void m_sort(int numbers[], int temp[], int left, int right);
void merge(int numbers[], int temp[], int left, int mid, int right);

void mergeSort(int numbers[], int temp[], int array_size)
{
    m_sort(numbers, temp, 0, array_size - 1);
}

void m_sort(int numbers[], int temp[], int left, int right)
{
    int mid;

    if (right > left)
    {
        mid = (right + left) / 2;
        m_sort(numbers, temp, left, mid);           /* sort left half */
        m_sort(numbers, temp, mid + 1, right);      /* sort right half */
        merge(numbers, temp, left, mid + 1, right); /* merge the halves */
    }
}

void merge(int numbers[], int temp[], int left, int mid, int right)
{
    int i, left_end, num_elements, tmp_pos;

    left_end = mid - 1;
    tmp_pos = left;
    num_elements = right - left + 1;

    /* Merge while both runs are non-empty. */
    while ((left <= left_end) && (mid <= right))
    {
        if (numbers[left] <= numbers[mid])
        {
            temp[tmp_pos] = numbers[left];
            tmp_pos = tmp_pos + 1;
            left = left + 1;
        }
        else
        {
            temp[tmp_pos] = numbers[mid];
            tmp_pos = tmp_pos + 1;
            mid = mid + 1;
        }
    }

    /* Copy whatever remains of the left run. */
    while (left <= left_end)
    {
        temp[tmp_pos] = numbers[left];
        left = left + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* Copy whatever remains of the right run. */
    while (mid <= right)
    {
        temp[tmp_pos] = numbers[mid];
        mid = mid + 1;
        tmp_pos = tmp_pos + 1;
    }

    /* Copy exactly num_elements merged values back into the array. */
    for (i = 0; i < num_elements; i++)
    {
        numbers[right] = temp[right];
        right = right - 1;
    }
}
QUICK SORT

Quicksort is a very efficient sorting algorithm invented by C.A.R. Hoare. It has two phases:

the partition phase and
the sort phase.

As we will see, most of the work is done in the partition phase - it works out where to divide the
work. The sort phase simply sorts the two smaller problems that are generated in the partition
phase.

This makes quicksort a good example of the divide and conquer strategy for solving problems.
(You've already seen an example of this approach in the binary search procedure.) In quicksort,
we divide the array of items to be sorted into two partitions and then call the quicksort procedure
recursively to sort the two partitions, i.e., we divide the problem into two smaller ones and conquer
by solving the smaller ones. The conquer part of the quicksort routine looks like this:

void quicksort( int a[], int low, int high )
{
    int pivot;
    /* Termination condition! */
    if ( high > low )
    {
        /* Initial step: partition the data around a pivot */
        pivot = partition( a, low, high );
        /* Sort the left and right partitions in the same way */
        quicksort( a, low, pivot-1 );
        quicksort( a, pivot+1, high );
    }
}

For the strategy to be effective, the partition phase must ensure that the pivot is greater than all
the items in one part (the lower part) and less than all those in the other (upper) part.

To do this, we choose a pivot element and arrange that all the items in the lower part are less than
the pivot and all those in the upper part greater than it. In the most general case, we don't know
anything about the items to be sorted, so any choice of the pivot element will do; the first
element is a convenient one.

If several items are the same as the pivot, these items can be grouped with the pivot in a third
(middle) partition or left in the lower part: change "less than" in the description above to "less
than or equal".

As an illustration of this idea, consider a partition scheme in which items to be sorted are copied
from the original array to a new one: items less than the pivot are placed to the left of the new
array and items greater than the pivot are placed on the right. In the final step, the pivot is
dropped into the remaining slot in the middle.
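
The text leaves the partition routine unspecified. The following is a minimal in-place sketch in C using the first element as the pivot (the body is illustrative, not the author's code; the copy-into-a-new-array scheme described above would work equally well):

/* Partition a[low..high] around the pivot a[low]. Returns the pivot's final
   index; items left of it are < pivot, items right of it are >= pivot. */
int partition(int a[], int low, int high)
{
    int pivot = a[low];
    int i = low;                       /* end of the "less than pivot" region */
    int j, tmp;

    for (j = low + 1; j <= high; j++)
    {
        if (a[j] < pivot)              /* grow the lower part by one */
        {
            i++;
            tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }
    tmp = a[low]; a[low] = a[i]; a[i] = tmp;   /* drop pivot into its slot */
    return i;
}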


Indirect Sorting

If we are sorting an array whose elements are large, copying the items can be very expensive.
We can get around this by doing indirect sorting:

We keep an additional array of pointers, where each element of the pointer array points to an
element in the original array.
When sorting, we compare keys in the original array but we swap the elements of the pointer
array.
This saves a lot of copying, at the expense of indirect references to the elements of the
original array. A sketch follows the example below.

Example

We want to sort the following file by the field Dept.
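
As a minimal sketch of the idea in C (the record layout, field names and sample data are hypothetical; the standard library's qsort does the comparisons while only the pointer array is rearranged):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct record {                        /* a large element, expensive to copy */
    char name[64];
    char dept[16];
};

/* Compare the records the pointers refer to, not the pointers themselves. */
static int byDept(const void *a, const void *b)
{
    const struct record *x = *(const struct record **)a;
    const struct record *y = *(const struct record **)b;
    return strcmp(x->dept, y->dept);
}

int main(void)
{
    struct record file[3] = {
        {"Hashim", "Sales"}, {"Anita", "Admin"}, {"Ravi", "IT"}
    };
    struct record *ptr[3];
    int i;

    for (i = 0; i < 3; i++)            /* pointer array indexes the records */
        ptr[i] = &file[i];

    qsort(ptr, 3, sizeof ptr[0], byDept);  /* only the pointers move */

    for (i = 0; i < 3; i++)
        printf("%-8s %s\n", ptr[i]->dept, ptr[i]->name);
    return 0;
}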

1. Bucket Sort

Bucket sort is possibly the simplest distribution sorting algorithm. The essential requirement is
that the size of the universe from which the elements to be sorted are drawn is a small, fixed
constant, say m.

For example, suppose that we are sorting elements drawn from {0, 1, . . ., m-1}, i.e., the set of
integers in the interval [0, m-1]. Bucket sort uses m counters. The ith counter keeps track of the
number of occurrences of the ith element of the universe. The figure below illustrates how this is
done.


Figure: Bucket Sorting

In the figure above, the universal set is assumed to be {0, 1, . . ., 9}. Therefore, ten counters are required:
one to keep track of the number of zeroes, one to keep track of the number of ones, and so on. A single
pass through the data suffices to count all of the elements. Once the counts have been determined, the
sorted sequence is easily obtained. E.g., the sorted sequence contains no zeroes, two ones, one two, and
so on.

Program Implementation

/* Sort array[0 .. array_size-1], whose elements are drawn from the
   universe {0, 1, ..., m-1}. */
void bucketSort(int array[], int array_size, int m)
{
    int i, j;
    int count[m];                /* one counter per universe element */

    for (i = 0; i < m; i++)
        count[i] = 0;
    for (j = 0; j < array_size; j++)
        ++count[array[j]];       /* count occurrences of each value */
    for (i = 0, j = 0; i < m; i++)
        for (; count[i] > 0; --count[i])
            array[j++] = i;      /* emit each value count[i] times */
}
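
A brief usage sketch (the sample values are arbitrary), matching the {0, . . ., 9} universe of the figure, so m = 10:

#include <stdio.h>

int main(void)
{
    int data[] = {5, 1, 9, 1, 0, 5, 3};
    int n = sizeof data / sizeof data[0];
    int i;

    bucketSort(data, n, 10);           /* universe {0,...,9}, so m = 10 */
    for (i = 0; i < n; i++)
        printf("%d ", data[i]);        /* prints: 0 1 1 3 5 5 9 */
    printf("\n");
    return 0;
}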

SEARCHING ALGORITHMS

Suppose that you want to determine whether 27 is in the list.

First compare 27 with list[0]; that is, compare 27 with 35.
Because list[0] ≠ 27, you then compare 27 with list[1].
Because list[1] ≠ 27, you compare 27 with the next element in the list.
Because list[2] = 27, the search stops.

This search is successful!

Let's now search for 10.

The search starts at the first element in the list; that is, at list[0]. Proceeding as before, we see
that this time the search item, which is 10, is compared with every item in the list. Eventually,
no more data is left in the list to compare with the search item; this is an unsuccessful search.

public static int linSearch(int[] list, int listLength, int key) {
    int loc;
    boolean found = false;
    for(loc = 0; loc < listLength; loc++) {   // do not redeclare loc here
        if(list[loc] == key) {
            found = true;
            break;
        }
    }
    if(found)
        return loc;
    else
        return -1;
}


public static int linSearch(int[] list, int listLength, int key) {
    for(int loc = 0; loc < listLength; loc++) {
        if(list[loc] == key)
            return loc;
    }
    return -1;
}

Performance of the Linear Search

Suppose that the first element in the array list contains the key; then we have performed one
comparison to find the key. Suppose that the second element contains the key; then we have
performed two comparisons to find the key. Carrying on the same analysis, if the key is in the
last element of the array list, we have performed N comparisons (N is the size of the array list)
to find the key.

Finally, if the key is NOT in the array list, then we will have performed N comparisons, the key
is NOT found, and we return -1. Thus linear search performs one comparison in the best case, N
comparisons in the worst case, and about N/2 comparisons on average for a successful search.


Binary Search Algorithm

Can only be performed on a sorted list!
Uses the divide and conquer technique to search the list.

The search item is compared with the middle element of the list:

If search item < middle element, the search is restricted to the first half of the list.
If search item > middle element, search the second half of the list.
If search item = middle element, the search is complete.

Determine whether 75 is in the list



public static int binarySearch(int[] list, int listLength, int key) {
    int first = 0, last = listLength - 1;
    int mid = -1;                       // initialized so it compiles
    boolean found = false;
    while (first <= last && !found) {
        mid = (first + last) / 2;
        if (list[mid] == key)
            found = true;
        else
            if (list[mid] > key)
                last = mid - 1;
            else
                first = mid + 1;
    }
    if (found)
        return mid;
    else
        return -1;
} //end binarySearch
Figures (not reproduced): a sorted list for binary search with key = 89, showing the search list
after the first and second iterations.

Performance of binary search

Suppose that L is a sorted list of size 1,000,000. To determine whether an element is in L, binary
search makes at most 42 item comparisons: 2 log2(1,000,000) + 2 ≈ 2 × 20 + 2 = 42. On the
other hand, on average, a linear search will make 500,000 key (item) comparisons to determine
whether an element is in L.

In general, if L is a sorted list of size N, then to determine whether an element is in L, binary
search makes at most 2 log2 N + 2 key (item) comparisons.
