
1. Overview of Data Structures


1.1. Introduction to data structures


A data structure is an arrangement of data in a computer's
memory or even disk storage.
Examples of several common data structures:

Arrays
Linked lists
Queues
Stacks
Binary trees
Hash tables

Algorithms, on the other hand, are used to manipulate the
data contained in these data structures, as in searching and
sorting.
Many algorithms apply directly to specific data structures.
When working with certain data structures you need to
know how to insert new data, search for a specified item,
and delete a specific item.

1.1. Characteristics of Data Structures


Data Structure | Advantages                                 | Disadvantages
Array          | Quick inserts; fast access if index known  | Slow search; slow deletes; fixed size
Ordered Array  | Faster search than unsorted array          | Slow inserts; slow deletes; fixed size
Stack          | Last-in, first-out access                  | Slow access to other items
Queue          | First-in, first-out access                 | Slow access to other items
Linked List    | Quick inserts; quick deletes               | Slow search

1.1. Characteristics of Data Structures cont


Data Structure | Advantages                                                         | Disadvantages
Binary Tree    | Quick search, inserts, and deletes (if the tree remains balanced)  | Deletion algorithm is complex
Red-Black Tree | Quick search, inserts, and deletes (tree always remains balanced)  | Complex to implement
2-3-4 Tree     | Quick search, inserts, and deletes (tree always remains balanced;  | Complex to implement
               | similar trees good for disk storage)                               |


1.1. Characteristics of Data Structures cont


Data Structure | Advantages                                            | Disadvantages
Hash Table     | Very fast access if key is known; quick inserts       | Slow deletes; access slow if key is not known; inefficient memory usage
Heap           | Quick inserts; quick deletes; access to largest item  | Slow access to other items
Graph          | Best models real-world situations                     | Some algorithms are slow and very complex

NOTE: The data structures shown above (with the exception of the array) can be thought
of as Abstract Data Types (ADTs).


1.1. Abstract Data Types


An Abstract Data Type (ADT) is a way of looking at a
data structure: focusing on what it does.
A stack or a queue is an example of an ADT.
It is important to understand that both stacks and queues
can be implemented using an array.
It is also possible to implement stacks and queues using
a linked list.
This demonstrates the "abstract" nature of stacks and
queues: how they can be considered separately from
their implementation.
To describe the term Abstract Data Type, it is best
to break it down into "data type" and then
"abstract".

1.1. Abstract Data Types cont


Data type
Primitive data types refer to two things: a data item
with certain characteristics and the permissible
operations on that data.
E.g., a short in Java can contain any whole number value
from -32,768 to 32,767.
It can also be used with the operators +, -, *, and /.

The data type's permissible operations are an
inseparable part of its identity; understanding the type
means understanding what operations can be
performed on it.


1.1. Abstract Data Types cont


Data type cont..
In Java, any class represents a data type, in the
sense that a class is made up of data (fields) and
permissible operations on that data (methods).
By extension, when a data storage structure like a
stack or queue is represented by a class, it too can be
referred to as a data type.
A stack is different in many ways from an int, but they
are both defined as a certain arrangement of data and
a set of operations on that data.

1.1. Abstract Data Types cont


Abstract
In Java, an Abstract Data Type is a class considered
without regard to its implementation. It can be thought
of as a "description" of the data in the class and a list
of operations that can be carried out on that data and
instructions on how to use these operations.
What is excluded, though, are the details of how the
methods carry out their tasks.
The end user (or class user) should be told what methods
to call, how to call them, and the results that should
be expected, but not how they work.

1.1. Abstract Data Types cont


Abstract cont
We can further extend the meaning of the ADT when
applying it to data structures such as a stack and
queue. In Java, as with any class, it means the data
and the operations that can be performed on it. In this
context, however, even the fundamentals of how the
data is stored should be invisible to the user.
Users not only should not know how the methods
work, they should also not know what structures are
being used to store the data.


1.1. Abstract Data Types cont


The Interface
The ADT specification is often called an
interface.
It's what the user of the class actually sees.
In Java, this would often be the public
methods. Consider, for example, the stack
class: the public methods push() and pop()
and similar methods from the interface would
be published to the end user.
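As a minimal sketch (our illustration, not from the original
material), the published interface of such a stack class might
look like:

public interface StackADT {
    void push(Object el);   // put el on top of the stack
    Object pop();           // remove and return the top element
    Object topEl();         // return the top element without removing it
    boolean isEmpty();      // true if the stack holds no elements
    void clear();           // remove all elements
}

Only these signatures are published; how the class satisfies them
(array, linked list, or otherwise) stays hidden.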

1.2. Practical data storage structures


Many of the structures and techniques
considered here are about how to handle real-world data storage.
By real-world data, we mean data that describes
physical entities external to the computer.
Examples: A personnel record describes an actual human
being, an inventory record describes an existing car part
or grocery item, and a financial transaction record
describes, say, an actual check written to pay the electric
bill.


1.2. Practical data storage structures


Cont
A non-computer example of real-world
data storage is a stack of index cards.
These cards can be used for a variety of
purposes. If each card holds a person's
name, address, and phone number, the
result is an address book. If each card
holds the name, location, and value of a
household possession, the result is a
home inventory.

1.3. Programmers' Tools for data storage


Not all data storage structures are used to store
real-world data.
Typically, real-world data is accessed more or
less directly by a program's user.
Some data storage structures, however, are not
meant to be accessed by the user, but by the
program itself.
A programmer uses such structures as tools to
facilitate some other operation.
Eg: Stacks, queues, and priority queues are
often used in this way.

1.3. Real-world Modeling for data storage
Some data structures directly model a real-world
situation.
The most important data structure of this type is the
graph.
You can use graphs to represent airline routes
between cities, connections in an electrical circuit,
or tasks in a project.
Other data structures, such as stacks, queues, and
priority queues, may also be used in simulations.
A queue, for example, can model customers
waiting in line at a bank.

2. Stacks, Queues and Hashing


2.1. Stacks
A stack is a data structure in which all the
access is restricted to the most recently
inserted items.
If we remove this item, then we can
access the next-to-last item inserted, and
so on.
A stack is also a handy aid for algorithms
applied to certain complex data structures.

Stack Model contd


Input to a stack is by push
Access is by top
Deletion is by pop


Example
Stack contains 3, 4
Push item 9
Now the stack contains 3,4,9

If we pop now, we get 9


If we pop again, we get 4


Stack Properties
The last item added to the stack is placed
on the top and is easily accessible.
Thus the stack is appropriate if we expect
to access only the top item; all other items
are inaccessible.


2.1.3. Important stack applications

Compiler Design
Mathematical Expression Evaluation
Balanced Symbol Checker
Simple Calculator


The stack operations

clear() Clear the stack


isEmpty() Check to see if the stack is empty
push(el) Put the element el on top of the stack.
pop() Take the topmost element from the stack.
topEl() Return the topmost element in the stack
without removing it.


2.1.1. Implementation Of Stacks


There are two basic ways to arrange for
constant time operations.
The first is to store the items contiguously in
an array.
The second is to store items non-contiguously in a linked list.


Array Implementation
To push an item onto an empty stack, we
insert it at array location 0 (since all Java
arrays start at 0).
To push the next item, we do not slide the
item at location 0 over to make room; instead,
we place the new item at the next array location.


Array Implementation contd


This is easily done by defining an auxiliary
integer variable known as stack pointer, which
stores the array index currently being used as
the top of the stack.
A stack can be implemented with an array and
an integer.
The integer TOS (top of stack) provides the
array index of the top element of the stack;
when TOS is -1, the stack is empty.

A stack specifies two data fields:

The array (which is expanded as needed;
stores the items in the stack)

TopOfStack (TOS) (gives the index of
the current top of the stack; if the stack is
empty, this index is -1.)


Algorithms For Pushing & Popping


PUSH
If stack is not full then
Add 1 to the stack pointer.
Store item at stack pointer location.


Algorithms For Pushing & Popping contd
POP
If stack is not empty then
Read item at stack pointer location.
Subtract 1 from the stack pointer.


How the stack routines work: empty
stack; push(a); push(b); pop

[Diagram: an empty stack has TOS = -1; after push(a), TOS = 0 with a
on top; after push(b), TOS = 1 with b above a; after pop, TOS returns
to 0, leaving a on top.]

Java implementation
Zero-parameter constructor for an array-based
stack:
public StackAr( )
{
    // construct the stack
    theArray = new Object[DEFAULT_CAPACITY];
    tos = -1;
}

isEmpty( ): returns true if the stack is
empty, false otherwise

public boolean isEmpty( )
{
    return tos == -1;
}


isFull( ): returns true if the stack
is full, false otherwise

public boolean isFull( )
{
    return tos == theArray.length - 1; // the array's (default) capacity
}


push method for array-based stack

Insert a new item into the stack:
public void push( Object x )
{
    if (isFull( ))
        throw new StackException( "stack is full" );
    theArray[++tos] = x;
}


pop method for array-based stack

Remove the most recently inserted item from
the stack.
Exception: underflow if the stack is empty.
public void pop( ) throws Underflow
{
    if (isEmpty( ))
        throw new Underflow( "stack pop" );
    tos--;
}


top method for array-based stack

Return the most recently inserted item
from the stack.
Exception: underflow if the stack is empty.
public Object top( ) throws Underflow
{
    if (isEmpty( ))
        throw new Underflow( "stack top" );
    return theArray[tos];
}

topAndPop method for array-based stack

Return and remove the most recently
inserted item from the stack.
Exception: underflow if the stack is empty.
public Object topAndPop( ) throws Underflow
{
    if (isEmpty( ))
        throw new Underflow( "stack topAndPop" );
    return theArray[tos--];
}


Java Code for a Stack


import java.io.*; // for I/O
class StackX
{
private int maxSize; // size of stack array
private double[ ] stackArray;
private int top; // top of stack
//------------------------------------------------------------
public StackX(int s) // constructor
{
maxSize = s; // set array size
stackArray = new double[maxSize]; // create array
top = -1; // no items yet
}
//-------------------------------------------------------------


Java Code for a Stack


public void push(double j) // put item on top of stack
{
stackArray[++top] = j; // increment top, insert item
}
//------------------------------------------------------------
public double pop() // take item from top of stack
{
return stackArray[top--]; // access item, decrement top
}
//------------------------------------------------------------
public double peek() // peek at top of stack
{
return stackArray[top];
}
//------------------------------------------------------------
public boolean isEmpty() // true if stack is empty
{
return (top == -1);
}

Java Code for a Stack


//------------------------------------------------------------
public boolean isFull() // true if stack is full
{
return (top == maxSize-1);
}
//------------------------------------------------------------
} // end class StackX


Java Code for a Stack


class StackApp {
public static void main(String[] args)
{
StackX theStack = new StackX(10); // make new stack
theStack.push(20); // push items onto stack
theStack.push(40);
theStack.push(60);
theStack.push(80);
while( !theStack.isEmpty() ) // until it's empty,
{ // delete item from stack
double value = theStack.pop();
System.out.print(value); // display it
System.out.print(" ");
} // end while
System.out.println("");
} // end main()
} // end class StackApp

Java Code for a Stack


The main() method in the StackApp class
creates a stack that can hold 10 items, pushes 4
items onto the stack, and then displays all the
items by popping them off the stack until it's
empty.
Here's the output:
80 60 40 20


StackX Class Methods


The constructor creates a new stack of a size
specified in its argument.
The fields of the stack comprise a variable to hold
its maximum size (the size of the array), the array
itself, and a variable top, which stores the index of
the item on the top of the stack.
(Note that we need to specify a stack size only because the stack is
implemented using an array. If it had been implemented using a linked
list, for example, the size specification would be unnecessary.)


StackX Class Methods


The push() method increments top so it points to the space just
above the previous top, and stores a data item there. Notice that top
is incremented before the item is inserted.
The pop() method returns the value at top and then decrements
top. This effectively removes the item from the stack; it's
inaccessible, although the value remains in the array (until another
item is pushed into the cell).
The peek() method simply returns the value at top, without changing
the stack.
The isEmpty() and isFull() methods return true if the stack is empty
or full, respectively. The top variable is at -1 if the stack is empty
and maxSize-1 if the stack is full.


Error Handling
There are different philosophies about how to
handle stack errors. What happens if you try to
push an item onto a stack that's already full, or
pop an item from a stack that's empty?
We've left the responsibility for handling such
errors up to the class user. The user should
always check to be sure the stack is not full
before inserting an item:

Error Handling
if( !theStack.isFull() )
insert(item);
else
System.out.print("Can't insert, stack is full");
In the interest of simplicity, we've left this code out of the main() routine (and
anyway, in this simple program, we know the stack isn't full because it has just
been initialized).
We do include the check for an empty stack when main() calls pop().


2.1.2. Efficiency of Stacks


Items can be both pushed and popped
from the stack implemented in the StackX
class in constant O(1) time.
That is, the time is not dependent on how
many items are in the stack, and is
therefore very quick.
No comparisons or moves are necessary.


2.2. Queues
A queue is a special kind of list.
Items are inserted at one end, the rear (enqueue),
and deleted at the other end, the front (dequeue).
Queues are also known as FIFO lists.

[Diagram: items (A, B, ...) are inserted at the back (rear) of the
queue and removed from the front.]

2.2. Queues
There are various queues quietly doing their job in our
computer's (or the network's) operating system.
There's a printer queue where print jobs wait for the
printer to be available.
A queue also stores keystroke data as we type at the
keyboard.
This way, if we are using a word processor but the
computer is briefly doing something else when we hit a
key, the keystroke won't be lost; it waits in the queue
until the word processor has time to read it.
Using a queue guarantees the keystrokes stay in order
until they can be processed.

2.2.1. The Queue operations

clear() Clear the queue


isEmpty() Check to see if the queue is empty
enqueue(el) Put the element el at the end (rear) of the queue.
dequeue() Take the first element from the queue.
firstEl() Return the first element in the queue without
removing it.


2.2.2. A Circular Queue


When we insert a new item in the queue in the
Workshop applet, the Rear arrow moves upward, toward
higher numbers in the array.
When we remove an item, Front also moves upward.
We may find the arrangement counter-intuitive, because
the people in a line at the movies all move forward,
toward the front, when a person leaves the line.
We could move all the items in a queue whenever we
deleted one, but that wouldn't be very efficient.
Instead we keep all the items in the same place and
move the front and rear of the queue.


2.2.2. A Circular Queue contd


To avoid the problem of not being able to
insert more items into the queue even
when it's not full, the Front and Rear
arrows wrap around to the beginning of
the array. The result is a circular queue


2.2.3. Java Code for a Queue


The queue.java program features a Queue class
with insert(), remove(), peek(), isFull(),
isEmpty(), and size() methods.
The main() program creates a queue of five
cells, inserts four items, removes three items,
and inserts four more. The sixth insertion
invokes the wraparound feature. All the items
are then removed and displayed.
The output looks like this:
40 50 60 70 80

The Queue.java Program


import java.io.*; // for I/O
class Queue {
private int maxSize;
private int[] queArray;
private int front;
private int rear;
private int nItems;
//------------------------------------------------------------
public Queue(int s) // constructor
{
maxSize = s;
queArray = new int[maxSize];
front = 0;
rear = -1;
nItems = 0;
}
//------------------------------------------------------------

The Queue.java Program


public void insert(int j) // put item at rear of queue
{
if(rear == maxSize-1) // deal with wraparound
rear = -1;
queArray[++rear] = j; // increment rear and insert
nItems++; // one more item
}
//------------------------------------------------------------
public int remove() // take item from front of queue
{
int temp = queArray[front++]; // get value and incr front
if(front == maxSize) // deal with wraparound
front = 0;
nItems--; // one less item
return temp;
}
//-------------------------------------------------------------

The Queue.java Program


public int peekFront() { // peek at front of queue
return queArray[front];
}
//------------------------------------------------------------
public boolean isEmpty() { // true if queue is empty
return (nItems==0);
}
//------------------------------------------------------------
public boolean isFull() { // true if queue is full
return (nItems==maxSize);
}
//------------------------------------------------------------
public int size() { // number of items in queue
return nItems;
}
//------------------------------------------------------------
} // end class Queue

The Queue.java Program


class QueueApp {
public static void main(String[] args) {
Queue theQueue = new Queue(5); // queue holds 5 items
theQueue.insert(10); // insert 4 items
theQueue.insert(20);
theQueue.insert(30);
theQueue.insert(40);
theQueue.remove(); // remove 3 items
theQueue.remove(); // (10, 20, 30)
theQueue.remove();
theQueue.insert(50); // insert 4 more items
theQueue.insert(60); // (wraps around)
theQueue.insert(70);
theQueue.insert(80);
while( !theQueue.isEmpty() ) { // remove and display all items
int n = theQueue.remove(); // (40, 50, 60, 70, 80)
System.out.print(n);

The Queue.java Program


System.out.print(" ");
}
System.out.println("");
} // end main()
} // end class QueueApp


2.2.3.1. Efficiency of Queues


As with a stack, items can be inserted and
removed from a queue in O(1) time.


2.2.3.2. Deques
A deque is a double-ended queue.
We can insert items at either end and delete them from
either end.
The methods might be called insertLeft() and
insertRight(), and removeLeft() and removeRight().
If we restrict ourselves to insertLeft() and removeLeft() (or
their equivalents on the right), then the deque acts like a
stack.
If we restrict ourselves to insertLeft() and removeRight() (or
the opposite pair), then it acts like a queue.
A deque provides a more versatile data structure than
either a stack or a queue, and is sometimes used in
container class libraries to serve both purposes.
However, it's not used as often as stacks and queues, so
we won't explore it further here.
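As a quick illustration (this sketch uses java.util.ArrayDeque
from the standard library; it is not code from the original
material):

import java.util.ArrayDeque;

class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> d = new ArrayDeque<Integer>();
        // Stack behavior: insert and remove at the same end.
        d.addFirst(1);
        d.addFirst(2);
        System.out.println(d.removeFirst()); // 2 (last in, first out)
        // Queue behavior: insert at one end, remove at the other.
        d.addLast(3);
        d.addLast(4);
        System.out.println(d.removeFirst()); // 1 (first in, first out)
    }
}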

2.2.4. Priority Queues


A priority queue is a more specialized data structure than
a stack or a queue.
However, it's a useful tool in a surprising number of
situations.
Like an ordinary queue, a priority queue has a front and
a rear, and items are removed from the front.
However, in a priority queue, items are ordered by key
value, so that the item with the lowest key (or in some
implementations the highest key) is always at the front.
Items are inserted in the proper position to maintain the
order.
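As a quick illustration (this sketch uses java.util.PriorityQueue,
a heap-based library class, so its running times differ from the
array implementation whose efficiency is discussed next):

import java.util.PriorityQueue;

class PQDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
        pq.add(30);
        pq.add(10);
        pq.add(20);
        while (!pq.isEmpty())
            System.out.print(pq.remove() + " "); // prints: 10 20 30
    }
}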


2.2.4.1. Efficiency of Priority Queues
In the priority-queue implementation we
show here, insertion runs in O(N) time,
while deletion takes O(1) time.


2.3. Hash Functions


Choosing a good hashing function, h(k), is
essential for hash-table based searching.
The key criterion is that there should be a
minimum number of collisions.
Sophisticated hash functions may avoid
collisions, but the computational cost of determining h(k)
can be prohibitive.
Less sophisticated methods may be faster.

2.3. Hash Functions contd


The number of hash functions that can be used
to assign positions to n items in a table of m
positions (for n <= m) is equal to m^n.
Some specific types of hash functions are:
Division
h(K) = K mod TSize, where TSize = sizeof(table)
Usually the preferred choice for the hash
function if very little is known about the keys.
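A minimal sketch in Java (the helper name hashDivision is ours,
not from the original material):

// Division method: h(K) = K mod TSize.
static int hashDivision(int key, int tableSize) {
    return key % tableSize; // e.g. hashDivision(123456, 101) == 34
}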

2.3. Hash Functions contd


Mid-Square Function
The key is squared and the middle part
of the result is used as the address. If the key
is a string, it has to be pre-processed to
produce a number.
e.g. if the key is 3121 then 3121^2 = 9740641
and h(3121) = 406 (the middle digits).
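A sketch of the mid-square method for the example above (the digit
positions extracted are an assumption tuned to a seven-digit square):

// Mid-square: square the key and take its middle digits as the address.
static int hashMidSquare(int key) {
    long squared = (long) key * key;       // 3121 * 3121 = 9740641
    return (int) ((squared / 100) % 1000); // middle three digits: 406
}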


2.3. Hash Functions contd


Extraction
Only a part of the key is used to compute the
address. For the key 123-45-6789, this
method might use:
the first four digits, 1234
the last four digits, 6789
the first two combined with the last two, 1289
or some other combination.
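A sketch of the extraction method (the helper name and the choice of
digits are ours):

// Extraction: use only part of the key, here the first two digits
// combined with the last two (123-45-6789 -> 1289).
static int hashExtraction(String key) {
    String digits = key.replaceAll("[^0-9]", "");      // "123456789"
    return Integer.parseInt(digits.substring(0, 2)
            + digits.substring(digits.length() - 2));  // "12" + "89" -> 1289
}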

2.3. Hash Functions contd


Radix Transformation
The key is transformed into another
number base.
e.g. If K is the decimal number 345, then
its value in base 9 is 423. This value is
then divided modulo TSize, and the
resulting number is used as the address of
the location to which K should be hashed.
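A sketch of radix transformation (the helper name is ours):

// Radix transformation: rewrite the key in base 9, read the digits
// back as a decimal number, and reduce it modulo the table size.
static int hashRadix(int key, int tableSize) {
    String inBase9 = Integer.toString(key, 9);  // 345 -> "423"
    return Integer.parseInt(inBase9) % tableSize;
}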

3. Linked Lists


3. Introduction to Linked Lists

An array is a very useful data structure provided in programming languages.


It has at least two limitations:
(1) changing the size of the array requires creating a new array and then copying
all data from the array with the old size to the array with the new size
(2) The data in the array are next to each other sequentially in memory, which
means that inserting an item inside the array requires shifting some other data in
this array.

These limitations can be overcome by using linked structures.


A linked structure is a collection of nodes storing data and links to other
nodes.
In this way, nodes can be located anywhere in memory, and passing from
one node of the linked structure to another is accomplished by storing the
reference(s) to other node(s) in the structure.
Although linked structures can be implemented in a variety of ways, the
most flexible implementation is by using a separate object for each node.


3.1. Singly linked lists
If a node contains a data field that is a reference to
another node, then many nodes can be used together
using only one variable to access the entire sequence of
nodes.
Such a sequence of nodes is the most frequently used
implementation of a linked list, which is a data structure
composed of nodes, each node holding some
information and a reference to another node in the list.
If a node has a link only to its successor in this
sequence, the list is called a singly linked list.
Note that only one variable p is used to access any node
in the list.
The last node on the list can be recognized by the null
reference field.
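A minimal sketch of such a node, and of reaching every node through
the single variable p (class and field names are illustrative, in
the style of the IntDLLNode class shown later):

class IntSLLNode {
    int info;        // the information held in the node
    IntSLLNode next; // reference to the successor; null in the last node
    IntSLLNode(int el, IntSLLNode n) { info = el; next = n; }
}

// Traversal stops at the null reference field of the last node:
// for (IntSLLNode p = head; p != null; p = p.next)
//     System.out.println(p.info);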

3.2. Doubly Linked Lists


Each node has a reference or pointer back
to the previous node


3.2. Doubly Linked Lists


If we wish to traverse a list both forwards and backwards efficiently,
or if we wish, given a list element, to determine the preceding and
following elements quickly, then the doubly-linked list comes in
handy. A list element contains the data plus pointers to the next and
previous list items as shown in the picture below.


3.2. Doubly Linked List contd


Example:
The full example can be found in the directory:
/home/331/tamj/examples/lists/doublyLinked
class ListManager
{
private Node head;
private int length;
private int currentDataValue = 10;
private static final int MAX_DATA = 100;
::::
}


3.2. More about Doubly Linked List


A doubly linked list provides a natural
implementation of the List ADT
Nodes implement Position and store:
element
link to the previous node
link to the next node

Special trailer and header nodes



3.2. More about Doubly Linked List contd.


3.2.1. Adding a new node at the end of a
doubly linked list

To add a node to a list, the node has to be created, its fields
properly initialized, and then the node needs to be incorporated
into the list.
The process of inserting a node at the end of a doubly linked list is
performed in six steps:
1. A new node is created, and then its three fields (info, next, and
   prev) are initialized:
2. the info field to the number el being inserted,
3. the next field to null,
4. and the prev field to the value of tail so that this field points to
   the last node in the list. But now, the new node should become the
   last node; therefore,
5. tail is set to reference the new node. But the new node is not yet
   accessible from its predecessor; to rectify this,
6. the next field of the predecessor is set to reference the new node.


3.2.1. Adding a new node at the end of a doubly linked list
A special case concerns the last step. It is assumed in
this step that the newly created node has a predecessor,
so it accesses its prev field.
It should be obvious that for an empty linked list, the new
node is the only node in the list and it has no
predecessor.
In this case, both head and tail refer to this node, and the
sixth step is now setting head to refer to this node.
Note that step four (setting the prev field to the value of
tail) is executed properly because, for an initially empty
list, tail is null. Thus, null becomes the value of the prev
field of the new node.

3.2.1. Doubly Linked List: Adding To The End
public void addToEnd ()
{
Node anotherNode = new Node (currentDataValue);
Node temp;
if (isEmpty() == true)
head = anotherNode;
else {
temp = head;
while (temp.next != null){
temp = temp.next;
}
temp.next = anotherNode;
anotherNode.previous = temp;
}
currentDataValue += 10;
length++;
}

3.2.2. Doubly Linked List: Adding Anywhere (1)
public void addToPosition (int position)
{
Node anotherNode = new Node (currentDataValue);
Node temp;
Node prior;
Node after;
int index;
if ((position < 1) || (position > (length+1)))
{
System.out.println("Position must be a value between 1-" +
(length+1));
}


3.2.2. Doubly Linked List: Adding Anywhere (2)
else
{
// List is empty
if (head == null)
{
if (position == 1)
{
currentDataValue += 10;
length++;
head = anotherNode;
}
else
System.out.println("List empty, unable to add node to " +"position " +
position);
}


3.2.2. Doubly Linked List: Adding Anywhere (3)
// List is not empty, inserting into first position.
else if (position == 1)
{
head.previous = anotherNode;
anotherNode.next = head;
head = anotherNode;
currentDataValue += 10;
length++;
}


3.2.2. Doubly Linked List: Adding Anywhere (4)
// List is not empty inserting into a position other than the first
else
{
prior = head;
index = 1;
// Traverse list until current is referring to the node in front
// of the position that we wish to insert the new node into.
while (index < (position-1))
{
prior = prior.next;
index++;
}
after = prior.next;


3.2.2. Doubly Linked List: Adding Anywhere (5)
// Set the references to the node before the node to be
// inserted.
prior.next = anotherNode;
anotherNode.previous = prior;
// Set the references to the node after the node to be
// inserted.
if (after != null)
after.previous = anotherNode;
anotherNode.next = after;
currentDataValue += 10;
length++;
}
}
}

3.2.3. Deleting the last node from the doubly linked list
Deleting the last node from the doubly linked list is straightforward
because there is direct access from the last node to its predecessor,
and no loop is needed to remove the last node.
When deleting a node from the list, the temporary variable el is set
to the value in the last node, then tail is set to its predecessor, and
the last node is cut off from the list by setting the next field of the
next to last node to null.
In this way, the next to last node becomes the last node, and the
formerly last node is abandoned.
Although this node accesses the list, the node is inaccessible from
the list; hence, it will be claimed by the garbage collector.
The last step is returning the value stored in the removed node.


3.2.3. Deleting the last node from the doubly linked list contd
An attempt to delete a node from an empty list
may result in a program crash.
Therefore, the user has to check whether the list
is not empty before attempting to delete the last
node.
As with the singly linked list's deleteFromHead(),
the caller should have an if statement
if (!list.isEmpty())
n = list.deleteFromTail();
else do not remove;

3.2.3. Deleting the last node from the doubly linked list contd
The second special case is the deletion of the only node
from a single-node linked list. In this case, both head and
tail are set to null.
Because of the immediate accessibility of the last node,
both addToTail () and deleteFromTail () execute in
constant time O(1).
Methods for operating at the beginning of the doubly
linked list are easily obtained from the two methods
discussed by changing head to tail and vice versa,
changing next to prev and vice versa, and exchanging
the order of parameters when executing new.


3.2.3. Doubly Linked List: Deleting A Node (1)
public void delete (int key)
{
int indexToDelete;
int indexTemp;
Node previous;
Node toBeDeleted;
Node after;
indexToDelete = search(key);
// No match, nothing to delete.
if (indexToDelete == -1)
{
System.out.println("Cannot delete element with a data value of "
+ key + " because it was not found.");
}

3.2.3. Doubly Linked List: Deleting A Node (2)

else
{
// Deleting first element.
if (indexToDelete == 1)
{
head = head.next;
if (head != null)
head.previous = null; // fix: the new first node has no predecessor
length--;
}
else
{
previous = null;
toBeDeleted = head;
indexTemp = 1;
while (indexTemp < indexToDelete)
{
previous = toBeDeleted;
toBeDeleted = toBeDeleted.next;
indexTemp++;
}
previous.next = toBeDeleted.next;
after = toBeDeleted.next;
if (after != null) // fix: the deleted node may be the last one
after.previous = previous;
length--;
:::

3.2.4. Pros Of Doubly Linked Lists
Pros
Traversing the list in reverse order is now
possible.
One can traverse a list without a trailing
reference (or by scanning ahead)
It is more efficient for lists that require
frequent additions and deletions near the front
and back


3.2.5. Cons Of Doubly Linked Lists
Cons
An extra reference is needed
Additions and deletions are more complex
(especially near the front and end of the list)


3.2.6. An implementation of a doubly linked list
/***************************IntDLLNode.java********************************/
public class IntDLLNode {
public int info;
public IntDLLNode next, prev;
public IntDLLNode (int el) {
this (el,null,null);
}
public IntDLLNode (int el, IntDLLNode n, IntDLLNode p) {
info = el; next = n; prev = p;
}
}


3.2.6. An implementation of a doubly linked list(2)


/***************************IntDLList.java********************************/
public class IntDLList {
    private IntDLLNode head, tail;
    public IntDLList ( ) {
        head = tail = null;
    }
    public boolean isEmpty( ) {
        return head == null;
    }
    public void addToTail (int el) {
        if (!isEmpty( )) {
            tail = new IntDLLNode (el, null, tail);
            tail.prev.next = tail;
        }
        else head = tail = new IntDLLNode (el);
    }
    public int removeFromTail ( ) {
        int el = tail.info;
        if (head == tail)          // if only one node in the list;
            head = tail = null;
        else {                     // if more than one node in the list;
            tail = tail.prev;
            tail.next = null;
        }
        return el;
    }
}

3.3. Circular Lists

A circular list. The large yellow object represents the circular list as such. The
circular green nodes represent the elements of the list. The rectangular nodes
are instances of a class similar to LinkedListNode, which connect the
constituents of the list together.


3.3. Circular Lists


A circular list is needed in which nodes form a ring: The
list is finite and each node has a successor.
An example of such a situation is when several
processes are using the same resource for the same
amount of time, and we have to assure that each
process has a fair share of the resource.
Therefore, all processes (let their numbers be 6, 5, 8,
and 10) are put on a circular list accessible through
current.
After one node in the list is accessed and the process
number is retrieved from the node to activate this
process, current moves to the next node so that the next
process can be activated the next time.

3.3. Circular Lists


In an implementation of a circular singly
linked list, we can use only one permanent
reference, tail, to the list even though
operations on the list require access to the
tail and its successor, the head.
By contrast, a linear singly linked list uses
two permanent references, head and tail.


3.3. Circular Linked Lists


An extra link from the end of the list to the
front forms the list into a ring


3.3.1. Uses Of A Circular List


e.g., Memory management by operating
systems


3.3.2. Searches With A Circular Linked List
Cannot use a null reference as the signal
that the end of the list has been reached.
Must use the list reference as a point
reference (stopping point) instead


3.3.3. Traversing A Circular Linked List
Cannot use a null reference as the signal
that the end of the list has been reached.
Must use the list reference as a point
reference (stopping point) instead


3.3.4. An Example Of Traversing A Circular Linked List
public void display ()
{
Node temp = list;
System.out.println("Displaying list");
if (isEmpty() == true)
{
System.out.println("Nothing to display, list is empty.");
return; // fix: avoid dereferencing a null reference below
}
do
{
System.out.println(temp.data);
temp = temp.next;
} while (temp != list);
System.out.println();
}


3.3.5. Worst Case Times For Circular Linked Lists


3.4. Skip Lists


Linked lists have some serious drawbacks:
They require sequential scanning to locate a searched-for
element.
The search starts from the beginning of the list and stops when
either a searched-for element is found or the end of the list is
reached without finding this element.
Ordering elements on the list can speed up searching, but a
sequential search is still required. Therefore, we may think about
lists that allow for skipping certain nodes to avoid sequential
processing.
A skip list is an interesting variant of the ordered linked list that
makes such a nonsequential search possible (Pugh 1990).


3.4. Skip Lists contd


In a skip list of n nodes, for each k and i such that
1 <= k <= [lg n] and 1 <= i <= [n/2^(k-1)] - 1, the node in position
2^(k-1) * i points to the node in position 2^(k-1) * (i + 1).
This means that every second node points to the node two
positions ahead, every fourth node points to the node four
positions ahead, and so on.
This is accomplished by having different numbers of
reference fields in nodes on the list:
Half of the nodes have just one reference field
one-fourth of the nodes have two reference fields
one-eighth of the nodes have three reference fields, and so on.

The number of reference fields indicates the level of each
node, and the number of levels is maxLevel = [lg n] + 1.

3.5. Self-organizing lists

The introduction of skip lists was motivated by the need to speed up the
searching process.
Although singly and doubly linked lists require sequential search to locate
an element or to see that it is not in the list, we can improve the efficiency of
the search by dynamically organizing the list in a certain manner.
This organization depends on the configuration of data; thus, the stream of
data requires reorganizing the nodes already on the list.
There are many different ways to organize the lists, and this section
describes four of them:
Move-to-front method. After the desired element is located, put it at the
beginning of the list.
Transpose method. After the desired element is located, swap it with its
predecessor unless it is at the head of the list.
Count method. Order the list by the number of times elements are being
accessed.
Ordering method. Order the list using certain criteria natural for the information
under scrutiny.
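A rough sketch of the first of these methods (our illustration; the
node class and helper name are hypothetical, not from the original
material):

class MTFNode {
    int info;
    MTFNode next;
    MTFNode(int el, MTFNode n) { info = el; next = n; }
}

// Move-to-front: after a successful search for el, unlink the found
// node and relink it at the beginning; returns the (possibly new) head.
static MTFNode moveToFront(MTFNode head, int el) {
    if (head == null || head.info == el)
        return head;            // empty list, or el already in front
    MTFNode prev = head;
    while (prev.next != null && prev.next.info != el)
        prev = prev.next;
    if (prev.next == null)
        return head;            // el not found: list unchanged
    MTFNode found = prev.next;
    prev.next = found.next;     // unlink the found node
    found.next = head;          // relink it at the beginning
    return found;               // the found node is the new head
}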


3.5. Self-organizing lists contd


In the first three methods, new information
is stored in a node added to the end of the
list;
In the fourth method, new information is
stored in a node inserted somewhere in
the list to maintain the order of the list.


3.5. Self-organizing lists contd

With the first three methods, we try to locate the elements most likely to be
looked for near the beginning of the list, most explicitly with the move-to-front method and most cautiously with the transpose method.
The ordering method already uses some properties inherent to the
information stored in the list.
For example, if we are storing nodes pertaining to people, then the list can
be organized alphabetically by the name of the person or the city or in
ascending or descending order using, say, birthday or salary.
This is particularly advantageous when searching for information that is not
in the list, because the search can terminate without scanning the entire list.
Searching all the nodes of the list, however, is necessary in such cases
using the other three methods.
The count method can be subsumed in the category of the ordering
methods if frequency is part of the information.
In many cases, however, the count itself is an additional piece of
information required solely to maintain the list; hence, it may not be
considered "natural" to the information at hand.


3.5. Self-organizing lists contd


Analyses of the efficiency of these methods
customarily compare their efficiency to that of
optimal static ordering.
With this ordering, all the data are already
ordered by the frequency of their occurrence in
the body of data so that the list is used only for
searching, not for inserting new items.
Therefore, this approach requires two passes
through the body of data,
one to build the list
another to use the list for search alone.

4. Recursion


4.1. Recursive Definitions


There are many programming concepts
that are defined in terms of themselves.
As it turns out, formal restrictions imposed
on definitions such as existence and
uniqueness are satisfied and no violation
of the rules takes place. These definitions
are called recursive definitions.
Recursive definitions are used primarily to
define infinite sets.

4.1. Recursive Definitions contd


When defining such a set, giving a complete list
of elements is impossible, and for large finite
sets it is inefficient.
A recursive definition consists of two parts:
In the first part, called the anchor or ground case
the basic elements that are building blocks of all other
elements of the set are listed.
In the second part, rules are given that allow for the
construction of new objects out of basic elements or
objects that have already been constructed.

4.1. Recursive Definitions contd


These rules are applied again and again to
generate new objects.
Example : To construct the set of natural
numbers, one basic element, 0, is singled out,
and the operation of incrementing by 1 is given
as:
0 ∈ N;
if n ∈ N, then (n+1) ∈ N;
there are no other objects in the set N.
N consists of the following items: 0, 1, 2, 3, 4, 5, 6, 7, ...

4.1. Recursive Definitions contd


Recursive Definitions serve two purposes:
Generating new elements
Testing whether an element belongs to a set.

Recursive definitions are frequently used
to define functions and sequences of
numbers.


4.2. Method calls and recursion implementation
What happens when a method is called? If
the method has formal parameters, they
have to be initialized to the values passed
as actual parameters.
In addition, the system has to know where
to resume execution of the program after
the method has finished.
The method can be called by other
methods or by the main program (main ()).

4.2. Method calls and recursion implementation contd
The information indicating where it has been called from
has to be remembered by the system.
This could be done by storing the return address in main
memory in a place set aside for return addresses, but we
do not know in advance how much space might be
needed, and allocating too much space for that purpose
alone is not efficient.
For a method call, more information has to be stored
than just a return address. Therefore, dynamic allocation
using the run-time stack is a much better solution.


4.2. Method calls and recursion implementation contd
What information should be preserved when a method is called?
First, automatic (local) variables must be stored.
If method f1(), which contains a declaration of an automatic variable
x, calls method f2(), which locally declares the variable x, the system
has to make a distinction between these two variables x.
If f2 () uses a variable x, then its own x is meant; if f2 () assigns a
value to x, then x belonging to f1 () should be left unchanged.
When f2 () is finished, f1 () can use the value assigned to its private
x before f2 () was called.
This is especially important in the context of the present chapter,
when f1 () is the same as f2 (), when a method calls itself
recursively.


4.2. Method calls and recursion implementation contd
The state of each method, including main ( ), is characterized by
the contents of all automatic variables,
the values of the method's parameters, and
the return address indicating where to restart its caller.

The data area containing all this information is called an activation


record or a stack frame and is allocated on the run-time stack.
An activation record exists for as long as a method owning it is
executing.
This record is a private pool of information for the method, a
repository that stores all information necessary for its proper
execution and how to return to where it was called from.
Activation records usually have a short lifespan because they are
dynamically allocated at method entry and deallocated upon exiting.
Only the activation record of main ( ) outlives every other activation
record.

4.2. Method calls and recursion implementation contd
An activation record usually contains the following
information:
Values for all parameters to the method, location of the first cell if
an array is passed or a variable is passed by reference, and
copies of all other data items.
Local (automatic) variables that can be stored elsewhere, in
which case, the activation record contains only their descriptors
and pointers to the locations where they are stored.
The return address to resume control by the caller, the address
of the caller's instruction immediately following the call.
A dynamic link, which is a pointer to the caller's activation record.
The returned value for a method not declared as void. Because
the size of the activation record may vary from one call to
another, the returned value is placed right above the activation
record of the caller.

4.2. Method calls and recursion implementation contd
If a method is called either by main () or by another
method, then its activation record is created on the run-time stack.
Creating an activation record whenever a method is
called allows the system to handle recursion properly.
Recursion is calling a method that happens to have the
same name as the caller.
Therefore, a recursive call is not literally a method calling
itself, but rather an instantiation of a method calling
another instantiation of the same original.
These invocations are represented internally by different
activation records and are thus differentiated by the
system.

4.3. Anatomy of a Recursive Call


The function that defines raising any number x to a
nonnegative integer power n is a good example of a
recursive function.
The most natural definition of this function is given by:

x^n = 1              if n = 0
x^n = x * x^(n-1)    if n > 0


4.3. Anatomy of a Recursive Call contd
A Java method for computing x^n can be written
directly from the definition of a power:
double power (double x, int n) {
    if (n == 0)
        return 1.0;
    else
        return x * power (x, n-1);
}

4.4. The implementation of recursion

There are several kinds of recursion, such as:
Tail recursion
NonTail Recursion
Indirect recursion
Nested Recursion
Excessive recursion


4.4.1. Tail recursion


All recursive definitions contain a reference to a
set or function being defined.
There are, however, a variety of ways such a
reference can be implemented.
This reference can be done in a straightforward
manner or in an intricate fashion, just once or
many times.
There may be many possible levels of recursion
or different levels of complexity.

4.4.1. Tail recursion contd


Tail recursion is characterized by the use
of only one recursive call at the very end
of a method implementation.
In other words, when the call is made,
there are no statements left to be
executed by the method; the recursive call
is not only the last statement but there are
no earlier recursive calls, direct or indirect.

4.4.1. Tail recursion contd


Example: The method tail() is defined as
void tail (int i) {
    if (i > 0) {
        System.out.print (i + " ");
        tail (i-1);
    }
}


4.4.1. Tail recursion contd


Tail recursion is simply a glorified loop and
can be easily replaced by one.
In this example, it is replaced by
substituting a loop for the if statement and
incrementing or decrementing the variable
i in accordance with the level of recursion.
In this way, tail ( ) can be expressed by an
iterative method:

4.4.1. Tail recursion contd


void iterativeEquivalentOfTail (int i) {
    for ( ; i > 0; i--)
        System.out.print (i + " ");
}


4.4.1. Tail recursion contd


Is there any advantage in using tail recursion
over iteration?
For languages such as Java, there may be no
compelling advantage, but in a language such as
Prolog, which has no explicit loop construct (loops are
simulated by recursion), tail recursion acquires a
much greater weight.
In languages endowed with a loop or its equivalents,
such as an if statement combined with a goto
statement or labeled statement, tail recursion should
not be used.

4.4.2. NonTail Recursion


Another problem that can be implemented in
recursion is printing an input line in reverse
order.
Here is a simple recursive implementation:
void reverse() {
char ch = getChar();
if (ch != '\n') {
reverse() ;
System.out.print(ch);
}
}

4.4.3. Indirect recursion


Direct recursion - where a method f ( ) called itself.
f ( ) can call itself indirectly via a chain of other calls.
For example, f ( ) can call g(), and g ( ) can call f ( ) . This
is the simplest case of indirect recursion.
The chain of intermediate calls can be of an arbitrary
length, as in:
f() -> f1() -> f2() -> ... -> fn() -> f()
There is also the situation when f ( ) can call itself
indirectly through different chains.
Thus, in addition to the chain just given, another chain
might also be possible. For instance
f() -> g1() -> g2() -> ... -> gm() -> f()

4.4.3. Indirect recursion


This situation can be exemplified by three methods used
for decoding information.
receive () stores the incoming information in a buffer
decode () converts it into legible form
store () stores it in a file

receive () fills the buffer and calls decode (), which in


turn, after finishing its job, submits the buffer with
decoded information to store ().
After store () accomplishes its tasks, it calls receive () to
intercept more encoded information using the same
buffer.
Therefore, we have the chain of calls
receive() -> decode() -> store() -> receive() -> decode() -> ...

4.4.3. Indirect recursion contd


Above three methods work in the following
manner:
receive (buffer)
while buffer is not filled up
if information is still incoming
get a character and store it in buffer;
else exit( );
decode (buffer);
decode (buffer)
decode information in buffer;
store (buffer);
store (buffer)
transfer information from buffer to file;
receive (buffer);

4.4.3. Indirect recursion contd


As usual in the case of recursion, there has to
be an anchor in order to avoid falling into an
infinite loop of recursive calls.


4.4.4. Nested Recursion
A more complicated case of recursion is found in
definitions in which a function is not only defined in terms
of itself, but also is used as one of the parameters. The
following definition is an example of such a nesting:

h(n) = 0               if n = 0
h(n) = n               if n > 4
h(n) = h(2 + h(2n))    if 0 < n <= 4


4.4.4. Nested Recursion contd


Function h has a solution for all n >= 0.
This fact is obvious for all n > 4 and n = 0,
but it has to be proven for n = 1,2,3, and 4.
Thus, h(2) = h(2 + h(4)) = h(2 + h(2 +
h(8))) = 12. (What are the values of h(n)
for n = 1,3, and 4?)
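A direct Java transcription (our sketch, not from the original
material) makes it easy to check these values:

// Nested recursion: h appears inside its own argument.
static int h(int n) {
    if (n == 0) return 0;
    if (n > 4) return n;
    return h(2 + h(2 * n)); // e.g. h(2) = h(2 + h(4)) = 12
}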


4.4.4. Nested Recursion contd


Another example of nested recursion is a very important
function originally suggested by Wilhelm Ackermann in
1928 and later modified by Rozsa Peter:

A(n,m) = m + 1                  if n = 0
A(n,m) = A(n-1, 1)              if n > 0, m = 0
A(n,m) = A(n-1, A(n,m-1))       otherwise
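A direct Java transcription (our sketch; only very small arguments,
say n <= 3, finish in practice because of the explosive growth
discussed on the next slide):

static long ackermann(long n, long m) {
    if (n == 0) return m + 1;
    if (m == 0) return ackermann(n - 1, 1);
    return ackermann(n - 1, ackermann(n, m - 1));
}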



4.4.4. Nested Recursion contd

The above function is interesting because of its remarkably rapid growth.
It grows so fast that it is guaranteed not to have a representation by a
formula that uses arithmetic operations such as addition, multiplication, and
exponentiation.
To illustrate the rate of growth of the Ackermann function, we need only
show that
A(3,m) = 2^(m+3) - 3
A(4,m) = 2^2^...^2^16 - 3
with a stack of m 2s in the exponent; A(4,2) = 2^2^16 - 3 = 2^65536 - 3,
which exceeds even the number of atoms in the universe (which is
10^80 according to current theories).
The definition translates very nicely into Java, but the task of
expressing it in a nonrecursive form is truly troublesome.

2008, University of Colombo School of Computing

4.4.5. Excessive recursion


Logical simplicity and readability are used as an argument
supporting the use of recursion.
The price for using recursion is slowing down execution time and
storing on the run-time stack more things than required in a
nonrecursive approach.
If recursion is too deep (for example, computing 5.6^100,000), then we
can run out of space on the stack and our program terminates
abnormally by raising an unrecoverable StackOverflowError.
But usually, the number of recursive calls is much smaller than
100,000, so the danger of overflowing the stack may not be
imminent. However, if some recursive function repeats the
computations for some parameters, the run time can be prohibitively
long even for very simple cases.

2008, University of Colombo School of Computing

4.4.5. Excessive recursion contd


Consider Fibonacci numbers. A sequence of Fibonacci
numbers is defined as follows:
Fib(n) = n                       if n < 2
         Fib(n-2) + Fib(n-1)     otherwise

The definition states that if the first two numbers are 0 and 1, then any number in the sequence is the sum of its two predecessors. But these predecessors are in turn sums of their predecessors, and so on, back to the beginning of the sequence. The sequence produced by the definition is
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
2008, University of Colombo School of Computing

4.4.5. Excessive recursion contd


How can this definition be implemented in Java?
It takes almost term-by-term translation to have
a recursive version, which is
int Fib(int n) {
    if (n < 2)
        return n;                       // anchor: Fib(0) = 0, Fib(1) = 1
    else
        return Fib(n - 2) + Fib(n - 1);
}

The method is simple and easy to understand, but it is extremely inefficient: the same Fibonacci values are recomputed over and over again.
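
One standard remedy (not shown on the original slide) is to compute the sequence iteratively, using O(n) additions and no recursion at all:

int fibIterative(int n) {
    if (n == 0) return 0;
    int prev = 0, curr = 1;            // Fib(0) and Fib(1)
    for (int i = 2; i <= n; i++) {
        int next = prev + curr;        // each value is the sum of its two predecessors
        prev = curr;
        curr = next;
    }
    return curr;
}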
2008, University of Colombo School of Computing

5. Trees
Part -1

2008, University of Colombo School of Computing

5.1. Trees, Binary trees and Binary Search trees

What Is a Tree?

A tree consists of nodes connected by edges.

In the picture of the tree (not reproduced here), the nodes are represented as circles, and the edges as lines connecting the circles.
Trees have been studied extensively as abstract mathematical entities,
so there's a large amount of theoretical knowledge about them.
A tree is actually an instance of a more general category called a graph.

2008, University of Colombo School of Computing

5.1. Trees, Binary trees and Binary Search trees contd
What Is a Tree? contd
In computer programs, nodes often represent such entities as people, car parts, airline reservations, and so on; in other words, the typical items we store in any kind of data structure.
The lines (edges) between the nodes
represent the way the nodes are related.

2008, University of Colombo School of Computing

5.1. Trees, Binary trees and Binary Search trees contd
There are different kinds of trees:
Binary tree: each node in a binary tree has a maximum of two children.
Multiway trees: more general trees, in which nodes can have more than two children, are called multiway trees.

2008, University of Colombo School of Computing

5.1. Trees, Binary trees and Binary Search trees contd
Why Use Binary Trees?
Why might you want to use a tree?
Usually, because it combines the advantages
of two other structures:
an ordered array and
a linked list.

You can search a tree quickly, as you can an ordered array, and you can also insert and delete items quickly, as you can with a linked list.
2008, University of Colombo School of Computing

5.2. Implementation of Binary trees


The Node Class
First, we need a class of node objects.
These objects contain the data representing
the objects being stored (employees in an
employee database, for example) and also
references to each of the node's two children.
Here's how that looks:

2008, University of Colombo School of Computing

5.2. Implementation of Binary trees


contd
class Node
{
    int iData;          // data used as key value
    float fData;        // other data
    Node leftChild;     // this node's left child
    Node rightChild;    // this node's right child

    public void displayNode()
    {
        // method body
    }
}
2008, University of Colombo School of Computing

5.2. Implementation of Binary trees


contd
There are other approaches to designing class Node. Instead of
placing the data items directly into the node, you could use a
reference to an object representing the data item:
class Node
{
    Person p1;          // reference to Person object
    Node leftChild;     // this node's left child
    Node rightChild;    // this node's right child
}

class Person
{
    int iData;
    float fData;
}
2008, University of Colombo School of Computing

5.2. Implementation of Binary trees


contd
The Tree Class
We'll also need a class from which to instantiate the
tree itself; the object that holds all the nodes.
We'll call this class Tree. It has only one field: a Node
variable that holds the root.
It doesn't need fields for the other nodes because
they are all accessed from the root.
The Tree class has a number of methods: some for
finding, inserting, and deleting nodes, several for
different kinds of traverses, and one to display the
tree.
2008, University of Colombo School of Computing

5.2. Implementation of Binary trees


contd
Here's a skeleton version:
class Tree
{
    private Node root;                  // the only data field in Tree

    public Node find(int key)
    {
        return null;                    // body shown in section 5.3
    }
    public void insert(int id, double dd)
    {
        // body shown in section 5.5
    }
    public void delete(int id)
    {
        // body omitted here
    }
    // various other methods
}  // end class Tree
2008, University of Colombo School of Computing

10

5.2. Implementation of Binary trees


contd
The TreeApp Class
Finally, we need a way to perform operations
on the tree.
Here's how you might write a class with a
main() routine to create a tree, insert three
nodes into it, and then search for one of them.
We'll call this class TreeApp:

2008, University of Colombo School of Computing

11

5.2. Implementation of Binary trees


contd
class TreeApp
{
    public static void main(String[] args)
    {
        Tree theTree = new Tree();      // make a tree
        theTree.insert(50, 1.5);        // insert 3 nodes
        theTree.insert(25, 1.7);
        theTree.insert(75, 1.9);

        Node found = theTree.find(25);  // find node with key 25
        if (found != null)
            System.out.println("Found the node with key 25");
        else
            System.out.println("Could not find node with key 25");
    }  // end main()
}  // end class TreeApp


2008, University of Colombo School of Computing

12

5.3. Searching a Binary tree


Finding a Node
Finding a node with a specific key is the simplest of the major
tree operations, so let's start with that.
Remember that the nodes in a binary search tree correspond to
objects containing information.
They could be person objects, with an employee number as the
key and also perhaps name, address, telephone number, salary,
and other fields.
Or they could represent car parts, with a part number as the key
value and fields for quantity on hand, price, and so on.
However, the only characteristics of each node that we can see
in the Workshop applet are a number and a color. A node is
created with these two characteristics and keeps them
throughout its life.
2008, University of Colombo School of Computing

13

5.3. Searching a Binary tree


contd
Java Code for Finding a Node
Here's the code for the find() routine, which is a method of the Tree class:
public Node find(int key)               // find node with given key
{                                       // (assumes non-empty tree)
    Node current = root;                // start at root
    while (current.iData != key)        // while no match,
    {
        if (key < current.iData)        // go left?
            current = current.leftChild;
        else                            // or go right?
            current = current.rightChild;
        if (current == null)            // if no child,
            return null;                // didn't find it
    }
    return current;                     // found it
}

2008, University of Colombo School of Computing

14

5.3. Searching a Binary tree


contd
This routine uses a variable current to hold the node it is
currently examining.
The argument key is the value to be found. The routine
starts at the root. (It has to; this is the only node it can
access directly.) That is, it sets current to the root.
Then, in the while loop, it compares the value to be
found, key, with the value of the iData field (the key field)
in the current node. If key is less than this field, then
current is set to the node's left child.
If key is greater than (or equal to) the node's iData field,
then current is set to the node's right child.

2008, University of Colombo School of Computing

15

5.4. Ways of traversing a tree


Tree-traversal refers to the process of
visiting each node in a tree data structure,
exactly once, in a systematic way. Such
traversals are classified by the order in
which the nodes are visited.

2008, University of Colombo School of Computing

16

5.4. Ways of traversing a tree


contd
Traversal methods
Compared to linear data structures like linked lists and
one dimensional arrays, which have only one logical
means of traversal, tree structures can be traversed in
many different ways. Starting at the root of a binary tree,
there are three main steps that can be performed and
the order in which they are performed define the
traversal type. These steps (in no particular order) are:
performing an action on the current node (referred to as
"visiting" the node), traversing to the left child node, and
traversing to the right child node.

2008, University of Colombo School of Computing

17

5.4. Ways of traversing a tree


contd
To traverse a non-empty binary tree in preorder, perform the following operations recursively at each node, starting with the root node:
1. Visit the node.
2. Traverse the left subtree.
3. Traverse the right subtree.
(This is also called Depth-first traversal.)

To traverse a non-empty binary tree in inorder, perform the following operations recursively at each node, starting with the root node:
1. Traverse the left subtree.
2. Visit the node.
3. Traverse the right subtree.

To traverse a non-empty binary tree in postorder, perform the following operations recursively at each node, starting with the root node:
1. Traverse the left subtree.
2. Traverse the right subtree.
3. Visit the node.

Finally, trees can also be traversed in level-order, where we visit every node on a level before going to a lower level. This is also called Breadth-first traversal.
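
A minimal Java sketch of the three recursive traversals, written as methods of the Tree class from section 5.2 (the visit action here is assumed to be printing the key):

void preOrder(Node n) {
    if (n == null) return;                 // empty subtree: nothing to do
    System.out.print(n.iData + " ");       // 1. visit the node
    preOrder(n.leftChild);                 // 2. traverse the left subtree
    preOrder(n.rightChild);                // 3. traverse the right subtree
}

void inOrder(Node n) {
    if (n == null) return;
    inOrder(n.leftChild);                  // left subtree first
    System.out.print(n.iData + " ");       // visit between the subtrees
    inOrder(n.rightChild);
}

void postOrder(Node n) {
    if (n == null) return;
    postOrder(n.leftChild);
    postOrder(n.rightChild);
    System.out.print(n.iData + " ");       // visit last
}

On a binary search tree, inOrder(root) prints the keys in ascending order, matching the example on the next slide.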
2008, University of Colombo School of Computing

18

5.4. Ways of traversing a tree


contd
Example

For the binary search tree shown on the slide (figure not reproduced):
Preorder traversal sequence: F, B, A, D, C, E, G, I, H
Inorder traversal sequence: A, B, C, D, E, F, G, H, I
(Note that the inorder traversal of this binary search tree yields an ordered list.)
Postorder traversal sequence: A, C, E, D, B, H, I, G, F
Level-order traversal sequence: F, B, G, A, D, I, C, E, H
2008, University of Colombo School of Computing

19

5.4.1. Breadth-first search


Breadth-first search (BFS) is a graph
search algorithm that begins at the root
node and explores all the neighboring
nodes. Then for each of those nearest
nodes, it explores their unexplored
neighbor nodes, and so on, until it finds
the goal.

2008, University of Colombo School of Computing

20

5.4.1. Breadth-first search


How it works
Breadth-first search (BFS) is an uninformed search
method that aims to expand and examine all nodes of
a graph systematically in search of a solution. In other
words, it exhaustively searches the entire graph
without considering the goal until it finds it. It does not
use a heuristic.
From the standpoint of the algorithm, all child nodes
obtained by expanding a node are added to a FIFO
queue. In typical implementations, nodes that have
not yet been examined for their neighbors are placed
in some container (such as a queue or linked list)
called "open" and then once examined are placed in
the container "closed".
2008, University of Colombo School of Computing

21

5.4.1. Breadth-first search


How it works

2008, University of Colombo School of Computing

22

5.4.1. Breadth-first search


Applications of BFS
Breadth-first search can be used to solve many
problems in graph theory, for example:
Finding all connected components in a graph
Finding all nodes within one connected component
Copying garbage collection (Cheney's algorithm)
Finding the shortest path between two nodes u and v (in an unweighted graph)
Testing a graph for bipartiteness
(Reverse) Cuthill–McKee mesh numbering
2008, University of Colombo School of Computing

23

5.4.2. Depth-first search


Depth-first search (DFS) is an algorithm for traversing
or searching a tree, tree structure, or graph. One starts
at the root (selecting some node as the root in the graph
case) and explores as far as possible along each branch
before backtracking.
Formally, DFS is an uninformed search that progresses
by expanding the first child node of the search tree that
appears and thus going deeper and deeper until a goal
node is found, or until it hits a node that has no children.
Then the search backtracks, returning to the most recent
node it hadn't finished exploring. In a non-recursive
implementation, all freshly expanded nodes are added to
a LIFO stack for exploration.
2008, University of Colombo School of Computing

24

5.4.2. Depth-first search


How it works

See next slide:

2008, University of Colombo School of Computing

25

5.4.2. Depth-first search

A depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously-visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.
Performing the same search without remembering previously visited nodes results in visiting nodes in the order A, B, D, F, E, A, B, D, F, E, etc. forever, caught in the A, B, D, F, E cycle and never reaching C or G.
Iterative deepening prevents this loop and will reach the following nodes on the following depths, assuming it proceeds left-to-right as above:
0: A
1: A (repeated), B, C, E
(Note that iterative deepening has now seen C, when a conventional depth-first search did not.)
2: A, B, D, F, C, G, E, F
(Note that it still sees C, but that it came later. Also note that it sees E via a different path, and loops back to F twice.)
3: A, B, D, F, E, C, G, E, F, B
For this graph, as more depth is added, the two cycles "ABFE" and "AEFB" will simply get longer before the algorithm gives up and tries another branch.

2008, University of Colombo School of Computing

26

5.4.3. Stackless Depth-First Traversal
Threaded trees allow you traverse the tree
by following pointers stored within the tree
Each node would store pointers to its
predecessor and successor
This would create a lot of overhead with
the additional two pointers for a total of 4
pointers per node

2008, University of Colombo School of Computing

27

5.5. Insertion and deletion


Inserting a Node
To insert a node we must first find the place to insert
it. This is much the same process as trying to find a
node that turns out not to exist, as described in the
section on Find.
We follow the path from the root to the appropriate
node, which will be the parent of the new node.
Once this parent is found, the new node is connected
as its left or right child, depending on whether the new
node's key is less than or greater than that of the
parent.
2008, University of Colombo School of Computing

28

5.5. Insertion and deletion


contd
Java Code for Inserting a Node
The insert() function starts by creating the new node,
using the data supplied as arguments.
Next, insert() must determine where to insert the new
node. This is done using roughly the same code as
finding a node, described in the section on find(). The
difference is that when you are simply trying to find a
node and you encounter a null (nonexistent) node,
you know the node you are looking for doesn't exist
so you return immediately. When you're trying to
insert a node you insert it (creating it first, if
necessary) before returning.
2008, University of Colombo School of Computing

29

5.5. Insertion and deletion


contd
The value to be searched for is the data item
passed in the argument id.
The while loop uses true as its condition
because it doesn't care if it encounters a node
with the same value as id; it treats another node
with the same key value as if it were simply
greater than the key value. (We'll return to the
subject of duplicate nodes later in this chapter.)

2008, University of Colombo School of Computing

30

5.5. Insertion and deletion


contd
A place to insert a new node will always
be found (unless you run out of memory);
when it is, and the new node is attached,
the while loop exits with a return
statement.
Here's the code for the insert() function:

2008, University of Colombo School of Computing

31

5.5. Insertion and deletion


contd
public void insert(int id, double dd)
{
    Node newNode = new Node();          // make new node
    newNode.iData = id;                 // insert data
    newNode.dData = dd;
    if (root == null)                   // no node in root
        root = newNode;
    else                                // root occupied
    {
        Node current = root;            // start at root
        Node parent;
        while (true)                    // (exits internally)
        {
            parent = current;
            if (id < current.iData)     // go left?
            {

2008, University of Colombo School of Computing

32

5.5. Insertion and deletion


contd
                current = current.leftChild;
                if (current == null)          // if end of the line,
                {                             // insert on left
                    parent.leftChild = newNode;
                    return;
                }
            }   // end if go left
            else                              // or go right?
            {
                current = current.rightChild;
                if (current == null)          // if end of the line
                {                             // insert on right
                    parent.rightChild = newNode;
                    return;
                }
            }   // end else go right
        }   // end while
    }   // end else not root
}   // end insert()

2008, University of Colombo School of Computing

33

5.5. Insertion and deletion


contd
Deletion
The algorithm to delete an arbitrary node from a
binary tree is deceptively complex, as there are many
special cases. The algorithm used for the delete
function splits it into two separate operations,
searching and deletion. Once the node which is to be
deleted has been determined by the searching
algorithm, it can be deleted from the tree. The
algorithm must ensure that when the node is deleted
from the tree, the ordering of the binary tree is kept
intact.
Special Cases that have to be considered:
2008, University of Colombo School of Computing

34

5.5. Insertion and deletion


contd

Deletion
1. The node to be deleted has no children.
In this case the node may simply be deleted
from the tree.

2008, University of Colombo School of Computing

35

5.5. Insertion and deletion


contd
Deletion contd
2. The node has one child.
The child node is appended to its
grandparent. (The parent of the node to be
deleted.)

2008, University of Colombo School of Computing

36

5.5. Insertion and deletion


contd

Deletion contd
3. The node to be deleted has two children.
This case is much more complex than the
previous two, because the order of the binary
tree must be kept intact. The algorithm must
determine which node to use in place of the
node to be deleted:

2008, University of Colombo School of Computing

37

5.5. Insertion and deletion


contd
(i)Use the inorder successor of the node to
be deleted.

2008, University of Colombo School of Computing

38

5.5. Insertion and deletion


contd
(ii) Else, if no right subtree exists, replace the node to be deleted with its left child.

2008, University of Colombo School of Computing

39

5.5. Insertion and deletion


contd
Deletion of the root node is also a special
case. It can be accomplished using the
methods described above, checking for
the separate cases with no children, two
children, or one.
Complexity:
Average case is O(log₂ n).
Worst case is O(n).
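
As a rough Java sketch of the two simple cases, written as a method of the Tree class from section 5.2 (the names unlink, parent and isLeftChild are assumptions of this sketch; the search step that locates the node and its parent, and the two-children case with its inorder successor, are omitted):

void unlink(Node parent, Node node, boolean isLeftChild) {
    Node replacement;
    if (node.leftChild == null && node.rightChild == null)
        replacement = null;                    // case 1: no children, simply delete
    else if (node.rightChild == null)
        replacement = node.leftChild;          // case 2: only a left child
    else if (node.leftChild == null)
        replacement = node.rightChild;         // case 2: only a right child
    else
        throw new UnsupportedOperationException("two children: use the inorder successor");

    if (parent == null)
        root = replacement;                    // deleting the root itself
    else if (isLeftChild)
        parent.leftChild = replacement;        // the child is appended to its grandparent
    else
        parent.rightChild = replacement;
}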
2008, University of Colombo School of Computing

40

5. Trees
Part -2

2008, University of Colombo School of Computing

5.6. Balancing a tree


BSTs were introduced because in theory they give nice fast search times.
We have seen that, depending on how the data arrives, the tree can degrade into a linked list.
So what is a good programmer to do? Balance the tree, of course.
2008, University of Colombo School of Computing

5.6. Balancing a tree -ideas


One idea would be to get all of the data first and store it in an array.
Then sort the array and insert it into a tree.
Of course, this does have some drawbacks, so we need another idea.

2008, University of Colombo School of Computing

5.6.1. DSW Trees


Named for Colin Day and then for Quentin F. Stout and Bette L. Warren, hence DSW.
The main idea is a rotation:
rotateRight(Gr, Par, Ch)
    If Par is not the root of the tree, the grandparent Gr of child Ch becomes Ch's parent by replacing Par;
    the right subtree of Ch becomes the left subtree of Ch's parent Par;
    node Ch acquires Par as its right child.
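
A minimal Java sketch of this right rotation, as a method of the Tree class (gr may be null when par is the root; the names follow the pseudocode above):

Node rotateRight(Node gr, Node par, Node ch) {
    if (gr != null) {                     // Gr adopts Ch in place of Par
        if (gr.leftChild == par) gr.leftChild = ch;
        else gr.rightChild = ch;
    } else {
        root = ch;                        // Par was the root, so Ch becomes the new root
    }
    par.leftChild = ch.rightChild;        // Ch's right subtree becomes Par's left subtree
    ch.rightChild = par;                  // Ch acquires Par as its right child
    return ch;
}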
2008, University of Colombo School of Computing

Maybe a picture will help

2008, University of Colombo School of Computing

5.6.1.1. More of the DSW


So the idea is to take a tree and perform
some rotations to it to make it balanced.
First you create a backbone or a vine
Then you transform the backbone into a
nicely balanced tree

2008, University of Colombo School of Computing

5.6.1.2. Algorithms
createBackbone(root, n)
    Tmp = root
    while Tmp != 0
        if Tmp has a left child
            rotate this child about Tmp
            set Tmp to the child which just became parent
        else
            set Tmp to its right child

createPerfectTree(n)
    M = 2^floor[lg(n+1)] - 1
    make n - M rotations starting from the top of the backbone
    while M > 1
        M = M / 2
        make M rotations starting from the top of the backbone
2008, University of Colombo School of Computing

Maybe some more pictures

2008, University of Colombo School of Computing

5.6.1.3. Wrap-up
The DSW algorithm is good if you can take
the time to get all the nodes and then
create the tree
What if you want to balance the tree as
you go?
You use an AVL Tree

2008, University of Colombo School of Computing

5.6.2. AVL Trees


Named after its inventors Adelson-Velskii
and Landis, hence AVL
The heights of the two subtrees of any node can differ by at most one.
Each node stores a balance factor.
In the worst case an AVL tree is about 44% worse than a perfect tree.
In practice, it is closer to a perfect tree.
2008, University of Colombo School of Computing

10

5.6.2.1. What does an AVL do?


Each time the tree structure is changed,
the balance factors are checked and if an
imbalance is recognized, then the tree is
restructured.
For insertion there are four cases to be
concerned with.
Deletion is a little trickier.

2008, University of Colombo School of Computing

11

5.6.2.2. AVL Insertion


Case 1: Insertion into a right subtree of a right child.
Requires a left rotation about the child.

Case 2: Insertion into a left subtree of a right child.
Requires two rotations:
first a right rotation about the root of the subtree,
second a left rotation about the subtree's parent.
2008, University of Colombo School of Computing

12

Some more pictures

2008, University of Colombo School of Computing

13

5.6.2.3. Deletion
Deletion is a bit trickier.
With insertion after the rotation we were
done.
Not so with deletion.
We need to continue checking balance
factors as we travel up the tree

2008, University of Colombo School of Computing

14

5.6.2.4. Deletion Specifics


Go ahead and delete the node just like in
a BST.
There are 4 cases after the deletion:

2008, University of Colombo School of Computing

15

Cases
Case 1: Deletion from a left subtree of a tree with a right-high root and a right-high right subtree.
Requires one left rotation about the root.

Case 2: Deletion from a left subtree of a tree with a right-high root and a balanced right subtree.
Requires one left rotation about the root.
2008, University of Colombo School of Computing

16

Cases continued
Case 3: Deletion from a left subtree of a tree with a right-high root and a left-high right subtree with a left-high left subtree.
Requires a right rotation around the right subtree root and then a left rotation about the root.

Case 4: Deletion from a left subtree of a tree with a right-high root and a left-high right subtree with a right-high left subtree.
Requires a right rotation around the right subtree root and then a left rotation about the root.
2008, University of Colombo School of Computing

17

Definitely some pictures

2008, University of Colombo School of Computing

18

5.7. Self-adjusting Trees


The previous sections discussed ways to
balance the tree after the tree was
changed due to an insert or a delete.
There is another option.
You can alter the structure of the tree after
you access an element
Think of this as a self-organizing tree

2008, University of Colombo School of Computing

19

5.8. Heaps
A heap is a binary tree storing keys at its internal nodes and satisfying the following properties:
Heap-Order: for every internal node v other than the root,
    key(v) >= key(parent(v))
Complete Binary Tree: let h be the height of the heap;
    for i = 0, ..., h-1, there are 2^i nodes of depth i
    at depth h-1, the internal nodes are to the left of the external nodes

2008, University of Colombo School of Computing

20

5.8. Heaps contd


The last node of a heap is the rightmost internal node of depth h - 1.

2008, University of Colombo School of Computing

21

5.8.1. Height of a Heap


Theorem: A heap storing n keys has height O(log n)
Proof: (we apply the complete binary tree property)
Let h be the height of a heap storing n keys.
Since there are 2^i keys at depth i = 0, ..., h-2 and at least one key at depth h-1, we have
    n >= 1 + 2 + 4 + ... + 2^(h-2) + 1
Thus, n >= 2^(h-1), i.e., h <= log n + 1.
2008, University of Colombo School of Computing

22

5.8.1. Height of a Heap contd

2008, University of Colombo School of Computing

23

5.8.2. Heaps and Priority Queues

We can use a heap to implement a priority queue


We store a (key, element) item at each internal node
We keep track of the position of the last node
For simplicity, we show only the keys in the pictures

2008, University of Colombo School of Computing

24

5.8.3. Insertion into a Heap

Method insertItem of the priority queue ADT corresponds to the insertion of a key k to the heap.
The insertion algorithm consists of three steps:
1. Find the insertion node z (the new last node)
2. Store k at z and expand z into an internal node
3. Restore the heap-order property
2008, University of Colombo School of Computing

25

5.8.4. Removal from a Heap

Method removeMin of the priority queue ADT corresponds to the removal of the root key from the heap.
The removal algorithm consists of three steps:
1. Replace the root key with the key of the last node w
2. Compress w and its children into a leaf
3. Restore the heap-order property
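
A compact array-based sketch of both operations (an assumption of this sketch: the heap is stored in an ArrayList with the root at index 0 and the children of index i at 2i+1 and 2i+2, rather than as a linked tree):

import java.util.ArrayList;

class MinHeap {
    private final ArrayList<Integer> a = new ArrayList<>();

    void insert(int k) {                       // insertItem: add as the new last node
        a.add(k);
        int i = a.size() - 1;                  // restore heap order by bubbling up
        while (i > 0 && a.get(i) < a.get((i - 1) / 2)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    int removeMin() {                          // removeMin: the root key is the minimum
        int min = a.get(0);
        a.set(0, a.get(a.size() - 1));         // last node's key replaces the root
        a.remove(a.size() - 1);
        int i = 0;                             // restore heap order by bubbling down
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, s = i;
            if (l < a.size() && a.get(l) < a.get(s)) s = l;
            if (r < a.size() && a.get(r) < a.get(s)) s = r;
            if (s == i) break;
            swap(i, s);
            i = s;
        }
        return min;
    }

    private void swap(int i, int j) {
        int t = a.get(i); a.set(i, a.get(j)); a.set(j, t);
    }
}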
2008, University of Colombo School of Computing

26

6. Graphs

2008, University of Colombo School of Computing

6.1. Definition of different Graphs


A graph is the basic object of study in graph theory.
Informally speaking, a graph is a set of objects called
points, nodes, or vertices connected by links called lines
or edges. In a proper graph, which is by default
undirected, a line from point A to point B is considered to
be the same thing as a line from point B to point A. In a
digraph, short for directed graph, the two directions are
counted as being distinct arcs or directed edges.
Typically, a graph is depicted in diagrammatic form as a
set of dots (for the points, vertices, or nodes), joined by
curves (for the lines or edges).

2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd
A graph or undirected graph G is an
ordered pair G: = (V,E) that is subject to
the following conditions:
V is a set, whose elements are called vertices or
nodes,
E is a multiset of unordered pairs of vertices (not
necessarily distinct), called edges or lines.
(Note that this defines the most general type of graph. Some
authors call this a multigraph and reserve the term "graph" for
simple graphs.)
2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd

The vertices belonging to an edge are called the ends, endpoints, or end
vertices of the edge.
V (and hence E) are usually taken to be finite, and many of the well-known
results are not true (or are rather different) for infinite graphs because
many of the arguments fail in the infinite case. The order of a graph is | V |
(the number of vertices). A graph's size is | E | , the number of edges. The
degree of a vertex is the number of edges that connect to it, where an edge
that connects to the vertex at both ends (a loop) is counted twice.
The edges E induce a symmetric binary relation ~ on V which is called the
adjacency relation of G. Specifically, for each edge {u,v} the vertices u and
v are said to be adjacent to one another, which is denoted u ~ v.
For an edge {u, v}, graph theorists usually use the somewhat shorter
notation uv.

2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd
Types of graphs
Directed graph
A directed graph or digraph G is an ordered pair G: = (V,A)
with
V is a set, whose elements are called vertices or nodes,
A is a set of ordered pairs of vertices, called directed edges,
arcs, or arrows.

An arc e = (x,y) is considered to be directed from x to y; y is


called the head and x is called the tail of the arc; y is said to
be a direct successor of x, and x is said to be a direct
predecessor of y. If a path leads from x to y, then y is said to
be a successor of x, and x is said to be a predecessor of y.
The arc (y,x) is called the arc (x,y) inverted.
2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd
Directed graph contd..
A directed graph G is called symmetric if, for every arc that
belongs to G, the corresponding inverted arc also belongs to G.
A symmetric loopless directed graph is equivalent to an
undirected graph with the pairs of inverted arcs replaced with
edges; thus the number of edges is equal to the number of arcs
halved.
A variation on this definition is the oriented graph, which is a
graph (or multigraph; see below) with an orientation or direction
assigned to each of its edges. A distinction between a directed
graph and an oriented simple graph is that if x and y are
vertices, a directed graph allows both (x,y) and (y,x) as edges,
while only one is permitted in an oriented graph. A more
fundamental difference is that, in a directed graph (or
multigraph), the directions are fixed, but in an oriented graph
(or multigraph), only the underlying graph is fixed, while the
orientation may vary.
2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd

Types of graphs

Undirected graph
A graph G = (V,E) in which every edge is undirected. This is equivalent to a digraph (see above) in which, for each edge (v,u), there is an arc from v to u and an arc from u to v.

Finite graph
A finite graph is a graph G = <V,E> such that V(G) and E(G) are finite
sets.

Simple graph
A simple graph is an undirected graph that has no self-loops and no
more than one edge between any two different vertices. In a simple
graph the edges of the graph form a set (rather than a multiset) and
each edge is a pair of distinct vertices. In a simple graph with p vertices
every vertex has a degree that is less than p.

2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd
Types of graphs
Regular graph
A regular graph is a graph where each vertex has the same number of
neighbors, i.e., every vertex has the same degree or valency. A regular
graph with vertices of degree k is called a k-regular graph or regular
graph of degree k.

Weighted graph
A graph is a weighted graph if a number (weight) is assigned to
each edge. Such weights might represent, for example, costs,
lengths or capacities, etc. depending on the problem.
Weight of the graph is sum of the weights given to all edges.

2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd

Types of graphs
Mixed graph

A mixed graph G is a graph in which some edges may be


directed and some may be undirected. It is written as an ordered
triple G := (V, E, A) with V, E, and A defined as above. Directed
and undirected graphs are special cases.

Complete graph
Complete graphs have the feature that each pair of vertices has an
edge connecting them.

Loop
A loop is an edge (directed or undirected) which starts and ends on the same
vertex; these may be permitted or not permitted according to the application. In
this context, an edge with two different ends is called a link.
2008, University of Colombo School of Computing

6.1. Definition of different Graphs


contd
Types of graphs
Multi graph
The term "multigraph" is generally understood to mean that multiple edges (and sometimes loops) are allowed. Where graphs are defined so as to allow loops and multiple edges, a multigraph is often defined to mean a graph without loops; however, where graphs are defined so as to disallow loops and multiple edges, the term is often defined to mean a "graph" which can have both multiple edges and loops, although many use the term "pseudograph" for this meaning.

Half-edges, loose edges


In exceptional situations it is even necessary to have edges with
only one end, called half-edges, or no ends (loose edges).
2008, University of Colombo School of Computing

10

6.2. Graph Representation

Two common ways to represent graphs on a computer are as an adjacency list or as an adjacency matrix.

Adjacency list:
Vertices are labelled (or re-labelled) from 0 to |V(G)|-1.
Corresponding to each vertex is a list (either an array or linked
list) of its neighbours.
Adjacency matrix:
Vertices are labelled (or re-labelled) with integers from 0 to
|V(G)|-1. A two-dimensional boolean array A with dimensions
|V(G)| x |V(G)| contains a 1 at A[i][j]
if there is an edge from the vertex labelled i to the vertex
labelled j,and a 0 otherwise.
Both representations allow us to represent directed graphs, since we can have an edge from vi to vj but lack one from vj to vi. To represent undirected graphs, we simply make sure that all edges are listed twice: once from vi to vj, and once from vj to vi.
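
A minimal Java sketch of the adjacency-list representation just described (vertices labelled 0 .. n-1; an undirected edge is simply listed in both directions):

import java.util.ArrayList;
import java.util.List;

class Graph {
    private final List<List<Integer>> adj;      // adj.get(v) = list of v's neighbours

    Graph(int n) {
        adj = new ArrayList<>();
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
    }

    void addDirectedEdge(int u, int v) { adj.get(u).add(v); }

    void addUndirectedEdge(int u, int v) {      // list the edge twice
        addDirectedEdge(u, v);
        addDirectedEdge(v, u);
    }

    List<Integer> neighbours(int v) { return adj.get(v); }
}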
2008, University of Colombo School of Computing

11

6.3. Graph Traversals contd


Breadth first search

Given a graph G=(V,E) and a source


vertex s, BFS explores the edges of G
to discover (visit) each node of G
reachable from s.
Idea - expand a frontier one step at a
time.
Frontier is a FIFO queue (O(1) time to
update)
2008, University of Colombo School of Computing

12

6.3. Graph Traversals contd


Breadth first search
Computes the shortest distance (dist) from s
to any reachable node.
Computes a breadth first tree (of parents)
with root s that contains all the reachable
vertices from s.
To get O(|V|+|E|) we use an adjacency list representation. If we used an adjacency matrix it would be Θ(|V|^2).

2008, University of Colombo School of Computing

13

6.3. Graph Traversals contd


Coloring the nodes

We use colors (white, gray and black) to


denote the state of the node during the
search.
A node is white if it has not been reached
(discovered).
Discovered nodes are gray or black. Gray
nodes are at the frontier of the search.
Black nodes are fully explored nodes.
2008, University of Colombo School of Computing

14

6.3. Graph Traversals contd


BFS - initialize
procedure BFS(G: graph; s: node; var color: carray; dist: iarray; parent: parray);
    for each vertex u do                 -- Θ(V)
        color[u] := white; dist[u] := ∞;
        parent[u] := nil;
    end for
    color[s] := gray; dist[s] := 0;
    init(Q); enqueue(Q, s);
2008, University of Colombo School of Computing

15

6.3. Graph Traversals contd


BFS - main
    while not (empty(Q)) do
        u := head(Q);
        for each v in adj[u] do
            if color[v] = white then
                color[v] := gray; dist[v] := dist[u] + 1;
                parent[v] := u; enqueue(Q, v);
        dequeue(Q); color[u] := black;
    end BFS

The inner loops examine each adjacency list once: the sum over all u ∈ V of |adj[u]| equals the sum of degree[u], which is O(E).

2008, University of Colombo School of Computing

16

6.3. Graph Traversals contd


BFS example
[Figure: BFS worked example on a graph with vertices r, s, t, u, v, w, x, y; starting from s (dist 0), the frontier queue and the dist labels are updated step by step. Figure not reproduced.]
2008, University of Colombo School of Computing

17

6.3. Graph Traversals contd


BFS example
[Figure: the BFS example continues until every vertex is discovered; finally y is removed from the queue and colored black. Figure not reproduced.]

2008, University of Colombo School of Computing
18

6.3. Graph Traversals contd


Analysis of BFS
Initialization is Θ(|V|).
Each node can be added to the queue at
most once (it needs to be white), and its
adjacency list is searched only once. At most
all adjacency lists are searched.
If graph is undirected each edge is reached
twice, so loop repeated at most 2|E| times.
If graph is directed each edge is reached
exactly once. So the loop repeated at most
|E| times.
Worst case time O(|V|+|E|)
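
For concreteness, here is the same BFS in Java (a sketch using the adjacency-list Graph class sketched in section 6.2; the colors are implicit — dist[v] == -1 plays the role of white):

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

int[] bfs(Graph g, int s, int n) {          // n = number of vertices
    int[] dist = new int[n];
    Arrays.fill(dist, -1);                  // -1 = undiscovered (white)
    Queue<Integer> q = new ArrayDeque<>();  // FIFO frontier
    dist[s] = 0;
    q.add(s);
    while (!q.isEmpty()) {
        int u = q.remove();                 // u becomes fully explored (black)
        for (int v : g.neighbours(u)) {
            if (dist[v] == -1) {            // v discovered for the first time
                dist[v] = dist[u] + 1;      // shortest distance in edges
                q.add(v);
            }
        }
    }
    return dist;
}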
2008, University of Colombo School of Computing

19

6.3. Graph Traversals contd


Depth First Search

Goal - explore every vertex and edge of G


We go deeper whenever possible.
Directed or undirected graph G = (V, E).
To get worst case time Θ(|V|+|E|) we use an adjacency list representation. If we used an adjacency matrix it would be Θ(|V|^2).
2008, University of Colombo School of Computing

20

6.3. Graph Traversals contd


Depth First Search
Until there are no more undiscovered nodes.
Picks an undiscovered node and starts a depth first
search from it.
The search proceeds from the most recently
discovered node to discover new nodes.
When the last discovered node v is fully explored,
backtracks to the node used to discover v. Eventually,
the start node is fully explored.

2008, University of Colombo School of Computing

21

6.3. Graph Traversals contd


Depth First Search
In this version all nodes are discovered even if
the graph is directed, or undirected and not
connected

The algorithm saves:


A depth first forest of the edges used to
discover new nodes.
Timestamps for the first time a node u is
discovered d[u] and the time when the node is
fully explored f[u]

2008, University of Colombo School of Computing

22

6.3. Graph Traversals contd


Depth First Search
procedure DFS(G: graph; var color: carray; d, f: iarray; parent: parray);
    for each vertex u do                 -- Θ(V)
        color[u] := white; parent[u] := nil;
    end for
    time := 0;
    for each vertex u do
        if color[u] = white then DFS-Visit(u); end if;
    end for
end DFS
2008, University of Colombo School of Computing

23

6.3. Graph Traversals contd


DFS-Visit(u)
    color[u] := gray; time := time + 1; d[u] := time;
    for each v in adj[u] do
        if color[v] = white then
            parent[v] := u; DFS-Visit(v);
        end if;
    end for;
    color[u] := black; time := time + 1; f[u] := time;
end DFS-Visit
2008, University of Colombo School of Computing

24

6.3. Graph Traversals contd


DFS example (1)
[Figure: DFS worked example on vertices u, v, w, x, y, z; discovery times 1/, 2/, 3/, 4/ are assigned as the search goes deeper, and B marks a back edge. Figure not reproduced.]

2008, University of Colombo School of Computing
25

6.3. Graph Traversals contd


DFS example (2)
[Figure: the search backtracks and finishing times appear (4/5, 3/6, 2/7); the back edge B remains marked. Figure not reproduced.]
2008, University of Colombo School of Computing

26

6.3. Graph Traversals contd


DFS example (3)
[Figure: the first DFS tree finishes (1/8); the search restarts at the next white node (timestamps 9, 10/11); F, B and C mark forward, back and cross edges. Figure not reproduced.]

2008, University of Colombo School of Computing

27

6.3. Graph Traversals contd


DFS example (4)
[Figure: the finished DFS forest with all discovery/finishing timestamps (1/8, 2/7, 9/12, 4/5, 3/6, 10/11). Figure not reproduced.]

2008, University of Colombo School of Computing

28

6.3. Graph Traversals contd


Analysis
DFS is Θ(|V|) (excluding the time taken by the DFS-Visit calls).
DFS-Visit is called once for each node v. Its for loop is executed |adj(v)| times, so the DFS-Visit calls for all the nodes take Θ(|E|).
Worst case time Θ(|V|+|E|)

2008, University of Colombo School of Computing

29

6.4. Shortest Paths


Example:
In a flight route graph, the weight of
an edge represents the distance in
miles between the endpoint airports
[Figure: a flight-route graph over the airports SFO, ORD, LGA, PVD, HNL, LAX, DFW and MIA, with edge weights in miles. Figure not reproduced.]
2008, University of Colombo School of Computing

30

6.4 Shortest Paths contd


The weight of path p = <v0, v1, ..., vk> is the sum of the weights of its constituent edges.
Given a weighted graph and two vertices u and
v, we want to find a path of minimum total weight
between u and v.
Length of a path is the sum of the weights of
its edges.
2008, University of Colombo School of Computing

31

6.4 Shortest Paths contd


Example: Shortest path between Providence and Honolulu

Applications:
Internet packet routing
Flight reservations
Driving directions

[Figure: the flight-route graph with the shortest PVD-HNL path highlighted. Figure not reproduced.]

2008, University of Colombo School of Computing

32

6.4 Shortest Paths contd


We will focus on the single source shortest paths problem: given a graph G = (V,E), we want to find a shortest path from a given source vertex s ∈ V to each vertex v ∈ V.
Shortest Path Properties:
Property 1: A subpath of a shortest path is itself a shortest path.
Property 2: There is a tree of shortest paths from a start vertex to all the other vertices.
2008, University of Colombo School of Computing

33

6.4.1. Shortest Path Problem


Example:
Tree of shortest paths from Providence

[Figure: the tree of shortest paths from PVD spanning the other airports. Figure not reproduced.]

2008, University of Colombo School of Computing

34

6.4.1. Shortest Path Problem


contd
The shortest path algorithms use the
technique of relaxation.
For each vertex v ∈ V, an attribute d[v] is maintained which is an upper bound on the weight of a shortest path from source s to v.
d[v] is called the shortest path estimate.
2008, University of Colombo School of Computing

35

6.4.2. Shortest Path Algorithms


The shortest path estimates and
predecessors are initialized by the
following O(V) time procedure.
INITIALIZE-SINGLE-SOURCE(G, s)
    for each vertex v ∈ V[G]
        do d[v] ← ∞
           π[v] ← NIL
    d[s] ← 0
2008, University of Colombo School of Computing

36

6.4.2. Shortest Path Algorithms


contd
Relaxing an edge (u,v) consists of testing whether the shortest path to v found so far can be improved by going through u. If so, the d[v] and π[v] values should be updated.
A relaxation step may decrease the
value of the shortest path estimate d[v].
2008, University of Colombo School of Computing

37

6.4.3. Relaxation
Relaxation of an edge (u,v) with weight w(u,v) = 2, where d[u] = 5:
If d[v] > d[u] + w(u,v), then d[v] is changed (lowered) by relaxation.
If d[v] <= d[u] + w(u,v) (for example d[v] = 6), then d[v] is unchanged by relaxation.
[Figure illustrating the two cases not reproduced.]
2008, University of Colombo School of Computing

38

6.4.3. Relaxation contd


Relax(u, v, w)
    if d[v] > d[u] + w(u,v)
        then d[v] ← d[u] + w(u,v)
             π[v] ← u
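
In Java the same step might look like this (a sketch; dist and pred play the roles of d and π, and w is the weight of edge (u,v)):

// Relax edge (u, v): can the path to v improve by going through u?
void relax(int u, int v, double w, double[] dist, int[] pred) {
    if (dist[v] > dist[u] + w) {
        dist[v] = dist[u] + w;      // better shortest-path estimate found
        pred[v] = u;                // remember the predecessor of v
    }
}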

2008, University of Colombo School of Computing

39

6.5. Cycle Detection


Cycle detection on a graph is a bit different
than on a tree due to the fact that a graph
node can have multiple parents. On a tree,
the algorithm for detecting a cycle is to do
a depth first search, marking nodes as
they are encountered. If a previously
marked node is seen again, then a cycle
exists. This won't work on a graph.
2008, University of Colombo School of Computing

40

6.5. Cycle Detection contd


The graph in the figure (not reproduced here) will be falsely reported to have a cycle, since node C will be seen twice in a DFS starting at node A.

2008, University of Colombo School of Computing

41

6.5. Cycle Detection contd


The cycle detection algorithm for trees can easily be
modified to work for graphs. The key is that in a DFS of
an acyclic graph, a node whose descendants have all
been visited can be seen again without implying a cycle.
However, if a node is seen a second time before all of its
descendants have been visited, then there must be a
cycle. Can you see why this is? Suppose there is a cycle
containing node A. Then this means that A must be
reachable from one of its descendants. So when the
DFS is visiting that descendant, it will see A again,
before it has finished visiting all of A's descendants. So
there is a cycle.
2008, University of Colombo School of Computing

42

6.5. Cycle Detection contd


In order to detect cycles, we use a
modified depth first search called a
colored DFS. All nodes are initially marked
white. When a node is encountered, it is
marked grey, and when its descendants
are completely visited, it is marked black.
If a grey node is ever encountered, then
there is a cycle.
2008, University of Colombo School of Computing

43

Cycle detection algorithm.


boolean containsCycle(Graph g):
    for each vertex v in g do:
        v.mark = WHITE;
    od;
    for each vertex v in g do:
        if v.mark == WHITE then:
            if visit(g, v) then:
                return TRUE;
            fi;
        fi;
    od;
    return FALSE;

boolean visit(Graph g, Vertex v):
    v.mark = GREY;
    for each edge (v, u) in g do:
        if u.mark == GREY then:
            return TRUE;              // a grey node reached again: cycle
        else if u.mark == WHITE then:
            if visit(g, u) then:
                return TRUE;
            fi;
        fi;
    od;
    v.mark = BLACK;
    return FALSE;
2008, University of Colombo School of Computing

44

6.6. Spanning Tree


A spanning tree of a graph is a subgraph
that is a tree containing all the vertices.

2008, University of Colombo School of Computing

45

6.6.1. Minimum Spanning Tree (MST)
The spanning tree among all spanning
trees with the lowest total edge weight.

2008, University of Colombo School of Computing

46

6.6.2. Applications of MST


Problem
Computer networks
- How to connect a set of computers
using the minimum amount of wire.
Electronic circuits

2008, University of Colombo School of Computing

47

6.6.3. Minimum Spanning Tree (MST)
Find the MST.

2008, University of Colombo School of Computing

48

6.6.3.Solution

2008, University of Colombo School of Computing

49

6.6.4. Generic Algorithm for MST


Input : connected weighted graph, G
Output : MST, T, for graph G
Greedy strategy in the generic algorithm:
- Grow the MST one edge at a time.
- Maintain a set of edges A such that, prior to each iteration, A is a subset of some MST.
At each step determine an edge (u,v) that can be added to A without violating this invariant.
We call such an edge a safe edge for A, since it can be safely added to A while maintaining the invariant.
2008, University of Colombo School of Computing

50

6.6.4. Generic Algorithm for MST


Generic-MST(G, w)
1. A ← ∅
2. while A does not form a spanning tree
3.     do find an edge (u,v) that is safe for A
4.        A ← A ∪ {(u,v)}
5. return A
2008, University of Colombo School of Computing

51

6.7. Connectivity of graphs


Undirected Graph

An Undirected Graph is a graph where the edges have no directions.
The edges in an undirected graph are called Undirected Edges.
For an undirected edge, {vi, vj} = {vj, vi}.
[Figure: an undirected graph on vertices V1, V2, V3, V4. Figure not reproduced.]
2008, University of Colombo School of Computing

52

6.7. Connectivity of graphs


contd
Example (Undirected Graph)
G = (V, E)
V = {1, 2, 3, 4, 5}
E = {(1,2), (1,3), (1,4), (2,3), (3,5), (4,5)}
[Figure of the graph not reproduced.]

2008, University of Colombo School of Computing

53

6.7. Connectivity of graphs


contd
Directed Graphs

A Directed Graph or Digraph is a graph where each edge has a direction.
The edges in a digraph are called Arcs or Directed Edges.
For directed edges, (vi, vj) ≠ (vj, vi).
[Figure: a directed graph on vertices V1, V2, V3, V4. Figure not reproduced.]
2008, University of Colombo School of Computing

54

6.7. Connectivity of graphs


contd
Example (Digraph)
G = (V, E)
V = {1, 2, 3, 4, 5, 6}
E = {(1,4), (2,1), (2,3), (3,2), (4,3), (4,5), (4,6), (5,3), (6,1), (6,5)}
(1, 4) = 1 → 4, where 1 is the tail and 4 is the head.
[Figure of the digraph not reproduced.]

2008, University of Colombo School of Computing

55

6.8. Topological Sort


Graphs are sometimes used to represent
before and after relationships.
For example, you need to think through a
design for a program before you start
coding.
These two steps can be represented as
vertices, and the relationship between
them as a directed edge from the first to
the second.
2008, University of Colombo School of Computing

56

6.8. Topological Sort contd


On such graphs, it is useful to determine which steps must come before others. The topological sort algorithm computes an ordering on a graph such that if vertex u is earlier than vertex v in the ordering, there is no path from v to u. In other words, you cannot get from a vertex later in the ordering to a vertex earlier in the ordering. Of course, topological sort works only on directed acyclic graphs.
2008, University of Colombo School of Computing

57

6.8. Topological Sort contd


The simplest topological sort algorithm is to repeatedly remove vertices with in-degree 0 from the graph. The edges belonging to the vertex are also removed, reducing the in-degree of adjacent vertices. This is done until the graph is empty, or until no vertex without incoming edges exists, in which case the sort fails.
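
A Java sketch of this idea (essentially Kahn's algorithm; it assumes vertices 0 .. n-1 and the adjacency-list Graph class sketched in section 6.2, and returns null when the sort fails, i.e., when the graph contains a cycle):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

List<Integer> topologicalSort(Graph g, int n) {
    int[] inDegree = new int[n];
    for (int u = 0; u < n; u++)
        for (int v : g.neighbours(u)) inDegree[v]++;

    Queue<Integer> ready = new ArrayDeque<>();    // vertices with in-degree 0
    for (int v = 0; v < n; v++)
        if (inDegree[v] == 0) ready.add(v);

    List<Integer> order = new ArrayList<>();
    while (!ready.isEmpty()) {
        int u = ready.remove();
        order.add(u);                             // "remove" u from the graph
        for (int v : g.neighbours(u))
            if (--inDegree[v] == 0) ready.add(v); // u's outgoing edges disappear too
    }
    return order.size() == n ? order : null;      // null: a cycle remains, sort fails
}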
2008, University of Colombo School of Computing

58

6.9. Networks
Networks can be used to represent the
transportation of some commodity through
a system of delivery channels.
There are sources (x) and sinks (y).
The network is a directed graph, where
each arc a is associated with a capacity,
c(a).

2008, University of Colombo School of Computing

59

6.9. Networks contd

2008, University of Colombo School of Computing

60

6.9.1. Flow
A flow in a network is a set of numbers associated with each arc, f(a).
This indicates how much of a channel's capacity is being used:
0 ≤ f(a) ≤ c(a).
For a vertex v, the flow into and out of the vertex is denoted by f−(v) and f+(v) respectively.
For intermediate vertices (not sources or sinks) the flow in is the same as the flow out. This is called the conservation condition.
2008, University of Colombo School of Computing

61

6.9.2. Resultant flow


For some set of vertices S, the resultant flow out of S is given by f+(S) − f−(S).
We are often interested in the resultant flow out of the
source x. (Or the set of sources X if there is more than
one).
In particular, we usually want to find a maximum flow, so
that as much of the capacity is used as possible in
transporting out of the sources to the sinks.
It is straightforward to extend a network with multiple
sources and sinks to one with just one source and sink in
order to analyse the maximum flow.

2008, University of Colombo School of Computing

62

6.9.3. Cuts
A cut is a division of the vertices into two sets S and S̄, so that the source is in S and the sink is in S̄.
The capacity of a cut is the sum of the capacities of all the arcs which cross from S to S̄.
How many cuts are possible in a network with n vertices? What are the different cuts of the network on the board, and what are their capacities?

2008, University of Colombo School of Computing

63

6.9.4. Max-flow min-cut


In all the examples we have seen, the minimum
capacity cut is the same as the maximum flow.
Intuitively we can think of saturating the
bottlenecks.
To prove it, we can show first that max flow ≥ min cut (no augmenting paths), and then that max flow ≤ min cut (removing edges changes capacity).

2008, University of Colombo School of Computing

64

6.9.5. The Ford-Fulkerson Algorithm

An algorithm for finding the maximum flow in a network:
1. Set the flow to zero for all arcs.
2. Calculate the residual network Gf. While there is a path p from x to y in Gf:
   Find cf(p) = min{ cf(u,v) | (u,v) ∈ p }
   For each edge in p, add cf(p) to the flow.
   (Subtract cf(p) from the flow if the edge is a reverse arc in the network.)
Repeat step 2 until there is no augmenting path.

2008, University of Colombo School of Computing

65

6.9.5. The Ford-Fulkerson Algorithm contd

2008, University of Colombo School of Computing

66

6.9.6. Other problems regarding network flow
Multi commodity flow: a number of sources
produce different products that are to be
transported to different sinks using the
same network.
Minimum cost flow: each arc has an
associated cost, and we want to find the
cheapest mode of transportation.
Circulation: there is a lower bound on the
flow as well as an upper bound.
2008, University of Colombo School of Computing

67

7. Sorting and Searching Algorithms

2008, University of Colombo School of Computing

7.1. Efficiency of Algorithms


Worst case efficiency
is the maximum number of steps that an algorithm can take for any
collection of data values.

Best case efficiency
is the minimum number of steps that an algorithm can take for any collection of data values.

Average case efficiency


- the efficiency averaged on all possible inputs
- must assume a distribution of the input
- we normally assume uniform distribution (all keys are equally probable)

If the input has size n, efficiency will be a function of n


2008, University of Colombo School of Computing

7.1. Efficiency of Algorithms


contd
We are interested in analyzing the
efficiency of an algorithm
involves determining the quantity of computer
resources consumed by the algorithm

These resources include


the amount of memory and
the amount of computational time

The efficiency of a given algorithm is determined by the resources required as the size of the input grows.
2008, University of Colombo School of Computing

7.1.1. The Big-O Notation


We can say that a function is "of the order of n", written O(n), to describe the upper bound on the number of operations.
This is called Big-Oh notation.
Some common orders are:
O(1)        constant (the size of n has no effect)
O(log n)    logarithmic
O(n log n)
O(n^2)      quadratic
O(n^3)      cubic
O(2^n)      exponential
2008, University of Colombo School of Computing

7.1.2. Formal Big-O Definition


Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n0 such that
    f(n) <= c·g(n) for n >= n0

Example: 2n + 10 is O(n):
    2n + 10 <= cn
    (c - 2)n >= 10
    n >= 10/(c - 2)
    Pick c = 3 and n0 = 10.

[Figure: log-log plot of n, 2n + 10 and 3n on axes from 1 to 10,000, showing 3n >= 2n + 10 for n >= 10. Figure not reproduced.]

2008, University of Colombo School of Computing
5

7.1.3. Big-O and Growth Rate


The big-Oh notation gives an upper bound
on the growth rate of a function
The statement f(n) is O(g(n)) means that
the growth rate of f(n) is no more than the
growth rate of g(n)
We can use the big-Oh notation to rank
functions according to their growth rate

2008, University of Colombo School of Computing

7.1.4. Big-O Rules


If f(n) is a polynomial of degree d, then f(n) is O(n^d), i.e.,
1. Drop lower-order terms
2. Drop constant factors
Use the smallest possible class of functions:
Say "2n is O(n)" instead of "2n is O(n^2)"
Use the simplest expression of the class:
Say "3n + 5 is O(n)" instead of "3n + 5 is O(3n)"
2008, University of Colombo School of Computing

7.1.5 Examples
We say that n^4 + 100n^2 + 10n + 50 is of the order of n^4, or O(n^4)
We say that 10n^3 + 2n^2 is O(n^3)
We say that n^3 - n^2 is O(n^3)
We say that 10 is O(1)
We say that 1273 is O(1)

2008, University of Colombo School of Computing


7.1.7. Relatives of Big-O


big-Omega
f(n) is Ω(g(n)) if there is a constant c > 0 and an integer constant n0 >= 1 such that f(n) >= c·g(n) for n >= n0
big-Theta
f(n) is Θ(g(n)) if there are constants c' > 0 and c'' > 0 and an integer constant n0 >= 1 such that c'·g(n) <= f(n) <= c''·g(n) for n >= n0
little-oh
f(n) is o(g(n)) if, for any constant c > 0, there is an integer constant n0 >= 0 such that f(n) <= c·g(n) for n >= n0
little-omega
f(n) is ω(g(n)) if, for any constant c > 0, there is an integer constant n0 >= 0 such that f(n) >= c·g(n) for n >= n0
2008, University of Colombo School of Computing

10

7.1.8. Average vs. Worst Case


Algorithm may run faster on some inputs
than it does on the others
Average case refers to the running time of
an algorithm as an average taken over all
inputs of a same size
Worst case refers to the running time of an
algorithm as the maximum taken over all
inputs of a same size
2008, University of Colombo School of Computing

11

7.2. Searching Algorithms

Searching algorithms are closely related to the concept of dictionaries.


Dictionaries are data structures that support search, insert, and delete
operations.
One of the most effective representations is a hash table. Typically, a
simple function is applied to the key to determine its place in the dictionary.
Another efficient search algorithm on sorted tables is binary search.
If the dictionary is not sorted, then heuristic methods of dynamic reorganization of the dictionary are of great value. Among the simplest are cache-based methods: several recently used keys are stored in a special data structure that permits fast search (for example, one that is always sorted). For example, keeping the last N recently found values at the top of the table (or list) dramatically improves performance. Other cache-based approaches are also possible. In the simplest form the cache can be merged with the dictionary:
move-to-front method: A heuristic that moves the target of a search to the head of a list so it is found faster next time.
transposition method: Search an array or list by checking items one at a time. If the value is found, swap it with its predecessor so it is found faster next time.
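
Since binary search on a sorted table is mentioned above, here is a minimal Java sketch of it:

// Binary search over a sorted int array; returns the index of key, or -1.
static int binarySearch(int[] a, int key) {
    int lo = 0, hi = a.length - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;    // avoids overflow for very large lo + hi
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;  // key can only be in the right half
        else hi = mid - 1;               // key can only be in the left half
    }
    return -1;                           // not found
}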

2008, University of Colombo School of Computing

12

7.2.1. Binary search trees


A binary search tree (BST) is a binary tree data structure which has
the following properties:
each node (item in the tree) has a key value;
a total order (linear order) is defined on these key values;
the left subtree of a node contains only values less than the parent
node's key value;
the right subtree of a node contains only values greater than or equal to
the parent node's key value.

The major advantage of binary search trees over the other data
structures is that the related sorting algorithms and search
algorithms such as in-order traversal can be very efficient.
Binary search trees can choose to allow or disallow duplicate
values, depending on the implementation.
Binary search trees are a fundamental data structures used to
construct more abstract data structures such as sets, multisets, and
associative arrays.
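
A minimal Java sketch of BST insertion and search that follows the key rules above (values greater than or equal to the parent go into the right subtree). The class names are ours, and balancing is not attempted:

class BSTNode
{
    int key;
    BSTNode left, right;
    BSTNode(int key) { this.key = key; }
}

class BST
{
    private BSTNode root;

    public void insert(int key)
    { root = insert(root, key); }

    private BSTNode insert(BSTNode node, int key)
    {
        if (node == null)
            return new BSTNode(key);               // empty spot found
        if (key < node.key)
            node.left = insert(node.left, key);    // smaller keys go left
        else
            node.right = insert(node.right, key);  // greater-or-equal keys go right
        return node;
    }

    public boolean search(int key)
    {
        BSTNode cur = root;
        while (cur != null)
        {
            if (key == cur.key)
                return true;
            cur = (key < cur.key) ? cur.left : cur.right;
        }
        return false;
    }
}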

7.2.2. B-trees
B-trees are multiway trees, commonly used in external storage, in which nodes correspond to blocks on the disk. As in other trees, the algorithms find their way down the tree, reading one block at each level. B-trees provide searching, insertion, and deletion of records in O(log N) time. This is quite fast and works even for very large files. However, the programming is not trivial.
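
As a sketch only, a B-tree node and its top-down search might look as follows in Java. The order parameter and field names are our own assumptions, and insertion with node splitting, the hard part, is omitted:

class BTreeNode
{
    int n;                 // number of keys currently in this node
    int[] keys;            // keys[0..n-1], kept in ascending order
    BTreeNode[] children;  // children[0..n]; unused (null) in a leaf
    boolean leaf;

    BTreeNode(int order)   // 'order' = maximum number of children per node
    {
        keys = new int[order - 1];
        children = new BTreeNode[order];
        leaf = true;       // an internal node would set this to false
    }

    boolean search(int key)   // reads one node (disk block) per level
    {
        int i = 0;
        while (i < n && key > keys[i])
            i++;                       // find the first key >= the search key
        if (i < n && key == keys[i])
            return true;
        return leaf ? false : children[i].search(key);
    }
}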

7.2.3.1. Breadth-first search


Breadth-first search (BFS) is a graph search algorithm that begins at the root node and explores all the neighboring nodes. Then, for each of those nearest nodes, it explores their unexplored neighbor nodes, and so on, until it finds the goal.


7.2.3.1. Breadth-first search


How it works
BFS is an uninformed search method that aims to expand and examine all nodes of a graph systematically in search of a solution. In other words, it exhaustively searches the entire graph without considering the goal until it finds it. It does not use a heuristic.
From the standpoint of the algorithm, all child nodes obtained by expanding a node are added to a FIFO queue. In typical implementations, nodes that have not yet been examined for their neighbors are placed in some container (such as a queue or linked list) called "open", and once examined they are placed in the container "closed".

7.2.3.1. Breadth-first search

How it works
[The original slide shows a worked diagram of a BFS traversal here; the image is not reproduced in this text version.]

7.2.3.1. Breadth-first search


Applications of BFS
Breadth-first search can be used to solve many problems in graph theory, for example:
Finding all connected components in a graph
Finding all nodes within one connected component
Copying garbage collection (Cheney's algorithm)
Finding the shortest path between two nodes u and v (in an unweighted graph)
Testing a graph for bipartiteness
(Reverse) Cuthill-McKee mesh numbering

7.2.3.2. Depth-first search


Depth-first search (DFS) is an algorithm for traversing or searching a tree or graph. One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
Formally, DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. Then the search backtracks, returning to the most recent node it hasn't finished exploring. In a non-recursive implementation, all freshly expanded nodes are added to a LIFO stack for exploration.

7.2.3.2. Depth-first search

How it works
[The original slide shows the example graph (nodes A through G) used on the next slide; the image is not reproduced in this text version.]

7.2.3.2. Depth-first search

A depth-first search starting at A, assuming that the left edges in the shown graph are chosen before right edges, and assuming the search remembers previously-visited nodes and will not repeat them (since this is a small graph), will visit the nodes in the following order: A, B, D, F, E, C, G.
Performing the same search without remembering previously visited nodes results in visiting nodes in the order A, B, D, F, E, A, B, D, F, E, etc. forever, caught in the A, B, D, F, E cycle and never reaching C or G.
Iterative deepening prevents this loop and will reach the following nodes on the following depths, assuming it proceeds left-to-right as above:

0: A
1: A (repeated), B, C, E

(Note that iterative deepening has now seen C, when a conventional depth-first search did not.)

2: A, B, D, F, C, G, E, F

(Note that it still sees C, but that it came later. Also note that it sees E via a different path, and loops back to F twice.)

3: A, B, D, F, E, C, G, E, F, B

For this graph, as more depth is added, the two cycles "ABFE" and "AEFB" will simply get longer before the algorithm gives up and tries another branch.


7.2.4. Java Implementations


Java implementation of depth-first search

class StackX
{
private final int SIZE = 20;
private int[] st;
private int top;
public StackX() // constructor
{
st = new int[SIZE]; // make array
top = -1;
}
public void push(int j) // put item on stack
{ st[++top] = j; }
public int pop() // take item off stack
{ return st[top--]; }
public int peek() // peek at top of stack
{ return st[top]; }
public boolean isEmpty() // true if nothing on stack
{ return (top == -1); }
} // end class StackX
////////////////////////////////////////////////////////////////

7.2.4. Java Implementations


class Vertex
{
public char label; // label (e.g. 'A')
public boolean wasVisited;
// ------------------
public Vertex(char lab) // constructor
{
label = lab;
wasVisited = false;
}
// ------------------
} // end class Vertex
////////////////////////////////////////////////////////////////


7.2.4. Java Implementations


class Graph
{
private final int MAX_VERTS = 20;
private Vertex vertexList[]; // list of vertices
private int adjMat[][]; // adjacency matrix
private int nVerts; // current number of vertices
private StackX theStack;
// ------------------
public Graph() // constructor
{
vertexList = new Vertex[MAX_VERTS];
// adjacency matrix
adjMat = new int[MAX_VERTS][MAX_VERTS];
nVerts = 0;
for(int j=0; j<MAX_VERTS; j++) // set adjacency
for(int k=0; k<MAX_VERTS; k++) // matrix to 0
adjMat[j][k] = 0;
theStack = new StackX();
} // end constructor
// ------------------


7.2.4. Java Implementations


public void addVertex(char lab)
{
vertexList[nVerts++] = new Vertex(lab);
}
public void addEdge(int start, int end)
{
adjMat[start][end] = 1;
adjMat[end][start] = 1;
}
// ------------------
public void displayVertex(int v)
{
System.out.print(vertexList[v].label);
}
// ------------------


7.2.4. Java Implementations


public void dfs() // depth-first search
{ // begin at vertex 0
vertexList[0].wasVisited = true; // mark it
displayVertex(0); // display it
theStack.push(0); // push it
while( !theStack.isEmpty() ) // until stack empty,
{
// get an unvisited vertex adjacent to stack top
int v = getAdjUnvisitedVertex( theStack.peek() );
if(v == -1) // if no such vertex,
theStack.pop();
else // if it exists,
{
vertexList[v].wasVisited = true; // mark it
displayVertex(v); // display it
theStack.push(v); // push it
}
} // end while
// stack is empty, so we're done
for(int j=0; j<nVerts; j++) // reset flags
vertexList[j].wasVisited = false;
} // end dfs
// ------------------


7.2.4. Java Implementations


// returns an unvisited vertex adj to v
public int getAdjUnvisitedVertex(int v)
{
for(int j=0; j<nVerts; j++)
if(adjMat[v][j]==1 && vertexList[j].wasVisited==false)
return j;
return -1;
} // end getAdjUnvisitedVert()
// ------------------
} // end class Graph
////////////////////////////////////////////////////////////////


7.2.4. Java Implementations


class DFSApp

{
public static void main(String[] args)
{
Graph theGraph = new Graph();
theGraph.addVertex('A'); // 0 (start for dfs)
theGraph.addVertex('B'); // 1
theGraph.addVertex('C'); // 2
theGraph.addVertex('D'); // 3
theGraph.addVertex('E'); // 4
theGraph.addEdge(0, 1); // AB
theGraph.addEdge(1, 2); // BC
theGraph.addEdge(0, 3); // AD
theGraph.addEdge(3, 4); // DE
System.out.print("Visits: ");
theGraph.dfs(); // depth-first search
System.out.println();
} // end main()
} // end class DFSApp
////////////////////////////////////////////////////////////////
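
With the edges above (AB, BC, AD, DE), this program prints "Visits: ABCDE". Starting at A, the search goes to B, then C, backtracks to A, then visits D and E.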

7.2.4. Java Implementations


Breadth-first search
class Queue
{
private final int SIZE = 20;
private int[] queArray;
private int front;
private int rear;
public Queue() // constructor
{
queArray = new int[SIZE];
front = 0;
rear = -1;
}
public void insert(int j) // put item at rear of queue
{
if(rear == SIZE-1)
rear = -1;
queArray[++rear] = j;
}

7.2.4. Java Implementations


public int remove() // take item from front of queue
{
int temp = queArray[front++];
if(front == SIZE)
front = 0;
return temp;
}
public boolean isEmpty() // true if queue is empty
{
return ( rear+1==front || (front+SIZE-1==rear) );
}
} // end class Queue
////////////////////////////////////////////////////////////////


7.2.4. Java Implementations


////////////////////////////////////////////////////////////////
class Vertex
{
public char label; // label (e.g. 'A')
public boolean wasVisited;
// ------------------------------------------------------------
public Vertex(char lab) // constructor
{
label = lab;
wasVisited = false;
}
// ------------------------------------------------------------
} // end class Vertex
////////////////////////////////////////////////////////////////

7.2.4. Java Implementations


class Graph
{
private final int MAX_VERTS = 20;
private Vertex vertexList[]; // list of vertices
private int adjMat[][]; // adjacency matrix
private int nVerts; // current number of vertices
private Queue theQueue;
// ------------------------------------------------------------
public Graph() // constructor
{
vertexList = new Vertex[MAX_VERTS];
// adjacency matrix
adjMat = new int[MAX_VERTS][MAX_VERTS];
nVerts = 0;
for(int j=0; j<MAX_VERTS; j++) // set adjacency
for(int k=0; k<MAX_VERTS; k++) // matrix to 0
adjMat[j][k] = 0;
theQueue = new Queue();
} // end constructor
// ------------------------------------------------------------


7.2.4. Java Implementations


public void addVertex(char lab)
{
vertexList[nVerts++] = new Vertex(lab);
}
// ------------------------------------------------------------
public void addEdge(int start, int end)
{
adjMat[start][end] = 1;
adjMat[end][start] = 1;
}
// ------------------------------------------------------------
public void displayVertex(int v)
{
System.out.print(vertexList[v].label);
}
// ------------------------------------------------------------


7.2.4. Java Implementations


public void bfs() // breadth-first search
{ // begin at vertex 0
vertexList[0].wasVisited = true; // mark it
displayVertex(0); // display it
theQueue.insert(0); // insert at tail
int v2;
while( !theQueue.isEmpty() ) // until queue empty,
{
int v1 = theQueue.remove(); // remove vertex at head
// until it has no unvisited neighbors
while( (v2=getAdjUnvisitedVertex(v1)) != -1 )
{ // get one,
vertexList[v2].wasVisited = true; // mark it
displayVertex(v2); // display it
theQueue.insert(v2); // insert it
} // end while
} // end while(queue not empty)
// queue is empty, so we're done
for(int j=0; j<nVerts; j++) // reset flags
vertexList[j].wasVisited = false;
} // end bfs()
// ------------------------------------------------------------


7.2.4. Java Implementations


// returns an unvisited vertex adj to v
public int getAdjUnvisitedVertex(int v)
{
for(int j=0; j<nVerts; j++)
if(adjMat[v][j]==1 && vertexList[j].wasVisited==false)
return j;
return -1;
} // end getAdjUnvisitedVert()
// ------------------------------------------------------------
} // end class Graph
////////////////////////////////////////////////////////////////

7.2.4. Java Implementations


class BFSApp
{
public static void main(String[] args)
{
Graph theGraph = new Graph();
theGraph.addVertex('A'); // 0 (start for bfs)
theGraph.addVertex('B'); // 1
theGraph.addVertex('C'); // 2
theGraph.addVertex('D'); // 3
theGraph.addVertex('E'); // 4
theGraph.addEdge(0, 1); // AB
theGraph.addEdge(1, 2); // BC
theGraph.addEdge(0, 3); // AD
theGraph.addEdge(3, 4); // DE
System.out.print("Visits: ");
theGraph.bfs(); // breadth-first search
System.out.println();
} // end main()
} // end class BFSApp
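
With the same edges, this program prints "Visits: ABDCE". A's neighbors B and D are visited first, then C (reached from B) and E (reached from D).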

7.3. Sorting algorithms


A sorting algorithm is an algorithm that puts elements of a list in a certain order. The most-used orders are numerical order and lexicographical order. Efficient sorting is important to optimizing the use of other algorithms (such as search and merge algorithms) that require sorted lists to work correctly; it is also often useful for canonicalizing data and for producing human-readable output. More formally, the output must satisfy two conditions:
The output is in nondecreasing order (each element is no smaller than the previous element according to the desired total order);
The output is a permutation, or reordering, of the input.

7.3.1. Insertion Sort


Insertion Sort orders a list in the same way we would
order a hand of playing cards.
Compare the first two numbers, placing the smallest one
in the first position.
Compare the third number to the second number. If the
third number is larger, then the first three numbers are in
order. If not, then swap them. Now compare the numbers
in positions one and two and swap them if necessary.
Proceed in this manner until reaching the end of the list.
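
The procedure above translates directly into Java. This is a sketch; the class and method names are our own:

class InsertionSortSketch
{
    static void insertionSort(int[] a)
    {
        for (int i = 1; i < a.length; i++)
        {
            int value = a[i];          // the next "card" to place
            int j = i - 1;
            while (j >= 0 && a[j] > value)
            {
                a[j + 1] = a[j];       // shift larger values one slot right
                j--;
            }
            a[j + 1] = value;          // drop the card into its slot
        }
    }
}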

7.3.1.1. Analyzing Insertion Sort


Using Insertion Sort, the number of comparisons on a list
of size n varies, because, if the numbers are already in
order, further comparisons are avoided. So we will find
the average number of comparisons.
Using limits and probability, we find that the average
number of comparisons is (n-1)(n/4) + k, where k
increases for larger values of n.


7.3.2. Selection Sort


Scan the list and put the smallest number in the first
position.
Disregard the first position, which is now the smallest
number, and put the second smallest number in the
second position.
Proceed in this manner until reaching the end of the
list.
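
A Java sketch of this procedure (class and method names ours):

class SelectionSortSketch
{
    static void selectionSort(int[] a)
    {
        for (int i = 0; i < a.length - 1; i++)
        {
            int min = i;                       // position of the smallest so far
            for (int j = i + 1; j < a.length; j++)
                if (a[j] < a[min])
                    min = j;
            int t = a[i]; a[i] = a[min]; a[min] = t;   // put it in place
        }
    }
}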


7.3.2. Selection Sort

Sorting Times
(by Selection Sort on a 386 microprocessor running at 20 MHz)

Note that as the list size doubles, the time increases about four-fold. A little arithmetic shows that the time to sort 128,000 numbers would be over 12 hours!

List Size    Seconds
1000           2.69
2000          11.78
4000          47.51
8000         190.70


7.3.2.1. Analyzing Selection Sort


We will assume that the computer spends time only on
comparisons and swaps. However, since the average
number of swaps is extremely difficult to compute, we will
focus only on comparisons.
On a list of size 4, we would compare the first position to
the other 3 numbers, the second position to the 2
remaining numbers, and the third position to the 1
remaining number. This is a total of 3+2+1 = 6
comparisons.
So, for a list of size n, there will be (n-1) + (n-2) + ... + 2 + 1 comparisons. In other words, (n-1)(n/2) comparisons.

7.3.2.2. Selection Sort vs. Insertion Sort
(Time in seconds on a 386 microprocessor running at 20 MHz)

Insertion Sort is about twice as fast as Selection Sort. But note that as the list size doubles, the time for both sorts increases about four-fold.

[Bar chart on the original slide: sorting times for Selection and Insertion Sort on lists of 2000, 4000, and 8000 elements, on a 0-200 second scale.]


7.3.2.3. The Order of an Algorithm

To compare two algorithms, we will take the limit of the number of comparisons of the first algorithm divided by the number of comparisons for the second algorithm as n (the number of items in the list) approaches infinity.
When taking the limit of (ln N) / N or other similar cases, L'Hôpital's Rule becomes a handy tool.
Two algorithms have the same ORDER if the limit is greater than zero and less than infinity. Otherwise, they have different orders.

7.3.2.4. Order of Insertion &


Selection Sorts
Comparing Insertion Sort to Selection Sort, we take
the limit of [(n-1)(n/4) + k] / [(n-1)(n/2)] as n
approaches infinity. As n grows larger, k does not
grow fast enough to affect the limit, and so the limit =
1/2.

Since the fastest growing term in both Insertion &


Selection Sort is n squared, we say they both have
order n squared.
Is there a sorting algorithm with a smaller (and thus
faster) order?

7.3.3. Bubble sort


Bubble sort is a straightforward and simplistic method of sorting data that is used in computer science education. The algorithm starts at the beginning of the data set. It compares the first two elements, and if the first is greater than the second, it swaps them. It continues doing this for each pair of adjacent elements to the end of the data set. It then starts again with the first two elements, repeating until no swaps have occurred on the last pass. While simple, this algorithm is highly inefficient and is rarely used except in education. A slightly better variant, cocktail sort, works by inverting the ordering criteria and the pass direction on alternating passes. Its average case and worst case are both O(n^2).

7.3.4. Shell sort


Shell sort was invented by Donald Shell in 1959. It improves upon bubble sort and insertion sort by moving out-of-order elements more than one position at a time. One implementation can be described as arranging the data sequence in a two-dimensional array and then sorting the columns of the array using insertion sort. Although this method is inefficient for large data sets, it is one of the fastest algorithms for sorting small numbers of elements (sets with fewer than 1000 or so elements). Another advantage of this algorithm is that it requires relatively small amounts of memory.
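
A Java sketch using Shell's original gap sequence n/2, n/4, ..., 1 (other sequences perform better in practice). Sorting the columns of the conceptual two-dimensional array amounts to this gapped insertion sort; the names are ours:

class ShellSortSketch
{
    static void shellSort(int[] a)
    {
        for (int gap = a.length / 2; gap > 0; gap /= 2)
            for (int i = gap; i < a.length; i++)
            {
                int value = a[i];
                int j = i;
                // insertion sort among elements 'gap' apart (one "column")
                while (j >= gap && a[j - gap] > value)
                {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = value;
            }
    }
}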


7.3.5. Merge sort


Merge sort takes advantage of the ease of
merging already sorted lists into a new sorted
list. It starts by comparing every two elements
(i.e., 1 with 2, then 3 with 4...) and swapping
them if the first should come after the second. It
then merges each of the resulting lists of two
into lists of four, then merges those lists of four,
and so on; until at last two lists are merged into
the final sorted list. Of the algorithms described
here, this is the first that scales well to very large
lists, because its worst-case running time is O(n
log n).
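
A bottom-up Java sketch of this pairwise merging (names ours). It assumes a scratch array of the same size, which reflects merge sort's extra-memory cost:

class MergeSortSketch
{
    static void mergeSort(int[] a)
    {
        int n = a.length;
        int[] buf = new int[n];
        // sorted runs of width 1, 2, 4, ... are merged pairwise until one remains
        for (int width = 1; width < n; width *= 2)
        {
            for (int lo = 0; lo < n; lo += 2 * width)
            {
                int mid = Math.min(lo + width, n);
                int hi = Math.min(lo + 2 * width, n);
                int i = lo, j = mid, k = lo;
                while (i < mid && j < hi)                 // merge two sorted runs
                    buf[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
                while (i < mid) buf[k++] = a[i++];
                while (j < hi)  buf[k++] = a[j++];
            }
            System.arraycopy(buf, 0, a, 0, n);
        }
    }
}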

7.3.6. Quicksort
Choose an element out of the list as a pivot. A good
process to select a pivot is to compare the first, middle,
and last elements and choose the middle value.
Compare every other element in the list to the pivot and
create two lists, one list where every element is smaller
than the pivot and one where every element is larger.
Now split each of these lists into smaller lists.
Continue in this way until the small lists have only one or
two elements and we can sort them with at most one
comparison each.
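
A Java sketch of this scheme using the median-of-three pivot choice described above (names ours). The initial call would be quicksort(a, 0, a.length - 1):

class QuicksortSketch
{
    static void quicksort(int[] a, int lo, int hi)
    {
        if (lo >= hi)
            return;                    // 0 or 1 elements: already sorted
        int mid = (lo + hi) / 2;
        int pivot = median(a[lo], a[mid], a[hi]);   // median-of-three
        int i = lo, j = hi;
        while (i <= j)                 // partition into "smaller" and "larger" parts
        {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j)
            {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++; j--;
            }
        }
        quicksort(a, lo, j);           // sort the "smaller" part
        quicksort(a, i, hi);           // sort the "larger" part
    }

    static int median(int x, int y, int z)   // middle value of the three
    {
        return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
    }
}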


7.3.6.1. Selection, Insertion, &


Quicksort
(Times in seconds on a 386 microprocessor running at 20 mHz).

List Size

Selection

Insertion

Quicksort

1000

2.69

1.73

0.11

2000

11.78

7.46

0.22

3000

47.51

29.98

0.44

4000

190.70

73.47

0.96


7.3.7. Heap sort


Heapsort is a much more efficient version of selection
sort. It also works by determining the largest (or
smallest) element of the list, placing that at the end (or
beginning) of the list, then continuing with the rest of the
list, but accomplishes this task efficiently by using a data
structure called a heap, a special type of binary tree.
Once the data list has been made into a heap, the root
node is guaranteed to be the largest element. When it is
removed and placed at the end of the list, the heap is
rearranged so the largest element remaining moves to
the root. Using the heap, finding the next largest element
takes O(log n) time, instead of O(n) for a linear scan as
in simple selection sort. This allows Heapsort to run in
O(n log n) time.
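
An in-place Java sketch of this idea (names ours): first build a max-heap, then repeatedly swap the root, the largest remaining element, to the end and restore the heap:

class HeapSortSketch
{
    static void heapSort(int[] a)
    {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--)   // build a max-heap in place
            siftDown(a, i, n);
        for (int end = n - 1; end > 0; end--)
        {
            int t = a[0]; a[0] = a[end]; a[end] = t;  // largest goes to the end
            siftDown(a, 0, end);                      // re-heapify the rest
        }
    }

    // sink a[i] until neither child is larger (heap occupies a[0..n-1])
    static void siftDown(int[] a, int i, int n)
    {
        while (2 * i + 1 < n)
        {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child])
                child++;                      // take the larger child
            if (a[i] >= a[child])
                break;
            int t = a[i]; a[i] = a[child]; a[child] = t;
            i = child;
        }
    }
}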

7.3.8. Radix sort


Radix sort is an algorithm that sorts a list of
fixed-size numbers of length k in O(n k) time by
treating them as bit strings. We first sort the list
by the least significant bit while preserving their
relative order using a stable sort. Then we sort
them by the next bit, and so on from right to left,
and the list will end up sorted. Most often, the
counting sort algorithm is used to accomplish the
bitwise sorting, since the number of values a bit
can have is small.
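
A Java sketch of this bitwise scheme for non-negative int values (names ours). Because a bit has only two values, the stable counting sort for each pass reduces to copying the zeros before the ones:

class RadixSortSketch
{
    static void radixSort(int[] a)   // assumes all values are non-negative
    {
        int[] buf = new int[a.length];
        for (int bit = 0; bit < 32; bit++)    // least significant bit first
        {
            int k = 0;
            for (int x : a) if ((x >> bit & 1) == 0) buf[k++] = x;  // zeros, in order
            for (int x : a) if ((x >> bit & 1) == 1) buf[k++] = x;  // then ones
            System.arraycopy(buf, 0, a, 0, a.length);
        }
    }
}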

7.3.9. Java Implementation

Bubble sort

static void bubbleSort(int[] a, int n)
/* Sorts the array a of size n in increasing order */
{
    int bound = n - 1;      // elements after 'bound' are already in place
    while (bound > 0) {     // Java needs a boolean condition, not a bare int
        int last_swap = 0;
        for (int k = 0; k < bound; k++) {
            if (a[k] > a[k + 1]) {
                int t = a[k];               /* swap the adjacent pair */
                a[k] = a[k + 1];
                a[k + 1] = t;
                last_swap = k;              /* mark the last swap position */
            } // if
        } // for
        bound = last_swap;  /* elements after bound are already sorted */
    } // while
} // bubbleSort
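
A quick, hypothetical driver for the method above (assuming bubbleSort is declared static in an enclosing class):

public static void main(String[] args)
{
    int[] data = { 5, 1, 4, 2, 8 };
    bubbleSort(data, data.length);
    System.out.println(java.util.Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
}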
