Overview of Data Structures
Arrays
Linked lists
Queues
Stacks
Binary trees
Hash tables
Data Structure   Advantages                          Disadvantages

Array            Quick inserts; fast access          Slow search; slow deletes;
                 if index known                      fixed size

Ordered array    Quicker search than                 Slow inserts; slow deletes;
                 unsorted array                      fixed size

Stack            Last-in, first-out access           Slow access to other items

Queue            First-in, first-out access          Slow access to other items

Linked list      Quick inserts; quick deletes        Slow search

Binary tree      Quick search, inserts, deletes      Deletion algorithm is complex
                 (if the tree remains balanced)

Red-black tree   Quick search, inserts, deletes      Complex to implement
                 (tree always remains balanced)

2-3-4 tree       Quick search, inserts, deletes      Complex to implement
                 (tree always remains balanced;
                 similar trees good for disk
                 storage)

Hash table       Very fast access if key known;      Slow deletes; access slow if
                 quick inserts                       key is not known; inefficient
                                                     memory usage

Heap             Quick inserts; quick deletes;       Slow access to other items
                 access to largest item

Graph            Models real-world situations        Some algorithms are slow
                                                     and complex
NOTE: The data structures shown above (with the exception of the array) can be thought
of as Abstract Data Types (ADTs).
2.1. Stacks
A stack is a data structure in which all
access is restricted to the most recently
inserted item.
If we remove this item, we can access the
next-to-last item inserted, and so on.
A stack is also a handy aid for algorithms
applied to certain complex data structures.
2008, University of Colombo School of Computing
Example
Stack contains 3, 4
Push item 9
Now the stack contains 3,4,9
Stack Properties
The last item added to the stack is placed
on the top and is easily accessible.
Thus the stack is appropriate if we expect
to access only the top item; all other items
are inaccessible.
Compiler Design
Mathematical Expression Evaluation
Balanced-Symbol Checker
Simple Calculator
Array Implementation
To push an item onto an empty stack, we
insert it at array location 0 (since all Java
arrays start at 0).
To push the next item, we insert it at
location 1, and so on; the top-of-stack
index (TOS) tracks the most recent item.
[Figure: array implementation of a stack. Initially the stack is empty
(TOS = -1). After push(a), a is at location 0 (TOS = 0). After push(b),
b is at location 1 (TOS = 1). A pop removes b and TOS returns to 0.]
Java implementation
Zero-parameter constructor for an array-based
stack:
public StackAr()
{
    // construct an empty stack
    theArray = new Object[DEFAULT_CAPACITY];
    tos = -1;
}
Error Handling
There are different philosophies about how to
handle stack errors. What happens if you try to
push an item onto a stack that's already full, or
pop an item from a stack that's empty?
We've left the responsibility for handling such
errors up to the class user. The user should
always check to be sure the stack is not full
before inserting an item:
Error Handling
if( !theStack.isFull() )
insert(item);
else
System.out.print("Can't insert, stack is full");
In the interest of simplicity, we've left this code out of the main() routine (and
anyway, in this simple program, we know the stack isn't full because it has just
been initialized).
We do include the check for an empty stack when main() calls pop().
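As a sketch of the alternative philosophy, the checks can live inside the stack class itself, so that push() and pop() throw an exception on overflow or underflow instead of leaving the checks to the class user. The class and names below (SafeStack) are illustrative, not from the listing above:

```java
// An array-based int stack whose push() and pop() check for
// overflow and underflow themselves.
class SafeStack {
    private final int[] st;   // array that holds the items
    private int top = -1;     // index of the top item; -1 means empty

    public SafeStack(int size) { st = new int[size]; }

    public boolean isFull()  { return top == st.length - 1; }
    public boolean isEmpty() { return top == -1; }

    public void push(int j) {
        if (isFull())
            throw new IllegalStateException("Can't insert, stack is full");
        st[++top] = j;        // advance top, then store the item
    }

    public int pop() {
        if (isEmpty())
            throw new IllegalStateException("Can't pop, stack is empty");
        return st[top--];     // return the top item, then retreat top
    }
}
```

With this design, main() no longer needs the explicit isFull()/isEmpty() tests shown above; a misuse simply raises an exception.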
2.2. Queues
A queue is a special kind of list.
Items are inserted at one end, the rear
(enqueue), and deleted at the other end,
the front (dequeue).
Queues are also known as FIFO
(first-in, first-out) lists.
[Figure: items A, B are inserted at the back of the queue and
removed from the front.]
2.2. Queues
There are various queues quietly doing their job in our
computer's (or the network's) operating system.
There's a printer queue where print jobs wait for the
printer to be available.
A queue also stores keystroke data as we type at the
keyboard.
This way, if we are using a word processor but the
computer is briefly doing something else when we hit a
key, the keystroke won't be lost; it waits in the queue
until the word processor has time to read it.
Using a queue guarantees the keystrokes stay in order
until they can be processed.
2.2.3.2. Deques
A deque is a double-ended queue.
We can insert items at either end and delete them from
either end.
The methods might be called insertLeft() and
insertRight(), and removeLeft() and removeRight().
If we restrict ourselves to insertLeft() and removeLeft() (or
their equivalents on the right), then the deque acts like a
stack.
If we restrict ourselves to insertLeft() and removeRight() (or
the opposite pair), then it acts like a queue.
A deque provides a more versatile data structure than
either a stack or a queue, and is sometimes used in
container class libraries to serve both purposes.
However, it's not used as often as stacks and queues, so
we won't explore it further here.
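For illustration, Java's standard library already provides such a container: java.util.ArrayDeque. The sketch below shows it acting first as a stack and then as a queue, exactly as described above:

```java
import java.util.ArrayDeque;

// A deque restricted to one pair of operations behaves as a stack
// (same end) or as a queue (opposite ends).
class DequeDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> deque = new ArrayDeque<>();

        // Stack behavior: insert and remove at the same end (LIFO).
        deque.addFirst(1);
        deque.addFirst(2);
        System.out.println(deque.removeFirst()); // prints 2

        // Queue behavior: insert at one end, remove at the other (FIFO).
        deque.clear();
        deque.addLast(1);
        deque.addLast(2);
        System.out.println(deque.removeFirst()); // prints 1
    }
}
```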
3. Linked Lists
A new node is created, and its three fields are initialized:
the info field is set to the number el being inserted,
the next field is set to null,
and the prev field is set to the value of tail, so that it points to the last
node in the list. But now the new node should become the last node;
therefore,
tail is set to reference the new node. But the new node is not yet
accessible from its predecessor; to rectify this,
the next field of the predecessor is set to reference the new node.
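The steps above can be sketched in Java as follows; the field names (info, next, prev, tail) follow the text, while the class names are illustrative:

```java
// Appending at the tail of a doubly linked list of ints.
class IntDLList {
    static class Node {
        int info;
        Node next, prev;
        Node(int el) { info = el; }  // info = el; next and prev are null
    }

    Node head, tail;

    void addToTail(int el) {
        Node node = new Node(el);    // new node with info = el, next = null
        node.prev = tail;            // prev points to the current last node
        if (tail != null)
            tail.next = node;        // predecessor now references the new node
        else
            head = node;             // list was empty: new node is also the head
        tail = node;                 // the new node becomes the last node
    }
}
```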
A circular list. The large yellow object represents the circular list as such. The
circular green nodes represent the elements of the list. The rectangular nodes
are instances of a class similar to LinkedListNode, which connect the
constituents of the list together.
The introduction of skip lists was motivated by the need to speed up the
searching process.
Although singly and doubly linked lists require sequential search to locate
an element or to see that it is not in the list, we can improve the efficiency of
the search by dynamically organizing the list in a certain manner.
This organization depends on the configuration of data; thus, the stream of
data requires reorganizing the nodes already on the list.
There are many different ways to organize the lists, and this section
describes four of them:
Move-to-front method. After the desired element is located, put it at the
beginning of the list.
Transpose method. After the desired element is located, swap it with its
predecessor unless it is at the head of the list.
Count method. Order the list by the number of times elements are being
accessed.
Ordering method. Order the list using certain criteria natural for the information
under scrutiny.
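As an illustration, here is a minimal sketch of the move-to-front method, built on java.util.LinkedList for brevity (a hand-rolled linked list would work the same way); the class name is illustrative:

```java
import java.util.LinkedList;

// Self-organizing list using the move-to-front method: after a
// successful search, the found element is moved to the head.
class MoveToFrontList {
    private final LinkedList<Integer> list = new LinkedList<>();

    void add(int el) { list.addLast(el); }

    // Returns true if el is found; as a side effect, moves it to the front.
    boolean search(int el) {
        int i = list.indexOf(el);  // sequential search
        if (i < 0)
            return false;          // not in the list: whole list was scanned
        list.remove(i);            // unlink from its current position
        list.addFirst(el);         // re-insert at the beginning
        return true;
    }

    int front() { return list.getFirst(); }
}
```

Frequently requested elements thus migrate toward the head, so later searches for them terminate quickly.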
With the first three methods, we try to locate the elements most likely to be
looked for near the beginning of the list, most explicitly with the
move-to-front method and most cautiously with the transpose method.
The ordering method already uses some properties inherent to the
information stored in the list.
For example, if we are storing nodes pertaining to people, then the list can
be organized alphabetically by the name of the person or the city or in
ascending or descending order using, say, birthday or salary.
This is particularly advantageous when searching for information that is not
in the list, because the search can terminate without scanning the entire list.
Searching all the nodes of the list, however, is necessary in such cases
using the other three methods.
The count method can be subsumed in the category of the ordering
methods if frequency is part of the information.
In many cases, however, the count itself is an additional piece of
information required solely to maintain the list; hence, it may not be
considered "natural" to the information at hand.
4. Recursion
x^n =  1              if n = 0
       x * x^(n-1)    if n > 0
Nested Recursion
A more complicated case of recursion is found in
definitions in which a function is not only defined in terms
of itself, but also is used as one of the parameters. The
following definition is an example of such a nesting:
h(n) =  0               if n = 0
        n               if n > 4
        h(2 + h(2n))    if 0 < n <= 4
A(n, m) =  m + 1                     if n = 0
           A(n - 1, 1)               if n > 0, m = 0
           A(n - 1, A(n, m - 1))     otherwise

For example, A(4, 1) = 2^16 - 3.

Fib(n) =  n                          if n < 2
          Fib(n - 2) + Fib(n - 1)    otherwise
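The recursive definitions above translate directly into Java; the exact base cases follow the standard forms of these functions:

```java
// Direct transcriptions of the recursive definitions above.
class RecursionExamples {
    // x^n = 1 if n == 0, x * x^(n-1) if n > 0
    static long power(long x, int n) {
        if (n == 0) return 1;
        return x * power(x, n - 1);
    }

    // Ackermann: A(0,m) = m+1; A(n,0) = A(n-1,1); else A(n-1, A(n, m-1))
    static long ack(long n, long m) {
        if (n == 0) return m + 1;
        if (m == 0) return ack(n - 1, 1);
        return ack(n - 1, ack(n, m - 1));
    }

    // Fib(n) = n if n < 2, Fib(n-2) + Fib(n-1) otherwise
    static long fib(int n) {
        if (n < 2) return n;
        return fib(n - 2) + fib(n - 1);
    }
}
```

Note that ack() grows so fast (and recurses so deeply) that even small arguments like A(4, 2) are far beyond practical evaluation.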
5. Trees
Part -1
What Is a Tree?
In the above picture of the tree, the nodes are represented as circles,
and the edges as lines connecting the circles.
Trees have been studied extensively as abstract mathematical entities,
so there's a large amount of theoretical knowledge about them.
A tree is actually an instance of a more general category called a graph.
Finally, trees can also be traversed in level-order, where we visit every node
on a level before going to a lower level. This is also called Breadth-first
traversal.
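A level-order traversal is usually implemented with a queue: visit a node, then enqueue its children, so every node on a level is dequeued before any node on the next level. A minimal sketch (class names illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Level-order (breadth-first) traversal of a binary tree.
class LevelOrder {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static List<Integer> traverse(Node root) {
        List<Integer> visited = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();           // dequeue the front node
            visited.add(n.key);                // visit it
            if (n.left != null) queue.add(n.left);   // enqueue children,
            if (n.right != null) queue.add(n.right); // left before right
        }
        return visited;
    }
}
```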
a depth-first search starting at A, assuming that the left edges in the shown graph are
chosen before right edges, and assuming the search remembers previously-visited
nodes and will not repeat them (since this is a small graph), will visit the nodes in the
following order: A, B, D, F, E, C, G.
Performing the same search without remembering previously visited nodes results in
visiting nodes in the order A, B, D, F, E, A, B, D, F, E, etc. forever, caught in the A, B,
D, F, E cycle and never reaching C or G.
Iterative deepening prevents this loop and will reach the following nodes on the
following depths, assuming it proceeds left-to-right as above:
0: A
1: A (repeated), B, C, E
(Note that iterative deepening has now seen C, when a conventional depth-first
search did not.)
2: A, B, D, F, C, G, E, F
(Note that it still sees C, but that it came later. Also note that it sees E via a different
path, and loops back to F twice.)
3: A, B, D, F, E, C, G, E, F, B
For this graph, as more depth is added, the two cycles "ABFE" and "AEFB" will
simply get longer before the algorithm gives up and tries another branch.
Deletion
1. The node to be deleted has no children.
In this case the node may simply be deleted
from the tree.
Deletion contd
3. The node to be deleted has two children.
This case is much more complex than the
previous two, because the order of the binary
tree must be kept intact. The algorithm must
determine which node to use in place of the
node to be deleted:
5. Trees
Part -2
5.6.1.2. Algorithms
createBackbone(root, n)
    tmp = root
    while (tmp != null)
        if tmp has a left child
            rotate this child about tmp
            set tmp to the child that just became parent
        else
            set tmp to its right child

createPerfectTree(n)
    m = 2^floor(lg(n + 1)) - 1
    make n - m rotations starting from the top of
    the backbone
    while (m > 1)
        m = m / 2
        make m rotations starting from the top of
        the backbone
5.6.1.3. Wrap-up
The DSW algorithm is good if you can take
the time to get all the nodes and then
create the tree
What if you want to balance the tree as
you go?
You use an AVL Tree
5.6.2.3. Deletion
Deletion is a bit trickier.
With insertion after the rotation we were
done.
Not so with deletion.
We need to continue checking balance
factors as we travel up the tree
Cases
Case 1: Deletion from a left subtree from a
tree with a right high root and a right high
right subtree.
Requires one left rotation about the root
Cases continued
Case 3: Deletion from a left subtree from a tree
with a right high root and a left high right subtree
with a left high left subtree.
Requires a right rotation around the right subtree root
and then a left rotation about the root
5.8. Heaps
A heap is a binary tree storing keys at its internal
nodes and satisfying the following properties:
Heap-Order: for every internal node v other than the
root,
key(v) >= key(parent(v))
Complete Binary Tree: let h be the height of the
heap
for i = 0, ..., h - 1, there are 2^i nodes at depth i
at depth h - 1, the internal nodes are to the left of the
external nodes
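A heap with these properties is commonly stored in an array, level by level, so that the parent of the entry at index i sits at index (i - 1) / 2. Below is a minimal sketch of insertion with up-heap bubbling, assuming the min-at-top orientation of the heap-order property above; names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Array-based heap: insert appends at the next free leaf, then
// bubbles the new key up until key(v) >= key(parent(v)) holds.
class MinHeap {
    private final List<Integer> a = new ArrayList<>();

    void insert(int key) {
        a.add(key);                      // place at the last position
        int i = a.size() - 1;
        while (i > 0) {                  // bubble up while smaller than parent
            int parent = (i - 1) / 2;
            if (a.get(i) >= a.get(parent)) break;  // heap-order restored
            int tmp = a.get(i);          // swap child and parent
            a.set(i, a.get(parent));
            a.set(parent, tmp);
            i = parent;
        }
    }

    int min() { return a.get(0); }       // smallest key sits at the root
}
```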
6. Graphs
The vertices belonging to an edge are called the ends, endpoints, or end
vertices of the edge.
V (and hence E) are usually taken to be finite, and many of the well-known
results are not true (or are rather different) for infinite graphs because
many of the arguments fail in the infinite case. The order of a graph is | V |
(the number of vertices). A graph's size is | E | , the number of edges. The
degree of a vertex is the number of edges that connect to it, where an edge
that connects to the vertex at both ends (a loop) is counted twice.
The edges E induce a symmetric binary relation ~ on V which is called the
adjacency relation of G. Specifically, for each edge {u,v} the vertices u and
v are said to be adjacent to one another, which is denoted u ~ v.
For an edge {u, v}, graph theorists usually use the somewhat shorter
notation uv.
Types of graphs
Undirected graph
A graph G = (V, E) in which every edge is undirected. This corresponds to a
digraph (see above) in which, for every edge (v, u), there is an edge from v to u
and an edge from u to v.
Finite graph
A finite graph is a graph G = <V,E> such that V(G) and E(G) are finite
sets.
Simple graph
A simple graph is an undirected graph that has no self-loops and no
more than one edge between any two different vertices. In a simple
graph the edges of the graph form a set (rather than a multiset) and
each edge is a pair of distinct vertices. In a simple graph with p vertices
every vertex has a degree that is less than p.
Weighted graph
A graph is a weighted graph if a number (weight) is assigned to
each edge. Such weights might represent, for example, costs,
lengths or capacities, etc. depending on the problem.
Weight of the graph is sum of the weights given to all edges.
Types of graphs
Mixed graph
Complete graph
Complete graphs have the feature that each pair of vertices has an
edge connecting them.
Loop
A loop is an edge (directed or undirected) which starts and ends on the same
vertex; these may be permitted or not permitted according to the application. In
this context, an edge with two different ends is called a link.
2008, University of Colombo School of Computing
10
Adjacency list:
Vertices are labelled (or re-labelled) from 0 to |V(G)|-1.
Corresponding to each vertex is a list (either an array or linked
list) of its neighbours.
Adjacency matrix:
Vertices are labelled (or re-labelled) with integers from 0 to
|V(G)|-1. A two-dimensional boolean array A with dimensions
|V(G)| x |V(G)| contains a 1 at A[i][j]
if there is an edge from the vertex labelled i to the vertex
labelled j, and a 0 otherwise.
Both representations allow us to represent directed graphs, since
we can have an edge from vi to vj but lack one from vj to vi. To
represent undirected graphs, we simply make sure that all
edges are listed twice: once from vi to vj, and once from vj to vi.
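Both representations can be sketched together in Java; the class and field names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Two standard representations of a directed graph whose vertices
// are labelled 0 .. n-1. For an undirected graph, addEdge would
// simply be called for both (i, j) and (j, i).
class GraphRep {
    final boolean[][] matrix;          // adjacency matrix
    final List<List<Integer>> lists;   // adjacency lists

    GraphRep(int n) {
        matrix = new boolean[n][n];
        lists = new ArrayList<>();
        for (int i = 0; i < n; i++)
            lists.add(new ArrayList<>());
    }

    void addEdge(int i, int j) {
        matrix[i][j] = true;           // O(1) "is (i, j) an edge?" test
        lists.get(i).add(j);           // O(degree) neighbour iteration
    }
}
```

The matrix answers edge queries in constant time but uses |V|^2 space; the lists use space proportional to |V| + |E| and suit sparse graphs.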
[Figure: step-by-step breadth-first search on a sample graph, starting
from the source s. Each vertex is labelled with its distance from s as it
is discovered and enqueued: s = 0; r, w = 1; t, x, v = 2; u, y = 3.
Finally y is removed from the queue and colored black.]
we use an adjacency list representation. If we
used an adjacency matrix, the running time
would be Θ(|V|^2)
[Figure: step-by-step depth-first search on a sample graph with vertices
u, v, w, x, y, z. Each vertex is labelled with its discovery/finishing
times (u = 1/8, v = 2/7, w = 9/12, x = 4/5, y = 3/6, z = 10/11), and
non-tree edges are classified as back (B), forward (F), and cross (C)
edges.]
[Figure: example graphs whose vertices are airports (SFO, PVD, ORD,
LGA, HNL, LAX, DFW, MIA) and whose edges represent flight routes
between them.]
6.4.3. Relaxation
Relaxation of an edge (u,v) with weight
w(u,v) = 2.
[Figure: with d[u] = 5 and w(u, v) = 2, Relax(u, v, w) lowers
d[v] = 9 to d[v] = 7; a vertex v with d[v] = 6 <= d[u] + w(u, v)
is left unchanged.]
38
39
40
41
42
43
44
45
46
47
48
6.6.3.Solution
[Figure: example graphs G = (V, E) with vertex sets V = {1, 2, 3, 4, 5}
and V = {1, 2, 3, 4, 5, 6}.]
6.9. Networks
Networks can be used to represent the
transportation of some commodity through
a system of delivery channels.
There are sources (x) and sinks (y).
The network is a directed graph, where
each arc a is associated with a capacity,
c(a).
6.9.1. Flow
A flow in a network is a set of numbers f(a), one associated
with each arc a.
This indicates how much of a channel's capacity is
being used: 0 <= f(a) <= c(a).
For a vertex v, the flow into and out of the vertex is
denoted by f-(v) and f+(v) respectively.
For intermediate vertices (not sources or sinks), the
flow in is the same as the flow out. This is called the
conservation condition.
6.9.3. Cuts
A cut is a division of the vertices into two sets S and S', so
that the source is in S and the sink is in S'.
The capacity of a cut is the sum of the capacities of all the
arcs which cross from S to S'.
How many cuts are possible in a network with n vertices?
What are the different cuts of the network on the board,
and what are their capacities?
(Subtract cf (p) from the flow if the edge is a reverse arc in the
network).
[Figure: log-log plot of the growth of 3n and 2n + 10 for n from 1 to 1,000.]
Example: 2n + 10 is O(n)
2n + 10 <= cn
(c - 2)n >= 10
n >= 10/(c - 2)
Pick c = 3 and n0 = 10
7.1.5 Examples
We say that n^4 + 100n^2 + 10n + 50 is of
the order of n^4, or O(n^4)
We say that 10n^3 + 2n^2 is O(n^3)
We say that n^3 - n^2 is O(n^3)
We say that 10 is O(1)
We say that 1273 is O(1)
The major advantage of binary search trees over the other data
structures is that the related sorting algorithms and search
algorithms such as in-order traversal can be very efficient.
Binary search trees can choose to allow or disallow duplicate
values, depending on the implementation.
Binary search trees are a fundamental data structure used to
construct more abstract data structures such as sets, multisets, and
associative arrays.
7.2.2. B-trees
B-trees are multiway trees, commonly
used in external storage, in which nodes
correspond to blocks on the disk. As in
other trees, the algorithms find their way
down the tree, reading one block at each
level. B-trees provide searching, insertion,
and deletion of records in O(logN) time.
This is quite fast and works even for very
large files. However, the programming is
not trivial.
class StackX
{
private final int SIZE = 20;
private int[] st;
private int top;
public StackX() // constructor
{
st = new int[SIZE]; // make array
top = -1;
}
public void push(int j) // put item on stack
{ st[++top] = j; }
public int pop() // take item off stack
{ return st[top--]; }
public int peek() // peek at top of stack
{ return st[top]; }
public boolean isEmpty() // true if nothing on stack
{ return (top == -1); }
} // end class StackX
////////////////////////////////////////////////////////////////
class DFSApp
{
public static void main(String[] args)
{
Graph theGraph = new Graph();
theGraph.addVertex('A'); // 0 (start for dfs)
theGraph.addVertex('B'); // 1
theGraph.addVertex('C'); // 2
theGraph.addVertex('D'); // 3
theGraph.addVertex('E'); // 4
theGraph.addEdge(0, 1); // AB
theGraph.addEdge(1, 2); // BC
theGraph.addEdge(0, 3); // AD
theGraph.addEdge(3, 4); // DE
System.out.print("Visits: ");
theGraph.dfs(); // depth-first search
System.out.println();
} // end main()
} // end class DFSApp
////////////////////////////////////////////////////////////////
List Size    Seconds
1000           2.69
2000          11.78
4000          47.51
8000         190.70
Insertion Sort
(Time in seconds on a 386 microprocessor running at 20 MHz.)
Insertion Sort is
about twice as fast
as Selection Sort.
But note that as the
list size doubles, the
time for both sorts
increases about
four-fold.
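For reference, a minimal sketch of insertion sort; its roughly n^2 comparisons explain why the time quadruples when the list size doubles:

```java
// Insertion sort: each element is inserted into its proper place
// among the already-sorted elements to its left.
class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];              // element to insert
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];         // shift larger elements right
                j--;
            }
            a[j + 1] = key;              // drop the element into the gap
        }
    }
}
```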
[Figure: bar chart of sorting time in seconds (0 to 200) for Selection
Sort and Insertion Sort at each list size.]
Algorithm
To compare two algorithms, we will take the limit of the
number of comparisons of the first algorithm divided by
the number of comparisons for the second algorithm as n
(the number of items in the list) approaches infinity.
When taking the limit of (ln N)/N or other similar cases,
L'Hôpital's Rule becomes a handy tool.
Two algorithms have the same ORDER if the limit is
greater than zero and less than infinity. Otherwise, they
have different orders.
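For example, comparing ln N with N using L'Hôpital's Rule:

```latex
\lim_{N \to \infty} \frac{\ln N}{N}
  = \lim_{N \to \infty} \frac{1/N}{1} = 0
```

so ln N is of strictly lower order than N, whereas lim (2n + 10)/n = 2 (greater than zero and less than infinity) shows that 2n + 10 and n have the same order.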
7.3.6. Quicksort
Choose an element out of the list as a pivot. A good
process to select a pivot is to compare the first, middle,
and last elements and choose the middle value.
Compare every other element in the list to the pivot and
create two lists, one list where every element is smaller
than the pivot and one where every element is larger.
Now split each of these lists into smaller lists.
Continue in this way until the small lists have only one or
two elements and we can sort them with at most one
comparison each.
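A minimal sketch of this scheme, with median-of-three pivot selection as described above; this is an in-place variant that partitions the array rather than building explicit sublists:

```java
// Quicksort with median-of-three pivot selection.
class QuickSort {
    static void sort(int[] a) { sort(a, 0, a.length - 1); }

    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;            // one element (or none): sorted
        // Pivot: middle value of the first, middle, and last elements.
        int pivot = medianOfThree(a[lo], a[(lo + hi) / 2], a[hi]);
        int i = lo, j = hi;
        while (i <= j) {                 // partition around the pivot
            while (a[i] < pivot) i++;    // find a large element on the left
            while (a[j] > pivot) j--;    // find a small element on the right
            if (i <= j) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        sort(a, lo, j);                  // recurse on the smaller-value part
        sort(a, i, hi);                  // recurse on the larger-value part
    }

    private static int medianOfThree(int x, int y, int z) {
        return Math.max(Math.min(x, y), Math.min(Math.max(x, y), z));
    }
}
```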
List Size    Selection    Insertion    Quicksort
1000            2.69         1.73         0.11
2000           11.78         7.46         0.22
4000           47.51        29.98         0.44
8000          190.70        73.47         0.96
Bubble sort