Assume that there are already k cards in the list. The fraction p = k/n of the list contains the already shuffled cards in their final positions. The rest of the list, the fraction 1 - p, is still open. Here n = 52.
The (k + 1)th card must now be randomly drawn by generating a random number r ∈ [1..52]. How long does it take to draw a valid card r that can be added to the list? The card r is only valid if it does not already appear in the list shuffledDeck[1..k].
For p = k/52 of the time the card r is already present in the first k.
How do we find this out? We must search through the list until it is found.
[Figure: a list of n slots; the first k are filled, so p = k/n and 1 - p = (n - k)/n.]
Remember that we have assumed that we will find r in the list. In this case about half of the list of k elements must be searched to discover r, so (k + 1)/2 probes are done on average, for p of the time: p · (k + 1)/2.
For the rest of the time the whole list, currently of length k, is searched, i.e. r is compared with every one of the k items in the list and is not equal to any of them, so k comparisons are made. But this always happens when a new r that is not already in the list is drawn. How often is there a new r, i.e. an r that is not in the first k elements? (1 - k/52) = (1 - p) of the time, at a cost of (1 - p) · k.
Suppose the generated card is in the list; it will be found on average after (k + 1)/2 probes. But if it is in the list, another card must be generated, and this process must be repeated until a card not in the list is found. The 1st probe costs p · (k + 1)/2, the 2nd (p + p²) · (k + 1)/2, the 3rd (p + p² + p³) · (k + 1)/2, the 4th (p + p² + p³ + p⁴) · (k + 1)/2, etc.
giving S_n = (p - p^(n+1))/(1 - p), since (1 - p)S_n = p - p^(n+1). Since p = k/n < 1 (because k < n), it follows that lim_(n→∞) p^(n+1) = 0 and therefore

    Σ_(i=1)^∞ p^i = lim S_n = p/(1 - p),

so the expected cost of the repeated probing is [p/(1 - p)] · (k + 1)/2.

Total number of probes:

    Σ_(k=1)^51 ( [p/(1 - p)] · (k + 1)/2 + (1 - p) · k )
It mounts up to about 90% of the time. The trick is to store a bit array (incorrectly called a bit vector) that indicates which cards have not yet been generated:

boolean[] notInList = new boolean[53];
for (int i = 1; i <= 52; i++) notInList[i] = true;
int k = 1;
while (k <= 52) {
    int r = (int)(Math.random()*52) + 1;
    if (notInList[r]) {
        shuffledDeck[k++] = r;
        notInList[r] = false;
    }
}
This is about 10% of the time taken by the original bad method. There are better methods.
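One of those better methods is the Fisher-Yates shuffle, sketched here as an aside (it is not described in these notes): fill the deck in order, then swap each position with a uniformly chosen position at or below it. This takes O(n) time with no rejected draws.

```java
import java.util.Random;

public class FisherYates {
    // Shuffle a 52-card deck in O(n) time with no rejected draws.
    public static int[] shuffledDeck() {
        int[] deck = new int[53];                // index 0 unused, as in the notes
        for (int i = 1; i <= 52; i++) deck[i] = i;
        Random rnd = new Random();
        for (int i = 52; i > 1; i--) {
            int j = rnd.nextInt(i) + 1;          // uniform in [1..i]
            int t = deck[i]; deck[i] = deck[j]; deck[j] = t;
        }
        return deck;
    }

    public static void main(String[] args) {
        int[] d = shuffledDeck();
        boolean[] seen = new boolean[53];
        for (int i = 1; i <= 52; i++) seen[d[i]] = true;
        for (int i = 1; i <= 52; i++) assert seen[i]; // every card appears exactly once
        System.out.println("ok");
    }
}
```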
But the expression n · O(1) depends on n: n · O(1) = O(n), so it is of the order of n and is written O(n). It is much faster than the previous two methods.
If the decimal number 1048575 can be represented exactly with 20 binary digits, then any 6-digit decimal number can also be represented exactly using 20 bits.
log10(2^n - 1) ≈ n × 0.3, i.e. the number of bits × 0.3 gives the accuracy in decimal digits. To find the number of bits needed to store m digits, simply calculate ⌈m/0.3⌉, e.g. to store 7 digits, use ⌈7/0.3⌉ = ⌈23.33⌉ = 24 bits.
n < b^k, i.e. taking logarithms to the base b, log_b n < log_b b^k = k, i.e. take k > log_b n.
The number of times that n can be divided by 10 is thus ⌊log10 n⌋. The same holds for repeated halving (dividing by 2) of a binary number: each division by 2 removes one bit from the least significant end of the binary number, so it can only be done ⌊log2 n⌋ times before the result reaches 1. The number of times a list of length n can be halved until its length is 0 is thus ⌊log2 n⌋ + 1.
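As a small check of the halving claim (an illustration added here, not from the original notes):

```java
public class Halving {
    // Count how many integer halvings it takes for n to reach 0.
    public static int halvings(int n) {
        int count = 0;
        while (n > 0) { n /= 2; count++; }
        return count;
    }

    public static void main(String[] args) {
        // 52 -> 26 -> 13 -> 6 -> 3 -> 1 -> 0 : six halvings, since floor(log2 52) = 5
        System.out.println(halvings(52));   // prints 6
    }
}
```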
Hash functions:
Let i_j = hash(id_j) and i_k = hash(id_k). When id_j ≠ id_k but i_j = i_k, this is known as a collision. 1. Design the hash function to minimize collisions.
Attempt to spread the range of h(id) uniformly over [1..m]; h(id)'s values must be as random as possible.
2. Since collisions are often unavoidable, methods of handling collisions must be devised.
Make use of linear probing: suppose id_j is already in the table and i_j = i_k; then map i_k onto some multiple r of a suitable prime p, i.e. i_k = (i_j + r·p) mod m + 1.
Assume that there are now k entries in the hash table; then the load factor is α = k/m. When no entries have been made, a single probe is enough to make the first entry. Suppose k entries have been made; then the probability of a collision is α. The probability of a collision on each of i successive probes is α^i. The probability of finding an entry in precisely i probes is α^(i-1)(1 - α).
The average search length is therefore

    ASL = Σ_(i=1)^∞ i · α^(i-1)(1 - α)
        = (1 - α) Σ_(i=1)^∞ i · α^(i-1)
        = (1 - α) · 1/(1 - α)²
        = 1/(1 - α)
Loading   α      Average Search Length 1/(1 - α)
85%       0.85   1/0.15 ≈ 6.67
95%       0.95   1/0.05 = 20
99%       0.99   1/0.01 = 100
99.9%     0.999  1/0.001 = 1000
[Figure: a chained hash table with n = 9 entries and m = 10 slots.]
The table d[m+1] has m positions and n entries. In chained tables n may exceed m. The performance of a chained hash table depends on the average length of the chains, 1 + n/m, so the performance may be tuned. One probe is always required to get to the head of the chain, and then a linear search in a list of average length n/m = α is required. This amounts to an average of 1 + (α + 1)/2 probes to find any item. So ASL_chained hash = O(α).
Chained hashing: example
[Figure: the completed table with entries two, five, three, four, six, one, seven, nine, eight; n = 9 and m = 10.]
Set up the hash table by entering the keys two, five, three, four, six, one, seven, nine, eight in this order. Other orders will give logically equivalent tables, but their links will be different. Initially there are n = 0 items in the store. Let us enter two: assume id = "two", so first hash it: i = hash(id);. For this table i = 2. The data, id, may be placed at the next open place, ++n, in the store, by store[n] = <id,0>;. Check whether table entry 2 is open or filled by examining d[i]. If d[i] != 0, i.e. d[i] is already occupied, the link 0 in the node <id,0> must be replaced by d[i], as in <id,d[i]>;, and then d[i] = n;.
Now enter five. There is now n = 1 item in the store. Entering five, assume id = "five", so first hash it: i = hash(id);. For this table i = 10. The data, id, may be placed at the next open place, ++n, making n = 2 in the store, by store[n] = <id,0>;. Check whether table entry i is open or filled by examining d[i]. Since d[i] == 0, i.e. d[i] is not occupied, the link 0 in the node <id,0> is kept, and d[i] = n, i.e. 2.
Next enter three. There are n = 2 items in the store. Entering three, put id = "three", so first hash it: i = hash(id);. For this table i = 7. The data, id, may be placed at the next open place, ++n, making n = 3 in the store, by store[n] = <id,0>;. The table entry d[i] is still open, so the link 0 in the node <id,0> is kept and d[i] = n, i.e. 3.
Now enter four. There are now n = 3 items in the store. Entering four, assume id = "four", so first hash it: i = hash(id);. For this table i = 7. The data, id, may be placed at the next open place, ++n, making n = 4 in the store, by store[n] = <id,0>;. Check whether table entry i is open or filled by examining d[i]. Since d[i] != 0, i.e. d[i] is occupied, the link 0 in the node <id,0> must be replaced by d[i], as in <id,d[i]>;, and then d[i] = n;.
Now enter six. There are now n = 4 items in the store. Entering six, assume id = "six", so first hash it: i = hash(id);. For this table i = 7. The data, id, may be placed at the next open place, ++n, making n = 5 in the store, by store[n] = <id,0>;. Check whether table entry i is open or filled by examining d[i]. Since d[i] != 0, i.e. d[i] is occupied, the link 0 in the node <id,0> must be replaced by d[i], as in <id,d[i]>;, and then d[i] = n;.
Chained hashing: hashInsert
int d[] = new int[m+1];
node store[] = new node[6*m + 1];
int n = 0;

void hashInsert(String id) {
    ++n;
    store[n].key = id;
    int i = hash(id);
    store[n].link = d[i];
    d[i] = n;
}
Chained hashing: hashSearch
This method is called as in the example:
boolean found = hashSearch("ten");
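The body of hashSearch does not survive in these notes. A sketch consistent with the hashInsert slide might look as follows; the hash function used here (word length mod m, plus 1) is purely an illustrative assumption, and the node store is flattened into parallel arrays so the example is self-contained:

```java
public class ChainedHash {
    static final int m = 10;
    static int[] d = new int[m + 1];              // chain heads; 0 marks an empty chain
    static String[] key = new String[6 * m + 1];  // store[n].key
    static int[] link = new int[6 * m + 1];       // store[n].link
    static int n = 0;

    public static int hash(String id) {           // assumed hash, for illustration only
        return id.length() % m + 1;
    }

    public static void hashInsert(String id) {    // as on the hashInsert slide
        ++n; key[n] = id;
        int i = hash(id);
        link[n] = d[i]; d[i] = n;
    }

    public static boolean hashSearch(String id) { // follow the chain from d[hash(id)]
        for (int p = d[hash(id)]; p != 0; p = link[p])
            if (key[p].equals(id)) return true;
        return false;
    }

    public static void main(String[] args) {
        hashInsert("two"); hashInsert("ten");
        System.out.println(hashSearch("ten"));    // prints true
        System.out.println(hashSearch("six"));    // prints false
    }
}
```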
class node
public class node{
    long key;
    String data;
    node next;
    public node(long k, String d){
        key = k; data = d; next = null;
    }
    public node(){
        key = 0; data = ""; next = null;
    }
}
class linkedList
public class linkedList{
    protected static int length;
    protected node head;
    public linkedList(){
        head = new node(0, "head node");
        length = 0;
    }
    public void insert(node n){
        n.next = head.next;
        head.next = n;
        length++;
    }
    public int length(){
        return length;
    }
}
int lineNumber = 0;
// Process first line---the header---differently
System.out.println(inFile.readLine());       // readLine() deprecated
// Process rest of file
while (inFile.available() != 0){
    String inString = inFile.readLine();     // readLine() deprecated
    String outString;
    int commaPos = inString.indexOf(",");
    // One space: handle Da Costa, Van Wyk, Le Roux, ...
    // Not treated: van der Merwe, de la Querra, etc.
    // Before processing:
    //   2410832 VAN ZITTERS, ST ... 000000
    // Final output for this record:
    //   Key = 2410832  Data = Van Zitters, ST
    int spacePos = inString.substring(10, commaPos).indexOf(" ") + 10;
    int minusPos = inString.substring(10, commaPos).indexOf("-") + 10;
The Makefile
OBJ  = tryList
LIST = linkedList
DEP  = node
DATA = cos224.list

all: $(DEP).class $(LIST).class
	-rm errors
	javac -deprecation $(OBJ).java 2> errors
	-rm output
	java $(OBJ) $(DATA) > output

$(DEP).class $(LIST).class:
	javac -deprecation $(LIST).java
	javac -deprecation $(DEP).java

clean:
	-rm *~ *.class errors output
[Figure: an array-implemented queue with slots 0..k-1; after one enqueue, 23 occupies slot 0.]
[Figure: slots 0 and 1 freed; 67 and 89 remain in slots 2 and 3.]
The queue where the nodes 23, 45 have been dequeued or served.
front: 2, rear: 3, length: 2
When k nodes have been entered using enqueue, the last slot of the array becomes occupied and the next free place is at 0 (zero). There may also be no elements left in the queue. How is rear calculated?
Simply adding 1, as in rear++, will fail when the value of rear reaches k, because the maximum subscript of the array is k-1. Use rear = (rear + 1) % k; and also length++;. Updating front when doing a dequeue is similar: front = (front + 1) % k; and also length--;. Note that 1 is added modulo k in both cases.
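The modular updates above can be sketched as a minimal array-based queue (the full/empty checks are omitted for brevity; the names are illustrative):

```java
public class CircularQueue {
    private final int k;            // capacity of the array
    private final int[] a;
    private int front = 0, rear = -1, length = 0;

    public CircularQueue(int k) { this.k = k; a = new int[k]; }

    public void enqueue(int x) {
        rear = (rear + 1) % k;      // wrap past the last subscript k-1
        a[rear] = x;
        length++;
    }

    public int dequeue() {
        int x = a[front];
        front = (front + 1) % k;    // same modular update on the other end
        length--;
        return x;
    }

    public int length() { return length; }

    public static void main(String[] args) {
        CircularQueue q = new CircularQueue(4);
        q.enqueue(23); q.enqueue(45); q.enqueue(67); q.enqueue(89);
        System.out.println(q.dequeue()); // prints 23
        q.enqueue(2);                    // rear wraps around to slot 0
        System.out.println(q.length()); // prints 4
    }
}
```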
If the fullness of the queue is implemented using a length counter and the relation length <= k, the array may be completely filled, with rear pointing to the same place as front. Note: when the comparison rear == front is used to test whether the array-implemented queue is filled, without using the queue's length counter, it is impossible to say whether the array is full or empty; in this case the queue is regarded as full when its length is k - 1. Here length = (rear - front + k) % k; calculates the length when there is no counter.
An empty queue
front: length: 0
rear:
[Figures: a queue implemented as a linked list with a head node, shown first with length 3 and then, after dequeues, with length 1.]
public class node{
    private long key;
    private String data;
    private node next;
    public node(){
        key = 0; data = ""; next = null;
    }
    public node(long k, String d){
        key = k; data = d; next = null;
    }
    public long getKey() { return key; }
    public void setKey(long k) { key = k; }
    public String getData() { return data; }
    public void setData(String d) { data = d; }
    public node getNext() { return next; }
    public void setNext(node n) { next = n; }
}
public void display(){
    node here = front;
    int listed = 0;
    while (here != rear && listed++ < 6) {
        System.out.println("Key = " + here.getKey()
                         + " Data = " + here.getData());
        here = here.getNext();
    }
    if (length > 0)
        System.out.println("Queue's length = " + length);
    else
        System.out.println("Queue is Empty");
}
}
A node points backwards with its prev pointer and forwards with its next pointer.
A deque
header:
The sentinel node's next pointer points to the front of the deque and the sentinel's prev pointer points to the rear of the deque. The deque has methods for adding and removing elements at both ends.
public node removeLast(){
    node n = null;
    if (!isEmpty()) {
        n = header.getPrev();            // 1
        header.setPrev(n.getPrev());     // 2
        (n.getPrev()).setNext(header);   // 3
        n.setPrev(null);
        n.setNext(null);
        length--;
    }
    return n;
}
Skiplists
[Figure: a tree node with parent, key, data, left and right fields.]
The nodes in a binary tree have a left and a right forward pointer; a back link to the parent is also useful. Each node should contain a key and a data field as well. In a binary search tree each node's left pointer points to a node such that left.key < key when left != null, and its right pointer points to a node such that right.key >= key when right != null.
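A minimal tNode along these lines might be (the field names are taken from the insert code on the following slides; this class is a sketch, not the course's definitive version):

```java
public class tNode {
    long key;
    String data;
    tNode parent, left, right;

    public tNode(long k, String d) {
        key = k;
        data = d;
        parent = left = right = null;   // a fresh node is detached
    }

    public static void main(String[] args) {
        tNode n = new tNode(23, "twenty three");
        System.out.println(n.key);      // prints 23
    }
}
```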
[Figure: the BST built by the program below, with root 23 and nodes 7, 17, 19, 27, 35, 41.]
public class tryTree{
    public static void main(String args[]){
        tNode tItem;
        tree T = new tree();
        T.display();
        T.insert(new tNode(23, "twenty three"));
        T.insert(new tNode(17, "seventeen"));
        T.insert(new tNode(7, "seven"));
        T.insert(new tNode(35, "thirty five"));
        tItem = new tNode(41, "forty one");
        T.insert(new tNode(27, "twenty seven"));
        T.insert(tItem);
        T.insert(new tNode(19, "nineteen"));
        T.display();
    }
}
Defining a tree
An empty tree has a root set to null and a counter length set to 0 by the constructor.
root: length: 0 null
public class tree{ private int length; private tNode root; public tree(){ root = null; length=0; }
An instance of the tree is defined in the user program by tree T = new tree();. The first node in a tree of type tNode is pointed to by root. A subsequent node n is added to the left subtree if n.key < root.key, or otherwise to its right.
The tree after inserting a node of type tNode with key = 23 using:
T.insert(new tNode(23, "twenty three"));
root: 23 length: 1
public class tree{
    ...
    public void insert(tNode n){
        if (root == null){
            n.setParent(null);
            root = n;
        } else
            root.insert(n);
    }
root.insert(tNode) in the tNode class inserts the tNode n into the tree starting at root.
If left != null, then left.insert(n) is called, as happens when a node with key = 7 is entered.
The pointers down the left are followed until a null is found.
public void insert(tNode n){
    if (n.key < key)            // n into left subtree
        if (left == null){
            n.parent = this;
            left = n;
        } else
            left.insert(n);
The right pointer of the root node is null and n is inserted there.
    if (n.key < key)            // n into left subtree
        if (left == null){
            n.parent = this;
            left = n;
        } else
            left.insert(n);
    else                        // n into right subtree
        if (right == null){
            n.parent = this;
            right = n;
        } else
            right.insert(n);
}
The right pointer of the root node is not null, so the pointer is followed to 35, whose right pointer is null, where n is inserted.
The right pointer of the root node is not null, so the pointer is followed to 35, whose left pointer is null, where n is inserted.
The left pointer of the root node is not null and is followed to 17, whose right pointer is null, where n is inserted.
1. Removing elements with zero children is the easiest: simply replace the pointer to the leaf with a null. We will see later that this case can be ignored, because it is subsumed by the case where any one of the pointers is a null. Initially, the nodes with keys 11, 19, 27 and 41 are such nodes.
2. To remove a node with only one child: the non-null pointer emanating from the node to be deleted replaces the pointer pointing to the node that must be removed. Initially, the nodes with keys 7, 9, and 10 are such nodes. 3. To remove a node with two children: find its biggest (smallest) child in its left (right) subtree, copy that biggest (smallest) child over the node to be deleted, and then remove the node that was copied, as usual. The copied node cannot have two children.
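Case 2 can be sketched as follows. This is an illustrative fragment with an assumed minimal node type; it splices the single child into the removed node's place:

```java
public class MiniBST {
    static class tNode {
        long key;
        tNode parent, left, right;
        tNode(long k) { key = k; }
    }

    tNode root;

    // Case 2: remove node p, which has at most one non-null child,
    // by moving p's child pointer up into p's parent.
    void removeWithOneChild(tNode p) {
        tNode child = (p.left != null) ? p.left : p.right; // may be null (leaf case)
        if (child != null) child.parent = p.parent;
        if (p.parent == null)            root = child;           // p was the root
        else if (p.parent.left == p)     p.parent.left = child;  // p hung off the left
        else                             p.parent.right = child; // p hung off the right
    }

    public static void main(String[] args) {
        MiniBST t = new MiniBST();
        tNode r = new tNode(23), n7 = new tNode(7), n10 = new tNode(10);
        t.root = r; r.left = n7; n7.parent = r;   // 7 is 23's left child
        n7.right = n10; n10.parent = n7;          // 7 has the single child 10
        t.removeWithOneChild(n7);
        System.out.println(t.root.left.key);      // prints 10
    }
}
```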
Then root = p; suffices, but p could become null. This would make root == null, an empty tree with length 0; otherwise the new node, now pointed to by root, must have its parent set to null, i.e. if (root != null) root.parent = null;
The parent may have its left pointer pointing to n. Test this using p.left == n.
Heaps: priority trees
A heap has a tree structure without pointers; the most important item is at the top of the tree. A heap is also called a priority queue. Store the heap in an array A of type node, with leftChild(i) = 2i, rightChild(i) = 2i + 1, and parent(i) = ⌊i/2⌋. The heap property: A[parent(i)] >= A[i] for a heap with its greatest element on top, or A[parent(i)] <= A[i] with the minimum element on top.
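The index arithmetic can be written directly (1-based indexing, as in the array A above):

```java
public class HeapIndex {
    static int leftChild(int i)  { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }
    static int parent(int i)     { return i / 2; }  // integer division = floor

    public static void main(String[] args) {
        System.out.println(leftChild(3));   // prints 6
        System.out.println(rightChild(3));  // prints 7
        System.out.println(parent(7));      // prints 3
    }
}
```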
Heaps: the methods
Constructors, gets and sets. removeMax(); boolean isEmpty(); boolean isFull(); insert(node n); buildUp(node X[]); heapify(int i); sort(node X[]);
Heaps: heapify(int i)
public void heapify(int i) {
    int l = left(i), r = right(i), topPriority;
    if (l <= getSize() && A[l].key > A[i].key)
        topPriority = l;
    else
        topPriority = i;
    if (r <= getSize() && A[r].key > A[topPriority].key)
        topPriority = r;
    if (topPriority != i) {
        swap(A[i], A[topPriority]);
        heapify(topPriority);
    }
}
Heaps: heapify(int i), faster
public void heapify(int i) {
    int l = left(i), r = right(i), topPriority;
    if (l <= getSize() && A[l].key > A[i].key)
        topPriority = l;
    else
        topPriority = i;
    if (r <= getSize() && A[r].key > A[topPriority].key)
        topPriority = r;
    if (topPriority != i) {
        swap(A[i], A[topPriority]);
        if (left(topPriority) <= getSize())
            heapify(topPriority);
    }
}
Heaps: heapify(int i), iterative
First, we see if child = left(i) is within the heap. Next, make child += 1 if the right child is bigger than the left child; then break if A[i] >= A[child], or carry on by swapping A[i] and A[child] and repeating another level down.
public void heapify(int i) {
    int child = left(i), n = getSize();
    while (child <= n) {
        if (child < n && A[child].key < A[child+1].key)
            child++;                     // take the bigger of the two children
        if (A[i].key < A[child].key){
            swap(A[i], A[child]);
            i = child;
            child = left(i);
        } else break;
    }
}
Heaps: insert(node n)
public void insert(node n) {
    setSize(getSize()+1);
    int i = getSize();
    while (i > 1 && A[parent(i)].key < n.key) {
        A[i] = A[parent(i)];
        i = parent(i);
    }
    A[i] = n;
}
Heaps: node removeMax()
public node removeMax() {
    if (getSize() < 1) {
        System.out.println("Heap is empty.");
        return null;
    } else {
        node top = A[1];
        A[1] = A[getSize()];
        setSize(getSize()-1);
        heapify(1);
        return top;
    }
}
Heaps: buildUp()
Pseudocode:
// pre:  given an array X of n elements
// post: node X[] is built into a heap
public void buildUp(node X[], int n) {
    for (int i = n/2; i > 0; i--)
        heapify(i);
}
Heaps: sort()
Pseudocode:
// pre:  given an array A of sizeA elements
// post: node A[] is sorted
public void sort(node A[], int sizeA) {
    buildUp(A, sizeA);
    for (int i = sizeA; i > 1; i--){
        swap(A[1], A[i]);
        setSize(getSize()-1);
        heapify(1);
    }
}
Heaps: complexity of buildUp
Call the total time to run buildUp T. Suppose there are n nodes in the heap. The height of the heap tree is H = ⌊log2 n⌋. For this argument the height of a vertex is the number of edges from the vertex down to its furthest leaf, so a leaf's height is zero. At most

    ⌈n / 2^(h+1)⌉

vertices have height h. The heights of vertices run over 0, 1, 2, ..., H. Summing the O(h) work done by heapify over all heights:

    T = Σ_(h=0)^H ⌈n/2^(h+1)⌉ · O(h) = O( n · Σ_(h=1)^H h · 2^(-h-1) ).

It remains to show that S_H = Σ_(h=1)^H h · 2^(-h-1) < 1. Doubling,

    2·S_H = Σ_(h=1)^H h · 2^(-h),

and subtracting, S_H = 2·S_H - S_H = Σ_(h=1)^H 2^(-h) - H · 2^(-H-1) < 1.

So T = O(n · S_H) = O(n): buildUp runs in linear time.
[Table: for n = 4096, the number of vertices, ⌈n/2^(h+1)⌉, at each height h = 0, 1, ..., 12.]
Now we vary the heights for a fixed value of n. Suppose that the root of a heap has height 0. A heap is full when it has exactly 2^(h+1) - 1 nodes, where h is the height of the heap. Adding a node to a full heap increases its height by 1. The height of the i-th node in a heap is ⌊log2 i⌋. The heights run from 1, 2, ..., H. At most n/2^(h+1) vertices are of height h.
The total number of probes in a full tree of height H is

    Σ_(h=1)^H h · 2^(h-1) = (1 - 2^H) + H · 2^H,

so, with N = 2^H - 1 nodes,

    total number of probes / number of nodes
        = ((1 - 2^H) + H · 2^H) / (2^H - 1)     in terms of H
        = ((N + 1) · log2(N + 1) - N) / N        in terms of N.
Search trees

[Figure: taxonomy of search trees: BST, AVL trees, Splay trees, (2,4) trees, Red-black trees, B-trees.]
Search trees implement the dictionary abstract data type (ADT) and have the methods: boolean isFull() and boolean isEmpty(); node find(k), as well as iterator findAll(k); boolean insert(k, data) or boolean insert(n); node remove(k).
BSTs
Many pages of notes exist for BSTs. The most difficult method to program is delete(k), which removes the node with key k from a BST. Removing a node p with at least one null child entails moving p's other child pointer up into the pointer in p's parent that points to p, in so doing discarding p. To remove a node p with two children: find its biggest (smallest) child in its left (right) subtree, copy that child over the node to be deleted, and then remove the node that was copied, as usual; the copied node cannot have two children. The other methods are pretty elementary. Retrieval in a balanced BST takes O(log n) time, while in an unbalanced BST retrieval can be as bad as O(n). So much effort is made to keep BSTs balanced.
AVL trees
The two mathematicians G.M. Adelson-Velskii and E.M. Landis invented AVL trees in 1962. An AVL tree has the height-balanced property, or AVL property: the children of every internal node n have heights that differ by at most 1.
Since AVL trees are balanced, the retrieval of nodes takes at worst O(log2 n) probes.
[Figure: node A violates the AVL property; its subtrees T1 (height h+1), T2, T3 (height h) and T4 (height h-1) are shown.]
After a single rotation, in which A becomes B's right child and T2 becomes A's left child, the tree stays a BST and it becomes an AVL tree:
[Figure: the rebalanced tree with B above A and C; this is an AVL tree.]
Given only A, the code is: B = A.left; A.left = B.right; B.right = A; None of the nodes now violates the AVL property.
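Written out as a method on a simple node type (a sketch; the node class and returning the new subtree root are assumptions not shown on the slide):

```java
class AvlNode {
    int key;
    AvlNode left, right;
    AvlNode(int key) { this.key = key; }
}

public class Rotation {
    // Single right rotation at A: B = A.left; A.left = B.right; B.right = A;
    // Returns B, the new root of the rotated subtree.
    static AvlNode rotateRight(AvlNode A) {
        AvlNode B = A.left;
        A.left = B.right;   // T2 becomes A's left child
        B.right = A;        // A becomes B's right child
        return B;
    }

    public static void main(String[] args) {
        AvlNode a = new AvlNode(30);
        a.left = new AvlNode(20);
        a.left.left = new AvlNode(10);
        AvlNode r = rotateRight(a);
        System.out.println(r.key);        // prints 20
        System.out.println(r.right.key);  // prints 30
    }
}
```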
[Figure: the mirror case: node A violates the AVL property with the tall subtree on its right.]
After a single rotation, in which A becomes C's left child and T3 becomes A's right child, the tree stays a BST and it becomes an AVL tree:
[Figure: the rebalanced tree with C above A and B; this is an AVL tree.]
Given only A, the code is: C = A.right; A.right = C.left; C.left = A; None of the nodes now violates the AVL property.
Node A violates the AVL property; the other internal nodes do not. A double rotation is required to turn this into an AVL tree.
After a double rotation, in which D becomes the root, the tree stays a BST and becomes an AVL tree: Given only A, the code is: B = A.left; D = B.right; B.right = D.left; D.left = B; A.left = D.right; D.right = A; D becomes the root and none of the nodes now violates the AVL property.
Insertion at T3 instead of at T2 results in node A violating the AVL property. The other internal nodes do not. A double rotation is required to turn this into an AVL tree.
Insertion at T2 now results in node A violating the AVL property. The other internal nodes do not. A double rotation is required to turn this into an AVL tree.
After a double rotation, in which D becomes the root, the tree stays a BST and becomes an AVL tree:
Given only A, the code is: C = A.right; D = C.left; A.right = D.left; D.left = A; C.left = D.right; D.right = C; The tree with D as root is now an AVL tree.
An insertion at T3 now results in node A violating the AVL property. The other internal nodes do not. A double rotation is required to turn this into an AVL tree.
Summary
AVL trees are efficient: it has been shown that they require at most about 1.45 times the comparisons of optimal trees. Experiments have shown that AVL trees yield an average search time of log2 n + 0.25 comparisons. A disadvantage is the extra space and bookkeeping required for handling the balance factors.
Rotation of the edge joining node x and node y. Each triangle represents a subtree. Note that the subtrees are in the same in-order sequence before and after a rotation: rotation does not affect the ordering of the nodes in the subtrees A, B, and C. Assuming the figure on the left and pointers to nodes x and y, the code to execute a right rotation is:
subtree B = x.right; x.right = y; y.left = B; y.parent.(left or right) = x; x.parent = y.parent; y.parent = x;
Rotation of the edge joining node x and node y. Each triangle represents a subtree. Note that the subtrees are in the same in-order sequence before and after a rotation: rotation does not affect the ordering of the nodes in the subtrees A, B, and C. Assuming the figure on the left and pointers to nodes x and y, the code to execute a left rotation is:
subtree B = x.left; x.left = y; y.right = B; y.parent.(left or right) = x; x.parent = y.parent; y.parent = x;
Splay trees
Before the zig operation:

[Figure: z (10), y (20) and x (30) on the access path, with subtrees T1..T4.]
Splay trees restructure the tree so that each time a node x is accessed it is moved to the root by doing rotations bottom-up along the path of access, by one of three splaying steps, until x reaches the root.
1. zig, which moves x one level closer to the root;
2. zig-zig; and
3. zig-zag, together with their mirror symmetries.

Zig-zig and zig-zag move x two levels closer to the root, i.e. each splay step moves x 1 or 2 levels closer to the root.
Case 1: zig: if y = parent(x) is the root, rotate along the edge x-y. Case 2: zig-zig: if y = parent(x) is not the root and x and y are both left children, or both right children: let z = parent(y), first rotate the edge y-z and then rotate the edge x-y. Case 3: zig-zag: if y = parent(x) is not the root and x is a left child and y a right child, or vice versa: letting z = parent(y), first rotate the edge x-y and afterwards rotate the edge x-z.
Splaying a node x at depth d takes O(d) time, which is proportional to the time that it takes to access x. Note that each splay step preserves the in-order properties of the tree.
[Figures: the zig-zig and zig-zag cases, shown as two successive edge rotations on nodes x, y, z with subtrees A, B, C, D.]
Assume that we know x, y and z . Why can we assume this? The code to do a zig-zig:
leftRotate(y, z); leftRotate(x, y);
Only one argument is needed: the pivot of the rotation, x. Then y = x.parent; // assuming that x != null. The temporary node B or C is not needed. The direction of the rotation is determined by the leftness or rightness of x, which in turn is determined by x.key < y.key.
Case 1: zig: if parent(x) is the root, rotate(x). Case 2: zig-zig: if parent(x) is not the root and x and parent(x) are both left children, or both right children: first rotate(parent(x)) and then rotate(x). Case 3: zig-zag: if parent(x) is not the root and x is a left child and parent(x) a right child, or vice versa: first rotate(x) and afterwards rotate(x) again.
It is called a left-left zig-zig because both rotations are to the left, i.e. counter-clockwise or anti-clockwise.
Notice that the code for a right-right zig-zig is the same as that for a left-left one. rotate(x) decides the direction.
Assume that we know x, y and z . Why can we assume this? The code to do a zig-zig:
leftRotate(y, z); leftRotate(x, y);
This becomes
rotate(x.parent); rotate(x);
Splay trees
In our pseudo-code we have used the expression x.parent instead of getParent(x) to clarify our coding. Next we tackle the code for splay(x), given only the tree node x.
Take care: this has some errors. Can you spot them?
void splay(tree x) {
    // repeat while x is not the root node
    while (x.parent != null) {              // y -- x
        if ((x.parent).parent != null) {    // z -- y -- x
            // x.parent has a non-root parent
            if (x.key < (x.parent).key)     // x is a left child
                if ((x.parent).key < ((x.parent).parent).key)
                    // x.parent is a left child: left-left zig-zig
                    rotate(x.parent);
                else                        // left-right: zig-zag
                    rotate(x);
            else                            // x is a right child
                if ((x.parent).key < ((x.parent).parent).key)
                    // x.parent is a left child: right-left zig-zig
                    rotate(x.parent);
                else                        // right-right: zig-gig
                    rotate(x);
            rotate(x);
        }
    }
}
Splay trees: splay(x), tidier
void splay(tree x) {
    // repeat while x is not the root node
    while (x.parent != null) {              // y -- x
        if ((x.parent).parent != null)      // z -- y -- x
            if (x.key < (x.parent).key)
                if ((x.parent).key < ((x.parent).parent).key)
                    rotate(x.parent);
                else                        // left-right: zig-zag
                    rotate(x);
            else                            // right
                if ((x.parent).key < ((x.parent).parent).key)
                    rotate(x.parent);
                else                        // right-left: zig-gig
                    rotate(x);
        rotate(x);
    }
}
152
[Flowchart: splay(x) as a loop. While p(x) is not the root, the comparisons x.key < p(x).key and p(x).key < g(x).key select between rotate(p(x)) and rotate(x), followed by rotate(x).]
153
Splay trees
See Goodrich and Tamassia for more slides, on zaber at ../../notes/ds/SplayTrees.pdf when you are in your home directory.
154
Multiway trees
Before discussing (2,4) trees we first consider multiway trees. The figure below is a multiway tree.
[Figure: a multiway tree; internal nodes hold one to three keys each, e.g. 5 10, 11 13, 14, 17, 22, 23 24, 25 and 27, with subtrees hanging between the keys.]
Red-black trees
A (2,4) tree has an update (search, insert and remove) time of O(log n). A red-black tree also has an update time of O(log n), and it needs only a constant number of structural changes to do an update. A red-black tree is a BST with nodes coloured red or black using the following rules:
1. The root is black.
2. Every external node is black.
3. The children of a red node are black.
4. External nodes have the same black depth, i.e. the number of black ancestors minus one.
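These rules can be checked mechanically. The sketch below is my own illustration (the Node class and example trees are assumptions, not from the notes); it validates properties 1, 3 and 4 in one recursive pass, treating null references as the black external nodes.

```java
public class RBCheck {
    static final boolean RED = true, BLACK = false;

    static class Node {
        int key; boolean color; Node left, right;
        Node(int key, boolean color, Node left, Node right) {
            this.key = key; this.color = color; this.left = left; this.right = right;
        }
    }

    // Returns the black height of t (external/null nodes count as black),
    // or -1 if a red node has a red child or the black depths disagree.
    static int blackHeight(Node t) {
        if (t == null) return 1;                       // external node: black
        if (t.color == RED &&
            ((t.left != null && t.left.color == RED) ||
             (t.right != null && t.right.color == RED))) return -1;
        int hl = blackHeight(t.left), hr = blackHeight(t.right);
        if (hl < 0 || hr < 0 || hl != hr) return -1;
        return hl + (t.color == BLACK ? 1 : 0);
    }

    static boolean isRedBlack(Node root) {
        return root != null && root.color == BLACK && blackHeight(root) >= 0;
    }

    public static void main(String[] args) {
        // Legal: black 14 with red children 11 and 17 (a converted 4-node).
        Node ok = new Node(14, BLACK, new Node(11, RED, null, null),
                                      new Node(17, RED, null, null));
        // Illegal: red 11 has a red child 8.
        Node bad = new Node(14, BLACK,
                            new Node(11, RED, new Node(8, RED, null, null), null), null);
        System.out.println(isRedBlack(ok));   // true
        System.out.println(isRedBlack(bad));  // false
    }
}
```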
The following slides show how to convert nodes from a (2,4) tree into a red-black tree.
156
[Figures: converting (2,4)-tree nodes into red-black nodes. A 2-node becomes a single black node. A 3-node, e.g. 6 8, becomes a black node with one red child, in either of two ways (8 over 6, or 6 over 8). A 4-node, e.g. 11 14 17, becomes a black node 14 with two red children 11 and 17.]
The vertical-horizontal drawing of the red-black tree looks more like the original (2,4) tree.
[Figure: the resulting red-black tree with keys 3, 4, 5, 6, 8, 10, 11, 14, 17, 22, 23, 24, 25, 27, drawn beside the original (2,4) tree.]
The resultant tree below has lost its black-depth property. Since 25's right child (a double black) now has a black depth of 2, it is no longer a red-black tree.
[Figure: the tree with the double black below 25.]
It is easy to repair the double-black error by restructuring 23, 24, 25 as three separate black 2-nodes with 24 on top, and the black-depth property is restored.
160
The leaves of 23 and 24 now have the incorrect black depth of 2. Subsequently, this 4-node 23 24 25 is recoloured into three separate black 2-nodes with 24 on top. All the leaves now have a black depth of 3.
161
Deletion of either 23 or 25 from this red-black tree causes a double black to hang from node 24. Let us remove 23. The double black hangs from node 24.
[Figure: the tree after removing 23, with the double black hanging from 24.]
The red-black tree is repaired by first amalgamating 24 and 25 into a 3-node, and then the next step becomes obvious.
162
Since they are all red, removing 3, 8, 11, 17 or 24 has no effect on the depths of the leaves of the tree.
163
(a, b) Trees
A tree where all the nodes have at least a children and at most b children, with 2 <= a <= (b + 1)/2, is an (a, b) tree. All the leaves have the same depth. The root may have 0 children when the tree is empty; otherwise the root must have at least 2 children. If there are fewer than a entries, they all lie in the root node, so the root may have fewer than a children.
164
165
it follows that

    (h - 1) log2 a <= log2(n + 1) - 1,  so  h <= (log2(n + 1) - 1) / log2 a + 1,  i.e. h is O(loga n)

and

    log2(n + 1) <= h log2 b,  so  h >= logb(n + 1),  i.e. h is Omega(logb n)
166
167
A B-tree of order d

A B-tree of order d is a (ceil(d/2), d) tree. d is chosen such that d - 1 keys and d references can be stored in a single block of size B. So each time a node is accessed, a single disk transfer is done. The most transfers that ever need to be done is O(log_ceil(d/2) n), and the tree occupies O(n/B) blocks.
168
169
171
The void procedure insert(A, i, j) inserts A[i] into the array A[1..i] at j, by moving each element down one place and inserting now at A[j]:

void insert (int A[], int i, int j) {
    int now = A[i], k;
    for (k = i; k > j; k--)
        A[k] = A[k-1];
    A[j] = now;
}
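As a quick sanity check, here is a runnable version of the same procedure (shifted to 0-based Java arrays, which is my assumption; the notes use 1-based indexing):

```java
public class InsertDemo {
    // Move A[j..i-1] one place right and drop the old A[i] into A[j].
    static void insert(int[] A, int i, int j) {
        int now = A[i];
        for (int k = i; k > j; k--) A[k] = A[k - 1];
        A[j] = now;
    }

    public static void main(String[] args) {
        int[] A = {10, 20, 30, 40, 25};                   // 25 belongs between 20 and 30
        insert(A, 4, 2);                                  // insert A[4] at position 2
        System.out.println(java.util.Arrays.toString(A)); // [10, 20, 25, 30, 40]
    }
}
```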
173
174
175
176
177
When 2^i = n, i.e. i = log2 n, then T(n) = icn, so T(n) = cn log2 n, i.e. T(n) is O(n log n).
178
179
At some stage either i > p, or j > to. We will first discuss the case where i > p: then all of A[from..p] and only some of A[p+1..to] has been copied into merged[0..k]. The tail end of A[], namely A[j..to], is in place and only merged[0..k] needs to be copied into its right place at A[from..from+k].
180
[Figure: i has passed p (i = p+1); merged[0..k] holds the merged front, while A[j..to] is still in place.]
Notice that, here, j coincides with from+k+1. To complete the sorting of A[from..to] the contents of merged[0..k] must be copied back into A[from..j-1], since A[j..to] is already in place.
[Figure: all of merged[0..k] is copied back into A[from..j-1].]
181
// Copy merged[] and A[i..p] or A[j..to] back to A[]
if (i == p+1) {              // rest of A[j..to] already in place
    r = from; s = 0;
    while (r < j)            // OR while (s <= k)
        A[r++] = merged[s++];
} else ...
182
[Figure: the other case; j has passed to. All of A[from..i-1] and A[p+1..to] have been merged into merged[0..k], and the leftover A[i..p] has been parked at A[from+k+1..to], vacating the front of A.]
183
To complete the sorting of A[from..to] the contents of merged[0..k] must be copied back into A[from..from+k], since A[from+k+1..to] is already in place.
184
mergeSort: the code
public void mergeSort(int A[], int from, int to) {
    // in : A[from..to] array of elements in any order
    //      int from and int to within bounds
    // out: A[from..to] in sorted order
    int p;
    if (from + 1 >= to) {
        if (A[from] <= A[to]) return;
        swap(A, from, to);
        return;
    } else {
        p = (from + to) / 2;
        mergeSort(A, from, p);
        mergeSort(A, p + 1, to);
        merge(A, from, p, to);
        return;
    }
}
185
186
In hindsight we see that the code required to copy merged[0..k-1] back into A[from..from+k-1] is duplicated. So merge is improved as follows by altering the code of the if statement:
if (i != p+1) {   // first park A[i..p] at the end of A[..to]
    r = to; s = p;
    while (s >= i) A[r--] = A[s--];
}
r = from; s = 0;
while (s < k) A[r++] = merged[s++];
Note:
The copying back of the merged array merged[0..k] can be replaced with the copying of a few links if A[] is stored as a linked list.
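Putting the pieces together, a self-contained version of this merge (using the auxiliary merged[] array and the parking trick above) might look as follows. The code is my reconstruction of the scheme in the notes, not the notes' own listing:

```java
import java.util.Arrays;

public class MergeSortDemo {
    static void merge(int[] A, int from, int p, int to) {
        int[] merged = new int[to - from + 1];
        int i = from, j = p + 1, k = 0;
        while (i <= p && j <= to)                         // merge until one run empties
            merged[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
        if (i != p + 1) {                                 // park the rest of A[i..p] at the tail
            int r = to, s = p;
            while (s >= i) A[r--] = A[s--];
        }                                                 // (if the left run emptied, A[j..to] is in place)
        for (int s = 0; s < k; s++) A[from + s] = merged[s];   // copy the merged front back
    }

    public static void mergeSort(int[] A, int from, int to) {
        if (from >= to) return;
        int p = (from + to) / 2;
        mergeSort(A, from, p);
        mergeSort(A, p + 1, to);
        merge(A, from, p, to);
    }

    public static void main(String[] args) {
        int[] A = {5, 2, 9, 1, 7, 3, 8, 2};
        mergeSort(A, 0, A.length - 1);
        System.out.println(Arrays.toString(A));           // [1, 2, 2, 3, 5, 7, 8, 9]
    }
}
```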
187
188
189
191
    nT(n) = n(n-1) + sum_{i=1}^{n} [ T(i-1) + T(n-i) ]

So

    T(n) = n - 1 + (1/n) sum_{i=1}^{n} [ T(i-1) + T(n-i) ]
         = n - 1 + (1/n) [ sum_{i=1}^{n} T(i-1) + sum_{i=1}^{n} T(n-i) ]
         = n - 1 + (2/n) sum_{i=0}^{n-1} T(i)
         = O(n log n)
192
Closed form of T(n) = n - 1 + (2/n) sum_{i=0}^{n-1} T(i)    (1)

Multiplying (1) by n gives

    nT(n) = n(n-1) + 2 sum_{i=0}^{n-1} T(i),   n >= 2    (3)

and replacing n by n+1,

    (n+1)T(n+1) = n(n+1) + 2 sum_{i=0}^{n} T(i),   n >= 2    (4)
194
Closed form of T(n) = n - 1 + (2/n) sum_{i=0}^{n-1} T(i)

Subtracting (3) from (4),

    (n+1)T(n+1) - nT(n) = 2n + 2T(n)

Giving,

    (n+1)T(n+1) = (n+2)T(n) + 2n

Then, dividing by n+1,

    T(n+1) = ((n+2)/(n+1)) T(n) + 2n/(n+1)

where 2n/(n+1) < 2, because n/(n+1) < 1.
195
Closed form of T(n) = n - 1 + (2/n) sum_{i=0}^{n-1} T(i)

Unwinding the recurrence for T(n+1) produces a sum of harmonic terms, where H(n) = 1 + 1/2 + 1/3 + ... + 1/n is the harmonic series, H(n) = ln n + gamma + O(1/n), with Euler's constant gamma = 0.57721566.... The closed formula for T(n) is

    T(n) = 2(n+1)H(n) - 4n, which is approximately 2(n+1)(ln n + gamma) - 4n = O(n log n)
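The closed form is easy to sanity-check numerically by evaluating the recurrence directly and comparing it with 2(n+1)H(n) - 4n (a standard result for this recurrence; the check itself is my addition, not part of the notes):

```java
public class QuickAvg {
    // T(n) = n - 1 + (2/n) * sum_{i=0}^{n-1} T(i),  T(0) = T(1) = 0
    static double T(int n) {
        double[] t = new double[n + 1];
        double sum = 0.0;                        // running sum of t[0..i-1] (t[0] = 0)
        for (int i = 2; i <= n; i++) {
            sum += t[i - 1];
            t[i] = (i - 1) + 2.0 * sum / i;
        }
        return t[n];
    }

    static double closedForm(int n) {            // 2(n+1)H(n) - 4n
        double h = 0.0;
        for (int k = 1; k <= n; k++) h += 1.0 / k;
        return 2.0 * (n + 1) * h - 4.0 * n;
    }

    public static void main(String[] args) {
        for (int n : new int[]{2, 10, 100, 1000})
            System.out.println(n + ": " + T(n) + " vs " + closedForm(n));
    }
}
```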
196
1. If the pattern is empty, p = "", then i = 0 should be returned.
2. If the text is empty then failure is always reported, i = -1.
3. If |p| > |T|, i.e. p.length() > T.length(), then the match fails.
4. Test that a pattern can match T[0..m-1].
5. Test that a pattern can match T[n-m..n-1].
6. Test that the pattern can match somewhere in T[1..n-2], i.e. it does not fall on a boundary.
197
Here p[0..j-1] has matched T[i..i+j-1] but p[j] and T[i+j] do not match.

Next: i++; j = 0; and try to match the pattern p[0..m-1] with the substring T[i..i+m-1] from the text, until a match is found. In the worst case most of p will match parts of T and mismatches only occur near the end of p. In this case O(m) comparisons will occur O(n) times when the pattern is found near the end of T. The worst case for BF matching is O(nm).
198
In the figure p[0..j-1] has matched T[i..i+j-1] but p[j] != T[i+j]. Try matching p[0] and T[i+1] next.
int matchBF (String p, String T) {
    int m = p.length();
    if (m == 0) return 0;
    int n = T.length();
    int i = 0, found = -1;
    while (found < 0 && i <= n-m) {
        int j = 0;
        while (j < m && p.charAt(j) == T.charAt(i+j)) j++;
        if (j == m) found = i;
        i++;
    }
    return found;
}
199
[Figure: p[0..j-1] aligned with T[i-j..i-1]; p[j] = b mismatches T[i] = a.]

In this figure p[0..j-1] has matched T[i-j..i-1] but p[j] != T[i]. Try p[0] :: T[i-j+1] next.
int matchBF (String p, String T) {
    int m = p.length();
    if (m == 0) return 0;
    int n = T.length();
    int i = 0, j = 0;
    do {
        if (p.charAt(j) == T.charAt(i))
            if (j == m - 1) return i - j;
            else { i++; j++; }
        else { i = i - j + 1; j = 0; }
    } while (i < n);
    return -1;
}
200
1. Rabin-Karp uses hashing to slide p over T.
2. Boyer-Moore determines whether and where the character T[i] is present in the pattern p[0..m-1] to decide where to place p next.
3. Knuth-Morris-Pratt matches p with itself to determine where to move p next.
201
T = 27581463145918463717346715471874. First get pHash = 31459, then match it one-by-one with each THash_i: THash_0 = 27581, THash_1 = 75814, THash_2 = 58146, THash_3 = 81463, THash_4 = 14631, THash_5 = 46314, THash_6 = 63145, THash_7 = 31459. Since pHash matches THash_7, the position 7 is returned.
203
T = 27581463145918463717346715471874

pHash   = 31459
THash_0 = 27581
THash_1 = 75814 = (THash_0 - 2*10000)*10 + 4
THash_2 = 58146 = (THash_1 - 7*10000)*10 + 6
THash_3 = 81463 = (THash_2 - 5*10000)*10 + 3
THash_4 = 14631 = (THash_3 - 8*10000)*10 + 1
THash_5 = 46314 = (THash_4 - 1*10000)*10 + 4
THash_6 = 63145 = (THash_5 - 4*10000)*10 + 5
THash_7 = 31459 = (THash_6 - 6*10000)*10 + 9
THash_8 = 14591 = (THash_7 - 3*10000)*10 + 1
In general
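In general, the hash of the next window is obtained from the previous one by removing the leftmost digit's contribution and appending the new rightmost digit: THash_{i+1} = (THash_i - T[i]*10^(m-1))*10 + T[i+m]. The rolling update can be checked against the table above (the code is my own illustration using plain decimal values, before any modular hashing):

```java
public class RollingHash {
    // Slide a width-m decimal window over the digit string T,
    // updating the window value in O(1) per step.
    public static int[] windowHashes(String T, int m) {
        int pow = 1;                                  // 10^(m-1)
        for (int k = 1; k < m; k++) pow *= 10;
        int[] h = new int[T.length() - m + 1];
        int cur = 0;
        for (int k = 0; k < m; k++) cur = cur * 10 + (T.charAt(k) - '0');
        h[0] = cur;
        for (int i = 1; i < h.length; i++) {
            cur = (cur - (T.charAt(i - 1) - '0') * pow) * 10
                  + (T.charAt(i + m - 1) - '0');
            h[i] = cur;
        }
        return h;
    }

    public static void main(String[] args) {
        int[] h = windowHashes("27581463145918463717346715471874", 5);
        System.out.println(h[0] + " " + h[1] + " " + h[7]); // 27581 75814 31459
    }
}
```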
204
205
digit
digit
208
Rabin-Karpcode
public static int modHash = 65521;
public static final int base = (Character.MAX_VALUE + 1) % modHash;

static int firstHash(String s) {
    int m = s.length();
    int j, h = 0;
    for (j = 0; j < m; j++) {
        h = mod(h * base);
        h = mod(h + s.charAt(j));
    }
    return h;
}
209
Rabin-Karpcode
static int hash(String s) { return firstHash(s); }

static int baseLeftmost (String T) {
    int baseLeft = 1;
    int i;
    for (i = 1; i < T.length(); i++)
        baseLeft = mod(baseLeft * base);
    return baseLeft;
}
210
Rabin-Karpcode
static int matchRK(String p, String T) {
    int m = p.length();
    int n = T.length();
    if (m > n) return -1;
    int pHash = hash(p);
    int THash = hash((String)T.subSequence(0, m));
    int baseLeft = baseLeftmost(p);
    int i;
211
Rabin-Karpcode
    // get the |subsequence| = m starting at 0
    for (i = 0; i + m <= n; i++) {
        if (THash == pHash) {
            if (stringCompare((String)T.subSequence(i, i+m), p))
                return i;
        }
        if (i == n-m) return -1;
        // deduct the contribution of the left character
        THash = mod(THash - mod(baseLeft * T.charAt(i)));
        // shift left and add the character at the right
        THash = mod(mod(THash * base) + T.charAt(i+m));
    }
    return -1;
}
212
Rabin-Karpcode
static boolean stringCompare(String p, String T) {
    int m = p.length();
    int n = T.length();
    int i;
    if (m != n) return false;
    for (i = 0; i < m; i++)
        if (p.charAt(i) != T.charAt(i)) return false;
    return true;
}
213
Rabin-Karpcode
static int mod (int top) {
    int modulo = top % modHash;
    if (modulo < 0)
        return modulo + modHash * (1 + (-modulo / modHash));
    else
        return modulo;
}
214
215
216
217
If the letter a never appears in p, the pattern may be moved on, placing p[0] over T[i+1] and p[m-1] over T[i+m], ignoring the characters in T[i-j+1..i-1]. Boyer-Moore matching preprocesses p to find the last occurrence of each character. It matches from p[m-1] backwards through to p[0]. On non-matching p[j] and T[i]: if T[i] does not occur in p, then p[0] is moved to T[i+1], ignoring the characters in T[i-j+1..i-1]; otherwise p is moved, putting p[m-1] over T[i'] where i' = i + m - min(j, 1+lastInP(T[i])); next, match p[m-1] :: T[i']. In the worst case BM can degenerate to O(nm + s), where s is the alphabet size, but it is likely to scan fewer than n characters of T.
218
For p, calculate the table lastInP and place p[0] over T[0], i.e. set i = m-1 and j = m-1. Compare p[j] :: T[i], i.e. compare p[m-1] :: T[m-1]. If equal, then j-- and i--, until a match of p at j == 0. If p[j] :: T[i] are not equal: look up the entry lastInP(T[i]); it is -1 if T[i] does not occur in p. If T[i] does not appear in the pattern p then move i forward: i = i + m. If the last appearance of T[i] in p is at k then move i forward: i = i + m - min(j, 1+k). Repeat while (i < n).
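For example, for p = "bandana" the last occurrences are b at 0, d at 3, n at 5 and a at 6, with -1 for every other character. A small standalone sketch of this preprocessing step (my own illustration; the getLast routine in the notes fills a caller-supplied array instead of returning one):

```java
public class LastOccurrence {
    // lastInP[c] = index of the last occurrence of character c in p, or -1.
    public static int[] getLast(String p) {
        int[] lastInP = new int[256];
        java.util.Arrays.fill(lastInP, -1);
        for (int j = 0; j < p.length(); j++)
            lastInP[p.charAt(j)] = j;
        return lastInP;
    }

    public static void main(String[] args) {
        int[] last = getLast("bandana");
        System.out.println(last['a'] + " " + last['n'] + " " + last['x']); // 6 5 -1
    }
}
```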
219
Boyer-Moore: getLast(p, lastInP)

static void getLast(String p, int lastInP[]) {
    // pre:  p[0..m-1] pattern for which lastInP must be created
    // post: Boyer-Moore's lastInP vector
    int j;
    int s = 256;
    int m = p.length();
    for (j = 0; j < s; j++) lastInP[j] = -1;
    for (j = 0; j < m; j++) lastInP[(int)(p.charAt(j))] = j;
}
220
Boyer-Moore: matchBM(p,T)

Start with the usual tests on the bounds m and n and set up lastInP.

static int matchBM(String p, String T) {
    // pre:  p[0..m-1] pattern to match
    //       T[0..n-1] text in which to find the pattern
    // post: returns the index of the first matched position
    int m = p.length();
    int n = T.length();
    if (m == 0) return 0;
    else if (n == 0) return -1;
    else if (m > n) return -1;
    int s = 256;
    int last[] = new int [s];
    int i;
    int j;
    getLast(p, last);
221
Boyer-Moore: matchBM(p,T)

The code sets i so that p[0] lies at T[i-m+1].

    i = j = m - 1;
    do {
        if (p.charAt(j) == T.charAt(i))
            if (j == 0) return i;          // first match
            else { i--; j--; }
        else {
            i = i + m                      // jump step
                  - Math.min(j, 1 + last[T.charAt(i)]);
            j = m - 1;
        }
    } while (i < n);
    return -1;                             // no match found
}
222
Boyer-Moore matching: complexity

BM must set up lastInP[]. This costs |lastInP[]|, the size of the alphabet. Then it starts comparing from the back of p. Unless p is covering itself in T, or unless |p| << |T| or |p| is much smaller than the alphabet, most comparisons are going to be unequal.
223
Knuth-Morris-Pratt matching: preview

[Figure: p[0..j-1] matches T[i-j..i-1]; p[j] = b mismatches T[i] = a.]

Knuth-Morris-Pratt matching (KMP) preprocesses p by first matching p with itself, storing information about where to move p next when it gets a mismatch at p[j]. We know p[0..j-1] == T[i-j..i-1]. In the matched piece, KMP finds a prefix of p, p[0..k], that matches a suffix of it, p[i-k-1..i-1].

When there is a match, KMP always sets the next value of j to j++ and steps i to i++. When there is no match, j becomes next[j] (= k); if j >= 0 then i stays the same and KMP avoids redundant comparisons in T[i-j..i-1]; otherwise, when j < 0, then j = 0 and i++.
224
KMP matching: examples

Let i run through T and j run through p, comparing p[j] with T[i] and putting i++; j++; when they are equal.

[Figure: p = bandanas over T = bandbandanaonebandanatwobandanas, mismatch at p[4].]

p[0..3] == T[0..3] = band, but p[j=4] = 'a' != T[i=4] = 'b'. Since p[0..3] = band does not match with any substring of itself, next put j = 0 (i stays the same), comparing p[0..] :: T[4..]. KMP knows band has been matched and safely jumps over it.

Now p[0..6] = bandana matches T[4..10]; on the mismatch at p[7], j becomes 0, putting p[0..] over T[11..].
225
[Figure: successive alignments of p = bandanas sliding over T.]

Subsequently j hobbles forward one-by-one until bandana is matched at T[14..20]. The mismatch of p[7] = 's' with T[21] = 't' causes p to jump forward to try matching p[0..] over T[21..].
226
[Figure: p = bandanas finally matching at T[24..31].]

When p[0] mismatches, both i and j are incremented: p[j=0] = 'b' mismatches T[i=21] = 't', then T[i=22] = 'w', then T[i=23] = 'o', and then p[j=0] = 'b' matches T[i=24] = 'b'; eventually the pattern p[0..7] = bandanas matches T[24..31]. The value of next[0] = -1 and all the other values of next[j] here are 0, because bandanas has no suffixes that match prefixes of itself. The following example has substrings p[0..k], k > 0, which have prefixes that match their own suffixes. Consider the pattern ababbababaa.
227
If the pattern p = ababbababaa mismatches the text T = ababbabababab... at p[j=10] = 'a' != T[i=10] = 'b', the next[j] value is 2 and j must be set to j = next[j]+1 = 3, leaving i the same.
228
Knuth-Morris-Pratt matching: recap

The longest prefix-suffix pair is placed over one another first. If there is a match: i++; j++;. If there is a mismatch, a next j, always smaller than the previous one, is calculated using j = next[j] + 1. Note that i is only advanced when there is a match or when j becomes 0. We next construct the next[] table. When there is a mismatch of p[j] with T[i], the value of next[j] tells us to advance j to next[j]+1. Usually i stays the same, but it can advance by 1.
231
KMP: constructing next[]

When p[j] and T[i] match, simply increment both i and j, using i++; j++;. We will now consider mismatches at the various values of j in [0..m-1]. At a mismatch at p[0], simply put i++ and keep j = 0. We will construct next[] for Manber's example from Udi Manber [Section 6.7, pp. 148-155, Introduction to Algorithms: A Creative Approach, Addison-Wesley Publishing Co., Reading, MA., 1989].

Note that the ababbababaa example is this example in slightly disguised form.
232
KMP: constructing next[]

Consider a mismatch at p[0]. Since no information has been gained, simply i++ and try again with j = 0.

On mismatching p[1], p[0..0] is known, but it does not help, so j = 0 and i remains the same. After a mismatch at p[2] the substring p[0..1] is known, but no prefix equals any suffix, so again j = 0 and i remains the same.

The mismatch at p[3] means that we have covered p[0..2] = "xyx". This has a prefix x- that equals a suffix -x, so now j = 1 and i remains the same.

j:        0  1  2  3  4  5  6  7  8  9 10
p:        x  y  x  y  y  x  y  x  y  x  x
next[j]: -1  0  0  1  .  .  .  .  .  .  .
233
KMP: constructing next[]

When mismatching at p[4], p[0..3] = "xyxy" has a prefix xy- that equals a suffix -xy, so now j = 2 and i remains the same.

On mismatching p[5], p[0..4] = "xyxyy" has no prefix-suffix pair, so j = 0 and i is static. After a mismatch at p[6] the substring p[0..5] = "xyxyyx" has the prefix-suffix pair x-, -x, so j = 1 and i is static.

The mismatch at p[7] means that we have covered p[0..6] = "xyxyyxy". This has a prefix xy- that equals a suffix -xy, so now j = 2 and i remains the same.

j:        0  1  2  3  4  5  6  7  8  9 10
p:        x  y  x  y  y  x  y  x  y  x  x
next[j]: -1  0  0  1  2  0  1  2  .  .  .
234
KMP: constructing next[]

When mismatching at p[8], p[0..7] = "xyxyyxyx" has a prefix xyx- that equals a suffix -xyx, so now j = 3 and i remains the same.

If p[9] != T[i], p[0..8] = "xyxyyxyxy" has a prefix xyxy- that equals a suffix -xyxy, so now j = 4 and i remains the same. After the mismatch at p[10], the substring p[0..9] = "xyxyyxyxyx" has the prefix-suffix pair xyx-, -xyx, so j = 3 and i is static.

j:        0  1  2  3  4  5  6  7  8  9 10
p:        x  y  x  y  y  x  y  x  y  x  x
next[j]: -1  0  0  1  2  0  1  2  3  4  3
235
KMP: buildNext

Pseudocode for buildNext.

void buildNext(String p, int [] next) {
    int m = p.length();
    int i, j;
    next[0] = -2;
    next[1] = -1;
    for (i = 2; i < m; i++) {
        j = next[i-1] + 1;
        while (j >= 0 && p.charAt(i-1) != p.charAt(j))
            j = next[j] + 1;
        next[i] = j;
    }
    for (j = 0; j < m; j++) next[j]++;
}
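As a check, a self-contained transcription of buildNext reproduces the table computed by hand above for the pattern xyxyyxyxyxx:

```java
import java.util.Arrays;

public class KMPNext {
    // next[j] after the final pass: -1 for j = 0, otherwise the length of the
    // longest proper prefix of p[0..j-1] that is also a suffix of it.
    public static int[] buildNext(String p) {
        int m = p.length();
        int[] next = new int[m];
        next[0] = -2;
        if (m > 1) next[1] = -1;
        for (int i = 2; i < m; i++) {
            int j = next[i - 1] + 1;
            while (j >= 0 && p.charAt(i - 1) != p.charAt(j))
                j = next[j] + 1;
            next[i] = j;
        }
        for (int j = 0; j < m; j++) next[j]++;
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(buildNext("xyxyyxyxyxx")));
        // [-1, 0, 0, 1, 2, 0, 1, 2, 3, 4, 3]
    }
}
```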
236
KMP: matchKMP

Pseudocode for matchKMP.

int matchKMP(String p, String T) {
    int m = p.length();
    int n = T.length();
    if (m == 0) return 0;
    else if (n == 0) return -1;
    else if (m > n) return -1;
    int i, j, next[] = new int [m+1];
    buildNext(p, next);
    ...
237
KMP: matchKMP

    ...
    buildNext(p, next);
    i = 0; j = 0;
    while (i < n) {
        if (p.charAt(j) == T.charAt(i)) { i++; j++; }
        else {
            j = next[j];
            if (j < 0) { j = 0; i++; }
        }
        if (j == m) return i - m;
    }
    return -1;
}
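Putting buildNext and the matching loop together gives a runnable sketch (my transcription; the text T is written here without the spacing used in the slides):

```java
public class KMPDemo {
    // next[j]: -1 for j = 0, otherwise the length of the longest proper
    // prefix of p[0..j-1] that is also a suffix of it.
    static int[] buildNext(String p) {
        int m = p.length();
        int[] next = new int[m];
        next[0] = -2;
        if (m > 1) next[1] = -1;
        for (int i = 2; i < m; i++) {
            int j = next[i - 1] + 1;
            while (j >= 0 && p.charAt(i - 1) != p.charAt(j))
                j = next[j] + 1;
            next[i] = j;
        }
        for (int j = 0; j < m; j++) next[j]++;
        return next;
    }

    public static int matchKMP(String p, String T) {
        int m = p.length(), n = T.length();
        if (m == 0) return 0;
        if (n == 0 || m > n) return -1;
        int[] next = buildNext(p);
        int i = 0, j = 0;
        while (i < n) {
            if (p.charAt(j) == T.charAt(i)) { i++; j++; }
            else {
                j = next[j];                  // fall back in the pattern
                if (j < 0) { j = 0; i++; }    // mismatch at p[0]: advance in T
            }
            if (j == m) return i - m;         // whole pattern matched
        }
        return -1;
    }

    public static void main(String[] args) {
        String T = "bandbandanaonebandanatwobandanas";
        System.out.println(matchKMP("bandana", T));   // 4
        System.out.println(matchKMP("bandanas", T));  // 24
    }
}
```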
238
Knuth-Morris-Pratt: complexity

Calculating next[] with buildNext takes O(m). Matching the pattern using the next[] table takes O(n). KMP runs in O(m + n). One character T[i] may be matched against many characters from the pattern p. How many? When there is a mismatch with T[i], another character from p is matched if next[j] >= 0. Suppose the first mismatch involved p[j]; since each evaluation of next[j] leads us to a smaller index in [0..j-1], we can only backtrack j times. However, to reach p[j] we must have gone forward j times without any backtracking. If we assign the costs of backtracking to the forward moves then we at most double the number of forward moves. There are only n forward moves, so the number of comparisons is O(n).
239
Tries
Preprocess the text T, not the pattern p

In Rabin-Karp, Boyer-Moore, and Knuth-Morris-Pratt pattern matching the pattern p is preprocessed to speed up searching for its occurrence in a text T. If many searches for p in T are done, such as searching for the presence of a word p in a dictionary T, then preprocessing the text T may help to speed up queries, especially if T is large or when many different queries p are made. A trie is a very fast, dense structure that allows searches in time proportional to the pattern size, i.e. O(m).
240
Standard tries
The standard trie for a set of strings T is an ordered tree such that:

Each node but the root is labelled with a character.
The children of a node are alphabetically ordered.
The paths from the root to the external nodes yield the strings of T.

e.g. the standard trie for the set of strings T = {bear, bell, bid, bull, buy, sell, stock, stop}:
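A minimal sketch of such a trie, using a child map per node (the class design is my own, not the notes'):

```java
import java.util.TreeMap;

public class Trie {
    // TreeMap keeps the children alphabetically ordered.
    private final TreeMap<Character, Trie> children = new TreeMap<>();
    private boolean isWord = false;

    public void insert(String s) {
        Trie node = this;
        for (char c : s.toCharArray())
            node = node.children.computeIfAbsent(c, k -> new Trie());
        node.isWord = true;
    }

    // O(m) search, m = length of the query string.
    public boolean contains(String s) {
        Trie node = this;
        for (char c : s.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.isWord;
    }

    public static void main(String[] args) {
        Trie t = new Trie();
        for (String w : new String[]{"bear","bell","bid","bull","buy","sell","stock","stop"})
            t.insert(w);
        System.out.println(t.contains("bull") + " " + t.contains("bulk")); // true false
    }
}
```

Note that "be" is a prefix of stored words but not itself a word, so contains("be") is false.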
[Figure: the standard trie for {bear, bell, bid, bull, buy, sell, stock, stop}, with shared prefixes such as b-e, b-u and s-t-o merged into common paths.]
[Figure: a text containing words such as see, bear, sell, stock, bull, buy, bid, hear, bell and stop, written out with character positions numbered 0..88.]

We insert some of the words of the text into a trie. Each leaf stores the occurrences of the associated word in the text.

[Figure: the occurrence trie; e.g. the leaf for see stores positions 0 and 24, and the leaf for bear stores position 6.]
Compressed tries
A compressed trie has internal nodes of degree at least two. It is obtained from a standard trie by compressing chains of redundant nodes.
[Figure: the standard trie and its compressed form, with chains of redundant nodes such as e-a-r merged into single edges labelled ar, ll, id, u, y, ell, to, ck and p.]
244
Compact Representation
Compact representation of a compressed trie for an array of strings: it stores ranges of indices at the nodes instead of substrings, uses O(s) space, where s is the number of strings in the array, and serves as an auxiliary index structure.
[Figure: the compact representation; each node label is a triple (i, j, k) denoting the substring s[i][j..k] of the string array, e.g. (1,2,3) and (4,1,1).]
245
Sux trie
The suffix trie of a string T is the compressed trie of all the suffixes of T.
[Figure: T = minimize, with positions 0..7. Edges of the suffix trie are labelled with index ranges such as (7,7), (1,1), (0,1), (2,7), (4,7) and (6,7), i.e. with the substrings e, i, mi, nimize, mize and ze.]
248
By combining pairs of these frequencies, the lowest first, then adding their combinations to the frequency list and removing the originals, a binary tree is produced from which an optimal code can be written down. The compressed code may then rapidly be produced from the binary tree.
249
In order to find the lowest frequency characters first, an obvious method is to: sort the characters by frequency; combine the top pair; remove the top two characters but insert their combination back into the list at its correct position. The very first pair with the lowest frequency could appear at the lowest node in the final tree. Subsequent combinations fill the tree from the lowest level. The last combination is hooked into the root of the tree.
250
:9 a:5 b:1 c:3 d:7 e:3 f:1 h:1 i:1 k:1 n:4 o:1 r:5 s:1 t:2 u:1 v:1

Then combine the last 1-node with a 2-node, and start combining pairs of 2-nodes.

[Figure: the 1-leaves paired into internal nodes of weight 2, 3 and 4.]

Reorder the list and combine the remaining 2-nodes. Combine the pair of 3-leaves.

:9 d:7 a:5 r:5 n:4 t:2 u:1 v:1 b:1 f:1 h:1 c:3 e:3 i:1 k:1 o:1 s:1
251
:9 d:7 c:3 e:3 a:5 r:5 n:4 t:2 u:1 v:1 i:1 k:1 o:1 s:1 b:1 f:1 h:1

[Figures: the forest grows as the lowest-weight nodes are repeatedly combined; internal nodes of weight 4, 6, 10 and 13 appear above the leaves.]

There are only three nodes left. We will combine the lowest two, resort, and again combine the lowest two to create a single rooted tree from the forest.

[Figure: the final combinations produce nodes of weight 15 and 19, completing the Huffman tree.]
255
[Figure: the completed Huffman tree with every edge labelled 0 or 1; the leaves are :9, d:7, a:5, r:5, n:4, c:3, e:3, t:2 and the 1-leaves b, f, h, u, v, i, k, o, s.]
From the figure it follows that the number of digits needed to represent each character is the same as its depth in the Huffman tree.
257
[Figure: the same Huffman tree, read off as the code table:]

a 5 100      b 1 00110    c 3 0110     d 7 010
e 3 0111     f 1 001110   h 1 001111   i 1 001000
k 1 001001   n 4 0000     o 1 001010   r 5 101
s 1 001011   t 2 00010    u 1 000110   v 1 000111
(space) 9 11

coded is:

1001100111 0100001011 0001011101 0001100000 0000011110 1110000011 1011101011 0000011100 0111011110 1110011001 1111100001 1101011000 0100001011 0010100011 1011000100 0111101111 1010100101 001001
258
The Huffman code has 176 bits. Since |T| = 46 and it has an alphabet of fewer than 16 characters, a fixed-length code of 4 bits per character is needed to encode this string, i.e. 46 x 4 = 184 bits. This is not the optimal Huffman code for this string. Why not? Since our code caters for characters whose counts differ from those in T, e.g. b:1, c:3, d:7, e:3 and f:1, the code cannot be optimal. H.W. Calculate the optimal Huffman code for T.
260
261
262
The total of the frequencies is 122. The letters are sorted according to counts or probability, smallest first. The probability is the frequency divided by the total frequency. The two nodes with the lowest counts are combined:

a+b:6 (from a:2 and b:4)  c:5  d:7  e:15  f:29  r:60

Insert the combined node [a+b:6] after node [c:5] and before node [d:7]:

c:5  a+b:6  d:7  e:15  f:29  r:60
263
Huffman coding

Starting with

c:5  a+b:6  d:7  e:15  f:29  r:60

form node [c+(a+b):11] by combining nodes [c:5] and [a+b:6]:

c+(a+b):11  d:7  e:15  f:29  r:60
264
Huffman coding

Form node [d+(c+(a+b)):18] by combining nodes [d:7] and [c+(a+b):11]; then [e+(d+(c+(a+b))):33] by combining [e:15] and [d+(c+(a+b)):18]; and finally [f+(e+(d+(c+(a+b)))):62] by combining [f:29] and [e+(d+(c+(a+b))):33].
268
[Figure: the final Huffman tree with 0/1 edge labels. Reading the labels from the root gives r = 0, f = 10, e = 110, d = 1110, c = 11110, a = 111110 and b = 111111.]
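This greedy procedure is easy to reproduce with a priority queue. The sketch below (my own code, not from the notes) rebuilds the tree for the frequencies above and reads off the code lengths as leaf depths:

```java
import java.util.PriorityQueue;

public class HuffmanDemo {
    static class Node implements Comparable<Node> {
        int weight; char ch; Node left, right;
        Node(int w, char c) { weight = w; ch = c; }
        Node(Node l, Node r) { weight = l.weight + r.weight; left = l; right = r; }
        public int compareTo(Node o) { return Integer.compare(weight, o.weight); }
    }

    // Depth of the leaf carrying ch, which equals its code length.
    static int depth(Node t, char ch, int d) {
        if (t.left == null) return t.ch == ch ? d : -1;   // leaf node
        int l = depth(t.left, ch, d + 1);
        return l >= 0 ? l : depth(t.right, ch, d + 1);
    }

    public static Node build(char[] chars, int[] freqs) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int i = 0; i < chars.length; i++) pq.add(new Node(freqs[i], chars[i]));
        while (pq.size() > 1) pq.add(new Node(pq.poll(), pq.poll())); // combine two lowest
        return pq.poll();
    }

    public static void main(String[] args) {
        Node root = build(new char[]{'a','b','c','d','e','f','r'},
                          new int[]  { 2,  4,  5,  7, 15, 29, 60});
        for (char c : "abcdefr".toCharArray())
            System.out.println(c + ": " + depth(root, c, 0));
        // a:6 b:6 c:5 d:4 e:3 f:2 r:1 -- matching the codes read off the figure
    }
}
```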
270
Matrix multiplication

The ij-th element of the m x n matrix C, the product of the m x p matrix A and the p x n matrix B, is:

    Cij = Ai1 B1j + Ai2 B2j + . . . + Aip Bpj = sum_{k=1}^{p} Aik Bkj

C is calculated by running through all the possible combinations of i and j using two nested for loops.
271
Matrix multiplication

    Cij = sum_{k=1}^{p} Aik Bkj

void matMultiply (double [][] A, double [][] B,
                  double [][] C, int m, int p, int n) {
    int i, k, j;
    double Cij;
    for (i=1; i<=m; i++)
        for (j=1; j<=n; j++) {
            Cij = A[i][1]*B[1][j];
            for (k=2; k<=p; k++)
                Cij += A[i][k]*B[k][j];
            C[i][j] = Cij;
        }
}
272
Matrix multiplication

Suppose {Ai}, i = 1..n, is a chain of compatible matrices, where Ai has dimensions di x di+1. We want to compute:

    C = A1 A2 . . . An

This is known as a chain product. The dimension of the final product C is d1 x dn+1.
273
Multiplying an adjacent pair Al x Al+1 costs dl dl+1 dl+2 operations, and the final dimensions of the pairwise product P are dl x dl+2. The problem is: given the dimensions {di}, how should the corresponding matrices be paired?

B is 3 x 100
C is 100 x 5
D is 5 x 5
(B C) D takes 1500 + 75 = 1575 flops
B (C D) takes 1500 + 2500 = 4000 flops
274
A1 A2 . . . An

Enumerate all parenthesizations, calculate the number of operations for each one, and pick the one that is best. Running time: the number of parenthesizations is equal to the number of binary trees with n leaves. This is exponential: it is the Catalan number, which grows almost as fast as 4^n. This algorithm takes too long.
275
A Greedy Approach
Idea: repeatedly select the product that uses (up) the most operations. Counter example:

A is 10 x 5, B is 5 x 10, C is 10 x 5, D is 5 x 10

The greedy idea gives (A B) (C D), which takes 500 + 1000 + 500 = 2000 operations. But the alternative bracketing A ((B C) D) takes 500 + 250 + 250 = 1000 operations.
276
277
A Recursive Approach
Define subproblems: find the best parenthesization of Ai Ai+1 . . . Aj. Let Ni,j denote the number of operations done by this subproblem. The optimal solution for the whole problem is N1,n. Subproblem optimality: the optimal solution can be defined in terms of optimal subproblems.

There has to be a final multiplication, i.e. the root of the expression tree, for the optimal solution. Suppose the final multiplication is at index i: (A1 . . . Ai) (Ai+1 . . . An). Then the optimal solution N1,n is the sum of two optimal subproblems, N1,i and Ni+1,n, plus the time for the final multiplication. If the global optimum did not have these optimal subproblems, we could define an even better optimal solution.
278
A Characterizing Equation
Then the optimal solution N1,n is the sum of two optimal subproblems, N1,i and Ni+1,n, plus the time for the final multiplication. The global optimum has to be defined in terms of optimal subproblems, depending on where the final multiply appears. Let us consider all possible places for the final multiplication. Recall that Ai is a di x di+1 dimensional matrix. So a characterizing equation for Ni,j is the following:

    Ni,j = min_{i <= k < j} ( Ni,k + Nk+1,j + di dk+1 dj+1 )
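The characterizing equation translates directly into a dynamic program. A sketch (my own code), checked against the two examples above:

```java
public class MatChain {
    // dims.length = n+1; matrix Ai is dims[i-1] x dims[i] (0-based dims array,
    // so dims[i-1] plays the role of di in the notes).
    public static long minOps(int[] dims) {
        int n = dims.length - 1;
        long[][] N = new long[n + 1][n + 1];          // N[i][j], 1-based; N[i][i] = 0
        for (int len = 2; len <= n; len++)
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                N[i][j] = Long.MAX_VALUE;
                for (int k = i; k < j; k++)           // final multiply splits after Ak
                    N[i][j] = Math.min(N[i][j],
                        N[i][k] + N[k + 1][j]
                        + (long) dims[i - 1] * dims[k] * dims[j]);
            }
        return N[1][n];
    }

    public static void main(String[] args) {
        System.out.println(minOps(new int[]{3, 100, 5, 5}));     // 1575 (B, C, D)
        System.out.println(minOps(new int[]{10, 5, 10, 5, 10})); // 1000 (A, B, C, D)
    }
}
```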
279
282
Substrings
A substring of a character string x0 x1 x2 . . . x(n-1) is a string of the form xi0 xi1 xi2 . . . xik, where i(j+1) = ij + 1, i0 >= 0 and ik <= n-1.

Example
String:        "ABCDEFGHIJK"
Substring:     "ABCDEFGHIJK"
Substring:     "DEFGH"
Substring:     ""
Not substring: "DCEFGHIJ"
283
Subsequences (Section 11.5.1)

A subsequence of a character string x0 x1 x2 . . . x(n-1) is a string of the form xi0 xi1 xi2 . . . xik, where ij < i(j+1).

Example
String:          "ABCDEFGHIJIK"
Subsequence:     "ACEGJIK"
Subsequence:     "DFGHK"
Not subsequence: "DAGH"
Subsequence, but also a substring: "DEFGHI"
A subsequence is not the same as a substring.
284
285
Analysis: if X is of length n, then it has 2^n subsequences, so checking every subsequence is an exponential-time algorithm.
286
[Figure: the LCS table for X = GTTCCTAATA (indices 0..9) and Y = CGATAATTGAG (indices 0..10); matching characters (A == A) step diagonally, mismatches (A != G) take the maximum of the neighbouring entries; L[9,9] = 6 and L[8,10] = 5.]
...
int [][] LCS(String X, String Y) {
    // pre:  Strings X of length n, and Y of length m.
    // post: For i in [0..n-1], j in [0..m-1]:
    //       L[i,j] = length of LCS of X[0..i] and Y[0..j]
    for (i=0; i < n; i++) L[i,-1] = 0;
    for (j=0; j < m; j++) L[-1,j] = 0;
    for (i=0; i < n; i++)
        for (j=0; j < m; j++)
            if (X[i] == Y[j])
                L[i,j] = L[i-1,j-1] + 1;
            else
                L[i,j] = max(L[i-1,j], L[i,j-1]);
    return L;
}

(This is pseudocode: the boundary entries L[i,-1] and L[-1,j] become row and column 0 when the table is shifted by one in real Java.)
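A runnable version with the usual one-cell offset (my transcription), checked against the value L[9,9] = 6 read off the table:

```java
public class LCSDemo {
    // Length of the longest common subsequence of X and Y.
    public static int lcs(String X, String Y) {
        int n = X.length(), m = Y.length();
        int[][] L = new int[n + 1][m + 1];        // row/column 0 are the -1 boundaries
        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
                L[i][j] = (X.charAt(i - 1) == Y.charAt(j - 1))
                        ? L[i - 1][j - 1] + 1
                        : Math.max(L[i - 1][j], L[i][j - 1]);
        return L[n][m];
    }

    public static void main(String[] args) {
        System.out.println(lcs("GTTCCTAATA", "CGATAATTGA")); // 6, e.g. GTAATA
    }
}
```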
288
289
290
Traversal of trees
[Figure: a BST with keys 4, 7, 8, 10, 17, 19, 21, 23, 25, 27, 27, 29, 35, 37, 41, 47.]

Searching for a node, given its key, in a balanced binary search tree (BST) tends to be logarithmic: O(log n). L indicates a move to the Left subtree, R moves to the Right subtree and N denotes a visit to the node itself. The 6 standard orders of traversal when searching for a node in such trees are: NLR (pre-order), LNR (in-order), LRN (end-order or post-order), NRL, RNL and RLN. The LNR (in-order) traversal of the above tree is: 4, 7, 8, 10, 17, 21, 19, 27, 23, 25, 27, 29, 35, 37, 41, 47.
291
Traversal of trees: LRN
The following tree is a BST.
(Figure: the same balanced BST with root 23 and keys 4, 7, 8, 10, 17, 19, 21, 23, 25, 27, 27, 29, 35, 37, 41, 47.)
The LNR (in-order) traversal of the above tree is: 4, 7, 8, 10, 17, 19, 21, 23, 25, 27, 27, 29, 35, 37, 41, 47. Which ordering traverses the keys from the largest to the smallest? What order does LRN yield? What order does NLR yield?
292
The LNR (in-order) traversal of the above tree is: 3, 2, 4, 1, 6, 5, 7. The RNL (RL-in-order) traversal of the above tree is: 7, 5, 6, 1, 4, 2, 3. The NLR (pre-order) traversal of the above tree is: 1, 2, 3, 4, 5, 6, 7. The RLN (RL-end-order) traversal of the above tree is: 7, 6, 5, 4, 3, 2, 1.
293
The LNR (in-order) traversal of the above tree is: 1, 2, 3, 4, 5, 6, 7. The RNL (RL-in-order) traversal of the above tree is: 7, 6, 5, 4, 3, 2, 1. The NLR (pre-order) traversal of the above tree is: 4, 2, 1, 3, 6, 5, 7. The LRN (end-order or post-order) traversal of the above tree is: 1, 3, 2, 5, 7, 6, 4.
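The three recursive orders are one-liners on a small node class. A hedged Java sketch (the Node class and the sample-tree builder are our own) that reproduces the orders listed above for the tree with root 4:

```java
class Traversals {
    static class Node {
        int key; Node left, right;
        Node(int k, Node l, Node r) { key = k; left = l; right = r; }
    }

    static String lnr(Node v) {        // in-order: Left, Node, Right
        return v == null ? "" : lnr(v.left) + v.key + " " + lnr(v.right);
    }
    static String nlr(Node v) {        // pre-order: Node, Left, Right
        return v == null ? "" : v.key + " " + nlr(v.left) + nlr(v.right);
    }
    static String lrn(Node v) {        // post-order: Left, Right, Node
        return v == null ? "" : lrn(v.left) + lrn(v.right) + v.key + " ";
    }

    // The BST from the slide: 4(2(1,3), 6(5,7)).
    static Node sample() {
        return new Node(4,
            new Node(2, new Node(1, null, null), new Node(3, null, null)),
            new Node(6, new Node(5, null, null), new Node(7, null, null)));
    }
}
```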
294
insert the root node into the list.
do: visit and remove the head of the list; visit by inserting the head's children into the list.
repeat until the list is depleted.
The discipline the list uses, a LIFO stack or a FIFO queue, determines the traversal order.
295
Tree traversal: depth-first
(Figure: the binary tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7.)
push the root node onto the stack.
do: visit and remove the top of the stack; visit by pushing the top's children onto the stack (push the right child first, so that the left is visited first).
repeat until the stack is empty.
The depth-first traversal of the above tree is: 4, 2, 1, 3, 6, 5, 7.
296
Tree traversal: breadth-first
(Figure: the binary tree with root 4, children 2 and 6, and leaves 1, 3, 5, 7.)
Using a queue instead of a stack turns the algorithm into breadth-first search.
add the root node into the queue.
do: visit and remove the front of the queue; visit by adding the front's children into the queue.
repeat until the queue is empty.
The breadth-first traversal of the above tree is: 4, 2, 6, 1, 3, 5, 7.
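The stack and queue versions differ only in which end of the container they take the next node from. A hedged Java sketch (class name and tree encoding are our own; `ArrayDeque` serves as both stack and queue):

```java
import java.util.*;

class TreeSearch {
    // The example tree: 4 has children 2 and 6; 2 has 1 and 3; 6 has 5 and 7.
    static final Map<Integer, List<Integer>> children = Map.of(
        4, List.of(2, 6), 2, List.of(1, 3), 6, List.of(5, 7),
        1, List.of(), 3, List.of(), 5, List.of(), 7, List.of());

    // Breadth-first: a FIFO queue, children enqueued left to right.
    static List<Integer> bfs(int root) {
        List<Integer> visited = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.addLast(root);
        while (!queue.isEmpty()) {
            int v = queue.removeFirst();          // front of the queue
            visited.add(v);
            queue.addAll(children.get(v));
        }
        return visited;
    }

    // Depth-first: a LIFO stack; push the right child first so the
    // left child is popped, and hence visited, first.
    static List<Integer> dfs(int root) {
        List<Integer> visited = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int v = stack.pop();                  // top of the stack
            visited.add(v);
            List<Integer> cs = children.get(v);
            for (int i = cs.size() - 1; i >= 0; i--) stack.push(cs.get(i));
        }
        return visited;
    }
}
```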
297
void DFS(tree T, node v) {
    mark the node v
    call prelude(v)
    for all edges {(v, z) | z ∈ childOf(v)}
        if (z is unmarked) {
            DFS(T, z)
            call postlude(v, z)
        }
}
The depth-first traversal of the above tree is: 4, 2, 1, 3, 6, 5, 7.
298
The edges represent the bridges and the nodes represent land in Euler's Königsberg bridges problem: Is there a path where each bridge is traversed exactly once? Edges may be directed or undirected. Each vertex has a degree: d(v) is the number of edges incident to v. Here d(A) = 5 and d(B) = 3. In a directed graph the outdegree of a node is the number of edges that leave it, and the indegree is the number of edges that enter it.
299
In a simple path each node appears only once, and u is reachable from v if there is a path from v to u. A circuit or cycle is a path whose first and last vertices are the same. The undirected form of a directed graph G = (V, E) is the same graph without directions on the edges. A graph is called connected if, in its undirected form, there is a path from any node to any other node. A forest is a graph that does not contain a cycle. A tree is a connected forest, and a rooted tree is a directed tree with one distinguished node called the root, such that all its edges point away from the root.
300
Graphs: Eulerian graphs
Euler's Königsberg bridges problem: Which graphs can be traversed with a circuit that visits each edge exactly once?
(Figure: the Königsberg bridges multigraph on the nodes A, B, C and D.)
In an Eulerian graph the degree of all the nodes must be even. Each vertex has a degree: d(v) is the number of edges incident to v. Here d(A) = 5, d(B) = 3, d(C) = 3 and d(D) = 3, so the graph of Euler's Königsberg bridges problem is not Eulerian.
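The even-degree test is easy to mechanize. A hedged Java sketch (class and method names, and the node numbering 0 = A, 1 = B, 2 = C, 3 = D, are our own) applied to the seven Königsberg bridges:

```java
class Euler {
    // Compute node degrees from an undirected edge (multi)list.
    static int[] degrees(int n, int[][] edges) {
        int[] d = new int[n];
        for (int[] e : edges) { d[e[0]]++; d[e[1]]++; }
        return d;
    }

    // A connected graph has an Euler circuit iff every degree is even.
    static boolean hasEulerCircuit(int[] degrees) {
        for (int d : degrees) if (d % 2 != 0) return false;
        return true;
    }
}
```

With the bridge list A–B (twice), A–C (twice), A–D, B–D, C–D this yields the degrees 5, 3, 3, 3 quoted above, so no Euler circuit exists.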
301
A spanning tree of an undirected graph G is a subgraph of G that is a tree that contains all the vertices of G.
A weighted graph is a graph whose edges are associated with costs or distances or weights.
(Figure: an example weighted graph; its edge weights include 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 14.)
302
Graph traversal: depth-first
Depth-first search (DFS) in an undirected simple graph starts at a node we call the root. The idea is to visit the root, mark it, and then call DFS recursively on each of its unmarked neighbours one by one.
(Figure: an example graph on the nodes a–k, each node annotated with its position in one depth-first order.)
void DFS(graph G, node v) {
    mark the node v
    call prelude(v)
    for all edges {(v, z) | z ∈ neighbourOf(v)}
        if (z is unmarked) {
            DFS(G, z)
            call postlude(v, z)
        }
}
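A minimal runnable version of this recursive DFS, assuming an adjacency-list representation (the sample graph, class name and the order string used as the "prelude" are our own, not from the notes):

```java
import java.util.*;

class GraphDFS {
    static final Map<Character, List<Character>> adj = new HashMap<>();
    static final Set<Character> marked = new HashSet<>();
    static final StringBuilder order = new StringBuilder();

    static void dfs(char v) {
        marked.add(v);                        // mark the node v
        order.append(v);                      // "prelude": record the visit
        for (char z : adj.getOrDefault(v, List.of()))
            if (!marked.contains(z))
                dfs(z);                       // a "postlude" would go here, after the call
    }

    // Tiny sample graph: a -> b, c; b -> d; c -> d.
    static String run() {
        adj.put('a', List.of('b', 'c'));
        adj.put('b', List.of('d'));
        adj.put('c', List.of('d'));
        marked.clear();
        order.setLength(0);
        dfs('a');
        return order.toString();
    }
}
```

Note that d is reached through b and is already marked when c inspects it, so it is visited only once.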
304
push the root node onto the stack.
do: visit, mark and remove the top of the stack; visit the node by working with it and by pushing the top's unmarked neighbours onto the stack.
repeat until the stack is empty.
305
Graph traversal: depth-first
Depth-first search (DFS) in an undirected simple graph starts at a node we call the root.
push the root node onto the stack.
do: visit, mark and remove the top of the stack; visit the node by working with it and by pushing the top's unmarked neighbours onto the stack.
repeat until the stack is empty.
(Figure: the example graph on the nodes a–k.)
Some depth-first traversals of this graph are: a, b, e, h, k, j, i, f, c, d, g and a, d, f, i, j, k, g, e, h, c, b. Are the paths correct?
306
Digraph: depth-first traversal
Depth-first search (DFS) in a directed simple graph starts at a node we call the root.
push the root node onto the stack.
do: visit, mark and remove the top of the stack; visit the node by working with it and by pushing the top's unmarked reachable neighbours onto the stack.
repeat until the stack is empty.
(Figure: the example digraph, each node annotated with its position in one depth-first order.)
307
Let G be a digraph with n nodes. The edges above represent precedence, ≺, such that a ≺ b, b ≺ e, a ≺ d, d ≺ e, d ≺ f, a ≺ f, a ≺ c, c ≺ f, . . .. A topological ordering of G is an ordering v1, . . . , vn of the vertices of G such that for every edge (vi, vj) of G, i < j; i.e. any directed path in G traverses nodes in precedence order. A directed graph G has a topological ordering if and only if it is acyclic.
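A standard way to compute such an ordering is Kahn's algorithm: repeatedly output a node whose indegree is 0 and delete its outgoing edges. A hedged Java sketch (class and method names are our own; the notes do not specify an algorithm here):

```java
import java.util.*;

class Topo {
    // Kahn's algorithm over an adjacency-list map.
    static List<String> order(Map<String, List<String>> adj) {
        Map<String, Integer> indeg = new HashMap<>();
        for (String v : adj.keySet()) indeg.putIfAbsent(v, 0);
        for (List<String> succs : adj.values())
            for (String w : succs) indeg.merge(w, 1, Integer::sum);

        Deque<String> ready = new ArrayDeque<>();      // nodes with indegree 0
        for (Map.Entry<String, Integer> e : indeg.entrySet())
            if (e.getValue() == 0) ready.add(e.getKey());

        List<String> out = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.poll();
            out.add(v);                                // v precedes all unprocessed nodes
            for (String w : adj.getOrDefault(v, List.of()))
                if (indeg.merge(w, -1, Integer::sum) == 0) ready.add(w);
        }
        return out;  // shorter than the node count iff the digraph has a cycle
    }
}
```

On the precedence relation above (a ≺ b, d, c; b ≺ e; d ≺ e, f; c ≺ f) the result respects every edge, though several valid orders exist.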
(Figure: a DAG on the nodes a–k, shown with a topological numbering of its nodes.)
308
309
Shortest paths
Let G be a weighted graph. Each edge is labelled with a weight or length; the length between two nodes vi and vi+1 is written length(vi, vi+1). The length of a path
P = (v0, v1), (v1, v2), . . . , (vk−1, vk) in the graph is defined as

length(P) = Σ length(vi, vi+1), the sum taken over i = 0, . . . , k − 1.
The distance d(v, u) between vertices v and u in a graph G is the shortest path length from v to u, if a path exists. When no path exists from v to u we write d(v, u) = ∞.
Negative weights are possible but can lead to unforeseen problems. We will regard the distance as a positive metric.
310
311
Updating the length of a path, z.SP, takes O(log m) comparisons, where m is the size of the heap.
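The surrounding slides apply this to a heap-based shortest-path computation. A hedged Java sketch, assuming Dijkstra's algorithm with non-negative weights (class and method names are our own): since `PriorityQueue` has no decrease-key, each z.SP update is done by inserting a fresh entry, an O(log m) heap operation, and discarding stale entries when they surface.

```java
import java.util.*;

class Dijkstra {
    // adj[v] is an array of {neighbour, weight} pairs; src is the start node.
    static int[] shortest(int n, int[][][] adj, int src) {
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[1]));
        heap.add(new int[]{src, 0});
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            int v = top[0];
            if (top[1] > dist[v]) continue;          // stale (superseded) heap entry
            for (int[] e : adj[v]) {
                int u = e[0], w = e[1];
                if (dist[v] + w < dist[u]) {
                    dist[u] = dist[v] + w;           // relax the edge (v, u)
                    heap.add(new int[]{u, dist[u]}); // the O(log m) heap update
                }
            }
        }
        return dist;
    }
}
```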
312
313
Computation
What is computation? What can be computed? David Hilbert (1862–1943) said:
"This conviction of the solvability of every mathematical problem is a powerful incentive during our labours; we hear the constant call: Here is the problem, find the solution. One can find it by pure thinking, for in mathematics there is nothing that cannot be known."
Hilbert was reflecting what was widely believed by logicians and mathematicians at the beginning of the 20th century. Was he right?
314
Decision problemsEntscheidungsproblem
Determining the truth of a statement in a formal system is known as the Entscheidungsproblem (decision problem).
Computation: Kurt Gödel
In 1930 it rocked the foundations of Hilbert's view of formal logic, at a time when mathematics appeared to be the unalterable bastion of science while quantum physics was altering our understanding of the Wirklichkeit (reality). Hilbert's dream of mechanizing mathematics the way Hilbert wanted to do it was smashed by Gödel. How was this idea received? John Dawson states: "One of the most profound discoveries in the history of mathematics was assimilated promptly and without objection by Gödel's contemporaries." The prevailing Hilbertian view was subsumed by a new view of science, due to Gödel, in which statements may be valid or invalid, or might not be provable at all.
316
Computation: Kurt Gödel
Hilbert never uttered a word of criticism. How could he argue against Gödel's proof? Only one of Gödel's peers made a severe criticism, which he withdrew in writing a week later. Knowing what is impossible tells us a lot about what computers can and cannot do. Gödel showed that no algorithm can be written to read a statement about integer arithmetic and tell us whether that statement is true or false. Alonzo Church (1903–1995) used the λ-calculus to exhibit problems which have no algorithmic solution. Stephen Kleene (1909–1994) used recursion theory. Emil Post (1897–1954) did it using productions. Alan Turing (1912–1954) designed a machine and proved that this machine could compute everything that was effectively computable.
317
Computation: Church-Turing thesis
All these discoveries led to the idea that these methods are equivalent in some sense. We can take a Turing machine and emulate Post's productions, or program a Turing machine to emulate itself or Kleene's recursive system. We can emulate a Turing machine by using a C or Java program. These two statements, known as the Church-Turing thesis, are commonly believed:
1. All reasonable definitions of algorithm which are so far known are equivalent.
2. Any reasonable definition of algorithm which anyone will ever make will turn out to be equivalent to the definitions that we know.
The Church-Turing thesis merely states that we think we have a good intuitive grasp of the concept of devising instructions to perform some task.
318
Computation: Alan Turing
Computers that interact with the world can already do, in varying degrees, many tasks that humans once thought made them unique. Alan Turing showed in 1936 that programmable machines could do anything that is effectively computable. He also showed that no machine can be built to decide whether every given program would give a result and halt; we will sketch the proof of this later. This proves that there are incomputable things. This is disappointing, but it is a relief that this can at least be proved. There are still very many computable algorithms. John von Neumann (1903–1957) designed a practical machine, not unlike today's modern computers, that could be built and used to execute algorithms.
319
Limitations of Computation
Computers took us to the moon. Science is virtually impossible without computers. Will computers be able to think? Marvin Minsky: "There is no reason to suppose machines have any limitations not shared by man." Computers are better at very many intellectual activities. That computers are not so hot at philosophy perhaps says more about philosophy than about the potential of computers. Computers can do anything that a mathematician can do. We have not yet implemented a mathematician, but there is nothing inherently impossible about this. Computers have the advantage of immortality: programs written for computers are persistent.
320
Limitations of Computation
Douglas Lenat's EURISKO and CYC solve problems too complex for humans. Knowing what is impossible tells us a lot about what computers can and cannot do. Alan Turing showed that no machine can be built to decide whether every given program would give a result and halt. This proves that there are uncomputable things. James Watson, on Craig Venter's genome sequencing machines: "It isn't science, the machines could be run by monkeys." We will now discuss some uncomputable problems.
321
322
323
324
325
326
Assume that the totality problem is computable. Asking whether funnypd is total is the same as asking whether P(D) halts, because we may code halttester using it. Since this is impossible, our assumption is false.
value halttester(P) { // Assume isTotal can be programmed
    if (isTotal(funnypd)) return "OK";
    else return "BAD";
}
327
Asking if funnyp(D) is equivalent to simple(D) amounts to asking if P is total. If P halts on every input, then funnyp(D) will always output 13, whereas if there is some input D for which P(D) does not halt, then funnyp(D) will not halt on that input, and neither will it print out 13.
328
But since it is impossible to program isTotal(P), the assumption that we can program isEquivalent is false.
329
Partial computability
The halting problem is called partially computable because when it halts we can tell that it has halted; i.e. there is an algorithm that can report that a program has halted when it halts, but cannot say when it is in a loop. The totality and equivalence problems are not even partially computable. This distinction relates to proof systems: it is possible to show that problems are partially computable if and only if they have a proof system. Gödel showed that arithmetic is not even partially computable. Thus in any arithmetic proof system there are statements that are true even though they cannot be proved.
330
NP-completeness
There are infeasible programs. There are polynomially computable problems. And there are classes of problems whose solutions can be tested in polynomial time, but which appear to possess only exponential exact solutions: travelling salesman, bin packing, timetabling, Hamiltonian cycle. There is a book by Garey and Johnson listing these problems.
331
Conclusion
Will computers ever be able to think? Explaining how science discovers and accumulates new ideas, adding these truths and untruths falteringly to our increasing and persistent knowledge of the world, is one of many possible roads to its automation.
332
Linear search: can give algorithm and timing. Binary search: can give algorithm and timing, e.g. how long does it take to find an element not in the list? Heaps and priority queues: how to sort with a heap; what algorithms do you know that use heaps to speed them up? Quicksort: partition and timing were dealt with in detail in the class notes. Merge sort: timing. Boyer-Moore, Knuth-Morris-Pratt, Rabin-Karp versus brute force: know all their timings.
333
334
335