
QUANTUM COMPUTING AND QUANTUM INFORMATION

Pavithran S Iyer, 3rd yr BSc Physics, Chennai Mathematical Institute


H-1 SIPCOT IT-Park, Siruseri, Padur Post, Chennai - 603103
Email: pavithra@cmi.ac.in & pavithran.sridhar@gmail.com

Typeset using LaTeX
LAST UPDATED: December 26, 2010



Contents
I INTRODUCTION

1 Brief overview  9
1.1 Linear Algebra  9
1.2 Basic Quantum mechanics  9
1.3 Qubit - basic unit of quantum information  10
1.4 Multiple Qubits  11
1.5 Quantum Gates  12
1.5.1 Other single qubit gates  13
1.6 Bloch Sphere  13
1.6.1 Generalizing Quantum Gates - Universal Gates  15
1.7 Important conventions  17
1.8 Measurement Basis  17
1.8.1 Quantum Circuits  18
1.8.2 Quantum Copying or Cloning circuits  19
1.9 Quantum Teleportation  21
1.9.1 Bell States  21
1.9.2 EPR paradox and Bell's inequality  22
1.9.3 Application of Bell States: Quantum Teleportation  22
1.9.4 Resolving some ambiguities  25
1.10 Quantum Algorithms  25
1.10.1 Simulating classical circuits using Quantum circuits  25
1.11 Quantum Parallelism  26
1.11.1 Example of Quantum Parallelism - Deutsch Jozsa Algorithm  28

II PREREQUISITES - MATHEMATICS  33

2 Linear Algebra  35
2.1 Vector Spaces  35
2.2 Linear dependence and independence  36
2.3 Dual Spaces  36
2.4 Dirac's Bra-Ket notation  37
2.5 Inner and outer products  37
2.6 Orthonormal Basis and Completeness Relations  38
2.7 Projection operator  39
2.8 Gram-Schmidt Orthonormalization  39
2.9 Linear Operators  41
2.10 Hermitian Matrices  42
2.11 Spectral Theorem  43
2.12 Operator functions  44
2.12.1 Trace  45
2.13 Simultaneous Diagonalizable Theorem  45
2.14 Polar Decomposition  46
2.15 Singular value decomposition  47
2.15.1 Proving the theorem for the special case of square matrices  48
2.15.2 Proving the theorem for the general case  48

3 Elementary Group Theory  51
3.1 Structure of a Group  51
3.1.1 Cayley Table  51
3.1.2 Subgroups  56
3.1.3 Quotient Groups  57
3.1.4 Normalizers and centralizers  58
3.2 Group Operations  59
3.2.1 Direct product of groups  59
3.2.2 Homomorphism  59
3.2.3 Conjugation  60
3.3 Group Actions  61
3.3.1 Generating set of a group  61
3.3.2 Symmetric group  61
3.3.3 Action of Group on a set  62
3.3.4 Orbits and Stabilizers  62
3.3.5 Orbit Stabilizer theorem  63

III PREREQUISITES - QUANTUM MECHANICS  65

4 Identical Particles  67
4.0.6 Describing a two state system  67
4.0.7 Permutation operator  67
4.0.8 Symmetry and Asymmetry in the wave functions  68
4.0.9 Extending to many state systems  69
4.0.10 Bosons and Fermions  69

5 Angular Momentum  71

IV PREREQUISITES - COMPUTATION  77

6 Introduction to Turing Machines  79
6.0.11 Informal description  79
6.0.12 Elements of a turing machine  79
6.0.13 Configurations and Acceptance  81
6.0.14 Classes of languages  82
6.1 Examples: Turing machines for some languages  83
6.1.1 L(M) = {| {a, b} }  83
6.1.2 {a^p | p is a prime}  85
6.2 Variations of the Turing Machine  86
6.2.1 Multi-Track Turing Machines  86
6.2.2 Multi-Tape Turing Machines  86
6.2.3 Multi-Dimensional Turing Machines  88
6.2.4 Non-Deterministic Turing Machines  89
6.2.5 Enumeration Machines  89
6.2.6 Equivalence of the Turing machines and Enumeration machines  89
6.3 Universal Turing machines  90
6.3.1 Encoding Turing machines over {0, 1}  91
6.3.2 Working of a Universal Turing machine  91
6.4 Set operations on Turing machines  91
6.4.1 Union of two turing machines  92
6.4.2 Intersection of two turing machines  92
6.4.3 Complement of a Turing machine  92
6.4.4 Concatenation of two Turing machines  93
6.5 Halting Problem  93
6.5.1 Membership Problem  94
6.6 Decidability and Undecidability  94
6.7 Quantum Turing Machines  94

7 Computational Complexity  95

V INFORMATION THEORY  97

8 Fundamentals of Information Theory  99
8.1 Introduction  99
8.2 Axiomatic Definition of the Shannon Entropy  99
8.3 Interpretations of the Uncertainty Function  103

VI CODING  109

9 Classical Coding Theory  111
9.1 Introduction  111
9.1.1 Definitions  111
9.1.2 Notations from graphs  112
9.1.3 Unique Decipherability  112
9.2 Classifying instantaneous and uniquely decipherable codes  114
9.2.1 Part 1: Kraft's Inequality  114
9.2.2 Part 2: McMillan's Inequality  115
9.2.3 Part 3: Converse of Kraft's inequality  117
9.2.4 Bound on codeword length - Shannon's Noiseless Coding Theorem [26]  118
9.3 Error Correcting Codes  122
9.3.1 Definitions  122
9.3.2 Code parameters for a good error correcting code  123
9.3.3 Bound on the code distance - The Distance Bound  123
9.3.4 Bound on the number of codewords  125
9.3.5 Parity Check codes [25]  132
9.3.6 Linear Codes  132
9.4 Examples  133
9.4.1 Repetition Code [24]  133

10 Quantum Codes  135
10.1 Introduction  135
10.2 Errors in Quantum codes  135
10.2.1 Operator sum representation [3] [22]  136
10.2.2 Lindblad form using the Master Equation [24]  137
10.2.3 Error Correction Condition for Quantum Codes [24]  139
10.3 Distance Bound [1]  142
10.4 The Quantum Singleton Bound [1]  142
10.5 Quantum Hamming Bound  143
10.6 The Quantum Gilbert Varshamov Bound [2]  144
10.7 Examples  145
10.7.1 Bit Flip code [24]  145
10.7.2 Phase flip Errors [24]  147
10.7.3 Bit-Phase flip errors - Shor Code [24][23]  148

11 Stabilizer Codes  151
11.1 Pauli Group  151
11.2 Motivation for Stabilizer codes  152
11.3 Conditions on stabilizer subgroups  153
11.3.1 Generating set of the stabilizer subgroup  153
11.3.2 Structure of the stabilizer subgroup  154
11.4 Error Correction for Stabilizer codes  154
11.4.1 Notion of an Error in a Stabilizer code  154
11.4.2 Measurement on the stabilizer code  154
11.4.3 Error Correction condition for Stabilizer codes  156
11.5 Fault tolerance  157
11.5.1 Unitary gates in the stabilizer formalism  158

VII References  165

A Solutions to Exercises  171

Part I

INTRODUCTION


Chapter 1

Brief overview
1.1 Linear Algebra

1. Tensor Product: A tensor product is a method of multiplying two tensors (matrices). It is represented by the symbol ⊗, and its action is given by the following general (mnemonic) form:

    A ⊗ M = [ a11 M   a12 M   a13 M   a14 M   ... ]
            [ a21 M   a22 M   a23 M   a24 M   ... ]
            [ a31 M   a32 M   a33 M   a34 M   ... ]
            [ a41 M   a42 M   a43 M   a44 M   ... ]
            [  ...     ...     ...     ...    ... ]

where A = (aij) and M is any matrix. If the matrices on the LHS have the dimensions (D1r × D1c) and (D2r × D2c) respectively, then the resulting matrix on the RHS will have the dimensions (D1r · D2r) × (D1c · D2c).
It is really important to note that the above is NOT a formal definition of a tensor product; it is just a mnemonic form. The result, as written, seems to have the same dimensions as the original matrix on the LHS (the first matrix). Only when we expand the RHS, by substituting the block aij M at each entry, do we get a matrix of the dimensions given above.
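As a quick numerical check of the dimension rule, here is a minimal sketch using NumPy's kron routine (the array shapes and values are illustrative placeholders, not from the notes):

    import numpy as np

    # A is 2x3 and M is 2x2; the tensor (Kronecker) product is (2*2) x (3*2) = 4x6
    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    M = np.array([[0, 1],
                  [1, 0]])

    AM = np.kron(A, M)          # each entry a_ij of A is replaced by the block a_ij * M
    print(AM.shape)             # (4, 6)

    # the same operation combines state vectors: |0> tensor |1> = |01>
    ket0 = np.array([1, 0])
    ket1 = np.array([0, 1])
    print(np.kron(ket0, ket1))  # [0 1 0 0], i.e. the basis vector |01>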

1.2 Basic Quantum mechanics

We shall look at fundamentals of quantum mechanics in greater detail in the next chapter.
Some important definitions are:
1. State of a system:¹ The state of a quantum system is a vector in an infinite dimensional complex vector space known as the Hilbert Space.
The state of a classical system is represented by its position and momentum in a phase space. But this is not possible in Quantum Mechanics because of the built-in Heisenberg Uncertainty principle. So, if we close in on one definite value of the position of a particle, the possible values that its momentum can take are infinite. So, we can only close in on a given region and say, with some probability, that the particle lies within that region. The (classical) state of the particle thus lies anywhere in that continuous region that we define. So, the state of the particle has infinitely many possible position and momentum coordinates. Hence, it is represented as a vector in an infinite dimensional complex vector space (a Hilbert Space). For an N-particle system of independent particles whose individual wave functions are |ψ1⟩, |ψ2⟩, |ψ3⟩, ..., |ψN⟩, the combined state of the N particle system is given by:
|ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩ ⊗ |ψ3⟩ ⊗ ... ⊗ |ψN⟩
In general, if |ψ1⟩, |ψ2⟩, |ψ3⟩, ... are vectors in N1, N2, N3, ... dimensional complex vector spaces respectively, then |ψ⟩ is a vector in an (N1 · N2 · N3 · ...) dimensional complex vector space.

¹ A deeper picture of this is given in the quantum mechanics section. For now, this description will do.

2. Dirac Bra-Ket Notation: The Dirac ket notation is well known and is frequently used here. In this notation, every column vector is represented as |ψ⟩, which is called a Ket vector. Similarly, a row vector is represented as ⟨ψ|, which is called a Bra vector. Hence the name: Bra-Ket notation.

3. Local and non-local processes: By a local process between two particles, it is meant that influences between the particles must travel in such a way that they pass through space continuously; i.e. the simultaneous disappearance of some quantity in one place cannot be balanced by its appearance somewhere else if that quantity didn't travel, in some sense, across the space in between. In particular, this influence cannot travel faster than light, in order to preserve relativity theory.
4. Canonical Commutation Relations: Some observables in Quantum mechanics do not commute. That is, they have a non-zero commutator. The commutator of a pair of operators is defined as:
[A, B] = AB − BA   (the commutator)
The basic commutation relations in quantum mechanics are:
[xi, pj] = iħ δij,   [xi, xj] = 0,   [pi, pj] = 0

1.3 Qubit - basic unit of quantum information

A bit, a classical two state system, represents the smallest unit of information. A classical bit is represented by 1 or 0. It can be thought of as true and false, or any two complementary quantities whose union is the universe and whose intersection is the null set. There is a profound reason why the smallest unit of information is a 1 or 0, or true or false. This is because any logical query can be split into a series of yes or no questions. That is, with a series of yes or no answers to questions, we can perform any logical query. This is why a bit (which can be thought of as the most general structure for storing information) is a 1 or a 0.
A quantum bit is just an example of a two state quantum system. A qubit can also be an electron (with spin up and down), an ammonia molecule, etc. In quantum mechanics, a two state quantum system does not mean that it has only two states. This is what distinguishes a qubit from a classical bit. The difference comes due to a very important quantum mechanical phenomenon known as interference. Just like how a classical bit can take the value 0 or 1, a quantum bit can take the values |0⟩ or |1⟩, and any value produced by the interference between the states |0⟩ and |1⟩ (like α|0⟩ + β|1⟩). Since there are infinitely many such superpositions (where the states |0⟩ and |1⟩ enter with probability amplitudes α and β respectively), a qubit can exist in infinitely many states. If each state can store a unit of information, then the qubit can hold infinite units of information.
A Qubit, like any other two level quantum system, is (conventionally) represented by its state:
|ψ⟩ = α|↑⟩ + β|↓⟩
|ψ⟩ = α|+⟩ + β|−⟩
|ψ⟩ = α|0⟩ + β|1⟩
where |α|² + |β|² = 1.


This can be misleading, since it gives us the feeling that a classical bit can at most carry 2 units of information whereas a quantum bit can carry infinitely many units. But if a measurement is done on the state of the Qubit, it collapses into one of the eigen states of the measurement.
So, if a measurement of |ψ⟩ gives a, then after this measurement, the state of the Qubit will be |a⟩ (the eigen state of the measurement corresponding to the eigen value a). This new state |a⟩ will keep giving the same result a on repeated measurements of that property; it no longer carries the information that was in |ψ⟩, and will not reproduce the measurement statistics of |ψ⟩.
Why this happens is explained by the Postulates of Quantum Mechanics.
Hence, only a single unit of information can be retrieved from a Qubit.

1.4 Multiple Qubits

Any two state system (like the electron, which has a spin) can be represented by a Qubit. But what about representing the state of two electrons (which are independent of each other) using qubits? Such a representation, we saw at the beginning of the chapter, is possible. If the state of one electron is |ψ1⟩ and the state of the other electron is |ψ2⟩, then the system of two independent electrons can be collectively represented by the state |ψ⟩, where:
|ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩
Since we take a direct product of the two states, we may represent the new state, which is the two qubit state, as (using the convention |i⟩ ⊗ |j⟩ ≡ |ij⟩):
|ψ⟩ = α00|00⟩ + α01|01⟩ + α10|10⟩ + α11|11⟩
where:
|α00|² + |α01|² + |α10|² + |α11|² = 1
and |αij|² is the probability of the first qubit being in state |i⟩ and the second Qubit being in state |j⟩. If we want the probability for only one of them, we must sum over the other. If we want the probability of the first qubit being in state |i⟩ only, we must sum |αij|² over all j.
So, the probability of measuring the first qubit to be 0 is |α00|² + |α01|², and this measurement will collapse the state of the system into
|ψ'⟩ = (α00|00⟩ + α01|01⟩) / √(|α00|² + |α01|²).
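A minimal numerical sketch of this rule (NumPy; the amplitudes below are arbitrary illustrative values):

    import numpy as np

    # an arbitrary two-qubit state, ordered as [a00, a01, a10, a11]
    amps = np.array([0.5, 0.5j, 0.5, -0.5])
    amps = amps / np.linalg.norm(amps)

    # probability that the first qubit is measured to be 0: sum over j of |a0j|^2
    p0 = np.abs(amps[0])**2 + np.abs(amps[1])**2
    print(p0)

    # post-measurement state after obtaining 0 on the first qubit
    post = np.array([amps[0], amps[1], 0, 0]) / np.sqrt(p0)
    print(post)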

Therefore, we can say that in this 2 qubit system, we can retrieve 2 units of information. It certainly can deliver more information than a single qubit system, but there are some difficulties too.
In general, we need to carry out the measurement process twice to determine the information stored in both the qubits. So, earlier we were carrying out the measurement only once, and now we need to do it twice. Can it be better? Can we get away with one measurement? In other words, can we store some amount of information about one qubit in another, such that we can guess both the qubits by measuring only one of them? The answer is yes. We can do such a trick that can, with certainty, retrieve the information stored in one qubit by measuring the other. Such a two qubit state is called a Bell State or an EPR pair.
The Bell State is given by: |ψ⟩ = (|00⟩ + |11⟩)/√2
Here, the first Qubit is measured to be 0 with probability 1/2 (changing the state to |ψ⟩ = |00⟩) and 1 with probability 1/2 (changing the state to |ψ⟩ = |11⟩).
Hence, P(measuring the first qubit to be 0) = P(measuring the second qubit to be 0), and also the state after measuring the first qubit to be 0 = the state after measuring the second qubit to be 0. Similarly, P(measuring the first qubit to be 1) = P(measuring the second qubit to be 1), and the state after measuring the first qubit to be 1 = the state after measuring the second qubit to be 1.
Therefore the measurement of the first qubit always gives the same result as the measurement of the second qubit. Hence, we can say that by knowing the result of a measurement on the first qubit, we can tell with certainty

the result of measurement of the second qubit.


Also, we can say the two states are perfectly correlated. In the language of quantum mechanics, the two states are Entangled.
It is also easy to note that this property cannot be satisfied by an arbitrary state. Hence, we need to classify these states separately.² Their applications and significance become more prominent as we proceed to the later sections.

1.5 Quantum Gates

Just as we have Classical Gates that operate on classical bit(s), we also have their quantum analogues.
To start with, consider a simple classical gate, the NOT gate:


We can now think of its quantum mechanical analogue:


Consider the Gate G that is a quantum mechanical NOT gate. G flips the state of a qubit:
G : (α|0⟩ + β|1⟩) → (α|1⟩ + β|0⟩)


We can now try to see how this process of flipping the qubit is carried out. We know that, in the case of two level systems (or spin half), the state |0⟩ can be changed to |1⟩ and vice-versa using the ladder operators S₋ and S₊. The action of these operators is given by:
S₊ = Sx + iSy        S₋ = Sx − iSy
S₊|0⟩ = |1⟩   (raising operator)
S₊|1⟩ = 0     (raising operator)
Similarly:
S₋|1⟩ = |0⟩   (lowering operator)
S₋|0⟩ = 0     (lowering operator)
Therefore we can say: G ∝ (S₊ + S₋)
(S₊ + S₋)(α|0⟩ + β|1⟩) = α(S₊|0⟩ + S₋|0⟩) + β(S₊|1⟩ + S₋|1⟩)
                        = α(|1⟩ + 0) + β(0 + |0⟩)
                        = α|1⟩ + β|0⟩
Hence,
G(α|0⟩ + β|1⟩) = α|1⟩ + β|0⟩

A very important aspect to take note of is that the state |0⟩ is very different from 0. The latter means a null vector; it represents void. The former represents some state in which a particle is present. The state |0⟩ does not mean void.

² I still cannot get what is so special; it almost seems like they are two identical quantum states. It is like taking two classical bits 0 and 0 and saying that the result of measurement of 0 is equal to the result of measurement of the other 0 state. What property of QM is being used?


Since the state of the qubit is a ket vector, we can also have a column vector representation for it:
|ψ⟩ = (α, β)ᵀ
Now the Gate G can be defined by its action on this vector: G(α, β)ᵀ = (β, α)ᵀ.
By looking at this property, we can guess the matrix form of G to be:
G = [ 0  1 ]
    [ 1  0 ]
We can verify that this is the σx operator, proportional to S₊ + S₋ as computed above.

1.5.1 Other single qubit gates:

There are many single qubit gates. A major requirement for a quantum gate operator is that it must be Unitary. This has 2 consequences:
1. The conservation of probability.
2. Since the inverse of a unitary matrix is also a unitary matrix, each single qubit quantum gate can be undone by some other single qubit quantum gate. So, the input can be obtained by performing some operation on the output. Therefore there is no loss of information, unlike the classical case where the gates are not invertible.
Let us consider a Z Gate that leaves |0⟩ unchanged and flips the state |1⟩ to −|1⟩:
Z (α|0⟩ + β|1⟩) = α|0⟩ − β|1⟩
We can also guess that:
Z = [ 1   0 ]
    [ 0  −1 ]
This is the σz or Sz operator (Pauli spin matrix).

Let us now consider yet another important single qubit gate, the Hadamard Gate. This gate transforms the states |0⟩ and |1⟩ into superpositions of |0⟩ and |1⟩. This gate can be equated to an operation which is a reflection of the qubit vector about the line θ = π/8. It changes |0⟩ to (|0⟩ + |1⟩)/√2 and changes |1⟩ to (|0⟩ − |1⟩)/√2.
H(α|0⟩ + β|1⟩) = α (|0⟩ + |1⟩)/√2 + β (|0⟩ − |1⟩)/√2
The matrix form of H can be guessed as:
H = (1/√2) [ 1   1 ]
           [ 1  −1 ]
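A small numerical sketch of these three gates acting on a general state (NumPy; the amplitudes are placeholders):

    import numpy as np

    X = np.array([[0, 1], [1, 0]])                    # quantum NOT gate
    Z = np.array([[1, 0], [0, -1]])                   # phase flip gate
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)      # Hadamard gate

    alpha, beta = 0.6, 0.8                            # any amplitudes with |a|^2 + |b|^2 = 1
    psi = np.array([alpha, beta])                     # the state a|0> + b|1>

    print(X @ psi)        # [0.8 0.6]   -> b|0> + a|1>
    print(Z @ psi)        # [0.6 -0.8]  -> a|0> - b|1>
    print(H @ psi)        # a(|0>+|1>)/sqrt(2) + b(|0>-|1>)/sqrt(2)
    print(H @ H @ psi)    # H is its own inverse, so this returns the original state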

1.6 Bloch Sphere

The state of any two state system is represented by a point in a two dimensional complex vector space. Since this is a two dimensional complex vector space, each dimension (like we have x, y, and z dimensions in the cartesian frame) is complex. Each complex number needs two real numbers to represent it. So, the state can now be described by four real quantities, rather than two complex quantities. Hence, we have:
The state in the two dimensional complex vector space is |ψ⟩ = α|0⟩ + β|1⟩.
Since α and β are complex, each of them can be described by two real quantities: α = (αr, αi) and β = (βr, βi).
The normalization condition |α|² + |β|² = 1 now translates into a condition on the four real quantities:
αr² + αi² + βr² + βi² = 1.    (1.1)

Just like how the equation x² + y² + z² = 1 represents the surface of a sphere placed in a three dimensional space, the above equation (1.1) represents the surface of a sphere kept in a four dimensional space. This is the motivation for us to try to describe the state of a two state system as a point on a sphere. A four dimensional space is still a bizarre object for us to imagine. So, we need to try and remove one degree of freedom here so that we get our usual two dimensional sphere in a three dimensional space. The two dimensional sphere hence obtained is known as the Bloch Sphere. For this purpose, we need to work out the above process in the polar form. So, we shall have:
The complex numbers: α = rα e^(iφα) and β = rβ e^(iφβ)
Therefore, the state of the system: |ψ⟩ = rα e^(iφα)|0⟩ + rβ e^(iφβ)|1⟩
So, till this point, we have been working with a sphere kept in a four dimensional space, as there are four real quantities in the equation of the state. Now, we need to remove one degree of freedom, that is, eliminate one real quantity from the equation of the state.
For doing so, we need to recollect a very important feature of quantum mechanics: any quantum mechanical system is invariant under multiplication by an overall phase. This can be realized if we go back to our basic description of a quantum mechanical state. The state is represented by |ψ⟩, which is a probability amplitude. But what we can measure is the probability density, denoted by |ψ|². So, even if there is an overall phase factor, like e^(iγ), in the probability amplitude, it will not affect our measurements, and for all values of γ the system will be identical³. So, let us multiply the system by an overall phase. The choice can be decided by us, to suit our requirements.

Since the choice of this phase can be arbitrary, let it be e^(−iφα), so that we can eliminate one real variable.
Therefore, the state of the system: e^(−iφα)|ψ⟩ = rα|0⟩ + rβ e^(i(φβ − φα))|1⟩
Now, the LHS remains |ψ⟩, since the system is invariant under an overall phase change.
Let φ = φβ − φα. Then, on simplification, we have: |ψ⟩ = rα|0⟩ + rβ e^(iφ)|1⟩
Now, in the above expression, the second term on the RHS has a complex coefficient. We can write it in the cartesian form (x + iy):
|ψ⟩ = rα|0⟩ + (x + iy)|1⟩,   where x = rβ cos φ and y = rβ sin φ
Now, let us apply the normalization condition: rα² + x² + y² = 1
Therefore, we now get the equation of the Bloch sphere.
Having the equation of the Bloch sphere, we now need to find how the state of a system can be represented on it. For this, let us go to spherical coordinates. Let us map x, y and rα onto a sphere with unit radius.
Let us now make the transformations:
x = cos φ sin θ
y = sin φ sin θ
rα = cos θ
Therefore, the state now becomes: |ψ⟩ = cos θ|0⟩ + sin θ(cos φ + i sin φ)|1⟩
We can write this as: |ψ⟩ = cos θ|0⟩ + e^(iφ) sin θ|1⟩

³ In fact, this probability amplitude is the reason for interference.


Now, by convention, in spherical polar coordinates, θ goes from 0 to π. But here, if we put θ = 0, we get |ψ⟩ = |0⟩, and on putting θ = π/2, we get |ψ⟩ = e^(iφ)|1⟩. So we see that θ = 0 to θ = π/2 already covers the entire sphere. So, we modify our θ by changing it to θ/2. Now θ = 0 to θ = π covers the entire sphere. So, the final equation of the state of the system on the Bloch sphere is:
|ψ⟩ = cos(θ/2)|0⟩ + e^(iφ) sin(θ/2)|1⟩    (1.2)
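As a numerical illustration of equation (1.2), the sketch below recovers the Bloch angles (θ, φ) from an arbitrary state after removing the global phase (NumPy; the input amplitudes are placeholders):

    import numpy as np

    alpha, beta = (1 + 1j) / 2, 1 / np.sqrt(2)        # any normalized pair of amplitudes
    psi = np.array([alpha, beta])

    psi = psi * np.exp(-1j * np.angle(psi[0]))        # remove the overall phase: make alpha real
    theta = 2 * np.arccos(np.real(psi[0]))            # |psi> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>
    phi = np.angle(psi[1])

    print(theta, phi)

    # reconstruct the state from (theta, phi) and check that it matches (up to the global phase)
    recon = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    print(np.allclose(recon, psi))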
So, the above equation represents a single qubit on the Bloch Sphere. Now, what about multiple two state systems, or multiple qubits? Let us take a two qubit system represented by its state |ψ⟩ = |ψ1⟩ ⊗ |ψ2⟩. The state |ψ⟩ is now a vector in a four dimensional complex vector space, because each of the states is a vector in a two dimensional Hilbert space.
Following exactly the same argument as above, we can see that the state |ψ⟩ can be represented as a vector on a seven dimensional sphere, kept in an eight dimensional space. This seems confusing because in the above case, we claimed that a two state system (represented as a vector in H2) can be represented on a two dimensional surface. Extending the same argument, one should say that a composition of two 2-state systems (represented as a vector in H2 ⊗ H2) should be representable on a four dimensional surface (it need not be a sphere, but still it must be some four dimensional surface). Hence, we see that the surface used to represent |ψ⟩ has more dimensions than expected. We know that a seven dimensional surface has more points than a four dimensional one. We can now conclude that the seven dimensional surface can represent more states than a four dimensional one. Therefore, there is some missing information that cannot be represented on the four dimensional surface: the seven dimensional surface carries more information than a four dimensional one.
This means that when we take a composition of two 2-state systems, the new system formed contains information about each of the individual systems as well as some excess information that cannot be attributed to any single one of them. We arrived at a four dimensional surface from the assumption that all compositions of two 2-state systems can be represented as a tensor product of some two 2-state systems. From the fact that the sphere which represents all the compositions has more states, we can say that our old assumption, that all compositions of two 2-state systems can be represented as a tensor product, fails.
So, there are some states that are composed of two 2-state systems but cannot be expressed as a tensor product. These states have information (properties) that are not related to any one of the 2-state systems. This information is lost when the qubits are separated. Hence, this information is due to the tie, or (in more sophisticated words) entanglement, of the two 2-state systems. It is quite clear that not all states have this property. States that have this property are called entangled states. An example is the Bell State B00 = (|00⟩ + |11⟩)/√2.

1.6.1 Generalizing Quantum Gates - Universal Gates

Now, since gates are unitary operators that act on the state of the qubit, we can think of them as rotation operators on the Bloch Sphere. Each gate performs some rotation. But any rotation, in general, can be broken into a sequence of standard rotations about the x-y, y-z and x-z planes. Since any rotation can be broken into a sequence of standard rotations, we can draw the same analogy and say that any Quantum gate can be represented by a sequence of standard gates that act on the state of the qubit |ψ⟩ and produce the same answer as the original gate.
Consider a rotation operator U. We can decompose U into several basic rotation operators:

U = e^(iα) [ e^(−iβ/2)     0      ] [ cos(γ/2)  −sin(γ/2) ] [ e^(−iδ/2)     0      ]
           [     0      e^(iβ/2)  ] [ sin(γ/2)   cos(γ/2) ] [     0      e^(iδ/2)  ]

In other words, these are universal operators. Similarly, we can consider universal quantum gates, that,

when manipulated appropriately, can mimic any other quantum gate.
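As a quick numerical illustration of the decomposition above, the sketch below rebuilds the Hadamard gate from one particular choice of angles (NumPy; the specific angles α = π/2, β = 0, γ = π/2, δ = π are my own illustrative choice, not from the notes):

    import numpy as np

    def Rz(t):
        return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])

    def Ry(t):
        return np.array([[np.cos(t / 2), -np.sin(t / 2)], [np.sin(t / 2), np.cos(t / 2)]])

    alpha, beta, gamma, delta = np.pi / 2, 0.0, np.pi / 2, np.pi
    U = np.exp(1j * alpha) * Rz(beta) @ Ry(gamma) @ Rz(delta)

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    print(np.allclose(U, H))    # True: H = e^{i pi/2} Rz(0) Ry(pi/2) Rz(pi)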


Let us consider a simple classical universal gate, the NAND gate. The XOR Gate is not a universal gate because it cannot change the parity of the bits.⁴

Figure 1.1: A classical universal gate: the NAND gate.

Let us now consider a Universal Quantum Gate: a slight modification of the NOT gate, the controlled NOT gate or CNOT Gate.

Figure 1.2: A CNOT Gate


A CNOT Gate is a two input NOT gate. It takes two inputs, a DATA and a CONTROL:
it flips DATA if CONTROL = 1, and leaves DATA unchanged if CONTROL = 0. Classically, this is analogous to the XOR Gate.
The Gate operation is represented as: CNOT: |A, B⟩ → |A, B ⊕ A⟩
The action of this Gate can be described explicitly (|DATA, CONTROL⟩ → |DATA′, CONTROL⟩):
|00⟩ → |00⟩
|01⟩ → |11⟩
|10⟩ → |10⟩
|11⟩ → |01⟩

⁴ This is not clear to me.


In a Quantum circuit, the CNOT gate is represented as:

Figure 1.3: Circuit representation of a CNOT gate. The top wire has the control bit and the bottom has the
Target or the Data bit.
We can now look at the matrix representation of this Gate.

CNOT = [ 1  0  0  0 ]
       [ 0  1  0  0 ]
       [ 0  0  0  1 ]
       [ 0  0  1  0 ]

1.7 Important conventions

α|0⟩ + β|1⟩  --X (NOT gate)-->  α|1⟩ + β|0⟩

α|0⟩ + β|1⟩  --Z (Z gate)-->  α|0⟩ − β|1⟩

α|0⟩ + β|1⟩  --H (Hadamard gate)-->  α(|0⟩ + |1⟩)/√2 + β(|0⟩ − |1⟩)/√2

A most remarkable or unique feature that we notice about a quantum gate is that it operates on single qubits, which are like a superposition of 2 probabilistic classical bits. The unitary (invertible) nature of these operators enables the input Qubit to be easily retrieved.

1.8 Measurement Basis

Measurement is an operation that is performed on a system to determine, with certainty, the state of the system. Measurement also has a meaning for classical bits. Each classical bit has an equal probability of being 0 or 1. When a measurement is done, we know with certainty what value that bit has. The classical bit is 0 with probability one half or 1 with probability one half; both these choices forever exclude each other. In the quantum case the difference appears here: we have superposition states. For example, a qubit can be |0⟩ with probability one half and |1⟩ with probability one half. Note the usage of and and or.
The meaning of measuring some property (operator) of a state |ψ⟩ is nothing but finding the eigen values of that operator (corresponding to the property). The allowed values of any property (or, the results of measurement of any property) are limited to the eigen values of the operator representing this property. But it is not so straightforward: what if we are trying to measure the property corresponding to the operator A, and the given state |ψ⟩ is not an eigen state of A?

For example, till now we took |ψ⟩ = α|0⟩ + β|1⟩. This means we took the ket vectors |0⟩ and |1⟩ as our basis states. These vectors are nothing but the eigen vectors of the σz or Sz operator. So, we were in the Sz eigen basis; that is, the operator Sz is diagonal in this basis. Now, if we measure a property corresponding to the Sz operator, then the measurement results can have only two allowed values, +1 and −1 (in suitable units), because these are the eigen values of Sz. But what if we want to measure a property corresponding to the Sx operator? Here, |0⟩ and |1⟩ are certainly not the eigen vectors of the Sx operator (they had better not be, because if they were the eigen vectors of Sx also, that would mean Sx and Sz commute). Here, we make use of a key property of an eigen basis: it forms a complete basis. So, any state |ψ⟩ can be expressed as a superposition of the Sx eigen states, and the corresponding eigen values are the possible results of the measurement. So, to start with, we have a state |ψ⟩ in the Sz basis. Now, when we want to measure a property corresponding to Sx, we find the eigen states of Sx and write |ψ⟩ as a superposition of those eigen states. The coefficients of the eigen states determine the probabilities of the corresponding measurement results, and the eigen states are those to which |ψ⟩ shall collapse after measurement.
So, whichever property (operator) we are measuring, we must expand |ψ⟩ in the eigen basis of that operator (corresponding to the property), and then find the coefficients of those states in the expansion of |ψ⟩.
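As a small illustration of this recipe, the sketch below expands a state in the eigen basis of the X (σx) operator and reads off the outcome probabilities (NumPy; the amplitudes are placeholders):

    import numpy as np

    alpha, beta = 0.8, 0.6
    psi = np.array([alpha, beta])                # state written in the Sz (computational) basis

    # eigen vectors of the X operator: |+> and |->, with eigen values +1 and -1
    plus = np.array([1, 1]) / np.sqrt(2)
    minus = np.array([1, -1]) / np.sqrt(2)

    c_plus = np.vdot(plus, psi)                  # coefficient of |+> in the expansion of |psi>
    c_minus = np.vdot(minus, psi)                # coefficient of |->

    print(abs(c_plus)**2, abs(c_minus)**2)       # probabilities of measuring +1 and -1; they sum to 1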

1.8.1 Quantum Circuits

A quantum circuit, like any other, is read from left to right. A line in the circuit represents a wire. A wire here may just denote the path of a photon (or the passage of time, etc.; there need not be a physical path or channel). The input to a circuit is the qubit (represented by |ψ⟩, which by convention is assumed to be in a computational basis state).
An important feature not allowed in quantum circuits is loops: there is no feedback from one part of the circuit to another. These feedback forms are of 2 general types: several wires joined together (FANIN), and a copy of the qubit in one wire going to multiple wires (FANOUT). These are not allowed. Refer to the figure. As an interesting consequence of this, the copying of a qubit is an impossible task.


Figure 1.4: FANIN and FANOUT characteristics:

Also, we can generalize the representation of a controlled not gate which takes N qubits. The generalized
representation is shown in the figure:

Figure 1.5: Generalized CNOT gate

Another important aspect of quantum circuits is the meters used to measure the quantum bits.⁵ As we have already discussed, the measurement will collapse the qubit into a probabilistic classical bit (distinguished by drawing a double line)⁶. |ψ⟩ (= α|0⟩ + β|1⟩), upon measurement, will change to a classical bit that gives the result 0 with probability |α|² and 1 with probability |β|². A representation of the quantum meter is shown below:

Figure 1.6: Representation of a Quantum Bit measuring device.

1.8.2 Quantum Copying or Cloning circuits

One of the key features of quantum computation is that it disallows the replication or copying of Qubits. We can see why this happens. Let us first take up the issue of replicating or copying a classical bit. To accomplish this classically, one could do the following:

⁵ Measurement here means to identify the qubit; measuring a state α|0⟩ + β|1⟩ means to extract α and β.
⁶ The subtle difference here is that a probabilistic classical bit is fundamentally different from a quantum bit. This is because a classical bit has some finite probability of carrying only one piece of information, and it gives that upon measurement. A Qubit, on the other hand, carries infinite information, but when measured, it has some probability of collapsing onto one piece.


Figure 1.7: Classical method of copying a bit. Here ⊕ stands for the XOR operation. We take an arbitrary bit x and perform an XOR operation of that bit with y, where y is another bit that is constantly 0.
Let us now try the same with a qubit: here, we blindly replace the classical XOR with the quantum XOR, i.e. the CNOT gate. A valid question would be: have we copied all the information stored in |ψ⟩? The answer obviously is NO. It is obvious because we agreed that an infinite amount of information is stored in |ψ⟩, so all of it cannot possibly be copied. This result is called the no cloning theorem. There is a short proof of this theorem.
Suppose we have two states |x⟩ and |y⟩, such that there exists a machine U which copies the state |x⟩ into |y⟩ (|y⟩ is called the target state); the system of these two qubits is in the state |x⟩ ⊗ |y⟩. So, we have:

U(|x⟩ ⊗ |y⟩) = (|x⟩ ⊗ |x⟩)    (1.3)
Here U copies the value of one state into another. We claim that U is some unitary, universal operator that can clone any quantum state. Let us now take two arbitrary quantum states |ψ⟩ and |φ⟩. Let the target state be represented as |s⟩. According to the above claim, we have:
U(|ψ⟩ ⊗ |s⟩) = (|ψ⟩ ⊗ |ψ⟩)    (1.4)
similarly, U(|φ⟩ ⊗ |s⟩) = (|φ⟩ ⊗ |φ⟩)    (1.5)

We can take the inner product of equations (1.4) and (1.5). To take an inner product of the equations means to take the inner product of the two RHSs to form a new RHS, and the inner product of the two LHSs to form a new LHS. The inner product of any two matrices (this applies even to column vectors, since they too are matrices) is nothing but the product of the hermitian conjugate of the first with the second matrix.
The LHS of equation (1.4) is of the form A|x⟩. This is nothing but a product AB, and the hermitian conjugate of (AB) is B†A†. The LHS will then be:
[U(|ψ⟩ ⊗ |s⟩)]† [U(|φ⟩ ⊗ |s⟩)] = (⟨ψ| ⊗ ⟨s|) U†U (|φ⟩ ⊗ |s⟩)
In the last step, we used (A ⊗ B)† = (A† ⊗ B†). Also, since U is unitary, U†U = I. Hence the LHS becomes:
(⟨ψ| ⊗ ⟨s|)(|φ⟩ ⊗ |s⟩) = (⟨ψ|φ⟩)(⟨s|s⟩)
In deriving the above step, we have used the formula (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD). Also, since the target state |s⟩ is normalized, ⟨s|s⟩ = 1.
LHS: ⟨ψ|φ⟩


Coming to the RHS and applying similar simplifications, we have:
(⟨ψ| ⊗ ⟨ψ|)(|φ⟩ ⊗ |φ⟩) = (⟨ψ|φ⟩)²
Therefore, we can write the new equation formed as:
⟨ψ|φ⟩ = (⟨ψ|φ⟩)²
This equation is of the form x = x². The solutions to this equation are x = 1 and x = 0. The first case, ⟨ψ|φ⟩ = 1, means that |ψ⟩ and |φ⟩ are identical states. But then, this would mean that U only copies states that are identical; that is, U can only copy some specific state. This contradicts our assumption that U is a universal operator. In the other case, where ⟨ψ|φ⟩ = 0, we see that |ψ⟩ and |φ⟩ are orthogonal states. So, the cloning operator U will only clone orthogonal states.
Therefore, we can conclude that there is no universal operator that can copy an arbitrary state. If it can copy a given quantum state, then the only other states that it can copy are those orthogonal to the given state.
In the figure given below:

Figure 1.8: Repeating the same process as above for the quantum case.

The process can be concisely described as:
[α|0⟩ + β|1⟩]|0⟩  --CNOT-->  α|00⟩ + β|11⟩
The output, however, is not equal to |ψ⟩|ψ⟩, because when we multiply [α|0⟩ + β|1⟩] by [α|0⟩ + β|1⟩], we get α²|00⟩ + αβ|01⟩ + αβ|10⟩ + β²|11⟩. This certainly isn't equal to our result obtained from the CNOT gate, since the cross terms |01⟩ and |10⟩ are not present there. One can now say that we can copy the quantum bit if αβ = 0. But this would mean that at least one of them must be 0, and if this is so, then the bit is no longer a genuine qubit.
Hence it is impossible to copy the state of a Qubit.
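A quick numerical check of this argument (NumPy; the amplitudes are arbitrary placeholders): the CNOT output α|00⟩ + β|11⟩ differs from the would-be clone |ψ⟩ ⊗ |ψ⟩ unless αβ = 0.

    import numpy as np

    alpha, beta = 0.6, 0.8
    psi = np.array([alpha, beta])
    ket0 = np.array([1.0, 0.0])

    # CNOT with the first qubit as control and the second as target
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]])

    copied = CNOT @ np.kron(psi, ket0)    # alpha|00> + beta|11>
    clone  = np.kron(psi, psi)            # alpha^2|00> + alpha*beta|01> + alpha*beta|10> + beta^2|11>

    print(copied)                         # [0.6 0.  0.  0.8]
    print(clone)                          # [0.36 0.48 0.48 0.64]
    print(np.allclose(copied, clone))     # False, unless alpha*beta = 0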

1.9 Quantum Teleportation

1.9.1 Bell States

We have encountered these Bell States earlier. They are known to be the most correlated pair: the result of a measurement on the first qubit is the same as the result of the same measurement on the second qubit. In this case, on measuring the first qubit of the Bell State, one obtains 2 possible answers: 0 with probability 1/2 and 1 with probability 1/2. On measuring the second qubit one likewise obtains 2 possible answers: 0 with probability 1/2 and 1 with probability 1/2. The measurements on the two qubits always yield the same result. These Bell states are formed by taking a two qubit system (in a computational basis state) and passing it through a Hadamard gate followed by a CNOT gate. A simple table for this would be:

Table 1.1: Table showing the input and output states of a Hadamard transform (followed by a CNOT), used to produce Bell States:

In      After H                  After CNOT (out)
|00⟩    (|00⟩ + |10⟩)/√2        (|00⟩ + |11⟩)/√2 = |B00⟩
|01⟩    (|01⟩ + |11⟩)/√2        (|01⟩ + |10⟩)/√2 = |B01⟩
|10⟩    (|00⟩ − |10⟩)/√2        (|00⟩ − |11⟩)/√2 = |B10⟩
|11⟩    (|01⟩ − |11⟩)/√2        (|01⟩ − |10⟩)/√2 = |B11⟩

The generalized Bell state is given by:
|Bx,y⟩ = (|0, y⟩ + (−1)^x |1, ȳ⟩)/√2
where ȳ denotes the complement (NOT) of y.
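A small numerical sketch of Table 1.1 (NumPy): applying a Hadamard on the first qubit and then a CNOT to a computational basis state reproduces the corresponding Bell state.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I = np.eye(2)
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]])          # first qubit is the control

    def bell(x, y):
        """Start from |xy>, apply H to the first qubit, then CNOT."""
        basis = np.zeros(4)
        basis[2 * x + y] = 1.0
        return CNOT @ np.kron(H, I) @ basis

    print(bell(0, 0))    # [0.707 0 0 0.707]  -> (|00> + |11>)/sqrt(2) = |B00>
    print(bell(1, 0))    # [0.707 0 0 -0.707] -> (|00> - |11>)/sqrt(2) = |B10>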

1.9.2 EPR paradox and Bell's inequality

We shall look at these in detail in the next chapter: Fundamentals of Quantum mechanics.
We just saw that if we have the result of measurement of one qubit of the Bell state, the result of measurement on the second qubit is determined. Not only this, but even if the two qubits are (practically) infinitely far apart, the result of measurement is determined instantaneously, after measuring one of them. It is surprising how this is possible, because we know that information cannot be transmitted at a speed greater than the velocity of light. Therefore, following this difficulty, EPR suggested that the two halves of the Bell state already carry some more information, which describes the state of the two particle system at any given time, and that we have not accounted for that information in our formulation of Quantum mechanics. EPR called this information hidden variables. Later on, John Bell suggested a mechanism to test for the presence of local hidden variables. He devised an inequality and claimed that if a quantum system does not satisfy it, it is impossible for the system to be described by local hidden variables.

1.9.3 Application of Bell States: Quantum Teleportation

Consider the following problem:
Bob and Alice are two friends living far apart. They together generated an EPR pair or Bell State, and Alice and Bob each hold one of the 2 halves of the state. Now Bob is hiding, and Alice's mission is to deliver a state |ψ⟩ to Bob.
From looking at the problem, we can see that things are bad for Alice. She can only send classical information. She cannot copy the state |ψ⟩. Alice does not even know the state |ψ⟩, and even if she knew it, it would take an infinite amount of information (and time) to describe |ψ⟩, since it takes values in a continuous space.
So, the only thing left for Alice is to utilize the EPR pair (take advantage of the correlation of measurement results). So, briefly, what Alice does is the following:

1. Alice interacts |ψ⟩ with her half of the EPR pair, and gets one of the four possible results: 00, 01, 10, 11.
2. She sends the obtained result as classical information to Bob.
3. Bob knows that his measurement⁷, i.e. when he interacts with his half of the EPR pair, will give the same result. So, since Bob knows the result (sent by Alice), he decides on an appropriate operation⁸ on his half of the EPR pair, and he gets |ψ⟩.
Note that here |ψ⟩ has been communicated from Alice to Bob without being actually transmitted. The information was conveyed without any transport. So, it can be called Teleported, and hence the name: Quantum Teleportation.
Now let us look into the process more closely:
Let (|00⟩ + |11⟩)/√2 be the Bell state or EPR state that Alice and Bob together created. Let the state to be conveyed to Bob (the state to be teleported) be |ψ⟩ = α|0⟩ + β|1⟩, where α and β are unknown amplitudes. Now, combining the EPR pair with the state |ψ⟩, we get:
|ψ0⟩ = |ψ⟩|B00⟩
|ψ0⟩ = (1/√2) [α|0⟩(|00⟩ + |11⟩) + β|1⟩(|00⟩ + |11⟩)]
Now, the first qubit represents the message to be teleported and the second is Alice's half of the Bell state. The last is Bob's half of the Bell state.
Alice now puts her qubits through a CNOT gate and obtains |ψ1⟩, where:
|ψ1⟩ = (1/√2) [α|0⟩(|00⟩ + |11⟩) + β|1⟩(|10⟩ + |01⟩)]
Now, on sending the first qubit of |ψ1⟩ through a Hadamard gate, we get |ψ2⟩, where:
|ψ2⟩ = (1/√2) [α (|0⟩ + |1⟩)/√2 (|00⟩ + |11⟩) + β (|0⟩ − |1⟩)/√2 (|10⟩ + |01⟩)]
     = (1/2) [α(|0⟩ + |1⟩)(|00⟩ + |11⟩) + β(|0⟩ − |1⟩)(|10⟩ + |01⟩)]
On expanding and rearranging the terms, we obtain the following expression:
|ψ2⟩ = (1/2) [|00⟩(α|0⟩ + β|1⟩) + |01⟩(α|1⟩ + β|0⟩) + |10⟩(α|0⟩ − β|1⟩) + |11⟩(α|1⟩ − β|0⟩)]
So, as per the convention,

Figure 1.9: The first two bits are Alice's and the next is Bob's, as shown.
⁷ Measurement here means any interaction with the system.
⁸ A kind of inverse operation of what Alice did.


If Alice performs a measurement and obtains 00, she will send it to Bob; Bob then looks at which of his states in the expression corresponds to that result. So, he will come to the conclusion that the state he wants is the one paired with |00⟩. So, we have:

Table 1.2: Alice's measurement and the corresponding state |ψ3⟩:

Alice's measurement result    State |ψ3⟩ that Bob shall recover as the final message from Alice
00                            |ψ3(00)⟩ = α|0⟩ + β|1⟩
01                            |ψ3(01)⟩ = α|1⟩ + β|0⟩
10                            |ψ3(10)⟩ = α|0⟩ − β|1⟩
11                            |ψ3(11)⟩ = α|1⟩ − β|0⟩

From the measurement result, Bob will apply a certain operation and retrieve the corresponding Qubit. For example, if the measurement result is
1. 00, then Bob will leave |ψ⟩ as it is.
2. 01, then he will act on the |ψ⟩ state with an X gate⁹.
3. 10, then he will apply a Z gate to |ψ⟩.
4. 11, then he will first apply an X gate to |ψ⟩ and then a Z gate to the result.

Therefore, the resultant operation summarizes to:
Z^M1 X^M2 |ψ2⟩ = |ψ⟩
where M1 and M2 are the two bits of information sent by Alice. These are the results of Alice's measurement.
Summarizing all the above operations, we can now draw a circuit diagram that Bob must follow to retrieve the state |ψ⟩:

Figure 1.10: The top 2 lines of input are Alice's (the first 2 qubits belong to Alice). The bottom qubit belongs to Bob. The operation that Bob must apply is described in the generalized fashion as Z^M1 X^M2.

⁹ The NOT gate; these conventions will be used in most places.
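The whole protocol can be checked numerically. The sketch below (NumPy) prepares |ψ⟩ ⊗ |B00⟩, applies the CNOT and Hadamard on Alice's two qubits, picks one of her measurement outcomes, and applies Z^M1 X^M2 to Bob's qubit; the helper names and structure are my own illustration, not from the original notes.

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    X = np.array([[0, 1], [1, 0]])
    Z = np.array([[1, 0], [0, -1]])
    I = np.eye(2)
    P0 = np.array([[1, 0], [0, 0]])     # |0><0|
    P1 = np.array([[0, 0], [0, 1]])     # |1><1|

    alpha, beta = 0.6, 0.8              # the unknown amplitudes to be teleported
    psi = np.array([alpha, beta])
    bell = np.array([1, 0, 0, 1]) / np.sqrt(2)      # |B00>

    state = np.kron(psi, bell)          # qubit 0: message, qubit 1: Alice's half, qubit 2: Bob's half

    # Alice: CNOT (qubit 0 controls qubit 1), then H on qubit 0
    CNOT01 = np.kron(P0, np.kron(I, I)) + np.kron(P1, np.kron(X, I))
    state = np.kron(H, np.kron(I, I)) @ (CNOT01 @ state)

    # suppose Alice's measurement of her two qubits gives M1 M2 = 1 0
    M1, M2 = 1, 0
    block = state.reshape(4, 2)[2 * M1 + M2]        # Bob's (unnormalized) qubit for that outcome
    bob = block / np.linalg.norm(block)

    # Bob applies Z^M1 X^M2 and recovers the original state
    recovered = np.linalg.matrix_power(Z, M1) @ np.linalg.matrix_power(X, M2) @ bob
    print(np.allclose(recovered, psi))              # True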

GO TO FIRST PAGE

24

1.9.4 Resolving some ambiguities

The whole process is slightly surprising because it raises several doubts that seem to imply that teleportation violates the laws of Quantum Computation that we earlier agreed upon. The following are the ambiguities:
1. This seems to imply that |ψ⟩ is being copied by Bob from Alice, against the no cloning theorem that we discussed. The subtlety hidden here is that both copies of |ψ⟩ never coexist, because by the time Bob gets the result of Alice's measurement (in order for him to create his |ψ⟩), Alice has already destroyed her copy through the measurement.
2. The process of Teleportation seems to say that the information |ψ⟩ is being conveyed instantly, since it does not explicitly involve passage of any data. This is misleading, because the fact that Bob must get Alice's measurement result, which is in the form of classical information, is a key requirement without which the teleportation is impossible. So, the speed of conveying the information is limited by the speed of light¹⁰, and does not violate the theory of relativity.

1.10 Quantum Algorithms

Now, we can look at a few questions, like why Quantum Computation is preferred to Classical computation, and whether a quantum computer can do all that a classical computer is capable of¹¹. However, Quantum gates cannot be used directly as classical logic gates, because the former are inherently reversible while the latter are irreversible. But we can still build classical reversible gates.

1.10.1 Simulating classical circuits using Quantum circuits

Any classical circuit or gate can be replaced by an equivalent reversible gate known as the Toffoli gate. A Toffoli gate has three input bits. The last input bit, the target bit, is flipped if all the other bits are 1. It leaves the first two bits unchanged and performs an XOR of (AND of all the other bits) with the last bit. The Toffoli gate can now be used to simulate NAND¹². This gate is reversible and has an inverse, which is the Toffoli gate itself. The gate can be used as a quantum as well as a classical gate. In the quantum case, the Toffoli gate takes a state |110⟩ and gives |111⟩; it simply permutes the computational basis states.
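A minimal sketch of how the Toffoli gate reproduces NAND (plain Python; the function names are my own, not from the notes): fixing the target bit to 1 turns the Toffoli gate into a NAND of the first two bits.

    def toffoli(a, b, c):
        # the target bit c is flipped iff both control bits are 1
        return a, b, c ^ (a & b)

    def nand(a, b):
        # set the target to 1; the third output bit is then 1 XOR (a AND b) = NAND(a, b)
        return toffoli(a, b, 1)[2]

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, nand(a, b))    # prints the NAND truth table: 1, 1, 1, 0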

¹⁰ The speed of classical information cannot cross the velocity of light (theory of relativity).
¹¹ It would be surprising if this were not possible, because we know that all classical phenomena can be explained through Quantum mechanics.
¹² If it can simulate a NAND gate, which is a classical universal gate, then it can simulate all classical gates.


Figure 1.11: A Toffoli gate, a reversible classical gate: truth table and circuit representations. The last bit is flipped if the first two bits are 1.

1.11 Quantum Parallelism

Parallelism has a different meaning in the quantum case, as compared to the classical case. In the classical case, at any given unit of time, only one unit of a task is in execution. In the quantum case, parallelism takes on its real meaning: at any given instant in time, all the tasks run in parallel. This feature of Quantum Mechanics (for the multiple qubit case) is achieved due to the presence of superpositions of bits like Σij αij|ij⟩. One of the simplest operations that can be performed on two qubits is an XOR operation (denoted by ⊕). For all practical purposes, let us assume the XOR gate to be a universal gate. So, if we can do two XOR operations in parallel, then we have shown quantum parallelism in action. Let us now claim that corresponding to any function f, where f: x → f(x), we can always define a unitary transformation Uf such that
Uf |x, y⟩ = |x, y ⊕ f(x)⟩
The inputs to the quantum transformation or circuit Uf are not required to be computational basis states.
A rough circuit diagram of our setup is given below. (Here, since y is permanently set to |0⟩, the value of 0 ⊕ f(x) is equal to f(x) itself.) In other words,
Uf |x, 0⟩ = |x, 0 ⊕ f(x)⟩ = |x, f(x)⟩

Figure 1.12: Quantum circuit that computes f(x) (since |y⟩ = |0⟩) simultaneously for x = 0 and 1.

If the first qubit is prepared in the superposition $(|0\rangle + |1\rangle)/\sqrt{2}$, the output state is
$$\frac{|0, f(0)\rangle + |1, f(1)\rangle}{\sqrt{2}}$$
Hence, in one run of the function $U_f$, we have computed both f(0) and f(1). A single execution of $U_f$ was able to compute f(x) for multiple values of x; hence the name parallelism.
This idea of parallelism can be extended to multiple qubit systems as well; we can have a quick look at how this is done. Our main objective (when generalized to the n qubit case) is: given n qubits, we must compute the value of f(x) for all inputs in parallel, that is, in a single evaluation of $U_f$. To do this, just as in the previous case, let us take an (n+1) qubit system with the last qubit equal to $|0\rangle$ (it plays the role of the $|y\rangle$ that we set to $|0\rangle$ in the two qubit case). We now need all permutations of the first n qubits (with each qubit in the computational basis) in the system. To produce all the permutations of the n qubit system (where initially all qubits are set to $|0\rangle$), we require n Hadamard gates. We first set all the qubits to $|0\rangle$. Each Hadamard gate, acting on a qubit, produces
$$H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}$$
The output, in the two qubit case, is therefore:
$$(H|0\rangle)(H|0\rangle) = \frac{|0\rangle + |1\rangle}{\sqrt{2}}\,\frac{|0\rangle + |1\rangle}{\sqrt{2}} = \frac{|00\rangle + |01\rangle + |10\rangle + |11\rangle}{2}$$


This can now be generalized to:

$$(H|0\rangle)(H|0\rangle)\dots(H|0\rangle) = \frac{|0\rangle + |1\rangle}{\sqrt{2}}\otimes\frac{|0\rangle + |1\rangle}{\sqrt{2}}\otimes\dots\otimes\frac{|0\rangle + |1\rangle}{\sqrt{2}} = \frac{1}{\sqrt{2^n}}\sum_{i\,\in\,\{\text{all permutations of }n\text{ qubits}\}} |i\rangle$$
The above statement is, in short, represented as:
$$H^{\otimes n}|0\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}}\sum_{i\,\in\,\{\text{all permutations of }n\text{ qubits}\}} |i\rangle \tag{1.6}$$

where $H^{\otimes n}$ denotes the parallel action of n Hadamard gates. This is also called applying a Hadamard transform to the first n qubits.
Now this (n+1) qubit system can be sent to the function $U_f$, and for each permutation the function $U_f$ will produce f(x), where $x \in \{\text{permutations of }n\text{ qubits}\}$. The final answer that comes as output from the function $U_f$ is:

$$U_f\left(H^{\otimes n}|0\rangle^{\otimes n}\,|0\rangle\right) = \frac{1}{\sqrt{2^n}}\sum_{x\,\in\,\{\text{all permutations of }n\text{ qubits}\}} |x, f(x)\rangle$$

Therefore, we can compute $2^n$ values of f(x), for different values of x, in one single evaluation of the function f.
It is clear that this parallelism would not be possible if the qubits were not in a superposition of states. In other words, we have shown that the parallelism works only in the quantum case and not for probabilistic classical bits, because classical bits cannot be in a superposition of states.
But the story does not end here; there is a catch. The problem is that even if $U_f$ computes all the values of f(x) in one single evaluation and returns the result, as shown above, we are still restricted (by the nature of measurement in quantum mechanics) to obtaining only one value of f(x). After this measurement, the entire output collapses to an eigenstate. So, though we have shown how to compute f(x) for $2^n$ values of x, we are allowed to look at f(x) for only one value of x. But there is a way out of this, and we shall see how in the later sections.
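The following numpy sketch (our own illustration, not part of the original notes; the helper name make_Uf is our choice) builds $U_f$ as a permutation of computational basis states for an arbitrary n-bit function f, applies it to $H^{\otimes n}|0\rangle^{\otimes n}|0\rangle$, and shows that every amplitude $|x, f(x)\rangle$ appears after a single application:

    import numpy as np

    # Sketch: build U_f |x, y> = |x, y XOR f(x)> for an n-bit function f.
    def make_Uf(f, n):
        dim = 2 ** (n + 1)
        U = np.zeros((dim, dim))
        for x in range(2 ** n):
            for y in (0, 1):
                U[(x << 1) | (y ^ f(x)), (x << 1) | y] = 1.0   # permutation matrix
        return U

    n = 3
    f = lambda x: x % 2                       # any Boolean function of x works here
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    Hn = H
    for _ in range(n - 1):
        Hn = np.kron(Hn, H)                   # H tensored n times
    state = np.kron(Hn @ np.eye(2 ** n)[:, 0], np.array([1.0, 0.0]))  # H^n|0..0> (x) |0>
    out = make_Uf(f, n) @ state

    # Every basis state |x, f(x)> now carries amplitude 1/sqrt(2^n).
    for x in range(2 ** n):
        print(x, f(x), round(out[(x << 1) | f(x)], 3))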

1.11.1 Example of Quantum Parallelism - Deutsch-Jozsa Algorithm

In the previous few sections, we have been talking about features that are present only in quantum computers and not in classical ones. In this section, we shall see how these features can come together and outperform the classical computer (for the problem that is going to be presented). This problem was given by Deutsch and Jozsa. The problem is formulated as the following situation:
Alice in Amsterdam selects a number x between 0 and $2^n - 1$ and mails it in a letter to Bob in Boston. Bob takes the number x (sent by Alice), computes f(x) and replies with the result, which is either 0 or 1. Bob has promised to use a function of one of two kinds: either f(x) is constant for all values of x, or f(x) is balanced, i.e. equal to 1 for exactly half of all possible values of x and 0 for the other half. Alice's goal is to determine with certainty whether Bob has chosen a constant or a balanced function, corresponding with him as little as possible. How far can she succeed?
Let us start with the naive classical method, in which Alice may have to query Bob up to $2^{n-1} + 1$ times (until she gets a different answer from Bob). If she gets a different answer at any point, she can conclude that f(x) is balanced. If she gets the same answer $2^{n-1} + 1$ times, then the function must be constant (this is because, had it not been constant, it would be balanced and could give at most $2^{n-1}$ answers of one kind; but here the function gives $2^{n-1} + 1$ answers of one kind). So, in this solution, Alice and Bob need to communicate for up to $2^{n-1} + 1$ values of x.
Let us try to do better by taking the quantum analogue of this problem. Let Alice and Bob be allowed to exchange qubits instead of classical bits, and let f(x) still be a function that operates on classical bits; the type of output remains the same. Let us now work through the process for the simple n = 2 case. So, we need Alice to choose a number between 0 and 3. Alice has 4 choices, and we need two qubits to distinctly represent 4 different quantities. So, let us suppose Alice chooses some number that is represented by the state $|\psi_0\rangle = |01\rangle$. Now, let us send each qubit through a Hadamard gate and obtain the new state $|\psi_1\rangle$; the parallel action of the two Hadamard gates on the state $|\psi_0\rangle$ produces $|\psi_1\rangle$. So, we have:

input state (Alice's number): $|\psi_0\rangle = |01\rangle$
now: $H^{\otimes 2}|\psi_0\rangle = |\psi_1\rangle$
$$|\psi_1\rangle = \left(\frac{|0\rangle + |1\rangle}{\sqrt{2}}\right)\left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right) \tag{1.7}$$

Now, let us apply the transformation $U_f$ to the two qubits obtained ($U_f$ is the quantum circuit that we discussed in the last subsection). Just to recap, $U_f|x, y\rangle = |x, y \oplus f(x)\rangle$. We can now send the two qubits of $|\psi_1\rangle$ into the quantum circuit $U_f$, as $|x\rangle$ and $|y\rangle$ respectively. Let us define the new state $|\psi_2\rangle$, obtained by acting on $|\psi_1\rangle$ with $U_f$. So, we get:

|0 + |1
|0 |1

|2 = Uf
2
2

|0(|0 |1) + |1(|0 |1)


On expanding the product, we get: |2 = Uf
2
On separating the terms: |2 =

|0(|0 |1)
|1(|0 |1)
Uf
+ Uf
2
2

Here, both the terms which are separated out and being added, are of the

|0 |1

same form: Uf |x
2

Therefore, let us first try to compute a general form for the expression $U_f\left[|x\rangle\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$. Hence, we get:
$$U_f\left[|x\rangle\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right] = U_f\left[\frac{|x, 0\rangle - |x, 1\rangle}{\sqrt{2}}\right]$$
On the action of $U_f$, the expression becomes:
$$\frac{|x, 0 \oplus f(x)\rangle - |x, 1 \oplus f(x)\rangle}{\sqrt{2}}$$
which can be further simplified to:
$$|x\rangle\left[\frac{|0 \oplus f(x)\rangle - |1 \oplus f(x)\rangle}{\sqrt{2}}\right]$$

Assuming that we know the properties of the $\oplus$ (XOR) operation, let us consider two cases: f(x) = 0 and f(x) = 1.
Taking the first case, f(x) = 0:
$$|x\rangle\left[\frac{|0 \oplus 0\rangle - |1 \oplus 0\rangle}{\sqrt{2}}\right] = |x\rangle\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$$
Similarly, taking the second case, f(x) = 1:
$$|x\rangle\left[\frac{|0 \oplus 1\rangle - |1 \oplus 1\rangle}{\sqrt{2}}\right] = |x\rangle\left[\frac{|1\rangle - |0\rangle}{\sqrt{2}}\right]$$
Therefore, by inspecting the above two cases, we can summarize with a general form for $U_f\left[|x\rangle\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$ as the following:

$$U_f\left[|x\rangle\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right] = (-1)^{f(x)}\,|x\rangle\frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{1.9}$$
Now, since we have obtained a general form for the expression, we can substitute values of x to get the terms in the expression for $|\psi_2\rangle$ that we obtained earlier in (1.8). The first and second terms of the RHS of (1.8) can be evaluated directly by substituting the value of x in (1.9) to be 0 and 1 respectively. Therefore, we obtain:


Putting x = 0 in the first term and x = 1 in the second term of the RHS:
$$|\psi_2\rangle = (-1)^{f(0)}\,|0\rangle\frac{|0\rangle - |1\rangle}{2} + (-1)^{f(1)}\,|1\rangle\frac{|0\rangle - |1\rangle}{2}$$

The above expression takes one form if f(0) = f(1), and a different form otherwise. So, we have two different possibilities: one when f(0) = f(1), the other when f(0) ≠ f(1).
Let us consider the case f(0) = f(1). So, let f(0) = f(1) = c. Hence, we get:

$$|\psi_2\rangle = (-1)^c\,\frac{|0\rangle + |1\rangle}{\sqrt{2}}\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}$$
Here, $(-1)^c$ is just an overall phase of $\pm 1$.
Similarly, for the other case, f(0) ≠ f(1), we have the following argument.
If f(0) ≠ f(1), then $(-1)^{f(0)} = -(-1)^{f(1)}$. So let $(-1)^{f(0)} = -(-1)^{f(1)} = c$.

Hence, the expression becomes:
$$|\psi_2\rangle = c\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}$$
Here again, c is just an overall phase of $\pm 1$.
So, we obtain two expressions for $|\psi_2\rangle$. Keeping the overall sign explicit, we can write them as:

$$|\psi_2\rangle = \begin{cases} \pm\,\dfrac{|0\rangle + |1\rangle}{\sqrt{2}}\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(0) = f(1)\\[2mm] \pm\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(0) \neq f(1)\end{cases} \tag{1.10}$$

Now we have obtained $|\psi_2\rangle$. So, let us act on the first qubit of $|\psi_2\rangle$ with a Hadamard gate to produce $|\psi_3\rangle$; that is, $H|\psi_2\rangle = |\psi_3\rangle$ (with H acting on the first qubit only). Let us see the action of H on the first qubit separately for the two cases in (1.10). First, let us expand the product in the first case of (1.10) and see the action of H on the first qubit.
$$H|\psi_2\rangle = H\left[\frac{|00\rangle - |01\rangle + |10\rangle - |11\rangle}{2}\right]$$
We can act with H on the first qubit and leave the second one unchanged. So, we get:
$$H|\psi_2\rangle = \frac{1}{2}\left[\frac{|0\rangle + |1\rangle}{\sqrt{2}}|0\rangle - \frac{|0\rangle + |1\rangle}{\sqrt{2}}|1\rangle + \frac{|0\rangle - |1\rangle}{\sqrt{2}}|0\rangle - \frac{|0\rangle - |1\rangle}{\sqrt{2}}|1\rangle\right]$$
Hence, on simplifying:
$$H|\psi_2\rangle = \frac{1}{2\sqrt{2}}\left[|00\rangle - |01\rangle + |10\rangle - |11\rangle + |00\rangle - |10\rangle - |01\rangle + |11\rangle\right]$$
$$H|\psi_2\rangle = |0\rangle\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}$$
Similarly, on carrying out the calculations for the other case of $|\psi_2\rangle$, we get:
$$H|\psi_2\rangle = |1\rangle\,\frac{|0\rangle - |1\rangle}{\sqrt{2}}$$
So, we obtain two expressions for $|\psi_3\rangle$, just as we did for $|\psi_2\rangle$. We can write them as:

$$|\psi_3\rangle = \begin{cases} \pm\,|0\rangle\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(0) = f(1)\\[2mm] \pm\,|1\rangle\,\dfrac{|0\rangle - |1\rangle}{\sqrt{2}} & \text{if } f(0) \neq f(1)\end{cases} \tag{1.11}$$

The above expression (1.11) can be written in a shorter and more concise manner, as follows:

$$|\psi_3\rangle = \pm\,|f(0) \oplus f(1)\rangle\,\frac{|0\rangle - |1\rangle}{\sqrt{2}} \tag{1.12}$$

Hence, by measuring the first qubit, Alice can say with certainty whether f is balanced or constant: the measurement yields $f(0) \oplus f(1)$, which is 0 for a constant function and 1 for a balanced one. Hence, by a single run of the function, Alice was able to determine a global property of the function. Earlier we saw that, though we could compute the function for many inputs, we could measure only one result; here, with the same constraint, we have accomplished our task. So, summarizing, we have done the following steps (a small numerical sketch of these steps follows the list):
1. The input state was prepared.
2. A Hadamard transform was performed on the input state.
3. The state was then passed through the $U_f$ transformation.
4. The first qubit of the resultant state was passed through a Hadamard gate.
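Below is a small numpy sketch (our own, for illustration; the helper names Uf and deutsch are our choices) of the four steps above for the two-qubit case: prepare |01⟩, apply H⊗H, apply U_f, apply H to the first qubit, and read off the probability that the first qubit is |0⟩:

    import numpy as np

    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I = np.eye(2)

    def Uf(f):
        # |x, y> -> |x, y XOR f(x)> on two qubits (x is the first qubit)
        U = np.zeros((4, 4))
        for x in (0, 1):
            for y in (0, 1):
                U[2 * x + (y ^ f(x)), 2 * x + y] = 1.0
        return U

    def deutsch(f):
        psi0 = np.zeros(4); psi0[0b01] = 1.0        # |01>
        psi1 = np.kron(H, H) @ psi0                 # Hadamard on both qubits
        psi2 = Uf(f) @ psi1                         # oracle
        psi3 = np.kron(H, I) @ psi2                 # Hadamard on the first qubit
        p0 = psi3[0b00] ** 2 + psi3[0b01] ** 2      # probability first qubit is |0>
        return "constant" if p0 > 0.5 else "balanced"

    print(deutsch(lambda x: 0))   # constant function -> "constant"
    print(deutsch(lambda x: x))   # balanced function -> "balanced"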
We can now generalize the whole procedure to the multiple qubit case; that is, Alice can choose any number between 0 and $2^n - 1$, where n can be arbitrarily large. In the multiple qubit case, Alice's query register can be represented by an n qubit state, with all qubits initially set to zero. Alice's input state is an (n+1) qubit state with the last qubit set to $|1\rangle$:
$$\text{Alice's input state: } |\psi_0\rangle = |0\rangle^{\otimes n}|1\rangle$$

Now, let us perform a Hadamard transform on the query register, and pass the last qubit (the answer register, which Bob will modify and send back), $|1\rangle$, through a Hadamard gate, to get $|\psi_1\rangle$ (this is similar to (1.7)). In the previous case, instead of the transform, we had only one gate. From (1.6), we know that the result of the Hadamard

transform on n qubits, all initially set to $|0\rangle$, is $\displaystyle\sum_{x \in \{0,1\}^n} \frac{|x\rangle}{\sqrt{2^n}}$, and the Hadamard gate on the single qubit $|1\rangle$ gives $\dfrac{|0\rangle - |1\rangle}{\sqrt{2}}$. So, on their parallel action, we get:
$$|\psi_1\rangle = \sum_{x \in \{0,1\}^n} \frac{|x\rangle}{\sqrt{2^n}}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right] \tag{1.13}$$

where $x \in \{0,1\}^n$ means all strings of 0s and 1s of length n.
Bob now takes the input state sent by Alice and computes the function f using the transformation $U_f$. Bob then sends back the answer in $|\psi_2\rangle$ (the form of $|\psi_2\rangle$ resembles equation (1.9)):
$$|\psi_2\rangle = \sum_{x \in \{0,1\}^n} \frac{(-1)^{f(x)}|x\rangle}{\sqrt{2^n}}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right] \tag{1.14}$$

Alice now has a set of qubits in which the result of Bob's function is stored in the amplitudes of the superposition state $\displaystyle\sum_{x \in \{0,1\}^n} \frac{(-1)^{f(x)}|x\rangle}{\sqrt{2^n}}\left[\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right]$. She now interferes the terms in the superposition with a Hadamard transform, to get $|\psi_3\rangle$. To calculate the effect of the Hadamard transform, we can first see how the transform works in general on the n-qubit state $|x_1, x_2, x_3, \dots, x_n\rangle$:

Hence, we need to calculate $H^{\otimes n}|x_1, x_2, x_3, \dots, x_n\rangle$. Let us now go one level down, and first try to calculate the result of a Hadamard transform on a single qubit: $H|x\rangle$.


Part II


PREREQUISITES - MATHEMATICS


Chapter 2

Linear Algebra
2.1 Vector Spaces

A vector space over a field is a set that is closed under finite addition and scalar multiplication, where the scalars belong to the field. In this context, we shall deal with vector spaces over the field of complex numbers, denoted by C. A vector space has the following properties:
1. Closure under vector addition: $(u + v) \in V\ \ \forall\ u, v \in V$
2. Closure under scalar multiplication: $cu \in V\ \ \forall\ u \in V,\ c \in F$
3. Vector addition is commutative, and scalar multiplication is compatible with multiplication in the field:
   $u + v = v + u\ \ \forall\ u, v \in V$
   $(cd)u = c(du) = d(cu)\ \ \forall\ u \in V,\ c, d \in F$
4. Scalar multiplication distributes over vector addition: $c(u + v) = cu + cv\ \ \forall\ u, v \in V,\ c \in F$
5. Existence of an additive inverse: $\forall\ u \in V,\ \exists\ v \in V$ such that $u + v = 0$. The vector v is denoted by $-u$ and called the inverse of u under addition.
6. Existence of a scalar identity under multiplication: $\forall\ u \in V,\ \exists\ c \in F$ such that $cu = u$. This identity is denoted by 1.
7. Existence of a zero vector: $\forall\ u \in V,\ \exists\ v \in V$ such that $u + v = u$. v is then called the zero vector and denoted by 0.
A vector can be represented by a matrix. The entries of the matrix are called the components of the vector.

For example:
$$v = \begin{pmatrix} v_1\\ v_2\\ v_3\\ \vdots\\ v_n \end{pmatrix}, \quad u = \begin{pmatrix} u_1\\ u_2\\ u_3\\ \vdots\\ u_n \end{pmatrix}, \quad v + u = \begin{pmatrix} v_1 + u_1\\ v_2 + u_2\\ v_3 + u_3\\ \vdots\\ v_n + u_n \end{pmatrix}$$
Here n is called the dimension of the vector space. The vector space can also be infinite dimensional.

2.2 Linear dependence and independence

A set of vectors $\vec{v}_1, \vec{v}_2, \vec{v}_3, \dots, \vec{v}_n$ is said to be linearly dependent iff there exist constants $c_i$, not all zero, such that:
$$\sum_{i=1}^{n} c_i \vec{v}_i = 0$$
(The trivial case where all the $c_i$ are zero is excluded.)

Each vector then depends on the rest of the vectors through a linear relationship; hence the name linearly dependent. The vectors are called linearly independent iff they do not satisfy the above condition. For linearly independent vectors, no vector is related to the rest by a linear relationship. The maximum number of linearly independent vectors in a vector space is called the dimension of the vector space, denoted by n.
Any vector $\vec{x}$ in this vector space can be uniquely described as a linear combination of these n vectors. The proof is quite simple. An n-dimensional vector space has a set of n linearly independent vectors. If we add the vector $\vec{x}$ to this set, then we still have an n-dimensional space, so the set $\vec{v}_1, \vec{v}_2, \vec{v}_3, \dots, \vec{v}_n, \vec{x}$ is linearly dependent. So:

$$\sum_{i=1}^{n} c_i \vec{v}_i + c_x \vec{x} = 0$$
so:
$$\sum_{i=1}^{n} c_i \vec{v}_i = -c_x \vec{x}$$
$$\sum_{i=1}^{n} \left(\frac{-c_i}{c_x}\right) \vec{v}_i = \vec{x}$$

Hence, we have shown that a general vector $\vec{x}$ can be expressed uniquely as a linear combination of the n linearly independent vectors of the vector space. The set of n linearly independent vectors is called the basis of the vector space. The set of all vectors represented by this basis is called the spanning set of the basis.

2.3 Dual Spaces

A linear function can be defined over the elements of the vector space. Let f be a function which is linear, $f : \vec{x} \mapsto f(\vec{x})$, such that $f(\vec{x}) \in \mathbb{C}$. The function maps an element of the vector space to an element of the field over which the vector space is defined. By our convention, we represent a vector $\vec{x}$ by a column
vector. So, we can see by inspection that f should be represented using a row vector; then, on multiplying the row and the column vector, we get a single scalar. The function satisfies the requirement:
$$f(a\vec{x} + b\vec{y}) = af(\vec{x}) + bf(\vec{y})$$
where $\{a, b\} \in F$ and $\{\vec{x}, \vec{y}\} \in V$.

So, we can propose a representation for the function:

x1
x2

x3

if
x =
. then we can say that f = f1 f2 f3 . . .fn
.

.
xn

Hence, on matrix multiplication of f with


x , we get f1 x1 + f2 x2 + f3 x3 ....fn xn
which is a dimensionless number F.
The space of such functions is called the dual space of the corresponding vector space. If the vector space is V, then the corresponding dual space is denoted by $V^*$.

2.4 Dirac's Bra-Ket notation

The Dirac bra-ket notation is widely used in quantum mechanics. Here, any vector $\vec{x}$ (represented as a column matrix) is written as $|x\rangle$ and called a ket vector. The corresponding dual vector is called the bra vector, denoted by $\langle x|$; hence the name bra-ket notation. We will follow this notation from now on, and the reasons will become evident as we proceed. So, we have the conventions:

$$\langle x| \longleftrightarrow |x\rangle \tag{2.1}$$
$$\langle x|\,c^* \longleftrightarrow c\,|x\rangle \quad \text{(where c is some complex number)} \tag{2.2}$$

2.5 Inner and outer products

Consider a vector $|x\rangle$ and its dual vector $\langle x|$. We saw that the former is a column matrix and the latter a row matrix. $\langle x|$ is called the bra dual of $|x\rangle$. The product of a bra vector with a ket vector is a scalar quantity (as we saw, this is multiplying a row matrix with a column matrix). The product is represented as $\langle x|y\rangle$ and is called the inner product. The inner product of a bra and a ket vector is defined as:
$$\langle x|y\rangle = \sum_i x_i^*\, y_i \tag{2.3}$$

The inner product also satisfies some linearity properties:

x|c1 y1 + c2 y2 = c1 x1 |y + c2 x2 |y

c1 x1 + c2 x2 |y =

c1 x1 |y

c2 x2 |y

(2.4)
(2.5)

The first can be derived by pulling the constant out of the ket vector. The second can be obtained by the same technique, but using the rule given in (2.2). Another important property satisfied by the inner product is:
$$\langle x|y\rangle = (\langle y|x\rangle)^* \tag{2.6}$$

This can be proved as follows. From equation (2.3), $\langle x|y\rangle = \sum_i x_i^* y_i$. Consider the RHS of (2.6):
$$(\langle y|x\rangle)^* = \left(\sum_i y_i^*\, x_i\right)^* = \sum_i y_i\, x_i^* = \sum_i x_i^*\, y_i$$
Coming back to the initial definition of the inner product, we get back $\langle x|y\rangle$. Hence we have proved that $\langle x|y\rangle = (\langle y|x\rangle)^*$.

The inner product of a vector with its bra dual gives the square of the norm of the vector. It is represented as:
$$\langle x|x\rangle = \||x\rangle\|^2 \tag{2.7}$$


The norm is a scalar quantity. However, there is another possibility which we haven't yet explored: what about the product $|x\rangle\langle x|$? This is a product of a column matrix with a row matrix, and is thus a matrix whose dimensions are $n \times n$, where n is the dimension of the vector space containing $|x\rangle$. This product is called an outer product. A matrix can be thought of as a transformation acting on a vector; in quantum mechanics, these transformations or matrices are called operators.

2.6 Orthonormal Basis and Completeness Relations

We saw that any set of n linearly independent vectors $\{|v_1\rangle, |v_2\rangle, |v_3\rangle, \dots, |v_n\rangle\}$ forms a basis for an n dimensional vector space. We impose the condition that:
$$\langle v_i|v_j\rangle = \delta_{ij} \quad \text{for all } \{i, j\} \in \{1, \dots, n\}$$

A basis that satisfies this condition is called an orthonormal basis. The orthonormal basis is conventionally represented by $\{|e_1\rangle, |e_2\rangle, |e_3\rangle, \dots, |e_n\rangle\}$. The basis has some useful properties. Consider a vector $|x\rangle$ in the vector space spanned by the basis $\{e_i\}$. We can use the orthonormality property to get the coefficients multiplying the basis states:
$$|x\rangle = \sum_i c_i |e_i\rangle$$
Multiplying both sides by $\langle e_j|$: $\langle e_j|x\rangle = \sum_i c_i \langle e_j|e_i\rangle$. Since $\langle e_j|e_i\rangle = \delta_{ji}$, it is 1 only when j = i. Hence, when summed over, we get:
$$\langle e_j|x\rangle = c_j \tag{2.8}$$

Therefore, to get the coefficient multiplying the jth (orthonormal) basis state in the basis expansion of a vector, we must take the inner product of this basis state with the vector. Now, we can see some results involving the outer products of the basis states. The outer product leads to an operator. Let us take the basis
expansion of the vector, and substitute the result obtained in (2.8):
$$|x\rangle = \sum_i c_i |e_i\rangle$$
From (2.8),
$$|x\rangle = \sum_i \langle e_i|x\rangle\, |e_i\rangle$$
The expression inside the summation is a scalar multiplying a vector, and this multiplication commutes. Hence:
$$|x\rangle = \sum_i |e_i\rangle\langle e_i|x\rangle$$
Now, consider the RHS in the form $A|x\rangle$, where A is some operator. Hence:
$$|x\rangle = \left(\sum_i |e_i\rangle\langle e_i|\right)|x\rangle$$
Looking at the form of this operator, we can easily see that it is the identity operator. Therefore:
$$\sum_i |e_i\rangle\langle e_i| = I \tag{2.9}$$

This relation given above is called the completeness relation.
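A quick numerical check of (2.9), purely as an illustration with numpy (not part of the original notes): summing the outer products $|e_i\rangle\langle e_i|$ over any orthonormal basis reproduces the identity.

    import numpy as np

    # Take the eigenvectors of a random Hermitian matrix as an orthonormal basis {|e_i>}.
    rng = np.random.default_rng(0)
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    A = A + A.conj().T                        # make it Hermitian
    _, vecs = np.linalg.eigh(A)               # columns of vecs are orthonormal

    completeness = sum(np.outer(vecs[:, i], vecs[:, i].conj()) for i in range(4))
    print(np.allclose(completeness, np.eye(4)))   # True: sum_i |e_i><e_i| = I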

2.7 Projection operator

Consider the operator $|e_k\rangle\langle e_k|$. When this operator acts on any vector, it picks out the coefficient of $|e_k\rangle$ in the $\{e_k\}$ basis expansion of that vector, multiplying $|e_k\rangle$. In other words, it takes any vector and projects it along the $|e_k\rangle$ direction. The projection occurs due to the formation of the inner product of the bra $\langle e_k|$ with the arbitrary ket vector. Hence this operator $|e_k\rangle\langle e_k|$ is called the projection operator, denoted by $P_k$. The projection operator satisfies some important properties:
$$(P_k)^n = P_k \tag{2.10}$$
$$P_i P_j = 0 \quad (i \neq j) \tag{2.11}$$
$$\sum_i P_i = I \tag{2.12}$$

The first, (2.10), can be proved by considering $P_k = |e_k\rangle\langle e_k|$. So,
$$(P_k)^n = |e_k\rangle\langle e_k|e_k\rangle\langle e_k|e_k\rangle\langle e_k| \dots |e_k\rangle\langle e_k|$$
We can simplify this expression by taking inner products, using the fact that $\langle e_k|e_k\rangle = 1$. Only the terminal $|e_k\rangle$ and $\langle e_k|$ remain after all the inner products in the middle become 1. Hence, the RHS is $P_k$.
The second statement, (2.11), can be proved by considering $P_i P_j = |e_i\rangle\langle e_i|e_j\rangle\langle e_j|$. For $i \neq j$ the inner product $\langle e_i|e_j\rangle$ is 0, so the entire expression reduces to 0. Hence the statement is proved.
The third statement, (2.12), was already proved in (2.9), when we were looking at the orthonormal basis and the completeness relation.

2.8 Gram-Schmidt Orthonormalization

Given any linearly independent basis spanning some vector space, we can construct an orthonormal basis for the same vector space. (If we could not do that, then the original set of vectors would not be a basis, since there would be a vector that cannot be uniquely described by them.) Therefore, we proceed by trying to construct an orthonormal basis from a given basis. So, our objective is to construct a set $\{O_i\}$
corresponding to the set $\{v_i\}$. By definition of an orthonormal basis, we need $\langle O_i|O_i\rangle = 1$; that is, the norm of each vector must be 1. So, we define

|Oi =

|vi
|||vi ||

(2.13)

Our set of vectors are now all of unit norm, so we need to construct orthogonal vectors out of these. Briefly, our idea is the following; we can illustrate the process with an example of vectors in a 2 dimensional real vector space, which can be represented on a plane.
1. Without any loss of generality, take the first vector from the given basis (call it $|v_1\rangle$) as the first orthonormal vector $|O_1\rangle$. Hence, we have:
Figure 2.1: $v_1$ and $v_2$ are the given basis vectors; choose $O_1 = v_1$.

2. Take the second vector ($|v_2\rangle$) from the given basis, and take the projection of $|v_2\rangle$ along the vector $|O_1\rangle$ (take the projection by multiplying $|O_1\rangle\langle O_1|$ with $|v_2\rangle$). Now we have a vector that lies along $|O_1\rangle$; call this vector $|P_{O_1}\rangle$.

Figure 2.2: Diagram for step 2. Project $v_2$ along $O_1$.

3. Now, subtract $|P_{O_1}\rangle$ from $|v_2\rangle$ (using the triangle law of vector addition) to get $|O_2\rangle$.

Figure 2.3: Diagram for step 3: get $O_2 = v_2 - P_{O_1}$. The dotted line indicates that we are subtracting vectors using the triangle law of addition.
4. Now, by construction, this vector $|O_2\rangle$ is orthogonal to $|O_1\rangle$.

Figure 2.4: final diagram: O1 and O2 are formed.

5. Similarly, we go on to construct all the $\{O_i\}$. In the ith step, we subtract from $|v_i\rangle$ its projections onto all the previously constructed vectors $|O_j\rangle$, j &lt; i.

So, summing up, we got $|O_2\rangle$ by computing $|v_2\rangle - |P_{O_1}\rangle$, where $|P_{O_1}\rangle = (|O_1\rangle\langle O_1|)|v_2\rangle$. Hence, we can write:
$$|O_j\rangle = \frac{|v_j\rangle - \sum_{i=1}^{j-1} |O_i\rangle\langle O_i|v_j\rangle}{\left\|\,|v_j\rangle - \sum_{i=1}^{j-1} |O_i\rangle\langle O_i|v_j\rangle\,\right\|} \quad \text{where } (1 \leq j \leq n) \tag{2.14}$$

The denominator is to normalize (setting its norm to 1) the resultant vector. n is the dimensionality of the
vector space spanned by the given basis.
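The following Python sketch (our own illustration of (2.14), not part of the notes; the function name gram_schmidt is our choice) implements the procedure with numpy: each new vector has its projections onto the previously built orthonormal vectors subtracted, and is then normalized.

    import numpy as np

    def gram_schmidt(vectors):
        # vectors: list of linearly independent 1-D numpy arrays (the |v_j>)
        ortho = []
        for v in vectors:
            w = v.astype(complex)
            for o in ortho:
                w = w - o * np.vdot(o, w)     # subtract |O_i><O_i|v_j>  (vdot conjugates o)
            ortho.append(w / np.linalg.norm(w))
        return ortho

    basis = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
    O = gram_schmidt(basis)
    print(np.round([[np.vdot(a, b) for b in O] for a in O], 6))  # identity matrix: orthonormal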

2.9 Linear Operators

A linear operator between two vector spaces V and U is any function $A: V \to U$ which is linear in its inputs:
$$A\left(\sum_i a_i |v_i\rangle\right) = \sum_i a_i \,(A|v_i\rangle) \tag{2.15}$$
From the above definition we see that this function has a value in U for each element in V. If V and
U are n and m dimensional respectively, then the function will transform each dimension out of n to a
dimension out of m. In other words, if V is spanned by the basis $\{|v_1\rangle, |v_2\rangle, \dots, |v_n\rangle\}$ and U is spanned by $\{|u_1\rangle, |u_2\rangle, \dots, |u_m\rangle\}$, then for every i in 1 to n there exist complex numbers $A_{1i}$ to $A_{mi}$ such that
$$A|v_i\rangle = \sum_j A_{ji}\, |u_j\rangle$$
If V and U are n and m dimensional respectively, then A is an $m \times n$ dimensional matrix.

2.10 Hermitian Matrices

A matrix is called self adjoint or Hermitian if the matrix is equal to its own conjugate transpose. That is, element-wise,
$$A_{ij} = (A_{ji})^* \tag{2.16}$$

The conjugate transpose of a matrix A is denoted by $A^\dagger$, where the relation (2.16) holds. Hence, a matrix A is called Hermitian iff $A = A^\dagger$. Since we have this property for a Hermitian matrix, we can see that its diagonal entries have to be real. A very similar definition holds for Hermitian operators as well: an operator is Hermitian if its matrix representation has a Hermitian form. The conjugate transpose of an operator is defined by:

$$\langle u|A|v\rangle = (\langle v|A^\dagger|u\rangle)^* \tag{2.17}$$
Therefore, if an operator is Hermitian,
$$\langle u|A|v\rangle = (\langle v|A|u\rangle)^* \tag{2.18}$$
is satisfied. The definition of a conjugate transpose applies to vectors as well. We have:
$$(|x\rangle)^\dagger = \begin{pmatrix} x_1^* & x_2^* & x_3^* & \dots & x_n^* \end{pmatrix} = \langle x| \tag{2.19}$$

Let us look at some derivations that we need to know before proceeding:


$$(A + B)^\dagger = A^\dagger + B^\dagger \tag{2.20}$$
$$(cA)^\dagger = c^* A^\dagger \tag{2.21}$$
$$(AB)^\dagger = B^\dagger A^\dagger \tag{2.22}$$

Let the matrix A have elements $\{A_{ij}\}$ and the matrix B have elements $\{B_{ij}\}$. If we prove that the above laws hold for an arbitrary element of A and B, then we have proved the laws in general.
For the first case, A + B has elements $(A_{ij} + B_{ij})$; call this sum $C_{ij}$. Now, $C^\dagger$ has elements $(C_{ji})^*$. By the laws of matrix addition,
$$(C_{ji})^* = (A_{ji} + B_{ji})^* = (A_{ji})^* + (B_{ji})^*$$
The right hand side is precisely the (i, j) element of $A^\dagger + B^\dagger$, while the left hand side is the (i, j) element of $C^\dagger = (A + B)^\dagger$. Since the law holds for the individual elements, it also holds for the matrix or operator as a whole. Therefore, we can say:
$$(A + B)^\dagger = A^\dagger + B^\dagger$$

2.11 Spectral Theorem

Theorem: For every normal operator M acting on a vector space V, there exists an orthonormal basis for V in which the operator M has a diagonal representation.
Proof: We prove this theorem by induction on the dimension of the vector space V. We know that for a one dimensional vector space, any operator is diagonal. So, let M be a normal operator on a vector space V which is n dimensional. Let λ and $|a\rangle$ be an eigenvalue and a corresponding eigenvector of M. Let P be the projector onto the λ eigenspace of M, and let Q be the complement of this projection operator. We have the relations $P + Q = I$ and $M|a\rangle = \lambda|a\rangle$.
We now make a series of manipulations that give us the operator M, which acts on V, in terms of operators that act on subspaces of V. This is because we assume (as the induction hypothesis) that the spectral theorem holds in the lower dimensional subspaces, i.e. every normal operator on a subspace of V has a diagonal representation in some orthonormal basis for that subspace.
Using P + Q = I:
$$M = IMI = (P + Q)M(P + Q) = PMP + PMQ + QMP + QMQ$$
Now, from these terms, we need to filter out those on the RHS which evaluate to 0. Taking QMP: P projects a vector onto the λ eigenspace, i.e. P acting on a vector gives its component along the eigenspace (when the vector is expressed as a linear combination of eigenvectors). M then acts on this component, giving a vector that still lies in the λ eigenspace. Q projects a vector onto the orthogonal complement of the λ eigenspace. Since, after the action of P and M, the result is a vector of the λ eigenspace, the action of Q on it produces 0. Therefore, QMP = 0.
We can now look at the term PMQ. Let us first use the fact that M is a normal matrix.

$$MM^\dagger = M^\dagger M$$
$$MM^\dagger|a\rangle = M^\dagger M|a\rangle$$
$$M(M^\dagger|a\rangle) = \lambda\,(M^\dagger|a\rangle)$$

We now see that $M^\dagger|a\rangle$ is also an eigenvector of M, with eigenvalue λ. Therefore, by the same argument as above, $M^\dagger$ acting on a vector of the λ eigenspace gives another vector in the λ eigenspace. Hence, $QM^\dagger P = 0$. Taking the adjoint of this equation gives $(QM^\dagger P)^\dagger = PMQ = 0$ (as P and Q are Hermitian operators).
Therefore, we have:
$$M = PMP + QMQ \tag{2.23}$$

The operators on the RHS act on subspaces of V, and we have already assumed that the spectral theorem holds on those subspaces. So, if these operators are normal, then we can say that they are diagonal in some basis.
We can easily show that PMP is normal. Since P projects any vector onto the λ eigenspace, and that projection is then left unchanged by the action of M (except for multiplication by the scalar λ), we can say that:
$$PMP = \lambda P \tag{2.24}$$
Now, since P is Hermitian, $\lambda P$ is obviously normal. Therefore we conclude that PMP is also normal.


Similarly, we have that QMQ is also normal. (Note: $Q^2 = Q$ and Q is Hermitian.) We also have:
$$QM = QMI = QM(P + Q) = QMP + QMQ = QMQ \tag{2.25}$$
$$\text{Similarly: } QM^\dagger = QM^\dagger Q \tag{2.26}$$

So, using the above equations, we can prove that QMQ is normal.
$$(QMQ)^\dagger = QM^\dagger Q$$
$$(QMQ)^\dagger(QMQ) = QM^\dagger Q\,QMQ$$
since $Q^2 = Q$: $\quad = QM^\dagger QMQ$
since $QM^\dagger Q = QM^\dagger$: $\quad = QM^\dagger MQ$
since $MM^\dagger = M^\dagger M$: $\quad = QMM^\dagger Q$
since $QM = QMQ$: $\quad = QMQM^\dagger Q$
since $Q = Q^2$: $\quad = QMQ\,QM^\dagger Q$
Therefore, we have:
$$(QMQ)^\dagger(QMQ) = (QMQ)(QMQ)^\dagger$$

Therefore, QMQ is normal. Now, since PMP and QMQ are normal operators on the subspaces onto which P and Q project, by the induction hypothesis they are diagonal in some orthonormal bases for those respective subspaces. Since PMP and QMQ are diagonal, their sum is also diagonal in the combined basis. Therefore, M = PMP + QMQ is diagonal in some basis for V. Hence proved.

2.12 Operator functions

It is possible to extend the notion of functions defined over complex numbers to functions defined over operators. It is necessary that these operators be normal. A function of a normal matrix or normal operator is defined in the following way: let A be some normal operator with spectral decomposition $A = \sum_a a\,|a\rangle\langle a|$; then:
$$f(A) = \sum_a f(a)\,|a\rangle\langle a| \tag{2.27}$$

In the above equation, we represent the operator in diagonal form and then apply the function to each diagonal entry. We can verify the above equation for special functions; take for example $A^n$ for some positive integer n.
$$A^n|a\rangle\langle a| = a\,A^{n-1}|a\rangle\langle a| = a^2 A^{n-2}|a\rangle\langle a| = \dots = a^n|a\rangle\langle a| \tag{2.28}$$

From the completeness relation, we have $\sum_a |a\rangle\langle a| = I$. So
$$A^n = A^n \sum_a |a\rangle\langle a| = \sum_a A^n|a\rangle\langle a|$$
From equation (2.28), we see that $A^n|a\rangle\langle a| = a^n|a\rangle\langle a|$, so
$$A^n = \sum_a a^n\,|a\rangle\langle a| \tag{2.29}$$

Hence, we prove equation (eq. 2.27) for the special case of the function being a power operation.
1. Why must we, in general, consider only normal operators, i.e. operators having a spectral decomposition?
2. How do we prove equation (2.27) for the case of a square-root or a logarithm operation, that is, $f(A) = \sqrt{A}$ and $f(A) = \log A$?
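As a numerical illustration of (2.27) (our own sketch with numpy, not part of the notes; the function name operator_function is our choice), we can build f(A) by applying f to the eigenvalues in the spectral decomposition of a Hermitian (hence normal) operator; the square-root case also checks part of question 2 numerically.

    import numpy as np

    def operator_function(A, f):
        # Spectral decomposition of a Hermitian operator: A = sum_a a |a><a|
        vals, vecs = np.linalg.eigh(A)
        return sum(f(a) * np.outer(vecs[:, i], vecs[:, i].conj())
                   for i, a in enumerate(vals))

    A = np.array([[2.0, 1.0], [1.0, 2.0]])           # Hermitian, eigenvalues 1 and 3
    sqrtA = operator_function(A, np.sqrt)
    print(np.allclose(sqrtA @ sqrtA, A))             # True: (sqrt A)^2 = A
    print(np.allclose(operator_function(A, lambda a: a**3), A @ A @ A))  # matches A^3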
2.12.1 Trace

The trace of a matrix is also a function of the matrix. The trace of a matrix A is defined as the sum of all the diagonal elements of A (A need not be a diagonal matrix). The trace can be defined using matrix notation as well as the outer product form:
$$\operatorname{tr}(A) = \sum_i A_{ii} \tag{2.30}$$
$$\operatorname{tr}(A) = \sum_i \langle i|A|i\rangle \tag{2.31}$$

2.13 Simultaneous Diagonalizable Theorem

Theorem: Any two Hermitian matrices A and B commute if and only if there exists an orthonormal basis in which the matrix representations of A and B are both diagonal.
We say that A and B are simultaneously diagonalizable if there exists some basis in which the matrix representations of A and B are diagonal.
Proof: Let A and B be two operators that have a common orthonormal basis in which they are diagonal. This means they have a common eigenbasis. Let the eigenbasis be denoted by the set of eigenstates labelled $|a, b\rangle$. So, we have:

$$AB|a, b\rangle = bA|a, b\rangle = ab|a, b\rangle \tag{2.32}$$
$$BA|a, b\rangle = aB|a, b\rangle = ba|a, b\rangle \tag{2.33}$$
On subtracting the above two equations, (2.32) − (2.33), we get
$$(AB - BA)|a, b\rangle = (ab - ba)|a, b\rangle = 0$$
$$\Rightarrow\ AB - BA = 0 \ \Rightarrow\ [A, B] = 0 \tag{2.34}$$

We have shown that if A and B have a common eigenbasis, then they commute. This proves one direction of the simultaneous diagonalization theorem.
Proof of the converse: Let $|a, j\rangle$ be the eigenstates of the operator A with eigenvalue a; here the index j denotes the degeneracy. Let the vector space containing all eigenstates with eigenvalue a be the space $V_a$, and let the projection operator onto the $V_a$ eigenspace be called $P_a$. Now let us assume that [A, B] = 0. Therefore, we have:
$$AB|a, j\rangle = BA|a, j\rangle = aB|a, j\rangle \tag{2.35}$$

Therefore, $B|a, j\rangle$ is also an element of the eigenspace $V_a$. Let us define an operator
$$B_a = P_a B P_a \tag{2.36}$$

We can now see how $B_a$ acts on an arbitrary vector. From definition (2.36), we see that $P_a$ cuts off all the components of a vector which do not belong to the $V_a$ eigenspace; the vector produced by the action of $P_a$ lies entirely in $V_a$. From (2.35), the action of B on any vector inside the $V_a$ eigenspace gives another vector inside $V_a$, and the second $P_a$ then leaves this vector unchanged. Similarly, consider $B_a^\dagger = P_a B^\dagger P_a = P_a B P_a$ (using the fact that B is Hermitian): acting on an arbitrary vector, the first $P_a$ produces a vector lying entirely in $V_a$, B maps it to another vector in $V_a$, and the final $P_a$ leaves that vector unchanged.
Therefore, summing up, the actions of $B_a$ and $B_a^\dagger$ on any arbitrary vector produce the same vector in the $V_a$ eigenspace. Therefore, restricting to the subspace $V_a$, $B_a$ and $B_a^\dagger$ are equivalent


operators acting on $V_a$. In other words, the restriction of $B_a$ to the space $V_a$ is Hermitian on $V_a$.


Since $B_a$ is a Hermitian operator on $V_a$, it must have a spectral decomposition in terms of an orthonormal set of eigenvectors in $V_a$. These eigenvectors are both eigenvectors of A (since they belong to the $V_a$ eigenspace) and of $B_a$ (since they are part of the spectral decomposition of $B_a$). Let us call them $|a, b, k\rangle$, where the indices a, b denote the eigenvalues of the A and B operators respectively, and k denotes the degeneracy.
We have the eigenvalue equation $B_a|a, b, k\rangle = b|a, b, k\rangle$. Since $|a, b, k\rangle$ is an element of the $V_a$ eigenspace, we have:
$$P_a|a, b, k\rangle = |a, b, k\rangle \tag{2.37}$$

From equation (2.35), we also have that $B|a, b, k\rangle$ is an element of the space $V_a$. So, similarly, we can say:
$$P_a B|a, b, k\rangle = B|a, b, k\rangle$$
We can now modify the above equation by replacing $|a, b, k\rangle$ on the LHS with $P_a|a, b, k\rangle$ (refer to equation (2.37)):
$$B|a, b, k\rangle = P_a B P_a|a, b, k\rangle \tag{2.38}$$

If we compare the RHS of the above equation with equation (2.36), we see that it is the same as $B_a|a, b, k\rangle$. Since $|a, b, k\rangle$ is an eigenstate of $B_a$ with eigenvalue b, we can rewrite equation (2.38) as:
$$B|a, b, k\rangle = b|a, b, k\rangle \tag{2.39}$$

Therefore, from the above equation, we see that $|a, b, k\rangle$ is also an eigenstate of B with eigenvalue b. Hence, the set of vectors $|a, b, k\rangle$ forms a common eigenbasis for A and B. Hence, we have proved that if [A, B] = 0, then there exists an orthonormal basis in which A and B are both diagonal.

2.14 Polar Decomposition

Theorem: For every matrix A, there exist a unitary matrix U and positive (semidefinite) matrices J and K such that:
$$A = UJ = KU \tag{2.40}$$
In other words, every matrix A has a right polar decomposition UJ and a left polar decomposition KU, with the same unitary matrix U.

Proof: Consider the operator $J = \sqrt{A^\dagger A}$. By construction, the operator J is Hermitian and positive. Therefore, there exists a spectral decomposition for J (involving its eigenvalues and eigenstates). So let $J = \sum_i \lambda_i |i\rangle\langle i|$, where $\lambda_i\ (\geq 0)$ and $|i\rangle$ are the eigenvalues and the corresponding eigenstates of J.

Let us define
$$|\psi_i\rangle = A|i\rangle \tag{2.41}$$

From this, we can see that $\langle\psi_i|\psi_i\rangle = \lambda_i^2$. Now, consider all the non-zero eigenvalues, that is, all $\lambda_i \neq 0$, and let
$$|e_i\rangle = \frac{|\psi_i\rangle}{\lambda_i} \tag{2.42}$$
Therefore, we have $\langle e_i|e_j\rangle = \dfrac{\langle\psi_i|\psi_j\rangle}{\lambda_i \lambda_j} = \delta_{ij}$ (since all the $\lambda_i$ are real). We are still considering the $\lambda_i \neq 0$ case. Let us now complete an orthonormal basis using the Gram-Schmidt orthonormalization technique, by
starting off with the vectors $|e_i\rangle$ already constructed. We can thereby obtain a full orthonormal basis $\{|e_i\rangle\}$. Let us now define an operator $U = \sum_i |e_i\rangle\langle i|$. When $\lambda_i \neq 0$, we have $UJ|i\rangle = U(\lambda_i|i\rangle)$, since $|i\rangle$ is an eigenstate of J. Now, $U(\lambda_i|i\rangle) = \lambda_i U|i\rangle$, and on using the definition of the operator U, we have $U|i\rangle = |e_i\rangle$. Therefore, $UJ|i\rangle = \lambda_i|e_i\rangle$. From definition (2.42), $\lambda_i|e_i\rangle = |\psi_i\rangle$, where again from definition (2.41) we have $|\psi_i\rangle = A|i\rangle$. So, we can summarize by saying that
$$UJ|i\rangle = A|i\rangle \tag{2.43}$$

When $\lambda_i = 0$, we have $UJ|i\rangle = U(0) = 0$, and also $A|i\rangle = |\psi_i\rangle = 0$ (since $\langle\psi_i|\psi_i\rangle = \lambda_i^2 = 0$). So we again have $UJ|i\rangle = A|i\rangle$. Therefore, the operators UJ and A agree with each other on all vectors of the $\{|i\rangle\}$ basis. Hence:
$$A = UJ \tag{2.44}$$

This gives the right polar decomposition of A. Also, we can prove that the matrix J is unique for every A. This can be done by multiplying equation (2.44) by its adjoint. We have:
$$A^\dagger A = J U^\dagger U J$$
Since U is unitary and J is Hermitian, $A^\dagger A = J^2$, so
$$J = \sqrt{A^\dagger A} \tag{2.45}$$

We can also get an expression for the operator U from equation (2.44), by post-multiplying both sides by $J^{-1}$. We have:
$$U = AJ^{-1} \tag{2.46}$$

The above definition is, however, possible only if J is invertible (non-singular), and whether J is invertible depends on the existence of an inverse for A. Therefore, we see from equation (2.46) that if A is invertible, then U is also uniquely defined for every A.
We can also obtain the left polar decomposition, starting from equation (2.44). On inserting $U^\dagger U$ (since U is unitary, this preserves the equality), we get:
$$A = UJU^\dagger U$$

Now, let $UJU^\dagger = K$. So, we can rewrite the above equation as:
$$A = KU \tag{2.47}$$

This now gives us the left polar decomposition of A. We can also show that the matrix K is uniquely defined for every A. For this, we multiply equation (2.47) by its adjoint. We get
$$AA^\dagger = KUU^\dagger K^\dagger = K^2$$
Since U is unitary (and K is Hermitian), we have
$$K = \sqrt{AA^\dagger} \tag{2.48}$$

Therefore, we see that K is uniquely defined. Similarly, we can show that if A is invertible, then the matrix U is uniquely defined. Combining both the left and right parts, we have proved the polar decomposition.
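The following numpy sketch (ours, assuming an invertible A; the helper name herm_sqrt is our choice) checks the construction numerically: J = sqrt(A†A) is built from a spectral decomposition, U = AJ⁻¹, and K = sqrt(AA†) gives the left polar decomposition.

    import numpy as np

    def herm_sqrt(M):
        # Square root of a positive (Hermitian) matrix via its spectral decomposition.
        vals, vecs = np.linalg.eigh(M)
        return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

    A = np.array([[1.0, 2.0], [0.0, 3.0]])        # an invertible matrix
    J = herm_sqrt(A.conj().T @ A)                 # J = sqrt(A^dagger A)
    U = A @ np.linalg.inv(J)                      # U = A J^{-1}  (J invertible here)
    K = herm_sqrt(A @ A.conj().T)                 # K = sqrt(A A^dagger)

    print(np.allclose(U.conj().T @ U, np.eye(2)))          # U is unitary
    print(np.allclose(U @ J, A), np.allclose(K @ U, A))    # A = UJ = KU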

2.15 Singular value decomposition

Theorem: For every matrix A, there exist unitary matrices U and V and a diagonal matrix D with non-negative entries, such that:
$$A = UDV \tag{2.49}$$
The diagonal entries of the matrix D are known as the singular values of A.

2.15.1 Proving the theorem for the special case of square matrices

Before proving, we can go through a simple preliminary result:
1. If A and B are two unitary matrices, then their product AB is also a unitary matrix.
Proof: To prove $(AB)^\dagger AB = I$, given $A^\dagger A = AA^\dagger = B^\dagger B = BB^\dagger = I$:
$$(AB)^\dagger(AB) = B^\dagger A^\dagger A B = B^\dagger I B = B^\dagger B = I$$

Proof: From the polar decomposition of A, we know that there exist a unitary matrix S and a positive operator J such that A = SJ. Since J is Hermitian (it is defined as $\sqrt{A^\dagger A}$; refer to equation (2.45)), we know that there exists a spectral decomposition for J. So we have some unitary matrix T and a diagonal matrix D (with non-negative entries) such that $J = TDT^\dagger$. Now, we can rewrite A as:
$$A = SJ = STDT^\dagger$$


Now, since S and T are unitary, from result (1) we can define another unitary operator U = ST. Also, since T is unitary (which implies that $T^\dagger$ too is unitary), we can define another unitary operator $V = T^\dagger$. Putting all the definitions together, we have:

$$A = STDT^\dagger = UDV \tag{2.50}$$

The above equation (2.50) proves the singular value decomposition theorem for square matrices.

2.15.2 Proving the theorem for the general case

Proof: Before getting into this proof, we first state some facts that we will use.
1. For every operator A, the operator $A^\dagger A$ is Hermitian.
2. The eigenvalues of the Hermitian operator $A^\dagger A$ are real non-negative numbers.
3. The eigenvectors of a Hermitian operator corresponding to different eigenvalues are orthogonal.
Let us now construct the Hermitian operator $A^\dagger A$ and consider the eigenvalue equation for this operator:
$$A^\dagger A|i\rangle = \lambda_i|i\rangle \tag{2.51}$$
From fact 2, the set of eigenvalues $\{\lambda_i\}$ are all real and non-negative. Order them so that $\lambda_i > 0$ for $1 \leq i \leq r$, while $\lambda_i = 0$ for $(r+1) \leq i \leq n$. Therefore, the set of eigenvectors $\{|1\rangle, |2\rangle, \dots, |r\rangle\}$ is orthogonal to the set of eigenvectors $\{|r+1\rangle, |r+2\rangle, \dots, |n\rangle\}$, as they correspond to different eigenvalues (see fact 3).

Let us now construct a few operators:
$$V: \text{the unitary matrix whose rows are the bras } \langle 1|, \langle 2|, \dots, \langle r|, \langle r+1|, \dots, \langle n| \tag{2.52}$$
$$D = \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \dots, \sqrt{\lambda_n}\right) \tag{2.53}$$
$$U: \text{the matrix whose columns are the kets } |u_1\rangle, |u_2\rangle, \dots, |u_r\rangle, |u_{r+1}\rangle, \dots, |u_m\rangle \tag{2.54}$$
$$\text{where } |u_i\rangle = \frac{1}{\sqrt{\lambda_i}}\,A|i\rangle \quad (1 \leq i \leq r) \tag{2.55}$$
$$\text{and } \{|u_{r+1}\rangle, \dots, |u_m\rangle\} \text{ is chosen orthonormal to } \{|u_1\rangle, \dots, |u_r\rangle\} \tag{2.56}$$
Now, having these definitions, we can perform the product UDV:
$$UD = \left(\sqrt{\lambda_1}|u_1\rangle\ \ \sqrt{\lambda_2}|u_2\rangle\ \ \sqrt{\lambda_3}|u_3\rangle\ \dots\ \sqrt{\lambda_r}|u_r\rangle\ \ \sqrt{\lambda_{r+1}}|u_{r+1}\rangle\ \dots\ \sqrt{\lambda_m}|u_m\rangle\right) \tag{2.57}$$

From equation (2.55), we see that the first r columns of the matrix UD can be simplified; the columns after the rth are left unchanged.
$$UD = \left(A|1\rangle\ \ A|2\rangle\ \ A|3\rangle\ \dots\ A|r\rangle\ \ \sqrt{\lambda_{r+1}}|u_{r+1}\rangle\ \dots\ \sqrt{\lambda_m}|u_m\rangle\right) \tag{2.58}$$

$$UDV = \left(A|1\rangle\ \dots\ A|r\rangle\ \ \sqrt{\lambda_{r+1}}|u_{r+1}\rangle\ \dots\ \sqrt{\lambda_m}|u_m\rangle\right)\begin{pmatrix}\langle 1|\\ \langle 2|\\ \vdots\\ \langle r|\\ \langle r+1|\\ \vdots\\ \langle n|\end{pmatrix} \tag{2.59}$$
$$UDV = A|1\rangle\langle 1| + A|2\rangle\langle 2| + \dots + A|r\rangle\langle r| + \sqrt{\lambda_{r+1}}|u_{r+1}\rangle\langle r+1| + \dots + \sqrt{\lambda_m}|u_m\rangle\langle m| \tag{2.60}$$


Since we noted earlier that the eigenvalues $\lambda_i$ (and hence $\sqrt{\lambda_i}$) are all zero for i > r, we can ignore the terms in equation (2.60) that come after $A|r\rangle\langle r|$:
$$UDV = \sum_{i=1}^{r} A|i\rangle\langle i| \tag{2.61}$$

Since all the eigenvalues after r are 0, we have $\|A|i\rangle\|^2 = \langle i|A^\dagger A|i\rangle = \lambda_i = 0$, i.e. $A|i\rangle = 0$ for $r < i \leq n$. If $A|i\rangle = 0$, then $A|i\rangle\langle i| = 0$ as well. So, we can write the above equation (2.61) as:
$$UDV = \sum_{i=1}^{r} A|i\rangle\langle i| + \sum_{i=r+1}^{n} A|i\rangle\langle i| \tag{2.62}$$
$$UDV = A\sum_{i=1}^{n} |i\rangle\langle i| \tag{2.63}$$

The sum above is the completeness relation. So, we can write:
$$UDV = AI = A \tag{2.64}$$
Hence, we have proved the singular value decomposition statement.
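As a numerical cross-check (our own sketch with numpy, not part of the notes): the singular values of A are the square roots of the eigenvalues of A†A, and numpy's built-in SVD reconstructs A in the form U D V.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 4))                     # a general (non-square) matrix

    lam, W = np.linalg.eigh(A.T @ A)                # eigen-decomposition of A^dagger A
    print(np.allclose(np.sqrt(np.clip(lam, 0, None))[::-1][:3],
                      np.linalg.svd(A, compute_uv=False)))   # singular values = sqrt(eigenvalues)

    U, S, Vh = np.linalg.svd(A)                     # numpy returns A = U diag(S) Vh
    D = np.zeros_like(A); D[:len(S), :len(S)] = np.diag(S)
    print(np.allclose(U @ D @ Vh, A))               # True: reconstructs A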


Chapter 3

Elementary Group Theory


3.1 Structure of a Group

Group: A group $(G, \circ)$ is a set G along with a binary operation $\circ$ such that:
Closure: It is closed under the binary operation: $(a \circ b) \in G\ \ \forall\ a, b \in G$.
Associative: The binary operation on the elements of G is associative: $a \circ (b \circ c) = (a \circ b) \circ c\ \ \forall\ a, b, c \in G$.
Identity: $\exists$ a unique element $e \in G$ such that $a \circ e = e \circ a = a\ \ \forall\ a \in G$; e is called the identity element of G.
Inverse: $\forall\ a \in G,\ \exists\ b \in G$ such that $a \circ b = b \circ a = e$, where e is the identity element of G; b is called the inverse¹ of a, denoted $a^{-1}$.
Abelian: A group is called Abelian if, in addition to the above, the following property also holds:
Commutative: $a \circ b = b \circ a\ \ \forall\ a, b \in G$.
Order of a group: The order of a group $(G, \circ)$, denoted $|(G, \circ)|$, is the cardinality of the set G: $|(G, \circ)| = \#G$.
Order of an element of the group: The order of an element $g \in (G, \circ)$ is the smallest n for which $g^n = e$.

3.1.1 Cayley Table

As the group is defined along with a binary operation $\circ$, we need to define this operation for every pair of elements in G. To do this in a compact manner, we have the following table.
Footnote 1: Note that the inverse of an element depends upon the individual element and is not the same for the whole group, unlike the identity element.


Table 3.1: Cayley table for $(G, \circ)$. The (i, j) entry of the table gives $g_i \circ g_j$. $g_1$ is taken as the identity element e.

     ◦   | g1      g2      g3      g4      ...
    -----+------------------------------------
     g1  | g1◦g1   g1◦g2   g1◦g3   g1◦g4   ...
     g2  | g2◦g1   ...     ...     g2◦g4   ...
     g3  | g3◦g1   ...     g3◦g3   ...     ...
     g4  | g4◦g1   ...     ...     g4◦g4   ...
     ... | ...     ...     ...     ...     ...
Constructing the Cayley Table


Note that:


1. $a \circ b = a \circ c$ implies $b = c\ \ \forall\ a, b, c \in G$.
Proof: Pre-multiplying both sides by $a^{-1}$: $a^{-1} \circ (a \circ b) = a^{-1} \circ (a \circ c)$. Since $\exists$ a unique identity e with $a^{-1} \circ a = e\ \forall\ a \in G$, we get $e \circ b = e \circ c$, so $b = c$. Hence the result: the binary operation applied to two distinct pairs of elements produces distinct results. Since along a particular row (or column), each pair of elements involved in the binary operation is distinct, so is the result of the operation. Hence each element appears exactly once along a particular row (or column), as there are #G positions along a row (or column).


2. Now, in the previous case, putting $b \to a^{-1}$ and $c \to b^{-1}$, we see that $a \circ a^{-1} = a \circ b^{-1}$ implies $a^{-1} = b^{-1}$. Hence each element in G has a unique inverse.
3. For $a, b \in G$ with $[a, b] = 0$, i.e. $a \circ b = b \circ a$, the entry $(a \circ b)$ is symmetric with respect to the diagonal. Since, for every $a \in G$, $[a, a^{-1}] = 0$ and $a \circ a^{-1} = e$, all the e's are placed symmetrically with respect to the diagonal.

We now take a fixed example. Consider $G = \{e, a, b, c, d, f\}$ with the binary operation $\circ$. Let us construct the Cayley table for $(G, \circ)$.

1. As there are 5 elements (apart from e) and each of them must have a unique inverse, we see that (since there is an odd number of them) at least one must be its own inverse. Just by choice, we take a, b, c to be their own inverses: $a^{-1} = a$, $b^{-1} = b$, $c^{-1} = c$. For the other two elements, d and f, we take them to be inverses of each other: $d^{-1} = f$ and $f^{-1} = d$. Hence $a \circ a = b \circ b = c \circ c = d \circ f = f \circ d = e$. We now begin by placing the e's; notice that they are symmetric about the diagonal.
2. Also note that $g \circ e = g\ \ \forall\ g \in G$. Hence the first row and first column are trivially filled.
3. Just by choice, we take $a \circ b = c$. From this, we get:
$$a \circ b = c \tag{3.1}$$
$$a \circ c = a \circ (a \circ b) = (a \circ a) \circ b = b \quad\Rightarrow\quad a \circ c = b \tag{3.2}$$
Using (3.2):
$$b \circ c = (a \circ c) \circ c = a \circ (c \circ c) = a \quad\Rightarrow\quad b \circ c = a \tag{3.3}$$


4. Notice that the first row has two vacant positions (for $a \circ d$ and $a \circ f$). Using statement 1, these two positions must be filled by d and f. Hence consider the following two possibilities.
If $a \circ d = d$:
$$d = a \circ d \ \Rightarrow\ d \circ f = (a \circ d) \circ f \ \Rightarrow\ e = a \circ (d \circ f) = a \circ e \ \Rightarrow\ e = a$$
The above is incorrect, as it states that the identity element is not unique. Hence the only other option is $a \circ d = f$:
$$a \circ d = f \tag{3.4}$$
$$f \circ f = (a \circ d) \circ f = a \circ (d \circ f) = a \quad\Rightarrow\quad f \circ f = a \tag{3.5}$$
Using (3.4):
$$f \circ d = (a \circ d) \circ d \quad\Rightarrow\quad e = a \circ (d \circ d)$$
$$a \circ e = a \circ (a \circ (d \circ d)) = (a \circ a) \circ (d \circ d) \quad\Rightarrow\quad d \circ d = a \tag{3.6}$$
Using (3.6): $a \circ (f \circ d) = d \circ d$, so
$$(a \circ (f \circ d)) \circ f = (d \circ d) \circ f \ \Rightarrow\ (a \circ f) \circ (d \circ f) = d \circ (d \circ f) \ \Rightarrow\ a \circ f = d \tag{3.7}$$

5. Similarly, we need to determine $(c \circ a)$, $(c \circ b)$ and $(b \circ a)$.
Using (3.1):
$$c \circ a = (a \circ b) \circ a = a \circ (b \circ a) \quad\Rightarrow\quad a \circ (c \circ a) = (a \circ a) \circ (b \circ a) = b \circ a$$
Using $a \circ c = c \circ a$: $a \circ (c \circ a) = a \circ (a \circ c) = c$, so
$$b \circ a = c \tag{3.8}$$

6. Hence the first-row elements, as well as their symmetric counterparts, are determined. Notice that the positions marked by * have to be filled with either d or f, as the row containing them already has all the other symbols; but neither can be used, as the column containing each already has both d and f. This shows that the initial assumption $a \circ b = c$ is wrong.

Table 3.2: Cayley table for $G = \{e, a, b, c, d, f\}$ at this stage. Positions marked as * have not yet been filled. [Partially filled table not reproduced.]

7. Notice that $a \circ b = a$ and $a \circ b = b$ are invalid, as we would then get b = e and a = e respectively. With $a \circ b = c$ also ruled out, we see that the only two options are $a \circ b = d$ and $a \circ b = f$. Let us take $a \circ b = d$. With this we have:
$$a \circ b = d \tag{3.9}$$

Using (3.9): $a \circ (a \circ b) = a \circ d \ \Rightarrow\ b = a \circ d$:
$$a \circ d = b \tag{3.10}$$
Using (3.9): $(a \circ b) \circ b = d \circ b \ \Rightarrow\ a = d \circ b$:
$$d \circ b = a \tag{3.11}$$
Using (3.10): $b \circ f = (a \circ d) \circ f = a \circ (d \circ f) = a$:
$$b \circ f = a \tag{3.12}$$
Using (3.12): $b \circ (b \circ f) = b \circ a \ \Rightarrow\ f = b \circ a$:
$$b \circ a = f \tag{3.13}$$
Using (3.13): $(b \circ a) \circ a = f \circ a \ \Rightarrow\ b = f \circ a$:
$$f \circ a = b \tag{3.14}$$

Filling the table now gives:


Table 3.3: Cayley table for $G = \{e, a, b, c, d, f\}$ at this stage. Positions marked as * have not yet been filled. [Partially filled table not reproduced.]


8. Notice that in the row corresponding to a, we have two vacancies, for $a \circ c$ and $a \circ f$. These must be filled using c and f, as the other elements are already contained in this row. Also, as $a \circ c \neq c$ (for if it were, we would have a = e, which is incorrect), the only assignments for the vacancies are:
$$a \circ c = f \tag{3.15}$$
$$a \circ f = c \tag{3.16}$$
Using (3.15): $(a \circ c) \circ c = f \circ c \ \Rightarrow\ a = f \circ c$:
$$f \circ c = a \tag{3.17}$$
Using (3.16): $(a \circ f) \circ d = c \circ d \ \Rightarrow\ a = c \circ d$:
$$c \circ d = a \tag{3.18}$$
Using (3.18): $c \circ (c \circ d) = c \circ a \ \Rightarrow\ d = c \circ a$:
$$c \circ a = d \tag{3.19}$$
Using (3.19): $(c \circ a) \circ a = d \circ a \ \Rightarrow\ c = d \circ a$:
$$d \circ a = c \tag{3.20}$$
Filling the table, we get:


Table 3.4: Cayley table for $G = \{e, a, b, c, d, f\}$ at this stage. Positions marked as * have not yet been filled. [Partially filled table not reproduced.]

9. Notice that in the row corresponding to b, there are two vacancies, for $b \circ c$ and $b \circ d$, which must be filled using c and d, as this row already contains the rest of the elements. The option $b \circ c = c$ (and hence $b \circ d = d$) is ruled out, since we would then get b = e. Hence the only option for filling the two positions is:
$$b \circ c = d \tag{3.21}$$
$$b \circ d = c \tag{3.22}$$
Using (3.21): $(b \circ c) \circ c = d \circ c \ \Rightarrow\ b = d \circ c$:
$$d \circ c = b \tag{3.23}$$
Now, the only vacant position in the row corresponding to d (for $d \circ d$) must be filled with f:
$$d \circ d = f \tag{3.24}$$
Using (3.22): $(b \circ d) \circ f = c \circ f \ \Rightarrow\ b = c \circ f$:
$$c \circ f = b \tag{3.25}$$
Finally, in the row corresponding to f, the only vacancy (for $f \circ f$) must be filled with d:
$$f \circ f = d \tag{3.26}$$


Hence we have the final Cayley table:

Table 3.5: Final Cayley table for $G = \{e, a, b, c, d, f\}$ (rows give the left factor, columns the right factor).

     ◦ | e  a  b  c  d  f
    ---+------------------
     e | e  a  b  c  d  f
     a | a  e  d  f  b  c
     b | b  f  e  d  c  a
     c | c  d  f  e  a  b
     d | d  c  a  b  f  e
     f | f  b  c  a  e  d

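To double-check the completed table, here is a short Python sketch (ours, not part of the notes) that verifies closure, associativity, identity and inverses directly from Table 3.5:

    # Sketch: verify that Table 3.5 really defines a group.
    table = {
        'e': {'e':'e','a':'a','b':'b','c':'c','d':'d','f':'f'},
        'a': {'e':'a','a':'e','b':'d','c':'f','d':'b','f':'c'},
        'b': {'e':'b','a':'f','b':'e','c':'d','d':'c','f':'a'},
        'c': {'e':'c','a':'d','b':'f','c':'e','d':'a','f':'b'},
        'd': {'e':'d','a':'c','b':'a','c':'b','d':'f','f':'e'},
        'f': {'e':'f','a':'b','b':'c','c':'a','d':'e','f':'d'},
    }
    G = list(table)
    op = lambda x, y: table[x][y]

    assert all(op(x, y) in G for x in G for y in G)                                   # closure
    assert all(op(op(x, y), z) == op(x, op(y, z)) for x in G for y in G for z in G)   # associativity
    assert all(op('e', x) == x == op(x, 'e') for x in G)                              # identity
    assert all(any(op(x, y) == 'e' for y in G) for x in G)                            # inverses
    print("Table 3.5 defines a group")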
3.1.2 Subgroups


Subgroup: $(H, \circ)$ is a subgroup of $(G, \circ)$ if $H \subseteq G$ and $(H, \circ)$ satisfies the group properties. It is called an Abelian subgroup if it also satisfies the commutative law.
We can now infer a few general properties:

Every subgroup of a group contains the identity element of the group. As $(H, \circ)$ is also a group, it has an identity element. Moreover, since the identity element of a group is unique, $(G, \circ)$ and $(H, \circ)$ must contain the same identity element, $e_G$. Therefore $(H, \circ)$ contains $e_G$.


For every group, the set containing just the identity element is a subgroup of that group. This is because the identity element is contained in the group (otherwise the group properties would not be satisfied). It has an identity by definition, it is its own inverse, and it is closed under the group operation by definition. The subgroup containing just the identity element is called the trivial subgroup.

Cosets

The coset of a subgroup (with respect to an element of the group) is the set containing the results of the binary operation between the given element and every element of the subgroup. Since the group in general is non-Abelian, we see that if $(H, \circ)$ is a subgroup of $(G, \circ)$, then for some $g \in (G, \circ)$ and $h \in (H, \circ)$, $g \circ h$ need not be equal to $h \circ g$. Therefore we need to specify on which side the binary operation acts; hence we have left and right cosets.
Left and Right Cosets: The left and right cosets of $(H, \circ)$ in $(G, \circ)$ are defined, for some $g \in (G, \circ)$, by:
$$\text{Left coset:}\quad g \circ (H, \circ) = \{g \circ h \mid h \in (H, \circ)\} \tag{3.27}$$
$$\text{Right coset:}\quad (H, \circ) \circ g = \{h \circ g \mid h \in (H, \circ)\} \tag{3.28}$$

We can now look at a few properties of cosets.


1. From the definitions in (3.27) and (3.28), the number of elements in a left or right coset of a subgroup in a group is equal to the order of the subgroup.
2. Claim: $g \circ (H, \circ) = (H, \circ) \circ g = (H, \circ)\ \ \forall\ g \in (H, \circ)$.
Justification: Since $(H, \circ)$ forms a group, it is closed under the operation. Hence from (3.27) we see that $g, h \in (H, \circ) \Rightarrow g \circ h \in (H, \circ)$, and similarly from (3.28) we have $g, h \in (H, \circ) \Rightarrow h \circ g \in (H, \circ)$. Hence all the elements of $g \circ (H, \circ)$ and $(H, \circ) \circ g$ are in $(H, \circ)$, for $g \in (H, \circ)$. Hence we have the justification.

3. Claim: There exists $g \in (G, \circ)$ such that the identity element $e_G$ of $(G, \circ)$ is contained in $g \circ (H, \circ)$ and $(H, \circ) \circ g$.
Justification: It suffices to show that $\exists\ g \in (G, \circ)$ and $h \in (H, \circ)$ such that $e_G = g \circ h$ and $e_G = h \circ g$. This is true, as both $(G, \circ)$ and $(H, \circ)$ contain $e_G$; setting $g = h = e_G$ provides the justification.
4. Claim: Every element of $(G, \circ)$ is present in exactly one of the left cosets of $(H, \circ)$ in $(G, \circ)$.
Justification: Suppose an element is present in two different left cosets, say $g_1 \circ (H, \circ)$ and $g_2 \circ (H, \circ)$, of $(H, \circ)$ in $(G, \circ)$; then there exist $g_1, g_2 \in (G, \circ)$ and $h \in (H, \circ)$ with $g_1 \neq g_2$ such that $g_1 \circ h = g_2 \circ h$. From statement 1 of section 3.1.1, if $g_1 \neq g_2$, then $g_1 \circ h \neq g_2 \circ h$. Conversely, $g_1 \circ h = g_2 \circ h$ implies $g_1 = g_2$, which means that the two left cosets are identical. Hence we have the justification.
5. Claim: Every pair of cosets (of $(H, \circ)$ in $(G, \circ)$) is either disjoint or identical.
Justification: In the previous statement we showed that no two left cosets can share an element unless they are identical, implying that every pair of left cosets is disjoint unless they are alike. Hence we have the justification.
6. Claim: There is no $g \in (G, \circ)$ which is not present in any left coset of $(H, \circ)$ in $(G, \circ)$.
Justification: Consider the coset $g \circ (H, \circ)$. It suffices to show that $g \in g \circ (H, \circ)\ \forall\ g \in (G, \circ)$. This holds iff $\exists\ h \in (H, \circ)$ such that $g = g \circ h$. As $(H, \circ)$ contains the identity element $e_G$, the statement is true, thereby justifying the claim.


7. From statements 5 and 6, we see that the cosets of a subgroup in a group are all disjoint and cover all the elements of the group. In other words, they partition the group, with each part being a coset. Notice that the number of elements in $(G, \circ)$ is $|(G, \circ)|$, and from statement 1, the number of elements in a left coset is $|(H, \circ)|$. Hence the number of left cosets needed to partition $(G, \circ)$ is $\dfrac{|(G, \circ)|}{|(H, \circ)|}$. This quantity is denoted by $[(G, \circ) : (H, \circ)]$ and called the index of $(H, \circ)$. The theorem embodied in this equation is called Lagrange's Theorem.
Index of a Subgroup: The index of a subgroup of a group is the number of left cosets of the subgroup required to partition the group.

$$[(G, \circ) : (H, \circ)] = \frac{|(G, \circ)|}{|(H, \circ)|} \tag{3.29}$$
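As an illustration of cosets and equation (3.29) (our own sketch, not part of the notes), take the group of integers modulo 6 under addition and the subgroup {0, 3}: the left cosets partition the group and their number equals |G|/|H|.

    # Sketch: cosets of H = {0, 3} in G = Z_6 (integers mod 6 under addition).
    G = set(range(6))
    H = {0, 3}
    op = lambda a, b: (a + b) % 6

    cosets = {frozenset(op(g, h) for h in H) for g in G}   # left cosets g + H
    print(sorted(map(sorted, cosets)))        # [[0, 3], [1, 4], [2, 5]] -- they partition G
    print(len(cosets) == len(G) // len(H))    # True: index [G : H] = |G| / |H| = 3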

Normal Subgroup: A normal subgroup $(N, \circ)$ of a group $(G, \circ)$ is one for which the left and right cosets of every element of $(G, \circ)$ are equal:
$$a \circ N = N \circ a \quad \forall\ a \in (G, \circ) \tag{3.30}$$

3.1.3 Quotient Groups

We see clearly that a single coset does not in general form a group. So we consider the set of all cosets of a normal subgroup (left or right; it does not matter). Let $(N, \circ)$ be a normal subgroup of $(G, \circ)$. Consider the set $S = \{a \circ N \mid a \in (G, \circ)\}$. We claim this set forms a group. To check this, we have:
Identity, Inverse and Associativity are trivially satisfied, as one of the cosets contains the identity, and as $(G, \circ)$ is a group, its elements also have their inverses.
Closure: We need to show that $(a \circ N) \circ (b \circ N) \in S\ \ \forall\ (a \circ N), (b \circ N) \in S$. For this, it suffices to show that $(a \circ N) \circ (b \circ N) = (c \circ N)$ for some $c \in (G, \circ)$. Now notice that:
$$(a \circ N) \circ (b \circ N) = a \circ ((N \circ b) \circ N) = a \circ ((b \circ N) \circ N) = (a \circ b) \circ (N \circ N) = (a \circ b) \circ N$$
where we used the normal subgroup (coset) property $N \circ b = b \circ N$.


Now, since $a, b \in (G, \circ)$, we have $(a \circ b) \in (G, \circ)$, thereby justifying the claim.
Hence we see that S forms a group under the $\circ$ operation. This group formed by the elements of S is called the quotient group of $(G, \circ)$ and is represented as $(G, \circ)\,/\,(N, \circ)$.
Quotient Group: The quotient group of $(G, \circ)$ by a normal subgroup $(N, \circ)$ is the set $\{g \circ N \mid g \in (G, \circ)\}$ containing all the cosets (right or left) of its elements with respect to the normal subgroup. It is represented as $(G, \circ)\,/\,(N, \circ)$.
More generally, since the cosets of elements in a subgroup partition the group, the quotient group is the group formed by this partition of $(G, \circ)$, with the operation defined analogously.

3.1.4 Normalizers and centralizers

For any two elements A, B of a group $(G, \circ)$, we note that unless the group is Abelian, the result of the binary operation on the two elements depends upon the order in which they are taken: in general $A \circ B \neq B \circ A$. The difference between these two unequal quantities is defined as the commutator of A and B.
Commutator: The commutator of any two elements A and B of a group $(G, \circ)$ is defined as $[A, B] = A \circ B - B \circ A$.
From the definition of the commutator, we can verify that $\forall\ g_1, g_2 \in (G, \circ)$: $[g_1, g_1] = 0$ and $[g_1, g_2] = -[g_2, g_1]$.
Center of a group: The center of a group Z (G, ) is the set of all elements in (G, ) that commute with all
the elements in (G, ). Hence Z (G, ) = {g (G, ) |g x = x gx (G, )}.
Since in an Abelian group, all the elements commute with the each other, we see that the centre of an abelian
group is the group itself, i,e; Z (G, ) = (G, ) Abelian groups (G, ).
For any subgroup (of a group) we define the following two groups:
Normalizer: The normalizer of a subgroup (H, ∗) of (G, ∗) is the set:

N(H, ∗) = {g ∈ (G, ∗) | (g ∗ hi) ∗ g⁻¹ ∈ (H, ∗) ∀ hi ∈ (H, ∗)}        (3.31)

Centralizer: The centralizer of a subgroup (H, ∗) of (G, ∗) is the set:

Z(H, ∗) = {g ∈ (G, ∗) | (g ∗ hi) ∗ g⁻¹ = hi ∀ hi ∈ (H, ∗)}        (3.32)

We can now look at some properties of the centralizers and the normalizers of a subgroup.

1. Immediately one can see that the elements in Z(H, ∗) form a subset of those in N(H, ∗).

2. The centralizer of a subgroup forms a subgroup of the underlying group, i.e. Z(H, ∗) and
(H, ∗) are subgroups of (G, ∗).
Justification: Notice that the definition of the centralizer can also be given as the set of elements of the
group that commute with each element of the subgroup: Z(H, ∗) = {g | g ∗ h = h ∗ g ∀ h ∈ (H, ∗)}.
With this definition we can verify the group properties of Z(H, ∗).
Identity: Trivially, the identity commutes with all the elements of the group and hence it is in Z(H, ∗).
Inverse: If x ∈ Z(H, ∗) then x⁻¹ ∈ Z(H, ∗).
Justification: We have x ∗ y = y ∗ x ∀ y ∈ (H, ∗). It suffices to show that
x⁻¹ ∗ y = y ∗ x⁻¹ ∀ y ∈ (H, ∗). Since x ∈ (G, ∗), which is a group, ∃ x⁻¹ ∈ (G, ∗) such that
x ∗ x⁻¹ = eG.
x ∗ y = y ∗ x
x⁻¹ ∗ (x ∗ y) ∗ x⁻¹ = x⁻¹ ∗ (y ∗ x) ∗ x⁻¹
(x⁻¹ ∗ x) ∗ y ∗ x⁻¹ = x⁻¹ ∗ y ∗ (x ∗ x⁻¹)
y ∗ x⁻¹ = x⁻¹ ∗ y
Hence justifying the assumption.


Closure: a ∗ x ∈ Z(H, ∗) ∀ x, a ∈ Z(H, ∗).
Justification: It suffices to show that (a ∗ x) ∗ y = y ∗ (a ∗ x) ∀ y ∈ (H, ∗). Notice that since x and a are
elements of Z(H, ∗), they commute with all elements in (H, ∗).
(a ∗ x) ∗ y = a ∗ (x ∗ y) = a ∗ (y ∗ x) = (a ∗ y) ∗ x = (y ∗ a) ∗ x = y ∗ (a ∗ x)
Hence justifying the assumption.
Hence from the above statements it can be seen that Z(H, ∗) is a group; moreover, it is a subgroup of
(G, ∗).
Note that the center of a group is not to be confused with the centralizer of a subgroup in a group. The
former is the set of elements in (G, ∗) which commute with every element in (G, ∗), while the latter is the
set of all elements in (G, ∗) that commute with every element in the subgroup (H, ∗). Hence the latter is
defined with respect to a subgroup, unlike the former. However, both the center (of a group) as well as the
centralizer (of any subgroup in that group) are subgroups of the underlying group.
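The following short Python sketch (an illustration added here, not from the original text) computes the center of S3 and the centralizer of one of its subgroups by brute force; permutations are represented as tuples, and compose, S3 and H are hypothetical helpers introduced only for this example.

from itertools import permutations

S3 = list(permutations(range(3)))            # all bijections of {0, 1, 2}

def compose(f, g):
    # (f o g)(i) = f(g(i)), with permutations stored as tuples
    return tuple(f[g[i]] for i in range(3))

center = [g for g in S3 if all(compose(g, x) == compose(x, g) for x in S3)]
H = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]        # the cyclic subgroup A3
centralizer_H = [g for g in S3 if all(compose(g, h) == compose(h, g) for h in H)]

print(center)               # only the identity: S3 has a trivial center
print(len(centralizer_H))   # 3: exactly the elements of A3 commute with all of A3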

3.2    Group Operations

3.2.1    Direct product of groups

Direct product: The direct product of two groups (H, ∗) and (K, ◦), represented by (H, ∗) × (K, ◦), is a
group containing the pairs (h, k), h ∈ (H, ∗), k ∈ (K, ◦).
The group operation on (H, ∗) × (K, ◦) is defined as:

(h1, k1) · (h2, k2) = (h1 ∗ h2, k1 ◦ k2)        (3.33)

The identity element of this direct product group is the tuple containing the identity elements of the individual
groups: (eH, eK). The inverse of an element is also the tuple containing the inverses of the corresponding elements
from the two groups (in the product).
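As a quick sanity check, the following Python sketch (illustrative only, not part of the original) builds the direct product Z2 × Z3 with componentwise addition and confirms it is a group of order 6; op is a hypothetical helper name.

from itertools import product

G = list(product(range(2), range(3)))        # elements of Z2 x Z3

def op(p, q):
    # componentwise operation: addition mod 2 in the first slot, mod 3 in the second
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

assert all(op(p, q) in G for p in G for q in G)      # closure
assert all(op(p, (0, 0)) == p for p in G)            # (0, 0) is the identity
print(len(G))   # 6 = |Z2| * |Z3|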

3.2.2    Homomorphism

Homomorphism: For any two groups (G, ∗) and (H, ◦), a group homomorphism is a function f : (G, ∗) →
(H, ◦) whose action on the elements a, b ∈ G is given by:

f(a ∗ b) = f(a) ◦ f(b)        (3.34)

Notice that it preserves the group structure, i.e. if the elements in G form a group, so do their images in
the set H. We can now see some properties of f. Let eG and eH denote the identity elements of the groups
(G, ∗) and (H, ◦) respectively.

Using (eq. 3.34):  f(a ∗ eG) = f(a) ◦ f(eG)
                        f(a) = f(a) ◦ f(eG)
          (f(a))⁻¹ ◦ f(a) = (f(a))⁻¹ ◦ (f(a) ◦ f(eG))
                        eH = ((f(a))⁻¹ ◦ f(a)) ◦ f(eG)
                        eH = f(eG)        (3.35)

Hence we see that the identity element of (G, ∗) is mapped to the identity element of (H, ◦). We now see
the mapping of the inverse of an element in (G, ∗). From (eq. 3.34), for u ∈ (G, ∗):

f(u) ◦ f(u⁻¹) = f(eG)
Using (eq. 3.35):  f(u) ◦ f(u⁻¹) = eH
[f(u)]⁻¹ ◦ f(u) ◦ f(u⁻¹) = [f(u)]⁻¹ ◦ eH
f(u⁻¹) = [f(u)]⁻¹        (3.36)

Hence we see that f maps the inverse of every element to the inverse of its image in (H, ◦).
Types of Homomorphisms
Isomorphism: It is a homomorphism f : (G, ∗) → (H, ◦) where f is one-to-one and onto. Hence f⁻¹ too is a
homomorphism.
Automorphism: It is an isomorphism from a group onto itself: f : (G, ∗) → (G, ∗).
Endomorphism: It is a homomorphism from a group to itself: f : (G, ∗) → (G, ∗). Note: f need not be
one-to-one.
Kernel
We now consider the elements in (G, ∗) that are mapped to the same element in (H, ◦) by the homomorphism.
As the identity element eH is unique to (H, ◦), the properties of this element can be referred to as the properties
of the group. Hence, we consider all the elements in (G, ∗) that are mapped to eH (note that, from (eq. 3.35),
eG is one of them). The set containing all such elements is called the kernel of f, denoted ker(f).

ker(f) = {g ∈ (G, ∗) | f(g) = eH}        (3.37)

The kernel is useful in associating a homomorphism to a set. That is, we can check the containment of an
element in a set by checking the action of the corresponding homomorphism (associated with that set) on the
element in question. When we take (G, ∗) and (H, ◦) to be linear codes, or vector spaces over the field Fq,
we see that ∗ and ◦ become addition modulo q: +q. Notice that the identity element here is the null vector. So
the kernel of the homomorphism (in this case it is a linear map represented by a matrix) is the set of vectors
in (G, +q) that are mapped to the null vector in (H, +q). We now pick a linear map such that the kernel of
this linear map is the linear code. The advantage of doing this is that we can quickly identify a code element
by checking if it gives the null vector upon action of this linear map. Such a linear map (represented by a
matrix) is called the Parity Check Matrix of the linear code. It is used to check for errors in the codewords.
If there is an error in the codeword, the vector undergoes a translation such that the new vector no longer
belongs to the vector space, and hence does not lie in the kernel of the parity check matrix. Therefore, upon
action of this matrix, it will not give the null vector, thereby indicating the presence of an error.
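To make this concrete, here is a small Python sketch (an added illustration, not from the original text) using the parity check matrix of the [7,4] Hamming code over F2: codewords lie in the kernel of H, and a single bit flip moves the vector out of the kernel. The matrix and vectors used here are standard, but the helper name in_kernel is introduced only for this example.

import numpy as np

# Parity check matrix of the [7,4] Hamming code (arithmetic over F2)
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def in_kernel(v):
    # v is a codeword iff H v = 0 (mod 2), i.e. v lies in ker(H)
    return not np.any(H.dot(v) % 2)

codeword = np.array([1, 1, 0, 0, 1, 1, 0])       # a valid Hamming codeword
corrupted = codeword.copy()
corrupted[2] ^= 1                                 # flip one bit

print(in_kernel(codeword))    # True  -> no error detected
print(in_kernel(corrupted))   # False -> the error is detected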

3.2.3    Conjugation

Two elements a, b of a group (G, ∗) are said to be conjugate to each other if ∃ g ∈ (G, ∗) such that:
g ∗ a = b ∗ g. Restating the previous statement, we have: b is conjugate to a if ∃ g ∈ (G, ∗) such that
b = (g ∗ a) ∗ g⁻¹. We can further formalize this by looking at the RHS of the previous equation as a function
of a: b = f(a), where f(a) = (g ∗ a) ∗ g⁻¹. This function, or automorphism (as it takes an element of
(G, ∗) to an element of (G, ∗)), is called inner automorphism or conjugation.
Conjugation or Inner-Automorphism: It is an automorphism defined by f : (G, ∗) → (G, ∗) such that for
some fixed g ∈ (G, ∗), f(a) = g ∗ a ∗ g⁻¹.
We now consider the set of all elements (in a group) that are conjugate to a given element (from the same
group). That is, the set {b | g ∗ a = b ∗ g, g ∈ (G, ∗)}. Note that since a is the free variable in the definition
of this set, it must be indexed by a. We can also write this set as: Sa = {g ∗ a ∗ g⁻¹ | g ∈ (G, ∗)}. Such
a set is called the conjugacy class of a.
Conjugacy Class: The conjugacy class of an element (in a group) is the set containing all the elements from that
group which are conjugate to the given element.

Cl(a) = {g ∗ a ∗ g⁻¹ | g ∈ (G, ∗)}        (3.38)
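As an added illustration (not in the original), the Python sketch below computes the conjugacy classes of S3 by brute force, using the same tuple representation of permutations as before; compose and inverse are hypothetical helpers for this example.

from itertools import permutations

S3 = list(permutations(range(3)))

def compose(f, g):
    return tuple(f[g[i]] for i in range(3))

def inverse(g):
    inv = [0] * 3
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def conjugacy_class(a):
    # Cl(a) = { g * a * g^-1 | g in S3 }, as in (eq. 3.38)
    return frozenset(compose(compose(g, a), inverse(g)) for g in S3)

classes = {conjugacy_class(a) for a in S3}
print([sorted(c) for c in classes])   # 3 classes, of sizes 1, 2 and 3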
We can now look at some properties of this set:
1. Assumption: If a is an element of an Abelian group (G, ∗), then #Cl(a) = 1, that is, Cl(a) is a
singleton set.
Justification: To see this, we try to write the definitions of the Abelian group, and of the conjugacy class
of an element, in the same form.
Cl(a) = {b | g ∗ a = b ∗ g,  g ∈ (G, ∗)}
G = {b | g ∗ b = b ∗ g,  ∀ g ∈ (G, ∗)}
The conjugacy class of an element from an Abelian group has to satisfy both conditions. Observing the two
conditions in the above definitions, we see that the only satisfying assignment for b (it is the
only free variable) is b = a. Hence we see that: Cl(a) = {b | g ∗ a = b ∗ g, a = b, g ∈ (G, ∗)} = {a}.
Hence we justify our assumption.

3.3    Group Actions

3.3.1    Generating set of a group

As the elements of a group satisfy certain properties, given a set X it must be possible to construct (or
mechanically generate from this set) a set following these properties. In other words, we can take a set of
elements and then generate a group such that the group properties are satisfied by construction. Such a set is
called a generating set of a group and the elements of this set are called generators.
Generating Set of a Group: A generating set of a group (G, ∗) is a set X such that every element of (G, ∗)
can be expressed as a combination (using ∗) of a finite subset of X. We denote it by: (G, ∗) = ⟨X⟩.
Since a group is also a set of elements, it can also be used to generate another group. More generally, we can
use more than one group to generate a single group. That is, elements in groups (H, ∗), (K, ∗) can be used
to generate the elements of (G, ∗). In this case we say that (G, ∗) is a direct sum of the subgroups (H, ∗) and
(K, ∗). But for this to be possible, notice that (H, ∗) and (K, ∗) have to be normal subgroups of (G, ∗).
Direct sum of Groups: A group (G, ∗) is a direct sum of groups (H, ∗) and (K, ∗), represented as (G, ∗) =
(H, ∗) ⊕ (K, ∗), if the generating set of (G, ∗) is H ∪ K, and (H, ∗) and (K, ∗) are normal subgroups of
(G, ∗).

3.3.2    Symmetric group

We now try to explore the properties of a given set X using groups. By exploring the properties, we mean
to look at all possible relations between the elements of the set. For this we need to consider the set of all
functions {h : X → X}, since each function relates one element of X with another. For the sake of simplicity,
we avoid relations between a given element and many other elements. Hence we only consider all possible
one-to-one and onto functions (bijections). This set containing all possible bijections from X to X also forms a
group under the composition (of two functions) operation.
Symmetric Group: A symmetric group on a set X is a group formed by the set G containing all possible
bijections f : X → X, under the binary operation ◦ which denotes the composition of two functions. This
group (G, ◦) satisfies the group properties.
We can now verify the group properties of this symmetric group:
Closure: ∀ f, g ∈ (G, ◦), (f ◦ g) ∈ (G, ◦).
For this it suffices to show that (f ◦ g) is also a bijection. Notice that ∀ x ∈ X, (f ◦ g)(x) = f(g(x)).
Since g is a bijection g : X → X, we see that g is injective and surjective [g(X) = X]. Hence all
values of g(x) are distinct (∀ distinct x ∈ X). As f is again a bijection from X to X, it will take each
of these distinct g(x) (corresponding to distinct x) to some distinct f(g(x)) ∈ X. Therefore all values of
f(g(x)) are distinct (∀ distinct x ∈ X). Hence f ◦ g is injective. As f(X) = X, we see that f(g(X)) = X. Hence
f ◦ g is both surjective and injective, thereby bijective, hence justifying the assumption.
Associative: ∀ α, β, γ ∈ (G, ◦), α ◦ (β ◦ γ) = (α ◦ β) ◦ γ.
The above statement is true as we have: [α ◦ (β ◦ γ)](x) = [(α ◦ β) ◦ γ](x) = α(β(γ(x))) ∀ x ∈ X. This
is in general true for any three functions. Hence we justify the assumption.

Identity: ∃ eG ∈ (G, ◦) such that ∀ f ∈ (G, ◦), eG ◦ f = f ◦ eG = f.
It suffices to show that ∃ a bijection eG such that ∀ f ∈ (G, ◦) and ∀ x ∈ X, we have: f(eG(x)) =
eG(f(x)) = f(x). We can now see that eG is nothing but the identity map: eG ≡ IX, which clearly is
a bijection and hence ∈ (G, ◦). Hence we justify the assumption.

Inverse: ∀ f ∈ (G, ◦) ∃ g ∈ (G, ◦) such that f ◦ g = g ◦ f = eG.
It suffices to show that ∀ f ∈ (G, ◦) ∃ g ∈ (G, ◦) such that ∀ x ∈ X we have: f(g(x)) = g(f(x)) = x.
We can now see that g is nothing but the inverse map f⁻¹, which clearly is a bijection since f is a
bijection. Hence we see that there is an inverse in (G, ◦) for every element in it, thereby justifying the
assumption.

3.3.3    Action of a Group on a set

In the above section (sec. 3.3.2) we considered a group of bijections on a set X. Now we consider a general
group (Q, ∗) and a homomorphism h from (Q, ∗) to the symmetric group of X: (G, ◦). Every element of
(Q, ∗) is mapped to a bijection on X. As a result we can describe the action of the bijection (on some x ∈ X)
as the action of the element of (Q, ∗) (which has been mapped to this bijection by the homomorphism h).
Therefore the homomorphism is defined as: h : (Q, ∗) → (G, ◦) such that:

∀ q1, q2 ∈ (Q, ∗),    h(q1 ∗ q2) = h(q1) ◦ h(q2)        (3.39)

Note that h(q) is a bijection ∀ q ∈ (Q, ∗), that is, in the above expression h(q1) : X → X. The homomorphism
maps the identity of (Q, ∗) to the identity element of (G, ◦), and the inverse of every element in (Q, ∗) is
mapped to the inverse of the corresponding element's map.

∀ q ∈ (Q, ∗),    h(eQ) = IX,    h(q⁻¹) = (h(q))⁻¹

Now when we want to describe the operation [h(q1)](x) for x ∈ X, we denote it as the action of q1 on x: q1 · x.
Similarly {[h(q)](x) | q ∈ (Q, ∗)} can be denoted as {q · x | q ∈ (Q, ∗)}, and hence Q · X = {q · x | x ∈ X, q ∈ (Q, ∗)}.
This operation is called the action of a group on a set, or group action. The group action can be described as
a function that takes an element of (Q, ∗) and an element of X, giving another element of X. Hence the
group action on a set is described as another map, from (Q, ∗) × X to X.
Group Action: The Group Action of (Q, ∗) on a set X is defined as a map f : (Q, ∗) × X → X
such that: for x ∈ X, q ∈ (Q, ∗) we have f(q, x) = q · x, which represents q · x ≡ [h(q)](x), where h is as
defined in (eq. 3.39).

3.3.4    Orbits and Stabilizers

We now look at the geometric picture by considering X to be a set of points (in R, C, or anything) and the
action of elements q ∈ (Q, ∗) (which is the action of h(q) as defined in (eq. 3.39)). When q · x = x′, where x,
x′ ∈ X, we say that x has been transported to the point x′ and the path taken by x is the set {x, x′}.
Consider the set (Q, ∗) · x = {q · x | q ∈ (Q, ∗)}. This gives all the points which can be obtained by acting with
the elements of (Q, ∗) on x. In other words, it gives the path traced by the point x ∈ X upon action of elements
in (Q, ∗). This path, represented by the set of points, is called the Orbit of x, denoted by O(Q,∗)(x).
Orbit: Orbit of an element x ∈ X:

O(Q,∗)(x) = (Q, ∗) · x ≡ {q · x | q ∈ (Q, ∗)}        (3.40)

Since X is a finite set, it may happen that the path traced by x contains x itself. Suppose the path taken by
x passes through the points {x, x1, x2, . . . , xm, x, . . . , xn, x, . . . , xk, x, . . .}, that is,

x --q1--> x1 --q2--> · · · --qm--> x --qm+1--> · · · --qn--> x --qn+1--> · · · --qk--> x --> · · ·

which can now be rewritten as:

x --(q1 ∗ q2 ∗ · · · ∗ qm)--> x --(qm+1 ∗ · · · ∗ qn)--> x --(qn+1 ∗ · · · ∗ qk)--> x --> · · ·

We now see that the set of operators Sx(Q, ∗) = {q1 ∗ q2 ∗ · · · ∗ qm, qm+1 ∗ · · · ∗ qn, qn+1 ∗ · · · ∗ qk, . . .} leaves the
point x invariant. Now since q1, q2, . . . , qm, . . . , qn, . . . ∈ (Q, ∗), which is closed under ∗, we see that the operators
in Sx(Q, ∗) are also in (Q, ∗). Moreover, the identity operator eQ ∈ Sx(Q, ∗) (since it corresponds to
the identity mapping) and, since each of these operators corresponds to a bijection (on X), an inverse can be
defined easily, which is also a bijection that leaves x invariant. Hence Sx(Q, ∗) contains the inverse of every
element in it. Note that the set Sx(Q, ∗) is closed under ∗, has an identity element, and every element in this
set has its inverse in the same set. Therefore the set Sx(Q, ∗), along with the operation ∗, forms a group,
and trivially a subgroup of (Q, ∗). This subgroup is called the Stabilizer Subgroup of (Q, ∗).
Stabilizer Subgroup: The stabilizer subgroup Sx(Q, ∗) (of a group) consists of the elements that leave an
element x ∈ X invariant.

Sx(Q, ∗) = {q ∈ (Q, ∗) | q · x = x}        (3.41)

3.3.5    Orbit Stabilizer theorem

We now have a theorem similar to Lagrange's theorem in (eq. 3.29), relating the sizes of O(x), Sx(Q, ∗) and
(Q, ∗).

Theorem: For any group, the number of elements in the orbit with respect to any element in the set, and the
number of elements in the stabilizer subgroup (of this group) with respect to the same element of the set, are
related by:

|O(x)| · |Sx(Q, ∗)| = |(Q, ∗)|        (3.42)

Proof: Let us denote Sx(Q, ∗) ≡ S(x). Consider the slight modification of S(x) (eq. 3.41):

Hy(x) = {q ∈ (Q, ∗) | q · x = y}        (3.43)
Hx(x) = S(x)        (3.44)

We now claim that the sets Hy(x) for different y are disjoint.
Assumption: ∀ y1, y2 ∈ X, y1 ≠ y2, the sets Hy1(x) and Hy2(x) are disjoint.
Justification: It suffices to show that if q ∈ Hy1(x) ∩ Hy2(x), then y1 = y2. This is true since in that case
we will have q · x = y1 and q · x = y2, which clearly implies y1 = y2, thereby justifying the assumption.
We now have (Q, ∗) = ∪_{y ∈ O(x)} Hy(x), and hence:

|(Q, ∗)| = Σ_{y ∈ O(x)} |Hy(x)|        (3.45)

We now claim that each set Hy(x), for every value of y ∈ O(x), has the same cardinality, which can be equated
to that of #Hx(x), which is, from (eq. 3.44): |S(x)|.
Assumption: For every y ∈ O(x), #Hy(x) = |S(x)|.
Justification: It suffices to show that ∃ a bijection f : S(x) → Hy(x), ∀ y ∈ O(x). We now try to construct this
bijection. For a fixed t ∈ Hy(x) and h ∈ S(x), define the bijection as: f(h) = t ∗ h. Notice that (t ∗ h) ∈ Hy(x)
since (t ∗ h) · x = t · (h · x), which from (eq. 3.41) = t · x = y, using (eq. 3.43). To show that f is a bijection,
we need to show that f is injective and surjective.
Injective: If f(h1) = f(h2), then t ∗ h1 = t ∗ h2, which implies h1 = h2. So if h1 ≠ h2 then f(h1) ≠ f(h2).
Surjective: We need to show that every element in Hy(x) is covered in the image of f. It
suffices to show that every element in Hy(x) can be represented in the form of the image of f. Equivalently,
we can show that ∀ u ∈ Hy(x), ∃ t ∈ Hy(x) such that t⁻¹ ∗ u ∈ S(x).

From (eq. 3.43):    u · x = y    and    t · x = y        (3.46)
Using (eq. 3.46):    x = t⁻¹ · y = t⁻¹ · (u · x) = (t⁻¹ ∗ u) · x

Now from (eq. 3.41) we have t⁻¹ ∗ u ∈ S(x), and hence u = t ∗ (t⁻¹ ∗ u) = f(t⁻¹ ∗ u). Hence we have the
justification.
Now we see that f is a bijection and ∀ y ∈ O(x), #Hy(x) = |S(x)|. In (eq. 3.45) we can replace the sum by a
product, as all the entities being summed over have the same value. We then have the statement as in the
theorem (eq. 3.42). Hence we have proved the theorem.
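To illustrate the orbit-stabilizer theorem numerically, here is a brief Python sketch (added here, not from the original) that lets S3 act on the set X = {0, 1, 2} by permuting points and checks |O(x)| · |Sx| = |Q| for every x; the naming follows the earlier sketches and is an assumption of this example.

from itertools import permutations

Q = list(permutations(range(3)))     # S3 acting on X = {0, 1, 2}

for x in range(3):
    orbit = {q[x] for q in Q}                        # O(x) = { q . x }
    stabilizer = [q for q in Q if q[x] == x]         # S_x = { q | q . x = x }
    assert len(orbit) * len(stabilizer) == len(Q)    # (eq. 3.42)
    print(x, len(orbit), len(stabilizer))            # 3 * 2 = 6 for each x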


Part III

PREREQUISITES - QUANTUM MECHANICS

Chapter 4

Identical Particles

4.0.6    Describing a two state system
Consider a system comprising two two-state systems. For example, consider a system of two electrons.
The state of the composite system is given by the tensor product of the individual states of the systems. Let
the vectors |k1⟩ and |k2⟩ represent the individual states of the systems (1) and (2) respectively. (From now on,
it is implicit that the subscript denotes the particle number.) The state of this composite system is given by
|k1 k2⟩ or, equivalently, |k2 k1⟩. Physically, we see no reason as to why we must prefer one to the other;
however, mathematically they are orthogonal states.

⟨k1 k2 | k2 k1⟩ = δ_{k1,k2}        (4.1)

So, it is now evident that if we are given a state of the system, we do not know a priori whether the system
is in state |k1 k2⟩ or in |k2 k1⟩. (In other words, if we are told that the state of the composite system is |a, b⟩,
then we do not know whether the state of the first system is |a⟩ or |b⟩.) More generally, by the principle
of superposition, the state of the two-state system can be any linear combination of the states |k1 k2⟩ and
|k2 k1⟩:

|ψ⟩ = c1 |k1 k2⟩ + c2 |k2 k1⟩        (4.2)

Now, when a measurement (given by some measurement operator) is performed on this composite system
described by |ψ⟩, the eigenvalues produced by the two states |k1 k2⟩ and |k2 k1⟩ will be identical (since eigenvalues
are just numbers and k1 k2 = k2 k1). So, different eigenkets will have the same eigenvalues, thereby
introducing degeneracy into the system. This is called the exchange degeneracy.

4.0.7    Permutation operator

In the previous subsection, we said that we could describe the same composite system using two orthogonal
states. If the state of the system is |k1 k2⟩, and we interchange the particles 1 and 2, then we get the state
|k2 k1⟩. So, the system is physically the same as before. To perform this exchange of particles, we define an
operator called the permutation operator, with the following property:

P21 |k1 k2⟩ = |k2 k1⟩        (4.3)

From the above definition, it is evident that P21 ≡ P12. Also,

P21 P21 |k1 k2⟩ = P21 |k2 k1⟩ = |k1 k2⟩  ⟹  (P21)² = I

Also, since (P21)² = I, and the eigenvalue of I is 1, we may say that the eigenvalues of P21 are ±1. P12 is also
Hermitian.
Hence, the permutation operator changes the state of particle 1 to |k2⟩ and that of particle 2 to |k1⟩. Let us
now take some operator T where T = T1 ⊗ T2. The action of T is defined as:

T1 |t1 t2⟩ = t1 |t1 t2⟩        (4.4)
T2 |t1 t2⟩ = t2 |t1 t2⟩        (4.5)

Now, applying P12 on both sides of (eq. 4.4), we have:

P12 T1 |t1 t2⟩ = t1 P12 |t1 t2⟩

Since P12 is unitary (very easy to check from the above assumptions), we have:

P12 T1 P12† P12 |t1 t2⟩ = t1 P12 |t1 t2⟩
P12 T1 P12† |t2 t1⟩ = t1 |t2 t1⟩

Now, on comparing the above equation with (eq. 4.5), we obtain the relation:

P12 T1 P12† = T2        (4.6)

This shows that the permutation operator P12 can permute the particle number of the operators as well.
Let us now take a general Hamiltonian describing the two-particle system:

H = p1²/2m + p2²/2m + Vint(|x2 − x1|) + V1(x1) + V2(x2)        (4.7)

Let us now see the action of the permutation operator, or in other words, the change in the Hamiltonian of
the composite system under the exchange of the two particles. So, we have:

P12 H P12† = P12 (p1²/2m) P12† + P12 (p2²/2m) P12† + P12 Vint(|x2 − x1|) P12† + P12 V1(x1) P12† + P12 V2(x2) P12†
           = p2²/2m + p1²/2m + Vint(|x1 − x2|) + V2(x2) + V1(x1) ≡ H

P12 H P12† = H        (4.8)

Hence, we see that the Hamiltonian of the composite system does not change under the exchange of the two
particles. Hence, [H, P12] = 0, and dP12/dt = 0. Hence, P12 is a constant of the motion.

4.0.8    Symmetry and Asymmetry in the wave functions

From (eq. 4.8), we see that P12 is a constant at all times. This means that if the action of P12 on |ψ⟩ is
known initially, then it is known for all times. We still have the physical requirement that the state
produced on acting with P12 must not be physically different from the original state. With this physical
requirement, we see that only those states which are invariant under the action of P12 are physically consistent.
Hence, we must look for the eigenstates of P12. We already know that the eigenvalues are ±1. The corresponding
eigenstates of P12 are:

|ψ+⟩ = (1/√2) (|k1 k2⟩ + |k2 k1⟩)        (4.9)
|ψ−⟩ = (1/√2) (|k1 k2⟩ − |k2 k1⟩)        (4.10)

These two eigenstates are the only possible states that are physically valid out of all the states given in (eq.
4.2). Therefore, any composite system can only exist in one of these two states.
The eigenstate |ψ+⟩ is such that, if we exchange the two particles of the composite system (exchange the
particle indices k1 and k2), then the state remains the same. In other words, the state of the system is
symmetric under the exchange of the two particles. On the other hand, the eigenstate |ψ−⟩ is such that, if
we exchange the two particles in the system, the state of the composite system picks up a negative sign. In
other words, the state is asymmetric under the exchange of the two particles.
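As an added numerical illustration (not in the original text), the following Python/NumPy sketch builds the two-qubit swap (permutation) operator and checks that the states of (eq. 4.9) and (eq. 4.10) are its eigenstates with eigenvalues +1 and −1; the basis ordering |00⟩, |01⟩, |10⟩, |11⟩ and the variable names are assumptions of this example.

import numpy as np

# Swap (permutation) operator P12 on two qubits, basis ordered |00>, |01>, |10>, |11>
P12 = np.array([[1, 0, 0, 0],
                [0, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1]], dtype=float)

k1 = np.array([1.0, 0.0])          # |k1> = |0>
k2 = np.array([0.0, 1.0])          # |k2> = |1>
k1k2 = np.kron(k1, k2)             # |k1 k2>
k2k1 = np.kron(k2, k1)             # |k2 k1>

sym  = (k1k2 + k2k1) / np.sqrt(2)  # (eq. 4.9)
asym = (k1k2 - k2k1) / np.sqrt(2)  # (eq. 4.10)

print(np.allclose(P12 @ sym,  sym))        # True: eigenvalue +1
print(np.allclose(P12 @ asym, -asym))      # True: eigenvalue -1
print(np.allclose(P12 @ P12, np.eye(4)))   # (P12)^2 = I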
4.0.9    Extending to many state systems

We may now extend the idea of the two-state system to a system comprising N identical particles. The
state of the system can now be given by any permutation of:

|ψ⟩ = |k1 k2 k3 . . . ki ki+1 . . . kj kj+1 . . . kn⟩        (4.11)

The permutation operators {Pij} are defined by:

Pij |ψ⟩ = |k1 k2 k3 . . . kj ki+1 . . . ki kj+1 . . . kn⟩        (4.12)

(Pij exchanges the particles i and j in the system.) Here again, the system should be physically the same
under any of the permutations and therefore we must consider only the eigenstates of the permutation
operators as the possible states of the composite system. These eigenstates again are the symmetric and the
asymmetric states.

4.0.10    Bosons and Fermions

Let us take the systems that are described by the asymmetric wave-function |ψ−⟩. If the two particles in the
composite system were in the same state, that is k1 = k2, then we see that the two terms in the wave-function
cancel each other. Hence, if the two particles are in the same state, the wave-function vanishes. A more physical
interpretation of this scenario is that we cannot expect both the particles to be in the same state. These
particles are called fermions, and the famous rule that no two fermions can be in the same quantum state is
called the Pauli Exclusion Principle. The statistics used to study fermions is called Fermi-Dirac statistics.

On the other hand, let us examine the systems given by the symmetric wave-function |ψ+⟩. Here, if the
two particles constituting the composite system are in the same quantum state, i.e. k1 = k2, then we see that
|ψ+⟩ ∝ |k1 k1⟩ = |k2 k2⟩. Therefore, two quantum systems can in fact be in the same state. We can also
generalize this (without proof) to the statement that any number of particles can occupy the same quantum
state. These particles are called Bosons, and the statistics used to study them is Bose-Einstein statistics. An
important consequence of this is Bose-Einstein condensation where, near absolute zero temperature, all
the particles of the system condense into a single state (of the lowest energy), and as a result this energy
state becomes macroscopically occupied.

Chapter 5

Angular Momentum
Any quantum system has to be identified with some property that is conserved in that system. This is true
for classical systems as well. In many cases we define a quantum system by its total Hamiltonian (as its total
energy remains conserved). We can also consider another observable, the total Angular Momentum. It
is represented by J. Every quantum system has a total angular momentum associated with it (just like how
every quantum system has a position and a momentum associated with it). Also, just like X and P, J is a
vector observable (unlike the Hamiltonian, which is a scalar). In the cartesian system of coordinates:

J = Jx î + Jy ĵ + Jz k̂        (5.1)

Going back to our classical concepts, we see that the angular momentum is defined as:

L = r × p        (5.2)

The same definition carries over to the quantum case. The difference is that all those quantities that are
variables in the classical case (eq. 5.2) are now operators. The quantum case becomes:

J = x × p        (5.3)

From expanding the above equation (eq. 5.3) we can extract the relation between the total angular momentum
and the known observables. Let us now solve the cross product by considering the components of the physical
quantities. To solve the cross product we may consider the determinant form of the cross product.

          | î   ĵ   k̂  |
A × B =   | Ax  Ay  Az |
          | Bx  By  Bz |

Solving for J, we get:

J = (y pz − z py) î + (z px − x pz) ĵ + (x py − y px) k̂

On equating the components of J, we get:

Jx = (y pz − z py)        (5.4)
Jy = (z px − x pz)        (5.5)
Jz = (x py − y px)        (5.6)

The above three equations (eq. 5.4, eq. 5.5 and eq. 5.6) represent the various components of the angular
momentum operator. From this we can extract more about the operators. Firstly we should see if they
commute. Let us start by finding the commutator of Jx and Jy operators. For this we need to consider
equations (eq. 5.4, eq. 5.5 and eq. 5.6).
[Jx, Jy] = [(y pz − z py), (z px − x pz)]
         = [y pz, z px] − [y pz, x pz] − [z py, z px] + [z py, x pz]

To evaluate the above commutators we need to use the expansions:
[A, BC] = [A, B]C + B[A, C]
[AB, C] = A[B, C] + [A, C]B
From the above two expressions, we can make a new commutation expansion for [AB, CD] which shall be
useful in evaluating the terms of the above expression:
[AB, CD] = [AB, C]D + C[AB, D]
         = (A[B, C] + [A, C]B)D + C(A[B, D] + [A, D]B)
         = A[B, C]D + [A, C]BD + CA[B, D] + C[A, D]B

Therefore, we write:

[AB, CD] = A[B, C]D + [A, C]BD + CA[B, D] + C[A, D]B        (5.7)

Using the above equation (eq. 5.7) we can see how each term becomes. In each term we can make the right
substitutions for A, B, C and D. Then we get the results:

Table 5.1: Examining the terms of the commutator

Term                    [B,C]   [A,C]   [B,D]   [A,D]   Result
term 1: [y pz, z px]    −iℏ      0       0       0      A[B,C]D = −iℏ y px
term 2: [y pz, x pz]     0       0       0       0      0
term 3: [z py, z px]     0       0       0       0      0
term 4: [z py, x pz]     0       0       0      iℏ      C[A,D]B = iℏ x py

Therefore, the result is iℏ(x py − y px). If we refer back to (eq. 5.6), we can see that this is the expression
for Jz (times iℏ), where Jz is the z-component of the total angular momentum (eq. 5.1). Therefore, we have the
relation [Jx, Jy] = iℏJz. Similarly, we can work out the other relations too: [Jy, Jz] = iℏJx and [Jz, Jx] = iℏJy.
The relations can be summarized as:
[Jp, Jq] = iℏ ε_pqr Jr        (5.8)
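As a quick numerical check of (eq. 5.8) (an added illustration, not part of the original), the Python/NumPy sketch below uses the spin-1/2 representation J = (ℏ/2)σ, with ℏ set to 1, and verifies [Jx, Jy] = iℏJz together with its cyclic permutations.

import numpy as np

hbar = 1.0
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
Jx, Jy, Jz = (hbar / 2) * sx, (hbar / 2) * sy, (hbar / 2) * sz

def comm(A, B):
    return A @ B - B @ A

print(np.allclose(comm(Jx, Jy), 1j * hbar * Jz))   # True
print(np.allclose(comm(Jy, Jz), 1j * hbar * Jx))   # True
print(np.allclose(comm(Jz, Jx), 1j * hbar * Jy))   # True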

The equation above (eq. 5.8) summarizes all the commutation relations discussed above. Let us now
go back and see why we introduced J. We said that we would like to associate a total angular momentum
with a quantum system in order to characterize it. Now we have seen some way of identifying a system. A
quantum system may also exist in various phases or forms. These various forms or phases of a quantum
system are called the states of a quantum system. J does not say anything about the states in which the
quantum system may exist. To identify the states, we need some property that is different for each of the
individual components (if it were not different, the states would be indistinguishable) and that can be compared
with J. For this, we can take some component of the total angular momentum. Take for example the
z-component (this is just a convention), Jz. The number of values of Jz will tell us how many states the system
exists in, because each state will have a contribution to Jz. So, on comparing J and Jz for a system, we can see
how many states the system exists in. There is another minor problem before we proceed. The two properties
J and Jz must be comparable. When we say that two properties can be compared, we mean that there exists
a common basis for the two operators. In other words, the two operators commute. Comparing the two
properties J and Jz is slightly awkward because

GO TO FIRST PAGE

72

CHAPTER 5. ANGULAR MOMENTUM

the former is a vector and the latter is a scalar. So, let us consider J2 instead of the vector J . This would
make the comparison logically right. Now we have:
J² = Jx² + Jy² + Jz²        (5.9)

From the above expression itself, it is clear that [J², Jk] = 0, ∀ k ∈ {x, y, z}. Since J² commutes with Jz,
these two operators can be diagonalized simultaneously. This means they have simultaneous eigenkets. Let
us consider the equations:

J² |α, β⟩ = α |α, β⟩        (5.10)
Jz |α, β⟩ = β |α, β⟩        (5.11)

Here |α, β⟩ denotes the simultaneous eigenket of J² and Jz, with the respective eigenvalues. We know J² is a
property of a system that exists in various states, each state corresponding to a value of Jz. If we expect
that a particular system must exist in some given number of states, then we must expect that many values
of Jz (corresponding to each state) for a single value of J² (corresponding to the system as a whole). So, in
mathematical terminology, if we expect a system to exist in n different quantum states, then we must expect
n different eigenvalues of Jz (corresponding to each state) for a single eigenvalue of J² (corresponding to
the system). Going back to the above eigenvalue equations (eq. 5.10 and eq. 5.11), we can now claim that:
for every α, there shall be a number of values of β. The number of such β's for a given α shall tell us the
number of states that the system exists in. So far we have argued with physical reasons. We can try to
verify this mathematically too. From (eq. 5.11) we have:

Jz² |α, β⟩ = β Jz |α, β⟩ = β² |α, β⟩        (5.12)

Therefore, on subtracting equation (eq. 5.12) from (eq. 5.10), we get:

(J² − Jz²) |α, β⟩ = (α − β²) |α, β⟩
⟨α, β| (J² − Jz²) |α, β⟩ = ⟨α, β| (α − β²) |α, β⟩

On the LHS, from equation (eq. 5.9) we can write (J² − Jz²) as (Jx² + Jy²). The RHS shall become (α − β²)
if we assume that the simultaneous eigenkets are normalized. Hence, we have:

⟨α, β| (Jx² + Jy²) |α, β⟩ = (α − β²)

The LHS of the above equation is the expectation value of a positive operator. The LHS is therefore a
positive (semi-)definite quantity. Hence, we can write that:

0 ≤ (α − β²)  ⟹  β² ≤ α        (5.13)

Therefore, from the above equation (eq. 5.13), we can see that the value of β² is bounded by α. This means
there are a finite number of states for a finite value of α. In other words, there are two distinct bounds for the
value of β. Let the extreme values of β be denoted as βmin and βmax. But to examine all the states (values
of β between βmin and βmax) we need to construct some operator that can help us traverse through each
of the states from βmin up to βmax and vice-versa. A higher value of β corresponds to a higher value of
angular momentum in the z-direction. We call this a higher state.
Let us now define two operators that take us to a higher and a lower state respectively, from a given state.
These are the raising and lowering operators, represented as J+ and J−. These operators are defined
as:

J+ = Jx + iJy        (5.14)
J− = Jx − iJy        (5.15)

The action of these operators can be defined as:

J+ |α, β⟩ = c+ |α, (β + ℏ)⟩        (5.16)
J− |α, β⟩ = c− |α, (β − ℏ)⟩        (5.17)
J+ |α, βmax⟩ = 0        (5.18)

and more importantly:

J− |α, βmin⟩ = 0        (5.19)

We can just pause for a moment and explore the mathematics of the raising and lowering operators. Let us
look at some commutation relations:

[J+, J−] = [(Jx + iJy), (Jx − iJy)]
RHS:  [Jx, Jx] + i([Jy, Jx] − [Jx, Jy]) + [Jy, Jy]
using equation (eq. 5.8):  0 + i(−iℏJz − iℏJz) + 0

[J+, J−] = 2ℏJz        (5.20)

Similarly, we can see:

[Jz, J+] = [Jz, (Jx + iJy)]
RHS:  [Jz, Jx] + i[Jz, Jy]
    = iℏJy + ℏJx

[J+, Jz] = −ℏJ+        (5.21)
similarly, we also have: [J−, Jz] = ℏJ−        (5.22)

There is another important relation that we should work out, as we will be using it shortly. Let us simplify
the products J+J− and J−J+:

J+J− = (Jx + iJy)(Jx − iJy)
RHS:  Jx² + i(JyJx − JxJy) + Jy²
with equation (eq. 5.9), we can replace Jx² + Jy² with J² − Jz²:
      J² − Jz² + i(JyJx − JxJy)
      J² − Jz² + i[Jy, Jx]
with equation (eq. 5.8) we have:

J+J− = J² − Jz² + ℏJz        (5.23)
Similarly,
J−J+ = J² − Jz² − ℏJz        (5.24)

With these commutators in hand, we can proceed to see what the ladder operators (the raising and the
lowering operators) do to the states of a system, and how to find the number of states in which a system
exists. Let us start with the assumption made in equation (eq. 5.18):
J+ |α, βmax⟩ = 0
Since any operator acting on 0 would give 0, we have:
J−J+ |α, βmax⟩ = 0
from equation (eq. 5.24):  (J² − Jz² − ℏJz) |α, βmax⟩ = 0
on expanding:  (α − βmax² − ℏβmax) |α, βmax⟩ = 0
We know that |α, βmax⟩ cannot be a null ket since it represents a state of some system. So the expression
preceding the ket must be equal to 0. On equating the eigenvalue to 0, we have:
α − βmax² − ℏβmax = 0

α = βmax (βmax + ℏ)        (5.25)

Similarly, on considering equation (eq. 5.23) applied to |α, βmin⟩, we have:

α = βmin (βmin − ℏ)        (5.26)

On comparing the above two equations (eq. 5.25 and eq. 5.26), we see that:

βmax = −βmin        (5.27)

We know that |α, βmin⟩ is the lowest possible state and |α, βmax⟩ is the highest possible state. The raising
operator can be applied repeatedly to the lowest state to take it to the highest one. Each time we apply the
raising operator, we get to a higher state. So if we apply the raising operator n times to |α, βmin⟩, we reach
|α, βmax⟩. Each time we act on a state with J+, the value of β increases by ℏ. Therefore:

βmax = βmin + nℏ,    n → number of steps
from equation (eq. 5.27):  βmax = nℏ / 2

To eliminate the factor of ℏ, let us define:

j = βmax / ℏ = n / 2        (5.28)

Since n is an integer, j is either an integer or a half-integer. From equation (eq. 5.25)
we can get the form of α. On substituting βmax = jℏ in equation (eq. 5.25), we have:

α = ℏ² j(j + 1)        (5.29)

On referring to equation (eq. 5.13), we have all the possible values of β. We know that β is related to α by a
power-half law: |β| ≤ √α. Since α has a factor of ℏ², the expression for β must have a factor of ℏ. The rest is
just a constant. Hence, we may write:

β = mℏ        (5.30)

where m is some number, depending on the value of j. On rearranging the above equation, we get β/ℏ = m.
Even from equation (eq. 5.28), we have βmax/ℏ = j. We discussed earlier that for every βmax, β takes
all values from −βmax (which is βmin) to βmax. So, for every value of βmax, β takes (2βmax/ℏ + 1) values.
Hence, the same relation can be said about j and m also, since they are equivalent to βmax and β up to a
constant. Hence, for every value of j, we have (2j + 1) values for m, and m changes in integer steps.

−j ≤ m ≤ j        (5.31)
(5.31)

J² |j, m⟩ = j(j + 1) ℏ² |j, m⟩        (5.32)

Now, the number of values of m shall determine the number of states in which the system (given by the value
of j) shall exist. The simultaneous eigenstates of Jz and J² can now be labelled as |j, m⟩. Therefore, we
have:

Jz |j, m⟩ = mℏ |j, m⟩        (5.33)

Where j tells us about the total angular momentum of the system and the number of values of m give the
number of states in which the system exists.
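As a numerical illustration (added here, not part of the original), the NumPy sketch below builds Jz and J± for j = 1 in the |j, m⟩ basis (with ℏ = 1), using the standard matrix elements ⟨j, m±1| J± |j, m⟩ = ℏ√(j(j+1) − m(m±1)), and checks (eq. 5.32) and (eq. 5.33); the variable names are assumptions of this example.

import numpy as np

hbar, j = 1.0, 1
ms = np.arange(j, -j - 1, -1)                  # m = +1, 0, -1  (2j + 1 values)

Jz = hbar * np.diag(ms)
Jp = np.zeros((3, 3))                          # raising operator J+
for r in range(1, 3):
    m = ms[r]                                  # <j, m+1| J+ |j, m>
    Jp[r - 1, r] = hbar * np.sqrt(j * (j + 1) - m * (m + 1))
Jm = Jp.T                                      # lowering operator J- = (J+)^dagger

Jx = (Jp + Jm) / 2
Jy = (Jp - Jm) / (2 * 1j)
J2 = Jx @ Jx + Jy @ Jy + Jz @ Jz

print(np.allclose(J2, j * (j + 1) * hbar**2 * np.eye(3)))   # True: (eq. 5.32)
print(np.diag(Jz))                                          # m * hbar values, as in (eq. 5.33)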
For any general quantum system, the total angular momentum of the system is given by:

J = L + S        (5.34)

where J is the total angular momentum, L is the orbital angular momentum and S is the spin angular
momentum. If the wave function associated with the quantum system is a scalar function, then J = S. If
the wave function is a vector function, then we use the relation given in equation (eq. 5.34).
We can relate the above equations, for determining the number of states in which a system shall exist, to
the spin of the quantum system.
Part IV

PREREQUISITES - COMPUTATION

Chapter 6

Introduction to Turing Machines

6.0.11    Informal description

A Turing machine is an extension of a push down automaton. A push down automaton, if we recall, had a
stack to temporarily store some alphabets. Some of the restrictions that we faced in the case of a
push down automaton were that only the topmost element of the stack can be accessed, and on reading that
alphabet, it is erased from the stack. That is, the position of the read-write head is always on the topmost
element of the stack. In the case of the Turing machine, we have a tape instead of a stack. As a result, the
read-write head can be moved anywhere on the tape to access the elements of the tape.
The read-write head can be moved anywhere on the tape and it moves one cell at a time. Apart from this,
it has a finite control (finite number of states and transition functions), just like a PDA, that controls the
position of the read-write head.

6.0.12    Elements of a Turing machine

To give a description of a Turing machine, we must provide its basic elements. The Turing machine is
described using a 9-tuple, as given below:

M = (Q, Σ, Γ, ⊢, ␣, δ, s, t, r)        (6.1)

Q = finite set of states
Σ = finite set of input alphabets
Γ = finite set of tape alphabets (since the input alphabets can also be written to the tape, this set
contains Σ).
⊢ = left end marker (to denote the left end of the tape, so that the read-write head does not go outside
the tape. This is not a part of the input alphabet and is a property of the tape).
␣ = the blank symbol; every tape cell beyond the input initially holds ␣.
δ = transition function. The transition function of a machine tells where the read-write head must
move to (it can only move one cell to the right or left, so it gives the direction in which the read-write
head must move), based on the state the machine is in and the contents of the tape at its
current position. The transition function for a Turing machine is given as:

δ : Q × Γ → Q × Γ × {L, R}        (6.2)

where {L, R} stands for the direction in which the read-write head must move (R stands for right
and L stands for left). For example,

δ(p, a) = (q, b, R)        (6.3)

means that when the machine is in state p and the read-write head reads a on the tape, then: it must
go to state q, the read-write head must write b onto its current cell (over-writing a), and move
one cell to the right on the tape.
s = start state. s ∈ Q.
t = accept (final) state. t ∈ Q.
r = reject state. r ∈ Q.
In addition to the above, there are certain restrictions which need to be met:
1. The left end marker that denotes the left end of the tape must never be over-written (if it is, then the
read-write head may go outside the tape). Therefore, if there is some transition that involves the left
end-marker, then it must write that end-marker back to the tape (where d ∈ {L, R}):
δ(p, ⊢) = (q, ⊢, d)
This effectively means that we cannot over-write ⊢.
2. We also require that once a machine enters the final or the reject state, it can never exit. This requirement
can again be formalized as in the previous case by stating that: ∀ b ∈ Γ, ∃ c, c′ ∈ Γ and d, d′ ∈ {L, R} such
that:

δ(t, b) = (t, c, d)

which states that if there is some transition that provides the next move for a machine in
its final state, then the next move will be such that the machine still continues to be in its final state.
Similarly,

δ(r, b) = (r, c′, d′)

Figure 6.1: Figure showing the elements of a Turing machine and its pictorial view
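A minimal Python sketch (an added illustration, not from the original) of how the transition function of (eq. 6.2) can be represented and single-stepped; the dictionary encoding and names such as step are assumptions of this example.

# A tiny Turing-machine skeleton: delta maps (state, symbol) -> (state, symbol, direction)
delta = {
    ('p', 'a'): ('q', 'b', 'R'),       # the example of (eq. 6.3)
    ('p', '|-'): ('p', '|-', 'R'),     # the end marker is always written back (restriction 1)
}

def step(state, tape, head):
    # Apply one transition; tape is a list of symbols, head an index.
    new_state, symbol, direction = delta[(state, tape[head])]
    tape[head] = symbol
    head += 1 if direction == 'R' else -1
    return new_state, tape, head

state, tape, head = 'p', ['|-', 'a', 'a', '_'], 1
state, tape, head = step(state, tape, head)
print(state, tape, head)       # q ['|-', 'b', 'a', '_'] 2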
6.0.13    Configurations and Acceptance

Configuration of a Turing machine
A configuration of a Turing machine is a tuple of elements that can accurately describe the status of
the Turing machine. A configuration of a Turing machine should describe the values of the free variables of
the Turing machine. Therefore, a configuration at some instant of time can be given by: [(state of the
Turing machine), (the contents of the read-write tape), (position of the read-write head on the tape)]. We
need to look in more detail at how to represent some of these quantities. Firstly, the position of the read-write
head can be given as an integer (≥ 0). The state of the Turing machine can also be represented simply as
some index. To see how to represent the contents of the tape, we must first see what the tape looks like. The
tape, at any instant of time, looks like a semi-infinite (bounded on the left side) string that has a finite
number of elements that belong to Γ (these are the ones that are of our interest) followed by a semi-infinite
sequence of blanks ␣. That is:
Input tape:    ⊢ y ␣ ␣ ␣ . . .
where y ∈ Γ*. Therefore, formally, we define a configuration to be an element of the following set of tuples:

Configuration:  ( Q  ×  {y ␣^ω | y ∈ Γ*}  ×  N )        (6.4)
                  States    Contents of the tape    Position of read-write head


The start configuration of the Turing machine has the read-write head at the first index of the tape, 0,
scanning the left end-marker. Its state will be the start state of the Turing machine, s, and the contents of the
read-write tape will be the left end-marker, the input, and a semi-infinite sequence of blanks: ⊢ x ␣^ω.

Start configuration: (s, ⊢ x ␣^ω, 0),  x ∈ Σ*

Similarly, one can also define an accept and a reject configuration:

Accept configuration: (t, ⊢ y ␣^ω, n),  y ∈ Γ*
Reject configuration: (r, ⊢ y ␣^ω, n),  y ∈ Γ*

The running of the Turing machine can now be given as a sequence of such tuples, where consecutive tuples
are related by some transition. To denote that a configuration (p, z, 0) is related to (q, z′, 1) through some
transition, we write:

(p, z, 0)  →¹_M  (q, z′, 1)        (6.5)

It means that there exists some derivation in M, of unit length, which can take (p, z, 0) to (q, z′, 1).
We now need some inductive way of defining the next configuration of the Turing machine in terms of its
previous configuration. The next configuration, after reading an alphabet from the tape, is given by:

(p, z, n)  →¹_M  (q, s_b^n(z), n − 1)   if δ(p, z_n) = (q, b, L)
(p, z, n)  →¹_M  (q, s_b^n(z), n + 1)   if δ(p, z_n) = (q, b, R)        (6.6)

where:
z = string that denotes the contents of the tape
z_n = nth alphabet of the string z
s_b^n(z) = string produced when the nth alphabet in z is replaced by b.
The meaning of (eq. 6.6) is quite clear. It says that if the machine is initially in state p, its read-write head
reads the symbol z_n at position n on the tape, and the transition function requires the state to be changed
to q, b to be written to the current position of the tape (over-writing z_n) and the read-write head to move
left (to position n − 1), then the new configuration will be:
(new state, new contents of the tape formed by replacing the nth symbol in z by b, n − 1). Similarly, we
can read off the other case.

Acceptance

We can now extend the concept of →¹_M by defining its reflexive transitive closure →*_M inductively:

α →⁰_M α;
α →^{n+1}_M β   if α →ⁿ_M γ and γ →¹_M β for some configuration γ;
α →*_M β   if α →ⁿ_M β for some n ≥ 0.

We can now use →*_M to define the notion of acceptance and rejection. Intuitively, by acceptance of a string x
(x ∈ Σ*) we mean that there is some derivation from the start configuration that, upon reading the alphabets
of x, leads to the accept configuration. Hence, for a string x ∈ Σ*:

acceptance: (s, ⊢ x ␣^ω, 0) →*_M (t, y, n)   for some y ∈ Γ*, n ≥ 0        (6.7)
rejection:  (s, ⊢ x ␣^ω, 0) →*_M (r, y, n)   for some y ∈ Γ*, n ≥ 0        (6.8)
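A minimal Python sketch (an illustration added here, not from the original) of the →* relation: starting from the start configuration, transitions are applied until the accept state t or the reject state r is reached (or the machine loops). It reuses the dictionary encoding of δ from the earlier sketch; the machine below accepts strings with an even number of a's, which is an assumption made purely for this example.

def run(delta, x, s='s', t='t', r='r', blank='_', max_steps=10_000):
    tape = ['|-'] + list(x) + [blank]
    state, head = s, 0
    for _ in range(max_steps):                     # guard against looping machines
        if state == t:
            return 'accept'
        if state == r:
            return 'reject'
        if head == len(tape):                      # extend the tape with blanks on demand
            tape.append(blank)
        state, symbol, direction = delta[(state, tape[head])]
        tape[head] = symbol
        head += 1 if direction == 'R' else -1
    return 'loop?'

# Example machine: accept iff the number of a's is even.
delta = {('s', '|-'): ('even', '|-', 'R'),
         ('even', 'a'): ('odd', 'a', 'R'),  ('odd', 'a'): ('even', 'a', 'R'),
         ('even', '_'): ('t', '_', 'L'),    ('odd', '_'): ('r', '_', 'L')}
print(run(delta, 'aaaa'), run(delta, 'aaa'))   # accept reject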

6.0.14    Classes of languages

Now we look at how the set of languages (subsets of Σ*) is divided based on Turing machines. First, we
need to note that if a string is accepted or rejected by a Turing machine, the Turing machine is said to halt
on that string. If the string is neither accepted nor rejected, that is, the Turing machine runs indefinitely on
that string, then the Turing machine is said to loop on that string. Now, languages that can be accepted by
some Turing machine (but not necessarily rejected) are called Recursively Enumerable languages. (We will
see how this name originated a little later.) There are some Turing machines that halt on all their inputs.
These are called Total Turing machines. Languages accepted by total Turing machines are called Recursive
languages. That is:

L(M) = {x | x ∈ Σ*, M accepts x} is   recursively enumerable   if M is some Turing machine,
                                       recursive                if M is some total Turing machine.        (6.9)

Clearly, we can see that Recursive languages are part of the larger set of Recursively Enumerable languages.
Every recursive language is a recursively enumerable language, but the converse is not true. To see
how the languages are classified, we can look at the figure below:
Figure 6.2: Classification of Languages. The universal set corresponds to the set of all subsets of Σ*. This is
an uncountably infinite set.

Existence of Non-Recursively Enumerable languages

The set of all languages is nothing but the set of all subsets of Σ*. Since Σ* is a countably infinite set, the
set of all subsets of this countably infinite set is uncountably infinite. But we will later see that the set of
Turing machines is countably infinite. That is, we can find some way of enumerating all the Turing machines.
Therefore, it is quite evident that 2^{Σ*} has elements that are not in the R.E. set (as shown in the figure above).
These languages are called Not Recursively Enumerable. Therefore, there are no Turing machines that accept
these languages. Another subtle point is that the set of R.E. languages is a countably infinite subset of an
uncountably infinite set. This means that there is a large portion of 2^{Σ*} that is not R.E.

6.1    Examples: Turing machines for some languages

6.1.1    L(M) = {ww | w ∈ {a, b}*}

We know that the above language L(M) is not even context free. It turns out that we can find a Turing
machine that will accept L(M). Let us try to see intuitively how to construct an algorithm that checks
whether a string is in L(M). The algorithm may look something like the following.
On some input string x:
1. Mark the end of the string (on the tape) by ⊣.

2. Check if the input string is of even length. If not, reject; else proceed.
3. Find the middle of the string:
GO TO FIRST PAGE

83

6.1. EXAMPLES: TURING MACHINES FOR


CHAPTER
SOME LANGUAGES
6. INTRODUCTION TO TURING MACHINES
Scan the tape from ⊢ to ⊣, each time marking the leftmost unmarked alphabet with a grave accent (`) and the
rightmost unmarked alphabet with an acute accent (´).
On finding that a `-marked and a ´-marked alphabet are next to each other, the middle lies between these two.
(The middle is between the last `-marked and the first ´-marked alphabet.)
4. Compare the corresponding alphabets of the two halves (the i-th alphabet of the first half with the i-th
alphabet of the second half). If at any stage they are not equal, reject. Else, keep comparing until the ⊢ and ⊣
symbols, and accept. (A plain-language sketch of this procedure is given below.)
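The Python sketch below (an added illustration, not part of the original) mirrors the algorithm above at the level of ordinary code rather than tape transitions: it rejects odd-length strings, finds the middle, and compares the two halves symbol by symbol; the function name accepts_ww is an assumption of this example.

def accepts_ww(x):
    # Step 2: a string in {ww} must have even length.
    if len(x) % 2 != 0:
        return False
    # Step 3: the "middle" splits x into two halves of equal length.
    mid = len(x) // 2
    # Step 4: compare corresponding alphabets of the two halves.
    for i in range(mid):
        if x[i] != x[mid + i]:
            return False
    return True

print(accepts_ww('abab'))   # True:  w = 'ab'
print(accepts_ww('abba'))   # False: a palindrome, but not of the form ww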
We have now given a description of how the Turing machine must work. We can try to write down the
Turing machine by describing all the parameters that constitute the Turing machine (see eq. 6.1). The Turing
machine for this problem can be defined as:
Q = {q01, q02, q11, q12, q001, q002, q3, q300, q301, q302, q32, t, r}
Γ = {a, b, à, á, b̀, b́, ⊢, ⊣, ␣}
Σ = {a, b}
t = accept state
r = reject state
q01 = start state
Now we write down the transitions as follows:


(q01 , a) = (q02 , a, R)
(q02 , a) = (q02 , a, R)
(q01 , b) = (q02 , b, R)
(q02 , b) = (q01 , b, R)
(q01 , ) = (q11 , , L)
(q02 , ) = (r, , L)
(q11 , a/b) = (q11 , a/b, L)
(q11 , a
`/`b/ ) = (q001 , a
`/`b/ , R)
`
(q001 , a/b) = (q12 , a
`/b, R)
(q12 , a/b) = (q12 , a/b, R)
(q12 , ) = (q002 , , L)
(q12 , a
/b) = (q002 , a
/b, L)
(q002 , a/b) = (q11 , a
/b, L)
(q001 , a
/b/ ) = (q3 , a
/b/ , L)
`
(q002 , a
`/b/ ) = (q3 , a
`/`b/ , R)
come to the beginning of the tape and move state to q300
(q300 , /) = (q300 , /, R)
(q300 , a
`) = (q301 , , R)
(q300 , `b) = (q302 , , R)
(q301 , /`
a/`b) = (q301 , /`
a/`b, R)
(q301 , a
) = (q32 , , L)
(q301 , b) = (r, b, L)
(q302 , /`
a/`b) = (q302 , /`
a/`b, R)

(q302 , b) = (q32 , , L)
(q302 , a
) = (r, a
, L)
(q301 /q302 , ) = (r, , L)
come to the starting of the tape and move state to q300
(q32 , a
/b/`
a/`b/) = (q300 , a
/b/`
a/`b/, L)
(q32 , ) = (q300 , , R)
(q300 , ) = (t, , L)

6.1.2    {a^p | p is a prime}

The language is over a single alphabet, and therefore the problem reduces to finding whether a number
(the length of a string in the language) is prime or not. Intuitively, to determine whether a number is prime
or not, we use the sieve of Eratosthenes method. To find whether a number n is prime:
1. If n ≤ 1, n is not prime. If n = 2, then n is prime.
2. List down all the numbers from 2 to n.

3. Declare the first unmarked number in the list as prime; if this number happens to be n itself, then n is
prime. Else, mark all its multiples (including itself) on the list.
4. If the last number, n, gets marked in the list, then n is not prime.
5. Repeat steps 3 and 4 for all the numbers on the list.
We can now see how the above steps translate when we think of a Turing machine following the algorithm:
1. Determine if there are at least three a's by scanning the first three cells of the tape. If there are only
two, then accept. If there is one or less, reject.
2. Create an identification for the last a (which corresponds to the prime number n). Erase the last a and
replace it with $. Also, erase the first a (replace it with ␣).

3. Identifying multiples:
Start from ⊢, scan right and find the first non-blank symbol (let this position be m). If this
happens to be $, accept. Else, erase the symbol and replace it with a marked blank.
We now need to delete all multiples of m, that is, all a's that occur at positions that are multiples
of m.

a. Move left until , marking each symbol on the way, with a`.
).
b. Erase a
(replace it with

` , move right until the first non-marked symbol: a, and


c. Start from the leftmost marked blank:
mark it with a
`.
to the first unmarked a, to give a
d. Repeat the above step until theis moved from
.
e. If $ is marked with a, reject.

f. Repeat steps b to e, until all the symbols in the tape (except ) are marked with`or.

Repeat steps 1 and 2.

We can now try to write down the turing machine by describing its elements (and its transitions):
Q = {q0 , q000 , q001 , q002 , q1 , q101 , q102 , q202 , q203 , q204 , q205 }
= {a}
,
` , , $}
= {a, a
`, a
, ,
Transitions:
(q0 , ) = (q000 , , R)
(q000 , ) = (r, , L)
(q000 , a) = (q001 , a, R)
(q001 , ) = (r, , L)
(q001 , a) = (q1 , a, L)
Stage Pass: Go to the beginning of string. Move state to q100 . Erase the first a.
(q1 , a) = (q1 , a, L)
(q1 , ) = (q100 , , R)
Marking the first a as :
(q100 , a) = (q101 , , R)
Keep moving right until and mark the last a as $.
(q101 , a) = (q101 , a, R)
(q101 , ) = (q102 , , L)
(q102 , a) = (q2 , $, L)
Go to the beginning of the tape and move state to q200 .
(q2 , a) = (q2 , a, L)
(q2 , ) = (q200 , , R)
(q200 , /`
a) = (q200 , , R)
(q200 , a) = (q201 , a
, L)
If $ is the first unmarked symbol, to be marked with $, then accept.
(q200 , $) = (t, $, L)
Fromto , mark all the symbols with`:
` /`
(q201 , a/ /$) = (q201 , a
` /
$, L)
(q201 , ) = (q202 , , R)
Shifting the markers. Collect the first marked symbol, and keep moving right to find the first unmarked one after
, and mark it.
` ) = (q203 , a/, R)
(q202 , a
` /
` ) = (q203 , a
` , R)
(q203 , a
` /
`/
(q203 , a
) = (q204 , a
, R)
(q204 , a
`) = (q204 , a
`, R)
Do the same marking for all the multiples:
(q204 , a) = (q202 , a
`, L)
`
(q204 , $/$) = (q202 , $, L)
If a
itself needs to be shifted, then move right to the first a (unmarked), and mark it as a
.
(q202 , a
) = (q205 , , R)
(q205 , a
`) = (q205 , a
`, R)
Do the same striking o for multiples of all the numbers.
(q205 , a) = (q200 , a
, L)
If $ happens to be one of the multiples (to be marked as $), reject.
(q205 , $) = (r, , L)

Therefore, we now have an explicit construction of a Turing machine that will accept L(M) = {a^p | p is prime}.
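For comparison, here is a short Python sketch (added, not from the original) of the same sieve-of-Eratosthenes idea the machine implements, phrased over the unary string a^p: positions 2..n are struck off as multiples, and n is prime iff it is the first unmarked number reached; is_prime_unary is a name assumed for this example.

def is_prime_unary(s):
    n = len(s)                     # the input is the unary string a^n
    if n < 2:
        return False
    marked = [False] * (n + 1)     # marked[k] == True once k is struck off as a multiple
    for m in range(2, n + 1):
        if marked[m]:
            continue
        if m == n:                 # n is the first unmarked number: it is prime
            return True
        for k in range(m, n + 1, m):
            marked[k] = True       # strike off all multiples of m (including m itself)
        if marked[n]:
            return False           # n was struck off, so it has a divisor m < n
    return False

print(is_prime_unary('a' * 7), is_prime_unary('a' * 9))   # True False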

6.2    Variations of the Turing Machine

The Turing machine that we have been considering so far, given by (eq. 6.1), is a single-tape deterministic
Turing machine. Now we consider variations of the Turing Machine. Our task is to show that all these
models are equivalent, that is, we can simulate all these Turing Machines using the model of the Single Tape
Deterministic Turing Machine.

6.2.1    Multi-Track Turing Machines

6.2.2    Multi-Tape Turing Machines

These are Turing machines with multiple tapes and multiple read write heads. A rough diagram of the turing
machine looks like:
Figure 6.3: Figure showing a Turing machine with 3 independent tapes and read write heads


At each step the finite control reads three different symbols from the three different read-write heads of these
tapes. Based on this information and the information contained in the states, the Turing machine makes
the transitions, and performs the necessary operations on the three tapes. Each read-write head can move in
any direction, irrespective of the other read-write heads. We now need to simulate the above Turing machine
using the single-tape Turing machine. We should reduce the above problem of three tapes to a problem of three
tapes with a common read-write head. We can then use the result of the previous variation (section: 6.2.1)
to show that it trivially reduces to the single-tape Turing machine.
Intuitively, the problem with blindly using a single read-write head instead of three read-write heads is
that, when the finite control switches tapes, the information about its position on the previous tape is lost.
Therefore, when it again comes back to that tape, it wouldn't know from where, on the tape, it must resume.
So, if we could preserve this information, then we can have a single read-write head that will read the
information from each of these tapes, and when it switches tapes, it will remember its last position on the
previous tape, by storing this position somewhere. Another subtle issue is that the tape is unbounded on
one side, so the information about the position of the read-write head cannot be stored in the states, because
the number of states is finite. Therefore, we need a new track. We can now do the following:


1. Introduce a new track to each of the three tapes. This doesn't increase the power of the Turing machine, as we have shown earlier (section: 6.2.1).
2. In each case, fill this additional track with 0s except at one position, which corresponds to the position of the read-write head on its parallel track.
3. Therefore, when the finite control switches tapes, the position of the read-write head on the previous tape is remembered in this extra track.
A schematic view of this modified Turing machine looks like:

Figure 6.4: Figure showing a Turing machine with three tapes but only one read-write head. The light lines indicate the position of the read-write head when it left the tape.

Therefore, we have now reduced the case of multiple read-write heads to the single read-write head case. Now we can use the same technique as in (section: 6.2.1) to reduce it to the standard model.
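For concreteness, here is a minimal Python sketch (not part of the notes; all names are hypothetical) of the bookkeeping described above: several tapes, together with one "head-position" track per tape, packed into a single tape whose cells are tuples.

# Hypothetical sketch: packing several tapes and their remembered head
# positions into one tape whose cells are tuples.  Each cell holds, for every
# original tape, (symbol, head_marker); head_marker is 1 only at the
# remembered position of that tape's read-write head.

BLANK = "_"

def pack(tapes, heads):
    """Combine tapes (lists of symbols) and head positions into one list of
    tuples, padding the shorter tapes with blanks."""
    width = max(len(t) for t in tapes)
    cells = []
    for i in range(width):
        cell = []
        for t, h in zip(tapes, heads):
            cell.append(t[i] if i < len(t) else BLANK)  # symbol track
            cell.append(1 if i == h else 0)             # head-position track
        cells.append(tuple(cell))
    return cells

def unpack(cells, k):
    """Recover the k tapes and remembered head positions."""
    tapes = [[c[2 * j] for c in cells] for j in range(k)]
    heads = [next(i for i, c in enumerate(cells) if c[2 * j + 1] == 1)
             for j in range(k)]
    return tapes, heads

if __name__ == "__main__":
    tapes = [list("abba"), list("01"), list("aaa")]
    heads = [2, 0, 1]
    print(unpack(pack(tapes, heads), 3)[1])   # [2, 0, 1]: positions preserved

The point of the sketch is only that no information is lost: whenever the single head returns to a tape, the 1 on that tape's marker track tells it where to resume.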

6.2.3

Multi-Dimensional Turing Machines

Figure 6.5: Figure showing a Turing Machine with a Two-Dimensional Tape



6.2.4

Non-Deterministic Turing Machines

The case of a non-deterministic Turing machine is slightly non-trivial. In this case, the Turing machine makes non-deterministic transitions, just like the transitions of a PDA or an NFA. The transition function is now defined as:

\delta : Q \times \Gamma \rightarrow \mathcal{P}(Q \times \Gamma \times \{L, R\})    (6.10)

The transition function of this NDTM takes an element (q, a), where q ∈ Q and a ∈ Γ, to a set {(q1, a1, L), (q2, a2, R), (q3, a3, L), . . . }, where the Turing machine, upon reading (q, a), can go non-deterministically to any element of the set {(q1, a1, L), (q2, a2, R), (q3, a3, L), . . . }. To convert it to a deterministic model, we must first consider all the transitions.
Table 6.1: Table showing all possible transitions. Each row corresponds to a pair (q_i, a_i) that the machine may read, and the columns (numbered 1 to 4) list the possible transitions, such as (q1, a1, L), (q2, a2, L), (q5, a5, R) or (q3, a3, R), that the NDTM may non-deterministically take from that pair.


Let the non-deterministic Turing machine be Mn. At every step, the transition function of Mn has at most k choices. Let these be labelled 1 . . . k. Now, a particular number represents a possible transition. Therefore, a sequence of numbers from 1 to k represents a possible run of the Turing machine. Now if we take all possible sequences of numbers from 1 to k, then we get all possible runs of Mn. Since Mn is non-deterministic, it suffices that at least one of the runs of the Turing machine results in an acceptance. Therefore, let Md be a deterministic machine with 3 tracks. On one track, it keeps a copy of the input; on a second track, it writes, one after the other, every finite sequence of numbers from 1 to k; and on the third track it simulates Mn on the input, following the choices dictated by the current sequence. Md accepts as soon as one of these simulated runs accepts.
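A rough Python sketch of the same idea follows (an illustration only, not the notes' exact construction): every finite choice sequence is tried in breadth-first order, and the functions step and accepting are assumed interfaces to the non-deterministic transition relation.

# Hypothetical sketch: deterministically simulating an NDTM by re-running it
# from the start under every finite sequence of choices, shortest first.
from itertools import count, product

def accepts(start, step, accepting, k):
    """Return True iff some run of the NDTM (at most k choices per step)
    accepts.  Runs forever when no accepting run exists, like the NDTM."""
    for length in count(0):                      # longer and longer runs
        for choices in product(range(k), repeat=length):
            config, ok = start, True
            for c in choices:
                config = step(config, c)         # follow the c-th choice
                if config is None:               # choice not available here
                    ok = False
                    break
            if ok and accepting(config):
                return True

Because the sequences are enumerated by increasing length, any accepting run is eventually found after finitely many re-simulations, which is exactly what the track holding the choice sequences achieves in the construction above.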

6.2.5

Enumeration Machines

An enumeration machine is a modification of a Turing machine, and is described as follows:

1. It has two tapes: a read-write tape called the work tape, on which it does some operations, and a write-only output tape, on which it prints some output after doing something on the work tape.
2. It can only write symbols in Σ on the output tape.
3. It does not have any input. Also, it does not have any accept or reject states. It starts from a start state, with both tapes blank, and continues to make transitions just like a T.M.
4. When it prints some string on the output tape (the string is said to have been enumerated), the machine enters a special state called the Enumeration State. After writing the output, in the next transition, the output tape is automatically erased.
5. The machine may never enumerate any strings (L(E) = ∅), or it may enumerate infinitely many strings. The strings being enumerated can be repeated.
6. Thus, the machine runs forever.

6.2.6

Equivalence of the Turing machines and Enumeration machines

We now need to show that Turing machines and Enumeration machines are equivalent in terms of computational power. For this, we need to show a two-way equivalence:

1. Given an Enumeration machine E, there exists a Turing machine M such that L(M) = L(E). 1

We now show the construction of this Turing machine M that will accept all the strings which are enumerated by E. Let M have a tape with three tracks. On the first two tracks, we can simulate E, by making one of them the work tape and the other the output tape. The third track is left for the Turing machine M. On input x, M will copy the input to the third track, and run E on the other two tracks. E will now run and occasionally output strings on the output track. Each time E enumerates a string, M will check it against the string on its third track. If they match, M accepts x. Otherwise, M waits (indefinitely) until the string is enumerated.
Now we see that if a string is enumerated by E, then it will be accepted by M, and M will only accept strings that are enumerated by E. Hence we have shown that for every E there exists M such that L(M) = L(E).

2. Given a Turing machine M, there exists an Enumeration machine E such that L(E) = L(M).

We now show the construction of the enumeration machine E that will enumerate only those strings that are accepted by M. Consider an enumeration machine with a 2-track work tape, where the first track is used to simulate M, and the second is used by E itself. The enumeration machine E will now enumerate all the strings in Σ∗ on the second track of the work tape.
Intuitively, we would expect the Enumeration machine E to copy each of the strings, one by one, onto the first track of the work tape and simulate M on this string. If M accepts the string, then the enumerator prints the string on its output tape. But there is a subtle issue here. We failed to consider the possibility that M is not a Total Turing machine. So M need not accept or reject some strings; it could keep looping indefinitely, in which case E would be stuck simulating M on x forever and would never move on to strings later in the list. (Also, it is impossible to determine whether M halts on x 2 ).
The flaw in this procedure is that we are simulating M on each of the strings, one after the other. To avoid getting stuck on one string, we must use some sort of time-sharing procedure where we run several simulations, doing a bit of each simulation at a time. Consider the following steps:


Divide the work track of E into several segments, separated by some symbol # not in Σ. The length of these segments can be arbitrary. Our goal is now to simulate M on several strings, using the different segments of the tape, doing one step of each simulation at a time.

In the first segment of the tape, let E run one step of the simulation of M on the first input string. Then let E run one step, each, of the simulation of M on the first and second input strings. Then let E run one step, each, of the simulation of M on the first, second and third input strings. This process goes on forever.

If M halts on some string (say after m steps), then that string will be accepted (or rejected) after m steps of this simulation. If M does not halt on some string, then E does not get stuck on that string; it simply performs one more step of the simulation of M on that string in each round. Therefore, strings which are accepted by the Turing Machine will be enumerated after a finite number of simulation steps; and on strings on which M loops, E will also loop (they are never enumerated).
Thus, we have now constructed an Enumeration machine that will enumerate all the strings accepted by M.

Therefore, from this two-way equivalence, we can conclude that an Enumeration Machine and a Turing Machine are equivalent in terms of their computational power.
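The time-sharing ("dovetailing") step used in the second construction can be sketched in a few lines of Python (an illustration under stated assumptions, not the notes' machine): make_run(x) is an assumed generator that yields None while M is still running on x and finally yields True or False.

# Hypothetical sketch of the dovetailed enumerator: each round starts one new
# string and advances every previously started simulation by one step, so a
# looping input never blocks the rest.
def enumerate_accepted(strings, make_run):
    """Lazily yield every string that M accepts, in dovetailed order."""
    runs = []                        # (input string, its paused simulation)
    stream = iter(strings)
    while True:
        try:
            x = next(stream)         # start simulating one new string ...
            runs.append((x, make_run(x)))
        except StopIteration:
            pass
        for item in list(runs):      # ... and advance every old one a step
            x, run = item
            verdict = next(run)
            if verdict is True:
                yield x              # "enumerate" x
            if verdict is not None:  # halted (accepted or rejected)
                runs.remove(item)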

6.3

Universal Turing machines

Universal Turing machines are Turing machines which are structurally like any other Turing machine, but whose task is to simulate other Turing machines. They take as input a Turing machine M (rather, a description of a Turing machine) and an input string x, and simulate what M would do on input x. Their output is the same as M's output on input x. Now, we know that Turing machines can only accept
1
In fact, the name Recursively Enumerable comes from the fact that the language can be enumerated by some Enumeration
machine.
2
This is the famous Halting problem

strings as inputs. So, in order to input a description of a Turing machine, we need to find some encoding of Turing Machines as strings. As a matter of convenience, we choose this encoding of Turing machines to use strings over the alphabet {0, 1}.

6.3.1

Encoding Turing machines over {0, 1}

To encode the description of a Turing machine as a string, we must find a way of representing the nine elements of (eq. 6.1): (Q, Σ, Γ, s, t, r, ⊢, ␣, δ). Let us first see how the first eight components are encoded.
Let there be n states labelled 1 . . . n, out of which the s-th state is the start state, and the t-th and r-th states are the accept and reject states respectively. All we need to do to encode the set of states is to represent n using the alphabet {0, 1}. This can be done trivially by putting 0^n. Similarly, we put 0^s, 0^t and 0^r to represent the start, accept and reject states respectively. (The alphabet symbol 1 need not even be used.)
Let there be m input alphabet symbols labelled 1 . . . m and k tape alphabet symbols labelled 1 . . . k, out of which the u-th and v-th symbols represent the left end-marker and the blank symbol respectively. Similar to the above case, the k tape symbols can be represented as 0^k, and we can put 0^u and 0^v to represent the left end-marker and the blank symbol respectively.
The transition function can be encoded using the encoding of states as described previously. A transition of the form δ(p, a) = (q, b, L) can be encoded as 0^p 1 0^a 1 0^q 1 0^b 1 0, where the last symbol is 0 if the move is L and 1 if it is R. Similarly, we can encode all the transitions as strings in {0, 1}∗.


Now we have encoded all the components of the Turing machine. We just need to put together these components, separating each description by a 1. Hence:

(Q, \Sigma, \Gamma, s, t, r, \vdash, \sqcup) \;\rightarrow\; 0^n\,1\,0^m\,1\,0^k\,1\,0^s\,1\,0^t\,1\,0^r\,1\,0^u\,1\,0^v    (6.11)

Along with the above description, we must append the encoding of all the transitions as strings in {0, 1}∗.
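As a rough illustration of this unary-with-separators scheme, here is a small Python encoder (a hypothetical helper, not the notes' exact convention; in particular, a real encoding needs extra separators between consecutive transitions, which this sketch glosses over).

# Hypothetical encoder following the scheme described above.
# States are numbered 1..n and tape symbols 1..k.

def unary(i):
    return "0" * i

def encode_tm(n, m, k, s, t, r, u, v, transitions):
    """transitions: list of (p, a, q, b, d) with d in {'L', 'R'}."""
    header = "1".join(unary(x) for x in (n, m, k, s, t, r, u, v))
    body = "".join(
        unary(p) + "1" + unary(a) + "1" + unary(q) + "1" + unary(b) + "1" +
        ("0" if d == "L" else "1")          # last bit: 0 for L, 1 for R
        for (p, a, q, b, d) in transitions)
    return header + "1" + body

# e.g. a single transition delta(q2, a1) = (q3, a1, R):
print(encode_tm(3, 1, 2, 1, 2, 3, 1, 2, [(2, 1, 3, 1, "R")]))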

6.3.2

Working of a Universal Turing machine

A universal Turing machine takes an encoding of a Turing machine M and an input string x. To differentiate between the encoding and the input, the two are separated by a # symbol that is part of the tape alphabet of the UTM U. Hence the input is represented as M#x. U has a 3-track tape. On the first track, it stores the description of the Turing machine M and the input x. The second track is used as the tape of the Turing machine M: the contents of the simulation are stored on this track. The third track is used to store the current state and the position of the read-write head of M. Depending on the transition function, the contents of the second and third tracks are updated.
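The simulation loop that the UTM performs can be sketched as follows (a minimal Python illustration, assuming the transition function is already decoded into a lookup table; the real UTM keeps these variables on its three tracks rather than in memory).

# Hypothetical sketch of the UTM's inner loop.
def simulate(delta, start, accept, reject, blank, x, max_steps=10_000):
    """delta maps (state, symbol) -> (state, symbol, 'L' or 'R')."""
    tape = dict(enumerate(x))        # track 2: M's tape contents
    state, head = start, 0           # track 3: M's state and head position
    for _ in range(max_steps):       # a real UTM never stops early
        if state == accept:
            return True
        if state == reject:
            return False
        sym = tape.get(head, blank)
        state, tape[head], move = delta[(state, sym)]
        head += 1 if move == "R" else -1
    return None                      # undecided within max_steps

# toy machine accepting exactly the strings that start with 'a'
delta = {("s", "a"): ("t", "a", "R"), ("s", "b"): ("r", "b", "R")}
print(simulate(delta, "s", "t", "r", "_", "ab"))   # True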

6.4

Set operations on Turing machines

Given below are some set operations that can be applied to Turing machines, as they represent the set of R.E. languages. These can also be thought of as set operations on R.E. languages. The given set operations are for two Turing machines; they can, however, be generalized to involve more than two. In many of these set operations, we need to distinguish between Total Turing machines and general Turing machines (which include the non-halting machines as well as the halting ones), which is the same as distinguishing between Recursive and Recursively Enumerable languages.

6.4.1

Union of two Turing machines

Let the two Turing machines whose union needs to be taken be M1 and M2. We now need to find some Turing machine M such that L(M) = L(M1) ∪ L(M2).

1. Consider a Turing machine with a 4-track tape. On the first track, we have the encoding of M1. On the second track, we have the encoding of M2. On the third and fourth tracks, we each copy the input x. The last two tracks act as the read-write tapes that M1 and M2 will use. Let M be the new Turing Machine formed.
2. Now, we simulate M1 (using the description of M1 on the first track and a Universal Turing machine) on the input x, on the third track. Then we simulate M2 on the input x on the fourth track. If either of the two simulations results in acceptance of the input string, the string is accepted by M.
There is a subtle issue here. There may be cases where the Turing machine M1 loops indefinitely on input x but M2 accepts x. With the procedure described above, we would get stuck in the simulation of M1, never realizing that M2 would actually halt and accept x. To avoid these situations, we need to follow a time-sharing procedure. That is, we need to simulate one step of M1 and one step of M2 alternately on the input x, on the last two tracks. Now, if one simulation loops, it does not prevent the other simulation from running. Therefore, for a string in L(M), at least one of the two machines halts and accepts in finite time, and the string is accepted. For strings not in L(M), if both of them reject, the new Turing machine also rejects; and if both M1 and M2 loop (or one rejects and the other loops) on some input, M also loops.
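A compact way to see the alternation is the following Python sketch (an assumption-laden illustration: run1 and run2 are generators that yield None while the corresponding machine is still running on x, then yield True for accept or False for reject).

# Hypothetical sketch of the time-shared union machine.
def union_accepts(run1, run2):
    """Accept as soon as either simulation accepts; reject only when both
    have halted and rejected; otherwise run forever (i.e. loop)."""
    runs = [run1, run2]
    while runs:
        for run in list(runs):
            verdict = next(run)       # one step of this machine
            if verdict is True:
                return True           # either machine accepting suffices
            if verdict is False:
                runs.remove(run)      # this machine rejected; drop it
    return False                      # both halted and rejected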

6.4.2

Intersection of two Turing machines

To take the intersection of two Turing machines M1 and M2, we need to follow a similar procedure as above, except that no time-sharing technique is required. We can simulate M1 and M2, one after the other, on two of the tracks of M. The input is accepted by the new Turing machine only if both simulations result in acceptance. Now, if M1 loops on some string, it is true that we will be stuck simulating M1 and never begin simulating M2. But the criterion for acceptance requires both M1 and M2 to accept. So, if M1 does not accept and keeps looping, there is no use in simulating M2, and M might as well keep looping.

6.4.3

Complement of a Turing machine

To take the complement of a Total Turing machine, it is enough to make the accept states reject states and vice versa. The problem with a non-Total Turing machine is that it does not halt on all inputs. For a non-Total Turing machine, not accepting and rejecting are not synonymous: when it does not accept some input, it need not also reject; it could keep looping on that particular input. So, if we interchange the accept and the reject states, then the new non-Total machine will reject strings that the old machine accepted and accept strings that the old machine rejected; but on strings on which the old machine loops, the new machine will also loop. As a result, it will neither accept nor reject some strings. Therefore, the new machine is not even a valid Turing machine for the complement language.
Theorem: If M is a Turing machine such that M̄ (a Turing machine accepting the complement language) is also a Turing machine, then M is a total Turing machine.
Proof:
1. If M and M̄ are Turing machines, then (from section: 6.2.6) we can have two enumeration machines E and Ē, that can enumerate all the strings in L(M) and L(M̄) respectively.
2. If L(M) is recursively enumerable and we show that, given an arbitrary input string x, it is possible to determine whether x ∈ L(M) or x ∉ L(M), then L(M) is recursive.
Consider a universal Turing machine U that takes as input M#M̄#x. U is a 6-track Turing machine.

- The first and fourth tracks are used to store the descriptions of M and M̄, together with x, respectively.
- The second and fifth tracks are used as read-write tapes to store the contents of the simulations of M and M̄ on x, respectively.
- The third and sixth tracks are used to keep track of the states and the read-write head positions of M and M̄, respectively.
Now, given an input x, U runs both M and M̄ on x, employing a time-sharing procedure.
If the simulation of M results in an acceptance, then x ∈ L(M). If the simulation of M̄ results in an acceptance, then x ∉ L(M). There is no scope for looping here, because M and M̄ must accept all the strings in L(M) and L(M̄) respectively, and every string lies in one of the two languages.

Therefore, we have found a method to accurately determine whether an arbitrary string x is in L(M) or not. Therefore, L(M) is a recursive language and M is a Total Turing machine.
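The lock-step procedure in the proof can be written out as a tiny Python sketch (hypothetical interfaces: run_m and run_mbar are generators that yield None until the corresponding machine accepts x, then yield True).

# Hypothetical sketch of the theorem's decision procedure: run M and M-bar on
# x in lock-step; exactly one of them must eventually accept, so this loop
# always terminates with a definite answer.
def decide(run_m, run_mbar):
    while True:
        if next(run_m) is True:
            return True      # x is in L(M)
        if next(run_mbar) is True:
            return False     # x is in the complement, so x is not in L(M)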

6.4.4


Concatenation of two Turing machines

Let M1 and M2 be Turing machines accepting L(M1) and L(M2) respectively. Now we need to find a Turing machine that will accept the language L(M1) · L(M2). To do this, we simply make the final states of M1 non-final and put an ε-transition from these states (the old final states of M1) to the start state of M2. The new machine formed will now accept all x where x ∈ L(M1) · L(M2).

6.5

Halting Problem

The halting problem is a famous problem for Turing machines. Informally, the halting problem asks whether there exists some universal Turing machine that can take as input some Turing machine M and an arbitrary input x, and determine whether M halts (or loops) on the input x. Formally, the halting problem is: Is the language

HP = \{M\#x \mid M \text{ halts on } x\}    (6.12)

recursive?
It is obviously Recursively Enumerable, as we have the universal Turing machine, which can determine whether x is accepted by M; for this, we just need the UTM to simulate M on x. The whole problem is to determine whether M will reject x, or will loop indefinitely.
[Table 6.2: rows are the Turing machines M, M0, M1, M00, M01, M10, M11, M000, M001, M010, . . . ; columns are the input strings 00, 01, 10, 11, 000, 001, 010, . . . ; each cell contains H or L.]
Table 6.2: Table showing a possible output table of the (hypothetical) Total Turing Machine that can solve
the Halting Problem. The (i, j) cell denotes whether the Turing machine Mi will halt on input j. H stands
for Halt and L for Loop.


Figure 6.6: Output table for the Turing Machine K. Notice the diagonal: the machine K differs from each of the Mi's in the i-th position, thereby contradicting the fact that K is a Turing machine.
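The diagonal argument sketched by the table and figure can also be phrased as a short program (a hypothetical Python sketch, not from the notes): if a total procedure halts(program, data) solving the Halting Problem existed, the machine K built below would halt on its own description exactly when it does not, a contradiction.

# Hypothetical sketch of the diagonal machine K.
def make_K(halts):
    def K(source):
        if halts(source, source):   # K looks up the diagonal entry ...
            while True:             # ... and does the opposite: loop
                pass
        return "halted"             # ... or halt
    return K

# K(source_of_K) halts  <=>  halts(source_of_K, source_of_K) is False,
# contradicting the assumed correctness of `halts`.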

6.5.1

Membership Problem

6.6

Decidability and Undecidability

6.7

Quantum Turing Machines


Chapter 7

Computational Complexity
Chapter not yet started.

The Topics to be covered are:

1. Definition of Complexity classes P and NP. NP-Completeness. Two equivalent definitions of NP languages.
2. Reductions. Closure of P and NP under reductions.

3. Cook-Levin Theorem: Proof.


4. The P = NP problem.

5. Examples of NP-Complete problems: Proof by reduction to other (already proved NP-Complete languages).

(a) TM-SAT

(b) k-clique

(c) Independent Set

(d) Vertex cover

(e) Hamiltonian Path and Hamiltonian Cycle

6. Definition and examples of languages in the class co-NP.


7. The Decision vs. Search problem and the particular case for the NP-Complete languages.
8. Definition of Space Complexity classes: DSPACE, NSPACE.
9. Time and Space constructible functions

10. Deterministic Space Hierarchy theorem: Proof by diagonalization.


11. Configuration graphs and the method of crossing sequences.
12. Deterministic Time Hierarchy theorem: Proof by diagonalization, using the following ideas:
(a) Simulation of k-tape Turing machine using a
- 1-tape turing machine with a quadratic slowdown - Bound proved using crossing sequences
argument.
- 2-Tape turing machine with a logarithmic slowdown - moving the data in cells instead of the
read/write head - Bi operations.
13. Non-Deterministic Time Hierarchy Theorem. Proof by Lazy-diagonalization.
14. Immerman-Szelepcsényi Theorem: NSPACE = coNSPACE

(a) Proof that PATH is NL-Complete. Then using this to give an algorithm for deciding the complement of PATH: guessing the number of vertices reachable from the start vertex and verifying the guess, thereby proving that the complement of PATH is also NL-Complete. Hence NL = coNL.
15. Proof that 2-SAT is in P, and that 2-SAT is NL-Complete.

16. Various Time and Space complexity classes. The upward and the downward translational lemmas.
17. Logarithmic and sub-logarithmic space bounds: the classes L and NL. Logspace reductions, logspace-constructible functions, logspace transducers.
18. Savitch's Theorem: PSPACE = NPSPACE. Proof using graph reachability (the PATH problem).
19. PSPACE-Completeness:
(a) True Quantified Boolean Formulae (TQBF): proof that TQBF is PSPACE-Complete.
(b) Generalized Geography (GG): proof that GG is PSPACE-Complete.
20. Oracle Turing machines
(a) Definition of an Oracle Machine
(b) Relations between different Complexity classes imposed by oracle machines, e.g. NP^TQBF = P^TQBF.
(c) Baker-Gill-Solovay Theorem: there exists an oracle A such that P^A = NP^A.

Y
P
O
C
T
F
A

21. Polynomial Time Hierarchy


(a) FILL

(b) Definition of the Complexity classes Σ₂^P and Π₂^P, and their properties.


22. Definition of the Complexity class P/poly.

(a) Properties and theorems involving the class P/poly, and their proofs:
- SAT ∈ P/poly implies PH = Σ₂^P
- No subset of {1}∗ is NP-Complete unless P = NP.


Part V

INFORMATION THEORY


Chapter 8

Fundamentals of Information
Theory
8.1

Introduction


There are two types of formalisms for this theory. One is due to Shannon, where the information stored in an event is measured using the uncertainty associated with the probability of that event. Another is due to Kolmogorov/Chaitin, where the amount of information stored in an object is proportional to the number of bits needed to describe (compress) that object.
Whenever we talk about the information stored in an object, we mean the information stored in some random variable: the value that a random variable takes is the information stored in that random variable. We now attempt to quantify this information stored in the random variable, using Shannon's approach.

8.2

Axiomatic Definition of the Shannon Entropy

Before stating the axioms that our measure of uncertainty must satisfy, we need to state some definitions
that we will use throughout:
1. Let X be a discrete random variable (source of information) that takes the values {x1, x2, . . . , xn} with probabilities {p1, p2, . . . , pn} respectively. The event {X = xi} occurs with probability pi.

2. Let us assume that \sum_i p_i = 1 and p_i > 0 for all i.

3. Define the function: h(pi) = uncertainty in the event {X = xi}.

4. Define another function of n variables: H(p1, p2, . . . , pn) = average uncertainty associated with all the events {X = xi}. We have the relation: H(p_1, p_2, \ldots, p_n) = \sum_i p_i h(p_i). For convenience, let us write H(X) instead of H(p1, p2, . . . , pn). H(X) can be interpreted as an average of all the uncertainties associated with {X = xi}, for all i, with each quantity being weighted by its probability.

5. Define a function f(N), the average uncertainty associated with N independent and equally probable events: f(N) = H(1/N, 1/N, . . . , 1/N).

Now that we have the necessary definitions, we state some basic properties that we demand the measure of uncertainty, H(X), must satisfy:
1. f(N) must be a monotonically increasing function of N. This is because, intuitively, the uncertainty associated with an event occurring out of N equally likely events should increase as N increases.
2. For any two mutually independent random variables X and Y, where X takes N values and Y takes M values, we must have: f(MN) = f(M) + f(N). This comes from the fact that the joint probability of two mutually independent events is the product of their individual probabilities, so the corresponding uncertainties should add.
3. We now go back to the function H(X). Consider the following experiment: the set of values that X can take is divided into two groups A and B. The experiment is repeated to obtain the values x1, x2, . . . .

Figure 8.1: Figure showing the compound experiment.

Now the probability of obtaining a value of X, say P{X = xi}, is: P{X = xi} = P{A is chosen}·P{xi | A is chosen} + P{B is chosen}·P{xi | B is chosen}. Therefore, (uncertainty about the value of X obtained) = (uncertainty about which group is chosen) + the sum over groups of (probability of that group being chosen) × (uncertainty of finding xi within that group).
There are only 2 groups that can be chosen, with probabilities \sum_{i=1}^{r} p_i and \sum_{i=r+1}^{N} p_i respectively. The uncertainty associated with this event is H\left(\sum_{i=1}^{r} p_i,\ \sum_{i=r+1}^{N} p_i\right). Similarly, from the diagram we can see the probability of xi being chosen given the group. Therefore, the uncertainty associated with finding xi in group A is H\left(\frac{p_1}{\sum_{i=1}^{r} p_i}, \frac{p_2}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i}\right), and that of finding xi in B is H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{N} p_i}, \frac{p_{r+2}}{\sum_{i=r+1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=r+1}^{N} p_i}\right).

Therefore, we have:

H(X) = H(p_1 + p_2 + \cdots + p_r,\; p_{r+1} + p_{r+2} + \cdots + p_N) + (p_1 + p_2 + \cdots + p_r)\, H\left(\frac{p_1}{\sum_{i=1}^{r} p_i}, \frac{p_2}{\sum_{i=1}^{r} p_i}, \ldots, \frac{p_r}{\sum_{i=1}^{r} p_i}\right) + (p_{r+1} + p_{r+2} + \cdots + p_N)\, H\left(\frac{p_{r+1}}{\sum_{i=r+1}^{N} p_i}, \frac{p_{r+2}}{\sum_{i=r+1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=r+1}^{N} p_i}\right)

Hence, we now have another requirement that H(X) must satisfy.

4. Finally, we demand that H(p, 1 − p) is a continuous function of p. That is, the effect of a small change in the probability of some event must also be a small change in the uncertainty associated with that event.
With the above requirements, we state the theorem:
Theorem: The functional form that satisfies all the above axioms is:

H(p_1, p_2, \ldots, p_N) = -C \sum_{i=1}^{N} p_i \log p_i    (8.1)

where pi stands for P{X = xi}. The base of the logarithm and the constant C > 0 are arbitrary. We now need to prove the above theorem by showing explicitly that the above axioms are satisfied by the functional form in (eq. 8.1).


1. From (eq. 8.1), we can see that

f(N) = H\left(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}\right) = -C \sum_{i=1}^{N} P\{X = x_i\} \log P\{X = x_i\}
     = -C \left[\frac{1}{N}\log\frac{1}{N} + \frac{1}{N}\log\frac{1}{N} + \cdots + \frac{1}{N}\log\frac{1}{N}\right]
     = -C \log\frac{1}{N} = C \log N    (8.2)

Now, we know that C > 0 and N > 0. Since log N is a monotonically increasing function of N, we conclude that f(N) is also a monotonically increasing function of N. Therefore, the first axiom is satisfied by the functional form in (eq. 8.1).

2. The second axiom concerns two mutually independent random variables. Consider them to be X and Y, taking N and M equally likely discrete values respectively. Once again we consider the functions f(N) and f(M), as defined in (eq. 8.2). Since X and Y are mutually independent, we have P{X = xi, Y = yj} = P{X = xi} · P{Y = yj}. From (eq. 8.2):

f(NM) = -C \log(P\{X = x_i, Y = y_j\})
      = -C\,[\log P\{X = x_i\} + \log P\{Y = y_j\}]
      = -C \log P\{X = x_i\} - C \log P\{Y = y_j\}
      = C \log N + C \log M \equiv f(N) + f(M)

Hence, the second axiom is also satisfied by the functional form in (eq. 8.1).
3. We now need to show that the grouping axiom (axiom 3) is satisfied. We can show this by induction on N. Before assuming that the theorem holds for N, we need to show that the N = 2 case is satisfied. For the N = 2 case, we have:

H(p_1, p_2) = H(p_1, p_2) + p_1 H(p_1) + p_2 H(p_2) = H(p_1, p_2)
since H(p1) = H(p2) = 0 1 . Hence, we see that the grouping axiom trivially holds for the N = 2 case. Let us now assume that the formula holds for N; for the N + 1 case, we consider the values {x1, x2, . . . , xN+1} to be split into two groups, one consisting of {x1, . . . , xN} and the other having {xN+1}. Therefore, we write:

H(p_1, p_2, \ldots, p_{N+1}) = H(p_1 + \cdots + p_N,\; p_{N+1}) + (p_1 + \cdots + p_N)\, H\left(\frac{p_1}{\sum_{i=1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=1}^{N} p_i}\right) + p_{N+1}\, H(p_{N+1})

We have H(p_{N+1}) = 0 (see footnote 1). From (eq. 8.1), the LHS of the above equation equals -C\sum_{i=1}^{N+1} p_i \log p_i. We now need to show that the RHS is the same. Consider the first term in the RHS:

H(p_1 + p_2 + \cdots + p_N,\; p_{N+1}) = -C\left[(p_1 + p_2 + \cdots + p_N)\log\left(\sum_{i=1}^{N} p_i\right) + p_{N+1}\log p_{N+1}\right]
   = -C\left(\sum_{i=1}^{N} p_i\right)\log\left(\sum_{i=1}^{N} p_i\right) - C\, p_{N+1}\log p_{N+1}    (8.3)


The second term of the RHS gives:

\left(\sum_{i=1}^{N} p_i\right) H\left(\frac{p_1}{\sum_{i=1}^{N} p_i}, \ldots, \frac{p_N}{\sum_{i=1}^{N} p_i}\right) = -C\sum_{i=1}^{N} p_i \log\frac{p_i}{\sum_{j=1}^{N} p_j}
   = -C\sum_{i=1}^{N} p_i \log p_i + C\left(\sum_{i=1}^{N} p_i\right)\log\left(\sum_{i=1}^{N} p_i\right)    (8.4)

We can now put together the simplified expressions of the two terms in the RHS; from equations (eq. 8.3) and (eq. 8.4) we get:

RHS = -C\left(\sum_{i=1}^{N} p_i\right)\log\left(\sum_{i=1}^{N} p_i\right) - C\,p_{N+1}\log p_{N+1} - C\sum_{i=1}^{N} p_i\log p_i + C\left(\sum_{i=1}^{N} p_i\right)\log\left(\sum_{i=1}^{N} p_i\right)
    = -C\sum_{i=1}^{N} p_i\log p_i - C\,p_{N+1}\log p_{N+1}
    = -C\sum_{i=1}^{N+1} p_i\log p_i
    = H(p_1, p_2, \ldots, p_{N+1})

Therefore, we have shown that the RHS simplifies to the same expression as the LHS, and hence the grouping axiom is also satisfied by the functional form in (eq. 8.1).
1 When H is a function of p1, p2, . . . , pn, we must have the condition that \sum_{i=1}^{n} p_i = 1. Similarly, if H is a function of a single variable p, then we must have the condition that p = 1. If p = 1, then the corresponding event is certain and, intuitively, we require that the uncertainty associated with that event occurring is 0.

4. We now need to show that the uncertainty function is a continuous function of its parameters. We show this for the N = 2 case, since the cases for N > 2 then follow. For the N = 2 case, we have p1 = p and p2 = 1 − p, and:

H(p, 1-p) = -C\,[\,p\log p + (1-p)\log(1-p)\,]

We need to show that the above function is continuous in p (we have already assumed that p > 0). To show that it is continuous, we can show that it is differentiable for all p in (0, 1). On differentiating H(p, 1 − p), we get:

\frac{d}{dp} H(p, 1-p) = -C\,[\,(\log p + 1) - (\log(1-p) + 1)\,] = -C\log\frac{p}{1-p}

We know that in the open interval (0, 1) the function log p is well defined. Therefore, H(p, 1 − p) is differentiable, and hence continuous, for all p in (0, 1). Therefore, we have shown that the functional form in (eq. 8.1) satisfies axiom four.
Hence, we conclude that the definition of the measure of uncertainty proposed in (eq. 8.1) is indeed the right measure of uncertainty according to our axioms. We take the base of the logarithm to be 2 and C = 1, by convention. Therefore, we have:

H(X) = -\sum_{i} p_i \log p_i    (8.5)

The above measure of uncertainty given by H is Shannon's measure of uncertainty, and the quantity H(X) is called the Shannon Entropy of X, or the Communicational Entropy of X.
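A minimal Python sketch of (eq. 8.5), with the conventional base-2 logarithm and C = 1, together with a numerical check of the grouping axiom for one example distribution (the distribution and the split into groups are arbitrary choices, not from the notes):

from math import log2

def H(*p):
    """Shannon entropy -sum p_i log2 p_i of a probability distribution."""
    assert abs(sum(p) - 1) < 1e-9
    return -sum(x * log2(x) for x in p if x > 0)

p = (0.5, 0.2, 0.2, 0.1)
a, b = sum(p[:2]), sum(p[2:])                 # split into groups A and B
grouped = H(a, b) + a * H(p[0]/a, p[1]/a) + b * H(p[2]/b, p[3]/b)
print(abs(H(*p) - grouped) < 1e-9)            # True: grouping axiom holds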

8.3

Interpretations of the Uncertainty Function


1. From the definition of the Shannon Entropy (eq. 8.5), we have:

H(X) = -\sum_i p_i \log p_i = \sum_i p_i w_i

where w_i is the value taken by the random variable W(X) = -\log P\{X = x_i\}. Therefore, we have:

H(X) = E(W(X))    (8.6)

where E stands for the expectation value. Therefore, the uncertainty associated with the random variable X is the expectation value of a random variable W(X) defined as W(X) = -\log P\{X = x_i\}.

2. Let us consider the following experiment. A coin is tossed, giving heads (0) with probability p and tails (1) with probability (1 − p). Our task is to find out the outcome of a particular run of the experiment. All that we can do is ask questions about the coin, whose answers will be a yes or a no. We must find the outcome of the coin toss by asking the minimum average number of questions.
Let us introduce a slight technicality: let the coin toss be replaced by a random variable X that takes the values 0 and 1 with probabilities p and (1 − p) respectively. Consider the trivial case where the coin is tossed once and we need to guess the outcome. The trivial (and only) question to ask would be: is X = 0? If we receive a yes, we know that the outcome was 0, and if we receive a no, the outcome was 1. Therefore, by asking one question, we have guessed the outcome of one coin toss, thereby asking an average of one question per coin toss. The question obviously arises: can we do better, and if so, what is the best we can do? To answer the first question, consider another slightly non-trivial example. In the figure below we have the situation where we wait for two coin tosses and then guess the outcome of both the tosses simultaneously.

Figure 8.2: Experiment involving two coin tosses. The guess is made simultaneously on the results of both
the tosses.


Figure 8.3: Experiment involving three coin tosses. The guess is made simultaneously on the results of all
the tosses.

For the case of two, three and four coin tosses, if the probability of getting X = 0 is p, then we see that
the average number of questions, per coin toss, to be asked is:


Table 8.1: Average number of questions to be asked to determine the result of one coin toss

Tosses | Average number of questions per coin toss | Value at p = 0.95
2 | (x^2 + 2(1-x)x + 3(1-x)x + 3(1-x)^2)/2 | 0.57
3 | (x^3 + 3((1-x)^2 x + (1-x)^3) + 4(x^2(1-x) + 2x(1-x)^2) + 5(2x^2(1-x)))/3 | 0.51
4 | (x^4 + 4(2x^3(1-x) + (1-x)^4) + 5(2x^3(1-x) + 4x(1-x)^3 + 2x^2(1-x)^2) + 6(4x^2(1-x)^2))/4 | 0.41

Here is a graph showing the coin toss results:


Figure 8.4: Graph showing the average number of questions that need to be asked per coin toss, to determine
the value of the toss.
Indeed, one can apply a similar method for the case of five or more coin tosses. We can clearly see that, by increasing the number of coin tosses, one can determine the result of a toss by asking less than one question per coin toss. The claim is that, after a certain point, one cannot do better. This limit depends on p, the probability of obtaining a heads, and it is the functional value H(p) (from eq. 8.5). Therefore, restating our claim, we say that if the probability of heads is p, then H(p) is the minimum average number of questions per toss that one must ask to determine the value of X. This is another interpretation of the uncertainty function.
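The trend in the table can be reproduced numerically. The sketch below assumes that the optimal yes/no questioning strategy for a block of n tosses is a Huffman code over the 2^n outcomes (each question revealing one code bit); this modelling choice and all names are illustrative assumptions, not part of the notes.

import heapq
from itertools import product
from math import log2

def huffman_avg_length(probs):
    """Expected codeword length of a Huffman code for the distribution."""
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        q1, q2 = heapq.heappop(heap), heapq.heappop(heap)
        total += q1 + q2          # each merge adds one question to this mass
        heapq.heappush(heap, q1 + q2)
    return total

p = 0.95
H_p = -p * log2(p) - (1 - p) * log2(1 - p)
for n in (1, 2, 3, 4, 8):
    outcome_probs = [p**seq.count(0) * (1 - p)**seq.count(1)
                     for seq in product((0, 1), repeat=n)]
    avg = huffman_avg_length(outcome_probs) / n
    print(n, round(avg, 3), "vs H(p) =", round(H_p, 3))

For p = 0.95 this reproduces roughly 1.0, 0.57 and 0.51 questions per toss for 1, 2 and 3 tosses, approaching (but never beating) H(0.95) ≈ 0.29 as the block length grows.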
3. There is another interpretation, similar to the previous one. Consider X to be a discrete random variable that takes values {x1, x2, . . . , xn} with probabilities {p1, p2, . . . , pn}. Consider n experiments with X, or equivalently, consider n independent, identically distributed random variables {X1, X2, . . . , Xn}. Let fi(X1, X2, . . . , Xn) be a function that gives the number of
times xi occurs in the sequence (X1, X2, . . . , Xn). The value fi(X1, X2, . . . , Xn) takes will certainly depend on the values that X1, X2, . . . , Xn assume in the experiment. We can, however, compute the average number of occurrences of xi. So, consider all the cases:
P(xi occurring once) ≡ P(any one Xj = xi, and all other Xj ≠ xi) = {}^{n}C_1\, p_i (1-p_i)^{n-1}
Similarly, P(xi occurring twice) ≡ P(any two of the Xj = xi and all other Xj ≠ xi) = (number of ways of choosing 2 Xj's from n X's) × (probability of two of them being xi) × (probability of the rest not being xi) = {}^{n}C_2\, p_i^2 (1-p_i)^{n-2}
And so on. We have: P(xi occurring m times) ≡ (number of ways of choosing m Xj's from n X's) × (probability of m of them being xi) × (probability of the rest not being xi) = {}^{n}C_m\, p_i^m (1-p_i)^{n-m}
Therefore, we see that the average number of occurrences is:
E(X) = {}^{n}C_1\, p_i(1-p_i)^{n-1} + 2\,{}^{n}C_2\, p_i^2(1-p_i)^{n-2} + 3\,{}^{n}C_3\, p_i^3(1-p_i)^{n-3} + \cdots + n\,{}^{n}C_n\, p_i^n
     = \sum_{m=1}^{n} m\,{}^{n}C_m\, p_i^m (1-p_i)^{n-m}
     = \sum_{m=1}^{n} m\,\frac{n!}{m!(n-m)!}\, p_i\, p_i^{m-1} (1-p_i)^{n-m}
     = n p_i \sum_{m=1}^{n} \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^{m-1} (1-p_i)^{n-m}
     = n p_i \sum_{m=1}^{n} {}^{n-1}C_{m-1}\, p_i^{m-1} (1-p_i)^{n-m}
     = n p_i \left(p_i + (1-p_i)\right)^{n-1}
     = n p_i

Therefore, we see that, on average, xi occurs npi times. There will certainly be some cases where the number of xi's is not npi. The number of such sequences can be estimated using the standard deviation of this random variable. The variance is defined by: \sigma^2 = E(X^2) - (E(X))^2.
\sigma^2 = \sum_{m=1}^{n} m^2\,{}^{n}C_m\, p_i^m (1-p_i)^{n-m} - (np_i)^2
 = \sum_{m=1}^{n} m^2\,\frac{n!}{m!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2
 = \sum_{m=1}^{n} m\,\frac{n!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2
 = \sum_{m=1}^{n} n\,(m-1+1)\,\frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} - (np_i)^2
 = n\left[\sum_{m=1}^{n} (m-1)\,\frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m} + \sum_{m=1}^{n} \frac{(n-1)!}{(m-1)!(n-m)!}\, p_i^m (1-p_i)^{n-m}\right] - (np_i)^2
 = n\left[(n-1)p_i^2 \sum_{m=2}^{n} {}^{n-2}C_{m-2}\, p_i^{m-2} (1-p_i)^{n-m} + p_i \sum_{m=1}^{n} {}^{n-1}C_{m-1}\, p_i^{m-1} (1-p_i)^{n-m}\right] - (np_i)^2
 = n\left[(n-1)p_i^2\,(p_i + (1-p_i))^{n-2} + p_i\,(p_i + (1-p_i))^{n-1}\right] - (np_i)^2
 = n\left[(n-1)p_i^2 + p_i\right] - n^2 p_i^2
 = n^2 p_i^2 - np_i^2 + np_i - n^2 p_i^2

\sigma^2 = np_i(1-p_i)

Now consider Chebyshev's inequality, which states that P(|X - \mu| \geq \epsilon) \leq \frac{\sigma^2}{\epsilon^2}, where \mu and \sigma^2 are the mean and the variance (respectively) of the probability distribution of X. Applying Chebyshev's inequality to this case, we get:

P\left(\left|f_i(X_1, X_2, \ldots, X_n) - np_i\right| \geq \epsilon\right) \leq \frac{np_i(1-p_i)}{\epsilon^2}

Let k be some large number and choose \epsilon = k\sqrt{np_i(1-p_i)}. Then we have:

P\left(\left|f_i(X_1, X_2, \ldots, X_n) - np_i\right| \geq k\sqrt{np_i(1-p_i)}\right) \leq \frac{1}{k^2}    (8.7)

P\left(\frac{\left|f_i(X_1, X_2, \ldots, X_n) - np_i\right|}{\sqrt{np_i(1-p_i)}} \leq k\right) \geq 1 - \frac{1}{k^2}    (8.8)

The above statement (eq. 8.7) says that the probability of the frequency of xi (given by fi(X1, X2, . . . , Xn)) being within a distance of the order of k√n from npi is greater than 1 − 1/k². In other words, for a large value of k, the probability of the frequency of xi in {X1, X2, . . . , Xn} being close to npi is large. Of course, there are some sequences for which the frequency of xi deviates from npi by more than k√(npi(1−pi)), but these sequences occur with very low probability for a sufficiently large k. Let us now consider the sequences with high probability; from (eq. 8.8), we see that, for these sequences:

\left|f_i(X_1, X_2, \ldots, X_n) - np_i\right| \leq k\sqrt{np_i(1-p_i)}    (8.9)

The sequences {X1, X2, . . . , Xn} for which the above property holds are called Typical Sequences. They are sequences in which the difference between the actual and the expected frequencies of xi is of the order of √n, which is small compared to n, for large n. Taking the condition from (eq. 8.9), we can say:

f_i(X_1, X_2, \ldots, X_n) - np_i \leq k\sqrt{np_i(1-p_i)} \;\Rightarrow\; f_i(X_1, X_2, \ldots, X_n) \leq k\sqrt{np_i(1-p_i)} + np_i

Now, since npi is a positive number, we can say:

-k\sqrt{np_i(1-p_i)} + np_i \;\leq\; f_i(X_1, X_2, \ldots, X_n) \;\leq\; k\sqrt{np_i(1-p_i)} + np_i    (8.10)

In the above equation, we have found a lower as well as an upper bound for the frequency of xi, for every i, in a typical sequence. Let us now look at the probability of obtaining a typical sequence:
P{(X1, X2, . . . , Xn) is a typical sequence} = P{x1 occurring f1(X1, X2, . . . , Xn) times, x2 occurring f2(X1, X2, . . . , Xn) times, . . . , xn occurring fn(X1, X2, . . . , Xn) times}. Therefore, if we denote the probability of a typical sequence (X1, X2, . . . , Xn) by p(X), then:

p(X) = p_1^{f_1(X_1, X_2, \ldots, X_n)}\; p_2^{f_2(X_1, X_2, \ldots, X_n)} \cdots p_n^{f_n(X_1, X_2, \ldots, X_n)}

\log p(X) = f_1(X_1, \ldots, X_n)\log p_1 + f_2(X_1, \ldots, X_n)\log p_2 + \cdots + f_n(X_1, \ldots, X_n)\log p_n = \sum_i f_i(X_1, X_2, \ldots, X_n)\log p_i

Let us now take the inequality in (eq. 8.10), multiply each term by log pi (note that log pi ≤ 0, which reverses the inequalities) and sum over all i:

\sum_i np_i\log p_i + k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i \;\leq\; \sum_i f_i(X_1, X_2, \ldots, X_n)\log p_i \;\leq\; \sum_i np_i\log p_i - k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i

The middle term of the inequality is just log p(X), from the previous equation. Let k\sum_i \sqrt{np_i(1-p_i)}\,\log p_i = -A\sqrt{n}, for some positive constant A. Now, from (eq. 8.5), we have:

-nH(X) - A\sqrt{n} \;\leq\; \log p(X) \;\leq\; -nH(X) + A\sqrt{n}

where H(X) ≡ H(p1, p2, . . . , pn) 2 . Now, we can take antilogarithms of all the terms in the above inequality:

2^{-nH(X) - A\sqrt{n}} \;\leq\; p(X) \;\leq\; 2^{-nH(X) + A\sqrt{n}}

Therefore, we see that the probability of obtaining a typical sequence is indeed related to the Shannon entropy of the random variable. Furthermore, we want to find the total number of typical sequences of length n. Since the typical sequences carry essentially all of the probability and each has probability about p(X), this number is approximately 1/p(X), so we need upper and lower bounds for 1/p(X). For this, let us take the inverse of all the terms in the above inequality:

2^{nH(X) - A\sqrt{n}} \;\leq\; \text{number of typical sequences of length } n \;\leq\; 2^{nH(X) + A\sqrt{n}}    (8.11)

Hence, if we assume that the A√n term is negligible 3 , the number of typical sequences of length n ≈ 2^{nH(X)}. Therefore, we see that, for a discrete random variable X (taking n values),

H(X) = \frac{\log_2(\text{Number of typical sequences of length } n)}{n}

This is another interpretation of the Shannon Entropy of the random variable X.
2 We have assumed that X1, X2, . . . , Xn are independent, identically distributed random variables. Hence, from the definition of Shannon entropy, H(X1) = H(X2) = · · · = H(Xn). Hence, without any loss of generality, we may call all of them H(X).
3 More precisely, we assume that n is large enough that the factor 2^{A√n} is negligible compared to 2^{nH(X)}.
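A quick empirical check of this interpretation (a sketch with an arbitrarily chosen distribution, not from the notes): for i.i.d. draws, −(1/n) log2 p(X) concentrates around H(X) as n grows, which is exactly the statement that almost all of the probability sits on about 2^{nH(X)} typical sequences.

import random
from math import log2

def H(p):
    return -sum(q * log2(q) for q in p)

p = [0.5, 0.25, 0.25]          # H(p) = 1.5 bits
random.seed(1)
for n in (10, 100, 1000, 10000):
    seq = random.choices(range(len(p)), weights=p, k=n)
    log_prob = sum(log2(p[i]) for i in seq)   # log2 of the sequence probability
    print(n, round(-log_prob / n, 3), "vs H =", H(p))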


Part VI

CODING


Chapter 9

Classical Coding Theory


9.1

Introduction

9.1.1

Definitions


To start with, we will need some basic set-theoretic notation that will be used throughout. Suppose A is a set; we define the following quantities:

1. A^* = \bigcup_{r=1}^{\infty} A^r, where A^r = (A \times A \times \cdots \times A), r times

2. 2^A = the set containing all possible subsets of A

3. #A = the cardinality of the set A

Having the above notation in mind, let us now look at some basic terminology that will be useful in discussing the concepts and various properties related to codes and coding theory:

1. Alphabet: An alphabet is a set of symbols. Each symbol, called a letter, represents one unit of information. E.g. the sets {0, 1}, {a, b, c}, etc. represent alphabets.

2. String: A string is a finite sequence of symbols (from an alphabet). More formally, a string s over an alphabet Σ is nothing but an element of Σ∗: s ∈ Σ∗. The number of symbols (from Σ) in s is called the block length of s. Notice that Σ∗ is nothing but the set of all strings that can be formed using symbols in Σ. E.g. 001010 is a string over the alphabet {0, 1}.

3. Code: A code, denoted by C, is a set of strings (formed over some alphabet, called the code alphabet; this alphabet is denoted by Σ). Similar to the example for a string, we see that {0, 010, 10, 01} is a code over {0, 1}. The average length of the codewords 1 is called the average code word length of C.

4. Encoding: An encoding is a map from an alphabet to a set of strings (or a code, generally not over the same alphabet), f : A → C, where A is called the message alphabet. E.g. let A = {x1, x2, x3, x4} and Σ = {0, 1}, with C = {0, 010, 01, 10}; then an encoding will map (finite) sequences (or strings) consisting of symbols in A to strings in Σ∗. An encoding, for instance, can be:

x1 → 0
x2 → 010
x3 → 01
x4 → 10

1 that is, \frac{1}{\#C}\sum_{c \in C} (\text{block length of } c)

Quite evidently, as with any other function, the encoding map cannot be one-to-many, in which case there would be many encoded sequences for the same message sequence, thereby causing an ambiguity as to which one is the right encoding.

5. Decoding: A decoding is also a map between a code and an alphabet, whose function is to perform the inverse of an encoding map. The decoding, given an encoding (as defined above), is a map from C (the code) to A (the message alphabet). An example of a decoding map is just the reverse of the encoding map. The decoding map also depends on the code, and hence on the encoding. Now, just as we had a very naive condition that the encoding map not be one-to-many, the same condition is applicable to a decoding map too, as we want the decoding also to be unambiguous (that is, a sequence in Σ∗ must map to only one sequence in A∗). But notice that this condition, in turn, imposes additional conditions on the encoding map: it says that the encoding map must be one-to-one (injective). When such an injective encoding map exists, decoding can be done without any ambiguity.

6. Uniquely Decipherable: Let us now define the extensions of the encoding and decoding maps. Consider the extended encoding map between strings 2 , f∗ : A∗ → Σ∗. This maps a message sequence (over A) to a code sequence (over Σ). Analogously, the extended decoding map maps code sequences to message sequences. If f∗ is one-to-one, then certainly our decoding is unique, and therefore the encoding is called uniquely decipherable, and the code a uniquely decipherable code.

7. Irreducible 3 codes: A code is called an instantaneous code if no string in the code is a prefix of any other string in the code. More formally, for an instantaneous code C we have: for all u, v ∈ C, there is no w ∈ Σ∗ such that v = uw.

9.1.2


Notations from graphs

Let us take some set V with elements {v1 . . . vn}. These elements are represented as points in a space, and moreover, we can connect these points (geometrically). This new connected entity is called a graph. The points and the connections are called vertices and edges respectively. These edges can be unidirectional (or directed). (E.g. suppose there are two points vi & vj; we can have an edge connecting vi to vj while the same edge does not connect vj to vi.) Suppose we now have some sort of comparison between elements in V and state that a vertex (corresponding to an element) that is superior to another is connected to it by a directed edge. This restricted structure is called a tree, with the most superior vertex being the root. Moreover, if we assume that there can be only two edges out of a vertex, this structure is called a binary tree. For a vertex vi of a binary tree, we define O(vi) = the number of vertices on the path from the root to vi. We will need this knowledge of binary trees in our later discussion.


Now that we have an encoding (f : A → C), we would expect it to:

be uniquely decipherable

have minimum average code word length

9.1.3

Unique Decipherability

Problem: Given an encoding (mapping) f : A → C and an encoded sequence (an element of Σ∗), is it possible to find a unique message sequence (an element of A∗)? We need to find all the encodings for which the answer to this question is never no. Consider the example from (stat. 4) of (sec. 9.1.1). Let us try to find some sequence that would lead to a no answer to whether we can uniquely identify a message sequence from an encoded sequence. Take, for example, the encoded sequence 010. This corresponds to 3 different message
2 This extension is just made using the fact that the action of the map on a string in A∗ is nothing but the concatenated string produced by joining the results of the action of f on each symbol in the string.
3 or Instantaneous


sequences, namely: x2, x3x1, and x1x4. We can verify this by computing:

f(x2) = 010
f(x3x1) = f(x3)f(x1) = (01)(0) ≡ 010
f(x1x4) = f(x1)f(x4) = (0)(10) ≡ 010

By finding these sequences, we have shown that the code C is not uniquely decipherable. So, to classify a code as not uniquely decipherable, we need to find such ambiguous sequences in Σ∗, which correspond to more than one sequence in A∗. In general it isn't easy to find such ambiguous sequences, and we need some way of checking for the existence of such ambiguous sequences.
To see this process of decoding, let us look more closely at how one would naively go about decoding some arbitrary sequence from Σ∗.
Aim: Given some general sequence c1c2 . . . cp ∈ Σ∗, find the splitting s1 ∈ C, s2 ∈ C, . . . , sq ∈ C so that, using the reverse of the encoding map (the decoding map), we get the message sequence.
Intuitively: guessing the splitting:
Given a code sequence c1c2 . . . cp, let S0 be the code C.
Read characters until they form a word in S0, and see which other words in S0 (besides the one read) they might correspond to. These are the initial guesses. The guessed words are those that have the sequence read so far as their prefix.


Separate the suffixes of all these words (suffix symbols are to be read in the future) and store the suffixes in S1.
Repeat steps 1 & 2 (using S1 instead of the second S0 in step 1), constructing sets S1, S2, etc., until Si becomes empty.
If Sj (for some j) contains an element of S0, say c_{k_{j-1}+1} . . . c_{k_j}:
Suppose initially we read c_k . . . c_{k_1} and one of the initial guesses was c_k . . . c_{k_j}. The suffix c_{k_1+1} . . . c_{k_j} ∈ S1, and so on; we construct S2, S3, etc.
In the process of forming Sj, we have read j codewords: c_k . . . c_{k_1}, c_{k_1+1} . . . c_{k_2}, . . . , c_{k_{j-2}+1} . . . c_{k_{j-1}}, besides the guessed c_k . . . c_{k_j}. We also see that the code sequence splits as c_1 . . . c_k | c_{k+1} . . . c_{k_1} | c_{k_1+1} . . . c_{k_2} | . . . | c_{k_{j-1}+1} . . . c_{k_j}.
If Sj has an element, say c_{k_{j-1}+1} . . . c_{k_j}, that is also in C, then the code sequence c_k . . . c_{k_j} has two ambiguous decipherings, and hence the code is not uniquely decipherable.

Consider the theorem summarising the above illustration. More formally, with S0 = C,

S_1 = \{w \mid u = vw,\ (u \in S_0)\ \text{and}\ (v \in S_0)\}

With the above base case, we define the sets Si inductively:

S_i = \{w \mid u = vw,\ (u \in S_0\ \text{and}\ v \in S_{i-1})\ \text{or}\ (v \in S_0\ \text{and}\ u \in S_{i-1})\}

That is, Si is the set of all suffixes w obtained by removing, from a word of S0 or of S_{i-1}, a prefix belonging to the other set. We now define the union of all the sets Si from i = 1 to ∞ (Si can be constructed for any i; after a particular i, the Si may all be empty, so instead of finding out what that maximum value of i is, we just take an infinite union):

S_\infty = \bigcup_{i=1}^{\infty} S_i

Theorem (Sardinas-Patterson): An encoding with code S0 = C is uniquely decipherable iff S0 ∩ S∞ = ∅.
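The test described by the theorem is easy to mechanise. The following Python sketch (an illustration with assumed implementation details: codes as sets of strings, iteration until the family of suffix sets stops growing) checks the criterion S0 ∩ S∞ = ∅.

# Sketch of the Sardinas-Patterson test described above.
def suffixes(code, other):
    """All w such that some word of one set equals (word of the other set)w."""
    out = set()
    for v in code:
        for u in other:
            if u != v and u.startswith(v):
                out.add(u[len(v):])
            if u != v and v.startswith(u):
                out.add(v[len(u):])
    return out

def uniquely_decipherable(code):
    code = set(code)
    s = suffixes(code, code)             # S_1
    seen = set()
    while s and not (s & code):          # stop if some S_i meets S_0 = C
        seen |= s
        s = suffixes(code, s) - seen     # build the next S_i
    return not (s & code)

print(uniquely_decipherable({"0", "010", "01", "10"}))   # False (the example)
print(uniquely_decipherable({"0", "10", "110", "111"}))  # True (a prefix code)

Termination is guaranteed because every Si is a subset of the finite set of suffixes of codewords, so the loop can only discover finitely many new elements.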


9.2

Classifying instantaneous and uniquely decipherable codes

Let us first convince ourselves that an instantaneous code is easier to decode than a general uniquely decipherable code. If we were to follow the naive method of decoding (as described in (sec. 9.1.3)), then, since no codeword is a prefix of another, all the sets Si (for i = 1 . . . ∞) would be empty, thereby assuring that the code is uniquely decipherable. The method of decoding now would simply be to read a particular sequence (symbol by symbol, until it forms a codeword), and then substitute that sequence with the corresponding message alphabet symbol, as given in the encoding map.
Hence we want to see if we can, without loss of generality, confine ourselves to instantaneous codes. To do this, we must show that if it is possible to have a uniquely decipherable code C_UD, with N codewords of lengths n1, n2, . . . , nN respectively, then there always exists an instantaneous code with the same number of codewords and the corresponding codeword lengths 4 . This is our main purpose in this section, as it will enable us to analyse the easier class of instantaneous codes, with the conclusions also applying to the class of uniquely decipherable codes. We achieve this purpose in 3 steps. Firstly, we arrive at some condition which holds if there exists an instantaneous code with codeword lengths n1, n2, . . . , nN over the alphabet Σ. This condition is an inequality called the Kraft inequality. In the second part, we extend this inequality to show that the existence of a uniquely decipherable code also implies the same condition. The condition is the same as before, but the inequality here goes by the name of the Kraft-MacMillan inequality. Finally, in the third part, we show the converse of the Kraft inequality, which implies that the inequality condition assures the existence of instantaneous codes. Therefore, at the end we will have shown that if there exists a uniquely decipherable code (with codeword lengths n1, n2, . . . , nN) then a particular condition is satisfied, which in turn implies the existence of an instantaneous code with the same corresponding codeword lengths (n1, n2, . . . , nN). After this, we consider only instantaneous codes.

9.2.1


Part 1: Kraft's Inequality

We now ask whether it is possible to construct an instantaneous encoding f : A → C such that every element ai ∈ A has the property |f(ai)| = ni. In other words, we ask whether there exists an instantaneous code in which the i-th codeword has block length ni.
Theorem: There exists an instantaneous encoding f of A to C (over Σ) with codeword lengths {ni}, i ∈ 1 . . . |A| ≡ N, iff

\sum_{i=1}^{N} |\Sigma|^{-n_i} \leq 1    (9.1)

Proof Idea: The proof that we present here is by constructing an instantaneous code. To start with, we take all sequences in Σ∗ whose block length is at most n_N. We arrange these sequences in the form of a tree, with the convention that if some sequence of length l matches another of length l + 1 at all but the last position (l + 1), then the two vertices are connected, with the vertex corresponding to the larger sequence being a child of the vertex corresponding to the smaller one. After this construction, we have a large tree (not necessarily a binary tree) with |\Sigma|^{n_N} vertices at the deepest level. In this tree, each level contains the codewords of a successively larger length.
This is the gadget that we employ. Now, to construct N codewords of lengths n1, . . . , nN, it may naively seem that we can simply choose the codewords corresponding to vertices on a single path from the root to one of the leaves. Notice that this method would certainly produce sequences of the desired lengths, but each sequence would be a prefix of the following sequence; a collection of these, by definition, would not give an instantaneous code. Hence no two sequences chosen for the instantaneous code should have their corresponding vertices connected by a path. A better method is to choose the code sequences such that, from a path,
4

Needless to say, the converse of this assumption is always true.

we pick only one sequence. In other words, once we pick a vertex, we delete its subtree. After finding all the codewords that constitute the instantaneous code, we will have deleted a number of vertices. The total number of vertices deleted is certainly no more than |\Sigma|^{n_N} (the number available before constructing the code). On imposing this simple inequality, we arrive at the statement of the theorem (eq. 9.1).
Proof: Define Σ = {0, 1, . . . , |Σ| − 1} and assume, without loss of generality, that n_1 \leq n_2 \leq n_3 \leq \cdots \leq n_N.
Consider a method of generating these instantaneous codes:

Consider a tree whose vertices {Vi} correspond to all the strings in \bigcup_{i=1}^{n_N} \Sigma^i, in which any two vertices corresponding to strings wi, wj with |wi| = (l + 1) and |wj| = l that are equal in the first l positions are connected by an edge. From this graph, we can say that:

If O(Vj) < O(Vi) and there is a path connecting Vj (corresponding to code word wj) and Vi (corresponding to code word wi), then wj is a prefix of wi.
So, while constructing the instantaneous encoding, if we choose wi to be a code sequence, then we cannot choose any wj for which there is a path connecting Vj (corresponding to code word wj) and Vi (corresponding to code word wi).

Y
P
O
C
T
F
A

Therefore, if we want a code word wi of length ni, we need to choose a vertex Vi such that O(Vi) = ni, and delete its subtree.
The tree has |\Sigma|^{n_N} vertices at the deepest level; for every code word wi of length ni, a subtree containing |\Sigma|^{n_N - n_i} of these vertices is eliminated. Therefore, after all the code words are constructed (picked), \sum_{i=1}^{N} |\Sigma|^{n_N - n_i} such vertices will have been deleted.

Trivially, the number of vertices removed from the tree total number of vertices in the tree.
N

i=1

nN

||nN ni ||

i=1
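As a quick numerical illustration of the theorem (not part of the original notes; the codewords below are hypothetical and Python is assumed), the following sketch checks whether a set of codewords is prefix-free and evaluates the Kraft sum of (eq. 9.1):

    from itertools import combinations

    def kraft_sum(lengths, q):
        """Left-hand side of Kraft's inequality (eq. 9.1) for alphabet size q."""
        return sum(q ** (-n) for n in lengths)

    def is_instantaneous(codewords):
        """A code is instantaneous iff no codeword is a prefix of another."""
        return not any(a.startswith(b) or b.startswith(a)
                       for a, b in combinations(codewords, 2))

    # A small binary example (hypothetical codewords, chosen only for illustration):
    code = ["0", "10", "110", "111"]
    print(is_instantaneous(code))                     # True
    print(kraft_sum([len(w) for w in code], q=2))     # 1.0 <= 1, as eq. 9.1 requires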

9.2.2 Part 2: MacMillan's Inequality

Since every instantaneous code is also uniquely decipherable, Kraft's inequality is a sufficient condition
for the existence of a uniquely decipherable code with the given codeword lengths. We can show that it is also a necessary condition: every uniquely decipherable code must satisfy it.

Proof Idea: The proof is based on the observation that if K = \sum_{i=1}^{N} |\Sigma|^{-n_i} were greater than 1, then K^u would grow
exponentially with u. We start by boosting this quantity to the power u, put in the property of a uniquely
decipherable code (eq. 9.6), and find that K^u in fact grows at most linearly in u. Taking the limit u → ∞ then forces
K ≤ 1.
Proof:
We have:

\sum_{i=1}^{N} |\Sigma|^{-n_i} = |\Sigma|^{-n_1} + |\Sigma|^{-n_2} + \cdots + |\Sigma|^{-n_N}

Let us now group all the terms |\Sigma|^{-n_i} with n_i = k (all the codewords of equal length k), and let α_k denote the number of terms in the group for length k. Writing m = n_N for the largest codeword length,

\sum_{i=1}^{N} |\Sigma|^{-n_i} \;\stackrel{\mathrm{def}}{=}\; \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k}     (9.3)

Therefore, we now have to find an upper bound for \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k}. Consider:

\left( \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \right)^u = \sum_{\substack{i_1, i_2, \ldots, i_u \\ 1 \le i_j \le m}} (\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_u}) \, |\Sigma|^{-(i_1 + i_2 + \cdots + i_u)}

In the RHS of the above equation, each term is of the form (\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_u}) |\Sigma|^{-(i_1 + i_2 + \cdots + i_u)}.
Now we see that u \le (i_1 + i_2 + \cdots + i_u) \le mu. We now collect all the tuples (i_1, \ldots, i_u) such that i_1 + i_2 + \cdots + i_u = k,
for each value of k (grouping all tuples that give the same sum k):

\left( \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \right)^u = \sum_{k=u}^{mu} |\Sigma|^{-k} \sum_{i_1 + i_2 + \cdots + i_u = k} (\alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_u})

In the RHS of the above equation, we can put:

N_k = \sum_{i_1 + i_2 + \cdots + i_u = k} \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_u}     (9.4)

On substituting the above definition, we have:

\left( \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \right)^u = \sum_{k=u}^{mu} N_k |\Sigma|^{-k}     (9.5)

Let us look at the structure of the term N_k:

From the definition of α_k in (eq. 9.3), we see that α_k = number of codewords of length k.
Therefore, in the RHS of (eq. 9.4), \alpha_{i_1} \alpha_{i_2} \cdots \alpha_{i_u} = (# of codewords of length i_1)(# of codewords of length i_2) · · · (# of codewords of length i_u).
This is the same as the number of code sequences consisting of (a codeword of length i_1) followed by (a codeword of length i_2)
followed by . . . followed by (a codeword of length i_u). But since we are given that i_1 + i_2 + · · · + i_u = k,
these code sequences have length k. Hence N_k = # code sequences of length k
built from codewords of lengths i_1, i_2, . . . , i_u such that, for a < b, the codeword of length i_a appears before the codeword
of length i_b. Let Ñ_k = # sequences of length k built from codewords of lengths i_1, i_2, . . . , i_u in any order. Clearly,
N_k ≤ Ñ_k. Now the code is uniquely decipherable, and hence each sequence counted in Ñ_k corresponds to only one
split-up into codewords^5. Consider now an even more general quantity, Λ_k = # of all sequences over Σ of length
k. Then Ñ_k ≤ Λ_k, as the latter also contains sequences that need not correspond to any meaningful
decoding. Hence N_k is at most the total number of sequences of length k:

N_k \le |\Sigma|^{k}     (9.6)

Using the result of (eq. 9.6) in (eq. 9.5), we have:

\left( \sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \right)^u \le \sum_{k=u}^{mu} |\Sigma|^{k} |\Sigma|^{-k} = mu - u + 1 \le um

From the above inequality we can say that:

\sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \le (um)^{1/u} = u^{1/u} \, m^{1/u}

In the above expression, u is arbitrary, and hence we can remove u by taking the limit of the above expression
as u → ∞. The LHS remains unchanged:

\sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \le \lim_{u \to \infty} u^{1/u} \, m^{1/u}     (9.7)

The limit \lim_{u \to \infty} m^{1/u} = m^{\lim_{u \to \infty}(1/u)} = m^0 = 1. The other limit:

\lim_{u \to \infty} u^{1/u} = e^{\lim_{u \to \infty} \left( \frac{\log u}{u} \right)}

We know from L'Hopital's rule that \lim_{u \to \infty} \frac{\log u}{u} = \lim_{u \to \infty} \frac{\frac{d}{du}(\log u)}{1} = \lim_{u \to \infty} \frac{1}{u} = 0. Hence,

\lim_{u \to \infty} u^{1/u} = 1

Using the result of the above limit evaluations in (eq. 9.7), we get:

\sum_{k=1}^{m} \alpha_k |\Sigma|^{-k} \le 1

thus, from (eq. 9.3), we have proved MacMillan's inequality.

5 Note that all sequences counted in N_k are such that the codewords in them appear in a specific order, whereas the sequences in
Ñ_k may have their codewords in any order; in either case, since the code is uniquely decipherable, the codewords making up a given sequence are uniquely determined.
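The counting step (eq. 9.4)-(eq. 9.6) can be checked numerically. The sketch below (a hypothetical code, assuming Python; not from the text) enumerates all concatenations of u codewords of an instantaneous, hence uniquely decipherable, binary code and verifies that no length k is hit more than 2^k times, so that K^u stays below um:

    from itertools import product

    code = ["0", "10", "110", "111"]      # instantaneous, hence uniquely decipherable
    q = 2
    K = sum(q ** (-len(w)) for w in code)

    for u in (1, 2, 3, 4):
        concatenations = {"".join(ws) for ws in product(code, repeat=u)}
        by_length = {}
        for s in concatenations:
            by_length[len(s)] = by_length.get(len(s), 0) + 1
        # eq. 9.6: at most q^k distinct concatenations of total length k
        assert all(count <= q ** k for k, count in by_length.items())
        print(u, K ** u, "<=", u * max(len(w) for w in code))   # K^u <= um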

9.2.3 Part 3: Converse of Kraft's Inequality

We can now prove the converse of Kraft's inequality, which states that: if a set of codeword lengths {n_1, n_2, . . . , n_N} over an alphabet Σ satisfies

\sum_{i=1}^{N} |\Sigma|^{-n_i} \le 1,

then there exists a uniquely decipherable code with these codeword lengths.

Let K = \sum_{i=1}^{N} |\Sigma|^{-n_i}. It suffices to show that, given the condition K ≤ 1, we can construct an instantaneous code^6
with codewords of length {n_1, n_2, . . . , n_N}. Let us use the same gadget (the tree structure, with codewords of length n_i corresponding to
vertices at depth n_i) as in the proof of Kraft's inequality for instantaneous codes, and assume n_1 ≤ n_2 ≤ · · · ≤ n_N. If
the same procedure of picking a vertex at depth n_i (and hence a corresponding codeword of length
n_i) and deleting its subtree (which removes |\Sigma|^{n_N - n_i} vertices at depth n_N) can always be carried out, then we are done, as we can choose the vertices at
depths n_1, n_2, . . . , n_N in turn to get the various codewords. The only concern is that there should not be a point at which
there are no vertices left in the tree to pick (which would imply that the corresponding instantaneous
code cannot be constructed). Here is where we need to use the hypothesis of Kraft's inequality (K ≤ 1) to
ensure that after every step (after every codeword is added to the instantaneous code), there are still some
vertices left in the tree. Notice that after the first i codewords have been picked, the total number of depth-n_N vertices remaining in the tree is:

|\Sigma|^{n_N} - \sum_{j=1}^{i} |\Sigma|^{n_N - n_j} = |\Sigma|^{n_N} \left( 1 - \sum_{j=1}^{i} |\Sigma|^{-n_j} \right)

Now, for i < N, we have \sum_{j=1}^{i} |\Sigma|^{-n_j} < \sum_{j=1}^{N} |\Sigma|^{-n_j} \le 1 (every omitted term is strictly positive), and therefore
1 - \sum_{j=1}^{i} |\Sigma|^{-n_j} > 0. Hence, putting this into the expression for
the remaining vertices in the tree, we get: the number of remaining vertices after picking i codewords is > 0 for every i < N.
Therefore we see that we can always pick codewords up to length n_N.

Hence we have shown the converse of the Kraft–MacMillan inequality, thereby finally showing that whenever
there exists a uniquely decipherable code, there certainly exists an instantaneous code with the same codeword lengths. Therefore, in the
next section we show a lower bound on the average codeword length, but this time only for instantaneous
codes, without any loss of generality.

6 We saw that every instantaneous code is also uniquely decipherable.
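A constructive reading of this argument is sketched below (an illustrative Python helper, build_instantaneous_code, not taken from the text): given lengths satisfying Kraft's inequality, codewords are assigned greedily in order of increasing length, which mirrors picking a vertex at depth n_i and deleting its subtree.

    def build_instantaneous_code(lengths, q=2):
        assert sum(q ** (-n) for n in lengths) <= 1, "Kraft's inequality must hold"
        digits = [str(d) for d in range(q)]
        chosen = []
        for n in sorted(lengths):
            # enumerate depth-n vertices in lexicographic order
            def gen(prefix):
                if len(prefix) == n:
                    yield prefix
                    return
                for d in digits:
                    yield from gen(prefix + d)
            candidate = None
            for w in gen(""):
                if not any(w.startswith(c) for c in chosen):
                    candidate = w
                    break
            assert candidate is not None        # guaranteed by K <= 1
            chosen.append(candidate)
        return chosen

    print(build_instantaneous_code([1, 2, 3, 3]))   # e.g. ['0', '10', '110', '111']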

9.2.4 Bound on codeword length - Shannon's Noiseless Coding Theorem [26]

We have already shown that the existence of a uniquely decipherable code (with a given alphabet size) implies
the existence of an instantaneous code with the same alphabet size and codeword lengths. Therefore, to show that there exists no
uniquely decipherable code with certain properties, it suffices to show the non-existence of an instantaneous
code (over an alphabet of the same size) with the same properties. Moreover, we will now see that the case
of instantaneous codes is easier to deal with than that of general uniquely decipherable codes.

In the first few sections, we remarked that a code is expected to have minimum average codeword length.
Here we explore a lower bound for the minimum average codeword length. Quite evidently,
this quantity cannot be as trivial as 0 or 1, or even N (the size of the source alphabet A), due to the requirement of unique
decipherability. Again, it is this property that we will make use of, in the form of Kraft's inequality.
Define:

l(f) = \sum_{x \in A} l(f(x)) \, \mu(x)

the average codeword length of an encoding f, and

L(\mu) = \min \{\, l(f) \;|\; f : A \to B^{*} \text{ is a uniquely decipherable code} \,\}
       = \min \left\{ \sum_{x \in A} m(x) \mu(x) \;\Big|\; f : A \to B^{*}, \; \sum_{x \in A} b^{-m(x)} \le 1 \right\}

where m(x) denotes the length of the codeword f(x). The lower bound on the minimum average codeword length is given by Shannon's Noiseless Coding Theorem:

Theorem: Let (A, μ) be an EIS. Let b = #B, where B is the encoding alphabet. Then:

\frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} \;\le\; L(\mu) \;\le\; \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} + 1     (9.8)
Proof: Define:

\pi(x) = \frac{b^{-m(x)}}{T}, \qquad T = \sum_{y \in A} b^{-m(y)}     (9.9), (9.10)

Therefore, from (eq. 9.9):

\log \pi(x) = -m(x) \log b - \log T
\;\Rightarrow\; m(x) \log b = -\log \pi(x) - \log T
\;\Rightarrow\; m(x) = \frac{-\log \pi(x) - \log T}{\log b}

Multiplying by μ(x), summing over x ∈ A, and using \sum_{x \in A} \mu(x) = 1:

\sum_{x \in A} m(x) \mu(x) = \frac{-\sum_{x \in A} \mu(x) \log \pi(x) \;-\; \log T}{\log b}     (9.11)

We now compare -\sum_{x} \mu(x) \log \pi(x) with -\sum_{x} \mu(x) \log \mu(x):

-\sum_{x \in A} \mu(x) \log \pi(x) = -\sum_{x \in A} \mu(x) \log \mu(x) \;-\; \sum_{x \in A} \mu(x) \log \frac{\pi(x)}{\mu(x)}     (9.12)

We have \sum_{x \in A} \mu(x) \log \frac{\pi(x)}{\mu(x)} = \log \prod_{x \in A} \left( \frac{\pi(x)}{\mu(x)} \right)^{\mu(x)}. The product is just the weighted geometric mean of the quantities π(x)/μ(x), with weights μ(x).
Therefore, using the property that the geometric mean of a set of quantities is less than or equal to their arithmetic mean, we have:

\prod_{x \in A} \left( \frac{\pi(x)}{\mu(x)} \right)^{\mu(x)} \;\le\; \sum_{x \in A} \mu(x) \, \frac{\pi(x)}{\mu(x)} \;=\; \sum_{x \in A} \pi(x)     (9.13)

From (eq. 9.9) and (eq. 9.10), we have:

\sum_{x \in A} \pi(x) = \sum_{x \in A} \frac{b^{-m(x)}}{T} = 1

Hence the RHS of (eq. 9.13) is 1, so \sum_{x} \mu(x) \log \frac{\pi(x)}{\mu(x)} \le 0, and (eq. 9.12) now becomes:

-\sum_{x \in A} \mu(x) \log \pi(x) \;\ge\; -\sum_{x \in A} \mu(x) \log \mu(x)     (9.14)

We also see, from the unique decipherability of the code and from (eq. 9.10), that T ≤ 1 (this is the Kraft–MacMillan inequality). Therefore
\frac{1}{T} \ge 1 \Rightarrow -\log T \ge 0. Hence, in (eq. 9.11), we see that:

\sum_{x \in A} m(x) \mu(x) \;\ge\; \frac{-\sum_{x \in A} \mu(x) \log \pi(x)}{\log b}     (9.15)

Using the equations (eq. 9.14) and (eq. 9.15), we have:

\sum_{x \in A} m(x) \mu(x) \;\ge\; \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b}     (9.16)

Therefore, in the above statement, we have proved that the lower bound on the average length of the code
is given by \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b}. We now have to show the upper bound. For this, let us choose a specific
instance of m(x) such that:

m(x) = \left\lceil \frac{-\log \mu(x)}{\log b} \right\rceil, \quad x \in A, \qquad \text{i.e.} \qquad \frac{-\log \mu(x)}{\log b} \;\le\; m(x) \;<\; \frac{-\log \mu(x)}{\log b} + 1     (9.17)

The above choice can be shown to be valid by observing that the unique decipherability of the code
is preserved. That is, from (eq. 9.17):

b^{-m(x)} \le b^{\log \mu(x)/\log b} = b^{\log_b \mu(x)} = \mu(x)
\;\Rightarrow\; \sum_{x \in A} b^{-m(x)} \le \sum_{x \in A} \mu(x) = 1

Hence Kraft's inequality holds for these lengths, so an instantaneous (and therefore uniquely decipherable) code with the lengths m(x) exists. Now the quantity \sum_{x \in A} m(x) \mu(x)
(the average codeword length) can be given an upper bound. From (eq. 9.17): m(x) < 1 + \frac{-\log \mu(x)}{\log b}.
Therefore, we have:

\sum_{x \in A} m(x) \mu(x) \;<\; \sum_{x \in A} \left( 1 + \frac{-\log \mu(x)}{\log b} \right) \mu(x)

On expanding the above RHS and using the property that \sum_{x \in A} \mu(x) = 1, we have:

\sum_{x \in A} m(x) \mu(x) \;<\; \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} + 1     (9.18)

Therefore, in the above expression we have given an upper bound for the average codeword length. Now,
taking the expressions (eq. 9.16) and (eq. 9.18), we obtain the expression in the theorem:

\frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} \;\le\; L(\mu) \;\le\; \frac{-\sum_{x \in A} \mu(x) \log \mu(x)}{\log b} + 1

thus proving the theorem.
In the theorem above, if we put the condition that the encoding is done over the binary alphabet B =
{0, 1} (so that b = 2), the lower bound statement gives:

-\sum_{x \in A} \mu(x) \log_2 \mu(x) \;\le\; L(\mu)
\;\Rightarrow\; H(X) \;\le\; L(\mu)     (9.19)

where H(X) is the (Shannon) entropy, in bits, associated with the random variable X, or the entropy of the elementary
information source (A, μ(x)). The above statement (eq. 9.19) is the famous Shannon Noiseless Coding
Theorem for binary codes.
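The choice of lengths in (eq. 9.17) can be tried out directly. The following sketch (assuming Python; the helper name shannon_lengths is illustrative, and the distribution is the one used in the examples below) computes m(x) = ⌈−log_b μ(x)⌉, checks Kraft's inequality and the two sides of (eq. 9.8):

    from math import ceil, log

    def shannon_lengths(mu, b=2):
        # small epsilon guards against floating-point noise for dyadic probabilities
        return {x: ceil(-log(p, b) - 1e-12) for x, p in mu.items()}

    mu = {"x1": 0.5, "x2": 0.25, "x3": 0.125, "x4": 0.0625, "x5": 0.0625}
    m = shannon_lengths(mu)

    kraft = sum(2 ** (-m[x]) for x in mu)
    avg_len = sum(m[x] * p for x, p in mu.items())
    H = -sum(p * log(p, 2) for p in mu.values())

    print(m)            # {'x1': 1, 'x2': 2, 'x3': 3, 'x4': 4, 'x5': 4}
    print(kraft)        # 1.0  (<= 1, so an instantaneous code with these lengths exists)
    print(H, avg_len)   # 1.875 1.875  -- eq. 9.8: H <= L(mu) <= H + 1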

Examples: Codes satisfying the Shannon Bound


Consider a random variable X, which takes values {x_1, x_2, x_3, x_4, x_5} with corresponding probabilities
(1/2, 1/4, 1/8, 1/16, 1/16). For reasons stated earlier, we now use instantaneous and variable-length (binary^7)
codes to encode the values of X, which is the information source, and show that in each case the Shannon
bound is satisfied.

The entropy of the source is:

H(X) = -\sum_i p_i \lg p_i = \frac{1}{2}\lg 2 + \frac{1}{4}\lg 4 + \frac{1}{8}\lg 8 + \frac{1}{16}\lg 16 + \frac{1}{16}\lg 16
     = \frac{1}{2}(1) + \frac{1}{4}(2) + \frac{1}{8}(3) + \frac{1}{16}(4) + \frac{1}{16}(4)

H(X) = 1.875

1. Instantaneous Encoding:

x1 → 000
x2 → 001
x3 → 010
x4 → 011
x5 → 100

The average codeword length is l(X) = 3 \cdot \frac{1}{2} + 3 \cdot \frac{1}{4} + 3 \cdot \frac{1}{8} + 3 \cdot \frac{1}{16} + 3 \cdot \frac{1}{16} = 3. This is also the
minimum possible block length for a fixed-length instantaneous code, as there are 5 values of the random variable to be encoded
and only 4 codewords of length 2 are available. Hence H(X) ≤ l(X).

2. Variable Length Codes:

x1 → 00
x2 → 010
x3 → 0110
x4 → 01110
x5 → 11111

The average codeword length is 2 \cdot \frac{1}{2} + 3 \cdot \frac{1}{4} + 4 \cdot \frac{1}{8} + 5 \cdot \frac{1}{16} + 5 \cdot \frac{1}{16} = 2.875. Therefore, the average
codeword length exceeds the Shannon entropy, thereby satisfying the Shannon bound.
7 One can show the bound for any q-ary code, appropriately using the expression for the q-ary entropy function: H_q(x) = x \log_q(q-1) - x \log_q x - (1-x) \log_q(1-x).


9.3 Error Correcting Codes

We see that a code is defined as a set of strings or sequences over the encoding alphabet. We now identify
a subset of such codes, which satisfy some additional properties, as error correcting codes.

9.3.1 Definitions

Let C be a code, defined over the code alphabet Σ = {x_1, x_2, . . . , x_n}. We now need to look at some
terminology that will be useful in characterizing the error correcting properties of C. Before moving on to
error correcting codes, note that from now on we only consider codes in which all the codewords
have equal block length (such fixed-length codes are automatically instantaneous).

1. Hamming distance: The Hamming distance between two sequences^8 s_1 and s_2, denoted by d(s_1, s_2), is the
number of positions at which s_1 and s_2 differ. (A small computational sketch appears after these definitions.) Properties of the Hamming distance:
   Positivity: d(s_1, s_2) > 0 for s_1 ≠ s_2, and d(s_1, s_1) = 0
   Symmetry: d(s_1, s_2) = d(s_2, s_1)
   Triangle Inequality: d(s_1, s_2) ≤ d(s_1, s_3) + d(s_3, s_2)

2. Code parameters: For the purpose of error correction, codes are labelled using the following four
parameters.
   Block length (n): The block length of a code C is the block length of the codewords in C. We
defined C ⊆ Σ*; hence there exists n ≥ 0 such that C ⊆ Σ^n. This n is nothing but the block length of C.
   Alphabet size: q = |Σ|. This is just the number of elements in the alphabet. In most cases we
consider q = 2: (binary) codes.
   Code dimension (k): This is related to the number of codewords in the code. A code C over Σ (with
|Σ| = q) has q^k codewords. Hence, k = \log_{|\Sigma|} |C|.
   Distance of a code: d(C) is defined as the minimum Hamming distance between two distinct codewords in
C. Formally, d(C) = \min_{(s_i, s_j) \in C^2, \, s_i \ne s_j} d(s_i, s_j).

3. Error operation: An error operation is a function that translates a q-ary sequence by a number e. A
communication channel can be modelled as an error operation. Formally, an error operation ε : Σ^n → Σ^n
is such that ε(c) = c ⊕ e, where c ∈ C, e ∈ Σ and ⊕ denotes addition modulo q. Here ε is such that its
action on the codeword translates only a single symbol of the codeword by an amount e. A t-error
operation is nothing but the application of ε, t times, in a sequential manner. Hence the final, erroneous
sequence can be represented as (ε ∘ ε ∘ · · · ∘ ε)(c), with ε applied t times.

4. Error correction operation: Intuitively, the process of recovering from an error consists of detecting the
error, followed by correction. If a codeword c ∈ C is transmitted across an (error inducing) communication
channel, what we receive at the other end, c̃ (formally, this is equal to ε(c)), need not necessarily be c.
As we require the received and transmitted sequences to match, the only remedy is to construct another
operation (function) η such that η(ε(c)) = c. Such an operation is called an error correction operation.
Extending this concept, suppose c is transmitted across t such channels in succession; then the received sequence
would be (ε ∘ · · · ∘ ε)(c) (t times). Now if η is such that η((ε ∘ · · · ∘ ε)(c)) = c, then η is called a t-error
correction operation. Before correcting an error, it is worthwhile to check for the existence of an error^9.
Hence, we need some function ϕ such that:

ϕ(c̃) = 0 if ε(c) ∈ C,   and   ϕ(c̃) = 1 if ε(c) ∉ C.

Such a function, as described above, is called an error detection (or syndrome measuring) function.

5. t-Error Correcting code: C is a t-error correcting code if there exists a t-error correcting function for every c ∈ C.

6. Error detecting code: C is an error detecting code if there exists an error detecting function for every c ∈ C.

8 If the two sequences are not of equal length, then the difference in their lengths simply adds to the Hamming distance.
9 We are making the assumption that if an error has occurred, the resulting sequence is not part of the original code. Hence if c ∈ C, then ε(c) = c ⊕ e ∉ C.

Having defined these notions of error detection and correction, we now need to see how they are related to
a given code C. Before that, let us note a naive condition that ensures the existence of an error correction
operation for any error on a classical code. Notice that every codeword c ∈ C has sequences associated
with it, which are the results of t-errors occurring on c. The code must be such that, for all c_1, c_2 ∈ C, no error sequence
corresponding to c_1 is equal to any error sequence corresponding to c_2.
But as of now it is not clear how a code would ensure that the error sequences of no two codewords match.
Recall that an error is nothing but a translation of the binary sequence. To ensure that the erroneous
sequences do not match, we must ensure that the translated sequences corresponding to c_1 and c_2 do not
match. If C is a t-error correcting code, then we must ensure that after t (or fewer) translations (single bit
translations) of any two codewords, we do not get the same sequence. It then suffices to ensure that no two
codewords in C are separated by 2t (or fewer) single bit translations. In other words, we are saying that
any two codewords must differ in a minimum of 2t + 1 places for a t-error correcting code. This implies that
the minimum (Hamming) distance between two codewords in C must be > 2t. For a more precise treatment,
let us consider the next section, containing the distance bound.
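The sketch promised above (hypothetical Python helpers, not from the text) computes Hamming distances and the distance d(C) of a small code:

    from itertools import combinations

    def hamming(s1, s2):
        """Number of positions at which equal-length sequences s1 and s2 differ."""
        assert len(s1) == len(s2)
        return sum(a != b for a, b in zip(s1, s2))

    def code_distance(code):
        """d(C): minimum Hamming distance over all pairs of distinct codewords."""
        return min(hamming(a, b) for a, b in combinations(code, 2))

    repetition = ["000", "111"]
    print(hamming("010111", "000111"))   # 1
    print(code_distance(repetition))     # 3, so this code can correct t = 1 error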

9.3.2 Code parameters for a good error correcting code

A code is now denoted by its n, k, d and q values, and is called an (n, k, d)_q code. Note that n is the
dimensionality of the vector space spanned by all vectors corresponding to length-n sequences over F_q. These
include the vectors in the vector space C (of dimensionality k), and those not in C as well. Therefore k ≤ n,
and as a result the rate R = k/n satisfies R ≤ 1. We see that in order to maximise the rate of information transmission, we
must minimise n and maximise k^10. The former is lower-bounded by the Shannon entropy of the source, and
the latter is upper-bounded by n itself. Hence, we require that k be as close to n as possible. At the moment, we do
not worry about the value of q (as we consider only q = 2), though it too has a non-trivial optimum
value.

9.3.3 Bound on the code distance - The Distance Bound

The next non-trivial bound is on the parameter d. Before getting a bound on d, we first need to look at
how messages are decoded, or how an error is corrected. Let C = {w_i} and let w_X be the sequence transmitted across some channel that performs at most t error operations, resulting in v being received. Hence the distance
between w_X and v is at most t. Let us find a naive algorithm for error correction, i.e., for v to be decoded (or
corrected) back to w_X, based on the only measure of comparison we know: the Hamming distance (between
the received sequence and any of the known codewords). This naive algorithm simply chooses the w_j ∈ C which
is closer to v than all other w_i ∈ C (i ≠ j) to be the original sequence that was transmitted. For this
algorithm to give the correct result, w_j = w_X, no other codeword must be as close or closer
to v (even if the maximum possible number of errors has occurred) than w_X. For this to occur, we need to maximise the
distance between the codewords, since if the distance between w_X and another codeword w_X' is large, then
a large number of errors (occurring with lower probability) would need to occur in order to make v close to w_X'.
If the distance between the codewords is large, then the number of codewords (or the message length k)
reduces accordingly, which is again undesirable, as we would like to maximise k as well. Therefore d should
be kept such that it is just sufficient for correcting t errors (C then becomes a t-error correcting code).

10 It is as though n alphabets are given to us to construct a code and we use only k of them, or equivalently, we are given n
basis vectors of which we use only k to construct the code. In both cases we are reducing the ability to correct more
errors, as the extra alphabets could be used to detect errors (like parity bits).


A geometric picture of the same idea can be seen in (fig. 9.1). We now see that there is a lower bound for
d, related to the number of errors to be corrected. To make this precise, we have the following theorem:

Theorem: Let C be a t-error correcting code, with distance d(C) = \min_{(\omega, \omega') \in C^2, \, \omega \ne \omega'} d(\omega, \omega'). Then d(C) ≥ 2t + 1.
Proof: [15][25]
We show the necessity and sufficiency of the above condition.
We can start by showing the sufficiency condition.

Claim: Given that C is a t-error correcting code, d(C) ≥ 2t + 1.
Proof: Let ω, ω' ∈ C be codewords, transmitted across an information channel (that induces at most
t errors), resulting in the sequences v, v' respectively being received. Hence the (Hamming) distance
within each of the pairs (ω, v) and (ω', v') is at most t: d(ω, v) ≤ t, d(ω', v') ≤ t.
For the error to be corrected, v must be deciphered as ω (using the minimum distance decoding
principle). We require that no other received sequence, except v, be decoded to ω; hence d(ω, v) < d(ω, v')
for every received sequence v' ≠ v.^11 Hence, for an error to be corrected: v ≠ v' ⟹ d(v, v') > 0. The
geometric picture can be seen as follows:


Figure 9.1: Figure showing the geometric interpretation of the Hamming distances between the codewords
and their associated erroneous sequences. The band centered around a codeword w_i contains all words r_j
such that if r_j is received, it is decoded as w_i. On transmission of w, the received sequence must be within
the band centered around w. Its distance from w can be at most t, and its distance from w' must be greater
than t. As a result, the distance between w and w' must be greater than 2t.
We now note that, in the worst case where v and v' lie between ω and ω' (the situation of fig. 9.1),

d(ω, ω') = d(ω, v) + d(v, v') + d(v', ω')
⟹ d(ω, ω') = t + d(v, v') + t   (taking the maximum allowed number of errors)

Now, since d(v, v') ≥ 1, replacing the term d(v, v') turns the above relation into an inequality:

d(ω, ω') ≥ t + t + 1
⟹ d(ω, ω') ≥ 2t + 1

Hence, we have shown that the minimum distance of a t-error correcting code is ≥ 2t + 1, thereby proving
sufficiency.
11 There are two ways to see the error correction condition. The other one would be: for an error to be corrected, v is to be
deciphered as ω (using the minimum distance decoding principle). Therefore we require that no other codeword, except ω, is
closer to v than ω. Hence, d(ω, v) < d(ω', v) for all ω' (≠ ω) ∈ C. After this, the same analysis follows.


We now need to show the necessity of the above condition.

Claim: Given d(C) = 2t + 1, C is a t-error correcting code.
Proof: Let d(C) = 2t + 1; let ω, ω' be codewords in C and let v be the received sequence on transmission of ω,
with at most t errors, so that d(ω, v) ≤ t. We give a proof by contradiction, using the definition of d(C). Suppose C is not a t-error
correcting code; then minimum distance decoding can fail, i.e. some other codeword is at least as close to v (from the proof of sufficiency):

d(ω', v) ≤ d(ω, v)   for some ω' (≠ ω) ∈ C     (9.20)

By the triangle inequality, we have:

d(ω, ω') ≤ d(ω, v) + d(v, ω')
         ≤ 2 d(ω, v)          [using (eq. 9.20)]
         ≤ 2t
         = d(C) − 1

Using the definition of the distance, we see that it is the minimum of the set of all Hamming distances
between every two distinct codewords of the code. But the above equation exhibits ω, ω' ∈ C, with
ω ≠ ω', such that d(ω, ω') < d(C). This is a clear contradiction, as it exhibits an element of a
set that is less than the minimum of the set. Hence, we see that the assumption that C is not
a t-error correcting code is faulty. Therefore, we have shown that C is indeed a t-error correcting code,
thereby proving the necessity of the above condition.


Therefore, we have shown the distance bound for classical codes.
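As a small illustration of the bound (a Python sketch with a hypothetical decoder, not from the text), minimum-distance decoding with the 3-bit repetition code, which has d(C) = 3 = 2t + 1 for t = 1, corrects every single-bit error:

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    def min_distance_decode(received, code):
        return min(code, key=lambda w: hamming(received, w))

    code = ["000", "111"]
    for sent in code:
        for i in range(3):                  # flip one bit at a time
            corrupted = sent[:i] + ("1" if sent[i] == "0" else "0") + sent[i + 1:]
            assert min_distance_decode(corrupted, code) == sent
    print("all single-bit errors corrected")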

9.3.4 Bound on the number of codewords

The Singleton Bound [9]

We can interpret the above theorem as: the distance between any two codewords of C must be ≥ (2t + 1). Because of this, the
number of codewords in C decreases, and hence |C| has an upper bound. Moreover, we first show
that |C| is related to the distance of the code by a very simple bound.

Theorem: If C is an (n, k, d)_2 code, then n − k ≥ d − 1.

Proof: Pick any d − 1 positions i_1, . . . , i_{d−1} and, for every codeword u = u_1 . . . u_n ∈ C, delete the symbols at these positions.
Let C' = { u with the positions i_1, . . . , i_{d−1} deleted | u ∈ C }. Clearly^{12}, |C'| = |C| = 2^k. Also |w| = n − (d − 1) for every
w ∈ C'. Hence, the number of codewords in C' cannot exceed the number of binary sequences of length n − d + 1, which is 2^{n−d+1}.
Hence, we have:

2^k \le 2^{n-d+1}
\implies k \le n - d + 1
\implies n - k \ge d - 1

thereby proving the theorem.


The Hamming Bound [25]
Now we consider the case of error correcting codes. The theorem below gives an essentially stronger bound
than the previous one. To be more precise, consider the following theorem:

12 As we have not removed any codeword from C while defining C', the number of codewords does not change: any two codewords of C differ in at least d positions, so deleting d − 1 positions cannot make two of them coincide.

Theorem: If C is an (n, k, d)_2 t-error correcting code, then

|C| \le \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}

Proof: We need to show the necessity and the sufficiency of the above condition.
We can start by showing the sufficiency of the above condition.

Claim: Given that C is an (n, k, d)_2 t-error correcting code, |C| \le \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}.

Proof:
Consider (fig. 9.1). As each band represents a distinct codeword of C, there are |C| bands in total. The
key assumption made here is the following: the only^{13} type of error that can occur to any codeword is an
addition modulo q (which is nothing but a bit flip in the binary case), where q = |Σ| and here we consider
q = 2. This error does not change the number of symbols in a codeword, so every erroneous sequence
lies in Σ^n. Since the total number of (binary) sequences of
length n is 2^n, the total number of sequences in all the bands put together cannot exceed this number.
We see that:

total # error sequences = (# error sequences for each codeword) × (# codewords)     (9.21)

Notice that

# error sequences for a single codeword = # sequences over Σ that differ from a given sequence in at most t positions
= \sum_{m=0}^{t} (\text{# sequences that differ from a given sequence in exactly } m \text{ positions})     (9.22)

Suppose the given sequence is u = u_1 u_2 . . . u_n. Any other sequence which differs from
u in exactly m locations is obtained by choosing those m locations and flipping the symbols there. Since we are considering only binary codes and
the error is only of the bit-flip type, once the m locations are fixed, the flipped values are fixed as well (if u_i = 0, then the flipped symbol is 1; this is not the case^{14}
for q > 2). The only freedom is in choosing which m of the n positions are flipped, and the number of ways of doing this is \binom{n}{m}
(this is equal to \binom{n}{n-m}, as choosing the n − m unflipped positions is the same as choosing the m flipped ones). Since each choice of positions corresponds to a different sequence,
# sequences that differ from u in exactly m locations = \binom{n}{m}. Using (eq. 9.22), we have: total # error sequences for a single codeword = \sum_{m=0}^{t} \binom{n}{m}, and from (eq. 9.21): total #
error sequences = |C| \sum_{m=0}^{t} \binom{n}{m}. This, we said, must be ≤ 2^n. Hence, we have:

|C| \sum_{m=0}^{t} \binom{n}{m} \le 2^n
\implies |C| \le \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}

13 We will have to edit this assumption while taking up this bound for quantum codes.
14 NOTE: We later show that a q-ary quantum code is equivalent to a 2q-ary classical code. Therefore, clearly, this assumption
does not work for binary quantum codes. This is also reflected in the fact that there is more than one type of error (bit
and phase flip) that can occur on a single binary quantum codeword.


Hence we have shown the upper bound on |C|, thereby proving sufficiency.

We now ask whether the converse of the above condition holds.

Claim: Given |C| \le \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}, C is a t-error correcting code.

Before going to a proof, we can look at the physical interpretation of the claim. The claim
says that if the number of codewords is bounded by a certain number, then the code is a t-error
correcting code. But this need not be true, as the codewords can all be close together (closer than the minimum
distance required for a t-error correcting code) while C still satisfies the above cardinality bound.
Therefore, we see that the Hamming bound is a necessary, but not a sufficient, condition for a code to be t-error correcting.
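As a numerical illustration (a sketch, not from the text), the [7, 4, 3] binary Hamming code, a perfect 1-error correcting code, meets the Hamming bound with equality:

    from math import comb

    n, k, t = 7, 4, 1
    ball_volume = sum(comb(n, i) for i in range(t + 1))    # 1 + 7 = 8
    print(2 ** k, "<=", 2 ** n / ball_volume)              # 16 <= 16.0 (a perfect code)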
The Gilbert–Varshamov Bound [16][24]
In the previous theorem (the Hamming bound), we saw that for a t-error correcting code, |C| cannot be arbitrarily close to 2^n. Hence this is a bound on the error-correcting capacity of a code: the farther |C|
is from \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}, the lower is the efficiency (more technically, the rate) of the error correcting code. But
all codes cannot be perfect. We would like to see whether there is some limit on |C| that can make C a good error
correcting code, and if so, we also need to see whether there are a large number of such codes.

Theorem: If C is an (n, k, d)_q t-error correcting code, then

|C| \ge \frac{q^n}{\sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i}

and it has a rate R given by

R \ge 1 - h_q(p) - \epsilon

with ε = o(1). For binary codes, we have |C| \ge \frac{2^n}{\sum_{i=0}^{d-1} \binom{n}{i}}.

Proof:
We will prove the general statement for q-ary codes. Before that, consider some definitions:

\sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i = V_q(n, d-1)     (9.23)

H_q(p) = p \log_q(q-1) - p \log_q p - (1-p) \log_q(1-p)     (9.24)

Stirling's approximation: m! = \sqrt{2\pi m} \left( \frac{m}{e} \right)^m (1 + o(1))     (9.25)

Here o(1) denotes a quantity asymptotically smaller than a constant, and Ω(n) denotes a quantity that is asymptotically at least of order n.

Note that the statement of the Hamming bound says that |C| \sum_{i=0}^{t} \binom{n}{i} (q-1)^i \le q^n, with equality holding
only in the case of perfect codes. To establish an inequality in the opposite direction for perfect codes, we simply enlarge
the sum on the LHS of the Hamming bound statement, which gives:

|C| \sum_{i=0}^{2t} \binom{n}{i} (q-1)^i \ge q^n

In the case of perfect t-error correcting codes, we have the condition d = 2t + 1:

|C| \sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i \ge q^n
\implies |C| \ge \frac{q^n}{\sum_{i=0}^{d-1} \binom{n}{i} (q-1)^i}
\implies |C| \ge \frac{q^n}{V_q(n, d-1)}

thereby showing the Gilbert–Varshamov bound. Hence we have shown that there exist good codes.
Certainly this does not seem enough, as we need to show that there are a significant number of good codes.
For this, we need to see the probability of the above bound being satisfied, which closely depends on the
distance of the code, which in turn depends on the number of errors that occur on transmission of a single
codeword (of block length n). First, we assume that the channel that causes errors is a binary symmetric
channel^{15}. We try to find upper and lower bounds on the number of erroneous sequences this channel can produce from
a codeword of length n (this is essentially V_q(n, d)) for arbitrarily large n, but fixed probability p of an error occurring.
The average number of errors is np, thereby causing each received word to lie at a distance of about np from the transmitted
codeword. In other words, the relevant distance scale of the code is np. Hence, we now try to find upper and
lower bounds for V_q(n, np).
From the properties of the entropy function as defined in (eq. 9.24), we see that it is monotonic and one-to-one
in the region p ∈ [0, 1 − 1/q]. Consider the following assumption:

Assumption: for q ≥ 2 and p ∈ [0, 1 − 1/q], we have:

V_q(n, pn) \le q^{h_q(p) n}

Justification: Trivially, it suffices to show that \frac{V_q(n, pn)}{q^{h_q(p) n}} \le 1. Starting with the LHS, using (eq. 9.24) and (eq. 9.23):

\frac{V_q(n, pn)}{q^{h_q(p) n}} = \sum_{i=0}^{np} \binom{n}{i} (q-1)^i \; q^{-[\,p \log_q(q-1) - p \log_q p - (1-p)\log_q(1-p)\,] n}
= \sum_{i=0}^{np} \binom{n}{i} (q-1)^i \, (q-1)^{-np} \, p^{np} (1-p)^{(1-p)n}
= \sum_{i=0}^{np} \binom{n}{i} (q-1)^{i} (1-p)^{n} \left( \frac{p}{(q-1)(1-p)} \right)^{np}

In the range where h_q(p) is monotonic we have p \le 1 - \frac{1}{q}. Hence 1 - p \ge \frac{1}{q}, which gives \frac{p}{q-1} \le \frac{1}{q} \le 1 - p. Using this, we see
that \frac{p}{(q-1)(1-p)} \le 1. Therefore, for i \le np, \left( \frac{p}{(q-1)(1-p)} \right)^{np} \le \left( \frac{p}{(q-1)(1-p)} \right)^{i}. On putting this in,
the above equation becomes an inequality, thus giving:

\frac{V_q(n, pn)}{q^{h_q(p) n}} \le \sum_{i=0}^{np} \binom{n}{i} (q-1)^{i} (1-p)^{n} \left( \frac{p}{(q-1)(1-p)} \right)^{i}
= \sum_{i=0}^{np} \binom{n}{i} p^{i} (1-p)^{n-i}
\le (p + (1-p))^{n} = 1

Hence we have an upper bound:

V_q(n, pn) \le q^{h_q(p) n}     (9.26)

15 This channel causes an error with probability p and does not cause any error with probability (1 − p).

We now try to find a lower bound on V_q(n, pn). We have:

V_q(n, pn) \ge \binom{n}{pn} (q-1)^{pn}

The binomial coefficient is \binom{n}{pn} = \frac{n!}{(pn)! \, (n - pn)!}. Using Stirling's approximation given in (eq. 9.25), we note that:

\binom{n}{pn} = \frac{\sqrt{2\pi n} \left(\frac{n}{e}\right)^n}{\sqrt{2\pi pn} \left(\frac{pn}{e}\right)^{pn} \sqrt{2\pi (n-pn)} \left(\frac{n-pn}{e}\right)^{n-pn}} \, (1 + o(1))
= \frac{1}{\sqrt{2\pi p (1-p) n}} \; p^{-pn} \, (1-p)^{-(1-p)n} \, (1 + o(1))

Therefore

\binom{n}{pn} (q-1)^{pn} \;\ge\; q^{\,n [\,p \log_q(q-1) - p \log_q p - (1-p)\log_q(1-p)\,] - o(n)} \;=\; q^{(h_q(p) - o(1)) n}

where the polynomial prefactor \frac{1}{\sqrt{2\pi p (1-p) n}} has been absorbed into the q^{-o(1) n} correction. Therefore, we see that

q^{(h_q(p) - o(1)) n} \;\le\; V_q(n, pn) \;\le\; q^{h_q(p) n}     (9.27)

We now have upper and lower bounds on the number of error sequences that can occur on the transmission
of a codeword of length n. We now need to show a lower bound for the rate, as a good error correcting code
has a higher rate. Note that from the bound on |C| derived above and (eq. 9.27), we have q^k \ge \frac{q^n}{q^{h_q(p) n}}. Therefore we
have q^k \ge q^{\,n - n h_q(p)}. Hence, on taking logarithms (base q) on both sides, we get:

k \ge n - n h_q(p)
\implies R \ge 1 - h_q(p)     (9.28)

A stronger bound can be established^{16} as:

R \ge 1 - h_q(p) - \epsilon     (9.29)

16 See the statement of this theorem in [16]. The version in [24], however, provides the statement of the G.V. bound as the one
obtained here.


where ε^{17} is a small quantity. We now try to show that the above bound is violated with inverse-
exponential probability, thereby asserting that there exist exponentially many error correcting codes that
satisfy the Gilbert–Varshamov bound. More precisely,

Theorem: There exist exponentially many codes that satisfy the Gilbert–Varshamov bound.

Proof:
From the upper bound statement (eq. 9.26), writing the distance of the code as pn:

V_q(n, np) \le q^{h_q(p) n} \implies \frac{V_q(n, np)}{q^n} \le q^{(h_q(p) - 1) n}

Notice that the LHS of the above relation is nothing but the probability of a (uniformly random) word of block length n
lying within a distance pn of another given codeword. Summing this probability over all the q^k codewords,
we get (an upper bound on) the total probability of the code having distance at most pn. (Note that we are still dealing with
perfect codes, and hence every pair of codewords would be separated by the minimum distance, which is also
the distance of the code.)

Total probability: P \le q^k \, q^{(h_q(p) - 1) n}

Taking k at the Gilbert–Varshamov rate, k = (1 − h_q(p) − ε) n (cf. eq. 9.29):

P \le q^{(1 - h_q(p) - \epsilon) n} \, q^{(h_q(p) - 1) n} = q^{-\epsilon n} = e^{-\Omega(n)}

Hence, we see that the probability of a code having distance at most np is e^{-\Omega(n)}. The complement of
this event, however, is the case of a code having distance at least np. This is given by 1 − P. Hence we
see that this probability is 1 − e^{-\Omega(n)}. Since this probability is very close to 1, we see that there are
exponentially many codes that satisfy the Gilbert–Varshamov bound, and hence are good error correcting
codes.
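The volume bound (eq. 9.26) is easy to check numerically. The following sketch (assuming Python; the parameters n and p are hypothetical, chosen for illustration) compares V_q(n, pn) with q^{h_q(p) n} in the binary case:

    from math import comb, log

    def volume(n, r, q=2):
        """V_q(n, r) = sum_{i<=r} C(n, i) (q-1)^i  (eq. 9.23)."""
        return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

    def h_q(p, q=2):
        """q-ary entropy function of eq. 9.24."""
        return p * log(q - 1, q) - p * log(p, q) - (1 - p) * log(1 - p, q)

    n, p, q = 100, 0.11, 2
    r = int(p * n)
    print(volume(n, r, q) <= q ** (h_q(p, q) * n))    # True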

Growth Enumerating Function [20]

Define N(p) = R − 1 + H_q(p). N(p) = 0 if the code attains the G.V. bound. The quantity N(p) is called
the Growth Enumerating Function. Taking q = 2, R = 1/2 and H_2(p) = −p log_2 p − (1 − p) log_2(1 − p), N(p)
plotted as a function of p gives (fig. 9.2).

17 Notice that p is nothing but the probability of an error occurring. One can write p = t/n.

Figure 9.2: Graph showing the growth enumerating function N(p) with R = 1/2. In this case, p_{GV} ≈ 0.1101.

At p = p_{GV}, N(p) = 0 and the code attains the G.V. bound, thereby being a perfect code. For p < p_{GV},
the growth function is negative and hence the number of codewords at a distance less than np from any codeword
decreases exponentially with n. On the other hand, for p > p_{GV}, N(p) > 0 and hence the number of
codewords at a distance greater than np from any codeword increases exponentially with n. Hence, taking a
Hamming space with points as codewords, we see that any sphere centered at a codeword with radius less
than np_{GV} would not contain other points, whereas any sphere centered around the same codeword with a
radius greater than np_{GV} would contain exponentially many other points (codewords).

Figure 9.3: The Gilbert–Varshamov bound in a Hamming space: for a sphere of radius greater than
np_{GV}, exponentially many points lie inside the sphere, and for radius less than np_{GV} the number of points
inside the sphere decreases exponentially with n.

If N(p) < 0, then the number of codewords inside a sphere (centered around any codeword) of radius np
decreases exponentially with n and, similarly, if N(p) > 0, the number of codewords inside a sphere of radius
np grows exponentially with n. Hence its name, Growth Enumerating Function.

9.3.5 Parity Check Codes [25]

Until now, we have concentrated on the error detecting and correcting capacity of codes. Now we shift our
attention to the ease of error correction and detection. For this purpose, we introduce the parity check code,
which makes the error detection procedure easy.^{18} Let {w_i} be the codewords transmitted across the
channel. This encoding appends an extra bit b, called the parity bit, to each codeword w_j ∈ C such that
b = 1 if the # of 1s in w_j is odd, and b = 0 if the # of 1s in w_j is even. Hence, after this appending, all the codewords have an even # of 1s. Now, if
any t-tuple error occurs (with t odd), then the number of ones becomes odd, i.e., the sum of the digits of
the codeword is 1 (mod 2). The error detection operation reports an error if the sum results in 1.
To check whether the number of 1s is even, we just need to add up all the digits of the codeword and check whether the sum
= 0. Let us consider all those codewords for which the # of 1s is even. Each of these codewords satisfies a
linear equation of the type a_1 r_1 + a_2 r_2 + · · · + a_n r_n = 0 (mod 2), with r_i ∈ {0, 1} and a_i ∈ {0, 1}. If there are m such
linear constraints (over the same code alphabet), then we have a system of linear equations, or a
matrix equation:

a_{11} r_1 + a_{12} r_2 + · · · + a_{1n} r_n = 0
a_{21} r_1 + a_{22} r_2 + · · · + a_{2n} r_n = 0
   ⋮
a_{m1} r_1 + a_{m2} r_2 + · · · + a_{mn} r_n = 0

\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{pmatrix} = 0, \qquad \text{where } a_{ij} \in \{0, 1\}

Recall that if there is no error, then the # of 1s is even for all the codewords, and hence we have a set of linear
equations that can be written in the matrix form above, using an (m × n) matrix H, called the parity check
matrix. Hence the code digits {r_i}, arranged in the vector X, satisfy the equation HX = 0. This is now a very simple
condition which, if not satisfied, indicates the presence of an error. The ease of this method, as opposed to
the naive method of error detection, is that it takes only polynomial time to multiply a matrix with a vector, whereas
it takes exponential time to list all the sequences in the error neighbourhood of a codeword and check
whether the received sequence is one of them.

18 However, the parity check alone cannot be used to correct errors, as the exact location of the error cannot be determined.
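A minimal sketch of this check (hypothetical matrices, assuming Python/NumPy; not from the text): a received word X passes the parity check iff HX = 0 (mod 2).

    import numpy as np

    # Parity-check matrix of the 3-bit repetition code {000, 111}.
    H = np.array([[1, 1, 0],
                  [0, 1, 1]])

    def detect_error(x):
        syndrome = H @ x % 2
        return syndrome.any()          # True iff an error is detected

    print(detect_error(np.array([1, 1, 1])))   # False: a valid codeword
    print(detect_error(np.array([1, 0, 1])))   # True:  the middle bit was flipped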

9.3.6 Linear Codes

We know that the set Σ^n forms a vector space over Σ. However, an arbitrary subset of Σ^n (say S), though uniquely
decipherable, need not form a vector space (over Σ). Let us look at a condition on S that would make
it a vector space.

Claim: Let S = { x ∈ Σ^n : Hx = 0 } for some matrix H over the field; then S forms a vector space over Σ.

Justification: Let x_1, x_2 ∈ S, so that Hx_1 = Hx_2 = 0. To
show that S is a vector space, we need to show that:
(i) v (= x_1 + x_2) ∈ S: for this, it suffices to show that Hv = 0. As H is a linear operator, Hv =
H(x_1 + x_2) = Hx_1 + Hx_2 = 0. Hence we see that Hv = 0, thereby justifying the claim.
(ii) v (= c x_1) ∈ S: we need Hv = 0. We have Hv = H(c x_1) = c Hx_1 = 0, thereby justifying the claim.

Hence, the set S forms a vector space over Σ and is called a linear code. A linear code can also be defined as
the code that corresponds to the null space of a matrix, which here is denoted by H and is called a parity
check matrix for S. Suppose now that G is some linear map, U forms a vector space and X is
a set such that for every u ∈ U there is an x ∈ X with Gu = x (and every x ∈ X arises this way). It is simple to see that X has the properties of a vector
space, and hence forms a linear code. Such a matrix G is called a generator matrix of the linear code X.
Consider the following relation:

Gu = x
Multiplying by H on both sides:   HGu = Hx
Since x ∈ X, which is a linear code:   HGu = 0
As this holds for all u ∈ U:   HG = 0

The above is a very important property of linear codes^{19}.
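The property HG = 0 can be verified on a toy example (a sketch with hypothetical matrices for the 3-bit repetition code, assuming Python/NumPy):

    import numpy as np

    G = np.array([[1],
                  [1],
                  [1]])               # generator: u in {0,1} -> (u, u, u)
    H = np.array([[1, 1, 0],
                  [0, 1, 1]])         # parity checks r1+r2 = 0, r2+r3 = 0

    print((H @ G) % 2)                # [[0], [0]] -- HG = 0, as derived above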

9.4 Examples

Errors in classical binary codes are only of the bit-flip type, i.e., in a codeword c ∈ {0, 1}^n the symbol 1 is
replaced by 0 or vice-versa. Hence, on passing a binary sequence across an erroneous channel, we expect some of the
bits in that sequence to be flipped.

9.4.1 Repetition Code [24]


Suppose the channel flips one bit with probability ε (where ε ≤ 1 and in practice ε ≪ 1). Then,
considering that each bit flip occurs independently, the probability of more bit flips is smaller (the probability of
k bit flips decays as the k-th power of ε). As a result, by choosing an error model where ε ≪ 1, we
may safely ignore all except single bit flip errors. By this logic, if many copies of the same bit are passed across
the channel, then at most one of them would undergo a bit flip, and by observing the majority of the bits we can
conclude which bit had been flipped. Flipping back this bit would then give us the transmitted sequence.
This error correcting code, where a single bit is repeated (the number of repetitions depends upon how small
ε is compared to unity), is called the Repetition Code. Hence the encoding map for the repetition code is
given as:

0 → 000   and   1 → 111     (9.30)

Now consider the following procedure for correcting a single bit flip error on an arbitrary bit. Let 0101 be
the sequence to be transmitted. Using the repetition encoding (eq. 9.30), we get 000111000111 as
the transmitted sequence. The channel now flips one of the bits in the sequence. After receiving the
erroneous sequence, we examine it block-by-block; any block that is not 000 or 111 contains the error. Within that
block, majority voting identifies the flipped bit, which is flipped back to correct the sequence. On decoding, we
obtain back our transmitted sequence:

0101 → (encode) → 000111000111 → (transmission causing a single bit flip) → 010111000111

By majority vote within the first block, the 2nd bit has flipped. Flipping the second bit gives 000111000111, which decodes to 0101.

19 Note that the definition of a linear code is sometimes given as H^T x = 0 for all x ∈ X. In that case we have the condition H^T G = 0.
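The encode/majority-vote procedure above can be written out in a few lines (a sketch, assuming Python; the helper names encode and decode are illustrative):

    def encode(bits):
        return "".join(b * 3 for b in bits)

    def decode(received):
        blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
        return "".join("1" if block.count("1") >= 2 else "0" for block in blocks)

    sent = encode("0101")                    # '000111000111'
    corrupted = sent[:1] + "1" + sent[2:]    # flip the 2nd bit: '010111000111'
    print(decode(corrupted))                 # '0101' -- the single bit flip is corrected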


Chapter 10

Quantum Codes
10.1 Introduction

An n-qubit quantum code is defined as a 2^k dimensional subspace of the 2^n dimensional vector space of
n qubits (the Hilbert space (C^2)^{⊗n}). The main difference between classical and quantum codes (that will be made
use of in different bounds) is the fact that the quantum coding space is 2^k dimensional, as compared to the
k-dimensional classical coding space. A quantum codeword has n qubits, out of which k qubits are called
the information qubits, as in classical theory, and the rest are redundant qubits, used for error correction properties
(the number of redundant qubits is bounded by the quantum singleton bound). The reason for the quantum
coding space to be 2^k dimensional is that a code whose every codeword has k bits would, classically, have
2^k codewords. As there are no superposition states in the case of classical codes, unlike quantum codes, we see
that, using these 2^k classical codewords, we can combine them with different amplitudes to form more codewords.
Hence the 2^k classical codewords of the n-bit classical code are taken as the basis vectors of the n-qubit
quantum code. Since the dimensionality of a Hilbert space is the number of basis vectors used to describe it,
we see that an n-qubit quantum code is 2^k dimensional. To differentiate this notation from the classical case,
we use [[n, k, d]]_q to denote a 2^k dimensional quantum code which is a subspace of the 2^n dimensional vector
space, with the codewords separated by a distance d. The subscript q denotes the alphabet size, which is
always taken as two in all further explanations^1.

10.2 Errors in Quantum Codes

Just as in the classical case, errors in quantum codes can be bit-flips on the basis vectors of the code. Notice
that in the case of quantum codes, we have codewords which are superpositions of the basis states (with some amplitudes),
and hence there is scope for errors affecting these superpositions by changing the phases of these amplitudes.
These are called phase errors. It is easy to see that if the original code is in the computational (Z) basis,
then bit flip errors are caused by X and phase flips are caused by Z operators. However, in the X basis,
the roles are interchanged: there Z acts as the bit-flip error.
We now extend the scope of errors by noticing that any unitary operator that acts on n qubits is an error.
To build a framework to analyse these errors, we view the quantum computer (the system that performs the
computation), with state ρ_s, as a closed system, and the errors as the bath or the environment, with state ρ_b, in
which the quantum computer is placed. The combined open quantum system is now described by the density
matrix ρ_s ⊗ ρ_b. The error operation is described as a map, or a time evolution, of this open quantum
system. After we have found how the open system evolves, we sum over all degrees of freedom of the bath (or
take a partial trace of ρ_s ⊗ ρ_b over the environment), thereby obtaining the change in the closed system,
i.e. the quantum computer (incorporating the effects of the bath during the evolution).

1 The general case where q = d for some d is called a qudit code.

But as a map, not much information can be extracted about the error operation. There are two ways of representing this error:
first, we try to represent this operation in terms of discrete operators; this representation is called the Operator Sum
Representation. After this, using the operator sum representation, we express the time evolution of ρ_s ⊗ ρ_b
using the master equation.

10.2.1 Operator sum representation [3][22]

The evolution of an open quantum system can be described using a unitary operator U (which acts on the
tensor product state of the system and the bath):

\rho_{S,B}(t) = U \left( \rho_S(0) \otimes \rho_B(0) \right) U^{\dagger}     (10.1)

Let the states of the system and the bath be given in terms of their bases: \rho_S(0) = \sum_{ij} s_{ij} |i\rangle\langle j| and \rho_B(0) = \sum_{\mu} b_{\mu} |\mu\rangle\langle\mu| respectively.

\rho_{S,B}(t) = U \left( \sum_{ij} \sum_{\mu} s_{ij} b_{\mu} \, |i\rangle\langle j| \otimes |\mu\rangle\langle\mu| \right) U^{\dagger}

Tracing over the bath we get \rho_S(t):

\rho_S(t) = \mathrm{tr}_B \left[ U \left( \sum_{ij} \sum_{\mu} s_{ij} b_{\mu} \, |i\rangle\langle j| \otimes |\mu\rangle\langle\mu| \right) U^{\dagger} \right]
= \sum_{\nu} \sum_{ij} \sum_{\mu} s_{ij} b_{\mu} \, \langle \nu | \, U \left( |i\rangle\langle j| \otimes |\mu\rangle\langle\mu| \right) U^{\dagger} \, | \nu \rangle

where {|ν⟩} is an orthonormal basis of the bath. Consider the inner products ⟨ν| · |ν⟩: since |ν⟩ is a state of the bath, it is contracted only with
the bath part of U and U^† (the system part is left unchanged, or equivalently multiplied by the identity
operator of the system). Writing u_{\nu\mu} = \langle \nu | U | \mu \rangle for the resulting operator acting on the system, we get:

\rho_S(t) = \sum_{\nu\mu} \sum_{ij} s_{ij} b_{\mu} \, u_{\nu\mu} \, |i\rangle\langle j| \, u_{\nu\mu}^{\dagger}

Putting E_{\nu\mu} = \sqrt{b_{\mu}} \, \langle \nu | U | \mu \rangle (a single index k may be used to relabel the pair (ν, μ)), we get:

\rho_S(t) = \sum_{\nu\mu} E_{\nu\mu} \, \rho_S(0) \, E_{\nu\mu}^{\dagger}

We see that the evolution of the density matrix is given by a noise function: \rho_S(t) = \mathcal{E}(\rho_S(0)). This noise
function can be represented in the above summation form:

\mathcal{E}(\rho) = \sum_{k} E_{k} \, \rho \, E_{k}^{\dagger}     (10.2)

The above form is called the Operator Sum Representation.
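As an illustration of (eq. 10.2) (a NumPy sketch, not from the text), the single-qubit bit-flip channel has operation elements E_0 = √(1−p) I and E_1 = √p X; the sketch applies the map to |0⟩⟨0| and checks the completeness relation:

    import numpy as np

    p = 0.1
    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]])
    kraus = [np.sqrt(1 - p) * I, np.sqrt(p) * X]

    rho = np.array([[1, 0], [0, 0]], dtype=complex)      # the state |0><0|
    rho_out = sum(E @ rho @ E.conj().T for E in kraus)   # eq. 10.2

    print(np.allclose(sum(E.conj().T @ E for E in kraus), I))  # True: completeness
    print(rho_out.real)    # [[0.9, 0.0], [0.0, 0.1]] -- |0><0| flipped with prob. p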

10.2.2 Lindblad form using the Master Equation [24]

We have ρ_s(t) = Ɛ(ρ_s(0)), which in the operator sum representation becomes ρ_s(t) = Ɛ(ρ_s(0)) = \sum_k E_k \rho_s(0) E_k^{\dagger}.
Now, expanding ρ_s in a Taylor series for a small time δt:

\rho_s(\delta t) = \rho_s(0) + \frac{\partial \rho_s}{\partial t} \, \delta t + O\!\left((\delta t)^2\right)     (10.3)
\rho_s(\delta t) = \sum_k E_k \, \rho_s(0) \, E_k^{\dagger}     (10.4)

Expanding the RHS of (eq. 10.4), we get:

\rho_s(0) + O(\delta t) = E_0 \, \rho_s(0) \, E_0^{\dagger} + \sum_{k \ge 1} E_k \, \rho_s(0) \, E_k^{\dagger}

Comparing term by term in the above equation: E_0 ≈ I_s + O(δt) and E_k ≈ O(√δt) for k ≥ 1. Using the Hamiltonian
of the system H_s, we can write the operator E_0 (for infinitesimal time translations) in terms of H_s:

E_0 = I_s + \left( K - \frac{i}{\hbar} H_s \right) \delta t

where K is some Hermitian operator. For the operators E_k, k ≥ 1, we have the general form:

E_k = \sqrt{\delta t} \, L_k \quad \text{for } k \ge 1

From the trace-preserving (completeness) condition on the E_k, we see that:

I_s = \sum_k E_k^{\dagger} E_k = E_0^{\dagger} E_0 + \sum_{k \ge 1} E_k^{\dagger} E_k
= \left( I_s + \left( K + \frac{i}{\hbar} H_s \right) \delta t \right) \left( I_s + \left( K - \frac{i}{\hbar} H_s \right) \delta t \right) + \sum_{k \ge 1} L_k^{\dagger} L_k \, \delta t
= I_s + 2K \, \delta t + \left( K + \frac{i}{\hbar} H_s \right)\left( K - \frac{i}{\hbar} H_s \right) (\delta t)^2 + \sum_{k \ge 1} L_k^{\dagger} L_k \, \delta t

Ignoring the (δt)^2 terms: 0 = 2K \, \delta t + \sum_{k \ge 1} L_k^{\dagger} L_k \, \delta t

K = -\frac{1}{2} \sum_{k \ge 1} L_k^{\dagger} L_k     (10.5)

Putting the above expressions into (eq. 10.4), we get:

\rho_s(\delta t) = \left( I_s + \left( K - \frac{i}{\hbar} H_s \right) \delta t \right) \rho_s(0) \left( I_s + \left( K + \frac{i}{\hbar} H_s \right) \delta t \right) + \sum_{k \ge 1} L_k \, \rho_s(0) \, L_k^{\dagger} \, \delta t

Using (eq. 10.5) and dropping O((δt)^2) terms:

\rho_s(\delta t) = \rho_s(0) + \left[ \left( K - \frac{i}{\hbar} H_s \right) \rho_s(0) + \rho_s(0) \left( K + \frac{i}{\hbar} H_s \right) + \sum_{k \ge 1} L_k \, \rho_s(0) \, L_k^{\dagger} \right] \delta t + O\!\left((\delta t)^2\right)

We can now make the approximation ρ_s(δt) = ρ_s(0) + O(δt). Putting this into the above equation, we see that
replacing ρ_s(0) by ρ_s(t) inside the square bracket changes the expression only at order (δt)^2, which can be ignored. Making this change, we get:

\rho_s(\delta t) = \rho_s(0) + \left[ K \rho_s(t) + \rho_s(t) K - \frac{i}{\hbar}\left( H_s \rho_s(t) - \rho_s(t) H_s \right) + \sum_{k \ge 1} L_k \, \rho_s(t) \, L_k^{\dagger} \right] \delta t
= \rho_s(0) + \left[ -\frac{1}{2} \left\{ \rho_s(t), \sum_{k \ge 1} L_k^{\dagger} L_k \right\} - \frac{i}{\hbar} [H_s, \rho_s(t)] + \sum_{k \ge 1} L_k \, \rho_s(t) \, L_k^{\dagger} \right] \delta t

We can now compare the above equation with (eq. 10.3), identifying the O(δt) term with (∂ρ_s/∂t) δt:

\frac{\partial \rho_s}{\partial t} = -\frac{i}{\hbar} [H_s, \rho_s(t)] + \sum_{k \ge 1} \left( L_k \, \rho_s(t) \, L_k^{\dagger} - \frac{1}{2} \left\{ \rho_s(t), L_k^{\dagger} L_k \right\} \right)

The above is called the master equation for the density matrix of the system, and the L_k are called Lindblad
operators. Notice that this is just the incorporation of the effect of the bath into the usual time evolution
equation for the density operator of the closed system: \frac{\partial \rho}{\partial t} = -\frac{i}{\hbar} [H, \rho]. We can now rearrange the term
\sum_{k \ge 1} \left( L_k \rho_s L_k^{\dagger} - \frac{1}{2} \{ \rho_s, L_k^{\dagger} L_k \} \right):

\sum_{k \ge 1} L_k \rho_s L_k^{\dagger} - \frac{1}{2} \left\{ \rho_s, \sum_{k \ge 1} L_k^{\dagger} L_k \right\}
= \sum_{k \ge 1} \left( L_k \rho_s L_k^{\dagger} - \frac{1}{2} \rho_s L_k^{\dagger} L_k - \frac{1}{2} L_k^{\dagger} L_k \rho_s \right)
= \frac{1}{2} \sum_{k \ge 1} \left( \left[ L_k, \rho_s(t) L_k^{\dagger} \right] + \left[ L_k \rho_s(t), L_k^{\dagger} \right] \right)

We therefore have the Lindblad master equation, where the L_k are called the Lindblad operators [3]:

\frac{\partial \rho_s}{\partial t} = -\frac{i}{\hbar} [H, \rho_s(t)] + \frac{1}{2} \sum_{k \ge 1} \left( \left[ L_k, \rho_s(t) L_k^{\dagger} \right] + \left[ L_k \rho_s(t), L_k^{\dagger} \right] \right)     (10.6)
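A rough numerical illustration of (eq. 10.6) follows (a sketch under stated assumptions, not from the text): a single qubit with H = 0 and one Lindblad operator L = √γ |0⟩⟨1| (amplitude damping), integrated with small Euler steps, reproduces the expected exponential decay of the excited-state population.

    import numpy as np

    gamma, dt, steps = 1.0, 1e-3, 2000
    L = np.sqrt(gamma) * np.array([[0, 1], [0, 0]], dtype=complex)   # |0><1|

    def lindblad_rhs(rho):
        # H = 0, so only the dissipative part of eq. 10.6 contributes
        anti = rho @ L.conj().T @ L + L.conj().T @ L @ rho
        return L @ rho @ L.conj().T - 0.5 * anti

    rho = np.array([[0, 0], [0, 1]], dtype=complex)    # start in |1><1|
    for _ in range(steps):
        rho = rho + dt * lindblad_rhs(rho)

    # Population of |1> decays as exp(-gamma * t), with t = steps * dt = 2.
    print(rho[1, 1].real, np.exp(-gamma * steps * dt))  # both close to 0.135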

10.2.3 Error Correction Condition for Quantum Codes [24]


We saw that the classical error correction condition is nothing but the distance bound on the codewords; but
in the quantum case, unlike the classical case where all known errors are only of one type (the bit flip), there
can be arbitrary unitary errors. Hence we have a constraint on these errors, if they are to be correctable.
When we say that an error, the map Ɛ, is correctable, we mean that there exists another map or operation^2, R,
such that the composition of R with Ɛ gives back the initial state ρ. The error correction condition says that:

Let C be a code, P the projector onto the code space, and Ɛ a noise (quantum operation) having operation
elements {E_i}. There exists an error-correction operation R correcting {E_i} on C iff:

P E_i^{\dagger} E_j P = \alpha_{ij} P     (10.7)

where the α_{ij} are entries of a Hermitian matrix α.

Proof Idea: The key idea here is that quantum operations, to be reversible, cannot increase distinguishability. The error operation, if correctable, must be such that, if there are two codewords |ψ_i⟩, |ψ_j⟩ which are
orthogonal (or not orthogonal, i.e. indistinguishable through a measurement), they must continue to be so even
after the error has occurred. Hence, if we have any two error operators E_i and E_j acting on |ψ_i⟩ and |ψ_j⟩
respectively, such that |ψ̃_i⟩ = E_i |ψ_i⟩ and |ψ̃_j⟩ = E_j |ψ_j⟩, then:

⟨ψ̃_i | ψ̃_j⟩ ∝ ⟨ψ_i | ψ_j⟩
⟹ ⟨ψ_i | E_i^{\dagger} E_j | ψ_j⟩ ∝ ⟨ψ_i | ψ_j⟩

The above equation must hold for all elements (|ψ_i⟩, |ψ_j⟩) of the code space. Hence, we can consider the projection
operator P onto the code space, instead of individual vectors. Therefore,

P E_i^{\dagger} E_j P ∝ P

If the proportionality constant were simply some real number, we would have P E_i^{\dagger} E_j P = α P;
more generally, the constants can be the elements α_{ij} of a Hermitian matrix.

Formal Proof: To formally prove
the theorem, we need to show the necessity and sufficiency of the above condition. In other words, we need
to prove the theorem as well as its converse. We can start by showing the sufficiency condition.

2 The symbol R is not to be confused with the rate of a code. The contexts of usage will be entirely different.


Claim: There exists an error correcting operation R such that R(Ɛ(ρ)) ∝ ρ, given that P E_i^{\dagger} E_j P = α_{ij} P.

Proof: Let F_k = \sum_i u_{ik} E_i be a unitarily equivalent set of errors, the u_{ik} being elements of the unitary
operator U that diagonalises α. Hence D = U^{\dagger} α U is diagonal, with elements d_{kl} = d_{kk} δ_{kl}.

P F_k^{\dagger} F_l P = \sum_{ij} u_{ki}^{*} u_{jl} \, P E_i^{\dagger} E_j P
By assumption: = \sum_{ij} u_{ki}^{*} \alpha_{ij} u_{jl} \, P
= (U^{\dagger} \alpha U)_{kl} \, P
= d_{kl} P = \delta_{kl} \, d_{kk} \, P     (10.8)

Consider now the polar decomposition of F_k P:

F_k P = U_k \sqrt{P F_k^{\dagger} F_k P} = U_k \sqrt{d_{kk}} \, P     (10.9)

\implies U_k P U_k^{\dagger} = \frac{F_k P U_k^{\dagger}}{\sqrt{d_{kk}}}

Define subspaces (projectors), based on the LHS of the above equation:

P_k = U_k P U_k^{\dagger}     (10.10)
P_l = U_l P U_l^{\dagger}     (10.11)

We now show that these code spaces are mutually orthogonal, i.e. P_k P_l ∝ δ_{kl}:

P_k P_l = \frac{U_k \, P F_k^{\dagger} F_l P \, U_l^{\dagger}}{\sqrt{d_{kk}} \sqrt{d_{ll}}}
        = \frac{\delta_{kl} \, d_{kk} \, U_k P U_l^{\dagger}}{\sqrt{d_{kk}} \sqrt{d_{ll}}} \quad \text{[from (eq. 10.8)]}
        \;\propto\; \delta_{kl}

The significance of the code spaces being orthogonal is that the different errors F_k map the code space onto mutually
orthogonal subspaces, which can therefore be distinguished (and undone) after a measurement. Finally, we show the existence of an
error correcting operation R with R(Ɛ(ρ)) ∝ ρ. R is also, in some sense, similar to a quantum operation:
more precisely, it is the inverse of a noise operation, and it too has a similar operator sum representation^3. Take

R(\sigma) = \sum_{k} U_k^{\dagger} P_k \, \sigma \, P_k U_k

so that

R(\mathcal{E}(\rho)) = \sum_{k} U_k^{\dagger} P_k \left( \sum_l F_l \, \rho \, F_l^{\dagger} \right) P_k U_k
= \sum_{kl} \beta_{kl} \, \rho \, \beta_{kl}^{\dagger}, \qquad \text{where} \quad \beta_{kl} = U_k^{\dagger} P_k F_l     (10.12)

From (eq. 10.10), P_k = U_k P U_k^{\dagger}, and by multiplying both sides of (eq. 10.9) by U_k^{\dagger}, we get F_k P U_k^{\dagger} =
\sqrt{d_{kk}} \, U_k P U_k^{\dagger} = \sqrt{d_{kk}} \, P_k. Hence, we have P_k = \frac{F_k P U_k^{\dagger}}{\sqrt{d_{kk}}}. Substituting this form of P_k in the expression
for β_{kl}, we get:

\beta_{kl} = \frac{U_k^{\dagger} U_k \, P F_k^{\dagger} F_l}{\sqrt{d_{kk}}} = \frac{P F_k^{\dagger} F_l}{\sqrt{d_{kk}}}

Since the input state lies in the code space, ρ = PρP, each β_{kl} effectively acts as β_{kl} P, and using (eq. 10.8):

\beta_{kl} P = \frac{P F_k^{\dagger} F_l P}{\sqrt{d_{kk}}} = \frac{\delta_{kl} \, d_{kk} \, P}{\sqrt{d_{kk}}} = \sqrt{d_{kk}} \, \delta_{kl} \, P

Putting the above form of β_{kl} into (eq. 10.12), we get:

R(\mathcal{E}(\rho)) = \sum_{kl} d_{kk} \, \delta_{kl} \, \rho = \left( \sum_{k} d_{kk} \right) \rho \;\propto\; \rho

Hence, we have shown the existence of an error correction operation, thereby proving sufficiency.

3 Notice the position of the daggers, as opposed to the definition of a noise operation in (eq. 10.2). It is, in this sense, the
inverse of a noise operation.

We now need to show necessity, of the error correction codition. Since, the error acts on the encoded
state, present in the code space, given by P P , we have:
Claim: P Ei Ej P = ij P i R such that R (P P ) P P , 4 .
Proof: In the operator-sum representation,

R(ε(P ρ P)) = Σ_ij R_j† E_i P ρ P E_i† R_j

By assumption, the RHS of the above equation is ∝ P ρ P:

Σ_ij R_j† E_i P ρ P E_i† R_j ∝ P ρ P, i.e. = c P ρ P
Σ_ij β_ij ρ β_ij† = c P ρ P

where β_ij = R_j† E_i P. In the above equation, we see that the operation Σ_ij β_ij (·) β_ij† on ρ is equivalent to the operation c P (·) P on ρ. Putting back the expression for β_ij, each of these operators must itself be proportional to P:

R_k† E_i P = √(c_ki) P    (10.13)
⟹ P E_i† R_k = √(c_ki)* P

and similarly, we have for E_j:

R_k† E_j P = √(c_kj) P    (10.14)

Multiplying the adjoint of (eq. 10.13) with (eq. 10.14) and summing over k (using the completeness relation Σ_k R_k R_k† = I of the recovery operators):

P E_i† (Σ_k R_k R_k†) E_j P = Σ_k √(c_ki)* √(c_kj) P

P E_i† E_j P = α_ij P

⁴ Note that ε acts on the element of the code space corresponding to ρ, namely P ρ P, and not on ρ itself.

where α_ij are the elements of some hermitian matrix. (That they can indeed be taken as the entries of a hermitian matrix is clear from the form obtained above: α_ij = Σ_k √(c_ki)* √(c_kj), so that α_ji = α_ij*.)
Therefore, we have proved both the necessity and the sufficiency of the quantum error-correction condition, thereby proving the theorem.
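As a quick numerical sanity check of the condition just proved, the short Python sketch below (an illustration added here, assuming numpy is available; the choice of the three-qubit bit-flip code and of single-qubit X errors is mine, not the notes') builds the projector P onto a small code space and verifies that P E_i† E_j P = α_ij P for a correctable error set, with α hermitian.

import numpy as np
import itertools

I = np.eye(2); X = np.array([[0., 1.], [1., 0.]])

def kron(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Projector onto the 3-qubit bit-flip code space span{|000>, |111>}
e000 = np.zeros(8); e000[0] = 1
e111 = np.zeros(8); e111[7] = 1
P = np.outer(e000, e000) + np.outer(e111, e111)

# Correctable error set: identity and the three single-qubit bit flips
errors = [kron(I, I, I), kron(X, I, I), kron(I, X, I), kron(I, I, X)]

# Check P Ei^dag Ej P = alpha_ij P for every pair and extract alpha
alpha = np.zeros((4, 4), dtype=complex)
for (i, Ei), (j, Ej) in itertools.product(enumerate(errors), repeat=2):
    M = P @ Ei.conj().T @ Ej @ P
    alpha[i, j] = M[0, 0]                  # coefficient of P (the <000|M|000> entry)
    assert np.allclose(M, alpha[i, j] * P) # M must be proportional to P

assert np.allclose(alpha, alpha.conj().T)  # alpha is hermitian (here it is the identity)
print(np.round(alpha.real, 3))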

10.3 Distance Bound [1]

In section (sec. 9.3.3) we saw that the distance of a t-error-correcting code must be greater than 2t. We have the same bound for quantum codes also, with a small extra condition. Let us follow this statement for the case of stabilizer codes. Notice that when an error E occurs on a stabilizer code S, it either anti-commutes with one stabilizer (generator), or commutes with all the generators. Let us consider the former case, which results in an orthogonal code space after the action of the error. This can be distinguished from the original code and hence the error recovery can be performed. In this case, irrespective of the distance of the code (even if the distance of the code is 1, since the anti-commutation condition separates the erroneous and the original code spaces), we can detect and also recover from the error. Taking up the latter case of the two possibilities for an error operator: if the error is not itself a stabilizer, i.e. E ∈ Z(S) − S (if E ∈ S we need not correct this error at all), we see that this gives a code space that is not orthogonal to the previous one. This corresponds to the classical case, and hence we need to condition the distance of the code on errors in Z(S) − S. With this, we first define the notion of distance of a stabilizer code. Note that the distance of a general quantum code cannot just be defined as the hamming distance between the basis vectors, as then all the 2^{n−1} vectors (in an n-qubit code) which have different relative phases would have the same distance. However, we can define the hamming distance for two stabilizers as in the classical sense. In the classical case we see that the distance of a code can be interpreted as the minimum number of bit flips needed to take one codeword to another. The distance of a stabilizer code is defined as the minimum weight of an element in Z(S) − S, where the weight of an operator, wt(E), is the number of non-trivial (non-identity) tensor factors in E. Hence we have:


Theorem: A t-error-correcting quantum stabilizer code must have a distance d ≥ 2t + 1.
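To make the notion of weight concrete, the following small Python sketch (my own added illustration; Pauli operators are represented simply as strings over {I, X, Y, Z}, an assumed encoding) computes wt(E) as the number of non-identity tensor factors, the quantity whose minimum over Z(S) − S gives the distance.

def wt(pauli_string):
    """Weight of an n-qubit Pauli operator written as a string over {I, X, Y, Z}:
    the number of non-trivial (non-identity) tensor factors."""
    return sum(1 for p in pauli_string if p != 'I')

# Single-qubit errors have weight 1, a two-qubit error has weight 2.
print(wt('XII'), wt('IYI'), wt('XIZ'))   # -> 1 1 2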

10.4 The Quantum Singleton Bound [1]

In section (sec. 9.3.4) we discussed the classical case of the singleton bound. The main idea in the singleton bound is as follows: as the code has only k information bits, it suffices to consider all the codewords to be vectors in the 2^k-dimensional vector space. But this implies that codewords can be at a distance of 1 from each other. The same was possible in the classical case, where we could have taken the block length of each codeword as k (instead of n), with codewords at distance 1 from each other. This was clearly not desirable, as we wanted a distance-d code, as a result of which we had to introduce⁵ d − 1 ancilla bits to make the codewords separated by a distance d from each other. This increases the block length from k to n = k + d − 1. This statement gives the classical singleton bound: n − k ≥ d − 1. In the quantum case, there are both the bit flip (classical) errors as well as the phase flip errors (which are bit flip errors in the X basis). Clearly, a larger set of error operators suggests that the classical bound cannot carry over to the quantum case as it stands, as otherwise we would have accounted only for the bit flip (X) errors. Now if we assume that, in addition to t bit flip errors, there are t phase flip errors as well, then we need to append d − 1 more ancilla qubits (or tensor factors in the stabilizer) to account for the phase errors⁶.
Theorem: For an [[n, k]]_2 quantum code C, we have: n − k ≥ 2(d − 1).
⁵ We need to show how introducing d − 1 ancilla bits increases the distance, since it is not obvious.
⁶ This needs to be shown, as again it is not very obvious. Note that there is another ambiguity, as it seems like we have not used a similar assumption for the bit and phase flip errors in the previous section, while discussing the distance bound.


10.5 Quantum Hamming Bound

In (sec. 9.3.4) we discuss the hamming bound for classical codes. Notice that the only difference between an (n, k)_q classical code and an [[n, k]] quantum code is that the former is k-dimensional whereas the latter is 2^k-dimensional. There are two ways of realising the quantum hamming bound. One is by considering an equivalent classical code whose dimension is 2^k, which works for general q-ary codes; the other, specifically for binary codes, is based on the fact that there are three error operators in the quantum case, as opposed to only one in the classical case.⁷

Take the former method first. Notice that an (n, k)_q classical code can be seen as describing the state of n two-state classical systems. Similarly, an [[n, k]]_q code describes the state of n two-state quantum systems. More explicitly, let us take n = 1 and q = 2. A two-state classical system can be described using just two distinct (orthogonal) vectors, with each vector corresponding to one of the states of the system. The states of the system would be {0, 1}. On the other hand, consider a spin-half (two-state quantum) system. This can exist in one of the states |0⟩, |1⟩, (|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2. There are four states the system can be in, and they can be described using the quantum code {|0⟩, |1⟩}. Certainly, the classical code {0, 1} would not be able to describe all the states, whereas the classical code {00, 01, 10, 11}, or equivalently {a, b, c, d}, would be able to. In order to describe a 2-state quantum system, we require a 4-ary classical code, and similarly, to describe n 2-state quantum systems⁸, we need n 4-ary classical codes (as opposed to n 2-ary quantum codes). Therefore a 2-ary quantum code is equivalent⁹ to a 4-ary classical code.


Let us now go back to the general case of a q-state system, so as to look at the theorem for q-ary codes. A q-ary quantum system has states that span a q-dimensional hilbert space. The basis vectors of this hilbert space form the states of the system. Hence there are 2q states the q-state system can be in, and we would require a 2q-ary classical code to describe this system. Summarising, we have seen that a q-ary quantum code is equivalent to a 2q-ary classical code. As a result, we can take the quantum bound for the q-ary quantum code to be the same as that for a 2q-ary classical code. This gives the quantum hamming bound (with q = 2):

Theorem: For an [[n, k]]_2 quantum code C, we have:

|C| Σ_{m=0}^{t} (n choose m) 3^m ≤ 2^n    (10.15)

Taking now the second method, we note that the only type of error a classical error-correcting code has to correct is a bit flip error, as opposed to the bit flip, phase flip, as well as bit-phase flip errors a quantum code has to correct. Let us recall the way we had derived the hamming bound in (sec. 9.3.4), observing more closely the footnotes in that section (pertaining to quantum codes). While calculating the total number of binary code sequences that differ from a given sequence in (eq. 9.22), we had taken a fixed sequence u = u_1 u_2 ... u_{k_1} ... u_{k_m} ... u_n and we counted the number of sequences of the form v = u_1 u_2 ... v_{k_1} ... v_{k_{m−1}} ... u_n with u_i ≠ v_i, ∀ i ∈ {k_1 ... k_{m−1}}. Here we had made a key assumption that (for the case of binary classical codes and the assumed error model) if u_i is fixed, then the only value v_i can take is the bit-flipped value of u_i; or, in quantum coding terminology, we had assumed that if u_i = |0⟩ then v_i = X u_i = |1⟩, where X is the bit-flip operator (we had also later remarked that it is this assumption that needs to be modified in the quantum case). In the quantum code, the assumed error model consists of both the bit-flip X and the phase-flip Z operators. Hence, now if u_i = |0⟩, then v_i can be any one of {X|0⟩, Z|0⟩, XZ|0⟩}, corresponding to the bit flip (X), phase flip (Z) and bit-phase flip (XZ, or iY) errors respectively.
⁷ An alternate proof, for the case of binary codes, can be found in [24], which emphasizes the presence of three distinct error operators (the pauli matrices X, Y and Z) for quantum codes as opposed to only one (bit flip) for classical binary codes.
⁸ We assume that these n systems are non-interacting.
⁹ That is, we can have a bijection between a 2-ary quantum code and a 4-ary classical code.


With this, we see that the quantity in (eq. 9.22) becomes Σ_{m=0}^{t} (n choose m) 3^m. Continuing the same procedure as in (sec. 9.3.4) now gives the hamming bound for quantum codes, as described in (eq. 10.15).
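The bound in (eq. 10.15) is easy to evaluate numerically; the Python lines below (an added illustration, not part of the original notes, using math.comb from Python 3.8+) check it for a few parameter choices with t = 1 and show that the [[5, 1, 3]] code saturates it, i.e. is a perfect quantum code.

from math import comb

def quantum_hamming(n, k, t):
    """Return (LHS, RHS, satisfied?) for the quantum Hamming bound
    2^k * sum_{m=0}^{t} C(n, m) 3^m <= 2^n."""
    lhs = 2**k * sum(comb(n, m) * 3**m for m in range(t + 1))
    return lhs, 2**n, lhs <= 2**n

print(quantum_hamming(5, 1, 1))   # (32, 32, True): saturated, [[5,1,3]] is perfect
print(quantum_hamming(9, 1, 1))   # Shor code: satisfied but far from saturation
print(quantum_hamming(4, 1, 1))   # violated: no [[4,1,3]] code can exist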

10.6 The Quantum Gilbert Varshamov Bound [2]

Just as in the classical case (sec. 9.3.4), consider the quantum hamming bound for perfect codes: 2^k Σ_{i=0}^{t} (n choose i) 3^i = 2^n. We can add a positive quantity to the LHS of the above equation, obtaining the inequality 2^k Σ_{i=0}^{2t} (n choose i) 3^i ≥ 2^n. As per the notation defined in (eq. 9.23), we see that Σ_{i=0}^{2t} (n choose i) 3^i = Σ_{i=0}^{2t} (n choose i) (4 − 1)^i = V_4(n, d − 1), where we have used the fact that for perfect codes 2t + 1 = d.¹⁰

2^k ≥ 2^n / V_4(n, d − 1)    (10.16)

This shows the Quantum Gilbert Varshamov bound. Similar to the classical case, the above bound just states that there exists some (in fact, at least one) perfect quantum code, which is not of much use, as we need some statement that assures us that there are a significant number of quantum codes. We will first show that these codes have a rate that is very close to 1, thereby making them perfect. Also notice that the quantum gilbert varshamov bound (eq. 10.16) for codes over C^2 is the same as the classical gilbert varshamov bound for codes over F_4. Hence, using the general q-ary analysis as in (sec. 9.3.4), we can construct the asymptotic bound for the quantum case. Before that, we note a very important point that distinguishes the quantum case from the classical case. This is based upon the error models in the classical case (where we only deal with bit flip errors) and the quantum case (where we deal with both bit flip as well as phase flip errors, caused by the pauli operators X and Z respectively). While showing the asymptotic form in the classical case, we had assumed a binary symmetric channel that will cause an error with probability p and leave the bit unchanged with probability 1 − p, thereby making the average number of errors np. In the quantum case we assume a similar binary channel which will cause a bit flip error with probability p_1 and a phase flip error with probability p_2; moreover, we will assume this channel to be symmetric, by taking p_1 = p_2 = p. Hence the error probability per bit becomes 2p, thereby making the average number of errors 2np. Extending (eq. 9.26) to the quantum case (q = 4 and p → 2p), we have:

V_4(n, 2np) ≤ 4^{h_4(2p) n}    (10.17)

Using the above equation along with (eq. 10.16), we obtain:

2^k ≥ 2^n / V_4(n, 2np) ≥ 2^n / 4^{h_4(2p) n} = 2^n / 2^{2 h_4(2p) n} = 2^{n − 2 h_4(2p) n}

⟹ k ≥ n − 2 h_4(2p) n
⟹ k/n ≥ 1 − 2 h_4(2p)    (10.18)

Using (eq. 9.24) with q = 4:
R ≥ 1 − 2 [2p log_4 3 − 2p log_4(2p) − (1 − 2p) log_4(1 − 2p)]
= 1 − 2p log 3 + [2p log(2p) + (1 − 2p) log(1 − 2p)]

Using (eq. 9.24) with q = 2:
R ≥ 1 − 2p log 3 − h_2(2p)    (10.19)

¹⁰ Given as an exercise in [24]

As the error probability is nothing but (# errors)/(# total qubits) = t/n, this is also where we use the fact that the underlying quantum error-correcting code can correct t errors. Now we are left with the task of showing that this bound is satisfied a significant, more precisely an exponential, number of times, by showing that the above bound is violated with inverse-exponential probability. For this we can exactly follow the steps in section (sec. 9.3.4). Hence we see that there are an exponential number of perfect quantum codes too.
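The asymptotic rate guaranteed by (eq. 10.19) can be tabulated directly; the short Python sketch below (an added illustration, not from the notes) evaluates R ≥ 1 − 2p·log₂3 − h₂(2p) for a few values of the error probability p, showing the guaranteed rate approaching 1 as p → 0.

from math import log2

def h2(x):
    """Binary entropy function (in bits)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

def gv_rate(p):
    """Quantum Gilbert-Varshamov lower bound on the rate, eq. (10.19)."""
    return 1 - 2 * p * log2(3) - h2(2 * p)

for p in (0.001, 0.01, 0.05, 0.1):
    print(p, round(gv_rate(p), 4))   # the bound becomes vacuous (negative) near p ~ 0.1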

10.7 Examples

10.7.1 Bit Flip code [24]

Just as in the case of a classical channel, a quantum channel too can cause a single bit flip (we ignore multiple bit flips on the same codeword, based on the assumption that they occur with negligible probability). A bit flip error on an [[n, k, d]]_2 quantum codeword |ψ⟩ = Σ_{i=1}^{2^k} α_i |i⟩ affects each of the 2^k basis vectors in {|i⟩}. Each basis vector can be considered as an (n, k, d)_2 classical codeword. Upon transmission of |ψ⟩, each |i⟩ undergoes a bit flip, i.e. one of the n components of |i⟩ is flipped. It suffices to detect and correct errors on each |i⟩ independently. Hence we need to consider a repetition code where each component¹¹ of |i⟩ is repeated (again, the number of repetitions depends upon the probability of bit flip). Suppose each component is repeated m times; then |i⟩ becomes a vector in a 2^{mn}-dimensional space. Consider the encoding map for a particular case, k = 2, where the number of repetitions is m = 3:
|00⟩ → |000000⟩  &  |01⟩ → |000111⟩    (10.20a)
|10⟩ → |111000⟩  &  |11⟩ → |111111⟩    (10.20b)

The error correction procedure, for some state |ψ⟩ = α|10⟩ + β|11⟩, can be summarised as:

α|10⟩ + β|11⟩  --(encode)-->  α|111000⟩ + β|111111⟩  --(transmission causing a single bit flip)-->  α|111010⟩ + β|111101⟩  --(decode)-->  α|10⟩ + β|11⟩

Transmitted: α|111000⟩ + β|111111⟩. Received: α|111010⟩ + β|111101⟩. By majority vote, the 5th bit has flipped; flipping the fifth bit in each term gives back α|111000⟩ + β|111111⟩.

The encoding circuit is given for a simpler case where k = 1 (a spin-half system) and m = 3. Note that the minimum value of m is 3, as by the distance bound a 1-error-correcting code must have a minimum distance of 3.
¹¹ Notice that the key concept of the repetition code is to repeat (create copies of) the entity which is affected entirely by the error. In this case this entity is the component of the basis vector and not the basis vector itself, since we are not assuming multiple-qubit errors. Unless all the components of the basis vector are affected simultaneously, the basis vector would not be entirely affected.


Figure 10.1: Figure showing the error correction of a single bit flip error on a three-qubit state. The portion of the circuit before the single-qubit error (transmission) is the encoding part. The decoding procedure is not explicitly shown; it is done merely by ignoring two qubits and taking only one.
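For concreteness, here is a small numpy simulation (my own added sketch; it mimics the majority-vote procedure above rather than the circuit of Figure 10.1) of the k = 1, m = 3 repetition code: encode α|0⟩ + β|1⟩ into α|000⟩ + β|111⟩, flip one qubit, and recover the state.

import numpy as np

I = np.eye(2); X = np.array([[0., 1.], [1., 0.]])

def on(op, pos, n=3):
    """Embed a single-qubit operator at position pos (0-indexed) in an n-qubit register."""
    ops = [I] * n; ops[pos] = op
    out = np.array([[1.0]])
    for o in ops:
        out = np.kron(out, o)
    return out

alpha, beta = 0.6, 0.8
encoded = np.zeros(8); encoded[0b000] = alpha; encoded[0b111] = beta   # a|000> + b|111>

corrupted = on(X, 1) @ encoded      # bit flip on the middle qubit

def correct(state):
    """Majority vote: for each basis state with support, map it back to |mmm>."""
    fixed = np.zeros_like(state)
    for idx, amp in enumerate(state):
        if abs(amp) < 1e-12:
            continue
        bits = [(idx >> s) & 1 for s in (2, 1, 0)]
        maj = int(sum(bits) >= 2)
        fixed[int(f"{maj}{maj}{maj}", 2)] += amp
    return fixed

print(np.allclose(correct(corrupted), encoded))   # True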


Error Correction condition for the repetition code

We now try to establish the error correction condition of (sec. 10.2.3) explicitly for the case of the repetition code, thereby showing that any pair of single-qubit errors satisfies the condition, so that single-qubit errors can be corrected by the repetition code.¹² The repetition code is given by |ψ⟩ = α|0⟩ + β|1⟩ → |ψ_L⟩ = α|000⟩ + β|111⟩, and hence the projector onto the code subspace, P in (eq. 10.7), would be:

P = |ψ_L⟩⟨ψ_L| = |α|² |000⟩⟨000| + αβ* |000⟩⟨111| + α*β |111⟩⟨000| + |β|² |111⟩⟨111|

The error operators here are single-qubit errors, given by the pauli operators {σ^(m) : m ∈ {X, Y, Z}} acting on one of the qubits (we use the notation σ^(m) to denote the m-th pauli matrix in the set {X, Y, Z}, with a subscript for the qubit it acts on). Hence the quantity E_j† E_k becomes σ_j^(m)† σ_k^(p). We now need to show that (eq. 10.7) is satisfied for the repetition code (with the above substitutions). The error correction condition becomes ⟨ψ_L| σ_j^(m)† σ_k^(p) |ψ_L⟩ = α_ij ⟨ψ_L|ψ_L⟩. It now suffices to show that ⟨ψ_L| σ_j^(m)† σ_k^(p) |ψ_L⟩ = α_ij, where α_ij can also be 0.
¹² Given as an exercise in [24]

Ignoring the constants α and β, the quantity to evaluate is (⟨000| + ⟨111|) σ_j^(m)† σ_k^(p) (|000⟩ + |111⟩). Notice that¹³ each of the matrix elements ⟨000| σ_j^(m)† σ_k^(p) |000⟩, ⟨111| σ_j^(m)† σ_k^(p) |111⟩, ⟨000| σ_j^(m)† σ_k^(p) |111⟩ and ⟨111| σ_j^(m)† σ_k^(p) |000⟩ reduces to a product of Kronecker deltas in the pauli labels m and p (for instance, writing σ_k^(p)|000⟩ = |0_k⟩ δ_{p,X} + i|0_k⟩ δ_{p,Y} + |000⟩ δ_{p,Z}, and similarly for the remaining terms). Summarising:

⟨ψ_L| σ_j^(m)† σ_k^(p) |ψ_L⟩ = (|α|² + |β|²) δ_{p,Z} δ_{m,Z} + (αβ* + α*β)(δ_{p,X} + δ_{p,Y})(δ_{m,X} + δ_{m,Y})

Finally,

P σ_j^(m)† σ_k^(p) P = [δ_{p,Z} δ_{m,Z} + 2αβ (δ_{p,X} + δ_{p,Y})(δ_{m,X} + δ_{m,Y})] P

Hence we have shown that the error correction condition holds; moreover, all the constants are real, so they can form the entries of a hermitian matrix, as the error correction condition demands.

10.7.2 Phase flip Errors [24]

It must be noted that phase errors do not affect each basis vector independently, unlike the bit flip error, and therefore they have no classical counterpart. This error only affects¹⁴ superposition states (again, which have no classical counterpart). Under this error, α|0⟩ + β|1⟩ → α|0⟩ − β|1⟩.

But we have the property of the Hadamard gate H, which can transform a Z (phase flip) error into an X (bit flip) error. We would now be considering the code in the X-eigenbasis, by applying the H operator to each codeword, such that phase flip errors turn into bit flip errors, and then use the same error correction procedure as before. Now the encoding circuit would consist of H gates on each qubit just before it is sent for transmission (into the single-qubit error channel). The encoding map would therefore be:

|0⟩ → |000⟩ → |+++⟩
|1⟩ → |111⟩ → |−−−⟩
|ψ⟩ = α|0⟩ + β|1⟩ → α|+++⟩ + β|−−−⟩

To bring the code back to the Z eigenbasis (at the receiving end, just before decoding), we need to apply H† (notice that a property of H is H† = H). Summarising the above error correction procedure, we have the following error correction circuit.

¹³ Note: We are now using the convention |0_t⟩ ≡ |0 ... 1 ... 0⟩ (all zeros, with a 1 in the t-th location) and similarly |1_t⟩ ≡ |1 ... 0 ... 1⟩ (all ones, with a 0 in the t-th location).

¹⁴ Strictly speaking, it also affects states that are not in a superposition; e.g. under this error, |1⟩ → −|1⟩. But there is just an overall phase difference (which can be ignored, as we consider all states equivalent up to an arbitrary overall phase) between the original and the erroneous state.


Figure 10.2: Figure showing a quantum circuit for correction of a single phase error in a three-qubit quantum code. Note that this figure is similar to the circuit for correcting bit flip errors, except at the encoding and decoding portions.
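The fact that conjugating by Hadamards turns a phase flip into a bit flip can be checked directly; the numpy lines below (an added illustration, not the circuit of Figure 10.2) verify that H⊗H⊗H applied before and after a Z error on one qubit is the same as an X error on that qubit.

import numpy as np

I = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

H3 = kron3(H, H, H)
Z2 = kron3(I, Z, I)   # phase flip on the middle qubit
X2 = kron3(I, X, I)   # bit flip on the middle qubit

# Conjugation by H on every qubit maps the phase-flip error to a bit-flip error.
print(np.allclose(H3 @ Z2 @ H3, X2))   # True (H is its own inverse)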

10.7.3 Bit-Phase flip errors - Shor Code [24][23]

We now consider the possibility of both bit and phase flip errors occurring simultaneously¹⁵. Taking the action of such an error on a single-qubit codeword (k = 1), we get: α|0⟩ + β|1⟩ → α|1⟩ − β|0⟩. We can generalise this to higher-dimensional codewords also, keeping in mind that only one qubit will be affected by both these errors. Hence, under this error: (α|0⟩ + β|1⟩)^⊗l → (α|0⟩ + β|1⟩)^⊗(r−1) ⊗ (α|1⟩ − β|0⟩) ⊗ (α|0⟩ + β|1⟩)^⊗(l−r), where the error has occurred on the r-th qubit.
To correct this error, we must employ both of the above techniques. We first construct a repetition code as in (eq. 10.20). Then we take each qubit in this code to the X basis using the H gate, to correct the phase
¹⁵ Notice that these two errors are independent. Hence, if the probabilities of bit and phase flip errors are (up to linear order) ε_b and ε_f respectively, then the probability of both occurring together is ε_b ε_f.


flip errors¹⁶. The encoding map is therefore given by:

|0⟩ → |0_L⟩ = [(|000⟩ + |111⟩)/√2] ⊗ [(|000⟩ + |111⟩)/√2] ⊗ [(|000⟩ + |111⟩)/√2]    (10.21a)
= (1/2√2) [ |000000000⟩ + |000000111⟩ + |000111000⟩ + |000111111⟩ + |111000000⟩ + |111000111⟩ + |111111000⟩ + |111111111⟩ ]

|1⟩ → |1_L⟩ = [(|000⟩ − |111⟩)/√2] ⊗ [(|000⟩ − |111⟩)/√2] ⊗ [(|000⟩ − |111⟩)/√2]    (10.21b)
= (1/2√2) [ |000000000⟩ − |000000111⟩ − |000111000⟩ + |000111111⟩ − |111000000⟩ + |111000111⟩ + |111111000⟩ − |111111111⟩ ]

With this, we see that |ψ⟩ = α|0⟩ + β|1⟩ transforms as |ψ⟩ → |ψ_L⟩ = α|0_L⟩ + β|1_L⟩:

|ψ_L⟩ = α (1/2√2) [ |000000000⟩ + |000000111⟩ + |000111000⟩ + |000111111⟩ + |111000000⟩ + |111000111⟩ + |111111000⟩ + |111111111⟩ ]
+ β (1/2√2) [ |000000000⟩ − |000000111⟩ − |000111000⟩ + |000111111⟩ − |111000000⟩ + |111000111⟩ + |111111000⟩ − |111111111⟩ ]    (10.22)


We now see the effect of phase and bit flip errors on this code. Starting with the bit flip first, notice that if there is a bit flip error on the j-th qubit, then in the block (of 3 qubits) where qubit j is present, the parity of the qubits would differ (this parity check is done using a CNOT gate). That is, if qubit j is present in the block with indices l, l + 1, l + 2, then the sign of ⟨ψ|Z_l Z_{l+1}|ψ⟩ (or/and ⟨ψ|Z_{l+1} Z_{l+2}|ψ⟩) would differ from the rest. Depending on which one(s) differ(s), we apply the X operator at the right position¹⁷.
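The parity checks just described amount to the lookup of Table 10.1 in footnote 17; the snippet below (an added Python sketch of that classical lookup, not taken from the notes) maps the signs of ⟨Z_l Z_{l+1}⟩ and ⟨Z_{l+1} Z_{l+2}⟩ within one block of three qubits to the qubit that must be flipped back.

# Syndrome lookup for one block (qubits l, l+1, l+2) of the Shor code:
# signs of <Z_l Z_{l+1}> and <Z_{l+1} Z_{l+2}> -> which qubit (if any) to flip back.
SYNDROME_TABLE = {
    (+1, +1): None,     # no error in this block
    (+1, -1): 'l+2',    # apply X on qubit l+2
    (-1, +1): 'l',      # apply X on qubit l
    (-1, -1): 'l+1',    # apply X on qubit l+1
}

def block_syndrome(bits):
    """bits = (b_l, b_{l+1}, b_{l+2}) of a computational basis state of one block.
    Z Z on neighbouring qubits has eigenvalue +1 when the bits agree, -1 otherwise."""
    s1 = +1 if bits[0] == bits[1] else -1
    s2 = +1 if bits[1] == bits[2] else -1
    return SYNDROME_TABLE[(s1, s2)]

print(block_syndrome((0, 0, 0)))   # None  (no flip)
print(block_syndrome((0, 1, 0)))   # 'l+1' (middle qubit flipped)
print(block_syndrome((1, 0, 0)))   # 'l'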

Coming to the phase flip errors, as seen earlier, we apply the H gate to every qubit to get the codeword in terms of the X eigenvectors. Now the same process as for the bit flip error is repeated¹⁸.
Summarising, the encoding circuit for correcting arbitrary phase and bit flip errors on a single qubit is given by:

¹⁶ Each time we take three repetitions of the basis vector because, by the distance property, to correct a single error the distance of the code must be at least 3. Therefore, by the singleton bound we must have n − k at least 3. Before repetition n = k = 1, and after repetition we satisfy the condition n = 4 (n − k = 3).
¹⁷ To state the actions performed based on the particular sign difference, we have:

Table 10.1: Table showing the various measurement results for bit flip error location.

⟨Z_l Z_{l+1}⟩   ⟨Z_{l+1} Z_{l+2}⟩   Flipped qubit position   Correction action: apply
+               +                   No error                 I_n
+               −                   l + 2                    X_{l+2}
−               +                   l                        X_l
−               −                   l + 1                    X_{l+1}
¹⁸ We note an interesting property: both |0_L⟩ and |1_L⟩ are eigenvectors of the operators X^⊗6 ⊗ I^⊗3, I^⊗3 ⊗ X^⊗6 and I^⊗3 ⊗ X^⊗3 ⊗ I^⊗3. So, when a phase flip occurs, by measuring the sign difference in the expectation values of the above operators, we can detect the location of the phase flip error.

Figure 10.3: Circuit showing the correction of single-qubit bit and phase flip errors.

Chapter 11

Stabilizer Codes

11.1 Pauli Group

The pauli group contains the set of pauli matrices¹. It is a matrix group acting on C^2 (i.e. on a qubit), denoted by (Π, ·), and it contains the set of operators:

Π = {±I_2, ±iI_2, ±X, ±iX, ±Y, ±iY, ±Z, ±iZ}    (11.1)

The pauli group is the group generated by the pauli matrices: (Π, ·) = ⟨G⟩, where G = {X, Y, Z}. The commutation relations between the pauli matrices are:²

[Y, Z] = 2iX    (11.2a)
[Z, X] = 2iY    (11.2b)
[X, Y] = 2iZ    (11.2c)

In short:

[σ_i, σ_j] = 2i ε_ijk σ_k,  ∀ i, j, k ∈ {X, Y, Z}    (11.2d)

where ε_ijk is the levi-civita tensor, such that ε_ijk = 0 if any two indices are equal, ε_ijk = 1 if (i, j, k) is an even permutation, and ε_ijk = −1 if (i, j, k) is an odd permutation.

We note some properties of the pauli group:

1. Any two elements of the pauli group either commute or anticommute³. Hence ∀ g_i, g_j ∈ (Π, ·): [g_i, g_j] = 0 or {g_i, g_j} = 0.
Justification: It suffices to verify this for every pair of elements in (Π, ·). Note that, in general, the identity (and hence ±I, ±iI) commutes with every operator, and all operators commute with themselves. Hence we only need to check whether [g_i, g_j] = 0 or {g_i, g_j} = 0 for g_i, g_j ∈ (Π, ·) with g_i ≠ g_j. From the properties of the pauli matrices we see that one of the two cases always holds. Hence we have the justification. As a consequence, ∀ A, B ∈ (Π, ·) we have: A · B = B · A or A · B = −B · A.

2. All elements of the pauli group square to identity: g · g = I, ∀ g ∈ (Π, ·).
Justification: We can verify this for the individual elements of the pauli group, keeping in mind that the pauli matrices have the property A · A = I, ∀ A ∈ {X, Y, Z}.
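These properties are easy to check by brute force; the numpy sketch below (an added illustration) runs over the sixteen elements of (eq. 11.1) and confirms that every pair either commutes or anticommutes, and that the generators X, Y, Z square to the identity.

import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1, -1]).astype(complex)

# The sixteen elements of (eq. 11.1): {+-1, +-i} x {I, X, Y, Z}
group = [phase * M for phase in (1, -1, 1j, -1j) for M in (I2, X, Y, Z)]

# Property 1: any two elements either commute or anticommute.
for A in group:
    for B in group:
        comm = A @ B - B @ A
        anti = A @ B + B @ A
        assert np.allclose(comm, 0) or np.allclose(anti, 0)

# The generators X, Y, Z square to the identity.
for M in (X, Y, Z):
    assert np.allclose(M @ M, I2)

print("commute-or-anticommute and squaring checks passed")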

¹ From now on, the pauli matrices refer to the four matrices: X = ( 0 1 ; 1 0 ), Y = ( 0 −i ; i 0 ), Z = ( 1 0 ; 0 −1 ), I_2 = ( 1 0 ; 0 1 ). Also, we set ℏ = 1.
² The commutator is defined for two matrices as [X, Y] = X · Y − Y · X, where · stands for the matrix multiplication operation.
³ The anticommutator for two matrices is defined as {X, Y} = X · Y + Y · X.


3. All elements of the pauli group are hermitian, i.e. E† = E, ∀ E ∈ (Π, ·). This can also be verified explicitly for each element of the pauli group, keeping in mind the property that the pauli matrices are hermitian. Since the operators are hermitian and square to I, we can see that they are unitary too.

4. All the above properties of the pauli group are satisfied iff they hold for the generating set of the pauli group, which is {X, Y, Z}.
Justification: We can justify the contrapositive, by showing that if any property does not hold for the generating set then it does not hold for the pauli group either. Notice that any element g ∈ (Π, ·) can be expressed as a finite string of elements of G connected by ·, say g = h_1 · h_2 ··· h_n with h_1, ..., h_n ∈ G. It suffices to show that the above group properties fail to hold for this string if they fail to hold for G.

Assumption: If ∃ h_i such that h_i · h_i ≠ I, then g · g ≠ I for some g ∈ (Π, ·).

Justification: We have g · g = (h_1 · h_2 ··· h_n) · (h_1 · h_2 ··· h_n). Notice that since h_i either commutes or anticommutes with each of h_1, h_2, ..., h_n (this is a property of the pauli matrices and can be verified easily), we can exchange the positions of h_{i−1} and h_i by adding a + sign, in case h_i commutes with h_{i−1}, or a − sign otherwise. After this, we can exchange h_{i−2} and h_i, again introducing either a + or a − sign. Doing this inductively we can have g = h_i · h_1 · h_2 ··· h_n (up to a sign). Similarly, we can also get g = h_1 · h_2 ··· h_n · h_i (up to a sign). Putting these expressions for g in the product g · g, we get:

g · g = (h_1 · h_2 ··· h_n · h_i) · (h_i · h_1 · h_2 ··· h_n)
= (h_1 · h_2 ··· h_n) · (h_i · h_i) · (h_1 · h_2 ··· h_n)    (11.3)

Now suppose h_i · h_i = h; we need to show that the only solution to g · g = I is h = I. Notice that h ∈ (Π, ·), since it is generated by the elements of G. Hence h would commute or anticommute with every element of (Π, ·). We can now apply a similar technique to (eq. 11.3) to get:

g · g = h · (h_1 · h_2 ··· h_n) · (h_1 · h_2 ··· h_n)

The same exercise of pulling h_i out (as performed in (eq. 11.3)) can be applied to every h_j in the expression for g, starting with h_1. Also, without loss of generality, we may assume that h_j · h_j = I, ∀ j ≠ i. Hence we will have:

g · g = h · (h_2 · h_3 ··· h_n) · (h_1 · h_1) · (h_2 ··· h_n)
= h · (h_3 · h_4 ··· h_n) · (h_2 · h_2) · (h_3 ··· h_n)

and so on, until we finally get:

g · g = h

Note that if we demand g · g = I, the only solution to this equation forces h = I. That is, only if the generators square to I will the elements of the group do so. Hence we have the justification.

11.2 Motivation for Stabilizer codes

This stabilizer subgroup concept leads to an important class of error correcting codes. We see that errors on a single qubit (with state |ψ⟩, an element of H_2) are caused by operators on this space. Since we know that the pauli matrices X, Y, Z and I form a basis for the vector space of all (2 × 2) matrices, all single-qubit error operators can be expressed as a linear combination of these matrices. Moreover, these matrices form the pauli group (Π, ·) under the matrix multiplication operation. From the closure law for (Π, ·), we see that all the single-qubit errors can be seen as the action (in the sense of a group action on a set) of (Π, ·) on H_2 (which is the qubit). Now we consider a stabilizer subgroup of this pauli group for a particular qubit q with state |ψ⟩ ∈ H_2, denoted S_|ψ⟩ ⊆ (Π, ·). By the definition of the stabilizer subgroup, as in (eq. 3.41), we see that the error operators in this subgroup show an important characteristic: they leave the qubit invariant, or equivalently, leave its state unchanged.

g|ψ⟩ = |ψ⟩, ∀ g ∈ S_|ψ⟩ ⊆ (Π, ·)    (11.4)

In other words, |ψ⟩ is in the common eigenspace (corresponding to eigenvalue +1) of all g ∈ S_|ψ⟩ ⊆ (Π, ·), which can be denoted by C_S. This eigenspace C_S is called a Quantum Stabilizer Code over a qubit.
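As a concrete instance of (eq. 11.4), the following numpy sketch (an added illustration; the choice of the generators Z⊗Z⊗I and I⊗Z⊗Z is mine) computes the common +1 eigenspace of a pair of commuting stabilizer generators and recovers the two-dimensional code space spanned by |000⟩ and |111⟩.

import numpy as np

I = np.eye(2); Z = np.diag([1., -1.])

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

generators = [kron3(Z, Z, I), kron3(I, Z, Z)]   # stabilizer generators Z1 Z2 and Z2 Z3

# Projector onto the common +1 eigenspace: product of (I + g)/2 over the generators
# (valid because the generators commute).
P = np.eye(8)
for g in generators:
    P = P @ (np.eye(8) + g) / 2

# The code space C_S is 2-dimensional, spanned by |000> and |111>.
print(int(round(np.trace(P).real)))                          # 2
print(np.allclose(P @ np.eye(8)[:, 0], np.eye(8)[:, 0]),     # |000> is fixed
      np.allclose(P @ np.eye(8)[:, 7], np.eye(8)[:, 7]))     # |111> is fixed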

11.3 Conditions on stabilizer subgroups

11.3.1 Generating set of the stabilizer subgroup

Theorem: An [[n, k]] stabilizer code can be generated using a generating set containing n − k independent commuting generators.
Proof: Let the generating set of the stabilizer subgroup S (of the pauli group (Π, ·)) be denoted by Γ. Notice that an [[n, k]] stabilizer code is a subspace of C^{2^n} having 2^k basis vectors. To generate this code, we need to map the 2^n basis vectors of C^{2^n} to the 2^k basis vectors of C_S. This mapping is done using the stabilizers generated by Γ.
We now provide a proof of the above theorem by constructing the codespace C_S of the maximum possible dimension, each time adding one more element to Γ. We stop our construction when the dimension of C_S reaches 2^k. To start with, there is only one generator (which, by the requirement of the group structure, must be) I, and the stabilizer subgroup S_1 has one element: I^⊗n. This element maps each of the 2^n basis vectors to itself. Hence the codespace formed is 2^n-dimensional.
Assumption: Each new addition to Γ reduces the dimensionality of C_S by a factor of 1/2.
Justification: We now provide a justification by induction on the number of generators being added:

Base case: Suppose Γ contains I and we add one more element to it, say σ^i_1. The stabilizer subgroup S_2 would now consist of operators of the form I^⊗(n−m) ⊗ (σ^i_1)^⊗m, 0 ≤ m ≤ n. Notice that all these elements have the 2^n basis vectors (in C^{2^n}) as their eigenvectors, but only 2^{n−1} of them correspond to eigenvalue +1; these are the elements of S_2 for which m is even. Hence, after this addition of a generator, we see that |C_{S_2}| = 2^{n−1}.

Induction step: After q steps, |C_{S_q}| = 2^{n−q}. It now suffices to show that after q + 1 steps, |C_{S_{q+1}}| = 2^{n−q−1}. After q steps, suppose S_q = {S_i} (each S_i is a stabilizer containing n tensor factors). Let the addition to the generating set at the (q+1)-th step be σ^i_{q+1}. Hence S_{q+1} = {S_i^{(n−m)} ⊗ (σ^i_{q+1})^⊗m : 0 ≤ m ≤ n, S_i ∈ S_q}. Notice that all S_i ∈ S_q have eigenvalue +1. Hence operators with eigenvalue −1 in S_{q+1} can only come from (σ^i_{q+1})^⊗m, in other words, only from odd values of m. But we know that exactly half of the values of m are odd. As a result, only half of the set S_q would be used in constructing the valid stabilizers for S_{q+1}. Hence the stabilizer code C_{S_{q+1}} would have only half as many elements as C_{S_q}. As C_{S_q} has 2^{n−q} elements, we see that |C_{S_{q+1}}| = 2^{n−q−1}, thereby justifying our assumption.

As an immediate consequence of the above assumption, we see that if we add n − k generators, then the resulting codespace would be such that |C_{S_{n−k}}| = 2^k. We now see that n − k independent commuting⁴ generators are needed for an [[n, k]] stabilizer code, thereby proving the theorem.
⁴ We make use of the fact that the generators all commute when we assume that S_{q+1} = {S_i^{(n−m)} ⊗ (σ^i_{q+1})^⊗m : 0 ≤ m ≤ n, S_i ∈ S_q} for any q. That is, we consider all permutations of S_i^{(n−m)} ⊗ (σ^i_{q+1})^⊗m to represent the same element, because each of them can be rearranged (since they commute) to obtain the other. The independence of the generating set is used when we assume that a larger stabilizer subgroup can be constructed by adding another generator.

11.3.2 Structure of the stabilizer subgroup

We say that a code is trivial⁵ if it contains only the null vector. In general we want code spaces that are non-trivial. In order to have a non-trivial stabilizer code corresponding to this subgroup, we see that there must be at least one vector |ψ⟩, other than the null vector, that is left invariant by all the elements in S. Hence the subspace common to the eigenspaces of the elements of S must contain more than just the null vector. From section (sec. 3.41) we see that the elements of S must commute with each other; hence ∀ g_i, g_j ∈ S: [g_i, g_j] = 0. This means that S is an abelian group. Also, by the property of a general stabilizer group, we see that −I ∉ S, since if −I ∈ S, then from (eq. 11.4) we have −I|ψ⟩ = |ψ⟩, whose only solution for |ψ⟩ is the null vector, implying that the stabilizer code is trivial. Hence we have the following theorem:

Theorem: Any stabilizer subgroup of the pauli group corresponding to a non-trivial stabilizer code is abelian.

Proof: Let M, N be two operators in the stabilizer subgroup S of the pauli group (Π, ·), and let C_S be the stabilizer code corresponding to S. As M and N are arbitrary, it suffices to show that [M, N] = 0. We now provide a justification by showing that [M, N] ≠ 0 (which implies, from statement (stat. 1) of section (sec. 11.1), that {M, N} = 0) leads to a contradiction with the fact that C_S is non-trivial. Since M and N are elements of a group, by the closure law, so are M · N and N · M. As elements of a stabilizer subgroup (from (eq. 11.4)) they satisfy |ψ⟩ = M N |ψ⟩. Since M and N anticommute, M N = −N M. Hence we see that:

|ψ⟩ = M N |ψ⟩ = −N M |ψ⟩ = −|ψ⟩

which gives |ψ⟩ = −|ψ⟩, with the only solution for |ψ⟩ being the null vector, which in turn implies that C_S is trivial, thereby bringing a contradiction. Hence we see that for C_S to be non-trivial, [M, N] = 0, and hence S must be abelian, thereby proving the theorem.

11.4 Error Correction for Stabilizer codes

11.4.1 Notion of an Error in a Stabilizer code

An error in a code is represented as an operator acting on every code vector in that code. As every error operator is an element of the pauli group, it either commutes or anticommutes with every other error operator. Hence, between every error operator and every element of any stabilizer subgroup of the pauli group, there is either a commutation or an anticommutation relation. We have seen in (stat. 4) of (sec. 11.1) that, in order to verify the commutation relations for the elements of the pauli group, it suffices to verify the same with the generators. Therefore, every error operator either commutes or anticommutes with every generator. Depending upon whether it commutes or anticommutes, we define the error syndrome of an error E, β_l, as: β_l g_l = E† g_l E, where E ∈ Π_n and g_l is a generator of S (so β_l = +1 if E commutes with g_l and β_l = −1 if it anticommutes).
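Concretely, the syndrome can be read off from whether the error commutes or anticommutes with each generator; the sketch below (added as an illustration, with Pauli operators written as strings over {I, X, Y, Z}, an assumed encoding) computes β_l for single-qubit errors against the generators of the three-qubit code.

def commutes(p, q):
    """Two Pauli strings commute iff they anticommute on an even number of positions.
    Single-qubit Paulis anticommute exactly when both are non-identity and different."""
    anti = sum(1 for a, b in zip(p, q) if a != 'I' and b != 'I' and a != b)
    return anti % 2 == 0

def syndrome(error, generators):
    """beta_l = +1 if the error commutes with generator g_l, -1 if it anticommutes."""
    return tuple(+1 if commutes(error, g) else -1 for g in generators)

gens = ['ZZI', 'IZZ']                       # stabilizer generators of the 3-qubit code
for err in ['XII', 'IXI', 'IIX', 'ZII']:
    print(err, syndrome(err, gens))
# XII -> (-1, +1), IXI -> (-1, -1), IIX -> (+1, -1): the three bit flips are distinguished.
# ZII -> (+1, +1): a phase flip commutes with both generators and goes undetected.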

11.4.2 Measurement on the stabilizer code

Given any operator A in (Π_n, ·), it would do one of the following:

• commute with every generator: A ∈ Z(S)
• anticommute with at least one generator: A ∈ Π_n − Z(S)


⁵ It is in the same sense as a trivial subgroup of a group, which is the subgroup containing just the identity. For a vector space, the identity element (of addition) is nothing but the null vector.


Let the generating set be Γ = {g_1, g_2, ..., g, ..., g_{n−k}}. To measure an observable⁶ A from the pauli group (Π_n, ·), we need to consider a projective measurement. Using the spectral decomposition of A, we have A = Σ_m m P_m, where the sum is over all eigenvalues m, with corresponding projection operators P_m. Since any element of the pauli group has eigenvalues ±1 (let the corresponding projectors be labelled P_±), we see that A = P_+ − P_−. The results of the measurement are also ±1. As the eigenstates of A form a complete set, we have the identity P_+ + P_− = I. Combining these two expressions, we get the projection operators P_± corresponding to the measurement results ±1, in terms of A:

A = P_+ − P_− = I − 2P_−
⟹ P_− = (I − A)/2

and similarly, A = 2P_+ − I ⟹ P_+ = (I + A)/2. Hence measuring A is the projective measurement using P_± = (I ± A)/2. We now compute the probability of getting the results ±1. Let |ψ⟩ be an element of the stabilizer code C_S. Then the probabilities of getting the results ±1 are:

p(±1) = tr(P_± |ψ⟩⟨ψ|)    (11.5)


Since |ψ⟩ ∈ C_S, we see that ∀ g ∈ Γ we have g|ψ⟩ = |ψ⟩. Note that until now we have not assumed anything about the commutation or anticommutation relations between A and the elements of Γ, which is summarized in the two points at the beginning of this section.
Let us consider the latter possibility first, where ∃ g ∈ Γ such that {g, A} = 0. Calculating the measurement outcome probabilities, we get:

p(+1) = tr(P_+ |ψ⟩⟨ψ|)
= tr( (I + A)/2 |ψ⟩⟨ψ| )
= ⟨ψ| (I + A)/2 g |ψ⟩
= ⟨ψ| g (I − A)/2 |ψ⟩
= ⟨ψ| (I − A)/2 |ψ⟩

Using (eq. 11.5): = p(−1)

Using the conservation of probability, p(+1) + p(−1) = 1, we immediately conclude p(+1) = p(−1) = 1/2.
Consider now the former possibility (from the list at the beginning of this section): the probabilities of getting +1 and −1 would then in general be different. Hence one can perform this measurement on an ensemble of states and reveal the state with high probability (by choosing the ensemble to be very large).

Let us now consider the post-measurement states corresponding to the measurement results ±1. The general post-measurement state is given by |ψ_±⟩ = (I ± A)/2 |ψ⟩ (up to normalisation). Notice that A fixes the state |ψ_+⟩:

|ψ_+⟩ = (I + A)/2 |ψ⟩
A|ψ_+⟩ = (A + I)/2 |ψ⟩

⁶ In the pauli group, all elements are unitary and hermitian. Hence they can be compared to physical observables.


and moreover with an eigenvalue of +1. Also note that since A anticommutes with g, g no longer fixes the new state after the measurement, which implies (from the definition of the stabilizer) that g can no longer be an element of Γ. (Note that, irrespective of whether A commutes or anticommutes with g, the remaining elements {g_1, g_2, ..., g_{n−k}} are still part of the stabilizer, since they commute with both g and A. As a result, the states fixed by A (and g) are fixed by the other elements of Γ as well.) Therefore the new stabilizer code will have P_+|ψ⟩ as the codewords and Γ' = {g_1, g_2, ..., A, ..., g_{n−k}} as the new generating set of the stabilizer subgroup. Consider now the post-measurement state corresponding to the result −1, which is |ψ_−⟩. Notice that A does fix this state, but with an eigenvalue of −1, which means that |ψ_−⟩ cannot be part of the stabilizer code and A cannot be a generator. Since A anticommutes with g, neither A nor g can be in Γ'. Therefore, after the measurement, we see that we have lost a codeword (upon which the measurement was performed) and a generator (which anticommutes with this measurement) of the stabilizer subgroup. This is certainly undesirable, and we need to correct this effect of the measurement. Observe that since {g, A} = 0, we have:

g† P_− g = g† (I − A)/2 g
= (I − g† A g)/2
= (I + A)/2
= P_+

We have seen that A clearly fixes the state P_+|ψ⟩, which, we see from the above, can be obtained by the conjugation of P_− with g. Hence we see that if we get a measurement result of −1, we must perform this conjugation action. Summarising, after the measurement of A on the stabilizer code C_S, the generating set⁷ of the stabilizer subgroup becomes Γ' = {g_1, g_2, ..., A, ..., g_{n−k}}.

11.4.3 Error Correction condition for Stabilizer codes

As in the previous subsection (sec. 11.4.2), we see that an error E would either:

• commute with every generator: E ∈ Z(S), or
• anticommute with at least one generator: E ∈ Π_n − Z(S)

of the stabilizer subgroup. Notice that a stabilizer code is defined to be the common eigenspace of all the generators (of the stabilizer subgroup). Consider the latter case: as the error operator does not commute with some generator, the common eigenspace corresponding to the generators as well as the error operator is no longer the same as before (in other words, the code is affected by an error). Moreover, we get a new eigenspace (the erroneous code), orthogonal to the old one (the correct code). The key point here is that the resulting code is orthogonal to the old code, which means that the erroneous code can be distinguished from the original correct code. Hence the error can be detected.
Let us now take up the former case: following the same reasoning as before, if the error operator commutes with every element of the stabilizer group, then the erroneous code cannot be distinguished from the original code. This includes two situations: E ∈ S (in which case E stabilizes the code and hence the error has no effect on the codewords), and E ∉ S but [E, S_i] = 0 ∀ S_i ∈ S, which implies E ∈ Z(S) − S. This case is not so obvious, and shall be dealt with in the theorem below.
More precisely⁸:

Theorem: A set of errors {E_i} ⊆ Π_n is correctable iff E_k† E_j ∉ N(S) − S, ∀ E_j, E_k ∈ {E_i}.

⁷ But we see that Γ' ≠ Γ. So it means that after the measurement, we no longer have the same stabilizer code.
⁸ In some places, the theorem is stated using N(S) in place of Z(S). These two are equivalent.
Assumption: For any stabilizer subgroup S of the pauli group (Π, ·), the normalizer and the centralizer of that subgroup (defined in section (sec. 3.1.4)) are equal, i.e. N(S) = Z(S) (justified below, after the opening of sec. 11.5).


Proof: We already have the error correction condition for quantum codes in general, in (eq. 10.7). Since stabilizer codes are just a subset of quantum codes, it suffices to show that the condition in (eq. 10.7) is satisfied for the above case too. Let P be the projector onto the stabilizer code C_S. Notice that the non-trivial task is to prove the theorem for the case of errors in Z(S) − S. For the other cases:

E_i ∈ S: As the error leaves the code invariant,

E_i P E_i† = P, ∀ E_i ∈ S    (11.6)

which is equivalent to the condition in (eq. 10.7) with E_k replaced by I. Hence we have the proof.

E_i ∈ Π_n − Z(S): In this case E_i takes P to an orthogonal space P' = E_i P E_i†, and hence P' P = 0, which implies E_i P E_i† P = 0. This can be rearranged (multiplying on the left by E_i† and using the fact that the elements of the pauli group are unitary, so E_i† E_i = I) to give:

P E_i† P = 0, ∀ E_i ∈ Π_n − Z(S)    (11.7)

which again is equivalent to (eq. 10.7), with α_ij = 0. Hence we have the proof.

Consider the final case: errors in Z(S) − S cannot be distinguished, as the resulting codespaces are non-orthogonal. All such errors would result in non-orthogonal codespaces. As a result, for any two errors E_j and E_k, the projectors (E_j P E_j† and E_k P E_k† respectively) onto the resulting codespaces would be equivalent up to a constant (say α). Hence E_j P E_j† = α E_k P E_k†, ∀ E_j, E_k ∈ (Π_n, ·). Noting that the elements of the pauli group are unitary:

E_j P E_j† = α E_k P E_k†
E_k† E_j P E_j† E_k = α P

The above equation has two possible solutions:

• Substituting E_k† E_j = E and comparing the above equation with (eq. 11.6), we see that E_k† E_j ∈ S.
• Putting E_k† E_j P E_j† E_k = 0 (i.e. α = 0), we see that the equation is trivially satisfied. On substituting E = E_k† E_j and comparing with (eq. 11.7), we see that E_k† E_j ∈ Π_n − Z(S).

Hence we see that E_k† E_j ∈ S ∪ (Π_n − Z(S)). Now notice that Π_n = S ∪ (Π_n − Z(S)) ∪ (Z(S) − S). Hence we can say that E_k† E_j ∉ (Z(S) − S) for correctable errors {E_i}. Hence we have proved the theorem.

11.5 Fault tolerance

All operators, be they error operators, measurements or gates, are one and the same. We saw that there are certain methods by which a code can be recovered from an error. This error recovery step in turn uses gates on the codewords (which are again nothing but operations which can cause errors). In general it is not desirable to have a recovery operation that in turn causes errors on the codewords. Hence we demand some restrictions on these recovery operators, or gates, which are used in the process of error recovery. If we make the right demands to ensure that these gates do not introduce any error, then, along with our error correction implementation, we can make the entire quantum computation process error-resistant, or more precisely, Fault Tolerant.

Justification (of the assumption in footnote 8): We can define the normalizer and the centralizer of S as:

N(S) = {E ∈ (Π_n, ·) | E† S_i E ∈ S, ∀ S_i ∈ S}
Z(S) = {E ∈ (Π_n, ·) | E† S_i E = S_i, ∀ S_i ∈ S}

In the conditional statement for the normalizer we have E† S_i E ∈ S, ∀ S_i ∈ S, which can be rewritten as: ∃ S_j ∈ S such that E† S_i E = S_j, ∀ S_i ∈ S. It now suffices to show that S_j = S_i, ∀ S_i ∈ S, since putting S_i = S_j in the conditional statement for the normalizer, we obtain N(S) = Z(S). From statement (stat. 1) of (sec. 11.1), we see that E† S_i E = ±E† E S_i, which, using the property that all elements of the pauli group are unitary ((stat. 3) of (sec. 11.1)), simplifies to E† S_i E = ±S_i. Comparing this equation with the conditional statement for the normalizer, we see that S_j = ±S_i, ∀ S_i ∈ S. We are given that S_i, S_j ∈ S, and from (sec. 11.3) we see that −I ∉ S, which implies (from the closure law) −S_i ∉ S. Hence S_j ≠ −S_i, which says that S_i = S_j, ∀ S_i ∈ S. Hence we have the justification.

11.5.1 Unitary gates in the stabilizer formalism

Unitary gates are nothing but measurements on the state, such that they do not reveal any information about the state (of the qubit being passed across the gate). In the earlier section, we saw that if a measurement A is performed on a code with a stabilizer S = {S_1, S_2, ..., S_n} (where {A, S_1} = 0 and [A, S_i] = 0 ∀ i ∈ {2 ... n}), then the stabilizer for the code after the measurement is performed becomes S' = {A S_1 A†, S_2, ..., S_n}. The code is therefore changed after the measurement. Moreover, the new code is orthogonal to the old one, since the unitary gate (measurement) anticommutes with one of the stabilizer generators. This can be compared to an error in the code. But we do not expect unitary gates to induce errors on the code, as these gates are used for error recovery (if the error recovery process itself is erroneous, then the entire recovery operation has no meaning). However, if the stabilizer remains invariant after the conjugation of S_1 with A, that is S' = S, then we can conclude that the code would not be changed after the measurement. Hence we need the condition that A S_1 A† ∈ S for all measurements A that do not alter the code. Therefore, more precisely, we are looking for all measurements in the set {A | A S_i A† ∈ S, ∀ S_i ∈ S}. In other words, from (eq. 3.32), we are looking at all the unitary gates in the normalizer of the stabilizer subgroup in Π_n: N(S), where n is the number of qubits⁹.

Notice some important properties of H, Λ(X) and S:

1. Definitions: The Hadamard, CNOT and phase gates are given by:

H = (1/√2) ( 1 1 ; 1 −1 ),    S = ( 1 0 ; 0 i ),    Λ(X) = ( I 0 ; 0 X ) = ( 1 0 0 0 ; 0 1 0 0 ; 0 0 0 1 ; 0 0 1 0 )    (11.8)

2. Action of the Hadamard gate H on the single-qubit operators X and Z:

H: Z → H Z H = (1/2) ( 1 1 ; 1 −1 ) ( 1 0 ; 0 −1 ) ( 1 1 ; 1 −1 ) = ( 0 1 ; 1 0 ) = X    (11.9a)
H: X → H X H = (1/2) ( 1 1 ; 1 −1 ) ( 0 1 ; 1 0 ) ( 1 1 ; 1 −1 ) = ( 1 0 ; 0 −1 ) = Z    (11.9b)

3. Action of the CNOT gate Λ(X) on single-qubit operators¹⁰. Notice that it can reproduce the same non-trivial action (as on the first qubit) on the second qubit:

Λ(X): X ⊗ I → X ⊗ X    (11.10a)
Λ(X): I ⊗ Z → Z ⊗ Z    (11.10b)

The above operations can be expressed using the figure below:


⁹ We needn't assume A ∈ N(Π_n) in U_n (the group of all 2^n × 2^n unitary matrices). We only know that A S_i A† ∈ S, ∀ S_i ∈ S, whereas the action of A on elements outside the stabilizer subgroup is not conditioned (and is also not relevant) by our demand for fault tolerance, and hence can be arbitrary.
¹⁰ Single-qubit operators, in the sense of those which act non-trivially on only one qubit. It is not the actual single-qubit operator with only one tensor factor.


Figure 11.1: Figure showing the operations of CNOT, Hadamard and Phase gates on stabilizers of qubits. We will use each wire in the circuit to represent the stabilizer stabilizing the qubit being transmitted across that particular wire.
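The conjugation rules (eq. 11.9) and (eq. 11.10), summarised in Figure 11.1, can be verified directly; the numpy lines below (an added illustration) check HZH = X, HXH = Z and the CNOT relations X⊗I → X⊗X and I⊗Z → Z⊗Z.

import numpy as np

I = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)   # control on the first qubit

print(np.allclose(H @ Z @ H, X))                               # (11.9a)
print(np.allclose(H @ X @ H, Z))                               # (11.9b)
print(np.allclose(CNOT @ np.kron(X, I) @ CNOT, np.kron(X, X))) # (11.10a)
print(np.allclose(CNOT @ np.kron(I, Z) @ CNOT, np.kron(Z, Z))) # (11.10b)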

With this, we now claim that the Hadamard, phase and CNOT gates can be used to generate all n-qubit operations (in the normalizer of S in Π_n, as well as in the normalizer of Π_n in U_n, the group of all 2^n × 2^n unitary matrices).

Theorem: All n-qubit operations in the normalizer of S in Π_n, as well as in the normalizer of Π_n in U_n, can be generated using the H, S and Λ(X) gates.

Proof:

We need to show that we can create the generating set S_n = {g_1 ⊗ g_2 ⊗ ... ⊗ g_n | g_i ∈ {X, Z}, ∀ i ∈ {1 ... n}} using H, S and Λ(X). We will now give an inductive proof.
1. Base Case, n = 1: We need to show that all single-qubit operators (those with only one tensor factor) in N(S) can be expressed as a finite composition of H and S. We will make use of the fact that these operators are in N(S) by assuming that they are also in Z(S), from the footnote in (sec. 11.4.3). Notice that all elements (single-qubit operators) of N(S) must commute with each other. This leaves us with the only option that the generating set of N(S) can only be of the form {I, σ}, where σ denotes one of the pauli X, Y, Z operators. This is because N(S) cannot have more than one pauli operator (as they don't commute, which would imply N(S) ≠ Z(S), thereby contradicting the fact that S is a stabilizer subgroup). Hence, to show that H and S can generate N(S), it suffices to show that H and S can generate the elements of {I, X, Y, Z}.
Assumption: The pauli operators X and Z can be generated using {H, S}.
Justification: It suffices to show that the operators X and Z can be obtained from a finite combination of H and S. We have:

H² = (1/2) ( 1 1 ; 1 −1 ) ( 1 1 ; 1 −1 ) = (1/2) ( 2 0 ; 0 2 ) = I    (11.11)
S² = ( 1 0 ; 0 i ) ( 1 0 ; 0 i ) = ( 1 0 ; 0 −1 ) = Z    (11.12)
Using (eq. 11.9a) and (eq. 11.12): H S² H = H Z H = X    (11.13)

2. Base Case, n = 2: Given¹¹ S_1 = {X, Z}, we need to construct S_2 = {g_i ⊗ g_j | g_i, g_j ∈ {X, Z}}. Let us evaluate this case explicitly, thereby presenting an algorithm to construct each element of S_2 = {X ⊗ X, X ⊗ Z, Z ⊗ X, Z ⊗ Z}, starting from the elements of S_1, using H and Λ(X). Starting with X, we now introduce an ancilla qubit in an arbitrary state (the exact state is not relevant, as its stabilizer is taken as I). This new state¹² (call it |ψ⟩) is trivially stabilized by X ⊗ I. Passing |ψ⟩ across a CNOT gate, with the first qubit as the control and the second as the target, we get a new state (say |ψ'⟩) whose stabilizer, using (eq. 11.10a), is X ⊗ X. Now applying the gate H to the first qubit and leaving the second, or equivalently applying H ⊗ I to the two-qubit state |ψ'⟩ (stabilized by X ⊗ X), we get a new state (say |ψ''⟩ = (H ⊗ I)|ψ'⟩) whose stabilizer is given by (H ⊗ I)(X ⊗ X)(H ⊗ I) = (HXH) ⊗ (IXI) = Z ⊗ X. Similarly, applying the H gate to the second qubit of |ψ'⟩ and leaving the first, we get (I ⊗ H)|ψ'⟩, whose stabilizer is (I ⊗ H)(X ⊗ X)(I ⊗ H) = X ⊗ Z. Similarly, applying the H gate to both the qubits in |ψ'⟩, we get (H ⊗ H)|ψ'⟩, which is stabilized by (H ⊗ H)(X ⊗ X)(H ⊗ H) = Z ⊗ Z. Hence we see that all the elements of S_2 have been obtained from the elements of S_1 with the help of CNOT and Hadamard operations.


3. Induction step: We assume that the generating set S_n, of all n-qubit operations each being a tensor product of the pauli matrices X and Z, can be constructed from H and S using (stat. 1). With this, we need to show that the generating set of all (n + 1)-qubit operations, S_{n+1}, can also be constructed using H and Λ(X). Before this, we look closely into the structure of the elements in S_n and S_{n+1}. Let S_n = {s_1, s_2, ..., s_{2^n}} and S_{n+1} = {s'_1, s'_2, ..., s'_{2^{n+1}}}.

(a) As S_n consists of all distinct n-qubit operations, we see that each s_j ∈ S_n has n tensor factors, ∀ j ∈ {1 ... 2^n}, moreover each of which is either X or Z. Similarly, each s'_i ∈ S_{n+1} has n + 1 tensor factors, ∀ i ∈ {1 ... 2^{n+1}}.

(b) Every string in S_n falls under exactly one of two categories:
• all its n tensor factors are X;
• at least one of its tensor factors is Z.

(c) Every string in S_{n+1} can be reduced to some element in S_n by just removing exactly one tensor factor, from some position (call this position k). Or, conversely, every element of S_{n+1} can be formed by adding exactly one tensor factor to some element of S_n, at some position k. Moreover, we see that, without loss of generality, we may assume k = n.
Assumption: ∀ s'_i ∈ S_{n+1}, ∃ s_j ∈ S_n such that s'_i = s_j ⊗ g, where g ∈ {X, Z}.

Justification: Consider a particular encoding from s_i ∈ S_n to a string over {0, 1}, where every Z tensor factor in s_i is mapped to 0 and every X to 1, thereby mapping s_i to a binary string b_i. As every element in S_n is distinct and S_n contains all operators with n tensor factors (over X and Z), we see that the image of S_n under this map consists of all binary strings of length n. Let this set be B_n. As #S_n = #B_n (= 2^n), we see that S_n is isomorphic to B_n. It now suffices to show that every element b_i ∈ B_{n+1} can be obtained by appending 0 or 1 to some b_j ∈ B_n. To show this, we use a systematic procedure to generate the strings in B_l, for any l, using a binary tree of height l. Each left edge corresponds to 0 and each right edge to 1, and each node contains the binary string with letters corresponding to the edges on the path from the root to this node. Hence, at height n, each of the 2^n leaves contains a different binary sequence of length n (distinct, because each string corresponds to a path from the root to a particular leaf, which is different for different leaves; and of length n, since the height of the tree equals the number of edges in a path from the root to a leaf). As expected, we get all the 2^n binary sequences (of length n) in B_n. Similarly, using a tree of height n + 1, we would get all the sequences in B_{n+1}. It is clearly evident (also from the figure given below) that every string in B_{n+1} corresponds to a leaf of the (n + 1)-height binary tree, each of which is obtained by drawing an edge from a leaf of the n-height tree, which corresponds to appending 0 or 1 to a string in B_n.

Figure 11.2: Figure showing the construction of binary sequences of length (n + 1), represented by c_1, c_2, ..., c_{2^{n+1}} (or equivalently, an (n + 1)-qubit operator), by appending 0 or 1 to binary sequences of length n, represented by b_1, b_2, ..., b_{2^n} (analogous to adding an X or a Z tensor factor to an n-qubit operation).

Hence we see that every string in B_{n+1} can be obtained by appending 0 or 1 to some string in B_n, thereby justifying our assumption.

¹¹ X and Z can generate all possible one-qubit operations (up to an arbitrary phase), as they are the generators of the pauli group over 1 qubit.
¹² In the quantum circuit picture, this is equivalent to increasing the number of wires by introducing a new empty wire.

We now need to show that s_j ⊗ X or s_j ⊗ Z can be constructed from s_j, ∀ j ∈ {1 ... 2^n}. Consider the former case in (stat. 3b) above. To append an X tensor factor, we need to first introduce an ancilla qubit |φ⟩ (shown as a red line in the figure below), in some state with stabilizer I. Then perform a CNOT operation with the n-th qubit as the control and |φ⟩ as the target. The two-qubit output state of the CNOT, using (eq. 11.10a), is stabilized by X ⊗ X. Hence, the n-qubit state stabilized by s_j = X ⊗ ... ⊗ X is changed to an (n + 1)-qubit state by the introduction of the ancilla, which is stabilized by X ⊗ ... ⊗ X ⊗ I, and finally, using the Λ(X) operation on the n-th and (n + 1)-th qubits, we get the final state stabilized by X ⊗ ... ⊗ X ⊗ X, which is s_j ⊗ X. To append a Z tensor factor, we apply the H gate on the (n + 1)-th qubit, and using (eq. 11.9b), we get that the final state is stabilized by X ⊗ ... ⊗ X ⊗ Z, which is s_j ⊗ Z.

Figure 11.3: Figure showing the process of creating an (n + 1)-qubit stabilizer from an n-qubit stabilizer along with an ancilla stabilizer I, using CNOT and Hadamard gates, for the case where the particular n-qubit stabilizer consists of all X tensor factors.

Hence we have shown that both s_j ⊗ X and s_j ⊗ Z can be constructed using H and Λ(X) gates, for all s_j of the form described by the former case of (stat. 3b).


We now consider the latter case of (stat. 3b), where we are given the existence of at least one Z tensor factor in the operator s_j ∈ S_n. We use this Z along with I (the stabilizer of the ancilla qubit) and (eq. 11.10b) to convert the two-qubit state stabilized by I ⊗ Z into one stabilized by Z ⊗ Z. For this, we introduce an ancilla state |φ⟩ (or an empty wire in the quantum circuit), which is given as the control qubit of a CNOT gate, the target being the state stabilized by Z. From the action of the CNOT gate, we see that this gives a state that is stabilized by Z ⊗ Z. Hence we have appended a Z tensor factor, thereby creating s_j ⊗ Z. We now need to append an X tensor factor to s_j. For this, we pass the output of the control qubit of the CNOT gate (stabilized by Z) across an H gate. Using (eq. 11.9a), we see that the output state would be stabilized by X, thereby making the final (added) tensor factor X. Thus, we have the new stabilizer generator s_j ⊗ X.

DR

Figure 11.4: Figure showing the process of creating an (n + 1)-qubit stabilizer from an n-qubit stabilizer
along with an alcilla stabilizer I, using CNOT and hadamard gates, for the case where the particular n-qubit
stabilizer consists of at least one Z tensor factor.
Hence we have shown that both X and Z tensor factors can be appended to s_j ∈ S_n, for s_j with the structure
described by the latter case of (stat. 3b).
Hence we have shown that X and Z tensor factors can be appended to any element of S_n, thereby
obtaining all elements of S_{n+1}. Moreover, this can be done using only Hadamard and CNOT gates. Hence
we see that the generating set S_n can be constructed using the H, S and CNOT gates, for any n, thereby proving the
theorem.
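
The conjugation rules used in this argument are easy to check numerically. The following short numpy sketch (an illustrative check, not part of the original derivation; the matrices are the standard computational-basis representations, with the first tensor factor taken as the CNOT control) verifies that X ⊗ I maps to X ⊗ X and I ⊗ Z maps to Z ⊗ Z under conjugation by CNOT, and that the Hadamard gate interchanges X and Z:

    import numpy as np

    # Standard single-qubit operators (assumed computational-basis convention).
    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]])
    Z = np.array([[1, 0], [0, -1]])
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    # CNOT with the first qubit (the data qubit) as control, second (the ancilla) as target.
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]])

    def conjugate(U, P):
        """How a stabilizer P transforms when the state is acted on by U."""
        return U @ P @ U.conj().T

    # Appending an X tensor factor: X on the data qubit propagates to X (x) X.
    assert np.allclose(conjugate(CNOT, np.kron(X, I)), np.kron(X, X))
    # Appending a Z tensor factor (latter case): Z on the target propagates to Z (x) Z.
    assert np.allclose(conjugate(CNOT, np.kron(I, Z)), np.kron(Z, Z))
    # The Hadamard gate interchanges X and Z on the new qubit.
    assert np.allclose(conjugate(H, X), Z)
    assert np.allclose(conjugate(H, Z), X)
    print("Stabilizer propagation rules verified.")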


Part VII

References


Bibliography
[1] V. Arvind, K.R. Parthasarathy, A Family of Quantum Stabilizer Codes Based on the Weyl Commutation
Relations over a Finite Field http://arxiv.org/abs/quant-ph/0206174v1
[2] Alexei Ashikhmin and Simon Litsyn, Upper Bounds on the Size of Quantum Codes, IEEE Transactions on Information Theory, Volume 45, Issue 4, May 1999, DOI: 10.1109/18.761270, http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=761270. Also available at http://www.physics.princeton.edu/~mcdonald/examples/QM/ashikhmin_ieeetit_45_1206_99.pdf


[3] Daniel A. Lidar, (University of Toronto), K. Birgitta Whaley, (University of California, Berkeley),
Decoherence-Free Subspaces and Subsystems http://arxiv.org/abs/quant-ph/0301032v1

[4] David P. DiVincenzo (IBM), Peter W. Shor (AT & T), Fault-Tolerant Error Correction with Efficient Quantum Codes, http://lanl.arxiv.org/abs/quant-ph/9605031v2


[5] Peter W. Shor (AT & T Research), Fault-tolerant quantum computation, http://arxiv.org/abs/quant-ph/9605011v2

[6] John Preskill, Fault-tolerant quantum computation http://lanl.arxiv.org/abs/quant-ph/9712048v1


[7] Daniel Gottesman, A Theory of Fault-Tolerant Quantum Computation, http://lanl.arxiv.org/abs/quant-ph/9702029v2
[8] Daniel Gottesman, Stabilizer Codes and Quantum Error Correction, Ph.D. thesis, California Institute of Technology, Pasadena, California (submitted May 21, 1997), http://arxiv.org/pdf/quant-ph/9705052v1
[9] Web Article: Bounds on the parameters of a code, http://www.usna.edu/Users/math/wdj/book/node125

[10] Lecture Notes for Chapter 7: Quantum Error Correction, Prof. John Preskill (Physics 219/Computer Science 219, Quantum Computation (formerly Physics 229)), http://www.theory.caltech.edu/people/preskill/ph229/notes/chap7.pdf
[11] Group representation theory and quantum physics, http://chaos.swarthmore.edu/courses/Physics093_2009/Lectures/GroupTheoryC0.pdf
[12] Lecture Notes: Lecture 14: Error-Correcting Codes, Prof. Salil Vadhan (based on scribe notes by Sasha Schwartz and Adi Akavia, April 3, 2007), http://people.seas.harvard.edu/~salil/cs225/lecnotes/lec14.pdf

[13] Lecture Notes: Lecture 1: Introduction, Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007), August 27, 2007, Lecturer: Atri Rudra, Scribe: Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect1.pdf
[14] Lecture Notes: Lecture 2: Error Correction and Channel Noise, Error Correcting Codes: Combinatorics,
Algorithms and Applications (Fall 2007), August 29, 2007, Lecturer: Atri Rudra Scribe: Yang Wang &
Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect2.pdf
[15] Lecture Notes: Lecture 3: Error Correction and Distance, Error Correcting Codes: Combinatorics,
Algorithms and Applications (Fall 2007), August 31, 2007, Lecturer: Atri Rudra Scribe: Michael Pfetsch
& Atri Rudra, http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect3.pdf
[16] Lecture Notes: Notes 2: Gilbert-Varshamov bound, Introduction to Coding Theory, CMU, Spring 2010, January 2010, Lecturer: Venkatesan Guruswami, Scribe: Venkatesan Guruswami, http://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes2.pdf
[17] Lecture Notes: Notes 4: Elementary bounds on codes, Introduction to Coding Theory, CMU, Spring 2010, January 2010, Lecturer: Venkatesan Guruswami, Scribe: Venkatesan Guruswami, http://www.cs.cmu.edu/~venkatg/teaching/codingtheory/notes/notes4.pdf


[18] Lecture Notes: The Orbit-Stabilizer Theorem, Rahbar Virk, Department of Mathematics, University of Wisconsin, Madison, WI 53706, virk@math.wisc.edu, http://www.math.wisc.edu/~virk/notes/pre08/pdf/orbit-stabilizer_thm.pdf

[19] A. M. Steane, Error Correcting Codes in Quantum Theory, Clarendon Laboratory, Parks Road, Oxford,
OX1 3PU, England Physical Review Letters, Volume 77 29 July 1996 Number 5, (Received 4 October
1995) http://prl.aps.org/abstract/PRL/v77/i5/p793_1


[20] BOOK: Information, physics, and computation (Oxford Graduate Texts), Marc Mezard and Andrea Montanari, Oxford University Press, 2009, 569 pages, ISBN: 019857083X, 9780198570837, http://www.stanford.edu/~montanar/BOOK/book.html
[21] Web Resource: Group Theory definitions and concepts from Wikipedia: List of Group Theory Topics
[22] Web Resource: The Lindblad Master Equation, Andrew Fisher, Department of Physics and Astronomy,
University College London, http://www.cmmp.ucl.ac.uk/~ajf/course_notes/node36.html
[23] Presentation on Shor Code, Marek Andrzej Perkowski, ECE 510 - Quantum Computing, School of Engineering and Computer Science, Portland State University, Course Schedule, Spring 2005, www.ee.pdx.edu/~mperkows/CLASS_FUTURE/2005-quantum/2005-q-0018.error-models-9-bit-Shor.ppt
[24] BOOK: Quantum computation and quantum information, Michael A. Nielsen, Isaac L. Chuang, Cambridge University Press, 2000, Science, 676 pages, http://books.google.com/books?id=65FqEKQOfP8C&dq=nielsen+and+chuang&source=gbs_navlinks_s
[25] BOOK: Information theory, Robert B. Ash, Courier Dover Publications, 1990, Computers, 339 pages, http://books.google.com/books?id=ngZhvUfF0UIC&dq=Information+Theory+Robert+B+Ash&source=gbs_navlinks_s

[26] BOOK: Coding theorems of classical and quantum information theory, K. R. Parthasarathy, Hindustan Book Agency, 2007, 158 pages, http://books.google.com/books?id=miu8PAAACAAJ&dq=Coding+Theorems+Of+Quantum+and+Classical&hl=en&ei=zQNHTIveJ4KmvQP_m5XNAg&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEwQ6AEwBw
[27] BOOK: Quantum computing: from linear algebra to physical realizations, Mikio Nakahara, Tetsuo Ohmi, CRC Press, 2008, Computers, 421 pages, http://books.google.com/books?id=VdHJsTdoyAMC&dq=Quantum+Computing+Nakahara&source=gbs_navlinks_s
[28] Collected Work (1989): Fundamental Theories of Physics: Gravitation, Gauge Theories and The Early Universe, B. R. Iyer, N. Mukunda and C. V. Vishveshwara
Table 11.1: List of References

 S/No | Text                                        | Author                   | Other Details
------+---------------------------------------------+--------------------------+--------------------
   1  | Information Theory                          | Robert B. Ash            | Dover Publications
   2  | Elements of Information Theory              | Cover and Thomas         | McGraw Hill
   3  | Quantum Computing                           | Jozef Gruska             | McGraw Hill
   4  | Introduction to Coding Theory               | Van Lint                 |
   5  | Introduction to Quantum Mechanics           | David J. Griffiths       |
   6  | Modern Quantum Mechanics                    | Jun John Sakurai         |
   7  | Fundamentals of Quantum Mechanics           | R. Shankar               |
   8  | Quantum Computation and Quantum Information | Nielsen and Chuang       |
   9  | Quantum Computing                           | Mikio Nakahara and Ohmi  |
  10  | Preskill's Lecture Notes                    | John Preskill            |
  11  | Nilanjana Dutta's Lecture Notes             | Nilanjana Dutta          |
  12  | Introduction to the Theory of Computation   | Michael Sipser           |
  13  | Automata Theory                             | Dexter C. Kozen          |
  14  | Automata and Languages                      | Hopcroft and Ullman      |


Appendix A

Solutions to Exercises
Exercises in Chapter 2 of the book Quantum Computation and Quantum Information by Michael A. Nielsen and Isaac L. Chuang.
2.13 Given two vectors |v⟩ and |w⟩, to show that (|w⟩⟨v|)† = |v⟩⟨w|.
LHS: (|w⟩⟨v|)† = (⟨v|)† (|w⟩)† = |v⟩⟨w|.
Hence, proved.

2.14 To show that ( Σ_i a_i A_i )† = Σ_i a_i* A_i†.
LHS: ( Σ_i a_i A_i )† = Σ_i (a_i A_i)† = Σ_i a_i* A_i†.
Hence, proved.

2.15 To show that (A†)† = A.
For an arbitrary vector |a⟩,
    (A†)†|a⟩ = (⟨a|A†)†                                          (A.1)
and, since ⟨a|A† = (A|a⟩)†, taking the adjoint again gives
    (⟨a|A†)† = A|a⟩                                              (A.2)
On comparing equations (eq. A.1) and (eq. A.2):
    A|a⟩ = (A†)†|a⟩
for every |a⟩; therefore we obtain the operator relation (A†)† = A.
Hence, proved.

2.16 To show that any projection operator P satisfies the relation P² = P. This can be obtained from the
generalized form Pⁿ = P, proved by considering P_k = |e_k⟩⟨e_k|. So,
    (P_k)ⁿ = |e_k⟩⟨e_k|e_k⟩⟨e_k|e_k⟩⟨e_k| ... |e_k⟩⟨e_k|.
Now we can simplify this expression by taking the inner products, using the fact that ⟨e_k|e_k⟩ = 1. Only
the terminal factors |e_k⟩ and ⟨e_k| remain after all the intermediate inner products evaluate to 1, so
the RHS is just P_k. Hence, proved.
2.17 To show that a normal matrix is Hermitian if and only if it has real eigenvalues.
Let A be a Hermitian operator with eigenvalue a and eigenvector |a⟩.
    A|a⟩ = a|a⟩  so that  ⟨a|A|a⟩ = a⟨a|a⟩                       (A.3)
Also, ⟨a|A† = a*⟨a|, so that
    ⟨a|A†|a⟩ = a*⟨a|a⟩                                           (A.4)
Subtracting equation (eq. A.3) from equation (eq. A.4):
    ⟨a|(A† - A)|a⟩ = (a* - a)⟨a|a⟩                               (A.5)
Since A† = A, the left hand side vanishes, and therefore
    a* = a, so a is real.                                        (A.6)
Conversely, a normal matrix has a spectral decomposition A = Σ_i a_i|i⟩⟨i|; if every a_i is real then
A† = Σ_i a_i*|i⟩⟨i| = A, so A is Hermitian.
Hence, proved.
2.18 To show that all the eigenvalues of a unitary operator are of the form e^{iθ} for some real θ.
Let U be a unitary operator with eigenvalue a and eigenvector |a⟩.
    U|a⟩ = a|a⟩                                                  (A.7)
    ⟨a|U† = a*⟨a|                                                (A.8)
On taking the inner product of equation (eq. A.7) with equation (eq. A.8), we get:
    ⟨a|U†U|a⟩ = |a|² ⟨a|a⟩
Since U is unitary, U†U = I, and the eigenvector is normalized, so
    1 = |a|²  and hence  a = e^{iθ} for some real θ.
Hence, proved.

2.19 To show that the Pauli matrices are unitary and Hermitian.
We have X = |0⟩⟨1| + |1⟩⟨0|.
    X† = |1⟩⟨0| + |0⟩⟨1| = X, so X is Hermitian.
    X†X = (|0⟩⟨1| + |1⟩⟨0|)(|0⟩⟨1| + |1⟩⟨0|) = |0⟩⟨1|0⟩⟨1| + |0⟩⟨1|1⟩⟨0| + |1⟩⟨0|0⟩⟨1| + |1⟩⟨0|1⟩⟨0|
        = |0⟩⟨0| + |1⟩⟨1| = I, so X is unitary.
The same argument applies to Y = -i|0⟩⟨1| + i|1⟩⟨0| and Z = |0⟩⟨0| - |1⟩⟨1|.

2.20 Given that {|v_i⟩} and {|w_i⟩} are two orthonormal bases, and A and A′ are the matrix representations of the same
operator in the two bases respectively, A_ij = ⟨v_i|A|v_j⟩ and A′_ij = ⟨w_i|A|w_j⟩, to find the relation between A and A′.
Define a unitary operator U such that U|v_i⟩ = |w_i⟩, so that ⟨w_i| = ⟨v_i|U† and |w_j⟩ = U|v_j⟩.
    A′_ij = ⟨w_i|A|w_j⟩ = ⟨v_i|U†AU|v_j⟩
Therefore A′ = U†AU.

2.22 To show that the eigenvectors of a Hermitian operator with different eigenvalues are orthogonal.
Let A be a Hermitian operator with distinct eigenvalues a′ and a″ and corresponding eigenvectors |a′⟩ and |a″⟩.
    A|a′⟩ = a′|a′⟩  so that  ⟨a″|A|a′⟩ = a′⟨a″|a′⟩                        (A.9)
    A|a″⟩ = a″|a″⟩  so that  ⟨a″|A†|a′⟩ = (a″)*⟨a″|a′⟩ = a″⟨a″|a′⟩        (A.10)
(the last step uses the fact that the eigenvalues of a Hermitian operator are real).
On subtracting equation (eq. A.10) from equation (eq. A.9):
    ⟨a″|(A - A†)|a′⟩ = (a′ - a″)⟨a″|a′⟩                                   (A.11)
Since A = A†, the left hand side vanishes; from equation (eq. A.11) we see that either a′ - a″ = 0 or ⟨a″|a′⟩ = 0.
Since it is given that a′ and a″ are different, we have ⟨a″|a′⟩ = 0.
Hence, proved.
2.23 To show that all the eigenvalues of a projection operator are either 1 or 0.
Let P be a projection operator with eigenvector |a⟩ and eigenvalue a.
    P|a⟩ = a|a⟩                                                  (A.12)
    P²|a⟩ = a²|a⟩                                                (A.13)
On subtracting (eq. A.12) from (eq. A.13):
    (P² - P)|a⟩ = (a² - a)|a⟩
Since P² = P, the left hand side is 0, and since |a⟩ ≠ 0 we have
    a² - a = 0,  so  a(a - 1) = 0,  giving  a = 0 or a = 1.      (A.14)
Hence, proved.

2.24 To show that a positive operator is always Hermitian.
Let |a⟩ be an eigenvector of A with eigenvalue a.
    A|a⟩ = a|a⟩  so that  ⟨a|A|a⟩ = a⟨a|a⟩                       (A.15)
Also, taking the complex conjugate, (⟨a|A|a⟩)* = a*⟨a|a⟩, i.e.
    ⟨a|A†|a⟩ = a*⟨a|a⟩                                           (A.16)
Since A is a positive operator, ⟨a|A|a⟩ is real and non-negative, so a = a*. Then
    ⟨a|A†|a⟩ = a⟨a|a⟩ = ⟨a|A|a⟩
for every eigenvector |a⟩, which gives A† = A, i.e. A is Hermitian.
2.25 To show that for any operator A, A†A is positive.
Let A be some operator and |a⟩ any vector, and write
    A|a⟩ = |φ⟩                                                   (A.17)
    ⟨a|A† = ⟨φ|                                                  (A.18)
Taking the inner product of equation (eq. A.17) with equation (eq. A.18):
    ⟨a|A†A|a⟩ = ⟨φ|φ⟩ ≥ 0
for every |a⟩, so A†A is a positive operator.
Hence, proved.


2.26 Given |ψ⟩ = (|0⟩ + |1⟩)/√2, to find |ψ⟩⊗2 and |ψ⟩⊗3, both in terms of the Kronecker product and in outer (ket) product form.

Ket notation:
    |ψ⟩⊗2 = |ψ⟩ ⊗ |ψ⟩ = [(|0⟩ + |1⟩)/√2] ⊗ [(|0⟩ + |1⟩)/√2]
          = (|00⟩ + |01⟩ + |10⟩ + |11⟩)/2                                           (A.19)
Similarly,
    |ψ⟩⊗3 = |ψ⟩⊗2 ⊗ |ψ⟩
          = (|000⟩ + |001⟩ + |010⟩ + |011⟩ + |100⟩ + |101⟩ + |110⟩ + |111⟩)/(2√2)   (A.20)

Kronecker product form:
    |0⟩ = (1, 0)ᵀ,  |1⟩ = (0, 1)ᵀ,  so |ψ⟩ = (1/√2)(1, 1)ᵀ                          (A.21)
    |ψ⟩⊗2 = (1/2)(1, 1, 1, 1)ᵀ                                                      (A.22)
    |ψ⟩⊗3 = (1/(2√2))(1, 1, 1, 1, 1, 1, 1, 1)ᵀ                                      (A.23)
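
These Kronecker products are easy to reproduce numerically; the following small numpy sketch (an illustration, not part of the original solution) builds |ψ⟩⊗2 and |ψ⟩⊗3 directly:

    import numpy as np

    ket0 = np.array([1.0, 0.0])
    ket1 = np.array([0.0, 1.0])
    psi = (ket0 + ket1) / np.sqrt(2)

    psi2 = np.kron(psi, psi)      # |psi> tensor |psi>
    psi3 = np.kron(psi2, psi)     # |psi> tensor |psi> tensor |psi>

    print(psi2)   # [0.5 0.5 0.5 0.5]
    print(psi3)   # eight equal entries, each 1/(2*sqrt(2)) ~ 0.35355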


2.27 Given the Pauli matrices, compute (a) X ⊗ Z, (b) I ⊗ X, (c) X ⊗ I.
The Pauli matrices are
    X = [ 0 1; 1 0 ],   Y = [ 0 -i; i 0 ],   Z = [ 1 0; 0 -1 ],   I = [ 1 0; 0 1 ]      (A.24)

(a)
    X ⊗ Z = [ 0  0  1  0 ]
            [ 0  0  0 -1 ]
            [ 1  0  0  0 ]
            [ 0 -1  0  0 ]                                       (A.25)
Similarly,
    Z ⊗ X = [ 0  1  0  0 ]
            [ 1  0  0  0 ]
            [ 0  0  0 -1 ]
            [ 0  0 -1  0 ]
The tensor product is not commutative.
(b)
    I ⊗ X = [ 0  1  0  0 ]
            [ 1  0  0  0 ]
            [ 0  0  0  1 ]
            [ 0  0  1  0 ]                                       (A.26)
(c)
    X ⊗ I = [ 0  0  1  0 ]
            [ 0  0  0  1 ]
            [ 1  0  0  0 ]
            [ 0  1  0  0 ]                                       (A.27)


2.28 To show that (a) (A ⊗ B)* = A* ⊗ B*, (b) (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ and (c) (A ⊗ B)† = A† ⊗ B†.
Let A be an m × n matrix with entries A_ij, and let B be any matrix. In block form,
    A ⊗ B = [ A_11 B   A_12 B   ...   A_1n B ]
            [ A_21 B   A_22 B   ...   A_2n B ]
            [   .         .      .       .   ]
            [ A_m1 B   A_m2 B   ...   A_mn B ]
(a) Conjugating every entry, the (i, j) block of (A ⊗ B)* is (A_ij B)* = (A_ij)* B*, which is exactly the (i, j) block of A* ⊗ B*. Hence (A ⊗ B)* = A* ⊗ B*.
(b) Transposition exchanges the blocks in positions (i, j) and (j, i) and transposes each block, so the (i, j) block of (A ⊗ B)ᵀ is A_ji Bᵀ, which is the (i, j) block of Aᵀ ⊗ Bᵀ. Hence (A ⊗ B)ᵀ = Aᵀ ⊗ Bᵀ.
(c) Combining (a) and (b), (A ⊗ B)† = ((A ⊗ B)*)ᵀ = (A* ⊗ B*)ᵀ = (A*)ᵀ ⊗ (B*)ᵀ = A† ⊗ B†.
Hence, proved.
2.33 To show that the Hadamard operator, which has the form H = (1/√2)[(|0⟩ + |1⟩)⟨0| + (|0⟩ - |1⟩)⟨1|] for a
single qubit, can be generalized to the n-qubit system as
    H⊗n = (1/√(2ⁿ)) Σ_{x,y} (-1)^{x·y} |x⟩⟨y|,
where x and y run over all n-bit strings and x·y denotes the bitwise inner product, modulo 2.
We try to prove this result by induction.
For n = 1:
    H = (1/√2)(|0⟩⟨0| + |0⟩⟨1| + |1⟩⟨0| - |1⟩⟨1|) = (1/√2) Σ_{x,y∈{0,1}} (-1)^{xy} |x⟩⟨y|,
which has the required form.
For n = k, assume
    H⊗k = (1/√(2ᵏ)) Σ_{x,y} (-1)^{x·y} |x⟩⟨y|.
To prove the statement for the (k + 1) case, we take the tensor product of the n = k case with the n = 1 case:
    H⊗(k+1) = H⊗k ⊗ H = (1/√(2^{k+1})) [ Σ_{x,y} (-1)^{x·y} |x⟩⟨y| ] ⊗ [ |0⟩⟨0| + |0⟩⟨1| + |1⟩⟨0| - |1⟩⟨1| ]
On expanding the RHS of the above equation, we have four terms:
    (1/√(2^{k+1})) Σ_{x,y} (-1)^{x·y} ( |x0⟩⟨y0| + |x0⟩⟨y1| + |x1⟩⟨y0| - |x1⟩⟨y1| )
In each of the four terms the extra sign is (-1)^{ab}, where a and b are the bits appended to x and y respectively:
it is +1 in the first three terms and -1 only when both appended bits are 1. Hence every term is of the form
(-1)^{x′·y′}|x′⟩⟨y′| with x′ = xa and y′ = yb, and as x, y range over all k-bit strings and a, b over {0, 1},
the strings x′, y′ range over all (k + 1)-bit strings. On summing up, we get
    H⊗(k+1) = (1/√(2^{k+1})) Σ_{x′,y′} (-1)^{x′·y′} |x′⟩⟨y′|
Hence, proved.
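
The closed form for H⊗n is easy to confirm numerically for small n; the following sketch (an illustrative check, not part of the original proof) compares the repeated Kronecker product of H with the formula (1/√(2ⁿ)) Σ_{x,y} (-1)^{x·y} |x⟩⟨y|:

    import numpy as np

    n = 3
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

    # H^{tensor n} built explicitly.
    Hn = np.array([[1.0]])
    for _ in range(n):
        Hn = np.kron(Hn, H)

    # The claimed closed form, with x.y the bitwise inner product modulo 2.
    dim = 2 ** n
    formula = np.zeros((dim, dim))
    for x in range(dim):
        for y in range(dim):
            dot = bin(x & y).count("1") % 2
            formula[x, y] = (-1) ** dot
    formula /= np.sqrt(dim)

    assert np.allclose(Hn, formula)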

2.34 To find the logarithm and the square root of the matrix
    M = [ 4 3 ]
        [ 3 4 ]
Let the eigenvalues of M be λ₁ and λ₂, with corresponding eigenvectors |λ₁⟩ and |λ₂⟩. In order to apply a
function to the matrix, we must first write the operator as a linear combination of its projection operators;
to find the projection operators we need the eigenvectors, which we obtain by solving the secular equation:
    det|M - λI| = 0  gives  (4 - λ)² - 9 = 0,  i.e.  (4 - λ - 3)(4 - λ + 3) = 0
The solutions for λ are the two eigenvalues: λ₁ = 1, λ₂ = 7.
Eigenvectors:
To find |λ₁⟩: [ 4 3; 3 4 ](x, y)ᵀ = 1 · (x, y)ᵀ gives 4x + 3y = x and 3x + 4y = y, so x = -y and
    |λ₁⟩ = (1/√2)(1, -1)ᵀ
To find |λ₂⟩: 4x + 3y = 7x and 3x + 4y = 7y give x = y, so
    |λ₂⟩ = (1/√2)(1, 1)ᵀ
Now that we have found the eigenvectors, we can construct the projection operators and express M as an
outer product:
    |λ₁⟩⟨λ₁| = (1/2)[ 1 -1; -1 1 ],    |λ₂⟩⟨λ₂| = (1/2)[ 1 1; 1 1 ]
From the spectral decomposition of M, we have M = λ₁|λ₁⟩⟨λ₁| + λ₂|λ₂⟩⟨λ₂| = 1 · |λ₁⟩⟨λ₁| + 7 · |λ₂⟩⟨λ₂|. Therefore
    √M = √1 |λ₁⟩⟨λ₁| + √7 |λ₂⟩⟨λ₂| = (1/2)[ 1 + √7   -1 + √7 ;   -1 + √7   1 + √7 ]
    log M = log 1 |λ₁⟩⟨λ₁| + log 7 |λ₂⟩⟨λ₂| = (log 7)(1/2)[ 1 1; 1 1 ]
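
The spectral-decomposition answer can be cross-checked against scipy's general matrix functions; the sketch below (an illustration, not part of the original solution) verifies that the projector combination found above really is √M and log M:

    import numpy as np
    from scipy.linalg import sqrtm, logm

    M = np.array([[4.0, 3.0],
                  [3.0, 4.0]])

    P1 = 0.5 * np.array([[1.0, -1.0], [-1.0, 1.0]])   # projector for eigenvalue 1
    P2 = 0.5 * np.array([[1.0,  1.0], [ 1.0, 1.0]])   # projector for eigenvalue 7

    sqrt_M = 1.0 * P1 + np.sqrt(7.0) * P2
    log_M = np.log(7.0) * P2                          # log(1) * P1 vanishes

    assert np.allclose(sqrt_M @ sqrt_M, M)
    assert np.allclose(sqrtm(M), sqrt_M)
    assert np.allclose(logm(M), log_M)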


2.35 Given that v̂ is a real three-dimensional unit vector and θ a real number, to show that
e^{iθ v̂·σ} = (cos θ)I + i(sin θ) v̂·σ, where v̂·σ = Σ_{i=1}^{3} v_i σ_i and the σ_i denote the Pauli matrices.
    v̂·σ = v₁σ₁ + v₂σ₂ + v₃σ₃ = [ v₃         v₁ - iv₂ ]
                               [ v₁ + iv₂   -v₃      ]
    (v̂·σ)² = [ v₁² + v₂² + v₃²                  v₃(v₁ - iv₂) - (v₁ - iv₂)v₃ ]
             [ (v₁ + iv₂)v₃ - v₃(v₁ + iv₂)      v₁² + v₂² + v₃²             ]
Since v̂ is a unit vector, v₁² + v₂² + v₃² = 1, so
    (v̂·σ)² = I                                                   (A.28)
and hence (v̂·σ)ⁿ = I for n ∈ {2, 4, 6, ...} and (v̂·σ)ⁿ = v̂·σ for odd n.
From the series expansion of the exponential, we have
    e^{iθ v̂·σ} = I + (iθ/1!)(v̂·σ) + ((iθ)²/2!)(v̂·σ)² + ((iθ)³/3!)(v̂·σ)³ + ((iθ)⁴/4!)(v̂·σ)⁴ + ...
From equation (eq. A.28), and on rearranging the terms,
    e^{iθ v̂·σ} = (1 - θ²/2! + θ⁴/4! - ...) I + i(θ - θ³/3! + ...) v̂·σ
    e^{iθ v̂·σ} = (cos θ) I + i (sin θ) v̂·σ
Hence, proved.
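
This identity can be checked directly with the matrix exponential; the sketch below (an illustration, not part of the original solution, with an arbitrarily chosen axis and angle) compares scipy's expm against the closed form:

    import numpy as np
    from scipy.linalg import expm

    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    theta = 0.7
    v = np.array([1.0, 2.0, -0.5])
    v = v / np.linalg.norm(v)                 # v must be a unit vector
    v_sigma = v[0] * X + v[1] * Y + v[2] * Z

    lhs = expm(1j * theta * v_sigma)
    rhs = np.cos(theta) * I + 1j * np.sin(theta) * v_sigma
    assert np.allclose(lhs, rhs)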
2.36 To show that all the Pauli matrices except I have trace zero.
    X = [ 0 1; 1 0 ]     gives  tr(X) = 0 + 0 = 0
    Y = [ 0 -i; i 0 ]    gives  tr(Y) = 0 + 0 = 0
    Z = [ 1 0; 0 -1 ]    gives  tr(Z) = 1 + (-1) = 0
    I = [ 1 0; 0 1 ]     gives  tr(I) = 1 + 1 = 2
Therefore we see that the Pauli matrices X, Y and Z have trace zero, while I has trace two.

2.37 To show the cyclic property of the trace, that is, tr(AB) = tr(BA).
The trace is defined as
    tr(A) = Σ_i ⟨i|A|i⟩                                          (A.29)
So tr(AB) = Σ_i ⟨i|AB|i⟩. Inserting a complete set of states,
    tr(AB) = Σ_i ⟨i|A ( Σ_j |j⟩⟨j| ) B|i⟩ = Σ_{i,j} (⟨i|A|j⟩)(⟨j|B|i⟩)
The two quantities inside the summation on the RHS are numbers, so we can interchange them:
    tr(AB) = Σ_{i,j} (⟨j|B|i⟩)(⟨i|A|j⟩)
Using the completeness relation to sum over i:
    tr(AB) = Σ_j ⟨j|BA|j⟩ = tr(BA)
Hence, proved. Note: while computing the trace, tr(A) = Σ_i ⟨i|A|i⟩, the index i is just a dummy index;
it may as well be replaced by some other index j, still preserving the value of the trace. Also,
the trace is independent of the basis used to express the matrix.
2.38 To show the linearity of the trace, i.e. tr(A + B) = tr(A) + tr(B), and the property tr(zA) = z tr(A).
We take the definition of the trace from equation (eq. A.29):
    tr(A + B) = Σ_i ⟨i|(A + B)|i⟩ = Σ_i ⟨i|A|i⟩ + Σ_i ⟨i|B|i⟩ = tr(A) + tr(B)
Hence, proved. The other statement can also be proved: again from equation (eq. A.29),
    tr(zA) = Σ_i ⟨i|zA|i⟩
Since z is a number, it can be taken out of the summation, giving tr(zA) = z tr(A).


2.39 To verify the Hilbert-Schmidt inner product on operators. The set of linear operators on a
Hilbert space also forms a vector space over the field of complex numbers, as it satisfies the criteria of
closure under operator addition and under scalar multiplication. The inner product of the elements
of such a vector space is called the Hilbert-Schmidt inner product.
To show that (·, ·) defined by (A, B) = tr(A†B) is an inner product.
2.40 To verify the commutation relations for the Pauli matrices.
We have the Pauli matrices defined as in equation (eq. A.24). So we have:
    [X, Y] = XY - YX = [ 0 1; 1 0 ][ 0 -i; i 0 ] - [ 0 -i; i 0 ][ 0 1; 1 0 ] = [ 2i 0; 0 -2i ] = 2iZ
    [Y, Z] = YZ - ZY = [ 0 -i; i 0 ][ 1 0; 0 -1 ] - [ 1 0; 0 -1 ][ 0 -i; i 0 ] = [ 0 2i; 2i 0 ] = 2iX
    [Z, X] = ZX - XZ = [ 1 0; 0 -1 ][ 0 1; 1 0 ] - [ 0 1; 1 0 ][ 1 0; 0 -1 ] = [ 0 2; -2 0 ] = 2iY
Hence, proved.
2.41 To verify the anti-commutation relations for the Pauli matrices.
We take the definition of the Pauli matrices from equation (eq. A.24). The anti-commutator is defined as
{A, B} = AB + BA:
    {X, Y} = XY + YX = [ i 0; 0 -i ] + [ -i 0; 0 i ] = 0
    {Y, Z} = YZ + ZY = [ 0 i; i 0 ] + [ 0 -i; -i 0 ] = 0
    {Z, X} = ZX + XZ = [ 0 1; -1 0 ] + [ 0 -1; 1 0 ] = 0
We can also verify that σ_i² = I for each of the Pauli matrices.
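
Both the commutation and the anti-commutation relations (and σ_i² = I) can be confirmed with a few lines of numpy; the following sketch is only an illustrative check, not part of the original solution:

    import numpy as np

    I = np.eye(2)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def comm(A, B):
        return A @ B - B @ A

    def acomm(A, B):
        return A @ B + B @ A

    assert np.allclose(comm(X, Y), 2j * Z)
    assert np.allclose(comm(Y, Z), 2j * X)
    assert np.allclose(comm(Z, X), 2j * Y)

    for P in (X, Y, Z):
        assert np.allclose(P @ P, I)                                # sigma_i^2 = I
        for Q in (X, Y, Z):
            if P is not Q:
                assert np.allclose(acomm(P, Q), np.zeros((2, 2)))   # distinct Paulis anticommute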

2.42 To verify that AB = ([A, B] + {A, B})/2.
Taking the RHS and using the definitions of the commutator and the anti-commutator, we have:
    ([A, B] + {A, B})/2 = (AB - BA + AB + BA)/2 = AB
so LHS = RHS. Hence, verified.
2.43 To show that σ_i σ_j = δ_ij I + i Σ_{k=1}^{3} ε_ijk σ_k, where the σ's are the Pauli matrices.
We can make use of the statement of the previous problem:
    σ_i σ_j = ([σ_i, σ_j] + {σ_i, σ_j})/2
Since [σ_i, σ_j] = 2i Σ_{k=1}^{3} ε_ijk σ_k and {σ_i, σ_j} = 2δ_ij I, plugging in these definitions gives
    σ_i σ_j = (2δ_ij I + 2i Σ_{k=1}^{3} ε_ijk σ_k)/2 = δ_ij I + i Σ_{k=1}^{3} ε_ijk σ_k
Hence, proved.


2.44 Given that A is an invertible operator, to show that B = 0 if [A, B] = 0 and {A, B} = 0.
From problem 2.42 we have AB = ([A, B] + {A, B})/2, so AB = 0.
Since A is invertible, multiplying on the left by A⁻¹ gives B = 0. Hence, proved.


2.45 To show that [A, B]† = [B†, A†].
    [A, B]† = (AB - BA)† = (AB)† - (BA)† = B†A† - A†B† = [B†, A†]
so RHS = LHS. Hence, proved.
2.46 To show that [A, B] = -[B, A].
    [A, B] = AB - BA = -(BA - AB) = -[B, A]
so RHS = LHS. Hence, proved.

2.47 Given that A and B are Hermitian, to show that i[A, B] is also Hermitian.
To show that i[A, B] is Hermitian, we must show that
    (i[A, B])† = i[A, B]                                         (A.30)
From problem 2.45 we have [A, B]† = [B†, A†]. Given that A and B are Hermitian, we can say
    [A, B]† = [B, A] = -[A, B]                                   (A.31)
From the LHS of equation (eq. A.30), we have (i[A, B])† = -i([A, B])†. Using equation (eq. A.31) on the RHS,
we get (i[A, B])† = -i(-[A, B]) = i[A, B], so i[A, B] is Hermitian.
Hence, proved.

2.48 To find the polar decomposition of (a) a positive matrix P, (b) a Hermitian matrix H, (c) a unitary matrix U.
To find the polar decomposition of a matrix A, we need to find the positive matrices J = √(A†A) and
K = √(AA†) and a unitary matrix T such that A = TJ = KT.
(a) J = √(P†P) and K = √(PP†). Since a positive matrix is also Hermitian, P† = P and √(P²) = P, so
J = K = P. We may take T = I, and the right and left polar decompositions coincide: P = IP = PI.
(b) Here J = K = √(H²) = |H|. Writing H = Σ_i λ_i|i⟩⟨i| in its eigenbasis, |H| = Σ_i |λ_i| |i⟩⟨i| and
T = Σ_i sign(λ_i)|i⟩⟨i| is unitary, giving the decompositions H = T|H| = |H|T.
(c) J = √(U†U) = I and K = √(UU†) = I, since U is unitary. The polar decomposition is therefore
U = UI = IU, with T = U.


2.49 To express the polar decomposition of a normal matrix in outer product form.
Let A be some normal operator having a spectral decomposition
    A = Σ_i λ_i |λ_i⟩⟨λ_i|                                       (A.32)
For finding the polar decompositions, we need J = √(A†A) and K = √(AA†). Since A is normal, from
equation (eq. A.32), A†A = AA† = Σ_i |λ_i|² |λ_i⟩⟨λ_i|, and therefore
    J = K = Σ_i |λ_i| |λ_i⟩⟨λ_i|                                 (A.33)
(The solution is still incomplete.)


2.50 To find the right and left polar decompositions of the matrix
    A = [ 1 0 ]
        [ 1 1 ]
Let J and K be the positive matrices J = √(A†A) and K = √(AA†).
We have
    A = [ 1 0; 1 1 ],   A† = [ 1 1; 0 1 ]
    AA† = [ 1 0; 1 1 ][ 1 1; 0 1 ] = [ 1 1; 1 2 ]
    A†A = [ 1 1; 0 1 ][ 1 0; 1 1 ] = [ 2 1; 1 1 ]
To take the square roots of the matrices A†A and AA†, we must operate on their eigenvalues; to find the
eigenvalues, we solve the secular equation.
Eigenvalues of A†A: det|A†A - λI| = 0 gives (2 - λ)(1 - λ) - 1 = 0, i.e. λ² - 3λ + 1 = 0, so
    λ = (3 + √5)/2  or  λ = (3 - √5)/2
The matrix AA† has the same characteristic equation, and hence the same eigenvalues. The corresponding
eigenvectors of A†A satisfy (2 - λ)x + y = 0, i.e. they are proportional to (1, λ - 2)ᵀ, and similarly for AA†.
Writing √(A†A) = Σ_i √(λ_i) |λ_i⟩⟨λ_i| (and likewise for √(AA†)), as in problem 2.34, then gives J and K,
from which the right and left polar decompositions follow as A = TJ = KT with T = AJ⁻¹.