2.1 Motivation
A variety of spontaneous gestures, such as finger, hand, body, or head movements, are used to convey information in interactions among people. Gestures can hence be considered a natural communication channel with numerous aspects to be utilised in human-computer interaction. To date, most of our interactions with computers are performed with traditional keyboards, mice, and remote controls designed mainly for stationary interaction. With the help of the great technological advancement, gesture-based interfaces can serve as an alternative modality for controlling computers, e.g. to navigate in office applications or to play console games such as those on the Nintendo Wii. Gesture-based interfaces can enrich and diversify interaction options and provide easy means to interact with the surrounding environment, especially for handicapped people who are unable to live their lives in a traditional way. On the other hand, mobile devices, such as PDAs, mobile phones, and other portable personal electronic devices, provide new possibilities for interacting with various applications if equipped with the necessary sensors, especially with the proliferation of low-cost MEMS (Micro-Electro-Mechanical Systems) technology. The majority of the new generation of smart phones, PDAs, and personal electronic devices are embedded with an accelerometer for various applications. Small wireless devices containing accelerometers could be integrated into clothing, wristwatches, or other personal electronic devices to provide a means for interacting with different environments. By defining some simple gestures, these devices could be used to control home appliances, for example: a simple up-and-down hand movement could operate a garage door, or adjust the light intensity in a room or an office.
2.2
Expressive and meaningful body motions involving physical movements of the hands, arms, or face can be extremely useful for conveying meaningful information or interacting with the environment. This involves:
a posture: a static configuration without movement of the body part, and
a gesture: a dynamic movement of the body part.
Generally, there exist many-to-one mappings from concepts to gestures and vice versa. Hence gestures are ambiguous and incompletely specified. For example, the concept "stop" can be indicated as a raised hand with the palm facing forward, or an exaggerated waving of both hands over the head. Similar to speech and handwriting, gestures vary between individuals, and even for the same individual between different instances. Sometimes a gesture is also affected by the context of preceding as well as following gestures. Moreover, gestures are often language- and culture-specific. They can broadly be of the following types:
1. hand and arm gestures: recognition of hand poses, sign languages, and entertainment applications (allowing children to play and interact in virtual environments).
2. head and face gestures: Some examples are
(a) nodding or head shaking
(b) direction of eye gaze
(c) raising the eyebrows
(d) opening and closing the mouth
(e) winking
(f) flaring the nostrils
(g) looks of surprise, happiness, disgust, fear, sadness, and many others.
3. body gestures: involvement of full body motion, as in
(a) tracking movements of two people having a conversation
(b) analyzing movements of a dancer against the music being played and the
rhythm
(c) recognizing human gaits for medical rehabilitation and athletic training.
Gesture recognition refers to the process of understanding and classifying meaningful movements of the hands, arms, face, or sometimes the head; hand gestures, however, are the most expressive, natural, and intuitive, and thus the most frequently used. Gesture recognition has become one of the hottest fields of research for its great significance in designing artificially intelligent human-computer interfaces for various applications, which range from sign language through medical rehabilitation to virtual reality. More specifically, gesture recognition can be extremely useful for:
1. Sign language recognition in order to develop aids for the hearing impaired. For
example, just as speech recognition can transcribe speech to text, some gestures
representing symbols through sign language can be transcribed into text.
2. Socially assistive robotics. By using proper sensors and devices, like accelerometers
and gyros, worn on the body of a patient and by reading the values from those
sensors, robots can assist in patient rehabilitation.
3. Developing alternative computer interfaces. Foregoing the traditional keyboard and
mouse setup to interact with a computer, gesture recognition can allow users to
accomplish frequent or common tasks using hand gestures to a camera.
4. Interactive game technology. Gestures can be used to control interactions within
video games providing players with an incredible sense of immersion in the totally
engrossing environment of the game.
5. Remote controlling. Through the use of gesture recognition, various hand gestures can be used to control different devices, such as secondary devices in a car or a TV set, operating a garage door, and many others.
Gesture recognition involves gesture spotting, which means determining the start and end points of a meaningful gesture trace in a continuous stream of input signals and, subsequently, segmenting the relevant gesture. This task is very difficult for two main reasons. The first is segmentation ambiguity: as the hand motion switches from one gesture to another, intermediate movements occur as well. These transition motions are also likely to be segmented and matched against template traces, and need to be eliminated by the model. The second reason is spatio-temporal variability: the same gesture may vary dynamically in duration, and very possibly in shape, even for the same gesturer.
2.3.1 Accelerometer
An accelerometer is a sensing element that measures acceleration; acceleration is the rate of change of velocity with respect to time. It is a vector quantity that has both magnitude and direction. Accelerometers measure in units of g, where one g is the acceleration due to gravity (approximately 9.81 m/s²).
Types of Accelerometers
There are a number of types of accelerometers. What differentiates the types is the sensing element and the principle of its operation.
Capacitive accelerometers sense a change in electrical capacitance between sensing elements when an acceleration is applied.
Piezoelectric accelerometers exploit the piezoelectric effect: as a stress is applied, such as under acceleration, an electrical charge is created.
Piezoresistive accelerometers (strain gauge accelerometers) work by measuring the change in electrical resistance of a strained sensing element.
Magnetoresistive accelerometers work by measuring changes in a magnetic field; their structure and function are similar to those of a Hall-effect accelerometer.
Selecting an Accelerometer
When selecting an accelerometer for an application, the first factors to consider are:
Dynamic Range: the +/- maximum amplitude that the accelerometer can measure before distorting or clipping the output signal. Dynamic range is typically specified in g's.
Sensitivity: the scale factor of a sensor or system, measured in terms of output per unit of input (e.g. mV/g).
Frequency Response: the frequency range over which the accelerometer will detect motion and report a true output. Frequency response is typically specified as a range measured in Hertz (Hz).
Sensitive Axis: Accelerometers are designed to detect inputs in reference to an axis; single-axis accelerometers can detect inputs only along one axis. Tri-axis accelerometers can detect inputs along any axis and are required for most applications.
Size and Mass: The size and mass of an accelerometer can change the characteristics of the object being tested. The mass of the accelerometer should be significantly smaller than the mass of the system to be monitored.
Chapter 3
Gesture Recognition and Methods of
Pattern Recognition
3.1 Gesture Recognition
Automated gesture recognition has been investigated in various academic research projects, which have yielded a number of practical applications and commercial products. Prior to the actual recognition of gestures, a solution must be found to automatically register the position or movements of the human body or body parts, such as the arms and hands. This is sometimes also referred to as motion tracking, and a variety of different techniques have been investigated for this task.
An accelerometer is a small sensor which measures the acceleration of itself or, respectively, of the device it is built into. Based on the acceleration profile, which originates from the movement of the device, classification into previously defined gestures is possible.
Template matching is therefore only used for small alphabets of gestures, as the creation of good gesture templates is described as a relatively tedious process.
Neural networks have also been applied to similar classification tasks, yielding good results. However, neural networks are generally not well suited to multi-dimensional sample data that varies over time. Their major flaw is that they typically operate with a fixed-length time window, which imposes restrictions on the gestures. Extensive pre- and post-processing steps, such as the inter- and extrapolation of data samples, need to be employed in combination with special types of neural networks, such as time-delay neural networks, in order to achieve good-quality classification results.
The third pattern recognition method, which was considered and eventually chosen, is the so-called hidden Markov model (HMM). Hidden Markov models have been extensively studied, primarily in the context of speech recognition, and work well on temporal data that is not fixed in length. A number of training algorithms exist, allowing the automated creation of custom, free-form gesture models. They can also be used to solve the segmentation problem which typically occurs with continuous temporal data.
For these particular properties, and the promising results achieved in comparable studies utilising them, discrete hidden Markov models were chosen to form the basis of the pattern recognition process.
3.4 Sensing
Acceleration of the device (relative to free fall) is constantly measured by an MMA73X1 accelerometer. The acceleration is measured along three axes, resulting in a three-dimensional data signal.
For each cluster there is one vector which is most representative of the cluster, the so-called prototype vector. The vector quantisation process `converts' the large number of three-dimensional acceleration vectors into a manageable amount of prototype vectors. Each prototype vector is then assigned a symbol, resulting in a discrete set of symbols. In this case, simply the integer indices from the list of prototype vectors serve as the observation symbols. This pre-processing step is needed in order to generate a manageable amount of observation symbols that can be processed efficiently by discrete hidden Markov models.
In HMM terms, the HMM recogniser calculates the probability that the observation sequence was generated by a given hidden Markov model. For each of the previously learned gesture models, this probability is computed.
3.10 Definition
A vector quantiser Q is defined as a mapping of a k-dimensional vector space $\mathbb{R}^k$ onto a finite subset $Y \subset \mathbb{R}^k$:
$$Q : \mathbb{R}^k \to Y$$
Let the codebook be defined as the set Y = {y_1, y_2, ..., y_N}. The prototype vectors y_i are sometimes also referred to as reproduction vectors or as code words. N denotes the size of the codebook.
As a vector quantiser Q of size N maps every vector x from the input space to exactly one prototype vector y_i, it defines a complete and disjoint partition of $\mathbb{R}^k$ into regions or cells R_1, R_2, ..., R_N. The cell R_i contains all those vectors $x \in \mathbb{R}^k$ which were mapped to the prototype vector or code word y_i:
$$\bigcup_{i=1}^{N} R_i = \mathbb{R}^k$$
and
$$R_i \cap R_j = \emptyset \quad \forall i, j \text{ with } i \neq j$$
The so-called nearest-neighbour condition describes the optimal selection of the partition {R_i} for a given codebook Y. Every partition cell R_i needs to be determined so that it contains all vectors $x \in \mathbb{R}^k$ which have minimal distance from the associated prototype vector y_i. This means that the corresponding vector quantiser Q maps a sample vector x onto its nearest prototype vector of the codebook:
$$Q(x) = y_i \quad \text{if} \quad d(x, y_i) \le d(x, y_j) \ \forall j \neq i$$
In the above equation d(·,·) denotes a general distance function, which can be substituted with a specific distance function. In this work, the Euclidean distance in three-dimensional vector space was used:
$$d(x, y) = \|y - x\| = \sqrt{\sum_{k=1}^{3} (y_k - x_k)^2}$$
If a vector x has minimal distance to more than one prototype vector, it can be mapped to any one of those prototype vectors. The resulting quantisation error is not affected by the actual choice.
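To make the nearest-neighbour rule concrete, the mapping Q(x) can be sketched in C++ as follows. This is an illustrative sketch, not taken from the project's code; the names `Vec3`, `distance`, and `quantise` are invented for this example.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

// A three-dimensional acceleration sample.
using Vec3 = std::array<double, 3>;

// Euclidean distance d(x, y) = ||y - x|| in three-dimensional space.
double distance(const Vec3& x, const Vec3& y) {
    double s = 0.0;
    for (std::size_t k = 0; k < 3; ++k) {
        double d = y[k] - x[k];
        s += d * d;
    }
    return std::sqrt(s);
}

// Q(x): map a sample vector onto the index i of its nearest prototype
// vector (code word) in the codebook Y.
std::size_t quantise(const Vec3& x, const std::vector<Vec3>& codebook) {
    std::size_t best = 0;
    double bestDist = distance(x, codebook[0]);
    for (std::size_t i = 1; i < codebook.size(); ++i) {
        double d = distance(x, codebook[i]);
        if (d < bestDist) {  // ties keep the first match; any choice is valid
            bestDist = d;
            best = i;
        }
    }
    return best;  // the index doubles as the discrete observation symbol
}
```

Returning the index rather than the vector itself reflects the fact that, later on, the codebook index directly serves as the observation symbol for the discrete HMMs.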
The optimal choice of a codebook Y for a given partition {R_i} is defined by the centroid condition. For a cell R_i of the partition, the optimal reproduction vector is the prototype vector y_i that corresponds to the centroid of the cell:
$$y_i = \mathrm{cent}(R_i)$$
The centroid of a cell R is defined here as the vector $y \in R$ from which all other vectors $x \in R$ have, in the statistical average, minimal distance:
$$y = \mathrm{cent}(R) = \arg\min_{y \in R} E\{d(X, y) \mid X \in R\}$$
The random variable X is used here to represent the distribution of the data vectors
x in the input space.
A sample vector $x \in \mathbb{R}^k$ that was mapped onto a prototype vector y will in general be different from the source vector. Therefore every single quantisation operation results in an individual quantisation error $\epsilon(x|Q)$. This error can be described by the distance between the original and the quantised vector:
$$\epsilon(x|Q) = d(x, Q(x))$$
Finally, we also introduce the index set I = {1, 2, ..., N}, which contains the indices of the codebook Y. Later it will be shown how this index set serves as the alphabet of possible observation symbols for discrete hidden Markov models.
1. Initialisation
Choose a suitable initial codebook Y^0 of size N and set the iteration count m ← 0.
2. Partitioning
For the current codebook Y^m, determine the optimal partition by classifying all vectors x_t with t = 1...T into cells
$$R_i^m = \{x \mid y_i^m = \arg\min_{y \in Y^m} d(x, y)\}$$
and compute the resulting average distortion
$$D(Y^m) = \frac{1}{T} \sum_{t=1}^{T} \min_{y \in Y^m} d(x_t, y)$$
3. Codebook Update
For all cells R_i^m with i = 1...N calculate new reproduction vectors
$$y_i^{m+1} = \mathrm{cent}(R_i^m)$$
If the relative improvement of the distortion
$$\frac{D(Y^{m-1}) - D(Y^m)}{D(Y^m)}$$
still exceeds a given threshold, set m ← m + 1 and continue with step 2; otherwise stop.
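A single Partitioning/Codebook-Update iteration of the algorithm above could be sketched as follows. This is an illustrative C++ sketch with invented names (`lloydStep`, `dist`); empty cells simply keep their previous prototype, which is one common way of handling them.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

static double dist(const Vec3& x, const Vec3& y) {
    double s = 0.0;
    for (std::size_t k = 0; k < 3; ++k) s += (y[k] - x[k]) * (y[k] - x[k]);
    return std::sqrt(s);
}

// One iteration of the Lloyd algorithm:
//  - Partitioning: assign every sample x_t to the cell R_i of its nearest
//    prototype y_i (nearest-neighbour condition).
//  - Codebook update: replace y_i by the centroid (mean) of R_i
//    (centroid condition). Empty cells keep their previous prototype.
std::vector<Vec3> lloydStep(const std::vector<Vec3>& samples,
                            const std::vector<Vec3>& codebook) {
    std::vector<Vec3> sums(codebook.size(), Vec3{0.0, 0.0, 0.0});
    std::vector<std::size_t> counts(codebook.size(), 0);
    for (const Vec3& x : samples) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < codebook.size(); ++i)
            if (dist(x, codebook[i]) < dist(x, codebook[best])) best = i;
        for (std::size_t k = 0; k < 3; ++k) sums[best][k] += x[k];
        ++counts[best];
    }
    std::vector<Vec3> updated = codebook;
    for (std::size_t i = 0; i < codebook.size(); ++i)
        if (counts[i] > 0)
            for (std::size_t k = 0; k < 3; ++k)
                updated[i][k] = sums[i][k] / counts[i];
    return updated;
}
```

Repeating `lloydStep` until the average distortion no longer improves noticeably yields the full training procedure outlined above.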
At discrete time steps, the system undergoes a change of state. The change of state happens according to the set of state transition probabilities, which can be thought of as the connections between the state nodes. Because there may be connections pointing back to the same state (a11, a22, a33 in the figure), it is possible that the system ends up in the same state. The time instants associated with the state changes are denoted as t = 1, 2, ..., T (for a finite time interval of length T). The actual state of the system at time t is denoted as S_t.
The behaviour of the system at a given time t depends only on the state at the previous time step. This is the so-called Markov property, and therefore the probabilistic description of the system's states can be characterised as follows:
$$P(S_t \mid S_1, S_2, ..., S_{t-1}) = P(S_t \mid S_{t-1})$$
The state transition probabilities are denoted as $a_{ij} = P(S_t = j \mid S_{t-1} = i)$ with $1 \le i, j \le N$.
It has not yet been defined how the initial state of the system is determined. Therefore the vector π is introduced. It holds the set of start probabilities for each state:
$$\pi = \{\pi_i \mid \pi_i = P(S_1 = i)\}$$
$$\sum_{j=1}^{N} a_{ij} = 1 \quad (1 \le i \le N) \qquad \text{and} \qquad \sum_{i=1}^{N} \pi_i = 1$$
Expressed verbosely, this means that the transition probabilities out of each state in the model (each row of the matrix) must sum up to 1, and the same applies to the start probabilities.
Now that all fundamental elements of a Markov chain have been introduced, the
concrete example of a 3-state system is presented.
The example of a 3-state Markov model of the weather shall be considered. Therefore,
it is assumed that once a day (e.g., at noon), the weather is observed as being one of the
following:
State 1: rain
State 2: cloudy
State 3: sunny
The weather on day t can only be a single one of the three states above. The matrix A of the state transition probabilities is given as:
$$A = \{a_{ij}\} = \begin{pmatrix} 0.4 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$
The current weather shall be sunny (state 3); therefore the start vector of the model looks as follows:
$$\pi = \{\pi_i\} = (\pi_1, \pi_2, \pi_3) = (0.0, \ 0.0, \ 1.0)$$
Now the question is asked: given that the weather today (day 1) is sunny, what is the probability (according to the given model) that the weather for the next seven days will be "sun, sun, rain, rain, sun, clouds, sun"? To pose this question more formally, the observation sequence O is defined:
O = {sun, sun, sun, rain, rain, sun, clouds, sun}
  = {S_1 = 3, S_2 = 3, S_3 = 3, S_4 = 1, S_5 = 1, S_6 = 3, S_7 = 2, S_8 = 3}
  = {3, 3, 3, 1, 1, 3, 2, 3}
Note that the symbols {1, 2, 3} refer to the states of the 3-state Markov chain.
The question about the probability can be expressed and evaluated for the given model
and observation sequence as:
P(O|Model) = P(3, 3, 3, 1, 1, 3, 2, 3 | Model)
= P(3) · P(3|3) · P(3|3) · P(1|3) · P(1|1) · P(3|1) · P(2|3) · P(3|2)
= π_3 · a_33 · a_33 · a_31 · a_11 · a_13 · a_32 · a_23
= 1.0 · 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2
= 1.536 × 10⁻⁴
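This computation can be reproduced in a few lines of illustrative C++. Note that only the transition entries used in the product above are taken from this chapter's example; the remaining entries of A follow the classic textbook version of this weather model.

```cpp
#include <cstddef>
#include <vector>

// Transition matrix A = {a_ij} of the 3-state weather model
// (state 1 = rain, 2 = cloudy, 3 = sunny), stored 0-indexed here.
const double A[3][3] = {
    {0.4, 0.3, 0.3},
    {0.2, 0.6, 0.2},
    {0.1, 0.1, 0.8}};
const double startProb[3] = {0.0, 0.0, 1.0};  // start vector: today is sunny

// P(O|Model) = pi_{S_1} * a_{S_1,S_2} * ... * a_{S_{T-1},S_T}
// The states vector uses the 1-based labels from the text.
double chainProbability(const std::vector<int>& states) {
    double p = startProb[states[0] - 1];
    for (std::size_t t = 1; t < states.size(); ++t)
        p *= A[states[t - 1] - 1][states[t] - 1];
    return p;
}
```

Evaluating `chainProbability({3, 3, 3, 1, 1, 3, 2, 3})` reproduces the product carried out above.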
Figure 3.2: HMM with hidden states (rain, clouds, sun) and observation alphabet (rain, clouds, sun, fog, snow)
The example model of the weather Markov chain, which was defined in the previous section, shall be considered again. This example model shall now be extended into a full-blown hidden Markov model in order to further illustrate the additional elements that characterise a hidden Markov model. Therefore, to each of the three states (these are now the hidden states), the complete alphabet of possible observation symbols is assigned. The probability relationship between a state and an individual emission symbol is described by the probability distribution b_jk. The observation alphabet is typically larger than the number of hidden states in the model. Therefore the existing alphabet {rain, clouds, sun} is extended to {rain, clouds, sun, fog, snow}. A visualisation of this example weather HMM can be seen in figure 3.2.
Finally, the formal definition of a hidden Markov model is summarised, including the parts that overlap with Markov chain models:
A hidden Markov model can be formally denoted as λ = (S, A, B, π, V), but usually in the literature the short form λ = (A, B, π) is used. The individual components are:
S = {1, 2, ..., N}, the set of N hidden states
A = {a_ij}, the state transition probability matrix, each a_ij representing the probability that the system changes from state i to state j
B = {b_jk}, the emission probability matrix, each b_jk representing the probability that state j generates observation o_k
π = {π_i}, the vector of initial state probabilities, each π_i representing the probability that the system starts in state i
V = {o_1, o_2, ..., o_K}, the alphabet of observation symbols
A given HMM can be used as a generator for an observation sequence O of length
T:
O = O1 , O2 , ...OT
3.14
In order for HMMs to be of any practical use, three fundamental problems need to be solved.
Problem 1: Evaluation
How well does a given observation sequence match a given model? The solution to this problem lies at the heart of the gesture recognition process.
This problem corresponds to the earlier question about the likelihood of a weather sequence being generated by the simple 3-state Markov chain. As a consequence, an obvious solution is to extend the existing method to hidden Markov models. The major difference here is that the state sequence an HMM runs through, in order to produce a given observation sequence, cannot be known in advance.
Because the state sequence cannot be known, every hidden state (and its emission probabilities) needs to be considered at every time t. For an HMM with N hidden states and an observation sequence of length T, there are N^T possible hidden state sequences. The resulting algorithm is exponential in the length T of the observation sequence. Therefore, this method is out of the question when it comes to practical applications. Luckily, there is a more efficient solution to this problem, namely the so-called forward algorithm.
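For reference, the forward algorithm in its standard formulation can be sketched as follows. This is illustrative C++, not the project's implementation; it computes P(O|λ) in O(N²T) time instead of enumerating all N^T hidden state sequences.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward algorithm for a discrete HMM lambda = (A, B, pi):
//   A[i][j]  state transition probability a_ij
//   B[j][k]  emission probability b_j(o_k)
//   pi[i]    start probability of state i
//   obs[t]   index of the observation symbol at time t
double forwardProbability(const Matrix& A, const Matrix& B,
                          const std::vector<double>& pi,
                          const std::vector<int>& obs) {
    std::size_t N = pi.size();
    // Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    std::vector<double> alpha(N);
    for (std::size_t i = 0; i < N; ++i) alpha[i] = pi[i] * B[i][obs[0]];
    // Recursion: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    for (std::size_t t = 1; t < obs.size(); ++t) {
        std::vector<double> next(N, 0.0);
        for (std::size_t j = 0; j < N; ++j) {
            double s = 0.0;
            for (std::size_t i = 0; i < N; ++i) s += alpha[i] * A[i][j];
            next[j] = s * B[j][obs[t]];
        }
        alpha = next;
    }
    // Termination: P(O|lambda) = sum_i alpha_T(i)
    double p = 0.0;
    for (double a : alpha) p += a;
    return p;
}
```

As a sanity check, a degenerate HMM in which each state emits exactly its own symbol (B is the identity matrix) reduces to the plain Markov chain, so the weather example's probability of 1.536 × 10⁻⁴ is reproduced.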
Problem 2: Decoding
1. Initialisation
$$\alpha_1(i) = \pi_i \, b_i(O_1), \quad 1 \le i \le N$$
2. Recursion
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(O_{t+1}), \quad 1 \le j \le N, \ 1 \le t \le T-1$$
3. Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

The backward variables β_t(i) are computed analogously:
1. Initialisation
$$\beta_T(i) = 1, \quad 1 \le i \le N$$
2. Recursion
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j), \quad 1 \le i \le N, \ t = T-1, ..., 1$$
3. Termination
$$P(O|\lambda) = \sum_{i=1}^{N} \pi_i \, b_i(O_1) \, \beta_1(i)$$

The first variable, γ_t(i), defines the probability of being in state i at time t, given the observation sequence and the model:
$$\gamma_t(i) = \frac{\alpha_t(i) \, \beta_t(i)}{P(O|\lambda)}$$
The second variable, ξ_t(i, j), defines the probability of a transition from state i to state j at time t, given the observation sequence and the model. This probability can be calculated, based on the forward and backward variables, as follows:
$$\xi_t(i, j) = P(S_t = i, S_{t+1} = j \mid O, \lambda) = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{P(O|\lambda)}$$
By using the above formulas and the concept of counting event occurrences, the re-estimates of the parameters of an HMM can be written as:
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$\bar{b}_j(o_k) = \frac{\sum_{t:\, O_t = o_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$$
The start probabilities π_i can be seen as a special case of the transition probabilities a_ij. The following equation determines the improved start probabilities:
$$\bar{\pi}_i = \gamma_1(i)$$
Now that it has been described how an improved set of model parameters can be calculated, an overview of the overall algorithm is given:
1. Initialisation
Choose a suitable initial model λ.
2. Re-estimation
Calculate the improved model λ̄ according to the re-estimation formulas above.
3. Termination
If the quality measure P(O|λ̄) was considerably improved by the updated model λ̄ with respect to λ, let λ ← λ̄ and continue with step 2; otherwise stop.
The Baum-Welch algorithm was at first implemented according to the formulas presented above. During the implementation phase of this project, some extensions and amendments were made. The algorithm was extended to work with multiple observation sequences. Thereafter, all probability calculations were moved to the negative-logarithmic scale because of problems with numerical underflow.
$$P(B|A) = \frac{P(A|B) \, P(B)}{P(A)}$$
Given is the gesture pool of N hidden Markov models λ_1, λ_2, ..., λ_N, each model λ_j representing a gesture available for the recognition process. Now the question is which model λ_j is most likely to have generated the sample observation sequence O. This question can be posed more formally as:
$$\arg\max_{j} P(\lambda_j \mid O)$$
One problem with hidden Markov models is that P(O|λ_j), the emission probability of the observations O given the model λ_j, yields probability values that vary in scale depending on the actual gesture model. Therefore, these probabilities cannot be directly compared with each other to find out which of the N HMMs most likely generated the observation O.
By taking into account a model's average probability over its training sequences and denoting this as the model's probability P(λ_j), the posterior probability P(λ_j|O) can be computed by applying Bayes' rule:
$$P(\lambda_j \mid O) = \frac{P(O|\lambda_j) \, P(\lambda_j)}{\sum_{i=1}^{N} P(O|\lambda_i) \, P(\lambda_i)}$$
The model λ_j with the largest value for P(λ_j|O) represents the classified gesture. A convenient side effect of Bayes' rule is that the computed values for P(λ_j|O) add up to 1.
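The classification step can be sketched as follows. This is illustrative C++; `classify` is an invented name, and the likelihoods are assumed to be given on the linear scale.

```cpp
#include <cstddef>
#include <vector>

// Posterior P(lambda_j | O) from the likelihoods P(O|lambda_j) and the
// model priors P(lambda_j), via Bayes' rule. Returns the index of the
// model with the largest posterior, i.e. the classified gesture.
std::size_t classify(const std::vector<double>& likelihood,  // P(O|lambda_j)
                     const std::vector<double>& prior,       // P(lambda_j)
                     std::vector<double>* posteriorOut = nullptr) {
    double evidence = 0.0;  // sum_i P(O|lambda_i) * P(lambda_i)
    for (std::size_t i = 0; i < likelihood.size(); ++i)
        evidence += likelihood[i] * prior[i];
    std::size_t best = 0;
    std::vector<double> posterior(likelihood.size());
    for (std::size_t j = 0; j < likelihood.size(); ++j) {
        posterior[j] = likelihood[j] * prior[j] / evidence;
        if (posterior[j] > posterior[best]) best = j;
    }
    if (posteriorOut) *posteriorOut = posterior;
    return best;
}
```

Because every term is divided by the same evidence, the posteriors sum to 1 by construction, which is exactly the side effect noted above.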
Chapter 4
Implementation
4.1 Test gestures
The theory chapter provides a general description of the relevant methods and software algorithms which are applied in this project. Some of these methods require the definition of application-specific constants or initial values. For example, the codebook size and initial prototype vectors for the vector quantiser need to be chosen. For some of these parameters, an optimal value can only be determined empirically and heuristically. In order to run experiments which yield comparable results, four simple hand gestures are defined. Based on the results for these test gestures, optimal values for the corresponding constants and model parameters are sought. The four test gestures are: Circle, Bowling, Triangle, and Z.
Once a gesture is learned and available in the gesture pool, Lloyd's training algorithm is no longer involved, because the codebook is not re-estimated for the recognition of a gesture. When in recognition mode, the vector quantiser serves in its basic mapping function. It maps every incoming data vector onto one prototype vector of the gesture's codebook, according to the nearest-neighbour rule. Because each gesture has its own codebook, each input data sample goes through the vector quantisation process as many times as there are trained gestures in the recognition pool.
The training algorithm has two prerequisites: firstly, the number of prototype vectors, the so-called codebook size N, needs to be defined; secondly, an initial set of N prototype vectors, the so-called initial codebook Y^0, needs to be chosen. In the following sections, various experiments are conducted in order to find suitable values for both.
In the first series of experiments, several methods for choosing the initial prototype vectors are tried. Thereafter, various codebook sizes are investigated. A codebook size of 20 serves as the starting point for the first series of experiments. Recognition results are based on left-to-right hidden Markov models with 8 states and step size 2. Each test gesture is trained on the sample data from 20 training sequences. Training as well as recognition trials are conducted by one and the same person.
Figure 4.2: Left-to-right HMM with 5 states and step size 2
The set of Q training sequences is defined as O = {O^(1), O^(2), ..., O^(Q)}, where the q-th observation sequence is
$$O^{(q)} = O_1^{(q)} O_2^{(q)} \cdots O_{T_q}^{(q)}$$
Writing $P_q = P(O^{(q)}|\lambda)$ for the production probability of the q-th sequence, the re-estimation formulas for multiple observation sequences become:
$$\bar{a}_{ij} = \frac{\sum_{q=1}^{Q} \frac{1}{P_q} \sum_{t=1}^{T_q - 1} \alpha_t^{q}(i) \, a_{ij} \, b_j(O_{t+1}^{(q)}) \, \beta_{t+1}^{q}(j)}{\sum_{q=1}^{Q} \frac{1}{P_q} \sum_{t=1}^{T_q - 1} \alpha_t^{q}(i) \, \beta_t^{q}(i)}$$
$$\bar{b}_j(o_k) = \frac{\sum_{q=1}^{Q} \frac{1}{P_q} \sum_{t:\, O_t^{(q)} = o_k} \alpha_t^{q}(j) \, \beta_t^{q}(j)}{\sum_{q=1}^{Q} \frac{1}{P_q} \sum_{t=1}^{T_q} \alpha_t^{q}(j) \, \beta_t^{q}(j)}$$
First, the simpler case of a Markov chain model shall be considered. The probability that a sample sequence O is generated by a given model can be computed as:
$$P(O|\lambda) = \prod_{t=2}^{T} a_{S_{t-1}, S_t}, \quad \text{with } S_t = O_t$$
The above formula represents the multiplication of all state transition probabilities involved in the generation of the observation O. When the length T of the sequence becomes increasingly large, very small numerical values occur which can hardly be represented in today's digital computers. The smallest normalised number (closest to zero) that can be stored in a double-precision floating-point variable is approximately 2.2 × 10⁻³⁰⁸; if the value becomes even smaller, a so-called underflow occurs, and the variable's content is set to zero.
However, when working with full-blown hidden Markov models and real-world data, such small values occur surprisingly often, and they are of significant relevance for the calculation of the formulas. Taking them as exactly zero leads to fatal errors. For example, the calculation of the forward variables α_t(i) is basically a summation over a large number of very small values. If these values are taken as equal to zero, the result of the summation would still be zero. In contrast, when the actual very small values are considered, they can sum up to a considerable value, which would usually again be within the machine's precision range.
To overcome this problem, the first attempt was simply to change the floating-point format from double precision to quadruple precision, a 16-byte (128-bit) floating-point number. Indeed, this change solved the problem for the considered sample data that had previously suffered from underflow. Another method for handling extremely small probability values consists of scaling them appropriately. The idea behind this solution is to multiply the probability values by a sufficiently large scaling factor, then to perform the critical calculations, and finally to restore the original scale. Although this method should theoretically work fine, the actual implementation presented difficulties. The scaling factor needs to be estimated dynamically for each observation time t individually and then cancelled out again for computations at the same time t. Also, the method can only be applied locally, which makes it cumbersome with longer computations.
For the reasons mentioned above, another solution was investigated and finally implemented: a method called logarithmic probability representation. This method does not use real probability values in any computations. Instead, all probability values are represented on a negative logarithmic scale. To convert a probability value p from the normal linear scale to the negative logarithmic scale, the following transformation is applied:
$$p' = -\ln p$$
By this transformation, the original range of values [0, 1] for probabilities is mapped to the entire range of non-negative floating-point numbers that can be represented with contemporary floating-point formats. Although the resolution of this negative-logarithmic representation is confined by the same floating-point range, the accuracy is sufficient for the practical application of the method. The negative-logarithmic probability value p' can be converted back into the linear domain by:
$$p = e^{-p'}$$
Being in the logarithmic domain, the operations in the HMM formulas need to be converted as well. All multiplications in the formulas can now be written as simple additions, and divisions become subtractions. However, addition becomes more complex in the negative-logarithmic representation: the values need to be converted back to the linear domain first, then added, and finally transformed to the logarithmic domain again. The following formula illustrates the logarithmic addition of the values p'_1 and p'_2:
$$p'_1 \oplus p'_2 = -\ln\left(e^{-p'_1} + e^{-p'_2}\right)$$
This is quite a costly operation, especially for mobile phone processors. By using the Kingsbury-Rayner formula,
$$p'_1 \oplus p'_2 = p'_1 - \ln\left(1 + e^{-(p'_2 - p'_1)}\right), \quad p'_1 \le p'_2$$
only one exponentiation and one logarithm still need to be computed.
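A sketch of this operation in C++ follows; `logAdd` is an invented name, and `log1p` is used for the ln(1 + x) term, as it is numerically safer for small arguments.

```cpp
#include <cmath>

// Addition of two probabilities stored on the negative natural
// logarithmic scale (p' = -ln p), via the Kingsbury-Rayner formula.
// Computing -ln(e^{-p1} + e^{-p2}) directly would underflow for large
// p1, p2; factoring out the smaller value avoids this.
double logAdd(double p1, double p2) {
    if (p2 < p1) { double tmp = p1; p1 = p2; p2 = tmp; }  // ensure p1 <= p2
    return p1 - std::log1p(std::exp(-(p2 - p1)));
}
```

For example, adding two probabilities of 0.5 (each stored as -ln 0.5 ≈ 0.6931) yields 0, i.e. -ln 1, while values like 800 (which correspond to probabilities far below the double-precision range) can still be added without any underflow.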
Now the problem of representing and manipulating numerically vanishing quantities on digital computers can be solved by using the negative logarithmic scale. However, there is one small problem left: what if `real' zeros appear, e.g. to denote the non-existing connections of a left-to-right HMM? The negative natural logarithm of zero is positive infinity, which again is not defined in the numerical range of a digital computer.
The keyword for the solution to this problem is flooring. Flooring means defining a minimal value p_min and making sure that the probability values, such as state transition or emission probabilities, never fall below this minimal value.
Otherwise, such zeros could affect the re-estimation in an unwanted way and prolong the training phase. Following the recommendations, a freshly initialised state transition probability matrix of an 8-state HMM with step size 2 looks like this:
$$A = \begin{pmatrix}
0.3333 & 0.3333 & 0.3333 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
0.0000 & 0.3333 & 0.3333 & 0.3333 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\
0.0000 & 0.0000 & 0.3333 & 0.3333 & 0.3333 & 0.0000 & 0.0000 & 0.0000 \\
0.0000 & 0.0000 & 0.0000 & 0.3333 & 0.3333 & 0.3333 & 0.0000 & 0.0000 \\
0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.3333 & 0.3333 & 0.3333 & 0.0000 \\
0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.3333 & 0.3333 & 0.3333 \\
0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.5000 & 0.5000 \\
0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 & 1.0000
\end{pmatrix}$$
The emission probability matrix of an HMM with N = 8 states and an observation alphabet size of K = 20 symbols is initialised as:
$$B = \{b_{jk}\}, \quad b_{jk} = \frac{1}{K} = \frac{1}{20} = 0.05, \quad 1 \le j \le N, \ 1 \le k \le K$$
The start probabilities for a left-to-right HMM are given as π_1 = 1 (and π_i = 0 for i > 1) and remain fixed during the training phase.
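The initialisation described above can be sketched as follows. This is illustrative C++ with invented function names; for step size s, state i is connected to states i ... i+s, and the corresponding row of A is distributed uniformly over these transitions.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Initialise the transition matrix of a left-to-right HMM with N states
// and the given step size: from state i only the transitions to states
// i, i+1, ..., i+step are allowed, each with equal probability. Rows near
// the last state have fewer targets, so their probabilities are larger.
Matrix initTransitions(std::size_t N, std::size_t step) {
    Matrix A(N, std::vector<double>(N, 0.0));
    for (std::size_t i = 0; i < N; ++i) {
        std::size_t last = (i + step < N) ? i + step : N - 1;
        double p = 1.0 / (last - i + 1);
        for (std::size_t j = i; j <= last; ++j) A[i][j] = p;
    }
    return A;
}

// Uniform emission matrix: b_jk = 1/K for all N states and K symbols.
Matrix initEmissions(std::size_t N, std::size_t K) {
    return Matrix(N, std::vector<double>(K, 1.0 / K));
}

// Start probabilities of a left-to-right model: pi_1 = 1, fixed.
std::vector<double> initStartProbs(std::size_t N) {
    std::vector<double> pi(N, 0.0);
    pi[0] = 1.0;
    return pi;
}
```

Calling `initTransitions(8, 2)` reproduces the 8-state matrix shown above, including the 0.5 entries in the second-to-last row and the 1.0 self-transition of the final state.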
When a new HMM for a gesture has been instantiated, it is trained with the Baum-Welch algorithm, which optimises the model for the given training sequences. The Baum-Welch training algorithm, and also its extended version for multiple training sequences, generally converges quite well towards a maximum of the likelihood P(O|λ). According to the theory, the training algorithm produces an improved HMM λ̄ after each training iteration:
$$P(O|\bar{\lambda}) \ge P(O|\lambda)$$
It has been observed that the relative increase of the likelihood can be very small during the first couple of training iterations. Therefore, a lower bound on the likelihood increase cannot easily be used as a termination condition for the algorithm. Otherwise, there would be a risk that the training stops before the major increase could have taken effect. Based on the obtained data samples and their training convergence, it was decided to simply stop after a fixed number of training iterations.
The likelihood P(O|λ̄) of the new model should always be greater than, or at least equal to, that of the previous model λ. However, practical tests with the current training data show that the likelihood sometimes slightly decreases.
Chapter 5
Design
5.1 Block Diagram
5.1.1 Accelerometer
We want to convert the hand movement into voltage levels so that it can be processed by a microcontroller. For that we use an accelerometer, which acts as a transducer converting the mechanical motion into corresponding voltage levels. The accelerometer used here is the MMA7361.
5.1.2 Microcontroller
The analog signal from the accelerometer must be quantised and converted into a digital signal. The data so obtained is sampled at regular instants and passed to the computer for processing. For this an Arduino Uno board is used. It contains an ATmega328 IC, whose built-in ADC is used for the analog-to-digital conversion. The sampled data is passed on to the computer via serial communication.
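As a concrete illustration of the quantisation step, the ATmega328's 10-bit ADC maps an input voltage onto an integer code between 0 and 1023. A sketch (the 5 V default reference is an assumption about the setup, not stated in the report):

```cpp
#include <cassert>

// 10-bit ADC quantisation as on the ATmega328: an input voltage in
// [0, Vref] is mapped to an integer code in [0, 1023].
int adcCode(double voltage, double vref = 5.0) {
    if (voltage <= 0.0)  return 0;
    if (voltage >= vref) return 1023;
    return static_cast<int>(voltage * 1024.0 / vref); // floor, per the datasheet
}
```

For example, half the reference voltage quantises to code 512, which is why a resting axis reads near mid-range.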
5.1.5 Display
The correct gesture identified by the gesture recognition algorithm is displayed on the computer screen.
5.2.1 Features
- 3 mm × 5 mm × 1.0 mm LGA-14 package
- Low current consumption: 400 µA
- Sleep mode: 3 µA
- Low voltage operation: 2.2 V - 3.6 V
- High sensitivity (800 mV/g @ 1.5 g)
- Selectable sensitivity (±1.5 g, ±6 g)
- Fast turn-on time (0.5 ms enable response time)
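Using the 800 mV/g sensitivity from the feature list, a raw ADC count can be converted back into acceleration in g. This is a sketch under assumptions of our own (3.3 V supply used as ADC reference, zero-g output at Vdd/2), not the report's calibration:

```cpp
#include <cassert>
#include <cmath>

// Convert a raw 10-bit ADC reading from the MMA7361 into acceleration in g.
// Assumptions (hypothetical setup): 3.3 V supply used as the ADC reference,
// zero-g output at Vdd/2, sensitivity 800 mV/g in the 1.5 g range.
double countsToG(int raw) {
    const double vref        = 3.3;        // ADC reference voltage (assumed)
    const double zeroG       = vref / 2.0; // zero-g output voltage, ~1.65 V
    const double sensitivity = 0.8;        // 800 mV/g
    double volts = raw * vref / 1024.0;
    return (volts - zeroG) / sensitivity;
}
```

A mid-range count of 512 then corresponds to 0 g, and counts above or below it to positive or negative acceleration along that axis.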
[Pin assignment table: PD0, PD1, PD2, PD3]
5.4.2.1
Program
/* ADXL3xx
Reads an analog accelerometer and communicates the acceleration
to the computer.
The circuit:
analog 0: vcc
analog 1: ground
analog 2: z-axis
analog 3: y-axis
analog 4: x-axis
*/
// these constants describe the pins. They won't change:
const int groundpin = A1; // analog input pin 1 -- ground
const int powerpin = A0;  // analog input pin 0 -- voltage
const int xpin = A4;      // x-axis of the accelerometer
const int ypin = A3;      // y-axis
const int zpin = A2;      // z-axis (only on 3-axis models)
void setup()
{
// initialize the serial communications:
Serial.begin(9600);
// Provide ground and power by using the analog inputs as normal
// digital pins. This makes it possible to directly connect the
// breakout board to the Arduino. If you use the normal 5V and
// GND pins on the Arduino, you can remove these lines.
pinMode(groundpin, OUTPUT);
pinMode(powerpin, OUTPUT);
digitalWrite(groundpin, LOW);
digitalWrite(powerpin, HIGH);
analogReference(EXTERNAL);
}
void loop()
{
  // print the sensor values:
  Serial.print(analogRead(xpin));
  // print a tab between values:
  Serial.print("\t");
  Serial.print(analogRead(ypin));
  // print a tab between values:
  Serial.print("\t");
  Serial.print(analogRead(zpin));
  Serial.println();
  // delay before next reading:
  delay(100);
}
The serially received data lies in the range from 0 to 1023. The corresponding values from the x-, y-, and z-axes are obtained continuously in real time. The data was stored using a software tool called CoolTerm.
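A line logged this way ("x<TAB>y<TAB>z") can be parsed back into three readings on the PC side. A sketch with our own function name, not part of the report's tooling:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Parse one logged line of whitespace-separated accelerometer readings
// ("x<TAB>y<TAB>z") into three integers. Returns false on a malformed line.
bool parseSample(const std::string& line, int& x, int& y, int& z) {
    std::istringstream in(line);
    return static_cast<bool>(in >> x >> y >> z);
}
```

Reading the CoolTerm log file line by line through such a parser yields the raw sample stream used for training and recognition.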
Chapter 6
Results
The experiment was performed by moving the accelerometer through various paths. The data were read from the accelerometer by the microcontroller and transmitted serially to the PC using the Arduino. Results were obtained with the accelerometer held at various positions; the graphs showing the accelerometer output values are shown below.
The gestures implemented were triangle, z, bowling, and circle. Samples of these gestures were taken and the model was trained using these samples. Each real-time gesture was mapped onto the nearest matching predefined gesture. The following are the results obtained.
Chapter 7
Issues faced
We tried to read the data from the accelerometer using a DB9 connector and a MAX232 IC connected to the PC, but that was unsuccessful. Hence we decided to communicate serially with the PC using an Arduino Uno R3 board.
The calculation of probability values for hidden Markov models on real-world computing systems seems at first to be a trivial problem, since all the probability values involved lie in the interval [0, 1]. However, problems occurred especially in longer computations, when extremely small values close to zero need to be represented: taking the product of such probabilities makes the result underflow to zero. Hence the probabilities were represented on a negative logarithmic scale.
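The underflow problem and the logarithmic fix can be illustrated directly (a sketch; the function names are our own):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Multiplying many probabilities in [0, 1] underflows to 0 in double
// precision; summing their negative logarithms carries the same
// information at an easily representable magnitude.
double productOfProbs(const std::vector<double>& p) {
    double prod = 1.0;
    for (double v : p) prod *= v;
    return prod;               // underflows to 0.0 for long sequences
}

double negLogOfProduct(const std::vector<double>& p) {
    double s = 0.0;
    for (double v : p) s += -std::log(v);
    return s;                  // -log of the same product, no underflow
}
```

For 500 symbols with probability 0.05 each, the direct product is far below the smallest representable double and collapses to zero, while the negative-log sum is simply a number near 1500.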
Appendix
Datasheets