
Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 126 (2018) 254–263
www.elsevier.com/locate/procedia

22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

Hand gesture recognition method based on HOG-LBP features for mobile devices

Houssem Lahiani a,b,c,*, Mahmoud Neji b,c

a National School of Electronics and Telecommunications (Enet'Com), University of Sfax, Tunisia
b Faculty of Economics and Management (FSEGS), University of Sfax, Tunisia
c Multimedia Information Systems and Advanced Computing Laboratory (Miracl), Sfax, Tunisia
Abstract
Hand gesture recognition has become a challenging task in the computer vision field, especially since the appearance of mobile devices. Although gesture recognition algorithms have been widely applied in human-mobile interaction systems, it is difficult to meet real-time requirements while remaining robust to different lighting conditions and backgrounds. This paper describes a practical investigation into ways of improving hand gesture recognition algorithms for computationally limited handheld Android devices. We thoroughly researched possible approaches and determined how they affect the mobile device in terms of execution time and energy consumption. We identified a combined HOG-LBP methodology whose performance improves upon the detection rate of other systems. In this paper, we present a static hand gesture recognition system for mobile devices that combines histogram of oriented gradients (HOG) and local binary pattern (LBP) features and can accurately detect hand poses. By combining HOG and LBP features, we achieved a detection rate of approximately 92% on the enhanced NUS dataset I. This combination yields a better recognition rate than LBP or HOG features alone. However, in terms of execution and detection time, LBP and HOG perform better than the combined HOG-LBP.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of KES International.
Keywords: Hand posture recognition; android; LBP; AdaBoost; HOG; Human-machine interaction

* Corresponding author. Tel.: +216 22 877 445.
E-mail address: lahianihoussem@gmail.com

1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of KES International.
10.1016/j.procs.2018.07.259

1. Introduction

Mobile devices such as smartphones face many constraints in terms of control due to their small sizes. On the one
hand, they offer rich features that make them closer to personal computers. On the other hand, their limited sizes and
the need to maximize the size of the display screen make the use of physical interaction devices (keyboard, mouse, etc.)
uncomfortable and inefficient. To overcome this dilemma, today's devices rely mostly on interaction based on a
touch screen that allows either the selection of virtual buttons or the recognition of gestures made by 2D movements
of the finger. Nevertheless, this mode of use based on touch screens has limitations. These limitations have prompted
researchers to find alternative ways of interacting with mobile devices. In this context, we aim to develop a vision-based hand gesture recognition system for mobile devices that allows users to interact with smartphones without any physical contact. Gestures constitute a powerful method in human-computer interaction. The main
issues of hand gesture recognition are hand detection and tracking. They constitute a difficult task because the hand
is a non-rigid object. In addition, this task would be more complex when using a mobile device due to its
computational limits. We propose a vision-based system designed for Android handheld devices that uses only the camera of the smartphone as the source of video acquisition. According to statistics published by the independent web analytics company StatCounter in March 2017, Android is the most popular operating system in the world [1]. The Android platform was chosen for several reasons. On the one hand, the majority of mobile phones currently in use run Android OS [2]. On the other hand, the Android platform allows the integration of image processing libraries. For this system, we integrated the OpenCV (Open Source Computer Vision) library [3], a free library for image processing. Moreover, according to C. Cobârzan et al. [4], who made a comparative study of image processing on Android and iOS, better performance is achieved on the Android platform.
The main challenge of hand gesture recognition on mobile devices is that such devices are computationally limited, while vision-based tasks can be computationally expensive. Several approaches for hand pose
estimation for mobile devices have been proposed. Among them, we find approaches based on gloves [5][6], which are cumbersome for users. We also find vision-based approaches [10], which are essentially based on skin color [7][8][9]. These methods suit mobile devices because threshold-based color segmentation is computationally inexpensive and fast, which makes it appropriate for real-time applications [11]. However, they remain sensitive to luminance changes.
We also find shape-based approaches, which achieve good results for rigid objects but not for articulated objects
like hands. In addition, most of them are computationally expensive, which makes them unsuitable for mobile devices. For those reasons, we chose to develop a system based on a cascade architecture using the AdaBoost algorithm to compensate for the inadequacies of the approaches cited above. Cascade-based approaches are mainly based on
feature extraction and classification. In feature extraction, positive and negative images are densely scanned and
dominant features are extracted from the images. These features are used to train a classifier.

2. Related works

We have studied many works made by different researchers in hand gesture recognition field using mobile
devices. Smartphones have motivated researchers to use them in hand gesture recognition field due to their
simplicity and availability. Among approaches applied in this field, we cite the vision-based approach that uses only
the camera of the smartphone. However, this approach encounters challenges such as variations in lighting conditions and differences in skin color. This may produce lower accuracy compared to sensor-based approaches.
We therefore propose a cascade-based approach that is efficient and robust to illumination changes, since it is based on
LBP and HOG features, which are characterized by an invariance to illumination changes and a low computational
complexity [25], which makes them good for computationally limited handheld devices.
In [7] and [8] H. Lahiani et al. proposed a real-time hand gesture recognition system based on skin color segmentation, a convex hull and an SVM classifier. The smartphone performs all steps, and the recognition rate of this system was about 93%.
In [9] H. Lahiani et al. proposed a static hand pose estimation system to control the smartphone: first, the palm is detected using the Viola-Jones algorithm, and then features are extracted using convexity defects. An SVM classifier was used to recognize postures. The smartphone performs all steps, and the recognition rate was about 91%.

Systems developed in [7], [8] and [9] are based on skin color segmentation, which is suitable for real-time applications and for mobile devices since color processing is computationally inexpensive. However, skin color segmentation makes those systems vulnerable to luminance changes.
In [12] P. Hays et al. made a real-time system to recognize American Sign Language on a consumer-level mobile
device. To detect the skin, they used YCrCb color space and Canny edge detector. The classification algorithm is
based on Locality Preserving Projections (LPP) as manifold learning along with a multi-class Support Vector
Machine (SVM). Among the limitations of this system, we cite the use of the Canny edge algorithm, which is computationally more expensive than other approaches according to R. Maini et al. [26].
In [13] C. Jin et al. made an application to translate American Sign Language. The Canny edge detection method was used to detect the hand and the SURF algorithm to extract features. A Support Vector Machine was used for classification. The obtained results show that the system can recognize 16 different American Sign Language gestures with an accuracy of 97.13%. This system shares the limitation of [12], namely the use of a computationally expensive edge detector.
In [14] R. Kamat et al. made an application for Android devices for hearing-impaired people. It acts as a boon for deaf and mute people by eliminating the need for a human interpreter. The application uses the camera of the smartphone to capture hand poses and converts the image to the corresponding message, using the RGB color space to detect skin and template matching for classification. A Fast Fourier Transform (FFT) was performed and convexity defects were used to determine the number of fingers. This operation is quite computationally expensive and may not be suitable for devices with low computational capacity such as mobile devices.
In [15] Setiawardhana et al. developed an Android-based application that interprets a sign and converts it into written language. The hand detection is based on Haar filters and the classification is done with a K-NN classifier. The obtained recognition rate was 100% at distances below 50 cm and 25% at a distance of 75 cm. The main limitation of this system is that it provides good results only if the hand is close enough to the camera.
In [16] Prasuhn et al. made a static hand pose estimation system for American Sign Language on an Android device. Their system is based on Histogram of Oriented Gradients (HOG) features; the sensitivity of HOG to rotation is handled by using a database of HOG descriptors corresponding to generated hand images. The weakness of this system lies in its client-server architecture, which affects the response time: the mobile application representing the client is connected to a PC (server) via the internet, on which the recognition process is done. The network speed is therefore an issue for this system.
In our work, we aim to develop a hand gesture recognition system that remedies the inadequacies of the existing
systems by seeking a reconciliation between the computational capacities of mobile phones and recognition
algorithms and features. For that purpose, as noted above, we used a cascade approach to minimize the total
computational cost. Our method is based on combining LBP and HOG features. The combination of those two
features was used by several researchers for human and pedestrian detection in [27][28] to get better accuracy. In
addition, Y. Ding et al. used this combination in [37] to recognize static hand gestures. The combined features have
demonstrated robustness against nonlinear illumination and image blurring. The recognition rate was about 97.83%.
We used this combination not only to recognize hand gestures but also to determine how it affects computationally limited handheld mobile devices in terms of recognition, execution time and energy consumption. We also compared the real-time recognition rate and the execution and detection times of the combined HOG-LBP features, HOG alone and LBP alone, to show their different impacts on the mobile device.

3. Background

3.1. Local Binary Patterns: LBP

Local binary patterns were initially proposed by Ojala in 1996 [17] to characterize textures present in gray-scale
images. They consist of assigning to each pixel P of the image a value characterizing the local pattern around it. These values are calculated by comparing the gray level of the central pixel P with the gray levels of the neighboring pixels. The concept of LBP is simple: it assigns a binary code to a pixel according to its neighborhood. This code, describing the local texture of a region, is calculated by thresholding a
neighborhood with the gray level of the central pixel. The gray level of the central pixel $i_c$, at coordinates $(x_c, y_c)$, is compared with the gray levels of its neighbors $i_n$ following this formula:

$$ LBP(x_c, y_c) = \sum_{n=0}^{p-1} s(i_n - i_c)\,2^n, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \quad (1) $$

where p is the number of neighboring pixels. To generate a binary pattern, each neighbor takes the value "1" if its gray level is greater than or equal to that of the current pixel and "0" otherwise (Fig 1). The pixels of this binary pattern are then multiplied by weights (powers of two) and summed to obtain the LBP code of the current pixel. Thus, for the whole image, codes whose values lie between 0 and 255, as in an ordinary 8-bit image, are obtained. Rather than describing the image by the sequence of LBP patterns, a 256-bin histogram of these codes can be chosen as the texture descriptor.
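As an illustration, the following minimal Python/NumPy sketch (our own naming, not the code used in the paper) computes the 8-neighbor LBP code of Eq. (1) for every interior pixel and the 256-bin histogram:

```python
import numpy as np

def lbp_image(gray):
    """Basic 8-neighbour LBP codes (Eq. 1) for every interior pixel.

    gray: 2-D uint8 array. Returns (codes, histogram); border pixels
    are left at 0 for simplicity.
    """
    g = gray.astype(np.int16)
    codes = np.zeros_like(gray, dtype=np.uint8)
    center = g[1:-1, 1:-1]
    # Neighbours listed clockwise from the top-left; the n-th one
    # contributes 2**n when its gray level is >= the centre's.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for n, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes[1:-1, 1:-1] |= ((neigh >= center).astype(np.uint8) << n)
    # The 256-bin histogram of the codes is the texture descriptor.
    hist = np.bincount(codes[1:-1, 1:-1].ravel(), minlength=256)
    return codes, hist
```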

Fig 1. Calculating a local binary pattern

3.2. Histogram of Oriented Gradients: HOG feature

A Histogram of Oriented Gradients (HOG) is a feature used in computer vision for object detection. The technique calculates local histograms of gradient orientation on a dense grid, that is, on zones regularly distributed over the image. It has similarities with SIFT, shape contexts [18] and histograms of contour orientation [19], but differs in its use of a dense grid. HOG has been particularly effective in detecting people.
The final step in the object detection process is the use of HOG descriptors to train a supervised classifier. This
step is not part of the definition of the HOG descriptor itself and different types of classifiers can be used. Dalal and
Triggs deliberately chose a simple classifier, a linear-kernel SVM, in order to essentially measure the contribution of the HOG input. They noted in particular that it would be interesting to develop a cascade-based method using HOG [20].
The HOG descriptors characterize the hand gestures by the distributions of local intensity gradients. Each cell in
the image has a histogram constructed using the directions and magnitudes of pixel gradients in the cell. We
associate each feature point with a descriptor block of 3 × 3 cells, each cell being 5 × 5 pixels. The magnitude of the gradient $g$ and the orientation of the gradient $\theta$ are computed for all pixels of the block using equations (2) and (3) respectively, where $g_x$ and $g_y$ are the horizontal and vertical gradient components:

$$ g = \sqrt{g_x^2 + g_y^2} \quad (2) $$

$$ \theta = \arctan\!\left(\frac{g_y}{g_x}\right) \quad (3) $$

For each cell in the block, a feature vector is computed by quantizing the unsigned orientation into K orientation
bins weighted by the gradient magnitude.
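To make equations (2) and (3) concrete, here is a small NumPy sketch (illustrative only; the cell size and bin count follow the values given above) that builds the orientation histogram of one 5 × 5 cell:

```python
import numpy as np

def cell_hog(cell, nbins=9):
    """Orientation histogram of one cell, following Eqs. (2) and (3).

    cell: 2-D array of gray values (5x5 in our configuration).
    Returns an nbins vector: gradient magnitudes accumulated into
    unsigned-orientation bins covering 0..180 degrees.
    """
    cell = np.asarray(cell, dtype=np.float64)
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]      # horizontal gradient
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]      # vertical gradient
    mag = np.sqrt(gx ** 2 + gy ** 2)              # Eq. (2)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # Eq. (3), unsigned
    bins = np.minimum((ang * nbins / 180.0).astype(int), nbins - 1)
    hist = np.zeros(nbins)
    np.add.at(hist, bins.ravel(), mag.ravel())    # votes weighted by magnitude
    return hist
```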

3.3. AdaBoost (Adaptive Boosting)

Boosted cascades have become popular due to their satisfactory detection rates on a large number of objects. This success is related to the ability of the approach to recognize objects in real time and under different lighting conditions. In addition, it can run on low-cost ARM (Advanced RISC Machine) architectures
such as smartphones and embedded devices. One of the most used Boosted cascades algorithms is called AdaBoost,
an abbreviation for adaptive boosting. It relies on the iterative selection of the weak classifier according to a
distribution of the learning examples. Each example is weighted according to its difficulty with the current classifier.
AdaBoost tries many weak classifiers over several cycles, selecting the best weak classifier in each cycle and combining the selected classifiers to create a strong classifier. AdaBoost has many variants, such as Logit AdaBoost, Gentle AdaBoost and Real AdaBoost. In our framework, we used Gentle AdaBoost, a variant of the powerful boosting learning technique [21]. The AdaBoost variants differ in their weight-update schemes. According to Lienhart
et al. the gentle AdaBoost is the most successful learning method tested for face detection applications [22]. Gentle
AdaBoost does not require the computation of log-ratios that can be numerically unstable. Experimental results on
benchmark data show that the conservative Gentle AdaBoost achieves performance similar to Real AdaBoost and Logit AdaBoost, and in the majority of cases outperforms these two variants [23].
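For illustration, here is a minimal sketch of the boosting loop in Python (the classic discrete-AdaBoost update, chosen for readability; Gentle AdaBoost, which we actually use, differs only in the weight-update scheme, fitting a regression stump at each round instead of using the log-ratio weight):

```python
import numpy as np

def adaboost(weak_learners, X, y, rounds=100):
    """Sketch of the boosting loop: weak_learners are callables
    h(X) -> predictions in {-1, +1}; y holds labels in {-1, +1}.
    Returns (h, alpha) pairs; the strong classifier is
    sign(sum(alpha * h(x))).
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                 # uniform initial distribution
    strong = []
    for _ in range(rounds):
        # Select the weak classifier with the lowest weighted error.
        errors = [np.dot(w, weak(X) != y) for weak in weak_learners]
        t = int(np.argmin(errors))
        err = max(errors[t], 1e-12)
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        # Misclassified samples gain weight; correct ones lose weight.
        w *= np.exp(-alpha * y * weak_learners[t](X))
        w /= w.sum()
        strong.append((weak_learners[t], alpha))
    return strong
```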

4. Approach

We propose a method based on combining Histograms of Oriented Gradient (HOG) and Local Binary Pattern
(LBP) features, which can recognize hand gestures accurately. We combine texture information (provided by LBP)
and contour information (provided by HOG), and propose a cascade of combined HOG and LBP features. Because
images could contain both non-textured and textured regions, the cues of texture and contour were exploited
simultaneously. Gentle AdaBoost was used for training purposes. The framework of our recognition algorithm is
shown in Fig 2. System efficiency is critical with a smartphone since an efficient approach has to increase
computational speed and reduce power consumption. Indeed, we chose to work with LBP features, which are computed with integer arithmetic, unlike other features (like Haar) that use floating-point arithmetic, which is costly for mobile and embedded devices. For that reason, detection and training with LBP are faster than with other features (like Haar features) [29].
In addition, according to A. Mammeri et al. [30], who evaluated the power consumption of image descriptors on smartphone platforms, the AdaBoost algorithm is significantly more energy-efficient than other classification algorithms such as SVM. Furthermore, the results in [30] show that the HOG-AdaBoost detector has the most energy-saving features, compared to the LBP and Haar descriptors.
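One straightforward reading of this combination, sketched below under our own naming conventions (using the hypothetical lbp_image and cell_hog helpers from Section 3), is to concatenate the LBP histogram and the per-cell HOG histograms of a training window into a single feature vector handed to the boosting stage:

```python
import numpy as np

def combined_descriptor(window):
    """Concatenate LBP and HOG descriptors of one training window.

    window: 2-D uint8 array (a cropped hand sample). Relies on
    lbp_image() and cell_hog() from the sketches in Section 3;
    cells are 5x5 pixels as in our configuration.
    """
    _, lbp_hist = lbp_image(window)
    hog_parts = []
    for y in range(0, window.shape[0] - 4, 5):        # step over 5x5 cells
        for x in range(0, window.shape[1] - 4, 5):
            hog_parts.append(cell_hog(window[y:y + 5, x:x + 5]))
    # Texture cue (LBP) and contour cue (HOG) side by side.
    return np.concatenate([lbp_hist.astype(np.float64)] + hog_parts)
```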

Fig 2. The framework algorithm

4.1. Hand gesture detection using HOG-LBP features

As shown in Figure 2, HOG and LBP features are extracted, and then combined to obtain our training feature set.
Then, the feature set is transmitted to a gentle AdaBoost classifier to obtain a preliminary detector. During the
training process, only weak classifiers that could improve prediction are selected. For each weak classifier,
AdaBoost chooses an acceptance threshold, but a single weak classifier is not able to classify the desired hand
posture with a low error rate. In each iteration, the weights are normalized and the best weak classifier is selected
according to the weighted error value. In the next step, the weight values are updated and it can be decided whether
an example is correctly classified or not. After all iterations have been traversed, a set of weak classifiers satisfying the specified error rate has been selected, and together they form a strong classifier. Classification is done iteratively, and the number of learning examples affects the effectiveness of the classification process. This process is based on the iterative selection of weak classifiers according to a distribution of the learning examples. Each example is weighted according to its difficulty for the current classifier. AdaBoost is adaptive in the sense that subsequent weak classifiers are adjusted in favor of samples misclassified by previous classifiers.
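Once trained, the cascade is applied to each camera frame through OpenCV's standard cascade interface; a minimal sketch follows (the cascade file name and detection parameters are illustrative, not the exact values used on the device):

```python
import cv2

# Hypothetical file name for a cascade trained as described above.
cascade = cv2.CascadeClassifier("hand_stop_cascade.xml")

def detect_hands(frame):
    """Run the boosted cascade on one BGR camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)            # soften lighting variations
    return cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,                     # pyramid step between scales
        minNeighbors=5,                      # neighbours needed to accept
        minSize=(60, 60))                    # skip windows smaller than this
```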

5. Experiments

5.1. Training Data

To train a classifier, we need image samples: many images that show the wanted hand postures (called positive samples) and even more images without the hand gesture (negative samples).
We used the NUS hand posture dataset I [24]. This database contains 240 images of hand gestures: 10 classes of postures with 24 images per class, captured by varying the position and size of the hand within the image frame. There are two versions of the NUS dataset (NUS hand posture datasets I and II). Due to the small number of images per class, we enhanced the first version (NUS dataset I) by adding 51 images to each class, taken under different lighting conditions, with different backgrounds and with different persons. The enhanced dataset, which contains 75 images per class, was used for training purposes.
To recognize those hand gestures, we assign to each one of them a letter or word (Fig. 3).

Fig 3. Grammar to interpret gestures

Training an accurate classifier can take a lot of time and a huge number of samples. As described above, the enhanced NUS database contains 75 images per class. We cropped the images of the database to keep only the area that represents the hand gesture; the aspect ratios of the cropped images should not differ too much. For each hand posture, we chose these 75 images as positive samples. The best results come from positive images that look exactly like the ones we want to detect; thus, a large set of positive samples was created by placing the cropped positives on backgrounds taken from the set of negative images.
For the negative images, which do not show hand postures, we used images that look like the positive ones except that they do not contain the hand we want to detect. We used more than 5000 negative images containing gardens, persons, backgrounds, walls, faces, etc.
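For reference, a small Python helper of the kind used to prepare the training inputs (directory names and file layout are hypothetical) can write the annotation files expected by OpenCV's cascade-training tools (opencv_createsamples / opencv_traincascade), where each line of the positives file has the form `path count x y w h`:

```python
import os
import cv2

def write_sample_lists(pos_dir, neg_dir):
    """Write the positives info file and the negatives list expected by
    OpenCV's cascade-training tools. Each cropped positive is assumed
    to contain exactly one hand filling the frame."""
    with open("positives.info", "w") as pos:
        for name in sorted(os.listdir(pos_dir)):
            path = os.path.join(pos_dir, name)
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            if img is None:
                continue                      # skip non-image files
            h, w = img.shape
            pos.write(f"{path} 1 0 0 {w} {h}\n")
    with open("negatives.txt", "w") as neg:
        for name in sorted(os.listdir(neg_dir)):
            neg.write(os.path.join(neg_dir, name) + "\n")
```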

5.2. Evaluation

In the first experiment, we compared the classification performance between the LBP detector, the original HOG
detector and the HOG-LBP detector. Since our systems and models are intended for real-time applications and for computationally limited devices, we chose to perform the evaluation through real-time tests, in order to measure not only the recognition rate but also the impact of the three models on the mobile device (execution and detection time, etc.). To make the comparison, we tested hand poses made by different persons under different lighting conditions and against different backgrounds for the three cases. For each gesture, 50 frames were captured with different backgrounds, at different distances and under different lighting conditions. The recognition result depends on the distance. Fig 4 presents some correctly recognized gestures, made under different lighting conditions and with different backgrounds, using the HOG-LBP based system. The name corresponding to each sign is shown on top of the green box in the capture. We used the Tegra Android Development Pack (TADP) to develop the Android application, and Python to extract features and perform training on the improved NUS hand gesture database I. The hardware used was an Intel Core i7 CPU with 8 GB RAM running Windows 10 (64-bit), and a smartphone equipped with an 8-megapixel camera running Android 5.1 (Lollipop).

Fig 4. Different poses under different lighting conditions and backgrounds

For each case (LBP, HOG, HOG-LBP), we tested the system by inviting people to make gestures under different lighting conditions and against different backgrounds in front of the device camera. For each system, 50 frames were captured by different persons testing the application with different backgrounds and lighting conditions. The invited persons made the same gesture under the same conditions for the three systems. Results are shown in Table 1. An online video shows the detection performance of our system with different backgrounds and lighting conditions: https://youtu.be/u8oWhGJnVgk

Table 1. Recognition rates of different hand poses with HOG, LBP and HOG-LBP (distance <= 75 cm)

Sign      Captures   HOG (recognized / rate)   LBP (recognized / rate)   HOG-LBP (recognized / rate)
B         50         46 / 96%                  45 / 90%                  48 / 96%
STOP      50         45 / 90%                  45 / 90%                  48 / 96%
E         50         43 / 86%                  43 / 86%                  45 / 90%
F         50         46 / 92%                  44 / 88%                  47 / 94%
D         50         45 / 90%                  44 / 88%                  45 / 90%
U         50         40 / 80%                  45 / 90%                  44 / 88%
A         50         47 / 94%                  45 / 90%                  47 / 94%
G         50         45 / 90%                  42 / 84%                  45 / 90%
C         50         45 / 90%                  44 / 88%                  44 / 88%
Y         50         43 / 86%                  43 / 86%                  45 / 90%
Average              89.4%                     88%                       91.6%
We can say that recognition of the different gestures, for the three cases, is optimal when done at a distance less than or equal to 75 cm. The HOG-LBP features achieve the best detection rate, 91.6%. Thus, the combination of HOG and LBP features can compensate for their individual inadequacies and considerably improve the detection rate.
As for LBP, even though it has the lowest recognition rate, it has the best detection speed on the mobile device, which supports the hypothesis that LBP is more suitable for mobile devices since all its calculations are done with integers rather than floats, which are costly for embedded devices.
As mentioned in previous sections, the smartphone is limited in computation and power. To know how the proposed system affects the mobile device in terms of execution time (detection time), we used TimingLogger [31] to measure the time consumed by each detection task and its statements (Fig 5). The result of the TimingLogger is shown in the logcat [32] of the IDE (Integrated Development Environment) when the application is executed on a smartphone linked to the PC via a USB cable; the logcat is the view in which the logs of the application are shown. The LBP-based and HOG-based systems show the best performance in terms of execution time, with an average of about 1 ms per gesture. In contrast, the HOG-LBP based method shows lower performance, with an execution time of 2 ms per gesture. This shows that combining two descriptors instead of one increases the computational requirements. This test was made on a smartphone equipped with a 1.4 GHz octa-core processor, 1.5 GB of RAM and a 13-megapixel camera, at a resolution of 1280 x 720 pixels.
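On the desktop side, the same per-gesture measurement can be approximated with a few lines of Python (a rough analogue of the on-device TimingLogger readings, not the actual Android code):

```python
import time

def mean_detection_ms(cascade, frames):
    """Average per-frame detection time in milliseconds for a list of
    grayscale frames, mirroring the per-gesture TimingLogger readings."""
    start = time.perf_counter()
    for gray in frames:
        cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return 1000.0 * (time.perf_counter() - start) / len(frames)
```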

Fig 5. Execution time of STOP gesture with different methods

To test the object detection performance, we used precision and recall (PR). These differ slightly from the more common ROC (receiver operating characteristic) curves used in statistics, which have the disadvantage of relying on true negative counts; in sliding-window applications this count becomes so high that the true positive, false positive and false negative counts vanish in comparison. PR curves avoid this measure and are therefore better suited for evaluating detection models [33].

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 score = 2 × (Precision × Recall) / (Precision + Recall)
where TP (true positive) is an annotation that is also found as a good detection, FP (false positive) is a detection for which no annotation exists, FN (false negative) is an annotation for which no detection exists, and the F1 score is the weighted average of precision and recall.
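These three measures reduce to a few lines of Python; for example (the counts in the comment are hypothetical):

```python
def pr_metrics(tp, fp, fn):
    """Precision, recall and F1 score from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 32 correct detections, 3 spurious boxes, 3 missed hands:
# pr_metrics(32, 3, 3) -> (0.914..., 0.914..., 0.914...)
```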
To calculate the precision, recall and F1 score, we chose a test set of 35 images for each gesture class. These images were taken from version II of the NUS database, which contains gesture images with human noise and noisy backgrounds. Table 2 gives an overview of the obtained results for the HOG-LBP system.

Table 2. Precision, recall and F1 score of the HOG-LBP system

Gesture   Precision   Recall     F1 score
A         0.45679     1          0.627119
B         0.935484    0.828571   0.878788
C         0.421053    0.914286   0.576577
D         0.617762    0.925267   0.740874
E         0.625471    0.954796   0.755818
F         0.754892    0.916582   0.827916
G         0.584253    0.910203   0.711682
U         0.619892    0.996974   0.764462
Y         0.611458    0.997035   0.758033
Stop      0.618532    0.989752   0.761300
The evaluation has to provide quantitative measures of the efficiency of the system. Statistical metrics such as precision, recall and F1 score were used to quantify the performance of the system (Table 2). To compute precision, recall and F1 score, we used the second version of the NUS dataset, which contains images that differ from those of version I (taken by other persons and with different backgrounds), in order to obtain impartial results. We also used real-time tests to evaluate the performance of the system (Table 1) in terms of recognition rate and detection speed (Fig 5). System efficiency is critical on a smartphone, since an efficient approach has to increase computational speed and reduce power consumption.
In the end, the main conclusion is that the combination of features produces the best detection rate. The results achieved with the combination show better performance than the cascade-based systems for mobile devices made by H. Lahiani et al. in [34], where the detection rate was about 88% using LBP and AdaBoost, and in [35], where the detection rate was about 89% using Haar features and AdaBoost. In [34], [35] and [38], we only tested the real-time recognition rate of models based respectively on LBP and Haar-like features fed to the AdaBoost algorithm. In contrast, in the current paper we compared models based on HOG, LBP and combined HOG-LBP features in terms of real-time recognition rate and detection and execution time, to find out how these models affect the real-time usage of a mobile device. The obtained result of the HOG-LBP system shows a lower detection rate compared to the work of P. Kumar Pisharady et al. [36], who recognized hand postures against complex backgrounds on the second version of the NUS hand gesture dataset and achieved a recognition rate of 94.36%. In that system, hand poses were classified using shape and texture features with an SVM classifier. However, that system was not tested in real-time detection (on a mobile device or any other type of device); the reported result comes from cross-validation rather than real-time detection.

6. Conclusion

In this paper, we described a simple and fast method for hand pose estimation on mobile devices. The NUS hand posture database, with some improvements, was used for training purposes. We chose to work with LBP features, the HOG descriptor and their combination, which demonstrated good performance in the recognition step on a mobile device. The paper tackles a very important issue in the human-computer interaction (HCI) domain, specifically controlling smartphones using hand postures. We tried to build a system suitable for computationally limited devices that remedies the inadequacies of the systems mentioned in the related work section, by seeking a reconciliation between the computational capacities of mobile phones and the recognition algorithms and features. For that purpose, we used a cascade approach to minimize the total computational cost. According to the literature (A. Mammeri et al. [30]), the AdaBoost algorithm is significantly more energy-efficient than other classifiers, as cited in previous sections. In addition, experimental results show that our system achieves a good detection time (execution time) and an acceptable detection rate.
In future work, to improve the training step, we will try to enhance this system by using the NUS hand posture database II, which contains a larger number of images taken under different lighting conditions and with different backgrounds. We will also try to enhance the experimentation by inviting more people to test our application; to quantify detection performance, we will use Detection Error Tradeoff (DET) curves and the miss rate versus False Positives Per Window (FPPW). Finally, we will try to use deep learning instead of the AdaBoost algorithm to make a comparison.

References

[1] Android overtakes Windows for first time. StatCounter company web site: http://gs.statcounter.com/press/android-overtakes-windows-for-first-time. 3rd April 2017.
[2] Manjoo, Farhad (May 27, 2015). "A Murky Road Ahead for Android, Despite Market Dominance". The New York Times. ISSN 0362-
4331. Retrieved May 27, 2015.
[3] OpenCV. In http://opencv.org/platforms/android.html
[4] C. Cobârzan, Marco A. Hudelist, K. Schoeffmann, Manfred J. Primus. (2015) “Mobile Image Analysis: Android vs. iOS”. 21st International Conference on MultiMedia Modelling (MMM). PP. 99-110
[5] M. Seymour, M. Tšoeu. (2015) “A Mobile Application for South African Sign Language (SASL) Recognition.” IEEE Africon 2015. PP.
281-285
[6] C. Xie, S. Luan, H. Wang, B. Zhang. (2016) "Gesture Recognition Benchmark Based on Mobile Phone." Chinese Conference on Biometric
Recognition (CCBR). Lecture Notes in Computer Science, vol 9967. pp.432-440.

[7] H. Lahiani, M. Elleuch, and M. Kherallah. (2015) “Real Time Hand Gesture Recognition System for Android Devices,” 15th International
Conference on Intelligent Systems Design and Applications (ISDA). PP.592-597.
[8] H. Lahiani, M. Elleuch, and M. Kherallah. (2016) " Real Time Static Hand Gesture Recognition System for Mobile Devices". Journal of
Information Assurance and Security. ISSN 1554-1010 Volume 11 PP. 067-076
[9] H. Lahiani, M. Kherallah and M. Neji. (2016) “Hand pose estimation system based on Viola-Jones algorithm for android devices”. 13th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA)
[10] H. Lahiani, M. Kherallah and M. Neji. (2016) “Vision Based Hand Gesture Recognition for Mobile Devices: A Review.” 16th International Conference on Hybrid Intelligent Systems (HIS 2016), Advances in Intelligent Systems and Computing. PP. 308-318.
[11] S. Kumar, B.M. Singh and K. Yadav. (2017) “A review on image segmentation techniques.” International Journal of Current Engineering and Scientific Research (IJCESR). Volume 4, Issue 10, 2017. PP. 122–128.
[12] P. Hays, R. Ptucha, and R. Melton. (2013) “Mobile device to cloud co-processing of asl finger spelling to text conversion”. 2013 IEEE
Western New York Image Processing Workshop (WNYIPW). PP. 39–43.
[13] C. Jin, Z. Omar, and M. H. Jaward. (2016) “A mobile application of american sign language translation via image processing algorithms.”
2016 IEEE Region 10 Symposium (TENSYMP). PP. 104–109.
[14] R. Kamat, A. Danoji, A. Dhage, P. Puranik, and S. Sengupta. (2016) “Monvoix-an android application for hearing impaired people”.
Journal of Communications Technology, Electronics and Computer Science (8):24–28.
[15] Setiawardhana, R. Y. Hakkun, A. Baharuddin. (2015) “Sign language learning based on android for deaf and speech impaired people”. 2015 International Electronics Symposium (IES). PP. 114–117.
[16] L. Prasuhn, Y. Oyamada, Y. Mochizuki and H. Ishikawa. (2014) "A HOG-Based Hand Gesture Recognition System On A Mobile Device".
IEEE International Conference on Image Processing (ICIP). PP.3973-3977.
[17] T. Ojala, M. Pietikäinen and D. Harwood. (1996) "A comparative study of texture measures with classification based on feature distributions". Pattern Recognition (29): 51–59
[18] S. Belongie, J. Malik, and J. Puzicha. (2001) “Matching shapes.” International Conference on Computer Vision (ICCV). PP. 454–461.
[19] W. T. Freeman and M. Roth. (1995). “Orientation histograms for hand gesture recognition.” Intl. Workshop on Automatic Face-and
Gesture- Recognition. PP. 296–301.
[20] N. Dalal, B.Triggs.(2005) "Histograms of Oriented Gradients for Human Detection". Conference on Computer Vision and Pattern
Recognition. PP.886-893.
[21] Y. Freund and R. E. Schapire.(1996) “Experiments with a new boosting algorithm.” Proceedings of the 13th International Conference
In Machine Learning. PP.148 – 156.
[22] R. Lienhart and J. Maydt. (2002) "An Extended Set of Haar-like Features for Rapid Object Detection". Proceedings of the IEEE
Conference on Image Processing (ICIP '02). PP. 155 - 162.
[23] A. Ferreira and M. Figueiredo.(2012) "Boosting Algorithms: A Review of Methods, Theory, and Applications." Ensemble Machine
Learning: Methods and Applications. PP. 35–85.
[24] The NUS hand posture datasets I: https://www.ece.nus.edu.sg/stfpage/elepv/NUS-HandSet/
[25] S. Paisitkriangkrai, C. Shen, J. Zhang. (2010) “Face Detection with Effective Feature Extraction.” Asian Conference on Computer Vision
ACCV. PP 460-470.
[26] R. Maini, H. Aggarwal. (2009) “Study and Comparison of Various Image Edge Detection Techniques.” International Journal of Image
Processing. (IJIP), Vol3: Issue 1.
[27] X. Wang , T. X. Han, S. Yan. (2009) “An HOG-LBP human detector with partial occlusion handling.” IEEE 12th International
Conference on Computer Vision. PP.32-39.
[28] Wen-Juan Pei, Yu-Lan Zhang, Yan Zhang, Chun-Hou Zheng. (2014) “Pedestrian Detection Based on HOG and LBP.” International
Conference on Intelligent Computing (ICIC) PP. 715-720.
[29] C. S. Ikehara, J. He, M. E. Crosby. (2013) "Issues in Implementing Augmented Cognition and Gamification on a Mobile Platform." International Conference on Augmented Cognition. PP. 685-694
[30] A. Mammeri, A. Boukerche, D. Zhou. (2016) “Evaluation of the Power Consumption of Image descriptors on Smartphone Platforms.” 12th ACM Symposium on QoS and Security for Wireless and Mobile Networks. PP. 119-124.
[31] TimingLogger | Android Developers : https://developer.android.com/reference/android/util/TimingLogger.html
[32] Logcat Command-line Tool : https://developer.android.com/studio/command-line/logcat.html
[33] J. Howse, S. Puttemans, Q. Hua, U. Sinha. "Object detection performance testing" in "OpenCV 3 Blueprints".
[34] H. Lahiani, M. Kherallah and M. Neji. (2017) “Hand gesture recognition system based on Local Binary Pattern approach for mobile
devices” 17th International Conference on Intelligent Systems Design and Applications (ISDA). PP. 180-190.
[35] H. Lahiani, M. Kherallah and M. Neji. (2017) “Hand pose estimation system based on a cascade approach for mobile devices.” 17th
International Conference on Intelligent Systems Design and Applications (ISDA). PP. 619-629.
[36] P. K. Pisharady, P. Vadakkepat, A. P. Loh.(2013) "Attention Based Detection and Recognition of Hand Postures Against Complex
Backgrounds." International Journal of Computer Vision. Volume 101, Issue 3, PP. 403–419.
[37] Y. Ding, H. Pang, X. Wu.(2011) “Static hand-gesture recognition using HOG and improved LBP features.” International Journal of
Digital Content Technology and its Applications. 5(11): 236-243.
[38] H. Lahiani and M. Neji. (2017) “Comparative study between hand pose estimation systems for mobile devices.” Journal of Information Assurance and Security. ISSN 1554-1010. Volume 12, Issue 6, 2017. PP. 218-226.
