
CAT 400 Project Proposal:

USING HAND AND HEAD GESTURES TO CONTROL OBJECTS IN
A VIRTUAL SPACE
[GM151618]
[Akbar Syahputra Lubis], [Ahmad Sufril Azlan Mohamed, Dr.]
[aslubis.ucom12@student.usm.my], [sufril@usm.my]
School of Computer Sciences, Universiti Sains Malaysia
11800 USM, Penang, Malaysia

Abstract
Persons with disabilities are a major concern in homes, hospitals and many working
environments, particularly those with disfigured limbs. The currently available technology for
controlling and navigating an operating system (OS) assumes the user has functional palms and fingers, and
even mobile devices with capacitive touch screens are designed for users without limb
impairments. Research into assisting the disabled has been carried out extensively, but the majority relies on
a single type of motion-sensing technology, such as TOBII (eye tracking) or LEAP (leap motion),
which are either costly or limited to a small workable space. Others have used the web
camera, a cheaper solution, but it offers a limited viewing area and very low
resolution, making it difficult to handle noise such as illumination changes, occlusion and more.
Microsoft has developed a multi-sensor motion detection device consisting of an infrared (IR) depth sensor,
a colour sensor, an IR emitter, a microphone array and a tilt motor to enhance the gaming experience through
motion, and this device has since been opened up for other research possibilities. Therefore, the
motivation of this project is to develop a system, based on a skeleton tracking algorithm and the Kinect
sensors, that helps a maimed person control and navigate the OS and interact with objects within a virtual
space (VR). The system reads the body gestures of both the head and limbs, with voice input added for
specific commands. The technologies used in the development of this system will be MySQL for storing
registered gestures and voice commands, C++ as the main programming language, the OpenCV libraries for
image processing, and the Microsoft Kinect SDK for controlling the Kinect sensors. The outcome of this
project will allow a disabled person to easily use a computer and interact with objects in a VR space, which
opens up other possibilities such as gaming, virtual shopping and more.
Keywords: Kinect, skeleton tracking, sensors, motion, user experience.

1. Project Background
Tracking sensors have been used in many applications, particularly gaming and simple
navigation, to improve the user experience over conventional ways of using technology apparatus.
However, making these technologies available to disabled persons is still under-researched, and there
is no specific method that is both robust and easy to use. For example, some sensors are
limited to the small viewing area of a webcam or of small devices such as the Leap Motion. In addition,
web cameras are low-end and very poor at measuring the depth and distance of a person, making
specific gesture identification hard and therefore unfriendly to disabled and elderly persons
[1].
Recently, Microsoft Research Ltd. developed a technology known as Kinect for the XBOX 360
and XBOX One gaming consoles. Building on the popularity of the Wii Motion Controller by Nintendo Ltd.
and the six-degree-of-freedom (6DOF) Sony Move Controller, this technology eliminates the need for a
special controller, replacing it with multiple sensors that allow specific gestures to be applied. To
overcome depth, illumination and angle issues, the system incorporates an IR
emitter, a microphone array, an IR depth sensor, a colour sensor and a motorized tilt function.
Figure 1.1 shows the diagram of a Kinect system [3].

Figure 1.1: The list of sensors found on a Kinect system.


The Kinect system has been opened up for further research through the Kinect SDK
for Windows and a special connector to a USB port; its functions can be called as
libraries, making it very useful for research purposes [2]. The Kinect system can read and translate
specific gestures, making gesture input completely hands-free. Kinect also has a dedicated
chip to track the movement of objects, and it supports voice recognition so that
voice commands can be applied.
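As an illustration only (a minimal sketch, assuming Kinect for Windows SDK v1.x; not the final implementation), the following C++ fragment shows how the SDK functions can be called as a library to initialise the sensor and poll a single skeleton frame:

// Minimal sketch, assuming Kinect for Windows SDK v1.x: initialise the sensor
// for skeleton tracking and poll one skeleton frame.
#include <windows.h>
#include <NuiApi.h>
#include <cstdio>

int main()
{
    // Request skeleton tracking only; other flags enable colour/depth streams.
    if (FAILED(NuiInitialize(NUI_INITIALIZE_FLAG_USES_SKELETON))) return 1;
    if (FAILED(NuiSkeletonTrackingEnable(NULL, 0))) { NuiShutdown(); return 1; }

    NUI_SKELETON_FRAME frame = {0};
    // Wait up to one second for the next skeleton frame.
    if (SUCCEEDED(NuiSkeletonGetNextFrame(1000, &frame)))
    {
        for (int i = 0; i < NUI_SKELETON_COUNT; ++i)
        {
            const NUI_SKELETON_DATA& s = frame.SkeletonData[i];
            if (s.eTrackingState != NUI_SKELETON_TRACKED) continue;

            const Vector4& head = s.SkeletonPositions[NUI_SKELETON_POSITION_HEAD];
            std::printf("head at (%.2f, %.2f, %.2f) m\n", head.x, head.y, head.z);
        }
    }
    NuiShutdown();
    return 0;
}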

2. Problem Statement

Using a conventional camera poses several problems, especially for body gestures. With a
web camera, the algorithm sometimes cannot read the body gesture completely due to the poor
quality of the input image, and the lighting conditions and background also affect the reliability of
gesture tracking. Using the Kinect camera together with its IR depth sensors solves this problem:
distance, depth and position can be detected, and gestures of the head, mouth, hands, body and legs can be
implemented through skeleton joint tracking [3]. The availability of the sensor and its lower
cost also make it the ideal solution compared to other technologies such as the TOBII system and the
LEAP system.

3. Motivation
The motivation of this project is to overcome the limitations of conventional sensors such as the web
camera (limited viewing area, single sensor) and to apply the solution to the problems faced by
disabled persons in effectively using a computer and interacting with objects created within a gaming
environment or even a shopping-mall information kiosk. Lastly, the
availability of the system, its lower cost compared to other systems, and the software support of the SDK
heavily influenced the choice of this platform to be developed further into a workable system.

4. Proposed Solution
The main tool for developing this project is the Kinect itself, in this case the Kinect for
Windows version one. Figure 4.1 shows the functions the Kinect can perform and what is available in the SDK.
The Kinect combines a multi-array microphone with three optical components: an RGB camera, an infrared
emitter and an infrared depth sensor, each with its own function. The RGB camera delivers a colour stream
at 30 frames per second and is used for imaging the user and object gestures. The infrared emitter
projects a pattern of infrared light across the field of view, and the infrared depth sensor, a CMOS
sensor, reads the reflected pattern and computes a distance for each pixel, producing the depth
information of the 3D scene. The depth sensor is used to measure depth and can also
serve as night vision.

Figure 4.1: Functions of the Kinect system
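For illustration, the short C++ sketch below shows how a single 16-bit value from the Kinect v1 depth stream can be unpacked into a millimetre distance and a player index (this packing applies to the depth-with-player-index modes; the helper names are our own):

// Hedged sketch: in Kinect SDK v1.x depth-with-player-index modes, each
// 16-bit value carries the depth in millimetres in the high 13 bits and a
// 3-bit player index in the low bits.
#include <cstdint>

struct DepthSample {
    uint16_t depthMillimetres; // distance from the sensor plane
    uint8_t  playerIndex;      // 0 = no tracked player at this pixel
};

DepthSample unpackDepthPixel(uint16_t packed)
{
    DepthSample s;
    s.depthMillimetres = packed >> 3;   // same effect as NuiDepthPixelToDepth()
    s.playerIndex      = packed & 0x7;  // same as NuiDepthPixelToPlayerIndex()
    return s;
}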


The Kinect offers many functions that can be used in the development of the proposed system, mainly
body gesture and voice command. Figure 4.2 shows the process of the proposed system.

Figure 4.2: The development of the system using Kinect tracking method
Two major Kinect functions will be implemented in the system. The first is voice recognition
and the second is body gesture through the skeleton joint tracking algorithm. With voice recognition, the
user can interact with the operating system, such as opening a folder or browsing, using voice commands.
With body gesture, the user can interact not only within the operating system but also with objects in a
virtual space: zooming in and out of any object, and moving or rotating it. A database will be used to store
the commands associated with each registered gesture.
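As a sketch of how such a database might look (the table and column names are assumptions, not the final schema), a single MySQL table can map each registered gesture or voice phrase to a command:

-- Illustrative MySQL schema: one table of registered gestures and voice
-- phrases, each mapped to an OS or VR command.
CREATE TABLE command_binding (
    id           INT PRIMARY KEY AUTO_INCREMENT,
    trigger_type ENUM('gesture', 'voice') NOT NULL,
    trigger_name VARCHAR(64) NOT NULL,  -- e.g. 'swipe_left' or phrase 'open folder'
    action_name  VARCHAR(64) NOT NULL,  -- e.g. 'os_open_folder', 'vr_zoom_in'
    UNIQUE (trigger_type, trigger_name)
);

-- Lookup performed when the recogniser reports a match:
SELECT action_name FROM command_binding
WHERE trigger_type = 'gesture' AND trigger_name = 'swipe_left';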

To run all these functions (voice recognition, skeleton tracking, depth measurement and night
vision), the Kinect needs several sensors. Figure 4.3 shows the sensors used by the Kinect which
can be utilised for the development of the proposed solution.

Figure 4.3: Usage of Kinect sensors.
The Kinect exposes three main sensor groups. First is the audio sensor, which processes
the voice and interprets it to run commands. Next is the depth sensor, which
measures the depth of objects within the enclosure of a room and, by using infrared, also
recognises objects under low-light conditions. Finally, the colour image sensor captures the user's body
so that gesture movements can be identified.
This project focuses on the skeleton tracking (body tracking) algorithm. Kinect skeleton
tracking captures and records the movement of the user's body through OpenNI, whose high-level skeleton
tracking module tracks the body joints. OpenNI requires user calibration in order to generalise
information about the user's height and body characteristics. Figure 4.4 shows an example of skeleton body
tracking.

Figure 4.4: Skeleton tracking
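A minimal sketch of this idea, assuming the OpenNI 1.x C++ wrapper and omitting the user-found and calibration callbacks that a real application must register, reads the head joint of every tracked user:

// Minimal sketch, assuming the OpenNI 1.x C++ wrapper; calibration callbacks
// (required before IsTracking() becomes true) are omitted for brevity.
#include <XnCppWrapper.h>
#include <cstdio>

int main()
{
    xn::Context context;
    if (context.Init() != XN_STATUS_OK) return 1;

    xn::UserGenerator user;
    if (user.Create(context) != XN_STATUS_OK) return 1;

    user.GetSkeletonCap().SetSkeletonProfile(XN_SKEL_PROFILE_ALL);
    context.StartGeneratingAll();

    for (int frame = 0; frame < 300; ++frame)   // roughly ten seconds of frames
    {
        context.WaitAndUpdateAll();

        XnUserID ids[4];
        XnUInt16 count = 4;
        user.GetUsers(ids, count);

        for (XnUInt16 i = 0; i < count; ++i)
        {
            if (!user.GetSkeletonCap().IsTracking(ids[i])) continue;

            XnSkeletonJointPosition head;
            user.GetSkeletonCap().GetSkeletonJointPosition(ids[i], XN_SKEL_HEAD, head);
            std::printf("user %u head: (%.0f, %.0f, %.0f) mm\n",
                        ids[i], head.position.X, head.position.Y, head.position.Z);
        }
    }
    context.Release();
    return 0;
}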


With the inclusion of the infrared sensor, depth inside the room can be measured and detection remains
possible in low-light conditions. Figure 4.5a-b shows examples of the infrared sensor detecting depth and
operating in low light. Note that this sensor is able to detect depth at many levels.

(a)

(b)

Figure 4.5: Infrared sensor with (a) Depth function and (b) low-light condition function
The whole system will be implemented in C++ with Kinect SDK v1.8, utilising the OpenCV and
OpenNI libraries: OpenCV for image operations such as filtering, and OpenNI for translating the user's
body into tracked joints so that gesture movements can be identified. Standard SQL will be used for the
database. OS navigation will target the Windows platform, and the 3D objects within the
virtual space will be created using OpenGL.
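As one illustration of the OpenCV role (the filter choice is an assumption for demonstration, not a committed design), the sketch below denoises a 16-bit depth map before joint tracking and rescales it for on-screen inspection:

// Minimal sketch, assuming a 16-bit single-channel depth image (CV_16UC1).
#include <opencv2/opencv.hpp>

cv::Mat denoiseDepth(const cv::Mat& depth16u)
{
    // Median filtering suppresses the speckle noise typical of IR depth maps
    // while preserving object edges better than a Gaussian blur would.
    cv::Mat filtered;
    cv::medianBlur(depth16u, filtered, 5);
    return filtered;
}

cv::Mat depthToDisplay(const cv::Mat& depth16u)
{
    // Map an assumed 0-4 m working range onto 0-255 grey levels for viewing.
    cv::Mat gray8;
    depth16u.convertTo(gray8, CV_8U, 255.0 / 4000.0);
    return gray8;
}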

5. Benefits / Impact / Significance of Project


The benefit of this system is an all-in-one solution that interprets body gestures and also
recognises and translates voice commands. Built around the Kinect, the system can benefit
disabled people, and future extensions may serve blind people or people with other
impairments. In addition, the availability of the Kinect system at a lower cost makes the solution
affordable, an added advantage for disabled users interacting with the computer and other technologies.

6. Uniqueness of Proposed Solution


The uniqueness of incorporating the Kinect into the proposed system is that it packages many
sensors into one unit. Using skeleton tracking to tackle usability for disabled persons is novel
in the sense that this solution is still under research. Furthermore, third parties are now developing
the Kinect for many purposes beyond playing games; for example, researchers from the MIT Media Lab
have developed DepthJS, a JavaScript extension for Google Chrome.
By using multiple sensors to tackle the problems found in conventional body tracking, other
possibilities may surface to help communities interact better and more efficiently, especially
disabled and elderly people.

7. Expected Outcomes
The expected outcome of this project is a workable system that helps disabled people interact with the OS
and with objects in a running application set up as a virtual space. Because the project targets disabled
people, interaction must be through body gestures, with voice recognition assisting specific commands.
Example operations are opening a folder, closing running applications and browsing using body gestures
alone. The next goal is to let the user interact with objects in a virtual space depicting a game
environment or even a shopping kiosk system: basic operations such as zooming in and out of the object of
interest, as well as moving and panning it, using body gestures only. The system will be validated through
questionnaires given to the participants.
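As an example of how one such gesture could be computed (an illustrative sketch, not the final design), a zoom factor can be derived from the changing distance between the two tracked hand joints:

// Illustrative sketch: derive a VR zoom factor from the distance between the
// user's two tracked hand joints. Point3 stands in for a tracked joint position.
#include <cmath>

struct Point3 { float x, y, z; };

static float distance(const Point3& a, const Point3& b)
{
    return std::sqrt((a.x - b.x) * (a.x - b.x) +
                     (a.y - b.y) * (a.y - b.y) +
                     (a.z - b.z) * (a.z - b.z));
}

// scale > 1 means the hands moved apart (zoom in); scale < 1 means zoom out.
float zoomFactor(const Point3& leftNow,   const Point3& rightNow,
                 const Point3& leftStart, const Point3& rightStart)
{
    float start = distance(leftStart, rightStart);
    if (start < 1e-3f) return 1.0f;  // guard against a degenerate start pose
    return distance(leftNow, rightNow) / start;
}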

8. Status of Project
The project is currently still ongoing; its milestones and timeframe are laid out as a Gantt chart in
Appendix A. At the moment, testing of SDK v1.8 has been completed and development of the system is in
progress. The drivers as well as the connector to the PC have been set up so that the Kinect system is
working. The next step is to call up the functions described in the proposed solution section.

9. References
1. Ren, Z., Meng, J. and Yuan, J., Robust Hand Gesture Recognition with Kinect Sensor,
Nanyang Technological University, Singapore.
2. Wolf, C., Maintaining Older People at Home: Issues and Technologies Related to
Computer Vision, ETIA 2011.
3. Ben Hadj Mohamed, A., Val, T., Andrieux, L. and Kachouri, A., Assisting People with
Disabilities through Kinect Sensor into a Smart House, 2013.
4. Alexiadis, D.S., Kelly, P., Daras, P., O'Connor, N.E., Boubekeur, T. and Ben Moussa,
M., Evaluating a Dancer's Performance Using Kinect-Based Skeleton Tracking, 2011.

10. Appendix A
No   Task                      Start   Finish   Duration
1    Project Discussion        28/9    1/10     4d
2    Proposal                  28/9    5/10     8d
3    System Development        5/10    14/12    51d
4    Testing and Analysis      14/12   14/1     24d
5    Enhancement               14/1    15/2     23d
6    Prototype Preparation     15/2    22/2     7d
7    Prototype Review          22/2    22/2     1d
8    Add, Dev & Testing        22/2    21/3     29d
9    Write Final Report        21/3    4/4      15d
10   Final Report Submission   4/4     11/4     7d
11   Final Demo                11/4    23/4     13d
12   Final Submission          23/4    13/5     -

Figure A: Project timeframe and milestones (Gantt chart)
