3D Image Reconstruction and Human Body Tracking Using Stereo Vision and Kinect Technology

Introduction
Two different types of 3D image reconstruction

methods
Generating depth perception information using high
quality stereo image textures. It is computationally
heavy and inefficient.
Kinect Technology : Overall image quality is not
refined and it is low resolution.
Explore a method that produces higher quality
images and faster computation of depth
information.
Kinect??
Motion sensing input peripheral device for the
Microsoft Xbox 360 video game console.
It consists of 3 basic components- an Infrared (IR)
Projector, an RGB Camera and an IR
Monochrome Camera.
The infrared signals are emitted from the IR projector
and the 3D depth map is generated by the reflected
infrared signals which are retrieved on the IR
monochrome camera.
Requires less computational time, hence suitable for
dynamic environment.
Microsoft Kinect..
3D Geometrical Models
It is used for representing real world objects in 3D
image models from 2D images captured by cameras.
It requires calibration of camera (as the intrinsic and
extrinsic parameters are different).
Two types of camera calibration- Single Camera
Calibration Geometry and Two Camera Image
Plane (Epipolar Geometry).
Single Camera Calibration
Geometry
It defines three different coordinate systems- a real-
world coordinate system, a camera coordinate
system and an image coordinate system.
The point on the image plane is moving from one
position to another position with respect to the
images taken by the camera for calibrating the
camera.
Intrinsic parameters are remains same. To estimate
the extrinsic, the rotation matrix R and the
translation vector t is calculated.

Calibration equation as given can be represented as:

Where,
Matrix M
1
represents the intrinsic parameters and
matrix M
2
is the extrinsic parameters.
P
w
is the real-world 3D coordinates.

Epipolar Geometry
The property of epipolar lines is defined as:

If a feature projects to a point P
l
in one camera
view, the corresponding image point P
r
in the other
camera view must lie somewhere on an epipolar line
in the camera image. An image point in camera 1
corresponds to an epipolar line in camera 2 and vice
versa.

3D Reconstruction and
Human Tracking

OpenCV Stereo Vision Calibration
3D geometrical model requires camera calibration
and depth perception mapping.
OpenCV stereo vision uses two cameras to construct
stereoscopic image.
The intrinsic and extrinsic parameters of the two
cameras are analyzed.
Stereo calibration is implemented by transforming
those estimated parameters from each camera to their
joint coordinate systems.
The corners of the chessboard image calibrate the real-
world coordinate system with the 3D geometrical
model.
3D Image Reconstruction
It requires the computation of depth image
perception image .
Firstly, the intrinsic and extrinsic parameters are
estimated using openCV stereo vision combined with
the openCV calibration function.
Then, calculate the 3D depth map and derived 3D
reconstructed image.
OpenCV stereo vision 3D depth map calculation
requires heavy computation which is inefficient for
real-time 3D video streaming.

Noise is present on the depth map which led to a
corrupted image for 3D reconstruction due to high
resolution mapping and the depth map
estimation errors.

3D image reconstruction
using Kinect
Requires less computational time compared to the
openCV stereo vision.
IR sensors requires less time to obtain depth
information and also gives accurate depth
perception.
By integrating the 3D depth map (obtained from
Kinect sensors) with the image from the RGB
camera on Kinect with the help of OpenGL (Open
Graphics Library), 3D image is reconstructed.

The main disadvantage of this method is the low
quality reconstructed 3D image, due to the low
quality of the Kinect RGB camera.

3D Image Reconstruction using
both Kinect and HD camera
Fast computation of depth perception
information from Kinect and high quality texture
from the HD webcam are combined to improve the
reconstructed 3D image.
Using Kinect, a 3D depth map was retrieved and
integrated with high quality image from the camera in
OpenGL, which is suitable for real time dynamic
systems.

This design flow based on Kinect and one HD
camera offers real-time high resolution 3D image
with 30 FPS (Frame per Second).

The reconstructed 3D image has high resolution,
which requires less computational time.
Conclusion
OpenCV stereo vision and Kinect are introduced for
3D image reconstruction. Taking advantage of the
Kinect depth map with infrared sensors, faster
generation of depth map is realizable.
Also, with the high definition webcam, texture of
reconstructed 3D images can be improved.
Thus, combing the Kinect with the HD webcam
can deliver high quality 3D image reconstruction for
real-time video streaming. This high quality 3D
image can be used for hand tracking and finger
gesture detection and recognition.

Thank you

3D Image Reconstruction and Human Body Tracking Using Stereo Vision and Kinect Technology

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

3D Image Reconstruction and Human Body Tracking Using Stereo Vision and Kinect Technology

Uploaded by

Copyright:

Available Formats

Introduction

Two different types of 3D image reconstruction

You might also like