methods
Generating depth perception information from high-quality stereo image textures is computationally heavy and inefficient.
Kinect technology: the overall image quality is unrefined and low resolution.
The aim is to explore a method that produces higher-quality images and computes depth information faster.
Kinect
A motion-sensing input peripheral for the Microsoft Xbox 360 video game console. It consists of three basic components: an infrared (IR) projector, an RGB camera and an IR monochrome camera. The IR projector emits infrared signals, and the 3D depth map is generated from the reflected signals captured by the IR monochrome camera. It requires less computational time and is therefore suitable for dynamic environments.
Microsoft Kinect
3D Geometrical Models
Used to represent real-world objects as 3D image models built from 2D images captured by cameras. This requires camera calibration, because the intrinsic and extrinsic parameters differ between cameras. Two types of camera calibration geometry are considered: single camera calibration geometry and two-camera image plane (epipolar) geometry.
Single Camera Calibration Geometry
It defines three coordinate systems: a real-world coordinate system, a camera coordinate system and an image coordinate system. Across the images taken to calibrate the camera, a point's projection on the image plane moves from one position to another. The intrinsic parameters remain the same. To estimate the extrinsic parameters, the rotation matrix R and the translation vector t are calculated.
The calibration equation can be represented as:
where matrix M_1 holds the intrinsic parameters, matrix M_2 holds the extrinsic parameters, and P_w is the real-world 3D coordinates of the point.
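The roles of M_1, M_2 and P_w can be illustrated with a minimal sketch. It assumes the usual pinhole form s * [u, v, 1]^T = M_1 * M_2 * [X, Y, Z, 1]^T, which may not match the notation of the original equation. The numeric values (focal length, principal point, translation) and the helper names `matmul` and `project` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def project(M1, M2, P_w):
    """Project a world point (X, Y, Z) to pixel coordinates (u, v).

    Assumes the pinhole form s * [u, v, 1]^T = M1 * M2 * [X, Y, Z, 1]^T.
    """
    homogeneous = [[P_w[0]], [P_w[1]], [P_w[2]], [1.0]]   # 4x1 column vector
    s_uv = matmul(matmul(M1, M2), homogeneous)            # 3x1 column vector
    s = s_uv[2][0]                                        # scale = depth along the optical axis
    return s_uv[0][0] / s, s_uv[1][0] / s


# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
M1 = [[500.0, 0.0, 320.0],
      [0.0, 500.0, 240.0],
      [0.0, 0.0, 1.0]]

# Hypothetical extrinsics [R | t]: identity rotation, world origin 2 units
# in front of the camera.
M2 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 2.0]]

u, v = project(M1, M2, (0.2, -0.1, 0.0))
```

Calibration runs this relation in reverse: given known world points (for example chessboard corners) and their observed pixels, it estimates M_1 and M_2.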
Epipolar Geometry
The property of epipolar lines is defined as follows.
If a feature projects to a point P_l in one camera view, the corresponding image point P_r in the other camera view must lie somewhere on an epipolar line in that camera's image. An image point in camera 1 corresponds to an epipolar line in camera 2, and vice versa.
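The point-to-line correspondence can be sketched with a fundamental matrix F, which is not mentioned in the slides and is introduced here as an assumption. With the convention P_r^T * F * P_l = 0, the product F * P_l gives the coefficients of the epipolar line in the other image. The matrix below is the idealised case of a rectified pair separated by a purely horizontal shift. The function names are hypothetical.

```python
def epipolar_line(F, p_l):
    """Return coefficients (a, b, c) of the line a*x + b*y + c = 0 in the right image."""
    x, y = p_l
    return tuple(F[i][0] * x + F[i][1] * y + F[i][2] for i in range(3))


def distance_to_line(line, p_r):
    """Perpendicular pixel distance from the point p_r to the line."""
    a, b, c = line
    return abs(a * p_r[0] + b * p_r[1] + c) / (a * a + b * b) ** 0.5


# Assumed fundamental matrix of an ideal rectified pair (horizontal shift only).
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

line = epipolar_line(F, (400.0, 215.0))            # horizontal line through row 215
on_line = distance_to_line(line, (370.0, 215.0))   # candidate match on the same row
off_line = distance_to_line(line, (370.0, 220.0))  # candidate five rows lower
```

In this rectified case the search for P_r reduces to a scan along a single image row, which is why stereo pipelines commonly rectify the images before matching.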
3D Reconstruction and Human Tracking
OpenCV Stereo Vision Calibration
A 3D geometrical model requires camera calibration and depth perception mapping. OpenCV stereo vision uses two cameras to construct a stereoscopic image. The intrinsic and extrinsic parameters of the two cameras are estimated, and stereo calibration is implemented by transforming the estimated parameters of each camera into their joint coordinate system. The corners of a chessboard image tie the real-world coordinate system to the 3D geometrical model.
3D Image Reconstruction
It requires computing a depth perception image. First, the intrinsic and extrinsic parameters are estimated using OpenCV stereo vision combined with the OpenCV calibration function. Then the 3D depth map is calculated and the 3D reconstructed image is derived from it. The OpenCV stereo vision 3D depth map calculation requires heavy computation, which makes it inefficient for real-time 3D video streaming.
Noise is present in the depth map, caused by high-resolution mapping and depth map estimation errors, and it leads to a corrupted 3D reconstructed image.
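The sensitivity of stereo depth to estimation errors can be illustrated with the standard rectified-stereo relation Z = f * B / d (depth from focal length, baseline and disparity). The slides do not state this relation, so it is an assumption here, as are the rig parameters below. This sketch covers only the final conversion step, not the disparity search that dominates the computation.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")          # no valid match, or a point at infinity
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 500 px focal length, 10 cm baseline.
FOCAL_PX, BASELINE_M = 500.0, 0.10

near = depth_from_disparity(50.0, FOCAL_PX, BASELINE_M)   # large disparity, close point
far = depth_from_disparity(5.0, FOCAL_PX, BASELINE_M)     # small disparity, distant point

# Effect of a one-pixel matching error at each range.
near_error = abs(depth_from_disparity(49.0, FOCAL_PX, BASELINE_M) - near)
far_error = abs(depth_from_disparity(4.0, FOCAL_PX, BASELINE_M) - far)
```

With these numbers, a one-pixel error shifts the near point by about 2 cm and the far point by 2.5 m, so small matching errors at low disparity show up as large depth noise.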
3D Image Reconstruction using Kinect
It requires less computational time than OpenCV stereo vision. The IR sensors need less time to obtain depth information and also give accurate depth perception. The 3D image is reconstructed by integrating the 3D depth map (obtained from the Kinect sensors) with the image from the Kinect RGB camera, using OpenGL (Open Graphics Library).
The main disadvantage of this method is the low-quality reconstructed 3D image, caused by the low quality of the Kinect RGB camera.
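One way to picture the integration of the depth map with the RGB image is to back-project every depth pixel into a coloured 3D point, which a renderer such as OpenGL could then draw as vertices. The sketch below assumes the depth and colour images are already aligned pixel for pixel and that a depth of 0 marks a missing reading. The intrinsics and the function name `depth_to_points` are hypothetical, and no Kinect or OpenGL API is called.

```python
def depth_to_points(depth_m, rgb, fx, fy, cx, cy):
    """Turn a depth map and an aligned RGB image into coloured 3D points.

    depth_m and rgb are row-major nested lists of the same size.
    Pixels with depth <= 0 are treated as missing (an assumption).
    """
    points = []
    for v, row in enumerate(depth_m):
        for u, z in enumerate(row):
            if z <= 0:
                continue                      # skip invalid depth readings
            x = (u - cx) * z / fx             # invert the pinhole projection
            y = (v - cy) * z / fy
            points.append(((x, y, z), rgb[v][u]))
    return points


# Tiny 2x2 frame with hypothetical intrinsics, for illustration only.
depth = [[1.0, 0.0],
         [2.0, 2.0]]
colors = [[(255, 0, 0), (0, 255, 0)],
          [(0, 0, 255), (255, 255, 255)]]
cloud = depth_to_points(depth, colors, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Each output point takes its colour from the RGB frame, so the visual quality of the reconstruction is bounded by that camera's image quality.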
3D Image Reconstruction using both Kinect and HD camera
Fast computation of depth perception information from the Kinect and high-quality texture from the HD webcam are combined to improve the reconstructed 3D image. A 3D depth map is retrieved using the Kinect and integrated in OpenGL with the high-quality image from the HD camera, which makes the approach suitable for real-time dynamic systems.
This design flow, based on the Kinect and one HD camera, delivers real-time high-resolution 3D images at 30 FPS (frames per second).
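Combining Kinect depth with HD texture amounts to finding, for each 3D point measured by the Kinect, the pixel of the HD image that sees it. A minimal sketch follows, assuming a rotation R and translation t from Kinect coordinates to the HD camera frame (as a calibration between the two devices would provide) and pinhole intrinsics for the HD camera. All values and the function name `hd_texture_coords` are hypothetical, and this is not necessarily how the original system performs the mapping.

```python
def hd_texture_coords(point_kinect, R, t, fx, fy, cx, cy, width, height):
    """Map a 3D point in Kinect coordinates to normalised HD texture coordinates.

    R (3x3) and t (length 3) take Kinect coordinates into the HD camera frame.
    Returns (tex_u, tex_v) in [0, 1), or None if the point is not visible.
    """
    # Rigid transform into the HD camera frame.
    xc, yc, zc = (sum(R[i][k] * point_kinect[k] for k in range(3)) + t[i]
                  for i in range(3))
    if zc <= 0:
        return None                           # behind the HD camera
    u = fx * xc / zc + cx                     # pinhole projection to pixels
    v = fy * yc / zc + cy
    if not (0 <= u < width and 0 <= v < height):
        return None                           # outside the HD image
    return u / width, v / height


# Hypothetical setup: HD camera 5 cm to the right of the Kinect, 1920x1080 image.
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [-0.05, 0.0, 0.0]
HD = dict(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0, width=1920, height=1080)

centre = hd_texture_coords((0.05, 0.0, 1.0), R, t, **HD)
hidden = hd_texture_coords((0.0, 0.0, -1.0), R, t, **HD)
```

The per-point work is a handful of multiplications and one division, which is consistent with the approach remaining practical at real-time frame rates.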
The reconstructed 3D image has high resolution and requires less computational time.
Conclusion
OpenCV stereo vision and Kinect are introduced for 3D image reconstruction. Taking advantage of the Kinect's infrared depth sensing, the depth map can be generated faster. In addition, the high-definition webcam improves the texture of the reconstructed 3D images. Thus, combining the Kinect with the HD webcam can deliver high-quality 3D image reconstruction for real-time video streaming. This high-quality 3D image can be used for hand tracking and for finger gesture detection and recognition.