
ENGN8530: Computer Vision and Image Understanding: Theories and Research

Topic 1: Introduction to Computer Vision and Image Understanding


Dr Chunhua Shen and Dr Roland Goecke
VISTA / NICTA & RSISE, ANU

What is Computer Vision?


"Vision is a process that produces, from images of the external world, a description that is useful to the viewer and not cluttered with irrelevant information." (Marr and Nishihara, 1978)

Computer vision is the science and technology of machines that see; it is concerned with the theory and technology for building artificial systems that obtain information from images or multi-dimensional data. (Wikipedia)

Reference: D. Marr and H.K. Nishihara, "Representation and recognition of the spatial organisation of three-dimensional shapes", Proc. Royal Society of London B, Vol. 200, 1978, pp. 269-294.

What is Computer Vision? (2)


Computer vision is sometimes seen as complementary to biological vision. Biological vision studies the visual perception of humans and various animals, resulting in models of how these systems operate in terms of physiological processes. Computer vision, on the other hand, studies and describes artificial vision systems that are implemented in software and/or hardware.

What is Computer Vision? (3)


Applications:
- Controlling processes (robots, vehicles)
- Detecting events (visual surveillance)
- Organising information (indexing databases of images / videos)
- Modelling objects or environments (medical image analysis)
- Interaction (HCI)

Source: Wikipedia

Image Understanding
Computer vision goes hand in hand with image understanding:
- What information do we need in order to understand the scene?
- How can we make decisions about which objects are present, their shape and their position?

Source: CMU Computer Vision course

Image Understanding (2)


There are many different questions and approaches for solving computer vision / image understanding problems:
- Can we build useful machines to solve specific (and limited) vision problems?
- Is there anything special about the environment which makes vision possible?
- Can we build a model of the world / scene from 2D images?

Many different fields are involved, e.g. computer science, AI, neuroscience, psychology, engineering, philosophy, art.


Sub-areas of CVIU
- Scene reconstruction
- Event detection
- Object tracking
- Object recognition
- Object structure recovery
- Ego-motion
- Multi-view geometry
- Indexing of image / video databases

Scene Reconstruction

[Figures: scene reconstruction from stereo and from multiple views]

Event Detection

[Figures: event detection examples. Sources: MERL; Roland Goecke]

Object Tracking

[Figure: object tracking example. Source: Roland Goecke]

Object Recognition

[Figure: object recognition by image retrieval: a query image is matched against a database to produce a result. Source: David Nister]

Object Structure Recovery

Reference: A.D. Worrall, J.M. Ferryman, G.D. Sullivan and K.D. Baker, "Pose and structure recovery using active models", Proc. 6th British Machine Vision Conference, Vol. 1, Birmingham, UK, 1995, pp. 137-146.

Ego-motion
[Figures: ego-motion estimation: optical flow and the estimated camera path. Source: Roland Goecke]

Multi-View Geometry

[Figure: epipolar geometry. Source: Richard Hartley and Andrew Zisserman]
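The geometric heart of two-view geometry is the epipolar constraint x'ᵀ F x = 0, where F is the fundamental matrix. Below is a minimal sketch of checking it, using a made-up rank-2 matrix F (not from any real camera pair):

```python
import numpy as np

# A hypothetical fundamental matrix, for illustration only
# (3x3 skew-symmetric, hence rank 2, as a fundamental matrix must be).
F = np.array([[ 0.0, -0.1,  0.2],
              [ 0.1,  0.0, -0.3],
              [-0.2,  0.3,  0.0]])

x = np.array([100.0, 50.0, 1.0])     # point in image 1 (homogeneous coords)
l = F @ x                            # its epipolar line in image 2

# The direction of line l is (l[1], -l[0], 0); since x lies on l here
# (x.T @ F @ x = 0 for any skew-symmetric F), moving along that direction
# stays on the line.
xp = x + 5.0 * np.array([l[1], -l[0], 0.0])

# A point in image 2 can correspond to x only if it lies on l,
# i.e. the epipolar residual x'^T F x is zero.
print("epipolar residual:", xp @ l)  # 0.0 up to floating-point error
```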

Indexing and Retrieval


[Figure: video retrieval: a query region and the ranked results returned from the video database]

Reference: J. Sivic and A. Zisserman, "Video Google: A Text Retrieval Approach to Object Matching in Videos", Proc. International Conference on Computer Vision, Nice, France, 2003, pp. 1470-1477.

The Default Approach (Marr)


Work bottom-up from the image to a 3D world model via a hierarchy of representations:
- Pixel array: the image
- Raw primal sketch: edge, corner, etc. representation
- Primal sketch: structural information, i.e. groupings, segmentations, etc.
- 2½-D sketch: depth information in an image-centred view
- 3-D world model

Reference: D. Marr, Vision, Freeman, 1982.

The Default Approach (2)


- Image sensor: visible, infra-red, radar
- Image capture: digitisation
- Image processing: feature detection (edges, corners, regions)
- Feature grouping
- Characterisation of parts
- Object recognition
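As one concrete instance of the feature-detection stage, here is a minimal sketch of Sobel gradient-magnitude edge detection in plain NumPy (the kernels are the standard Sobel masks; the test image is illustrative):

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Gradient-magnitude edge map: a toy version of the 'feature
    detection' stage above (assumes a 2D grayscale array)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

# A synthetic image with a vertical step edge:
img = np.zeros((8, 8))
img[:, 4:] = 1.0
print(sobel_edges(img))  # large responses along the step, zero elsewhere
```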

What is in an Image?
An image is an array (matrix) of values (picture elements = pixels) on a plane, which describes the world from the point of view of the observer. Because of the line-of-sight effect, it is a 2D representation of the 3D world. The meaning of the pixels depends on the sensors used for their acquisition.
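A minimal sketch of this view of an image as an array, using NumPy (the array contents are arbitrary illustrative values):

```python
import numpy as np

# A 4x4 grayscale image: one 8-bit value per pixel (0 = black, 255 = white).
img = np.zeros((4, 4), dtype=np.uint8)
img[1:3, 1:3] = 200          # a bright square on a dark background
print(img)
print("shape:", img.shape, "dtype:", img.dtype)

# A colour image simply adds a channel axis, e.g. (rows, cols, 3) for RGB.
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[1:3, 1:3] = (255, 0, 0)  # the same square, now red
```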
Source: Antonio Robles-Kelly

Imaging Sensors
The information seen by the imaging device is digitised and stored as pixel values. Two important quantities of imaging sensors are (sketched below):
- Spatial resolution: how many pixels are there? (image size)
- Signal resolution: how many values per pixel?
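A quick sketch of the two quantities in NumPy terms (the toy image and the 2-bit requantisation are purely illustrative):

```python
import numpy as np

# A toy 4x4, 8-bit image with a smooth ramp of values.
img = (np.arange(16).reshape(4, 4) * 16).astype(np.uint8)

# Spatial resolution: the number of pixels.
# Halve it by keeping every second sample in each direction.
low_res = img[::2, ::2]          # now a 2x2 image

# Signal resolution: the number of values per pixel.
# Reduce 8 bits (256 levels) to 2 bits (4 levels).
low_depth = (img >> 6) << 6      # keep only the top 2 bits

print(low_res)
print(low_depth)
```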

There are many different types of sensors:
- Optical: CCDs, CMOS, photodiodes, photomultipliers, photoresistors
- Infrared: bolometers
- Others: range sensors (laser), Synthetic Aperture Radar (SAR), Positron Emission Tomography (PET), Computed (Axial) Tomography (CAT/CT), Magnetic Resonance Imaging (MRI)

Electro-Magnetic Spectrum
[Figure: electromagnetic spectrum bands: UV, visible and NIR (0.4–1.0 µm), SWIR (1.7–3.0 µm), MWIR (5.0–8.0 µm), LWIR (to 14.0 µm)]

The human eye can see light between 400 and 700 nm.

Charge-Coupled Device (CCD)


CCDs (Charge-Coupled Devices) were invented in 1969 by Willard Boyle and George Smith at AT&T Bell Labs. They are composed of an array of capacitors that are sensitive to light. More modern devices are based on photodiodes.

Source: Wikipedia


CCD (2)
Generally, the light-sensitive units are arranged in an array whose topology is a lattice (not always true, e.g. log-polar CCDs). Colour CCDs place a mosaic of colour filters over the array (sketched below):
- Bayer filter: 1x red, 1x blue, 2x green, because the human eye is more sensitive to green
- RGBE filter: 1x red, 1x blue, 1x green, 1x emerald (cyan)
[Figures: Bayer filter and RGBE filter layouts. Source: Wikipedia]
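A minimal sketch of how an RGGB Bayer mosaic samples a colour image (pattern phase and the demosaicing step that real cameras apply afterwards are omitted):

```python
import numpy as np

def bayer_mosaic(rgb: np.ndarray) -> np.ndarray:
    """Sample an RGB image through an RGGB Bayer pattern: each pixel
    keeps only the one colour channel its filter lets through."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G (two greens per 2x2 block)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B
    return mosaic

rgb = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
print(bayer_mosaic(rgb))
```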

Bolometers
Invented by the astronomer Samuel Pierpont Langley in 1878. A bolometer consists of an "absorber" in contact with a heat sink through an insulator. The sink can be viewed as a temperature reference for the absorber, whose temperature is raised by the power of the incident electromagnetic wave.

Source: Los Alamos National Laboratory

Microbolometer
The microbolometer, a particular kind of bolometer, is the basis of thermal cameras. It is a grid of vanadium oxide or amorphous silicon heat sensors atop a corresponding grid of silicon. Infrared radiation in a specific range of wavelengths strikes the vanadium oxide and changes its electrical resistance. This resistance change is measured and processed into temperatures, which can be represented graphically.

Source: Roland Goecke

Synthetic Aperture Radar


SAR is an active sensing technique:
- The active sensor transmits radio waves
- The antenna picks up the reflections

For a conventional radar, the footprint is governed by the size of the antenna (the aperture). SAR creates a synthetic aperture and delivers a 2D image: one dimension is the range (cross-track), the other the azimuth (along-track). Sonar and ultrasound work on the same principles, but at different wavelengths.

SAR (2)
[Figure: SAR imaging geometry, showing the nadir track, radar track, azimuth and range directions; SAR image of Venus. Source: Wikipedia]

RADAR = Radio Detection and Ranging; nadir = the direction opposite the zenith.

Positron Emission Tomography


An active sensing technique, based on measuring emitted radiation. PET is a nuclear medicine imaging technique that uses radiation from a radio-isotope introduced into the target. PET produces a 3D image or map of functional processes in the body.

Source: Wikipedia

Magnetic Resonance Imaging


An active sensing technique, also based on measuring emitted radiation. MRI stimulates the emission of radiation by aligning the spins of (the hydrogen nuclei in) water molecules, making use of a strong magnetic field (several tesla!).
- Good for showing soft tissue
- Not good for showing bones

[Figures: MRI scan; magnetic resonance angiography. Source: Wikipedia]

Functional MRI
Functional MRI (fMRI) measures signal changes in the brain that are due to changing neural activity. Increases in neural activity change the MR signal through the ratio of oxygenated to deoxygenated haemoglobin; deoxygenated haemoglobin attenuates the MR signal.

[Figure: fMRI of the head; highlighted areas show the primary visual cortex. Source: Wikipedia]

Computed (Axial) Tomography


Employs a set of axially acquired X-ray images to recover a 3D representation of the object. Originally the images were in axial or transverse planes, but modern CT scanners deliver volumetric data. Digital geometry processing is used to generate a 3D image of the inside of an object from a large series of 2D X-ray images taken around a single axis of rotation.

[Figure: CT scan of the head. Source: Wikipedia]

CAT/CT
- Good for showing bones
- Not good for showing soft tissue

[Figure: modern diagnostic software]

Camera Geometry
[Figure: camera geometry: aperture of diameter d, object distance z along the optical axis, focal length f, image-plane coordinates (x', y')]

- The aperture allows light to enter the camera.
- The image plane is where the image is formed.
- The focal length is the distance between the aperture and the image plane.
- The optical axis passes through the centre of the aperture and is perpendicular to it.

Camera Geometry (2)


[Figure: a point at horizontal offset x and distance z projects through the aperture to x' on the image plane]

By similar triangles, x'/f = x/z, so x' = xf/z, or x' = f tan θ. For small angles, x' ≈ fθ.
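A minimal sketch of this projection equation (the focal length and point coordinates are illustrative values):

```python
# Pinhole projection: x' = f * x / z (and likewise for y).
def project(x: float, y: float, z: float, f: float) -> tuple[float, float]:
    """Map a 3D point in camera coordinates (z > 0) onto the image plane."""
    return f * x / z, f * y / z

f = 0.05                            # 50 mm focal length, in metres
print(project(1.0, 0.5, 10.0, f))   # (0.005, 0.0025)
print(project(1.0, 0.5, 20.0, f))   # twice the distance -> half the image size
```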


ENGN8530: CVIU 33

Camera Geometry (3)


[Figure: an object extending from x_b to x_t at distance z projects to the image segment from x'_b to x'_t]

Using the formula on the previous slide, x'_t = x_t f / z and x'_b = x_b f / z, so x'_t − x'_b = (x_t − x_b) f / z. Hence size transforms as Δx' = f Δ(tan θ) ≈ f Δθ.

Camera Geometry (4)

[Figure: ray bundles from a distant object and from a close object passing through the aperture]

- Rays that pass through the camera aperture spread out and do not make a sharp point on the image; they need to be focussed to make a sharp point.
- The rays from close objects diverge more than those from distant objects.
- For very distant objects, the rays are effectively parallel.

Aperture and Resolution


Light diffracts as it passes through the aperture
A point in the scene spreads out into a blob in the image (fundamental limit on image sharpness)
[Figure: diffraction patterns for a circular aperture (Airy disk) and a square aperture, and two just-separable points]

The size of the Airy disk sets the best achievable resolution (Rayleigh criterion):

θ_min = 1.22 λ / d (angular), R_min = 1.22 λ f / d (on the image plane),

where λ is the wavelength of the light, d is the aperture diameter and f is the focal length.
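A quick numeric sketch of the Rayleigh criterion (wavelength, aperture and focal length are illustrative values):

```python
# Rayleigh criterion: theta_min = 1.22 * lam / d; on the image plane,
# R_min = theta_min * f = 1.22 * lam * f / d.
lam = 550e-9    # green light, in metres
d   = 0.035     # aperture diameter (m)
f   = 0.05      # focal length (m)

theta_min = 1.22 * lam / d
R_min = theta_min * f
print(f"theta_min = {theta_min:.3e} rad, R_min = {R_min:.3e} m")
```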

Resolution
The resolution of a camera is the minimum separation between two points such that they appear separately on the image plane. Since distant objects appear smaller and closer together, the resolution varies with the distance:
- The angle between separable objects does not vary with distance: angular resolution.
- The distance between them on the image plane does not vary: image-plane resolution.

Camera Models
- Pinhole camera
- Camera with lenses

Pinhole Camera
Advantages:
- No distortion of the image
- Depth of field from a few cm to infinity
- Wide angular field
- Works with ultraviolet light and X-rays

Disadvantages:
- Very limited light gathering
- Poor resolution

Pinhole Camera (2)

The simplest camera. The pinhole (aperture d) must be small to get a sharp image, but we need a large pinhole to gather enough light!

Pinhole Camera (3)


[Figure: geometric and diffraction limits on the resolution R as a function of focal length]

For distant objects the geometric limit is R = d: the image of a point can never be smaller than the pinhole itself. The diffraction limit is R = 1.22 λ f / d. The best resolution occurs when these two are equal:

d = 1.22 λ f* / d, or f* = d² / (1.22 λ),

where f* is the optimal focal length.
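A one-line check of the optimal focal length formula, using the 0.5 mm pinhole from the examples a few slides below:

```python
# Optimal pinhole focal length: geometric blur (d) = diffraction blur (1.22*lam*f/d).
lam = 550e-9            # wavelength (m)
d   = 0.5e-3            # pinhole diameter (m)
f_star = d**2 / (1.22 * lam)
print(f"f* = {f_star:.2f} m")    # ~0.37 m, i.e. the 37 cm quoted later
```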



Pinhole Camera (4)


[Figure: resolution vs. focal length: the geometric limit, and diffraction limits for a longer wavelength and for a smaller aperture]

Cameras with Lenses


For better light-gathering capabilities, we need to increase the aperture. A lens removes the geometric limit on resolution, since it focuses all light entering through the aperture on the same point on the image.

[Figure: a lens focuses all rays entering the aperture onto the same image point; the pinhole path is shown for comparison]

Cameras with Lenses (2)


We can have apertures as large as we like. The price to pay: chromatic and spherical aberration. The resolution of a lens-based camera is the diffraction limit of the aperture:

θ = 1.22 λ / d

The larger the aperture, the better the resolution. The image-plane resolution is still

R = 1.22 λ f / d

Camera Resolution Examples


- Pinhole camera, 0.5 mm pinhole: optimal focal length f* = 37 cm; θ = 4.6', equivalent to 1 mm at 75 cm
- 35 mm lens camera, visible light: θ = 3.9'', 1 mm at 52 m; the focal length depends on the lens, but is typically < 10 cm
- Human eye, pupil 4.5 mm (average): θ = 28'', 1 mm at 7.4 m; focal length ~2 cm
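These figures follow from the Rayleigh criterion; a sketch reproducing them (assuming a wavelength of roughly 500–550 nm, which is what the quoted numbers imply):

```python
import math

lam = 550e-9    # assumed wavelength (m); the eye figure matches ~500 nm
for name, d in [("0.5 mm pinhole", 0.5e-3),    # 276'' = 4.6'
                ("35 mm aperture", 35e-3),
                ("4.5 mm pupil",   4.5e-3)]:
    theta = 1.22 * lam / d                     # angular resolution (rad)
    arcsec = math.degrees(theta) * 3600
    print(f"{name}: {arcsec:6.1f}''  (1 mm resolved at {1e-3 / theta:5.2f} m)")
```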


Illumination
The amount of light entering the camera is proportional to the area of the lens (πd²/4). The area covered by the image is proportional to f². So the brightness of the image is proportional to d²/f², i.e. it depends on the focal ratio f/d. Brightness is controlled by a movable aperture which changes d, referred to by a sequence of f-stops: f/1 is fully open, and each successive f-stop halves the brightness (the aperture diameter is reduced by √2): f/1.4, f/2, f/2.8, f/4, f/5.6.
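A tiny sketch of the f-stop sequence and its effect on brightness:

```python
import math

# Brightness scales as (d/f)^2, so halving the light means dividing the
# aperture diameter d by sqrt(2), i.e. multiplying the f-number by sqrt(2).
for k in range(5):
    N = math.sqrt(2) ** k            # f-numbers: 1, 1.4, 2, 2.8, 4
    print(f"f/{N:.1f}: relative brightness {1 / N**2:.3f}")
```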

Absorption and Reflection


Reflected + absorbed + transmitted energy = incident light energy.

All of these are dependent on the object (material, surface)!

[Figure: reflection, absorption and transmission at a surface]

The BSDF
Bidirectional Scattering Distribution Function: describes the way in which light is scattered by a surface.

BSDF = BRDF + BSSRDF + BTDF, where:
- BRDF: bidirectional reflectance distribution function
- BSSRDF: bidirectional surface scattering reflectance distribution function (incl. subsurface scattering)
- BTDF: bidirectional transmittance distribution function

Source: Wikipedia

The BRDF
It describes the reflectance of an object as a function of the illumination, the viewing geometry and the wavelength. It is given by the ratio of reflected radiance (reflected flux per unit area per unit solid angle) to incident irradiance (incident flux per unit area).

Reference: F. Nicodemus, "Reflectance nomenclature and directional reflectance and emissivity", Appl. Opt., Vol. 9, 1970, pp. 1474-1475.

The BRDF (2)


The modelling of the lighting conditions in the scene is of pivotal importance for the acquisition and processing of digital imagery. The radiance function can be decomposed into a linear combination of ambient, diffuse and specular components. Recovering the radiance function from a single image is an underconstrained problem.


The BRDF (3)


In general, the BRDF has the following form:

f_r(ω_i, ω_r) = dL_r(ω_r) / (L_i(ω_i) cos θ_i dω_i)

The function depends on:
- the incoming and outgoing angle
- the incoming and outgoing wavelength
- the incoming and outgoing polarisation
- the incoming and outgoing position (subsurface scattering)
- the delay between the incoming and outgoing light rays

Radiance
Radiance is the power per unit projected area perpendicular to the ray, per unit solid angle in the direction of the ray. The flux through area dA into solid angle dω is

dΦ = L(x, ω) cos θ dω dA

The solid angle is proportional to the surface area S of the projection of the object onto a sphere, divided by the square of the sphere's radius R (ω = S/R²).
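A small Monte Carlo sketch of the flux integral: for constant radiance L over the hemisphere, Φ = L · A · ∫ cos θ dω = π · L · A, and the cosine-weighted integral indeed comes out as π:

```python
import math, random

# Estimate the integral of cos(theta) over the hemisphere by uniform sampling.
# Uniform solid-angle sampling: cos(theta) is uniform in [0, 1), pdf = 1/(2*pi).
random.seed(0)
n, total = 100_000, 0.0
for _ in range(n):
    cos_t = random.random()
    total += cos_t * 2 * math.pi     # estimator: cos(theta) / pdf
print(total / n, "vs", math.pi)      # ~3.14
```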

Example BRDFs
Oren and Nayar; Cook and Torrance

[Figures: example renderings and formulas for the Oren-Nayar and Cook-Torrance BRDF models]

Example BRDFs (2)

[Formula figure not reproduced], where m_p is the microfacet slope.

Example BRDFs (3)


Phong

[Formula figure not reproduced]
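A minimal sketch of the Phong model in its reflection-vector form (k_d, k_s and the shininess exponent are illustrative parameters, not values from the slide):

```python
import numpy as np

def phong(n, l, v, kd=0.7, ks=0.3, shininess=32):
    """Phong reflection: a diffuse term plus a specular lobe around the
    mirror direction r. n, l, v are unit normal, light and view vectors."""
    r = 2 * np.dot(n, l) * n - l               # mirror reflection of l about n
    diffuse = kd * max(np.dot(n, l), 0.0)
    specular = ks * max(np.dot(r, v), 0.0) ** shininess
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.6, 0.8])                  # already unit length
v = np.array([0.0, -0.6, 0.8])                 # exactly the mirror direction
print(phong(n, l, v))                          # diffuse 0.56 + specular 0.30
```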

