
Chapter 6

Imaging Hardware
With advances in image sensors and their supporting interfaces, image-based
measurement platforms will continue to offer higher performance and more
programmable features. While the evolving technologies will simplify the physical
connection and setting up of machine vision systems and open new application
areas, the level of benefits will continue to depend on the collective ability
of application engineers and system designers to quantify the characteristic
parameters of their target applications. Although software can enhance a noisy,
distorted, or defocused image, some of the measurement features embedded in the
target scene may be lost in the process. Thus, a good source image, rather than a
numerically enhanced image, is an essential building block of a successful machine
vision application. Capturing an image is not difficult, but acquiring an image with
the required characteristic features of the target requires insight into the imaging
hardware.
This chapter starts with an outline description of video signals and their
standards in the context of image display. This subject is followed by a description
of the key components of framegrabbers and their performance. With the
increasing use of images from moving targets for higher throughput and demands
on measurement accuracy, latency and resolution have become important in the
overall assessment of a machine vision system. This chapter concludes with some
definitions and concepts associated with these topics and illustrative examples.

6.1 Image Display1–10


As described in Chapter 2, the human visual system does not respond instantly to
a given stimulus, nor does the sensation cease immediately when the stimulus is
removed. Persistence of vision is a special feature of the eye that discriminates
the intensity of a time-varying stimulus up to the critical flicker frequency (CFF) (see Sec. 2.7). Since
the brightness values of neighboring pixels within an image are displayed as
continuous streams of gray and dark patches on a 2D plane, the CFF is related to
the brightness of the source as well as its 2D spatial variation. Early experimental
work indicated that the human visual system has a CFF of around 50 cycles/sec.1
In traditional cinematic films, the actual projection rate is 24 picture frames/sec.
To meet the above CFF, each frame is mechanically interrupted to present the
same picture twice, thus giving an effective repetition rate of 48 picture frames/sec.

This leads to two basic terms: (1) update rate: the actual new picture frames/sec,
and (2) refresh rate: the number of times the same picture frame is presented
(twice the update rate in cinematic films). Computer monitors may be interlaced or
noninterlaced. In noninterlaced displays, the picture frame is not divided into two
fields. Therefore, noninterlaced monitors have only a refresh rate, typically upward
of 70 Hz. The refresh rate and resolution in multisync monitors are programmable.
Since the refresh rate depends on the number of rows to scan, it restricts the
maximum resolution, which is in turn related to the physical size of the monitor.
Image frames have traditionally been displayed by a raster-based CRT monitor,
which consists of an electronic beam moving on a 2D plane (display screen) and
a beam intensity that varies along the perpendicular axis (Fig. 6.1). The display
signal may be considered to be a spatially varying luminous signal. The timing
for the horizontal and vertical deflection scan and the amplitude of the luminous
signal are specified by the Electronic Industries Alliance (EIA) Recommended
Standard-170 (commonly referred to as RS-170) and the CCIR standards. Table 6.1
lists some of the key parameters in RS-170 plus three video standards. The
RS-343A standard, originally created for high-resolution closed-circuit television
cameras, defines higher resolution as 675 to 1023 lines/image frame with timing
waveforms modified from the RS-170 to provide additional signal characteristics.
The RS-170A, a modification of the RS-170 standard, works with color video
signals by adding color information to the existing monochrome brightness signal.

Figure 6.1 (a) Main components of a CRT (interlaced) display. (b) Excitation signals for
the x- and y-deflection coils control the beam location on the display surface. The beam
intensity along the z axis contains the video signal (analog voltage);
corresponding timing and voltage values are given in Fig. 6.3.

Table 6.1 Operational parameters of three video broadcasting standards. EIA RS-343A
operates from 675 to 1023 lines; the recommended values for 875 lines are included for
comparison.3

Parameter                                        EIA RS-170   CCIR      SECAM     EIA RS-343A
Frame rate                                       30           25        25        60
Number of lines/frame                            525          625       625       875
Total line time, µs                              63.49        64        64        38.09
  [= 1/(frame rate × lines per frame)]
Number of active lines/frame                     485          575       575       809
Nominal active line time*, µs                    52.59        52        52        31.09
Number of horizontal pixels                      437          569       620
Number of pixels/frame                           212,000      527,000   356,500
Line-blanking time, µs                           10.9         12        12        7
  [= total line time − nominal active line time]
Field-blanking time, ms                          1.27         1.6       1.6
Line frequency, kHz [= 1/(total line time)]      15.750       15.625    15.625    26.25

* Corresponds to the duration of the video signal (luminous intensity) in each horizontal line.

RS-170A provides the color television NTSC standard. The color video standard in
Europe, PAL, was adapted from the CCIR standard. Séquentiel couleur à mémoire
(SECAM) uses techniques similar to NTSC and PAL to generate a composite color
video signal.
In raster-based video display systems, the picture frame is divided into two
fields: the odd field contains the odd-numbered (horizontal) scan lines, and the
even field contains the even-numbered scan lines (Fig. 6.2). By displaying the two
fields alternately, the effective CFF for the whole picture frame is doubled. The
frame rate refers to the number of complete pictures presented, while the field rate
indicates the rate (or field frequency) at which the electron beam scans the picture
from top to bottom. By dividing the whole frame (= one whole image or picture)
into two fields, the frame rate becomes one-half of the field frequency. By choosing
the mains frequency as the field frequency, the frame rates in the RS-170 and CCIR
standards become 30 fps and 25 fps, respectively. These give respective picture
frame updating times of 33.33 ms for RS-170 and 40 ms for CCIR.
In the RS-170 standard, the whole picture is made of 525 horizontal scan lines,
with two fields interlaced as in Fig. 6.2. The scanning process begins with the
odd field starting at the top left corner. The beam moves from the left to
the right across the screen, shifting downward slightly to give a slanted display
of each horizontal scan line. When the beam reaches the right edge of the screen, it
moves back to the left edge, to the location of the next odd line. The time needed
by the beam to move from the end of one odd line to the beginning of the next
and settle down before beginning to scan again is known as the line flyback time
[Figs. 6.2(a) and (b), top]. When the beam reaches the very last line in the odd
field (end of screen), it moves to the starting point of the first even field, which is
above the very first odd line on top of the screen [Fig. 6.2(c), top]. The scanning
time between the end of one field and the beginning of the next is called the field flyback time.

Figure 6.2 Superposition of the (a) odd fields and (b) even fields to generate (c) one
picture frame (courtesy of Philips Research, Eindhoven, The Netherlands).

To ensure that the line flyback and the field flyback tracks do not
distract the viewer, the beam is made invisible (field blanking) by bringing its
luminous intensity down to the ground level.2 For the RS-170, the total time taken
for the field flyback is equivalent to 20 scan lines/field, giving 242.5 active (visible)
lines/field (25 field blanking lines and 287.5 active lines in CCIR). The number of
lines for field synchronization and the timing parameters in the video signal are
shown in Fig. 6.3.
Because of the line-scanning nature of a video signal, the spacing between two
consecutive horizontal lines defines the screen height necessary to display a full
picture. Thus, the number of horizontal lines becomes the default value of the
number of vertical pixels available for display. The width of the screen in turn
is related to its height through the aspect ratio (ratio of display width to height)
specified in the video standard. The aspect ratio for the RS-170 and CCIR is 4:3.
A composite video signal contains all timing as well as analog signals, as shown in
Fig. 6.3. During display, these signals are extracted to drive the respective parts of
the display unit [Fig. 6.4(a)].
In image-processing operations, the input is normally a square image (aspect
ratio of 1, or width:height = 1:1). This implies that the displayed image is required
to have the same number of pixels along the horizontal (x) and the vertical (y)
axes. Consequently, the duration of the active video along the horizontal axis must
be adjusted; this adjusted time is referred to as the active line time and denoted by
T_AC (= 3/4 T_VH). To obtain an aspect ratio of 1:1, the number of horizontal scan
lines is kept unchanged, but the sampling is delayed. In the RS-170, sampling is delayed by
6.575 µs from the start of the horizontal scan line of the incoming video and terminated 6.575 µs
before the line reaches its end. In the CCIR standards, the sampling is delayed by
6.5 µs and terminated early by the same period. The consequence is that black
strips appear on the left and right sides of the display, giving a smaller active video
area. For 256 × 256 images, only one of the two fields is captured; for higher resolutions,
768 × 576 or more video lines are captured.

Figure 6.3 (a) Video lines for a raster-based video display. (b) One active line of mono-
chrome video line signal. (c) Timing parameters in images (a) and (b) and in Fig. 6.1(b). For
equal spatial resolution, T_AC = 3/4 × 52 = 39 µs; for 512 × 512-pixel resolution, the visible-line
sampling time is 76 ns (13.13-MHz sampling frequency).
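As a quick check on these timing figures, the short sketch below (an illustrative calculation, not part of any standard) reproduces the adjusted active line time T_AC, the sampling delay, and the per-pixel sampling interval for a 512-pixel CCIR line:

# Illustrative timing check for square-pixel sampling (values from Table 6.1).
nominal_active_line_time_us = 52.0       # CCIR nominal active line time
pixels_per_line = 512                    # target horizontal resolution

t_ac_us = 0.75 * nominal_active_line_time_us             # T_AC = 3/4 * T_VH for a 1:1 aspect ratio
delay_us = (nominal_active_line_time_us - t_ac_us) / 2   # sampling delayed at each end of the line

sample_time_ns = t_ac_us * 1000.0 / pixels_per_line      # visible-line sampling interval
sampling_freq_mhz = 1000.0 / sample_time_ns              # corresponding pixel clock

print(f"T_AC = {t_ac_us:.2f} us, delay = {delay_us:.2f} us per side")
print(f"sampling interval = {sample_time_ns:.1f} ns  ({sampling_freq_mhz:.2f} MHz)")
# Expected: T_AC = 39 us, 6.5-us delay, and roughly 76 ns / 13.1 MHz, matching Fig. 6.3.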
For image digitization, each video line signal in each field during T_AC is
sampled, quantized, and stored in memory as an individual pixel with its x and
y locations appropriately registered. The value of the ADC sampling time will

Figure 6.4 (a) Generation of timing pulses from a composite video signal. (b) Image
digitization from a composite video signal.

depend on the required resolution. If the required image size is 256 × 256, the
common practice is to capture only one field, discounting the half-line, which
corresponds to 242 lines/field in RS-170. The remainder is made up of blank
lines [AB plus BC in Fig. 6.3(a)]. For CCIR, excess lines are discarded equally
at the top and bottom of each field. A similar process follows for sampling to
generate a 512 × 512 image frame. The resolution of conventional TV cameras
limits the maximum possible vertical resolution available from the source image.
For the larger 512 × 512 image, both fields are captured. An adequate memory
space is necessary to store the entire digitized image frame. For the commonly used
machine vision image size of 512 × 512 pixels with an 8-bit gray-level resolution,

the size of one image frame is 262,144 bytes. A functional block diagram for image
digitization and the associated memory map is shown in Fig. 6.4(b).6–8
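A short calculation, assuming the 512 × 512, 8-bit frame quoted above and (for illustration only) the 30-fps RS-170 frame rate, confirms the frame size and the corresponding sustained data rate:

# Frame-store size and sustained data rate for a 512 x 512, 8-bit image (illustrative).
width, height, bits_per_pixel = 512, 512, 8
frame_rate = 30                                   # RS-170 frames per second

frame_bytes = width * height * bits_per_pixel // 8
data_rate_mb_per_s = frame_bytes * frame_rate / 1e6

print(f"one frame = {frame_bytes} bytes")         # 262,144 bytes
print(f"sustained rate at {frame_rate} fps = {data_rate_mb_per_s:.2f} MB/sec")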
By definition, a pixel is the smallest area on the screen that can be displayed
with time-varying brightness. The pixel size gives a quantitative measure of the
number of individual pieces of brightness information conveyed to the observer or
the display system resolution (independent of the display screen size). However,
due to errors in the beam control circuitry, neighboring pixels may be illuminated,
which reduces the effective resolution. The parameter addressability (the number
of pixels per unit length of the horizontal scan line) is used for indicating the
ability of the beam location controller to select and activate a unique area within
the display screen.3
The vertical pixel size is the width of the horizontal scan line, i.e., display
height/number of active lines. Because of the phasing effect in the human visual
system, the average number of individual horizontal lines that can be perceived is
less than the actual number of active horizontal lines present in a picture frame. The
ratio of the average number of horizontal scan lines perceived to the total number
of horizontal lines present in the frame (Kell factor) is usually 0.7. This value gives
an average vertical resolution of 340 lines for the RS-170 and 402 for the CCIR
systems. The corresponding parameter for HDTV is over 1000 lines.
Color displays contain three independently controlled electron beams that scan
small areas on the CRT face. Each area on the screen corresponds to a pixel location
that contains three different phosphor-coated dots: blue, red, and green (Fig. 6.5).
The three electron beams themselves do not control color, but the desired color
is produced by the combination of their intensities. The separation between the
adjacent dots of similar colors is known as the dot pitch and gives a quantitative
measure of the display resolution. To ensure that beams converge uniformly, a
dynamic convergence correction is employed to keep the three electron beams
together as they move across the 2D screen.3 In Sony Trinitron™ monitors, the
metal mask has vertical slots rather than circular holes. The geometric arrangement
of these vertical slots is such that the output of one gun can reach only one stripe
of color phosphor. Tables 6.2 and 6.3 summarize the display formats and typical
pixel densities of color monitors commonly used in machine vision platforms.
The brightness of the displayed pixels on the screen is a function of the intensity
of the electron beams and the luminosity of the coated phosphors. While the beam
intensity is linearly related to the applied voltage (∝ video signal), the luminous
output of the phosphors is related to the incident beam intensity by a power law,
generally with gamma (γ) as the exponent:

Luminous output = (intensity of incident beam)^γ.    (6.1a)

CRT phosphor luminosity is generally characterized by 1.8 ≤ γ ≤ 3.0, leading to
saturation in the displayed pixel brightness toward the darker side of the brightness
scale [Fig. 6.5(c)]. The process of removing this saturation effect by remapping
the monitor output response is referred to as gamma correction.

Figure 6.5 In color CRT monitors, the source beams converge through holes in a metal
mask approximately 18 mm behind the glass display screen. These holes are clustered
either as (a) shadow-mask pattern or (b) precision inline (PIL). (c) Gamma correction with
normalized axis scaling.

For monochrome images, this is essentially a gray-level mapping in which the captured
pixel gray-level values are rescaled by the transformation

Rescaled pixel gray level = (captured pixel gray-level value)^(1/γ),    (6.1b)

and fed into the display.10 The collective result of Eqs. (6.1a) and (6.1b) is a
linearized relationship between the image pixel intensity and its brightness on
the CRT display. The gamma-corrected image generally has a brighter appearance
(Sec. 9.2). In addition to gamma correction for brightness, color displays require
variations in the color additive rules.
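The remapping of Eqs. (6.1a) and (6.1b) can be sketched in a few lines; the 8-bit NumPy image and the value γ = 2.2 below are illustrative assumptions, not values prescribed by the text:

import numpy as np

def gamma_correct(image_u8: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Rescale 8-bit pixel values by the power 1/gamma [Eq. (6.1b)] so that the
    CRT's power-law response [Eq. (6.1a)] yields an approximately linear display."""
    normalized = image_u8.astype(np.float64) / 255.0
    corrected = normalized ** (1.0 / gamma)
    return np.round(corrected * 255.0).astype(np.uint8)

# Example: mid-gray input is lifted toward the brighter side before display.
ramp = np.arange(0, 256, 64, dtype=np.uint8)
print(gamma_correct(ramp))   # e.g., 128 maps to roughly 186 for gamma = 2.2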

Table 6.2 Classification of monitors by resolution.3,9 For reference, an image captured by
the human eye is estimated to have a resolution of 11,000 pixels/inch; the resolution of a
35-mm color negative film is upward of 1,500 pixels/inch.

Format                                 Abbreviation   Resolution      Color bit depth
Color graphics card adapter*           CGA            400 × 200       2
Extended graphics adapter*             EGA            640 × 350/400   4
Video graphics adapter*                VGA            640 × 480       8
Extended graphics array**              XGA            800 × 600       16 (high color)
Super video graphics array**           SVGA           800 × 600       24 (true color, 8 bits each for RGB)
Extended video graphics array          XVGA           1024 × 768      24
Ultra extended video graphics array    UXGA           >1024 × 768     32 (true color with Alpha channel#)

* Now obsolete.
** Controlled by the industry consortium known as the Video Electronics Standards Association (VESA).
   Further subdivisions are listed in Tables 14.1 and 14.2.
# The Alpha channel's 8-bit color depth adds translucency in the displayed image, which is used to
  create special effects in video games and animation.

Table 6.3 Grouping of monitors by pixel density.3,9

Category    Pixel density (per inch)   Triad dot pitch (mm)
Low         <50                        >0.48
Medium      50–70                      0.32–0.48
High        71–120                     0.27–0.32
Ultrahigh   >121                       <0.27

CRT monitors are specified by the diagonal size of the screen. Since pixel
resolution is dependent on the number of vertical lines and hence the monitor
height, resolution (pixels/inch) is related to the monitor size. With an aspect
ratio of 4:3, the screen height is 0.6 × the screen diagonal; allowing a margin of
around 7% for the dark areas around the edges, an approximate pixel resolution
value (also referred to as pixel density) is derived as 1.07 × (5 × monitor vertical
resolution)/(3 × monitor size). Since the number of pixels in an image frame
(display resolution) defines the pixel density for a given screen size, an image on a larger
screen may appear coarser than on a smaller screen with similar resolution.
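As a worked example of the approximation above (the 17-inch, 768-line figures are arbitrary illustrative inputs):

# Approximate pixel density from monitor size and vertical resolution (4:3 aspect ratio).
def pixel_density(vertical_resolution: int, diagonal_inch: float) -> float:
    # density ~= 1.07 * (5 * vertical resolution) / (3 * monitor size)
    return 1.07 * 5 * vertical_resolution / (3 * diagonal_inch)

print(f"{pixel_density(768, 17):.0f} pixels/inch")   # ~81 pixels/inch for a 17-in XVGA monitor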

6.2 Liquid Crystal Display10–13


A liquid crystal is a semitransparent, viscous organic material in an intermediate
optical state of liquid and solid with rod-like molecules and a long-range
order in one direction over large distances compared with the molecular length.
A characteristic feature of these molecules is that the ordered arrangement
(alignment) can be influenced by external fields and surface inhomogeneity. Liquid
crystals are classified into three groups according to their long-range order in the
natural state: nematic, twisted nematic (or cholesteric), and smectic [Figs. 6.6(a),
(b) and (c)]. For reference, the structure of a passive matrix display is shown in
Fig. 6.6(d). If a liquid crystal is put on a finely grooved surface, the molecules
line up in a particular direction; by making these grooves exactly parallel, rows of
parallel (aligned) molecules may be created.

Figure 6.6 Molecular structure of three types of liquid crystal materials: (a) nematic, (b)
twisted nematic, and (c) smectic. (d) Passive matrix display.11,12 The orientation of light as
it passes through the liquid crystal layers (e) in their natural state and (f) with an applied
electric field.

Liquid crystal displays (LCDs) use
two basic properties of the liquid crystal material: (1) light follows the alignment
of the molecules, and (2) the molecules tend to orient with their long axes parallel
to an electric field.
If a thin layer of liquid crystal material is sandwiched between two glass surfaces
(alignment layers) with grooves at right angles, the liquid crystal molecules on
one surface will be aligned at right angles to the other surface while those in
between will be forced to assume a twisted state between 0 and 90 deg. Thus,
incident light on one glass surface will be twisted by 90 deg as it passes through
the sandwiched liquid crystal layer and will exit the second glass surface without
any loss of intensity [Fig. 6.6(e)]. A transparent indium-tin-oxide (ITO) layer
is placed by photolithography on each glass plate to act as electrodes. When a
voltage is applied to the sandwiched layer using these terminals, all liquid crystal
molecules align with the resulting electric field. This axial lining up of the twisted
nematic molecules blocks all incoming light rays [Fig. 6.6(f)]. A pair of orthogonally
polarized films is added to ensure that only rays twisted exactly by 90 deg are
transmitted through the LCD sandwich in both natural and excited states. The
source of the incoming light is a backlight (fluorescent tubes mounted on the top and
bottom edges of the display panel); light guides distribute the light across the screen.
The extent of optical blocking depends on the response time and transmissivity of
the liquid crystal material; the best results are obtained by super-twisted nematic
(STN) material. The orientation of the alignment layers in STN screens varies from
90 deg to 270 deg, depending on the total amount of rotation of the liquid crystals
sandwiched between the layers.
The response time of the early passive LCD was around 350 ms, rather slow
for rapidly varying brightness levels and a fast-moving mouse/cursor, which produced
ghosting or smear effects. Evolutionary developments in LCD technology,
for example, lowering the viscosity of the liquid crystal material, permitted a faster
switching time between states, increasing contrast and reducing response time (to
around 150 ms by using the hybrid passive display technology). In monochrome
LCDs, the brightness level of the individual pixel area is controlled by varying the
voltage through a row-column addressing method; in color displays, three color
pixels are used in each row-column location [Fig. 6.7(a)]. This row-column ad-
dressing mode is limited because an N × N display has (N − 1) sneak paths around
a selected pixel [Fig. 6.7(b)] and because voltage applied to neighboring pixels
reduces contrast in a small neighborhood of the displayed image (crosstalk).
One way of reducing crosstalk12 is to drive the nonselected columns by V/b and
the nonselected rows by V/2b. For optimal design, b = √N + 1, and the ratio of
the rms voltages applied between the selected and nonselected columns is given by
√[(√N + 1)/(√N − 1)] → 1 as N → ∞. This addressing problem and the slow
response time are partially overcome by dividing the screen into two halves and
scanning them separately. Though the problem of variable intensity across the
screen is not completely eliminated, by doubling the number of lines scanned per
second, the dual-scan super-twisted nematic (DSTN) display provides a sharper image.
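A quick numerical check of this selection ratio shows why the rms contrast of a passive matrix degrades as the line count N grows; the row counts below are chosen only for illustration:

import math

def selection_ratio(n_rows: int) -> float:
    """rms voltage ratio between selected and nonselected pixels for optimal bias b = sqrt(N) + 1."""
    root_n = math.sqrt(n_rows)
    return math.sqrt((root_n + 1) / (root_n - 1))

for n in (64, 240, 480):
    print(f"N = {n:3d}: b = {math.sqrt(n) + 1:5.1f}, ratio = {selection_ratio(n):.3f}")
# The ratio approaches 1 as N grows, so the on/off rms contrast of a large passive matrix shrinks.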

Figure 6.7 (a) Passive-matrix display pixel addressing. (b) Sneak paths. (c) Addressing of
active-matrix display pixels. (d) Elements of one pixel in an active LCD display.12,13

The problem of addressing and loss of contrast due to varying light levels at individual
pixel locations from the light guide distribution network is overcome by adding a
thin-film transistor (TFT) at each pixel location. In this active or TFT screen, one
row of pixels is selected by driving the corresponding transistor gates [Fig. 6.7(c)].
TFT screens add to the manufacturing costs because they require three transistors
for each pixel location in a color display. However, TFT screens offer a uniform
brightness, increased viewing angle, and faster response (down to as low as 25
ms).12,13 Major limitations of these displays include the need for complex hardware
circuitry to uniformly distribute the backlight across the screen, and as much as

50% loss of brightness as the light rays pass through the various layers and the
polarizing films [Fig. 6.7(d)].
CRT displays are emissive devices in that the three electron beams converge
on the coated phosphors on the back of the display screen glass. For a sharp
image, all three beams must converge perfectly on the screen. The pixel intensity of
LCD panels depends on the transmissivity property of the liquid crystal molecules
behind the pixels, so it is not susceptible to imperfect convergence. However, a
limitation of backlit transmissive displays is that the perceived intensity of individual
pixels varies with viewing angle, which gives LCD devices a narrower usable
viewing angle than CRT displays. Some comparative figures are given
in Table 6.4.

Table 6.4 General comparison of CRT and TFT display parameters.

Parameter              CRT                                       TFT
Convergence error      Depends on dot pitch (up to 300 µm)       Absent
Geometric distortion   Present                                   Absent
Brightness uniformity  Brighter at the center                    Brighter around the edges
Color quality          Medium to high                            High
Flicker                Not noticeable above 85-Hz refresh rate   Absent
Power consumption, W   50–200 (typically 90)                     20–60 (typically 30)
Contrast ratio         300:1 to 800:1                            From 200:1 to over 400:1
Brightness, cd/m²      70–150                                    150 to over 400
Response time, ms      Insignificant                             20–30
Pixel error            Absent                                    Caused by defective transistors at pixel
                                                                 locations (error is typically quoted to
                                                                 be 20 for a 1024 × 768 screen)
Viewing angle*, deg    >150                                      100–180

* Viewing angle for color pixels is slightly lower.

6.3 Framegrabber14–18
The generic name framegrabber describes the interface and data conversion
hardware between a camera and the host processor computer. In analog cameras,
the image sensor generates an analog signal stream as a function of the incident
light, and the onboard timing circuits convert the sensor signal into a composite
or RGB video signal. An analog framegrabber receives the video signal from
the analog camera and performs all onboard signal-conditioning, digitization, and
elementary processing operations. A digital camera is essentially an analog camera
with all framegrabber hardware packaged within the camera casing, so that the camera outputs
a digital image stream. Framegrabbers in low-end/high-volume applications
(e.g., security and web cameras) generally contain the minimal hardware and a
memory store provided by first-in first-out (FIFO) buffers (Fig. 6.8). A FIFO
buffer is essentially a collection of registers that can be written onto and read out
simultaneously, provided that new input data does not overwrite the existing data.
An important property of the FIFO buffer is that it does not need to be emptied
before new data is added.

Figure 6.8 (a) Analog camera. (b) Functional block diagram of basic analog framegrabber
hardware, including a multiplexer that reads camera outputs with different video formats.

A FIFO buffer, with its read-and-write capacity and
operating with minimal attention from the host processor, can transfer image data
to the host almost as soon as it acquires them from the ADC (subject to its own
internal delay). Analog cameras are common in machine vision applications; as
some of the front-end electronics in analog framegrabbers are embedded within
digital cameras, an overview of analog framegrabber components is given in this
section.
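The FIFO behavior described above can be mimicked with a simple software ring buffer; this is only an analogy of the hardware registers, with an arbitrary capacity and no real timing:

from collections import deque

class LineFIFO:
    """Software analogy of a framegrabber FIFO: lines can be written and read
    concurrently as long as new data does not overwrite unread data."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = deque()

    def write(self, line) -> bool:
        if len(self.buffer) >= self.capacity:   # writing now would overwrite unread data
            return False                        # caller must drop the line or stall
        self.buffer.append(line)
        return True

    def read(self):
        return self.buffer.popleft() if self.buffer else None

fifo = LineFIFO(capacity=4)
for i in range(3):
    fifo.write(f"line {i}")      # ADC side fills the FIFO
print(fifo.read())               # host side drains it: "line 0"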
Line-scan cameras are widely used in moving-target inspection systems, but
despite their superior pixel density and physical dimensions, the requirement for
relative motion between the camera and the target adds some complications to
the camera setup. For a constant scan rate (number of lines/second), the vertical
resolution is related to the target motion (the horizontal resolution is dictated by the
sensor resolution). This makes the vertical resolution finer at slow target speeds;
at higher speeds, the individual scan lines that make up the image may become
darker with insufficient exposure time. In most applications, an encoder is used as
part of the speed control system to synchronize target motion with the camera's
acquisition timing (Fig. 6.9).
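The coupling between target speed, line rate, and along-motion resolution reduces to a one-line calculation; the web speed and scan rate below are hypothetical:

# Along-motion pixel size of a line-scan camera (illustrative numbers).
target_speed_mm_per_s = 500.0    # hypothetical web speed
line_rate_hz = 10_000.0          # hypothetical scan rate (lines/sec)

pixel_size_mm = target_speed_mm_per_s / line_rate_hz    # distance travelled per scan line
print(f"along-motion pixel size = {pixel_size_mm * 1000:.0f} um/line")   # 50 um here
# Halving the target speed halves this figure (finer vertical resolution);
# doubling it coarsens the resolution and shortens the per-line exposure.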
By containing all analog electronics in a shielded case, digital cameras offer
enhanced noise immunity. The hardware architecture of a digital framegrabber is
comparatively simpler than that of an analog framegrabber. It usually contains an
application-specific IC (ASIC) or a field-programmable gate array (FPGA) for low-
level, real-time operations on the image data prior to transferring them to the host
(Fig. 6.10). Various types of serialized parallel data cables are used with digital
cameras, each with its own data throughput rate.19–22 These include:
• the RS-644 low-voltage differential signaling (LVDS) cable with 28 single-
  ended data signals (converted to four datastreams) and one single-ended clock
  (up to 1.8 Gbits/sec),
• the IEEE-1394 cable (FireWire, a packet-based peer-to-peer protocol) with
  physical, link, and transaction layers that correspond to the lowest layers of the
  Open Systems Interconnection (OSI) model of the ISO (up to 400 Mbits/sec), and
• the channel link chip-set cable that converts 28 bits of data into four datastreams
  and a clock signal (up to 2.38 Gbits/sec).

Figure 6.9 Schematic configuration of a line-scan camera setup. The encoder measures
the target position and controls the camera trigger time to ensure that images are captured
as the target travels a fixed distance17 (courtesy of RVSI Acuity CiMatrix, Nashua, NH).
However, most machine vision framegrabbers have onboard memory to hold at
least one full frame of image data. The amount of onboard memory is related to the
bus latency during data transfer between the framegrabber and the host processor,
and the extent of onboard processing required for a given application. Latency is
defined as the time interval between the trigger to start an activity and the end
of that activity. For continuous transfer to the host with no bus latency (ideal
case), the onboard memory must be sufficient to hold the digitized image data that
corresponds to one horizontal scan line. To improve the processing cycle time,
computationally intensive applications perform a range of preprocessing tasks
using onboard dedicated or general-purpose digital signal processing hardware
(Fig. 6.11). These high-end applications require multiple-frame image store space
as well as faster data transfer protocols between the framegrabber and the host
processor.
The framegrabber's basic function is to read (acquire or capture) image sensor
output and stream video data into the host interface bus for processing per the
requirements of the application. The first generation of IBM PCs used an 8-bit
subset of the industry standard architecture (ISA) bus; the IBM PC-AT bus was
the first full implementation of a 16-bit ISA bus. Bus protocols used in personal
computer (PC)-based machine vision framegrabbers include the peripheral
component interconnect [PCI; 32-bit data transfer rate, 132 Mbytes (MB)/sec peak],
much faster than the ISA (16-bit, 3 to 5 MB/sec) or the extended ISA (16-bit, 33
MB/sec). A majority of the new generation of framegrabbers offer a Universal Serial
Bus (USB) interface with higher data transfer rates: USB1.1 (up to 12 Mbits/sec) and USB2 (up to 480 Mbits/sec).

Figure 6.10 (a) Digital camera components. Generally a high-performance host computer
is required to make use of the higher data throughput of digital cameras. (b) Camera link
standard for camera-to-framegrabber connection. (c) Functional block diagram of National
Instruments NI-1428 digital framegrabber with a channel link capable of a sustained data
rate of 100 MB/sec; 28 bits of data and the status are transmitted with four pairs of wire, while
a fifth pair is used to transmit clock signals (compared to the 56 wires used in the RS-644
LVDS).14 CC: command and control channels use the same protocols as serial ports. MDR:
miniature delta ribbon. RTSI: real-time synchronization information. DMA: direct memory
access.
Figure 6.11 (a) Functional blocks in a conceptual framegrabber with full-frame buffer
memory and an onboard frame processor.7,8 (b) Diagrams of the dedicated hardware or
(c) general-purpose digital signal processor used in many commercial framegrabber boards
(adapted from the DT2858 block diagram, courtesy of Data Translation, Marlboro, MA). The
dedicated hardware blocks perform a wide variety of tasks; four sub-blocks are included in
(b) for illustration.15,16


USB2 is a four-wire cable (one pair for differential
data, and one pair for power and ground) for half-duplex transfer. By adding one
pair each for differential receive and transmit data (a total of eight wires in the
connection cable), the USB3 implements the full bidirectional data communication
protocol, resulting in an increased bandwidth, and a ten-fold improvement in data
transfer rate (up to 5 Gb/sec, design specification, November 2008).
In conventional instrumentation terms, framegrabber hardware acts as the
signal-conditioning and data-conversion unit with memory to store one or more
image frames. Thus, its specification plays a critical role in the overall performance
of a machine vision system. The input stage of a commercial framegrabber
card consists of an analog preprocessing block and a timing-and-control block
[Fig. 6.11(a)]. The analog front end picks up the analog video signal while the
sync stripper extracts the timing pulses to drive the digital modules within the
framegrabber hardware.

6.3.1 Analog front end


The analog circuitry contains a low-pass anti-aliasing filter to remove high-
frequency noise and signals above half the ADC sampling frequency. The anti-
aliasing filter is added to ensure that the digitization sampling frequency is at least
twice that of the highest frequency present in the input analog signal (Nyquist
sampling theorem; see Appendix B). Theoretically, the low-pass filter's minimum
cutoff frequency should be set to twice the 5.5-MHz video signal bandwidth;
however, the resulting effect on image blurring must be taken into account. The
need to avoid image blurring may lead to a higher cutoff frequency than that given
by Nyquist sampling. The dc restoration circuit eliminates time-varying drift/bias
in the incoming video. Typically, the input analog signal is clamped to the reference
voltage of the ADC for uniform brightness in the digitized image (the black level
of the video is the ground voltage of the ADC). A programmable gain amplifier
scales the analog voltage level to within the ADC range. Under poor and unknown
lighting conditions during image capture, the analog gain and the ADC offset
compensate for the variable analog video signal.

6.3.2 Timing pulses


The composite video signal given by a camera contains the analog video signal as
well as composite sync pulses with timing information to identify odd/even fields
(field index), the field synchronization signal (field pulse), and the end/start of a
horizontal line (line pulse). The line pulse is fed into the pixel clock generator while
all three outputs, along with the pixel clock, are fed into the frame store address
counter and the control signal generator sub-blocks.7 The pixel clock drives the
video ADC at the input end and the digital-to-analog converter (DAC) at the output
end.

6.3.3 Pixel clock7,18


The conventional method of generating a stable pixel clock is to use a phase-
locked loop (PLL). The sync signals buried in the camera output signal are not
guaranteed to be regular, but the frame acquisition must be triggered with these
uncertain sync signals. With an inherent tendency to resist changes in timing, the
PLL output may take up to one field time (one-half of one image frame update
time) to synchronize with the sync pulses. This conflict may delay the generation
of the sync signals, leading to a loss or distortion in the captured image due to
pixel jitter, which refers to the pixel clock timing error that causes pixel position
changes, reducing the image integrity. Jitter for PLL-driven clocks may be as high
as 10 ns. The typical pixel jitter for digital clocks is around 2.5 ns. The use of
a PLL circuit to generate pixel clocks is particularly undesirable in applications
with externally triggered (resettable) cameras, where the dynamics of the scene,
e.g., the appearance of a target part on a conveyor belt, determines the timing
of the trigger signal. In this case a trigger signal may appear while a capture is
in progress. This image capture latency may be reduced by reading the odd and
even fields separately. For camera operation with moving targets, the framegrabber

needs to be able to reset and resynchronize with the new video stream. (In this
context, resetting refers to abandoning the current operation, and resynchronization
implies detection of the horizontal sync pulses in the incoming image.) For this
reason, crystal-controlled digital clock synchronization is more appropriate for
resettable cameras than PLL-driven clock generators. Framegrabbers with digital
pixel clocks are able to resynchronize immediately to the first field after being
reset.18 For maximum image integrity (zero jitter), a digitally generated clock is
shared between the camera and the framegrabber so the framegrabber does not need
to extract the clock signal from the video sync signal. To avoid propagation delays,
unnecessary travel of the clock signal is eliminated by generating the pixel clock
in the camera and transmitting it to the framegrabber along with the video signal
cable. Excluding delays associated with line drivers and image sensor circuitry, the
propagation delay is estimated to be around 5.9 ns/m for typical camera cables.

6.3.4 Gray-level digitization


In a qualitative description, the analog video signal is a collection of horizontal
lines separated by line sync pulses. Since these horizontal lines are displayed
vertically, in mathematical terms, the digitization of a video image is a 2D sampling
process where the output at the end of each sample corresponds to an image pixel
with an intensity value equal to the actual digital output of the ADC (brightness or
gray level). Each of these pixels has a unique x location within the corresponding
scanned horizontal line, and all of the pixels coming from one horizontal line
are assigned the same y location, identifying their original scan line.
The timing and control circuit allocates the x-y coordinate location
to each of the ADC clocked-out intensity values by generating an address from the
sync stripper outputs and the pixel clock [Fig. 6.12(a)]. The entire collection of
these pixels for one pair of odd and even fields makes up one frame of the captured
digital image, and it is stored in the framegrabbers video memory. The size of the
memory space required to store one complete frame is related to the number of
samples taken along the horizontal and vertical directions as well as the resolution
of the ADC for the incoming video signal. The number of samples taken during the
digitization process is referred to as the spatial resolution. Horizontal resolution is
set by the sampling frequency of the ADC (the pixel clock). The vertical resolution
is set by the distance between consecutive video lines in the analog signal and therefore
is related to the input video format. The majority of monochrome framegrabbers
for industrial machine vision have a 512 × 512 spatial resolution and an 8-bit-wide
ADC, giving 256 levels of pixel gray-level resolution.

6.3.5 Look-up table


Through the use of a 2D memory map [Fig. 6.12(b)], a programmable look-up
table (LUT) used in conjunction with an arithmetic logic unit (ALU) permits the
remapping of the input/output (I/O) data.

Figure 6.12 (a) Address generation from video line number. (b) Memory mapping in an
8-bit LUT, where (s, r) denotes 1 byte of data and its address location.7

By performing a set of basic logical (AND, OR, XOR) and arithmetic operations
(addition, subtraction, averaging), a high-speed ALU is capable of implementing
several onboard digital preprocessing operations in real time. Machine vision
framegrabbers usually have two ALU
and LUT combinations around the frame store buffer (Fig. 6.11). By using them
independently, the user is able to transform image brightness by simply storing the
appropriate gray-level transformation map in the LUT RAM. In image processing,
the address bus of the RAM is used as the image input brightness, and the data
bus is connected to the image output; the size of the RAM is equal to the image
brightness resolution (256 words for 8-bit gray-level resolution). The memory
cycle time must be less than the pixel clock period for real-time implementation of a LUT.
A key advantage of having two separate LUTs is that the output LUT can be used
for the sole purpose of modifying the image for display per a predefined intensity
map without affecting the data in the buffer store, which can be used as part of the
onboard processing operations.
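In software terms, the LUT mechanism amounts to building a 256-entry table once and then remapping every pixel by simple indexing, which is what the hardware does at the pixel-clock rate; the contrast-stretch map below is just an example transformation:

import numpy as np

# Build a 256-entry LUT once (here: a linear contrast stretch between two gray levels).
low, high = 40, 200                      # example input range to stretch to 0..255
table = np.clip((np.arange(256) - low) * 255.0 / (high - low), 0, 255).astype(np.uint8)

# "Applying" the LUT is a single indexed look-up per pixel, as in the hardware RAM:
image = np.random.randint(0, 256, (512, 512), dtype=np.uint8)   # stand-in for a captured frame
remapped = table[image]

print(image[0, :4], "->", remapped[0, :4])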

6.3.6 Image store


The frame buffer stores the digitized image for further processing by the onboard
processor or for transfer to the host to compensate for bus latency. An image buffer
store may need to handle up to four pixel datastreams: (1) retrieving the video data
from the ADC, (2) outputting the display image data to the DAC, (3) sending the
pixel datastream to the host, and (4) bidirectional data transfer with the onboard
processor hardware. As pixel data are retrieved from the ADC or fed to the DAC
in a row-wise top-to-bottom manner per interlaced raster scanning, the two video
datastreams may be integrated into one serial datastream. The other two data
transfer operations require random access. Thus, two-port (dual-ported) memory
is used: serial access memory (SAM) and RAM [Fig. 6.11(a), image store].
In the read cycle, the selected memory data is first transferred to the data shift
register, which in turn clocks them out to the SAM port. In the write cycle, pixels
are first fed into the shift registers from the SAM port; when the registers are
full, their contents are transferred to locations (cells) within the memory body.
Although the data transfer time between the shift registers and the memory body

is very short, there is a potential for conflict if both the RAM and the SAM ports
demand access to the memory body at the same time. This conflict is prevented
by giving priority to uninterrupted video input and image display operations. The
SAM and RAM ports have different cycle time requirements. For the SAM port, the
read/write operation effectively involves shifting and latching. Because the RAM
port is connected to either the host processor or the dedicated onboard processor
hardware, its cycle time is related to the time required by the data to reach the
destination (data-transfer time). To optimize performance, the usual practice is to
use a zero-wait state dual-ported memory module as the image frame buffer. The
wait state refers to the period (in clock cycles) during which a bus remains idle
due to a mismatch between the access times of different devices on the bus. Wait
states (in clock cycles) are inserted when expansion boards or memory chips are
slower than the bus. A zero-wait-state memory permits the processor to work at
its full clock speed, regardless of the speed of the memory device. The size of the
memory body (image store) is usually a multiple of the image frame dimension;
one image frame with a spatial dimension of 512 × 512 and an 8-bit gray-level
resolution requires 262 KB.

6.3.7 Dedicated processor


In many high-end applications, a range of preprocessing tasks is performed onboard
the framegrabber [Figs. 6.11(b) and (c)]. Depending on the complexity of the
applications, these real-time front-end hardware modules perform a prescribed set
of arithmetic and logical operations on the image gray levels in the buffer memory.
Such operations may include histogram generation, gray-level scaling, image
masking and resizing, pixel coordinate transformation, and convolution from a
list of operator tables. Image-processing hardware with one or more digital signal
processor modules, complete with inter-process communication links and memory
to store multiple frames (graphics accelerator boards), is a standard feature in
plug-and-play boards for multimedia applications.

6.3.8 Video sampling frequency


The sampling grid shown in Fig. 6.13 may be rectangular or square. Preprocessing
can compensate for the geometric distortions caused by rectangular pixels
(circle appearing as an ellipse). However, since the majority of image-processing
algorithms assume equal interpixel separation by default, input video signals are
normally sampled to generate square pixels for software compatibility and reduced
computational overheads.
For TV monitors, if y_s is the spacing between horizontally scanned lines and Y
is the total height spanned by the active horizontal scan lines (= the height of the
displayed image), then

    y_s = Y/485 for NTSC,
    y_s = Y/575 for CCIR.                                      (6.2a)

Figure 6.13 2D sampling and a sampling grid of an image frame. The voltage levels
shown correspond to CCIR. For CCIR, x_D = 6.5 µs; the corresponding value for NTSC
is x_D = 6.575 µs. The analog signal voltage levels, −0.3 V and 0.7 V, are CCIR standard
values (Sec. 6.1). The corresponding voltage levels for NTSC are −0.286 V and 0.714 V.

These are the highest spatial resolutions of NTSC/CCIR signals. The older
generation of TV-monitor-based displays had more pixels (768 × 567, for example).
They did not provide higher resolution but did capture wider views.
The interpixel distance along the y axis is given by

    Δy = n y_s,                                                (6.2b)

where n is a positive integer; n = 1 when the whole image frame (i.e., the odd
and even fields together) is sampled, and n = 2 when the two fields are sampled
separately.
The highest spatial resolution that can be achieved in the two standards is
512 × 512 pixels. For this resolution, the whole image frame is sampled (n = 1).
Interpixel separation of a square sampling grid along the x and along the y axis is
then derived as

    Δy = Δx = (3/4 × 52.59 µs)/485 = 81.33 ns for NTSC,
    Δy = Δx = (3/4 × 52.00 µs)/575 = 67.83 ns for CCIR.        (6.2c)

Interpixel separation along the y axis is Δy = nominal active line time/number of
active lines per frame; values of these parameters for the two standards are given
in Table 6.1.
With frame rates of 30 fps in the NTSC format and 25 fps in CCIR, the
corresponding pixel clock frequencies are

    f_pixel = 1/Δx = 12.29 MHz for NTSC,
    f_pixel = 1/Δx = 14.74 MHz for CCIR.                       (6.3)

If the odd and the even fields are used for producing one image frame each (n = 2)
and sampled separately, the image acquisition rate increases to 60 or 50 fps at the
expense of an increased interpixel separation of Δx = 2y_s. The pixel clock rate then
reduces to

    f_pixel = 6.15 MHz for NTSC,
    f_pixel = 7.37 MHz for CCIR.                               (6.4)
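The arithmetic of Eqs. (6.2c) through (6.4) can be reproduced in a few lines from the Table 6.1 parameters (an illustrative check, not a definitive implementation):

# Square-pixel sampling interval and pixel clock for the two standards [Eqs. (6.2c)-(6.4)].
standards = {"NTSC": (52.59, 485), "CCIR": (52.00, 575)}   # (active line time in us, active lines)

for name, (active_line_us, active_lines) in standards.items():
    dx_ns = (0.75 * active_line_us * 1000.0) / active_lines   # delta_x = (3/4)T_AC / lines, n = 1
    f_pixel_mhz = 1000.0 / dx_ns                              # f_pixel = 1 / delta_x
    print(f"{name}: dx = {dx_ns:5.2f} ns, f_pixel = {f_pixel_mhz:5.2f} MHz, "
          f"field-wise (n = 2): {f_pixel_mhz / 2:.2f} MHz")
# Roughly 81.3 ns / 12.3 MHz for NTSC and 67.8 ns / 14.7 MHz for CCIR, in line with the text.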

Digitization of the video signal starts at the beginning of a field marked by the
vertical sync signal. For a 512-line operation and assuming an even field, the video
counter that registers the number of samples taken per horizontal line is reset to
zero. The first 11 lines of video are ignored. When the twelfth horizontal sync is
detected, digitization is initiated by resetting the horizontal pixel counter (to count
down from 512) and then timing out by 6.575 µs in NTSC (6.5 µs in CCIR) for a
1:1 aspect ratio. After this timeout period, sampling begins at the pixel clock rate
until the correct number of samples has been taken. Each sampled value is stored
as an image data intensity with the ADC resolution (typically 8 bits, but possibly as
high as 12 bits in some hardware). After the last sample has been taken, horizontal
pixel counting and sampling stop until the next vertical sync (next field, odd in this
case) arrives and the entire process is then repeated, i.e., discounting of the first 11
lines, timing out for a 1:1 aspect ratio, and sampling for the next 512 points in each
new horizontal line. If the resolution is 256 × 256, the sampling process takes place
after alternate vertical sync signals, thereby capturing only the odd or the even field.
Because of the fast clock rates, all framegrabbers use a flash AD (or video)
converter (conversion time is one clock period). The front-end analog block in
Fig. 6.11(a) conditions the video input signal, samples it at the pixel clock fre-
quency, quantizes it with the ADC resolution, and puts a video datastream into the
dedicated image-processing block. A reverse process takes place at the back end to
convert the processed datastream into an analog video signal for display. Since any
variation in the input or output clock frequencies will create horizontal distortion,
the ADC and the DAC clocks are driven by the same pixel clock. Table 6.5 lists
some of the key parameters in the specifications of a machine vision framegrabber.

6.4 Latency Parameters


While the actual capture/exposure of the target image may be as low as a few
microseconds, the time taken to transfer the image to the processor may be a
limiting factor in measurement throughput. An overview of some latency parameters
is given here.

Table 6.5 Typical specification list of a PC-based monochrome machine vision framegrabber
(courtesy of CyberOptics and Imagenation, Portland, OR).

Specification feature                 Parameters
Bus and image capture (form factor)   PCI bus-master, real-time capture
Composite video inputs                Monochrome, RS-170 (NTSC), CCIR (PAL);
                                      up to four video inputs (switch or trigger)
Video format                          Interlace, progressive scan, and resettable
Analog front end                      Programmable line offset and gain
Image resolution                      NTSC: 640 × 480 pixels (768 × 486 max);
                                      CCIR: 786 × 576 pixels
ADC resolution                        8-bit ADC, 256 gray-level resolution
LUTs                                  256-byte programmable I/O LUTs
Onboard memory*                       8-MB FIFO
Onboard processing                    Typically none for mid-range framegrabbers
Acquisition rate**                    Typically 25 MHz
Display                               Typically none in mid-range framegrabber
Sampling jitter                       2.6 ns with 1-line resync from reset;
                                      0 with pixel clock input
Video noise                           0.5 least significant bit (LSB)
External trigger                      Optically isolated or transistor–transistor logic (TTL)
Strobe and exposure output            One strobe and two exposure pulses (up to 59.99 min)
Digital I/O                           Four TTL inputs and four TTL outputs
Flexible memory                       Through scatter-gather technology
Image information                     Image stamp with acquisition status information
Framegrabber power requirement        +5 V, PCI, 700 mA
Camera power requirement              +12 V, 1 A for up to four cameras
Operating system                      Windows 98/98SE/2000/ME, NT4, XP
Programming language supported        Visual C/C++

* Commercial framegrabbers offer more than 128 MB of onboard memory.
** For digital framegrabbers, the acquisition rate is given in Mbits/sec.


6.4.1 Capture latency18–20


Latency is the delay time between two successive operations. In digital cameras,
once the exposure button is pressed (exposure trigger), an image is captured,
transferred to the local memory, and all control and timing circuits are reset to
enable the next exposure. Latency time in the current generation of still digital
cameras varies from a few seconds to up to 20 sec, with the typical figure being
around 10 sec. For machine vision cameras, the image capture time and transfer
time, along with any delays between the exposure trigger input and the actual
exposure time, contribute to image-capture latency.
Standard video cameras run in continuous (synchronous) mode and produce an
analog image at the end of each charge transfer cycle, with sync signals built
into the output video. A framegrabber connected to such a camera extracts the
sync signal, reads the camera output, and stores the image frame with no external
exposure trigger to the camera. Since the transfer of the previous frame runs
concurrently with the capture of the current frame, the delay in this synchronous

mode is equal to one frame transfer time [Fig. 6.14(a)]. Because of the limited
processing time available between frames, the continuous mode is used for offline
measurement or analysis as part of
a statistical quality control process. In some very high-performance systems with
custom-built onboard processing hardware, a limited amount of online tasks may
be performed at the expense of missing a certain number of intermediate frames.
In applications that require particular image characteristics (for example,
high-contrast images taken in low ambient lighting or variable contrast in the
target objects), a capture command/trigger is added to allow for a programmable
exposure time. In this pseudo-continuous (pseudo-synchronous) mode, capture
latency is increased because the camera outputs an image at a rate equal to the
exposure time plus the frame transfer time [Fig. 6.14(b)]. For a moving target,
interlaced cameras may produce an offset between the odd and even field images
due to the one-half frame delay at the start of scanning (uneven vertical edges or
motion tear). Motion tear for a target moving at a constant speed (Δ_tear) may be
estimated18 by Eq. (6.5) (in pixel units):

    Δ_tear = target velocity × field time × (pixels per scan line/horizontal field of view).    (6.5)
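Equation (6.5) is straightforward to evaluate; the target speed, line count, and field of view below are hypothetical values chosen only to illustrate the magnitude of the effect:

# Motion tear estimate from Eq. (6.5) (illustrative values).
target_velocity_mm_s = 100.0      # hypothetical target speed
field_time_s = 1.0 / 60.0         # one RS-170 field time
pixels_per_scan_line = 640
horizontal_fov_mm = 200.0         # hypothetical horizontal field of view

tear_pixels = target_velocity_mm_s * field_time_s * pixels_per_scan_line / horizontal_fov_mm
print(f"motion tear ~ {tear_pixels:.1f} pixels")   # ~5.3 pixels for these numbers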

Figure 6.14 Continuous modes of camera operation: (a) without an external trigger and
(b) with an external trigger for exposure19 (courtesy of Matrox, Dorval, QC, Canada).

With an interlaced camera image, capture latency may be reduced by a factor


of up to four if the odd and even fields are captured separately from the
camera. These two separately captured images are then combined by the host
processor to form a full frame (at the expense of some software overheads). For
very high-speed applications, motion tear is reduced by using progressive-scan
cameras with compatible noninterlaced framegrabbers. Some cameras can acquire
a noninterlaced image and transfer its odd and even fields to the framegrabber.
Many framegrabbers include external trigger modes to acquire an image frame
at a given time, either at any instant within the frame time cycle (resettable),
or at a specific time (asynchronous) under program control. In the resettable mode,
the camera continuously scans the target and provides an output as in continuous
mode, but these images are not read by the framegrabber. The capture trigger to
the framegrabber puts the framegrabber in the receiver mode, making it accept the
next full-frame image that appears at its input port [Fig. 6.15(a)]. Since the capture
trigger may appear anywhere within the frame time window, latency uncertainty
in this mode is up to one frame time. This mode is suitable when a single image
or a sequence of images is to be captured through an externally generated trigger
signal, for example, by the appearance of a target part at a particular registration
position within the FOV.
In asynchronous mode, the camera maintains its reset position (not exposed or
capturing any image) until a capture trigger is received from an internal timing
circuit or an external event [Fig. 6.15(b)]. The latency time in this mode is the sum
of the exposure latency, any default or programmable delay set by the framegrabber
between receiving and reading an image frame, and frame transfer time. This mode
is suitable for interface to programmable logic controllers (PLCs).
A variation of the asynchronous mode involves attaching the capture trigger to
the framegrabber rather than to the camera. In this control mode, the total latency
is increased by the delay in the transfer of the trigger signal from the framegrabber
to the camera (Fig. 6.16). This mode is suitable when a new image is to be captured
after other tasks are completed in the host processor and onboard framegrabber,
e.g., after processing the previously captured image frame for online applications.
Table 6.6 summarizes the operations and latency factors of these modes.
A direct consequence of image capture latency is the scene-to-scene variation
in the position of the target part within the FOV, especially when this variation is
of comparable size to the target part.18 For a part velocity of V_part and an
acquisition latency of T_acq, the positional variation Δ_position in the captured
images is given by

    Δ_position = V_part × T_acq.    (6.6)

Variations or uncertainty in the location of the target parts in the FOV may
require either closed-loop control of the image acquisition timing or a larger FOV.
Equation (6.6) may be used as a design basis in both cases. The former requires
additional hardware, while the latter reduces the pixel resolution in the captured
image. One option is to optimize the FOV area with respect to the statistically

collected data on positional uncertainty (Δ_position) within the target scene
for a given image resolution. The other option is to assign an acquisition latency
time and then compute the limit on positional variation for a given part velocity.

Figure 6.15 Capture cycle sequences for (a) resettable and (b) asynchronous operations19
(courtesy of Matrox, Dorval, QC, Canada).

If the target is in motion during exposure, the captured image is likely to be
blurred. When the image of a target part moving at an axial velocity of V_part is
captured with an exposure time of T_exp, the magnitude of the image blur in units
of pixels is given by18

    Δ_blur = V_part × T_exp × (number of pixels in the axial direction within the FOV/axial width of the FOV).    (6.7)
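Equations (6.6) and (6.7) can be combined into a small latency-and-blur budget; all numbers below are hypothetical, and the pixel conversion of the positional variation simply reuses the FOV scaling of Eq. (6.7):

# Latency-and-blur budget from Eqs. (6.6) and (6.7); all numbers are hypothetical.
v_part_mm_s = 250.0           # part velocity
t_acq_latency_s = 0.033       # ~one frame of acquisition latency
t_exp_s = 0.001               # 1-ms exposure
fov_axial_mm = 100.0          # axial width of the FOV
pixels_axial = 512            # number of pixels spanning that width

pos_variation_mm = v_part_mm_s * t_acq_latency_s                      # Eq. (6.6), as a distance
pos_variation_pix = pos_variation_mm * pixels_axial / fov_axial_mm    # same variation in pixel units
blur_pix = v_part_mm_s * t_exp_s * pixels_axial / fov_axial_mm        # Eq. (6.7)

print(f"positional variation ~ {pos_variation_mm:.1f} mm (~{pos_variation_pix:.0f} pixels)")
print(f"motion blur          ~ {blur_pix:.1f} pixels")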
The practical way of reducing blur is to slow down the target motion or to
reduce the exposure time.
Imaging Hardware 189

Figure 6.16 Control mode operation19 (courtesy of Matrox, Dorval, QC, Canada).

Table 6.6 Summary of operations and latency factors in image acquisition.

Mode Exposure Capture Transfer Read Latency time

Continuous continuous over continuous continuous continuous frame transfer


default * time

Pseudo- continuous over continuous continuous continuous exposure + frame


continuous specified time transfer
Resettable continuous continuous continuous next full frame up to one frame
after trigger to + frame transfer
framegrabber
Asynchronous on trigger to after camera after capture next full frame camera exposure,
camera exposure after camera camera trigger delay
trigger trigger + frame transfer
Control on trigger to on receipt of after capture next full frame transfer of trigger
framegrabber exposure after camera from framegrabber
trigger from trigger to camera + camera
framegrabber exposure + camera
to camera trigger delay
+ frame transfer
*
The default exposure time for video cameras is one frame time.

controllable in high-speed systems, an electronic shutter or strobe illumination is


commonly used to minimize the exposure time.
Strobe lighting is used to effectively freeze a moving object in the target scene
(Fig. 6.16) using a xenon or pulsed LED source. With strobe lighting, the critical
task is to fire the strobe during the camera's exposure time in a CCD camera and
only between image readouts in CID or CMOS cameras (because these image
sensors do not have an output register). Due to the potential uncertainty in software
latency, a custom-built hardware block within the framegrabber that is linked to
the camera scanning cycle is used to generate the strobe signals. The intensity
of the strobe light is usually related to the pulse frequency, which ranges from
1.25 to 4 MHz. Since strobe signals are generally very intense, care is required
to ensure that the average intensity of the pulsed illumination is comparable with
the ambient lighting of the target scene. An alternative to a high-intensity strobe
light is an electronic shutter (typically with a 20-μs duration) that is controlled
by either the camera or the camera-framegrabber interface. The signals required
to trigger the exposure and strobe as well as the vertical and horizontal controls
(known as the genlock signals) are normally grouped under digital I/O lines in
the framegrabber specification. Typically, eight digital I/O lines are included in
commercial machine vision framegrabber cards. Genlocking connects multiple
cameras to a single framegrabber, which ensures identical video timing as cameras
are sequentially switched into the video input stage.

6.4.2 Transfer latency18


As described in Sec. 6.1 [Fig. 6.4(b)], after the sync and video signals are split up,
the analog intensity signal is digitized and passed through LUTs, and the captured
frame is transferred to the host. To avoid any potential transfer latency between the
capturing hardware and the host, resulting in a missed acquisition, one frame store
buffer is usually put on the framegrabber. If a significant amount of processing
work must be performed between each capture, the image size may need to be
reduced or more frame store buffers added to avoid the loss of captured images (see
Fig. 6.17). For this reason, some framegrabbers permit image acquisition between
two user-defined horizontal scan lines. To improve image throughput to the host
processor, the acquisition and processing operations are performed in parallel by
using two onboard frame buffers to store the incoming image frames alternately
(popularly known as ping-pong memory), which are then fed into the host processor
sequentially.
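The alternating-buffer idea can be outlined in a few lines of Python; the frame-grabbing and processing calls below are hypothetical stand-ins for whatever acquisition API is actually in use, and the two loops are intended to run on separate threads.

import queue

# Two frame buffers used alternately ("ping-pong"): while the grabber fills one,
# the host processes the other.
buffers = [bytearray(1024 * 1024), bytearray(1024 * 1024)]
ready = queue.Queue(maxsize=1)   # hands a filled buffer index to the processor

def acquisition_loop(grab_frame_into, n_frames):
    for i in range(n_frames):
        idx = i % 2                      # alternate between buffer 0 and buffer 1
        grab_frame_into(buffers[idx])    # fill this buffer while the other is processed
        ready.put(idx)                   # blocks if the processor has fallen behind

def processing_loop(process, n_frames):
    for _ in range(n_frames):
        idx = ready.get()
        process(buffers[idx])            # work on the most recently filled buffer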
Figure 6.17 Image transfer from framegrabber to host through PCI bus. The timing pulses
illustrate bus latency and the loss of a framegrabber image during processing by the host.18

Although the PCI burst (peak) data throughput is 132 MB/sec, a mid-range
bus-mastering PC may have an average data rate of 60 to 80 MB/sec due to bus sharing
between the PC's central processing unit (CPU) and memory. In some cases, the
motherboard RAM may not support the full PCI peak rate. Since the video input
from the image sensor is usually received at a constant rate, the video memory
acts as the buffer to accommodate bus sharing (control signals and memory data)
with other plugged-in devices, including multiple cameras. If the framegrabber in
an overloaded bus has insufficient video memory, the captured image data may be
lost or corrupted. Using dual-ported memory or FIFO buffers and scatter-gather
capability, PCI bus master devices can operate without the onboard shared memory
arrangement indicated earlier. (A bus master allows data throughput from the
external memory without the CPU's direct involvement. The scatter-gather feature
ensures that the image data received at the destination memory is contiguous.)
For high-speed applications, memory access latency may be reduced by
transferring the captured image to the host memory with the framegrabber
hardware operating as a PCI bus-master and managing the transfer itself. This
permits the host to handle the processing tasks using the captured image data. To
remove the need for data access through addressing, many framegrabbers use large
FIFO buffers that are capable of storing multiple image frames. In this operation,
the framegrabber issues an interrupt at the end of each frame transfer so that the
host CPU can proceed with its processing operations on the latest frame. In this
case the transfer latency is the time period between the image data transfer from the
camera and the conclusion of the framegrabber's end-of-frame interrupt servicing.
Data movement during the PCI bus transfer occurs in blocks during the time when
a target image is being captured from the camera, so the scatter-gather feature of
the PCI bus-master becomes relevant. When an application requests a block of
memory to hold image data (for example, in Pentium PCs, memory is available
as a collection of 4-KB pages), the required (logical) memory made available by
the operating system may not necessarily be physically contiguous. With scatter-
gather capability, the software driver for the board loads up a table to translate the
logical address to a physically contiguous address in the memory. In the absence
of scatter-gather capability, either the application software must ensure that the
destination memory is contiguous, or a software driver must be used to convert the
logical addresses issued by the processor to contiguous physical addresses in the
memory space. Accelerated graphics port (AGP) slots have made it possible
to access the host RAM at a very high bandwidth without any framegrabber FIFO.
The image acquisition latency with an AGP is equal to the latency with the end-of-
frame interrupt servicing from the framegrabber.
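A back-of-the-envelope check of this buffering argument is sketched below; the camera rate, average bus rate, stall duration, and frame size are illustrative numbers only, not values specified in the text.

import math

def fifo_frames_needed(cam_rate_mb_s, bus_rate_mb_s, stall_ms, frame_mb):
    """Size an onboard FIFO for the data that piles up while the PCI bus is
    busy with other devices (worst-case stall of stall_ms milliseconds)."""
    if bus_rate_mb_s <= cam_rate_mb_s:
        raise ValueError("average bus rate cannot sustain the camera rate")
    backlog_mb = cam_rate_mb_s * (stall_ms / 1000.0)  # data arriving during the stall
    return math.ceil(backlog_mb / frame_mb)           # whole frames of buffering

# Assumed values: an RS-170-like camera at ~10 MB/sec, a 70-MB/sec average PCI
# rate, a 50-ms worst-case stall, and ~0.3-MB frames.
print(fifo_frames_needed(10.2, 70.0, 50.0, 0.3))      # -> 2 frames of buffering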

6.4.3 Effects of latency20


Other than the capture- and transfer-related latencies, several other factors
influence the overall latency in a machine vision system. Figure 6.18 illustrates
one example, and some of the key parameters for high-speed imaging are listed
below:
Part detector: time between the target component's arrival at the reference position
and the camera's receipt of the image capture command.
Figure 6.18 Configuration of a part-dimension measuring setup (all parameters are given
in millimeters).

Image capture command: time interval between the vision system's receipt of the
capture signal and the actual start of image capture.
Strobe/shutter trigger: time between the start of image acquisition and the start of
a strobe pulse or opening of the shutter.
Exposure time: time required by the vision illumination system (e.g., pulsed light)
to create an exposure. In steady light, the exposure time corresponds to the
camera's exposure time.
Video transfer: time required to transfer a video image from the camera to the
framegrabber.
Transfer to host: time elapsed between the end of the video transfer to the end of
the image data transfer from the framegrabber to the host CPU. The time elapsed
is contingent on other devices competing with the framegrabber to communicate
with the host.
Image data processing: time taken by the host CPU to complete the assigned
processing tasks on a captured image frame. This latency is dependent on the
complexity of the image content and other demands on the processor resources. For
a given algorithm, this time may be computed from the host processor's parameters.
Resynchronization: In all image-processing work, the processing time is closely
related to image content. A very efficient and structured algorithm or code
may lead to a constant execution time, but various uncertainties within the
complete vision system may not permit a guaranteed cycle time for a given set
of numerical operations on the captured image data. A more pragmatic approach is
to resynchronize the processed results by tagging a time stamp on each input image.
A time stamp, which need not correspond to the actual time, is a sequential record
of the receipt of incoming images with respect to a time base, perhaps from the
operating system. This time-tag stamp remains with the image as it is processed and
placed on the output queue. Resynchronization of the processed results is achieved
by placing the outputs in the sequential order of their time stamps (a minimal sketch of
this bookkeeping follows the list).
Time base: One time-based interval is added if the incoming image is time-tagged.
Output activation: time interval between the end of image processing (or
resynchronization) and the final event within the vision system. This term includes
all mechanical delays, processing overheads, and the signal propagation lag.
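The resynchronization bookkeeping referred to above can be sketched as follows; the frame names are hypothetical, and a simple sequence counter stands in for the operating-system time base.

import heapq
from itertools import count

_sequence = count()          # monotonically increasing tag, standing in for a time base

def tag_incoming(image):
    """Attach a sequential time stamp to each incoming image."""
    return (next(_sequence), image)

def resynchronize(processed_results):
    """Reorder processed (stamp, result) pairs back into acquisition order,
    regardless of how long each frame took to process."""
    heap = list(processed_results)
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

# Example: results that finish out of order are emitted back in stamp order.
tagged = [tag_incoming(name) for name in ("frame A", "frame B", "frame C")]
done = [tagged[2], tagged[0], tagged[1]]
print([r for _, r in resynchronize(done)])   # ['frame A', 'frame B', 'frame C']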
While not all of the above factors may be present in a specific vision system, they
are related, and it is useful to refer to them when deriving the system specifications
for a given application. For the setup in Fig. 6.18, the first parameter to estimate
is the FOV. Assuming a resolution of 4 pixels/mm in the captured image (i.e., a
2-pixel minimum detectable feature size), FOV_H = 1000 pixels and FOV_V = 800 pixels.
If both FOVs are rounded up to 1024 × 1024 pixels, the FOV = 256 mm × 256 mm. If the
image blur is limited to 1 pixel and the motion tear to 1 mm (4 pixels) [combining
Eqs. (6.5) and (6.7)], then

T_exp = (field time / Δ_tear^pixel) × Δ_blur^pixel ≤ field time / 4.    (6.8)

The parameters listed in Table 6.7 indicate an image capture and processing
subtotal range of 179 to 206 ms or an uncertainty of 27 ms, which corresponds to a
linear distance of 5.4 mm or about 20 pixels. This value may be improved through further
iterations and changes in the characteristics of the vision system. (Optimization for
latency parameters is an application-specific task.)18,20

Table 6.7 Latency parameters for the setup in Fig. 6.18.18

Latency parameter  | Minimum (ms) | Maximum (ms) | Comments
Part detector      | 1            | 3            | From hardware setup specification
Image capture      | 33           | 33           | For continuous camera operation
Strobe             | 0            | 0            |
Exposure           | 4            | 4            | Computed from motion tear and blur assumptions
Video transfer     | 33           | 33           | Frame rate
Transfer to host   | 33           | 33           | Frame rate
Image processing   | 75           | 100          | Estimate
Subtotal           | 179          | 206          | Image capture and processing subtotal
Resynchronization  | 33           | 33           | Assumed to be 1 frame time
Time base          | 33           | 33           | Assumed to be 1 frame time
Output activation  |              |              | Not included
System total       | 245          | 272          |
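The budget in Table 6.7 can be tallied with a few lines of code; the 200-mm/s part speed and 4-pixel/mm resolution used to convert the timing uncertainty into distance and pixels are assumptions consistent with the 5.4-mm and roughly 20-pixel figures quoted above.

# (min_ms, max_ms) per latency item from Table 6.7
budget = {
    "part detector": (1, 3),
    "image capture": (33, 33),
    "strobe": (0, 0),
    "exposure": (4, 4),
    "video transfer": (33, 33),
    "transfer to host": (33, 33),
    "image processing": (75, 100),
}

lo = sum(v[0] for v in budget.values())      # 179 ms
hi = sum(v[1] for v in budget.values())      # 206 ms
uncertainty_ms = hi - lo                     # 27 ms

part_speed_mm_s = 200.0      # assumed part velocity
resolution_px_mm = 4.0       # assumed spatial resolution (4 pixels/mm)
distance_mm = part_speed_mm_s * uncertainty_ms / 1000.0
print(lo, hi, uncertainty_ms, distance_mm, distance_mm * resolution_px_mm)
# -> 179 206 27 5.4 21.6 (about 20 pixels)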

While image-capturing latency is a quantifiable parameter, its value does not
necessarily dictate the real-time performance of the whole system. There is no
single definition of "real time," but the premise is that actions or results are
available when required at various stages within an application, and it is assumed
that individual operations must be executed in a deterministic timing sequence.
Latency parameters identify the slowest subsystem and therefore form only a part
of the design of a real-time or high-speed machine vision system.
6.5 Resolution20-24
The captured image is the primary source of all processing operations, so the
quality of the processed image data is closely tied to the characteristics of the input
image data. In this respect, resolution is a key feature in the quantification of the
input image data.

6.5.1 Gray-level resolution


The gray-level (or intensity) resolution corresponds to the width of the video ADC
output. For most mid-range machine vision applications, an 8-bit ADC giving
256 levels of intensity is adequate. Other common outputs are 10-bit (1024 gray
levels) and 12-bit (4096 gray levels) in medical-imaging and high-performance
multimedia applications. The intensity resolution indicates the granularity of
the captured pixels and hence the visual continuity between neighboring pixels.
However, in machine vision the numerical accuracy and repeatability are more
important than the visual appearance of the digitized image. Several other forms of
resolution are described here in the context of image processing.

6.5.2 Pixel resolution


Pixel resolution is the number of rowcolumn pixels generated by the image
sensor. The analog framegrabber rating refers to the maximum pixel resolution
input for the onboard hardware. The frame store buffer is designed around this
input rating. If the analog camera and the framegrabber share a digital pixel clock
generator, then the camera output pixel resolution is the same as the framegrabber
input pixel resolution. If the pixel clock signal is extracted from the camera
composite signal, as in the case of standard video/closed-circuit television (CCTV)
cameras, the framegrabber ADC output may not be synchronized with the clocking
signals in the image sensor. This may cause spatial nonuniformity in the captured
image. In such cases, the common practice is to use the lower of the two input
resolution values in the framegrabber rating.

6.5.3 Spatial resolution21-24


The image sensor sees the target scene spanned by the FOV through the camera
lens/optics, and the captured image is a 2D transformation of the object image
scene in the FOV (Fig. 6.19). The scaling parameters that transform the 2D FOV
scene to the 2D captured image lead to the spatial resolution (Chapter 7), which is
given in pixels per linear distance unit along the x and y axes. The FOV area on the
image plane is related to optical magnification, so the spatial resolution is tied to
the optics in the camera's lens assembly. For a stationary target scene, the optical
relationships in Eq. (6.9) are commonly used to relate the focal length, sensor pixel,
and FOV dimensions.23 Using the parameters in Fig. 6.19 with z1 = si and z2 = so ,
and from Sec. 3.3.1 [Eq. (3.6c), y1 is image height]:

z1/z2 = Magnification (M) = x1/x2 = y1/y2 (with x1 = xi and y1 = yi the image-plane dimensions).    (6.9a)
Figure 6.19 Parameters for FOV computation. The solid FOV angle subtended by 1 pixel
is obtained by replacing the pixel width with its diameter (= √2·w for square pixels). The
addition of a lens extension moves the image capture plane farther from the objective.

Adding the FOV dimensions and D_H^format and D_V^format (the horizontal and vertical
dimensions of the sensor format), the FOV is derived as

FOV_V = x2|max = (1/M) xi|max = (1/M) D_V^format,
FOV_H = y2|max = (1/M) yi|max = (1/M) D_H^format.    (6.9b)

Also from Eq. (3.6a),

1/f = 1/z1 + 1/z2 = (1/z2)(1 + 1/M),

so that

z2 = (1 + 1/M) f = (1 + FOV/D^format) f,    (6.9c)

where FOV and D^format take their horizontal or vertical values as appropriate. For a given
square pixel width w, the horizontal FOV for one pixel is given by [without extension tube
(Fig. 6.19)]

θ_pixel = 2 tan^-1[w/(2f)].    (6.9d)

Most standard CCTV lenses do not focus below 500 mm (focal lengths of
standard CCTV lenses are 8, 12.5, 16, 25, and 50 mm), so the available lens and
object distances in some applications may not permit perfect focusing on the focal
plane. In such cases, an extension tube is added to move the lens away from the
image plane. The relationships among magnification M, image distance z1, and
extension tube length Lext are derived in Eq. (6.10):

z1 = (z1/z2 + 1) f = (M + 1) f,    (6.10a)

and

Lext = z1 - f = M f.    (6.10b)

Thus, a half-inch-format sensor (D_H^format = 6.4 mm, D_V^format = 4.8 mm) with a
50-mm lens located 700 mm from the object plane will have an FOV of around
90 mm × 67 mm. If Eq. (6.10) is used for z2 = 700, M = 0.077, the required
extension tube length becomes Lext = 50 × 0.077 = 3.85 mm.
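A short numerical check of Eqs. (6.9) and (6.10) for the half-inch-format example is sketched below. It uses the exact thin-lens relation z2 = (1 + 1/M) f, which reproduces M ≈ 0.077 and Lext ≈ 3.85 mm; the rounded 90 mm × 67 mm FOV quoted above appears to use the approximation 1/M ≈ z2/f, so the exact FOV figures come out slightly smaller.

import math

def lens_setup(f_mm, z2_mm, sensor_h_mm, sensor_v_mm, pixel_w_mm=None):
    """Magnification, FOV, image distance, and extension tube length from
    Eqs. (6.9b), (6.9c), (6.10a), and (6.10b)."""
    m = f_mm / (z2_mm - f_mm)            # from z2 = (1 + 1/M) f, Eq. (6.9c)
    fov_h = sensor_h_mm / m              # Eq. (6.9b)
    fov_v = sensor_v_mm / m
    z1 = (m + 1.0) * f_mm                # Eq. (6.10a)
    l_ext = m * f_mm                     # Eq. (6.10b)
    theta_pixel = (2.0 * math.atan(pixel_w_mm / (2.0 * f_mm))
                   if pixel_w_mm is not None else None)   # Eq. (6.9d)
    return m, fov_h, fov_v, z1, l_ext, theta_pixel

# Half-inch sensor (6.4 mm x 4.8 mm), 50-mm lens, object plane 700 mm away.
m, fov_h, fov_v, z1, l_ext, _ = lens_setup(50.0, 700.0, 6.4, 4.8)
print(round(m, 3), round(fov_h, 1), round(fov_v, 1), round(l_ext, 2))
# -> 0.077 83.2 62.4 3.85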
Other than the operational features of commercially available lenses (Table 6.8),
lens choice is dictated by the object distance and the required FOV area. To
compute the optimum FOV, a priori knowledge is needed of the part size range
(for dimensional measurements) and the feature size (for defect detection). Due to
hardware limitations, in some measurement applications the FOV may contain a
partial image for its maximum pixel resolution.

Table 6.8 Combinations of applications and lenses. A telecentric lens collimates incoming
light with reduced shadows at the expense of a narrow FOV.

Lens types compared: CCTV lens with C/CS mount*, telecentric lens, zoom lens, and 35-mm
standard photographic lens.

Applications covered: defect detection; refractive defect detection; gauging of thick objects
(variable depth); high-magnification inspection; alignment; part recognition; optical character
reading; pattern matching; flat-field gauging; high-resolution gauging and other applications;
surveillance.

Optical feature: CCTV, medium to high image distortion; telecentric, constant viewing angle
over the FOV and large depth of field; zoom, low distortion; 35-mm, very low distortion.

Overall optical performance: CCTV, poor to good; telecentric, fair to excellent (performance
improves with cost); zoom, good to excellent; 35-mm, good to excellent.

Relative cost: CCTV, low; telecentric, high; zoom, mid-range to high; 35-mm, low to high.

* Parameters related to C and CS lens mounts are listed in Table 3.3.
In defect detection, the camera orientation (alignment) and illumination may be


critical for placing the target defects within the FOV. Target parts are unlikely to
arrive at exactly the same location, so for general size measurements, alignment
factors are added to derive the FOV size:

FOV_H = (L_pH + ΔL_pH) F_alignmentH  and  FOV_V = (L_pV + ΔL_pV) F_alignmentV,    (6.11)

where L_pH × L_pV is the nominal dimension of the target part, and ΔL_pH and ΔL_pV are
their likely variations as parts arrive within the FOV. These figures are generally
given in the application specifications. The alignment parameters Falignment are
necessary to ensure that the FOV can encompass all likely variations in part size
with a reasonable degree of reliability (e.g., the part edges should not be too close
to the FOV boundary). Camera alignment is a major task in any vision system
installation, so an element of judgment is used to choose Falignment (a typical figure
is 10% around the nominal FOV). Spatial resolution is derived as the FOV length
along the vertical (or horizontal) direction divided by the pixel resolution of the
image sensor in the corresponding direction in units of mm/pixel (or inch/pixel).
Allowing dimensional tolerance limits of ±5% along both sides of an oblong
target part with nominal dimensions of 10 mm × 5 mm and F_alignment = 10%, from
Eq. (6.11), an FOV of 12.1 mm × 6.05 mm preserves the target object's aspect ratio.
However, if this image is to be captured on a standard TV camera with an aspect
ratio of 4:3, the FOV must be made larger in one direction for square image capture.
If this FOV is to be captured by a sensor with a pixel resolution of 512 × 512, the
spatial resolution in the captured image becomes 236 μm/pixel × 118 μm/pixel.
The camera alignment figures and camera location are thus dependent on the
lens/optics parameters as well as the image format.
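A minimal helper for Eq. (6.11) is shown below, using the ±5% tolerance and 10% alignment allowance of the example above to reproduce the 12.1 mm × 6.05 mm FOV; treating the ±5% tolerance as a 10% total variation is an assumption consistent with the quoted numbers.

def fov_dimension(nominal_mm, variation_frac, alignment_frac):
    """Eq. (6.11): FOV length from the nominal part dimension L_p, its likely
    variation ΔL_p (as a fraction of L_p), and the alignment factor."""
    delta_l = nominal_mm * variation_frac            # ΔL_p
    return (nominal_mm + delta_l) * (1.0 + alignment_frac)

fov_h = fov_dimension(10.0, 0.10, 0.10)   # 12.1 mm (±5% tolerance -> 10% spread)
fov_v = fov_dimension(5.0, 0.10, 0.10)    # 6.05 mm
print(fov_h, fov_v)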
Feature size is the linear dimension of the smallest object to be captured with a
reasonable degree of reliability. The analog video image is sampled in the capturing
process, and the Nyquist sampling theorem is used to provide the theoretical lower
limit, which is given as two pixels. For analogy with another capturing activity,
the smallest fish that a fishing net of uniform mesh size can catch is twice the net's
mesh diameter. Thus, for the above pixel resolutions, the theoretically detectable
feature size is 572 μm × 236 μm. However, allowing for the presence of noise in
any video signal, a size of three or four image pixels is considered more realistic
to recover a one-pixel-size feature in the object space with good contrast. Upward
adjustments are made for poor contrast and low SNRs. Although a captured image
has a resolution of one pixel, interpolation may be used to create subpixel accuracy
through numerical operations. While the achievable level of subpixel resolution
is limited by the accuracy of the numerical algorithms and computations and the
characteristics of the processor, 0.1 pixel is usually taken to be a realistic limit in
vision-based measurements. This limit in turn dictates the measurement accuracy
that may be expected from a given hardware setup.
In traditional metrology, measurement accuracy is better than the tolerance
by a factor of 10, with the measurement instrument's resolution (the smallest
measurable dimension) being 10 times better than the measurement accuracy.
Thus, the measurement resolution is 100 times better than the component's
dimensional tolerance. Although much of the human error is eliminated in machine
vision systems, a ratio of 1:20 for part dimension to measurement resolution is
considered more realistic than 1:100.

6.5.4 Assessment of resolution8,20


In addition to scene illumination and alignment, the quality and reliability of the
captured image depend on calibration. Here, calibration refers to the determination
of the spatial resolution of the captured image, the location of the camera with
respect to the base reference axes, and the anticipated target part motion. The
latter is related to the time interval between successive image captures for pixel
data throughput computation. The calibration parameters may be divided into two
groups (Table 6.9): one group includes the application specifications matched with
alignment parameters for camera location, and the other group includes parameters
derived to estimate resolution and accuracy in the image-based measurement
output data.
Since many vision-based measurement applications involve moving parts within
the image scene, the processing cycle time of the entire system must be estimated
to appraise the throughput and execution times of numerical algorithms in the context
of feature resolution. In a first analysis, the data throughput estimate (N_throughput^pixel,
in pixels per second) in Eq. (6.12) may be made using knowledge of the time interval
between two successive image acquisitions, T_acq_interval, which is equal to the scan
rate in line-scan cameras:

N_throughput^pixel = (N_imageH × N_imageV) / T_acq_interval.    (6.12)

Table 6.9 and Eq. (6.12) do not include temporal variations, but these relations,
along with a few nominal parameters chosen by the designer, are adequate to
provide baseline parameters.

Table 6.9 Resolution parameters commonly used in machine vision applications (uncertainties
such as shock and vibration are excluded). Subscripts indicate the horizontal (H) or vertical (V)
direction.

Specification parameter                                       | Notation             | Source
FOV, mm × mm                                                  | FOV_H × FOV_V        | Target part size range
Pixel resolution of image sensor (pixel number)               | N_imageH × N_imageV  | Vision system specification
Pixel resolution of the vision system (measurement resolution | N_numerical          | Numerical/algorithmic precision of the vision system
  in pixels or fraction of a pixel)                           |                      |
Feature resolution in mm                                      | D_feature            | Dimension of the smallest object to be detected in the FOV
Measurement resolution in millimeters                         | D_measurement        | Target measurement resolution

Derived parameter                                             | Relation with other parameters
Spatial resolution in the captured image (mm/pixel)           | R_spatial = FOV / N_image
Image frame resolution in pixels                              | N_image = FOV / R_spatial
Number of pixels to span minimum target feature               | N_feature = D_feature / R_spatial

For example, if the only parameters available to measure circular holes in the
target parts are the nominal diameter d, the FOV area (FOV_H × FOV_V), and the
part arrival frequency (= 1/T_acq_interval), then the following operating parameters
may be computed by making an assumption about the minimum feature size N_feature:

spatial resolution:   R_spatial = d / N_feature  mm/pixel;    (6.13a)

image resolution:     N_imageH = FOV_H / R_spatial = FOV_H / (d/N_feature)  pixels,
                      N_imageV = FOV_V / R_spatial = FOV_V / (d/N_feature)  pixels;    (6.13b)

processing load:      N_throughput^pixel = (N_imageH × N_imageV) / T_acq_interval
                                         = (FOV_H × FOV_V) / [(d/N_feature)^2 × T_acq_interval]  pixels/second.    (6.13c)

For a circular part, the resolutions and feature sizes are the same along the
horizontal and vertical directions.
While the amount of data that can be processed by a vision system depends on
several factors, including bus transfer capacity and latency, a figure of 10^7 pixel/sec
is considered to be typical in a mid-range PC-based machine vision system.
Applications with over 10^8 pixel/sec may require onboard vision processors along
with high-performance host workstations. For reference, data rates in various
standards are CCIR: 11 MB/sec; RS-170: 10.2 MB/sec; and line-scan cameras:
15 MB/sec.
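The following sketch applies Eqs. (6.13a)-(6.13c) to an illustrative inspection of circular holes and compares the result against the 10^7 pixel/sec figure quoted above; the hole diameter, FOV, arrival rate, and 4-pixel minimum feature size are assumed values chosen for the example.

def operating_parameters(d_mm, fov_h_mm, fov_v_mm, arrival_hz, n_feature=4):
    """Baseline operating parameters for measuring circular holes,
    following Eqs. (6.13a)-(6.13c)."""
    r_spatial = d_mm / n_feature                          # Eq. (6.13a), mm/pixel
    n_image_h = fov_h_mm / r_spatial                      # Eq. (6.13b), pixels
    n_image_v = fov_v_mm / r_spatial
    t_acq_interval = 1.0 / arrival_hz                     # seconds between acquisitions
    throughput = n_image_h * n_image_v / t_acq_interval   # Eq. (6.13c), pixels/sec
    return r_spatial, n_image_h, n_image_v, throughput

# Assumed values: 2-mm holes, 100 mm x 80 mm FOV, 10 parts/sec, 4-pixel feature.
r, nh, nv, load = operating_parameters(2.0, 100.0, 80.0, 10.0)
print(r, nh, nv, load)          # 0.5 mm/pixel, 200 x 160 pixels, 3.2e5 pixels/sec
print(load < 1e7)               # True: within a mid-range PC-based system's budget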
Figure 6.20 shows the primary sources of noise inherent in the sensing
mechanism. Dark current is due to Si impurities and leads to the buildup of
thermally generated electrons (hot pixels) during the integration time. This type of
noise is not separable from photon noise and is generally modeled with a Poisson
distribution (see Table 5.6).3 The level of thermal noise may be reduced with a
shorter integration time and cooling. In thermally cooled cameras, dark current
may be reduced by a factor of 2 for every 6 °C reduction in temperature. Air-cooled
cameras are susceptible to ambient humidity; cooling below 4 °C requires a vacuum
around the sensing element. Some IR image sensors are built to run at temperatures
around -40 °C using Peltier elements, which in turn are cooled by air or liquid
(e.g., ethylene glycol). In cooled slow-scan cameras, the noise floor is taken as the
readout noise.
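As a numeric illustration of the cooling rule of thumb quoted above (dark current halving for every 6 °C of cooling), a one-line helper:

def dark_current_factor(delta_t_celsius, halving_step=6.0):
    """Relative dark current after cooling by delta_t_celsius degrees, assuming
    it halves for every halving_step degrees (the rule of thumb quoted above)."""
    return 0.5 ** (delta_t_celsius / halving_step)

print(dark_current_factor(18))   # cooling by 18 °C -> 0.125 (an 8x reduction)
print(dark_current_factor(60))   # cooling by 60 °C -> ~0.001 (about 1000x)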
Some image sensors are designed to operate in multiphase pinning mode, where
a smaller potential well size is used to lower the average dark current (at the
expense of quantum efficiency). A range of commercial devices estimate the level
of dark current from the output of calibration pixels (masked photosites around
the sensor edges) and subtract it from the active pixel output to increase the
overall dynamic range. Readout noise is primarily due to the on-chip electronics
Figure 6.20 Image sensor noise: (a) locations of noise introduction in the signal flow and
(b) illustration of noise effects on the signal levels. Photon (shot) noise is independent of the
generated signal. Reset (or kTC) noise represents the uncertainty in the amount of charge
remaining on the capacitor following a reset. Amplifier noise (or 1/f noise) is an additive
white noise that can be reduced by correlated double sampling (Sec. 5.4). A summary of
sensor noise definitions is given in Table 5.6.

and is assumed to be an additive noise affected by the readout rate. For readout
clock rates below 100 kHz, readout noise is taken to be constant; for higher
rates, this is modeled as a Gaussian distribution function of the signal intensity.
Quantization noise, the roundoff error due to the finite number of discrete levels
available in the video ADC, is taken as ±1 LSB; for the commonly used 8-bit
ADC, 1 LSB = 1/(2^8 - 1) = 1/255 ≈ 0.4% of full-scale resolution (FSR), or 48 dB.
In applications that require a high dynamic range, the ADC width is matched by
the lowest signal level to be detected, the well capacity, and the quantum efficiency.
A smaller well size corresponds to lower quantum efficiency and less blooming
(Sec. 5.6.2).
In addition to sensor noise, captured image scenes may have measurement errors
due to nonuniform illumination and shading. If the level of dark current is known,
one way of reducing some of these errors is to calibrate (normalize) the target
image with respect to its background using Eq. (6.14):

g_flat-field(x, y) = G × [g_captured(x, y) - g_dark(x, y)] / [g_background(x, y) - g_dark(x, y)],    (6.14)

where g_dark(·) and g_background(·) are the brightness levels associated with the dark
current and the background around the target image; g_captured(·) is the intensity in
the captured image; and G is a scaling factor. Although the end result is a high-
contrast image,24 this flat-field correction may not always be convenient due to the
added computational overhead and the need for two additional image frames for
each target scene.
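A compact NumPy version of Eq. (6.14) is given below; the small epsilon guarding against division by zero and the particular choice of the scaling factor G are implementation details not specified in the text.

import numpy as np

def flat_field_correct(captured, background, dark, gain=255.0, eps=1e-6):
    """Eq. (6.14): normalize a captured frame against its background frame,
    with the dark-current frame subtracted from both."""
    captured = captured.astype(np.float64)
    background = background.astype(np.float64)
    dark = dark.astype(np.float64)
    corrected = gain * (captured - dark) / np.maximum(background - dark, eps)
    return np.clip(corrected, 0, gain)   # keep the result within the display range

# Example with synthetic 8-bit frames of shape (480, 640).
rng = np.random.default_rng(0)
dark = rng.integers(0, 5, (480, 640)).astype(np.uint8)
background = rng.integers(100, 200, (480, 640)).astype(np.uint8)
captured = rng.integers(50, 250, (480, 640)).astype(np.uint8)
print(flat_field_correct(captured, background, dark).shape)   # (480, 640)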
Although the dynamic range is an important factor in quantifying a sensor's
ability to retain intensity levels during image capture, the ability of the machine
vision hardware (lens and image sensor) to reproduce the spatial variation of
intensity is also critical in image-based measurement applications. The spatial
characteristics of any optical system are contained in its MTF, which is
considered in Chapter 7.

References
1. S. Hecht, S. Shlaer, and M. H. Pirenne, "Energy, quanta and vision,"
J. General Physiology 25, 819-840 (1942).
2. K. B. Benson, Television Engineering Handbook, McGraw-Hill, NY (1992).
3. G. C. Holst, CCD Arrays, Cameras, and Displays, Second ed., SPIE Press,
Bellingham, WA (1998).
4. G. A. Baxes, Digital Image Processing: A Practical Primer, Prentice Hall,
Englewood Cliffs, NJ (1984).
5. G. A. Baxes, Digital Image Processing: Principles and Applications, John
Wiley & Sons, New York (1994).
6. P. K. Sinha and F.-Y. Chen, "Real-time hardware for image edge thinning
using a new 11-pixel window," in Communicating with Virtual Worlds,
N. M. Thalmann and D. Thalmann, Eds., Springer Verlag, Berlin/Heidelberg,
pp. 508-516 (1993).
7. F.-Y. Chen, A Transputer-based Vision System for On-line Recognition, PhD
thesis, University of Reading, UK (1993).
8. P. K. Sinha, "Image processing," European Union Erasmus/Socrates Summer
School Lecture Notes, University of Reading, UK (1990-1998).
9. S. Sherr, Electronic Displays, John Wiley & Sons, New York (1993).
10. C. A. Poynton, A Technical Introduction to Digital Video, John Wiley & Sons,
New York (1996).
11. E. Kaneko, Liquid Crystal Display TV Displays, KTK Scientific Publishers,
Tokyo (1997) and Springer, New York (1998).
12. D. W. Greve, Field Effect Devices and Applications, Prentice Hall, Upper
Saddle River, NJ (1998).
13. T. Voutsas, T.-J. King, Eds., Active Matrix Liquid Crystal Display Technology
and Applications, Proc. SPIE 3014 (1997).
14. D. Marsh, "Off-the-shelf components: Make light work of machine vision," in
EDN Europe, pp. 20-26 (January 2002).
15. Data Translation, Image Processing Handbook, Data Translation, Marlboro,
MA (1996).
16. C. Poling, "Designing a machine vision system," OE Magazine, pp. 34-36,
May 2002.
17. Y. Kipman and S. Cole, "Linescan cameras expand image resolution," Test &
Measurement World, pp. 19-24, October 1998.
18. P. West, "High speed, real-time machine vision," Automated Vision Systems,
Los Gatos, CA, and Imagenation (Cyberoptics), Portland, OR (2001).
19. P. Boroero and R. Rochon, "Match camera triggering to your application,"
Test & Measurement World, pp. 53-58, August 1988.
20. P. West, "Roadmap for building a machine vision system," Technical Report,
Automated Vision Systems, Los Gatos, CA, and Imagenation, Portland, OR
(2001).
21. M. Maamri, Control of Mobile Platforms in a Visually Monitored
Environment, Ph.D. thesis, University of Reading, UK (1991).
22. Q. H. Hong, 3D Feature Extraction from a Single 2D Image, Ph.D. thesis,
University of Reading, UK (1991).
23. S. F. Ray, Applied Photographic Optics, Focal Press, Oxford, UK (1997).
24. L. J. van Vliet, F. R. Boddeke, D. Sudar, and I. T. Young, "Image detectors
of digital microscopy," in Digital Image Analysis of Microbes: Imaging,
Morphometry, Fluorometry and Motility Techniques and Applications,
M. H. F. Wilkinson and F. Schut, Eds., John Wiley & Sons, Chichester, UK
(1998).
Chapter 7
Image Formation
In the derivation of optical imaging concepts, the object space is assumed to be
a collection of an infinite number of points, with each point being an irradiance
source of diverging spherical wavefronts.1 An optical wavefront is a collection of
rays. A ray is defined as a line marking the direction of flow of optical energy.
An optical wavefront is characterized by the locus of points of constant phase.
The shape of a wavefront depends on its source (Sec. 7.5.1). A lens transforms the
shapes of these emerging wavefronts. The refracted wavefronts from an ideal lens
are perfectly spherical and converge at the focal point on the optical axis of the lens
[Fig. 7.1(a)]. Variations in the incident lens surface shape from a chosen reference
sphere are manifested in the deviation of the refracted wavefronts from the ideal
spherical surface. The resulting error, referred to as the wavefront aberration or
wavefront error, introduces a phase difference between the converging wavefronts
that moves the focusing point of the refracted rays away from the ideal location.
The root mean square of all point-by-point deviations is generally quoted as the
wavefront error of the lens, with the common notation W expressed as a function of the
lens coordinates. It is given as multiples or fractions of the wavelength of the incident
wavefront. A lens with an error of λ/4 (a quarter wavelength) is considered near perfect
for general optical work.
The concept of the reference sphere stems from the geometric property that rays
are perpendicular to the wavefront, so any ray that hits a perfectly spherical lens
surface will converge at the center of the sphere. From geometric definitions, if a
lens gives a perfect image, all rays from the object space intersect at the Gaussian
image point. In this ideal case, the focal point coincides with the center of the
spherical surface, and the converging rays arrive exactly in phase. For practical
purposes, the reference sphere is taken as a sphere of radius R with its center at
the ideal focal point of the lens. For a simple lens, R is the distance between the
exit pupil (or the lens axis) and a point on the image plane [Fig. 7.1(b)]. As a
spherical wavefront continues to diverge from its source, its radius of curvature
expands to become a collection of parallel traveling waves (planar wavefront). For
general optical imaging, the incident wavefront on the lens is assumed to be planar
[Fig. 7.1(c)].
Even if a lens surface coincides with the reference sphere, the object point
(x_o, y_o) may be projected at (x_ia = x_i + Δx_ia, y_ia = y_i + Δy_ia) rather than at the
ideal image point (x_i = Mx_o, y_i = My_o, M = magnification) due to local surface
unevenness, nonuniform density, or material defects in the lens. The cumulative