
EUROCON 2003 Ljubljana, Slovenia

Design of an integrated optical receiver in a standard CMOS process
Eugene van Wyk
University of Pretoria
Pretoria 0002
South Africa
email: eugenevw@ieee.org
Abstract—This paper presents an integrated optical receiver that allows for the “last-mile” access needed in today’s broadband networks. The circuit consists of an integrated photodetector, an amplification chain and a phase- and frequency-locked clock recovery system. It operates at a data rate of 155.52 Mbps with a sensitivity of -20.45 dBm at a BER of 10^-10 and was implemented in a 1.2 µm CMOS process. The circuit complies with relevant networking standards, and synchronization of the data signal occurs within 1 µs. The entire receiver consumes approximately 26.5 mW from a 3.3 V supply and occupies an area of 2.9 mm².

Index Terms—Clock and data recovery, CMOS analog integrated circuits, optical receivers, photodetectors.

I. INTRODUCTION

Today’s broadband networks support ever-increasing data rates in order to cope with the global demand for bandwidth. Although modern network backbones can usually service this traffic, bottlenecks occur at the nodes where end-users access these bandwidth resources. Consequently, high-speed ‘last-mile’ connectivity is rapidly becoming more important.

Even though fibre-optic systems form the backbone of our telecommunication networks, optical communication circuits have been slow to follow the integration trend. This has led to prohibitively expensive optical solutions for short-range communication links.

Seen in this context, the capability to produce a high-performance CMOS optical receiver with an integrated photodetector would enable greater use of optics in short-distance communication systems.

This paper describes the design of a fully integrated optical receiver that operates at a data rate of 155.52 Mbps and is compatible with the relevant optical telecommunication standards.

II. INTEGRATED PHOTODETECTORS

When a burst of photons arrives at the surface of the photodetector, a fraction of the incident field is propagated into the material, where some of its energy is used to excite electron-hole pairs into the conduction and valence bands respectively. To harness these photo-generated charge carriers, they have to be separated by an electric field before they recombine.

A. Photodetector Structures in CMOS

In a standard CMOS process, significant electric fields are present in the depletion regions formed at all pn-junctions and can be used to harvest electron-hole pairs. Figure 1 illustrates the basic structures present in a standard CMOS process.

Fig. 1. Photo-detection structures available in a standard CMOS process. Incoming photons excite charge carriers within the substrate.

An important parameter that has to be considered is the responsivity of the photodetector. Responsivity is a measure of the current that is generated for a given optical input power and is proportional to the pn-junction’s quantum efficiency. Figure 2 gives an indication of the responsivities obtainable through the use of various photodetector structures and is based on the 1.2 µm CMOS process that was used for this project.

[Figure 2 plot: Responsivity vs. Wavelength. Vertical axis: responsivity in A/W (0 to 0.35); horizontal axis: wavelength in µm (0.4 to 1.1); curve labels: n+ p-epi, p+ n-well p-epi, n-well.]

Fig. 2. Photo-detector responsivity vs. optical wavelength for various junction structures found in the 1.2 µm CMOS process.
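As a rough illustration of how responsivity relates to quantum efficiency, the sketch below uses the textbook photodiode relation R = ηqλ/(hc). This relation is standard theory rather than something derived in this paper, and the function names and the 850 nm operating wavelength are assumptions chosen for illustration (Table III lists a 770-880 nm range and Table I a responsivity of 0.041 A/W).

```python
# Physical constants (exact SI values).
PLANCK_H = 6.62607015e-34      # J*s
LIGHT_C = 299_792_458.0        # m/s
ELECTRON_Q = 1.602176634e-19   # C


def responsivity(quantum_efficiency, wavelength_m):
    """Ideal photodiode responsivity in A/W: R = eta * q * lambda / (h * c)."""
    return quantum_efficiency * ELECTRON_Q * wavelength_m / (PLANCK_H * LIGHT_C)


def quantum_efficiency(responsivity_a_per_w, wavelength_m):
    """Inverse relation: eta = R * h * c / (q * lambda)."""
    return responsivity_a_per_w * PLANCK_H * LIGHT_C / (ELECTRON_Q * wavelength_m)


# Assumed operating point: 850 nm (hypothetical), combined with the
# 0.041 A/W responsivity reported for the detector in Table I.
eta = quantum_efficiency(0.041, 850e-9)
print(f"implied quantum efficiency: {eta:.3f}")
```

Under these assumptions the implied quantum efficiency is roughly 6 %, which is consistent with the low responsivity the paper reports for the chosen detector.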

Even though detector responsivity is an important design parameter, it is the slow detector response times that have been found to prevent the implementation of fully integrated CMOS-based receivers. Electron-hole pairs that are generated deep within the substrate and have to diffuse a considerable distance to reach the depletion region slow down the photodetector and result in a low-frequency current gain. The effect of this diffusion tail is to increase the amount of intersymbol interference, resulting in an increased bit error rate.

B. Spatially Modulated Detector Structures

To minimize the effect of this diffusion current on the detector frequency response, a spatially modulated light (SML) detector was implemented using a grid of identical pn-junctions and masking alternate ones with a floating metal layer [ ]. When this detector is illuminated, the absorbed light creates a spatial gradient in the carrier concentration that will relax by diffusion. The difference between illuminated and masked current responses will then result in an equivalent drift current response that minimizes the diffusive tail. A spatially modulated light detector using the pn-junction formed between the n+ and p-epitaxial layer/substrate was found to be the only structure in the 1.2 µm CMOS process that could accommodate the high data rate required.

As the frequency response of a CMOS photodetector is mainly determined by the transport of minority carriers [1], the time-domain solution of the diffusion equation below should give an indication of the speed obtainable.

$$\frac{\partial n_p}{\partial t} - D_n\left(\frac{\partial^2 n_p}{\partial x^2} + \frac{\partial^2 n_p}{\partial y^2}\right) + \frac{n_p}{\tau_n} = g(t,x)\,e^{-\alpha y} \qquad (1)$$

where $D_n$ is the diffusivity of the minority carrier; $n_p$ is the minority carrier concentration; $\tau_n$ is the minority carrier lifetime and $g(t,x)$ is the electron generation rate at the lower boundary of the depletion region.

The following figures show Matlab simulations of the time-domain solution of the minority carrier distribution for a simple SML structure with only one exposed junction.

Fig. 3. Minority carrier concentration 10 ps after photon impact.

Fig. 4. Minority carrier concentration 5 ns after photon impact.

As can be seen in the figures above, the charge gradient relaxes very quickly near the surface of the material. As the SML structure uses the relaxation of the gradient to extract information from the incoming burst of photons, the data rate that can be supported is much higher than can be realized with other CMOS photodiode structures. Table I summarizes some detector parameters.

TABLE I
PHOTODETECTOR PARAMETERS

Parameter                Value
Responsivity             0.041 A/W
Cutoff Frequency         442.5 MHz
Detector Capacitance     149 fF
Photodetector Area       … µm²

III. AMPLIFIER DESIGN

In order to amplify the equivalent drift current response, the receiver has to output the difference between the immediate and deferred current responses. This implies the need for two identical transimpedance-type preamplifier structures that feed into a high-speed difference amplifier, subtracting the immediate from the deferred current responses. As input signal strengths may vary considerably, a postamplifier that incorporates a limiting function for automatic gain control was included in the design. Figure 5 illustrates the structure of the complete amplifier.

[Figure 5 block diagram labels: Incoming Photons, SML Photodetector (Immediate and Deferred outputs), Preamplifier, OTA, Limiting Postamplifier, To Clock Recovery.]

Fig. 5. Structure of the complete amplifier chain.

A. Transimpedance Preamplifier

Because of the limited gain-bandwidth product, a single gain stage was used for the transimpedance amplifier. The open-loop gain and bandwidth required for a given feedback resistance and transfer function Q were then derived. A feedback resistor was selected and the required open-loop gains and cutoff frequencies were plotted against transimpedance bandwidth values. Figure 6 shows these design graphs plotted for a Q of 0.707 and 0.9 respectively.

Fig. 6. Transimpedance amplifier design graphs.
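The kind of calculation behind the transimpedance design graphs can be sketched as follows. The paper does not give its derivation, so this uses the common textbook model of a shunt-feedback transimpedance amplifier whose voltage amplifier has DC gain A0 and a single pole, which yields a second-order closed-loop response. The function names, the 50 kΩ feedback resistor and the 150 MHz target are hypothetical; only the 149 fF detector capacitance (Table I) and the two Q values come from the text.

```python
import math


def tia_response(r_f, c_t, a0, f_a):
    """Return (f0, q) of the closed-loop second-order response.

    Assumed model: feedback resistor r_f, total input capacitance c_t,
    voltage amplifier with DC gain a0 and one pole at f_a.
    """
    tau_in = r_f * c_t
    tau_a = 1.0 / (2.0 * math.pi * f_a)
    w0 = math.sqrt((a0 + 1.0) / (tau_in * tau_a))
    q = math.sqrt((a0 + 1.0) * tau_in * tau_a) / (tau_in + tau_a)
    return w0 / (2.0 * math.pi), q


def tia_design(r_f, c_t, f0, q):
    """Invert the model: open-loop gain and pole frequency for a target f0 and q."""
    tau_in = r_f * c_t
    w0 = 2.0 * math.pi * f0
    if w0 * tau_in <= q:
        raise ValueError("no positive amplifier pole gives this f0 and q")
    tau_a = q * tau_in / (w0 * tau_in - q)
    a0 = w0 ** 2 * tau_in * tau_a - 1.0
    return a0, 1.0 / (2.0 * math.pi * tau_a)


def bandwidth_3db(f0, q):
    """-3 dB bandwidth of a second-order low-pass with natural frequency f0."""
    k = 1.0 - 1.0 / (2.0 * q * q)
    return f0 * math.sqrt(k + math.sqrt(k * k + 1.0))


R_F = 50e3      # hypothetical feedback resistor, ohms
C_T = 149e-15   # detector capacitance from Table I; amplifier input capacitance neglected
F0 = 150e6      # hypothetical target natural frequency, Hz

for q_target in (0.707, 0.9):
    a0, f_a = tia_design(R_F, C_T, F0, q_target)
    print(f"Q={q_target}: A0={a0:.2f}, f_a={f_a / 1e6:.0f} MHz, "
          f"BW={bandwidth_3db(F0, q_target) / 1e6:.0f} MHz")
```

Sweeping the target frequency in the loop instead of fixing it would give curves of required gain and cutoff frequency against bandwidth, similar in spirit to the design graphs described above.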

The transistor implementation of the transimpedance amplifier is based on a structure developed by [2] and consists of a PMOS transistor with a diode-connected load biased by an NMOS current mirror. The input is connected to the output by a feedback resistance that acts as a transimpedance element and converts the photo-current to an output voltage.

B. Operational Transconductance Amplifier

The high-speed difference amplifier is based on a differential pair of modified push-pull inverters that is biased at its switching threshold by active resistors. The inverter input stage is loaded with NMOS transistors to decrease the gain and hence increase the bandwidth of the amplifier. The differential signal is then converted to a single-ended output voltage for further processing.

C. Replica-biased Postamplifier

The postamplifier, based on a design by [3], was implemented by cascading two replica-biased inverter strings. Replica biasing is a feedback strategy that aims to minimize the effect of duty cycle degradation through the optimal biasing of the various gain stages in the signal path. It involves level-shifting the information-modulated signal through a gain stage to a set-point determined by a replica of the subsequent amplification stage. Figure 7 illustrates the basic structure of the replica-biasing scheme.

[Figure 7 block diagram labels: Data Signal X(s), Offset Voltage V_offset, gain blocks A1 and A2, blocks E(s) and F(s), Replica Biasing Point R, output Y(s) to Next Stage.]

Fig. 7. Block diagram representing the replica-biasing scheme. E(s) and F(s) have low-pass characteristics.

At low frequencies the output signal reduces to

$$Y(s) \approx X(s)\,\frac{A_1}{1 + A_1 A_2} + R\,\frac{A_1 A_2}{1 + A_1 A_2} + V_{\mathrm{offset}}\,\frac{1}{1 + A_1 A_2} \approx R \qquad (2)$$

Thus the biasing point of the next amplifier stage is driven to a value that is equal to the set-point determined by the replica circuit. As can be seen from equation 2, all low-frequency components of the data signal X(s) are attenuated.

Both feedback loops consist of five inverting stages: three modified inverters, a comparator, and a level shifter. The modified high-speed inverters consist of a standard push-pull structure that is gain-limited by an NMOS diode-coupled transistor to increase the bandwidth. Two RC filters and a comparator complete the feedback loop. The comparator is based on a simple operational transconductance amplifier (OTA) structure. The circuit implementation of figure 7 is shown in figure 8.

[Figure 8 schematic labels: input, O1, O2, O3, output, OTA, OTA_OUT, VT, Replica.]

Fig. 8. CMOS implementation of one replica-biased gain stage.

Table II presents some simulation results of the entire amplification chain.

TABLE II
AMPLIFIER SIMULATION RESULTS

Parameter                Value
Number of Stages         13
Worst-Case Bandwidth     106 MHz
Transimpedance Gain      … dB
Dynamic Range            106 dB

IV. CLOCK AND DATA RECOVERY SYSTEM

A. CDR Architecture

The signal that is obtained at the output of the receiver amplifier stages may not be processed correctly by the subsequent digital circuitry due to input jitter and phase uncertainty. This signal has to be synchronized with a recovered clock signal to output an estimate of the original data sequence sent by the information source.

Clock and data recovery (CDR) circuits for the nonreturn-to-zero (NRZ) data streams used in optical communications have to include a non-linear processing element, such as the phase-locked loop used for this implementation, to create energy at the missing bit rate. Once energy has been introduced at the clock rate, this signal can be extracted.

Process variations require PLLs to have a relatively large acquisition range in order to ensure that the signal can still be locked on to despite the unpredictable local oscillator centre frequency, which may cause significant frequency errors between the incoming data rate and the local clock. This error prevents phase synchronization of the VCO with the incoming signal and leads to false clock frequencies being extracted. Unfortunately, a large acquisition range increases the output jitter of the recovered clock ([4], p. 550) to unacceptable levels.

To solve this acquisition problem, a frequency-locked loop (FLL) was implemented in parallel with the PLL in order to compare the frequencies of the input and output signals and to adjust a voltage-controlled oscillator (VCO) until the error reaches a sufficiently small value. Once this value is reached, the FLL is disabled and the low-jitter PLL takes over, acquiring lock. The CDR architecture used for this circuit is shown in figure 9.
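The remark that an NRZ stream carries no energy at the bit rate until a non-linear operation is applied can be checked numerically. The sketch below is only an illustration: it uses a simple edge detector (magnitude of the sample-to-sample difference) as a stand-in non-linearity, not the PLL-based scheme of this design, and the function names, bit count and oversampling factor are all assumptions.

```python
import cmath
import random


def nrz_waveform(bits, samples_per_bit):
    """Map bits to a +/-1 NRZ waveform with rectangular pulses."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]


def edge_detect(samples):
    """Non-linear step: magnitude of the sample-to-sample difference."""
    return [0.0] + [abs(samples[k] - samples[k - 1]) for k in range(1, len(samples))]


def bit_rate_tone(samples, samples_per_bit):
    """Normalized magnitude of the spectral component at the bit rate."""
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * cmath.pi * k / samples_per_bit)
    return abs(acc) / len(samples)


random.seed(1)  # fixed seed so the run is repeatable
bits = [random.randint(0, 1) for _ in range(256)]
nrz = nrz_waveform(bits, samples_per_bit=8)

print("NRZ tone at bit rate:  ", bit_rate_tone(nrz, 8))
print("edge-detected tone:    ", bit_rate_tone(edge_detect(nrz), 8))
```

The raw NRZ waveform gives a bit-rate component that is zero to within rounding error, because each rectangular bit spans exactly one period of that tone. After edge detection every data transition contributes in phase, so a clear component appears that a narrow-band loop could lock to.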
[Figure 9 block diagram labels: Amplifier Signal; PLL path: Analogue S/H Phase Detector, V/I Converter, Loop Filter 1, CMFB, Fine Control; FLL path: Digital FSM Frequency Detector, Charge Pump, Loop Filter 2, CMFB, V/I Converter, Coarse Control; Current Controlled Oscillator with I and Q outputs, VCO Buffer; Decision Device, Recovered Data, Recovered Clock.]

Fig. 9. Dual-loop clock and data recovery architecture.

B. VCO Design

Since the clock recovery circuit was implemented with a dual-loop architecture, the oscillator was designed to output quadrature clock signals as a provision for the digital frequency detector used in the FLL. The current-controlled oscillator (CCO) in figure 9 is based on a 2-stage ring oscillator with composite PMOS loads [5] to ensure a -180° phase shift at the required unity-gain frequency.

In order to vary the oscillation frequency, the VCO incorporates delay interpolation, providing a tuning range wide enough to encompass process and temperature variations. Each oscillator stage consists of a slow and a fast path. The transistor implementation is shown in figure 10. The fast path consists of one differential pair, M5-M6, whereas the slow path consists of two differential pairs, M1-M2 and M3-M4. Interpolation is achieved by varying the tail currents of M5-M6 and M3-M4 in opposite directions, thereby modulating the transconductance of the respective differential pairs. The output currents from the gain-modulated delay stages are then added to yield a signal equal to the sum of the slow and fast path outputs.

A current folding circuit was designed to steer the tail currents and implement a coarse and fine control function, with VCO gains for the FLL and PLL being approximately equal to 26 MHz/V and 7 MHz/V respectively. The CCO transfer function shown in figure 11 validates the need for a dual-loop architecture, as the dramatic effects of process variations on the oscillator center frequency can be seen.

[Figure 11 plot: Frequency (MHz), 70 to 220, against Fast Control Current (µA), 0 to 60; curves: Nominal, Worst-Case Speed, Worst-Case Power.]

Fig. 11. VCO frequency as a function of a control current.

C. Frequency Detector

The frequency detector is implemented using a digital 3-state machine to control the VCO frequency and is based on a system developed by [6]. The detector samples the in-phase and quadrature outputs of the VCO at the rising and falling edges of the unsynchronized data stream. By comparing these two sampled clock signals with values stored in its memory, the frequency detector can then go into either an UP, DOWN or RESET state. These states control a charge pump that pumps the voltage on a floating loop filter up or down as required.

D. Charge Pump

The charge pump is a switched current source that converts the logic levels of the 3-state frequency detector into an analog current signal that charges a loop filter. A differential charge pump, based on [7], was designed. This charge pump provides two identical output paths for charging and discharging a floating loop filter to circumvent the problem of unequal currents flowing from the positive and negative current pumps respectively.

E. FLL Loop Filter and CMFB

In order for the FLL to lock onto the signal despite process and temperature variations, the loop must have a capture range of about 40 MHz. As the capture range is approximately equal to the loop bandwidth, the filter, based on a simple RC network, was designed with this cutoff frequency.

The output common-mode level of the floating loop filter is incompatible with the VCO control inputs and mandates the implementation of a common-mode feedback (CMFB) scheme based on a design by [7].
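To make the loop-filter sizing concrete, the sketch below applies the first-order relation f_c = 1/(2πRC) to the 40 MHz capture range quoted above. It treats the filter as a single-ended RC low-pass, which is a simplification of the floating differential filter in this design, and the 1 pF capacitor and the function names are hypothetical.

```python
import math


def rc_cutoff_hz(r_ohm, c_farad):
    """-3 dB frequency of a first-order RC low-pass: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def resistor_for_cutoff(f_c_hz, c_farad):
    """Resistance that places the RC cutoff at f_c for a chosen capacitor."""
    return 1.0 / (2.0 * math.pi * f_c_hz * c_farad)


CAPTURE_RANGE_HZ = 40e6   # from the text: capture range of about 40 MHz
C_FILTER = 1e-12          # hypothetical 1 pF on-chip capacitor

r_filter = resistor_for_cutoff(CAPTURE_RANGE_HZ, C_FILTER)
print(f"R = {r_filter / 1e3:.2f} kOhm for C = 1 pF")
```

With these assumed values the resistor comes out at about 4 kΩ; a different capacitor choice scales the resistance inversely.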
[Figure 10 schematic labels: M1, M2, M3, M4, M5, M6, Vin, Vout, Islow, Iconstant, Ifast.]

Fig. 10. Transistor implementation of one oscillator stage.

F. Phase Detector

A low-jitter, analog sample-and-hold phase detector based on [5] was implemented for this design. The phase detector is realized as a master–slave circuit that tracks the analog oscillator output continuously in response to a rising data transition. A falling data transition opens the first switch and the instantaneously sampled oscillator voltage is stored on parasitic capacitors at the input of the subsequent buffers. The second switch is also closed and the sampled oscillator voltage is transferred to the output and held, generating an output that is proportional to the input phase difference.

The detector was implemented using two complementary differential switches that are buffered by source-follower stages. To minimize the amount of charge that leaks out onto the parasitic capacitances of the buffers during data transitions, dummy switches are added on either side and are driven with an inverted clock to absorb a part of the injected charge.

G. V/I Converter and PLL Loop Filter

The output voltage of the phase detector needs to be converted to a current in order to charge the PLL loop filter. A V/I converter performs this function and also amplifies the phase detector signal to compensate for the attenuation resulting from the source-follower buffers. Once the voltage that represents the phase error between the incoming data stream and the VCO signal is converted to a current, the control signal is channeled through a loop filter to shape its spectral characteristics. The loop filter is based on a simple lead-lag network and was optimized for low-jitter operation.

H. Decision Device

Once the clock has been recovered, it is used to clock out an estimate of the data from a decision device. This decision device is simply a high-speed D-flip-flop biased at the optimal switching threshold.

V. SIMULATION RESULTS

Figure 12 shows the PLL’s locking transient after frequency lock has been achieved.

[Figure 12 plot: Control Signal (V), -400m to 200m, against Time (s), 0 to 800n.]

Fig. 12. PLL locking transient in response to a random bit-sequence.

Table III lists some results that characterize the receiver.

TABLE III
OPTICAL RECEIVER SIMULATION RESULTS

Parameter                Value
Wavelength               770-880 nm
Data Rate                155.52 Mbps
Sensitivity              -20.45 dBm at a BER of 10^-10
Locking Time             1 µs
Locking Range            100 MHz
Phase Noise              -76 dBc/Hz at 130 kHz offset
Power Supply             3.3 V
Power Dissipation        26.5 mW
CMOS Process             1.2 µm (SAMES)
IC Area                  2.9 mm²

VI. CONCLUSION

This paper describes the implementation of a fully integrated optical receiver. A spatially modulated light detector feeds two identical transimpedance amplifiers whose outputs are processed by a difference amplifier to yield an equivalent drift current response. A postamplifier that incorporates a limiting function for automatic gain control further processes the signal and feeds a parallel phase- and frequency-locked loop structure that recovers the clock from the NRZ data stream.

It was found that integrating optical functions in CMOS technology could provide an inexpensive solution for short-range broadband applications. This project demonstrates that the design of an optical receiver that is fully integrated in a CMOS process and complies with a recognized telecommunications standard is indeed feasible and would allow for the so-called “last-mile” access needed in today’s broadband networks.

ACKNOWLEDGMENT

I would like to thank Prof. M. du Plessis from the Carl and Emily Fuchs Centre for Microelectronics for his study guidance.

REFERENCES

[1] D. Coppée, J. Genoe, J.H. Stiens, R.A. Vounckx, M. Kuijk, “Calculation of the Current Response of the Spatially Modulated Light CMOS Detector,” IEEE Transactions on Electron Devices, vol. 48, pp. 1892-1902, September 2001.
[2] C. Rooman, D. Coppée, D. Kuijk, “Asynchronous 250-Mb/s Optical Receivers with Integrated Detector in Standard CMOS Technology for Optocoupler Applications,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 953-958, July 2000.
[3] M. Ingels, M. Steyaert, “A 1-Gb/s, 0.7-µm CMOS Optical Receiver with Full Rail-to-Rail Output Swing,” IEEE Journal of Solid-State Circuits, vol. 34, pp. 971-977, July 1999.
[4] B. Razavi, Design of Analog CMOS Integrated Circuits, McGraw-Hill, 2001.
[5] B.A. Anand and B. Razavi, “A CMOS Clock Recovery Circuit for 2.5-Gb/s NRZ Data,” IEEE Journal of Solid-State Circuits, vol. 36, pp. 432-439, March 2001.
[6] H. Wang and R. Nottenburg, “A 1Gb/s CMOS Clock and Data Recovery Circuit,” IEEE International Solid-State Circuits Conference, 2000.
[7] H. Djahanshahi and C.A.T. Salama, “Differential CMOS Circuits for 622-MHz/933-MHz Clock and Data Recovery Applications,” IEEE Journal of Solid-State Circuits, vol. 35, pp. 847-855, June 2000.
