You are on page 1of 512

Contents

Preface

xi

About the Author

xiii
Part I

Original Contributions

Devices and Circuits for Phase-Locked Systems


B. Razavi

Delay-Locked LoopsAn Overview


C-K. Ken Yang

13

Delta-Sigma Fractional-TV Phase-Locked Loops

23

/. Galton
Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems
R. C. Walker

34

Predicting the Phase Noise and Jitter of PLL-Based Frequency Synthesizers

46

K. S. Kundert
Part II

Devices

Physics-Based Closed-Form Inductance Expression for Compact Modeling of Integrated Spiral Inductors
S. Jenei, B. K. J C. Nauwelaers, and S. Decoutere {IEEE Journal ofSolid-State Circuits, January 2002)
The Modeling, Characterization, and Design of Monolithic Inductors for Silicon RF IC's
J R. Long and M. A. Copeland {IEEE Journal of Solid-State Circuits, March 1997)
Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC's
A. M. Niknejad, and R. G. Meyer {IEEE Journal of Solid-State Circuits, October 1998)
Stacked Inductors and Transformers in CMOS Technology
A. Zolfaghari, A. Chan, and B. Razavi {IEEE Journal of Solid-State Circuits, April, 2001)
Estimation Methods for Quality Factors of Inductors Fabricated in Silicon Integrated Circuit
Process Technologies
K. O {IEEE Journal of Solid-State Circuits, August 1998)

73
77

89

101

110

A Q-Factor Enhancement Technique for MMIC Inductors


M. Danesh, J. R. Long, R. A. Hadaway, and D. L. Harame {Dig. IEEE Radio Frequency Integrated Circuits
Symposium, April 1998)

114

On-Chip Spiral Inductors with Patterned Ground Shields for Si-Based RF IC's
C. Patrick Yue and S. S. Wong {IEEE Journal of Solid-State Circuits, May 1998)

118

The Effects of a Ground Shield on the Characteristics and Performance of Spiral Inductors
S.-M. Yim, T. Chen, and K. O {IEEE Journal of Solid-State Circuits, February 2002)
Temperature Dependence of Q and Inductance in Spiral Inductors Fabricated in
a Silicon-Germanium/BiCMOS Technology
R. Groves, D. L. Harame, and D. Jadus (IEEE Journal of Solid-State Circuits, September 1997)

127

135

Substrate Noise Coupling Through Planar Spiral Inductor


A. L Pun, T. Yeung, J Lau, E J R. Clement, and D. K. Su (IEEE Journal of Solid-State Circuits, June 1998)

140

Design of High-g Varactors for Low-Power Wireless Applications Using a Standard CMOS Process
A.-S. Porret, T. Melly, C C Enz, and E. A. Vittoz. (IEEE Journal of Solid-State Circuits, March 2000)

148

On the Use of MOS Varactors in RF VCO's

157

P. Andreani and S. Mattisson (IEEE Journal of Solid-State Circuits, June 2000)


Part III

Phase Noise and Jitter

Low-Noise Voltage-Controlled Oscillators Using Enhanced LC-Tanks


J. Craninckx and M. Steyaert (IEEE Transactions on Circuits and Systems-II, December 1995)
A Study of Phase Noise in CMOS Oscillators
B. Razavi (IEEE Journal of Solid-State Circuits, March 1996)

165

A General Theory of Phase Noise in Electrical Oscillators


A. Hajimiri, andT.H Lee (IEEE Journal of Solid-State Circuits, February 1998)

189

Physical Processes of Phase Noise in Differential LC Oscillators


J. J. Rael, and A. A. Abidi (IEEE Custom Integrated Circuits Conference, May 2000)

205

Phase Noise in LC Oscillators


K. A. Kouznetsov and R. G. Meyer (IEEE Journal of Solid-State Circuits, August 2000)

209

The Effect of Varactor Nonlinearity on the Phase Noise of Completely Integrated VCOs
JWM. Rogers, J A. Macedo, and C Plett (IEEE Journal of Solid-State Circuits, September 2000)

214

Jitter in Ring Oscillators


JA. McNeill (IEEE Journal of Solid-State Circuits, June 1997)

221

Jitter and Phase Noise in Ring Oscillators


A. Hajimiri, S. Limotyrakis, andT. H Lee (IEEE Journal of Solid-State Circuits, June 1999)

231

A Study of Oscillator Jitter Due to Supply and Substrate Noise


E Herzel, and B. Razavi (IEEE Transactions on Circuits and Systems-II, January 1999)

246

Measurements and Analysis of PLL Jitter Caused by Digital Switching Noise


P. Larsson (IEEE Journal of Solid-State Circuits, July 2001)

253

On-Chip Measurement of the Jitter Transfer Function of Charge-Pump Phase-Locked Loops

260

176

B. R. Veillette, and G. W.Roberts (IEEE Journal ofSolid-State Circuits, March 1998)


Part IV

Building Blocks

A Low-Noise, Low-Power VCO with Automatic Amplitude Control for Wireless Applications
M.A. Margarit, J. L. Tham, R. G Meyer, and M. J. Been (IEEE Journal of Solid-State Circuits, June 1999)
A Fully Integrated VCO at 2 GHz
M. Zannoth, B. Kolb, J. Fenk, and R. Weigel (IEEE Journal of Solid-State Circuits, December 1998)
vi

271
282

Tail Current Noise Suppression in RF CMOS VCOs


RAndreani and K Sjoland {IEEE Journal ofSolid-State Circuits, March 2002)

287

Low-Power Low-Phase-Noise Differentially Tuned Quadrature VCO Design in Standard CMOS


M. Tiebout {IEEE Journal of Solid-State Circuits, July 2001)

294

Analysis and Design of an Optimally Coupled 5-GHz Quadrature LC Oscillator


J. van der Tang, P. van de Ven, D. Kasperkovitz, and A. van Roermund {IEEE Journal of Solid-State Circuits,
May 2002)

301

A 1.57-GHz Fully Integrated Very Low-Phase-Noise Quadrature VCO


P. Vancorenland and M. S. J Steyaert {IEEE Journal of Solid-State Circuits, May 2002)

306

A Low-Phase-Noise 5GHz Quadrature CMOS VCO Using Common-Mode Inductive Coupling


S. L. J. Gierkink, S. Levantino, R. C. Frye, and V. Boccuzzi {European Solid-State Circuits Conference,
September 2002)

310

An Integrated 10/5GHz Injection-Locked Quadrature LC VCO in a 0.18jjLm Digital CMOS Process


A. Ravi, K. Soumyanath, L. R. Carley, and R. Bishop {European Solid-State Circuits Conference, September 2002)

314

Rotary Traveling-Wave Oscillator Arrays: A New Clock Technology


J. Wood and S. Lipa {IEEE Journal of Solid-State Circuits, November 2001)

318

35-GHz Static and 48-GHz Dynamic Frequency Divider IC's Using 0.2-jjum AlGaAs/GaAs-HEMT's
Z. Lao, W. Bronner, A. Thiede, M. Schlechtweg, A. Hulsmann, M. Rieger-Motzer, G. Kaufel, B. Raynor, and M. Sedler
{IEEE Journal of Solid-State Circuits, October 1997)

330

Superharmonic Injection-Locked Frequency Dividers


H. R. Rategh and T. H. Lee {IEEE Journal of Solid-State Circuits, June 1999)

337

A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-|xm CMOS Technology
C. S. Vaucher, I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z Wang {IEEE Journal of Solid-State Circuits,
July 2000)

346

A 1.75-GHz/3-V Dual-Modulus Divide-by-128/129 Prescaler in 0.7-|mm CMOS


J. Craninckx and M. S. J. Steyaert {IEEE Journal of Solid-State Circuits, July 1996)

353

A 1.2 GHz CMOS Dual-Modulus Prescaler Using New Dynamic D-Type Flip-Flops
B. Chang, J Park, and W Kirn {IEEE Journal of Solid-State Circuits, May 1996)

361

High-Speed Architecture for a Programmable Frequency Divider and a Dual-Modulus Prescaler


P. Larsson {IEEE Journal of Solid-State Circuits, May 1996)

365

A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit
Technique (E-TSPC)
J N. Soares, Jr. and W A. M. Van Noije {IEEE Journal of Solid-State Circuits, January 1999)

370

A Simple Precharged CMOS Phase Frequency Detector

376

H. O. Johansson {IEEE Journal of Solid-State Circuits, February 1998)


Part V

Clock Generation by PLLs and DLLs

A 320 MHz, 1.5 mW @ 1.35 V CMOS PLL for Microprocessor Clock Generation
V von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra {IEEE Journal of Solid-State Circuits, Nov. 1996)
A Low Jitter 0.3-165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation
H. C Yang, L. K. Lee, and R. S. Co {IEEE Journal of Solid-State Circuits, April 1997)

VII

383
391

Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques


1 G. Maneatis (IEEE Journal ofSolid-State Circuits, Nov. 1996)

396

A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340-612 MHz
D. W. Boerstler (IEEE Journal of Solid-State Circuits, April 1999)

406

A 960-Mb/s/pin Interface for Skew-Tolerant Bus Using Low Jitter PLL


S Kim, K. Lee, Y Moon, D.-K. Jeong, Y Choi, and H K. him (IEEE Journal of Solid-State Circuits, May 1997)

413

Active GHz Clock Network Using Distributed PLLs


V Gutnik and A. P Chandrasakan (IEEE Journal of Solid-State Circuits, Nov. 2000)

422

A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control


J. Lee andB. Kim (IEEE Journal of Solid-State Circuits, August 2000)

430

A Low-Jitter 125-1250-MHz Process-Independent and Ripple-Poleless 0.18-|xm CMOS PLL Based on a


Sample-Reset Loop Filter
A.Maxim, B. Scott, E. M. Schneider, M. L. Hagge, S. Chacko, and D. Stiurca (IEEE Journal of Solid-State Circuits,
Nov. 2001)
A Dual-Loop Delay-Locked Loop Using Multiple Voltage-Controlled Delay Lines
Y-JJung, S.-W.Lee, D. Shim, W.Kim, and C Kim (IEEE Journal of Solid-State Circuits, May 2001)
An All-Analog Multiphase Delay-Locked Loop Using a Replica Delay Line for Wide-Range Operation
and Low-Jitter Performance
Y. Moon, J Choi, K. Lee, D.-K. Jeong, and M.-K. Kim (IEEE Journal of Solid-State Circuits, March 2000)

439

449

456

A Semidigital Dual Delay-Locked Loop


S. Sidiropoulos and M. A. Horowitz (IEEE Journal of Solid-State Circuits, Nov. 1997)

464

A Wide-Range Delay-Locked Loop with a Fixed Latency of One Clock Cycle


H.-H. Chang, J.-W. Lin, C-Y Yang, and S.-I Liu (IEEE Journal of Solid-State Circuits, August 2002)

474

A Portable Digital DLL for High-Speed CMOS Interface Circuits


B. W. Garlepp, K S Donnelly, J. Kim, P. S. Chan, J L Zerbe, C Huang, C V Tran, C. L. Portmann, D. Stark,
Y-F. Chan, T. H. Lee, and M. A Horowitz (IEEE Journal of Solid-State Circuits, May 1999)

481

CMOS DLL-Base 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated
Tunable Oscillator
C J. Foley and M. P Flynn (IEEE Journal of Solid-State Circuits, March 2001)

493

A 1.5 V 86 mW/ch 8-Channel 622-3125-Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio
F. Yang, J. O 'Neill, P Larsson, D. Inglis, and J. Othmer (Dig. International Solid-State Circuits Conference, Feb. 2002)

499

A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM


F Lin, J Miller, A. Schoenfeld, M. Ma, and R. J Baker (IEEE Journal of Solid-State Circuits, April 1999)

502

A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM
S. J Kim, S. H. Hong, J.-K. Wee, J. H Cho, P. S. Lee, J. H Ahn, and J Y Chung (IEEE Journal of Solid-State Circuits,
June 2002)

506

Part VI

RF Synthesis

An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time
C S. Vaucher (IEEE Journal of Solid-State Circuits, April 2000)

517

A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers
W.S.T. Yan and H C Luong (IEEE Journal of Solid-State Circuits, Feb. 2001)

530

viii

A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless
LAN Receiver
H R. Rategh, H Samavati, and T. H Lee {IEEE Journal ofSolid-State Circuits, May 2000)

543

A 2.6-GHz/5.2-GHz Frequency Synthesizer in 0.4-|xm CMOS Technology


C. Lam and B. Razavi (IEEE Journal of Solid-State Circuits, May 2000)

551

Fast Switching Frequency Synthesizer with a Discriminator-Aided Phase Detector


C.-Y. Yang and S.-L Liu (IEEE Journal of Solid-State Circuits, Oct. 2000)

558

Low-Power Dividerless Frequency Synthesis Using Aperture Phase Detection


A. R. Shahani, D. K. Shaeffer, S. S. Mohan, H Samavati, H R. Rategh, M. del M. Hershenson, M. Xu, C. P Yue,
D. J Eddleman, M A. Horowitz, and T. H Lee (IEEE Journal of Solid-State Circuits, Dec. 1998)

566

A Stabilization Technique for Phase-Locked Frequency Synthesizers


T.-C. Lee and B. Razavi (Dig. Symposium on VLSI Circuits, June 2001)

574

A Modeling Approach for X-A Fractional-TV Frequency Synthesizers Allowing Straightforward Noise Analysis
M. H Perrott, M. D. Trott, and C G. Sodini (IEEE Journal of Solid-State Circuits, Aug. 2002)

578

A Fully Integrated CMOS Frequency Synthesizer with Charge-Averaging Charge Pump and Dual-Path
Loop Filter for PCS- and Cellular-CDMA Wireless Systems
Y Koo, H Huh, Y Cho, J Lee, J Park, K Lee, D.-K. Jeong, and W. Kim (IEEE Journal of Solid-State Circuits,
May 2002)

589

A 1.1-GHz CMOS Fractional-TV Frequency Synthesizer With a 3-b Third-Order 2-A Modulator
W.Rhee, B.-S. Song, and A. AH (IEEE Journal of Solid-State Circuits, Oct. 2000)

596

A 1.8-GHz Self-Calibrated Phase-Locked Loop with Precise I/Q Matching


C.-H. Park, O. Kim, and B. Kim (IEEE Journal of Solid-State Circuits, May 2001)

603

A 27-mW CMOS Fractional-TV Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation
M. H Perrott, T. L Tewksbury III, and C G. Sodini (IEEE Journal of Solid-State Circuits, Dec. 1997)

610

A CMOS Monolothic 2A-Controlled Fractional-N Frequency Synthesizer for DSC-1800

622

B. De Mauer and M. S. J Steyaert (IEEE Journal of Solid-State Circuits, July 2002)


Part VII

Clock and Data Recovery

A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN's and WAN's
K. Kishine, N. Ishihara, K Takiguchi, and H Ichino (IEEE Journal of Solid-State Circuits, June 1999)
Clock/Data Recovery PLL Using Half-Frequency Clock
M. Ran, T. Oherst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux (IEEE Journal of Solid-State Circuits,
July 1997)

635
643

A 0.5-jxm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling
C.-K. K. Yang, R. Farjad-Rad, andM.A. Horowitz (IEEE Journal of Solid-State Circuits, May 1998)

647

A 2-1600-MHz CMOS Clock Recovery PLL with Low- Vdd Capability


P Larsson (IEEE Journal of Solid-State Circuits, Dec. 1999)

656

SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application
Y M. Greshishchev and P Schvan (IEEE Journal of Solid-State Circuits, Sept. 2000)

666

A Fully Integrated SiGe Receiver IC for 10-Gb/s Data Rate


Y M. Greshishchev, P Schvan, J L Showell, M.-L Xu, J J Ojha, andJ E. Rogers (IEEE Journal of Solid-State
Circuits, Dec. 2000)

673

ix

A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector
J. Savoj and B. Razavi (IEEE Journal of Solid-State Circuits, May 2001)

681

A 10-Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection
J. Savoj and B. Razavi (Dig. International Solid-State Circuits Conference, Feb. 2001)

688

A 10-Gb/s CDR/DEMUX with LC Delay Line VCO in 0.18p,m CMOS


J. E. Rogers andJ. R. Long (Dig. International Solid-State Circuits Conference, Feb. 2002)

691

A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz/y, Silicon Bipolar Technology
M. Wurzer, J. Bock, H. Knapp, W.Zirwas, E Schumann, and A. Felder (IEEE Journal of Solid-State Circuits,
Sept. 1999)

694

A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology
M. Reinhold, C. Dorschky, E. Rose, R. Pullela, P. Mayer, E Kunz, Y Baeyens, T. Link, andJ-P. Mattia
(IEEE Journal of Solid-State Circuits, Dec. 2001)

699

Clock and Data Recovery IC for 40-Gb/s Fiber-Optic Receiver


G. Georgiou, Y. Baeyens, Y-K. Chen, A. H. Gnauck, C. Gropper, P. Paschke, R. Pullela, M. Reinhold, C Dorschky,
J.-P. Mattia, T. Winkler von Mohrenfels, and C Schulien (IEEE Journal of Solid-State Circuits, Sept. 2002)

707

Index

713

Devices and Circuits for Phase-Locked Systems


Behzad Razavi

AbstractThis turtorial deals with the design of devices


such as varactors and inductors and circuits such as ring
and LC oscillators. First, MOS varactors are introduced as
a means of frequency control for low-voltage circuits and
their modeling issues are discussed. Next, spiral inductors
are studied and various geometries targetting improved Q
or higher self-resonance frequencies are presented. Noisetolerant ring oscillator topologies are then described. Finally, a procedure for the design of LC oscillators is outlined.
The design of phase-locked systems requires a thorough
understanding of devices, circuits, and architectures. Intended
as a continuation of [1], this tutorial provides an overview
of concepts in device and circuit design for phase-locking in
digital, broadband, and RF systems.
I. PASSIVE DEVICES

design of the stage(s) driven by the VCO. On the other hand,


to avoid forward-biasing the varactors significantly, Vx and
Vy must remain above approximately Vcont 0.4 V. Thus,
the peak-to-peak swing at each node is limited to about 0.8 V.
Note that the cathode terminals of the varactors also introduce
substantial n-well capacitance at X and Y, further constraining
the tuning range.
In contrast to pn junctions, MOS varactors are immune to
forward biasing while exhibiting a sharper C-V characteristic
and a wider dynamic range. If configured as a capacitor [Fig.
2(a)], a MOSFET suffers from both a nonmonotonic C-V beCGs

The demand for low-noise PLLs has encouraged extensive


research on active and passive devices. In this section, we
study varactors and inductors as essential components of LC
oscillators.
A. Varactors
As supply voltages scale down, pn junctions become a less
attractive choice for varactors. Specifically, two factors limit
the dynamic range of pn-junction capacitances: (1) the weak
dependence of the capacitance upon the reverse bias voltage,
e.g., Cj = C ; o/(1 + VR/^B)"1, where m w 0.3.; and (2) the
narrow control voltage range if forward-biasing the varactor
must be avoided.
As an example, consider the LC oscillator shown in Fig. 1.
It is desirable to maximize the voltage swings at nodes X and

Accumulation

Strong Inversion

^TH

VGS

(a)
^var

Depletion

Accumulation

p-substrate
0

vQS

(b)

Fig. 2. (a) Simple MOSFET operating as capacitor, (b) MOS varactor.

havior and a high channel resistance in the region between


accumulation and strong inversion. To avoid these issues, an
"accumulation-mode" MOS varactor is formed by placing an
NMOS device inside an n-well [Fig. 2(b)]. Providing an
Voo
ohmic connection between the source and drain for all gate
voltages, the n-well experiences depletion of mobile charges
under the oxide as the gate voltage becomes more negative.
Thus, the varactor capacitance, Cvar, (equal to the series comX
Y
bination of the oxide capacitance and the depletion region
capacitance) varies as shown in Fig. 2(b). Note that for a
"cont
sufficiently positive gate voltage, Cvar approaches the oxide
capacitance.
Fig. 1. LC oscillator using pn-junction varactors.
The design of MOS varactors must deal with two important
Y so as to both minimize the relative phase noise and ease the issues: (1) the trade-off between the dynamic range and the

channel resistance, and (2) proper modeling for circuit simulations. We now study each issue.
Dynamic Range Deep-submicron MOSFETs exhibit susbtantial overlap capacitance between the gate and source/drain
terminals. For example, in a typical 0.13-/mi technology, a
transistor having minimum channel length, Lmin, displays an
overlap capacitance of 0.4 fF/^m and a gate-channel capacitance of 12 fF/fim2. In other words, for an effective channel
length of 0.12 /im and a given width, the overlap capacitance
between the gate and source/drain terminals of a varactor constitutes 2 x 0.4 fF /(0.12 x 12 fF+2 x 0.4 fF) 36% of the
total capacitance. Thus, even if the gate-channel component
varies by a factor of two across the allowable voltage range, the
overall dynamic range of the capacitance is given by (0.12 x 12
fF+2 x 0.4 ff)/(0.12 x 6 fF +2 x 0.4 fF) = 1.47.
In order to widen the varactor dynamic range, the transistor
length can be increased, thereby raising the voltage-dependent
component while maintaining the overlap capacitance relatively constant. This remedy, however, leads to a greater resistance between the source and drain, lowering the Q. The resistance reaches a maximum for the most negative gate-source
voltage, at which the depletion region's width is maximum and
the path through the n-well the longest (Fig. 3).1 Note that

var

''max

Cmin
VGS

Fig. 4. Typical MOS varactor characteristic.

circuits in terms of voltages and currents (e.g., SPICE) interpret


the nonlinear capacitance equation correctly. On the other
hand, programs that represent the behavior of capacitors by
charge equations (e.g., Cadence's Spectre) require that the
model be transformed to a Q-V relationship [3]:
Qv

I CvardVGS

(2)

Cmax Cmin T , . i

(-'max + l^min T ,

.,

. VGS J

Vo In cosh(a + - y - )
VGS,

(3)

which is then used to compute


*var

dt

(4)

If used in charge-based analyses, Eq. (1) typically overestimates the tuning range of oscillators.
p-substrate
Fig. 3. Effect of n-well resistance in MOS varactor.

the total equivalent resistance that appears in series with the


varactor is equal to 1/12 of the drain-source resistance. This
is because shorting the drain and source lowers the resistance
by a factor of 4 and the distributed nature of the capacitance
and resistance reduces it by another factor of 3 [2]. Depending
on both the phase noise requirements and the Q limitations
imposed by inductors, the varactor length is typically chosen
between Lmin and3L m t n .

B. Inductors
The design of monolithic inductors has been studied extensively. The parameters of interest include the inductance, the
Q, the parasitic capacitance (i.e., the self-resonance frequency,
fsR), and the area, all of which trade with each other to some
extent. For a spiral structure such as that in Fig. 5, the line
width, the line spacing, the number of turns, and the outer

Modeling The C-V characteristics of MOS varactors can be


approximated by a hyperbolic tangent function with reasonable
accuracy. Using the characteristic shown in Fig. 4 and noting
that tanh(oo) = 1, we can write

, T / ..
Cmax ~ Cn
Cvar{VGS) =
~

, , , VGS x . Cmax + Cmin


tanh(a+- )+

0)

Here, a and Vo allow fitting for the intercept and the slope,
respectively, and C m , n includes the overlap capacitance.
The above model yields different characteristics in different
circuit simulation programs! Simulation tools that analyze
1

Fortunately, the capacitance reaches a minimum at this point, and the Q


degrades only gradually.

Fig. 5. Spiral inductor.

dimension are under the designer's control, chosen so as to


obtain the required performance.
Quality Factor The quality factor of monolithic inductors
has been the subject of many studies. Before considering the
phenomena that limit the Q, it is important to select a useful

and clear definition for this quantity. For a simple inductor


operating at low frequencies, the Q is denned as

where Rs denotes the metal series resistance. In analogy with


this expression, a more general definition is sometimes given
as

lm(ZL)

where ZL represents the overall impedance of the inductor at


the frequency of interest. While reducing to Eq. (5) at low frequencies, this definition yields Q = 0 if the inductor resonates
with its own capacitance and/or any other capacitance. This is
because at resonance, the impedance is purely resistive. Since
nearly all circuits employ inductors in a resonance mode,2 this
expression fails to provide a meaningful measure of inductor
performance in circuit design. A more versatile definition assumes that a resonant tank can be represented by a parallel
combination [Fig. 6(a)], yielding
Fig. 7. Inductor loss mechanisms: (a) metal resistance, (b) substrate loss due
to electric coupling, (c) substrate loss due to magnetic coupling.

Q=f^-.

(7)

where WR is the resonance frequency. Note that the tank


reduces to Rp at u> = UR, exhibiting a finite (rather than zero)
Q. Hereafter, we consider the behavior of inductors at or near
resonance.
i/

Lp
LP

Rp

CP
Vout

CP
V,n'

(a)

RP

(2) the flow of displacement current through the series combination of the inductor's parasitic capacitance and the substrate
resistance; (3) the flow of magnetically-induced ("eddy") currents in the substrate resistance. At low frequencies, the dc
resistance is dominant, and as the frequency rises, the other
components begin to manifest themselves.
With the above observations in mind, let us construct a circuit model for inductors. Depicted in Fig. 8(a) is a simple
model where Rs denotes the series resistance at the frequency
L

C2

<M

(b)

Fig. 6. (a) Parallel tank for definition of Q, (b) common-source stage using a
tank.

The utility of Eq. (7) can be seen in the example illustrated in Fig. 6(b). Here, the knowledge of Lp and
Q = Rp/(LpL>ji) directly provides the voltage gain and the
output swing, whereas the Q given by Eq. (6) serves no purpose.
The Q of inductors is limited by resistive losses: parasitic
resistances dissipate a fraction of the energy that is reciprocated between the inductor and the capacitor in a tank. Note
that the finite Q is also accompanied by generation of noise.
For example, in the circuit of Fig. 6(b), Rp produces an output noise voltage of V = AkTRp = AkTQLpu>n per unit
bandwidth if Lp resonates with Cp.
The losses in inductors arise from three mechanisms (Fig.
7): (1) the series resistance of the spiral, including both lowfrequency resistance and current crowding due to skin effect;
2

One exception is inductive degeneration in low-noise amplifiers.

Rs

c2

Ci

S1

S2

(a)

c3:

"si

*S2

C4

(b)

Fig. 8. (a) Inductor model including magnetic coupling to substrate, (b)


simplified model.

of interest, Rsi and Rs2 represent the substrate resistance


through which the diplacement current flows, the transformer
models magnetic coupling to the substrate, and Rp is the substrate resistance through which the eddy currents flow. This
model reveals how the Q drops at high frequencies. As the
impedance of C\ and Ci falls, Rs\ and Rs2 appear as a constant resistance in parallel with the inductor, lowering the Q
as u rises. Similarly, at high frequencies, the effect of Rp becomes relatively constant, shunting Lp and further reducing
theQ.
In practice, the model of Fig. 8(a) is modified as shown
in Fig. 8(b) to both allow an easier fit to measured data and

account for the substrate capacitance. The model is usually


assumed to be symmetric, i.e., C\ = C2,C3 = C4, and Rsi =
Rs2, implying that the equivalent parasitic capacitance, Ceq, is
one-half of the total capacitance, Ctot, if one end of the inductor
is grounded. This result, however, is not correct because the
distributed nature of the structure yields Ceq = Ctot/3 in
this case [5]. To avoid this inaccuracy, the inductor must be
modeled as a distributed network [5].
Characterization Most inductor modeling programs provide limited capabilities in terms of the type of structure that
they can analyze or the maximum frequency at which their
results are valid. For this reason, it is often necessary to fabricate and characterize monolithic inductors and use the results
to revise the simulated models, thereby obtaining a better fit.
Owing to the need for precise measurements at high frequencies, inductors are typically characterized by direct onwafer probing. High-speed coaxial probes having a tightlycontrolled 50-2 characteristic impedance and a low loss are
positioned on pads connected to the inductor. Figure 9(a)
shows an example where one end of the spiral is tied to the

VDD
LP

Lp
"out

M2

'ss

Fig. 10. Setup for "in-situ" measurement of Q.

until the circuit fails to oscillate. For such value of Rp, we


have Q = Rp/(Lu). Of course, this technique assumes that
the value of the inductor and the oscillation frequency are
known.
The above method proves useful if (a) thefrequencyof interest is so high and/or the inductance so low that direct measurements are difficult, or (b) an oscillator has been fabricated but
the inductors are not available individually, requiring "in-situ"
measurement of the Q. Note that other oscillator parameters
such as phase noise and ouput swing are also functions of Q,
but it is much more straightforward to place the circuit at the
edge of oscillation than to calculate the Q from phase noise or
output swing measurements.
Choice of Geometry The design of inductors begins with the
choice of the geometry. Shown in Fig. 11 are two commonlyused structures. The asymmetric spiral of Fig. 11 (a) exhibits

(a)

(b)

Fig. 9. (a) On-wafer measurement of inductor using coaxial probe, (b) calibration structure.

"signal" (S) pad and the other to the "ground" (G) pads. The
signal pad is sensed by the center conductor and the ground
pads by the outer shield of the coaxial probe.
Since the capacitance of the pads and the wires connecting to
the spiral is typically significant, the test device is accompanied
by a calibration structure [Fig. 9(b)], where the spiral itself
is omitted. The scattering (S) parameters of both structures
are measured by means of a network analyzer across the band
of interest and subsequently converted to Y parameters. Subtraction of the Y parameters of the calibration geometry from
those of the device under test yields the actual characteristics
of the spiral.
An alternative method of measuring the Q of inductors is
illustrated in Fig. 10. Here, inductors are incorporated in
an oscillator and the tail current can be controlled externally.
In the laboratory measurement, the output is monitored on
a spectrum analyzer while Iss is reduced so as to place the
circuit at the edge of oscillation. Next, the value of Iss thus
obtained is used in the simulation of the oscillator and the
equivalent parallel resistance of each tank, Rp, is lowered

(a)

(b)

Fig. 11. (a) Asymmetric and (b) symmetric inductors.

a moderate Q, about 5 to 6 at 5 GHz, and its interwinding


capacitance does not limit the self-resonance frequency because adjacent turns sustain a small potential difference. The
line spacing is therefore set to the minimum allowed by the
technology.
The symmetric geometry of Fig. 11 (b) provides a greater Q
if stimulated differentially [4], about 7 to 10 at 5 GHz, but its
interwinding capacitance is typically quite significant because
of the large voltage difference between adjacent turns. For this
reason, the line spacing is chosen to be twice or three times
the minimum allowable value, lowering thefringecapacitance
considerably but degrading the Q slightly.
In differential circuits, the use of symmetric inductors appears to save area as well. For example, two asymmetric
1-nH inductors can be replaced by a symmetric 2-nH structure, which occupies less area. However, a cascade of differential stages employing multiple symmetric inductors [Fig.
12(a)] faces routing difficulties. As illustrated in Fig. 12(b),
the signal lines must travel across the spirals, impacting the

Voo

cantly. However, the capacitance between the spirals may


limit the self-resonance frequency. For the two-layer structure
of Fig. 13(a), the overall equivalent capacitance is given by
[5]

4Cl+C

12

>

(8)

Thus, if the bottom layer is moved down [Fig. 13(b)], then


Ceq falls considerably. For example, in a typical 0.13-/im
CMOS technology having eight metal layers, the geometry of
Fig. 13(b) exhibits one-fifth as much as capacitance as the
structure in Fig. 13(a)does.
Stacked structures use lower metal layers, which typically
suffer from a greater sheet resistance than the topmost layer.
As explained below, the resistance can be reduced by placing
spirals in parallel.
Figure 14 illustrates three other configurations aiming to
improve the quality factor. In Fig. 14(a), multiple spirals are

(a)

(b)

^eg

(c)

Fig. 12. (a) Cascade of inductively-loaded differential pairs, (b) layout of first
stage using a symmetric inductor, (c) layout of first stage using asymmetric
inductors.

performance of the inductors. Furthermore, the power and


ground lines must either cross the spirals or go around with
adequate spacing. With asymmetric inductors, on the other
hand, the lines can be routed as shown in Fig. 12(c), leaving
the inductors undisturbed. Note that B\ is quite larger than B2
because the symmetric structure must provide an inductance
twice that of each asymmetric spiral. Thus, the signal lines in
Fig. 12(b) are longer.
The two geometries of Fig. 11 can also be converted to
stacked structures, wherein spirals in different metal layers are
placed in series so as to achieve a greater inductance per unit
area. Figure 13(a) depicts an example using metal 8 and metal

(a)

(b)

Fig. 13. Stack of (a) metal 8 and metal 7 spirals, (b) metal 8 and metal 3
spirals.

7 spirals. The total inductance is equal to L\ + L2+2M, where


M denotes the mutual coupling between L \ and L2. Owing to
the strong magnetic coupling, the value of M is close to L j and
L2, suggesting a fourfold increase in the overall inductance as
a result of stacking. In the general case, n stacked identical
spirals raise the inductance by a factor of approximately n2.
Stacking reduces the area occupied by inductors signifi-

Fig. 14. (a) Parallel combination of spirals to reduce metal resistance, (b)
tapered metal width, (c) patterned shield.

placed in parallel so as to reduce the series resistance, but at the


cost of larger capacitance to the substrate. Nonetheless, in a
typical process having eight metal layers, metal 6 capacitance
is about 30% greater than that of metal 8. Since metal 8 is
typically twice as thick as metal 6 or metal 7, this topology
lowers the series resistance by twofold while raising the parasitic capacitance by 30%. By the same token, in the stacked
structure of Fig. 13(b), addition of a metal 2 spiral in parallel
with the metal 3 layer decreases the overall resistance by 30%
while increasing the equivalent capacitance by about 15%.
At frequencies above 5 GHz, the skin depth of aluminum
falls below 2 fim, making the parallel combination of spirals
less effective. Electromagneticfieldsimulations may therefore
be necessary to determine the optimum configuration.
The structure in Fig. 14(b) employs tapering of the line
width to reduce the resistance of the outer turns. The idea is
to maintain a relatively constant inductance-resistance product
per turn, achieving a slightly higher Q for a given inductance
and capacitance. Unfortunately, most inductor simulation programs cannot analyze such a geometry.
Shown in Fig. 14(c) is a method of lowering the loss due
to the electric coupling to the substrate. A heavily-conductive

shield is placed under the spiral and connected to ground so


that the displacement current flowing through the inductor's
bottom-plate capacitance does not experience resistive loss.
To stop the flow of magnetically-induced currents, the shield
is broken regularly. Note that eddy currents still flow through
the substrate, dissipating energy.
The conductive shield in Fig. 14(c) may be realized in n~
well, n + , or p + diffusion, polysilicon, or metal, thus bearing
a trade-off between the parasitic capacitance and the Q enhancement. The resulting increase in the Q depends on the
frequency of operation and the type of shield material, falling
in the range of 5 to 10%.
Thus far, we have studied square spirals. However, for a
given inductance value, a circular structure exhibits less series
resistance. Since mask generation for circles is more difficult,
some inductors are designed as octagonal geometries to benefit
from a slightly higher Q.

rates or retain a low-frequency clock in the "sleep" mode; (2)


ring oscillators occupy substantially less area than LC topologies do, an important issue if many oscillators are used; (3)
the behavior of ring oscillators across process, supply, and
temperature corners is predicted with reasonable accuracy by
standard MOS models, whereas the design of LC oscillators
heavily relies on inductor and varactor models.
In mostly-digital systems such as microprocessors, ring oscillators experience considerable supply and substrate noise,
making differential topologies desirable. Figure 15(a) shows
an example of a differential gain stage that allows several

C. MOS Transistors

vln

The modeling of MOSFETs for analog and high-frequency


design continues to pose challenging problems as sub-0.1fim generations emerge. BSIM models provide reasonable
accuracy for phase-locked system design, with the exception
that their representation of thermal and flicker noise may err
considerably. This issue becomes critical in the prediction of
oscillator phase noise.
The thermal noise arising from the channel resistance is
usually represented by a current source tied between the source
and drain and having a spectral density 1% = 4kTjgm, where
7 is the excess noise coefficient. For long-channel devices,
7 = 2 / 3 , but for submicron transistors, j may reach 2.5 to
3. Since some MOS models lack an explicit 7 parameter that
the user can set, it is often necessary to artificially raise the
effective value of 7 in circuit simulations. For linear, timevariant circuits, this can be accomplished using a noise copying
technique [6]. However, the time variance of currents and
voltages in oscillators make it difficult to apply this method. As
a first-order approximation, the contribution of the transistors
to the overall phase noise can be increased by a factor equal to
2.5/(2/3) before all of the noise components are summed.3
The flicker noise parameters are usually obtained by measurements. It is therefore important to check the validity of the
device models by comparing measured and simulated results.
Owing to their buried channel, PMOS transistors exhibit substantially less flicker noise than NMOS devices even in deep
submicron technologies.

VDD

\M5

VDD

"e

M5

MB

M4

Af3
X

"out"

X*

M2

.My

'ss

^cont

(b)

(a)

Fig. 15. (a) Differential stage for use in a ring oscillator, (b) effect of supply
noise.

decades of frequency tuning with relatively constant voltage


swings. Here, M5 and M$ define the output common-mode
(CM) level while M 3 and M4 pull nodes X and Y to VDD,
maintaining a constant voltage swing even at low current levels.
Unlike a simple differential pair, the stage of Fig. 15(a)
does respond to input CM noise even with an ideal Iss . This is
because the gate voltages of M 3 and M 4 are referenced to VDD ,
introducing a change in the drain currents if the input CM level
varies. In the presence of asymmetries, such a change results
in a differential component af the output. Nevertheless, since
the input CM level of each stage in the ring is referenced to
VDD by the diode-connected PMOS devices in the preceding
stage [Fig. 15(b)], the oscillator exhibits low sensitivity to
supply voltage.
Figure 16(a) depicts another ring oscillator topology that
has become popular in low-voltage digital systems. Here, the
VDD

VDD

M2
X{

Vcont

Mz

"1
'DD
/1

II. RING OSCILLATORS


Despite their relative high noise and poor drive capability, ring oscillators are used in many high-speed applications.
Several reasons justify this popularity: (1) in some cases, the
oscillator must be tuned over a wide frequency range (e.g.,
one decade) because the system must support different data
3

In reality, the effective value of 7 also depends on the drain-source voltage


to some extent, further complicating the matter.

CB
Kcont

(a)

(b)

Fig. 16. (a) Constant-current ring oscillator, (b) transistor-level implementation of (a).

inverters in the ring are supplied by a current source, IQD,

rather than a voltage source, and frequency tuning is also accomplished through IDD- If IDD is designed for low sensitivity to VDD, then the oscillator remains relatively immune
to supply noisethe principal advantage of this configuration
over standard inverter-based rings that are directly connected
to the supply voltage.
In practice, the nonidealities associated with IDD limit the
supply rejection. Shown in Fig. 16(b) is a transistor implementation where M\ operates as a contolled current source. If
I\ is constant, V\ tracks VDD variations whereas Vy does not,
yielding a change in IDD through channel-length modulation
in M\. Choosing long channels for M\ and Mi alleviates
this issue while necessitating wide channels as well to allow
a relatively small drain-source voltage for M\. However, the
resulting high drain junction capacitance of M\ at Y creates a
low-impedance path from VDD to this node at high frequencies. To suppress both resistive and capacitive feedthrough of
VDD noise, a bypass capacitor, CB, is tied from Y to ground.
However, the pole associated with this node now enters the
VCO transfer function, complicating the design of the PLL.
Let us now study the response of the circuit of Figs. 15(a)
and 16 to substrate noise, VSub- In the former, V8Ub manifests
itself through two mechanisms (Fig. 17): (1) by modulating the drain junction capacitance of M\ and M2 and hence
<

M2

*1

v*

Vln

cP
Vsub

"h

fsub

/ss
f

Fig. 17. Effect of substrate noise on a differential stage.

the delay of the stage (a static effect); and (2) by injecting a


common-mode displacement current through Cp (a dynamic
effect). If injected slightly before or after the zero crossings
of the oscillation waveform, such a current gives rise to a differential component at the drains of Mi and Mi because these
transistors display unequal transconductances as they depart
from equilibrium.
In the circuit of Fig. 16(b), Vsub modulates both the drain
junction capacitance of the NMOS devices and their threshold
voltage (and hence the transition points of the waveform).
Both effects are static, making the circuit susceptible even to
low-frequency noise.
It is instructive to determine the minimum supply voltage for
the above two circuits. At the midpoint of switching, where the
input and output differential voltages are around zero, the stage
of Fig. 15(a) requires that VDD > | V G S P | 4- VQSN + Viss,
where VQSP abd VQSN denote the gate-source voltages of
M3-M4 and M\-M2, respectively, and Viss is the minimum
voltage necessary for Iss> Interestingly, the circuit of Fig.
16(b) imposes the same minimum supply voltage.
Another critical issue in the circuits of Figs. 15(a) and

16 relates to frequency tuning by means of current sources.


The voltage-to-current (V/I) conversion required here presents
difficulties at low supply voltages. In the example of Fig.
16(b), as Vcont rises and Vx falls, transistor M3 eventually
enters the triode region, thus making I\ supply-dependent. The
useful range of Vcont is therefore given by VTHN < Vcont <
VDD - I VGSP\ - VTHN, suggesting the use of a wide device
for Mi to minimize |PGSP|-

III. LC OSCILLATORS
LC oscillators have found wide usage in high-speed and/or
low-noise systems. Extensive research on inductors, varactors, and oscillator topologies has provided the grounds for
systematic design, helping to demystify the "black magic."
LC oscillators offer a number of advantages over ring structures: (a) lower phase noise for a given frequency and power
dissipation; (b) greater output voltage swings, with peak levels
that can exceed the supply voltage; and (c) ability to operate
at higher frequencies.
However, LC VCO design requires precise device and circuit modeling because (a) the narrow tuning range calls for
accurate prediction of the center frequency; (b) the phase noise
is greatly affected by the quality of inductors and varactors and
the noise of transistors. Also, occupying a large area, spiral
inductors pick up noise from the substrate and make it difficult
to incorporate many such oscillators on one chip.
The design of LC VCOs targets the following parameters:
center frequency, phase noise, tuning range, power dissipation,
voltage headroom, startup condition, output voltage swing,
and drive capability. The last two have often received less
attention, but they directly determine the design difficulty and
power consumption of the stages following the oscillator. That
is, a buffer placed after the VCO may consume more power
than the VCO itself!
A. Design Example
As an example of VCO design, let us consider the topology
shown in Fig. 18. Here, M\ and M2 present a small-signal
negative resistance of -2/gm\}i
between nodes X and Y,

*fao
Cp

RP

LP

LP

RP

CP

M2

"1

/ss

Fig. 18. LC oscillator.

compensating for the resistive loss in the tanks and sustaining


oscillation. Each tank is modeled by a parallel RLC network,
with all loss mechanisms lumped in Rp.4
4
For a narrow frequency range, series resistances in the tank elements can
be transformed to parallel components.

The design process begins with a power budget and hence


a maximum value for Iss This is justified by the following
observation. Once completed and optimized for a given power
budget, the design can readily be scaled for different power
levels, bearing a linear trade-off with phase noise while maintaining all other parameters constant. For example, if Iss,
the width of M\ and M2, and the total tank capacitance are
doubled and the inductance value is halved, the phase noise
power falls by a factor of two but the frequency of oscillation
and the output voltage swings remain unchanged.5
Since subsequent stages typically require the VCO core to
provide a minimum voltage swing, Vmin9 we assume M\ and
M 2 steer nearly all of Iss to their correponding tanks and
write IssRp = Vmin- Thus, the minimum inductance value
is given by
=

LP

(typically a buffer), CL- Thus, the allowable varactor capacitance is given by the difference between Ctot and the sum of
these components:

(9)

Cvar = (LPU2)-1-CLP-CDB-CGS-4CGD-CL.

(13)

This expression gives the center value of the tolerable varactor


capacitance. Of course, a negative Cvar means the inductance
is excessively large, calling for a lower Lp, a smaller Rp, and
hence a larger Iss. However, to steer a greater tail current,
the circuit must employ wider MOS transistors, thus incurring a larger capacitance at nodes X and Y and approaching
diminishing returns. This ultimately limits the frequency of
oscillation in a given technology.
For a given supply voltage and oscillator topology, the varactor capacitance exhibits a known dynamic range Cvar,min <
CVar < Cvar,max, yielding a tuning range of u)min < u>osc <

Umax, where
-

IssQu'

l 0 )

UJmin

IT ir

-x-r

(14)

where it is assumed the tank Q is limited by that of the inducy L>p\Lsvar,max ~r Ofixed)
tor. Note that this calculation demands knowledge of the Q
"max =
.
,
(15)
before the inductance is computed, a minor issue because for a
VLP\^var,min + ^ fixed)
given geometry and frequency of operation, the Q is relatively
independent of the inductance.
and Cfixed = CLP -f CDB + CGS + 4CGD + CL.
We now determine the dimensions of Mi and M2. IncreasFigure 19(a) depicts the oscillator with MOS varactors diing the channel length beyond the minimum value allowed rectly tied to X and Y. Since the output common-mode level
by the technology does not significantly lower 7 unless the
V
length exceeds approximately 0.5 fim. For this reason, the
DD
VDD vb
vb
minimum length is usually chosen to minimize the capaciLi
L2
L2
M
tance contributed by the transistors. The transistors must be
*1
R2
Y
X
X
Y
wide enough to steer most of Iss while experiencing a voltage
C
C1
CC1
swing of Vmin at nodes X and Y. Viewing M\ and M2 as a
differential pair, we note that M\ must turn off as Vx - Vy
/W v 2
Mv1
Mv1
reaches Knn. For square-law devices,
Mv2
Vcont

VminZ=

\lnC0!w/L'

(U)

(a)

and hence

w-

2Iss

^cont

(b)

Fig. 19. LC oscillator with (a) direct coupling and (b) capacitive coupling of
varactors to tanks.

(n)

a r 1/2 IT '
^ '
but for short-channel devices, W must be obtained by simulations using proper device models. This choice of W typically
guarantees a small-signal loop gain greater than unity, enabling
the circuit to start at power-up.
With Lp computed from Eq. (10), the total capacitance
at nodes X and Y is calculated as Ctot = (Lpu2)~l.
This
capacitance includes the fo\\ovfingfixed components: (1) the
parasitic capacitance of Lp, CLP\ (2) the drain junction, gatesource, and gate-drain capacitances of Mi and M 2 , CDB +
Cos + 4CGD>6 and (3) the input capacitance of the next state
5
We assume that, at a given frequency, the Q is relatively independent of
the inductance value.
6
Since CQD experiences a total voltage swing of 2V p m m, its Miller effect
translates to a factor of two for each transistor.

10

is near VDD > M3 and M4 sustain only a positive gate-source


voltage (if 0 < VCOnt < VDD). A S seen from the C-V characteristic of Fig. 2(b), this limitation reduces the dynamic range
of the capacitance by about a factor of two. As a remedy, the
varactors can be capacitively coupled to X and Y, allowing
independent choice of dc levels. Illustrated in Fig. 19(b), such
an arrangement defines the gate voltage of Mv\ and Mv2 by
Vb VDD/2 through large resistors R\ and R2.
The coupling capacitors, Cc\ and Cci, must be chosen
much greater than the maximum value of Cvar so as not
to limit the tuning range. For example, if Cc\ Cci
5Cvartmax, then the equivalent series capacitance reaches only
5Ciar,max/(6Cvartmax) = 0.83Cvar,maar, Suffering from a
17% reduction in dynamic range. On the other hand, large coupling capacitors display significant bottom-plate capacitance,

thereby loading the oscillator and limiting the tuning range.7


It is possible to realize Cc\ and Cci as "fringe" capacitors
(Fig. 20) [7] to exploit the lateral field between adjacent metal

DD

t.1

L2

Cu

Cu

Cu

Cu

Fine Control
Coarse Control

Coarse Control
(a)
'out

Fig. 20. Fringe capacitor.

lines. This structure exhibits a bottom-plate parasitic of a few


percent, but its value must usually be calculated by means of
field simulators.
The tuning range of LC VCOs must be wide enough to
encompass (a) process and temperature variations, (b) uncertainties due to model inaccuracies; and (c) the frequency band
of interest. In wireless communications, the last component
makes the design particularly difficult, especially if a single
VCO must cover more than one band. For example, in the
Global System for Mobile Communication (GSM) standard,
the transmit and receive bands span 890-915 MHz and 935-960
MHz, respectively. For one VCO to operate from 890 MHz
to 960 MHz, the tuning range must exceed 7.8%. With another 7 to 10% required for variations and model inaccuracies,
the overall tuning rang reaches 15 to 18%, a value difficult
to achieve. In such cases, two or more oscillators may prove
necessary, but at the cost of area and signal routing issues.
The phase noise of each oscillator topology must be quantified carefully. The reader is referred to the extensive literature
on the subject.

Fewer
Capacitors
Switched in

(b)

Fig. 21. (a) VCO with fine and coarse digital control, (b) resulting characteristics.

the use of NMOS devices with a gate-source voltage equal to


VDD , minimizing their on-resistance.
The above technique entails three critical issues. First, the
trade-off between the on-resistance and junction capacitance
of the MOS switches translates to another between the Q and
the tuning range. When on, each switch limits the Q of its
corresponding capacitor to (ROnCuu)~] When off, each
switch presents its drain junction and gate-drain capacitances,
CPB + CGD, in series with Cu, constraining the lower bound
of the capacitance to CU(CDB + CGD)/(CU + CDBCGD)
rather than zero. In other words, wider switches degrade the
overall Q to a lesser extent but at the cost of narrowing the
discrete frequency steps.
B. Digital Tuning
The second issue relates to potential "blind" zones in the
Our study thus far implies that it is desirable to maximize the characteristic of Fig. 21(b). As exemplified by Fig. 22, if the
tuning range. However, for a given supply voltage, a wider tuning range inevitably translates to a greater VCO gain, Kvco,
thereby making the circuit more sensitive to disturbance ("ripple") on the control line. This effect leads to larger reference
sidebands in RF synthesizers and higher jitter in timing applications. With the scaling of supply voltages, the problem of
high Kvco has become more serious, calling for alternative
solutions.
A number of circuit and architecture techniques have been
Fig. 22. Blind zone resulting from insufficient fine tuning range.
devised to lower the sensitivity of the VCO to ripple on the discrete step resulting from switching out one unit capacitor is
control line. For example, a digital tuning mechanism can be greater than the range spanned continuously by the varactors,
added to perform coarse adjustment of the frequency, allowing then the oscillator fails to assume the frequency values between
the analog (fine) control to cover a much narrower range. Il- /i and f for any combination of the digital and analog controls.
lustrated in Fig. 21 (a), the idea is to switch constant capacitors For this2reason, the discrete steps must be sufficiently small to
into or out of the tanks, thereby introducing discrete frequency ensure overlap between consecutive bands.8
steps. The varactors then tune the frequency within each step,
The third issue stems from the loop settling speed. As
leading to the characteristic shown in Fig. 21(b). Note that described below, the PLL takes a long time to determine how
the switches are placed between the capacitors and ground 8
rather than between the tank and the capacitors. This permits
With afiniteoverlap, however, more than one combination of digital and
analog controls may yield a given frequency. To avoid this ambiguity, the
loop must begin with a minimum (or maximum) value of the digital control
and adjust it monotonically.

This is relatively independent of whether the bottom plates are connected


to nodes X and Y or to R\ and Rz.

11

many capacitors must be switched into the tanks. Thus, if a


change in temperature or channel frequency requires a discrete
frequency step, then the system using the PLL must remain idle
while the loop settles.
When employed in a phase-locked loop, the oscillator of Fig.
21 (a) requires additional mechanisms for setting the digital
control. Figure 23 depicts an example for frequency synthesis.

REFERENCES
[1] B. Razavi, "Design of Monolithic Phase-Locked Loops
and Clock Recovery Circuits - A Tutorial," in Monolithic
Phase-Locked Loops and Clock Recovery Circuits, B.
Razavi, Ed., Piscataway, NJ: IEEE Press, 1996.
[2] P. Larsson, "Parasitic Resistance in an MOS Transistor
Used as On-Chip Decoupling Capacitor," IEEEJ. SolidState Circuits, vol. 32, pp. 574-576, April 1997.
[3] K. Kundert, Private Communication.
[4] M. Danesh et al., "A Q-Factor Enhancement Technique
for MMIC Inductors," Proc. IEEE Radio Frequency Integrated Circuits Symp., pp. 217-220, April 1998.
[5] A. Zolfaghari, A. Y. Chan, and B. Razavi, "Stacked Inductors and Transformers in CMOS Technology," IEEE
Journal of Solid-State Circuits, vol. 36, pp. 620-628,
April 2001.
[6] F. Behbahani, et al., "A 2.4-GHz Low-IF Receiver for
Wideband WLAN in 0.6-//m CMOS," IEEE Journal of
Solid-State Circuits, vol. 35, pp. 1908-1916, December
2000.
[7] O. E. Akcasu, "High-Capacity Structures in a Semiconductor Device," US Patent 5,208,725, May 1993.

VMLogic
Coarse
Control

VL<

Charge
Pump

VCO
Fine
Control

Fig. 23. Synthesizer using fine and coarse frequency control.

Here, the oscillator control voltage is monitored and compared


with two low and high voltages, VL and VJJ, respectively. If
Vcont falls below Vi, the oscillation frequency is excessively
low 9 , and one unit capacitor is switched out. Conversely, if
Vcont exceeds V#, one unit capacitor is switched in. After each
switching, the loop settles and, if still unlocked, continues to
undergo discrete frequency steps.

We assume the frequency increases with VCOnt.

12

Delay-Locked Loops - An Overview


Chih-Kong Ken Yang
Abstract Phase-locked loops have been used for a wide
range of applications from synthesizing a desired phase
or frequency to recovering the phase and frequency of an
input signal. Delay-locked loops (DLLs) have emerged as
a viable alternative to the traditional oscillator-based
phase-locked loops. With its first-order loop
characteristic, a DLL both is easier to stabilize and has
no jitter accumulation. The paper describes design
considerations and techniques to achieve high
performance in a wide range of applications. Issues such
as avoiding false lock, maintaining 50% clock duty cycle,
building unlimited phase range for frequency synthesis,
and multiplying the reference frequency are discussed.

the data bus, the actual sampling clock is no longer properly


aligned with the data. A DLL is commonly used to lock the
phase of the buffered clock to that of the input data. The
phase locking significantly reduces timing uncertainty in
sampling the data, which then enables higher data rates as in
[3].
Although aperiodic signals can also be delayed by the
delay line in a DLL, the inputs to delay lines are typically
clock signals. By using a periodic signal, the delay lines do
not need arbitrarily long delays and typically only need to
span the period of the clock to generate all possible phases.
A data signal can be delayed by sampling the data with the
appropriately delayed clock.
The motivation for using DLLs is that the design of the
control loop is simplified by having only phase as the state
variable. Section II reviews how such a loop is
unconditionally stable and has better jitter characteristics.
However, a DLL is not without its own limitations. The
variable delay line has a finite delay range and finite
bandwidth. Section II also discusses these design
considerations.
Section
III
describes
different
implementations of the variable delay line. Within the past
ten years, modifications to the basic DLL architecture have
enabled clock and data recovery applications in
"plesiochronous" systems [4] where the sampling rates for
clock and data differ by a few hundred parts-per-million in
frequency. Delay lines with effectively infinite delay are also
addressed in Section III.
More recently, several researchers such as [5] and [6]
have introduced architectures that permit frequency
multiplication based on delay lines which further extends
their use in clock generation and frequency synthesis.
Section IV describes these architectures.

I. INTRODUCTION

Many applications require accurate placement of the


phase of a clock or data signal. Although simply delaying the
signal could shift the phase, the phase shift is not robust to
variations in processing, voltage, or temperature. For more
precise control, designers incorporate the phase shift into a
feedback loop that locks the output phase with an input
reference signal that indicates the desired phase shift. In
essence, the loop is identical to a phase-locked loop (PLL)
except that phase is the only state variable and that a
variable-delay line replaces the oscillator. Such a loop is
commonly referred to as a delay-line phase-locked loop or
delay-locked loop (DLL). As with a PLL, the goals are (1)
accurate phase position or low static-phase offset, and (2)
low phase noise or jitter.
Because a DLL does not contain an element of variable
frequency, it historically has fewer applications than PLLs.
Bazes in [1] demonstrated an example of precisely delaying
a signal in generating the timing of the row and column
access strobe signals for a DRAM. Another common
application uses a DLL to generate a buffered clock that has
the same phase as a weakly-driven input clock. Johnson in
[2] synchronizes the timing of the buffered clock of a
floating-point unit with the clock of a microprocessor. A
similar application recovers the data of a parallel bus by
generating a properly positioned sampling clock. Typically,
these systems provide a sampling clock with the same
sampling rate but with an arbitrary phase as compared to the
data (i.e. a "mesochronous" system [4]). A clocked DRAM
data bus is an example of such a system. A clock propagates
with the data as one of the signals in the bus and therefore
has a nominally known phase relationship with the data.
However, in order to receive and buffer the clock to sample

II. DLL

CHARACTERISTICS

The basic loop building blocks are similar to that of a


PLL: a phase detector, a filter, and a variable-delay line.
Figure 1 illustrates the three main functional blocks. Since
phase is the only state variable, a control loop higher than
first-order is not needed to compensate a fixed phase error.
The resulting transient impulse response is a simple
exponential. Although the simple loop characteristics are an
advantage that DLLs have over PLLs, the design is
complicated by the additional circuitry that is needed to
overcome having a limited delay range and not producing its
own frequency.
A. First-order Loop
A phase detector compares the phase of the reference
input and the delay-line output. The comparison yields a
signal proportional to the phase error. The error is low-pass

C.K. Ken Yang is with University of California at Los Angeles,


yang@ee.ucla.edu.

13

PLL

1.2
in

PD

Filter

KpD

Gp(s)

DLL

1
0.8

VC

0.6

Delay Line
dly__in

*^DL

0.4

dly_put

0.2

Figure 1: DLL architecture.

&
*HLOOP

open loop

H(s)
(dB)

20

^eO*)

60

Figure 3: Step response of PLL and DLL (with same loop characteristics).

,20dB/dec
the tracking of the phase of the input clock changes at
different frequencies. Based on the transfer function, the
loop bandwidth is (Obw = KPDKDLGF.
For frequencies
within the loop bandwidth the phase of the output clock will
track that of the reference input and reject noise within the
loop. The phase characteristics of the output clock above the
bandwidth of the loop depend on the phase behavior of the
delay-line input and the noise from the delay line. The noise
transfer function from a noise source lumped at the
delay-line output is a high-pass response.

closed loop
1

log CO

Figure 2: Open- and closed-loop transfer characteristics.


filtered to produce a control voltage or current that adjusts
the delay of the delay line. The delay-line input can be either
the reference input or a clean clock signal.
The s-domain representation of each loop element is
depicted within each block in Fig. 1. The open-loop transfer
function can be written as T(s) = KPDKDLGF(s)
where
Kprj is the phase-detector gain, Gp(s) is the filter transfer
function, and KTJL *S t n e delay-line gain. If the loop has
finite gain at dc, the resulting output signal will exhibit a
static phase error as shown in the following equation.

^ l - o -

l + l/{KPDKDLGF(s))s

1 -I-

In some degenerate cases, the delay-line input is also the


reference input. The feedback loop would guarantee a fixed
phase relationship between the delay-line output and the
reference so any phase variations in the reference would
directly appear at the delay-line output in an all-pass
response. However, noise due to the delay line is still
high-pass filtered.

0 )

B. Advantages over a PLL


The loop characteristics are considerably simpler than
those of a PLL. A PLL would contain at least two states to
store both the frequency and phase information. In order to
maintain loop stability, an additional zero is needed. A DLL
is less constrained with only a single pole. The loop gain
directly determines the desired bandwidth. The only stability
consideration is when the loop bandwidth is very near the
reference frequency. The periodic sampling nature of the
phase detection and the delay in the feedback loop degrade
the phase margin. For instance, if the feedback delay is one
reference cycle, the loop bandwidth should not exceed 1/4 of
the reference frequency.
Figure 3 illustrates the response to a noise step applied
to the control voltage for both a PLL and a DLL. A PLL
accumulates phase error due to its higher-order loop
characteristic. In response to a phase error, the control

=o

To eliminate the static phase error, the filter is often an


integrator to store the phase variable. This results in a
first-order closed-loop transfer function.

H(s) =
1+

(s/KPDKDLGF)

(s/KpDKDLGF)

(2)

The equation assumes that the delay-line input is a clean


reference as opposed to the reference input. Higher-order
loop filters have not commonly been used but can enable
better tracking of a phase ramp (i.e. a frequency difference).
Figure 2 shows the open-loop and closed-loop transfer
functions. With only a single integrator, the open-loop phase
margin is 90. The loop is unconditionally stable as long as
the delay in the loop does not degrade the phase margin
excessively. The closed-loop transfer function illustrates that

14

data sample
Rcvr
+7T
data

lock
point

transition sample
Rcvr

Filter

2nd lock
point

Vc

ref_clk

-71

Delay Line

Delay (V c )
Figure 5: Delay line phase/delay characteristic.

sampling clock

However, the data receiver is ultimately a binary


comparator and the phase detector does not indicate an error
that is proportional to the phase difference. Hence, the
timing-recovery loop is nonlinear. Although a higher-order
PLL using early-late control can be made conditionally
stable [8], the resulting phase dithers with a limit cycle
translating into jitter. The oscillation depends on the loop
parameters and can be considerable for high bandwidth
loops. With an early-late DLL, the phase of the clock output
also dithers. But because the stability only depends on the
delay within the loop, the dithering would only be a few
cycles and can be significantly less than the dithering of a
PLL.

Figure 4: Early-late receiver architecture using the receiver as the


phase detector. Timing diagram showing early and late data.

C. Design Considerations in a DLL


A typical DLL involves several design considerations.
First, the delay line usually has a finite delay range. If the
desired phase of the output signal is beyond the delay range,
the loop will not lock properly. Second, the output of the
DLL also depends greatly on the input to the delay line.
Since the delay-line input propagates to the DLL output,
tracking jitter and the output's duty cycle depend not only on
the delay-line design but also on the delay-line input. Third,
the basic DLL cannot generate new frequencies different
from that of the delay-line input.
A variable-delay line adjusts the delay by varying the
RC time constant of a buffer and often has limited
adjustment range. Section III will describe several
techniques in greater detail. Even though the delay range is
limited, DLLs for a periodic clock signal only need the range
to exceed 2n in phase across process and systematic
variations to cover all possible phases. For systems with a
range of operating frequencies, the delay line must span 2K
for the lowest input frequency.
An issue known as false-locking occurs when the delay
range exceeds IK. There can be several secondary lock
points repeating every 2TC. Figure 5 depicts an example of the
characteristic of a delay line with two lock points. Since
phase detectors must be periodic, if the delay line initializes
within 7i of the second lock point, the phase detector will
push the delay line toward lock with a longer than necessary
delay. Long delays require large RC time constants for a
given variable-delay buffer element. The bandlimiting by the

voltage alters the frequency of an oscillator. The output


phase is an integration of the frequency change. In response
to a noise perturbation, the loop accumulates a phase error
before correcting. In contrast, a DLL attenuates the phase
error by the time constant of the loop. In the figure, both
loops are designed with the same 3-dB bandwidth, the same
delay elements, and the PLL is a 2nd-order loop with a
damping factor of unity. Clearly, the PLL suffers from larger
phase errors due to the phase accumulation.
A second advantage relates to clock and data recovery
applications. An effective way to recover the timing for
sampling a data input is to use the data receiver as a phase
detector. The architecture, depicted in Fig. 4, uses the
180-shifted clock to sample the data transitions in addition
to sampling the data values [7]. Whenever data changes
values, the sampled transition and the data values can be
combined to indicate whether the sampling clock edge is
earlier or later than the data transition. Phase information is
only present with data transitions. The feedback loop locks
when the transition sampling clock samples a metastable
value. This commonly used design is known as an early-late
or bang-bang architecture. The timing diagram in Fig. 4
illustrates examples of the data being early and late. Due to
the inherent setup time of the data receiver, the transition
sampling clock may not occur at the same time as the data
transition. The phase shift compensates for the receiver setup
time and maximizes the margin of error for the data
sampling.

15

filter would significantly attenuate a high-frequency input


clock. The attenuation increases the jitter and may even
prohibit the input from reaching the output.
Even if the delay line is constrained to span only one
lock point but greater than 2rc, a second similar issue exists.
It is difficult to design a delay line such that the adjustable
range is exactly *c to -tic across different operating and
processing conditions. If initialized at the minimum or
maximum delay, the phase detector may push the loop
toward either the maximum or minimum delay limit and
"false-lock" to an incorrect phase.
To address false-locking, designers employ several
techniques depending on the application. For systems that
require a delay line with a known fixed delay, operating
condition variations may be small enough such that the delay
line only needs a small variable range that is less than +n and
-n. For systems that lock to a fixed phase over a wide range
of frequencies, one design [9] uses an auxiliary
frequency-sensing loop that generates a voltage to coarsely
set the delay for the given input frequency. Then DLL only
fine tunes the delay for the desired phase. For data recovery
applications where the clock phase can be arbitrary with
respect to the data, a common design uses a startup circuit
for the DLL that initializes the delay line at its minimum
delay to avoid any secondary lock points. However, as
mentioned earlier, the phase detector may keep the delay line
at the minimum delay. A sensing circuit or a state machine
detects when the delay line is at its limit and optionally
inverts the feedback clock. The phase would flip by 180 and
the loop would lock properly. As will be discussed in
Section III, a more robust alternative reconfigures the delay
line such that the delay only spans 2n and wraps back to 0
when the delay exceeds 360.
The jitter and duty cycle of the delay-line output clock
depend on the input, the coupling of the input to the delay
line, and the delay line itself. Often the input is from off-chip
and, therefore, it must be carefully received to prevent
supply and substrate noise from coupling onto the signal as
jitter. In contrast, the high-frequency phase noise of the
clock output of an oscillator-based PLL depends primarily
on the oscillator design. An improperly received input clock
can often result in worse jitter performance in a DLL as
compared to a PLL. Similarly, while the duty cycle from an
oscillator is only modestly distorted (by the difference
between the rising edge and falling edge delays), the duty
cycle of the DLL's input clock can be significantly distorted
as it propagates to the output. Since duty cycle is a
systematic error, a good design corrects duty cycle using an
explicit block instead of compounding the difficulty of the
delay-line design.
A duty-cycle corrector (DCC) is commonly added to
either the DLL input or output. Figure 6 illustrates the basic
components of the feedback loop: an input with finite slew
rate, a buffer element with adjustable threshold, a
comparator, and an integrator. The comparator determines
the threshold crossing of the clock waveform. The result is
integrated and used to skew the threshold of the buffer stage.

clock in
Vref

clock out

offset
vref

Vrefl

/
clock in

\vrcf2

duty c y c l e

clock out

reduction

*
mmmmmmm

Figure 6: Duty-cycle corrector block diagram. Timing diagram


shows change in duty cycle with changing offset.

Because the buffer input has finite slew rate, changing the
threshold effectively adjusts the output high and low
half-periods. The loop settles when the high and low
half-periods are equal. Figure 6 illustrates the reduction in
duty cycle as the threshold shifts from Vrefi to Vref2. Since
random variation of the duty cycle effectively appears as
jitter, single-ended implementations such as that shown in
the figure can be very sensitive to common-mode noise. For
this reason, differential architectures are preferred [3].
For low jitter on the output clock, the loop components
must be carefully designed. Many of the loop components
are very similar to that of a PLL and are well described in
[10]. For a charge-pump based loop filter, since the filter is
only first-order, a simple capacitor replaces the RC filter. As
in a PLL, noise on the control voltage directly translates into
jitter. Designers may use additional filtering to suppress the
noise. The loop element that has deviated the most from PLL
design and is critical for functionality and performance is the
design of the delay line.
III. D E L A Y - L I N E ARCHITECTURES

The primary characteristics of a delay line are (1) gain


(i.e. change in delay for a given change in voltage), and (2)
delay range. For most applications using periodic inputs, the
absolute delay is not critical as long as the range spans 27U.
Because delay lines are relatively short, they do not
contribute significant thermal or 1/f phase noise. However,
for large digital systems, low supply/substrate sensitivity is
needed to reject the on-chip switching noise.

16

vc
out

out

in +

in+

vb~

in_

Vc
(a)

capacitive
control (f)

(b)
V

CP

vc

out
in
in+

Figure 8: Delay versus voltage for two different delay buffer elements: types (d) and (f) of Fig. 7.

out

in.

For push-pull type elements such as inverters, the delay


can be changed by changing the rate at which the output
capacitance is charged [Fig. 7-(d)]. An adjustable current
source limits the peak current of an inverter and varies the
delay. An alternative method regulates the supply voltage of
the inverters and uses the control voltage to set the supply
voltage [Fig. 7-(e)]. The effective switching resistance varies
with the supply voltage. Instead of changing the resistance,
the effective capacitance can also be made adjustable [Fig.
7-(f)]. A transistor that behaves as an adjustable resistance
can be used to decouple an explicit output capacitance. The
larger the resistance the less capacitance is seen at the
output.
Figure 8 illustrates the delay versus control voltage for a
resistively-controlled delay element. For the element of Fig.
7-(d), either \fcs"^th o r m e ^*as c u r r e n t c a n De z e r o an d,
therefore, a single element's delay can span from the
minimum buffer delay to infinite. However, since the time
constant is proportional to the delay, a long delay setting
would significantly attenuate a high-frequency clock. Delay
lines with a wide range for high clock frequencies require a
large number of broadband delay elements.
Unlike resistive control, the maximum delay in a
capacitively-controlled element [Fig. 7-(f)] is proportional to
R(C int +C exp ) and the minimum delay is proportional to
RC int where C int is the intrinsic capacitance of the buffer
and the load of the subsequent stage, and C e x p is the explicit
capacitance added to the circuit. Because of the limited
range per buffer, obtaining a wide delay range involves a
large number of buffers. The maximum delay of each buffer
is chosen to avoid attenuating the signal. In designs where
the clock has a large voltage swing, the transistor in series
with the explicit capacitance no longer appears as a variable
resistor because the device enters saturation and cut-off. For
these buffers, the control voltage determines the fraction of
current and period of time in which the buffer's current
charges the explicit capacitance.
An example of the delay versus control voltage for a
capacitively-controlled element is overlaid in Fig. 8. Most

VOT

Vb~

resistive
control (d)

Vc
(d)

(c)

Vc

in

out

vc

in

out
(e)

(f)

Figure 7: Six different delay elements.

A. Basic Delay Line


A delay line comprises of a chain of variable-delay
elements. Each element is controllable by either a voltage or
a current. The delay of each element is proportional to its RC
time constant and changing the effective resistance or
capacitance adjusts the delay.
Figure 7 depicts several examples of buffer elements.
For a differential buffer, the load resistance can be an MOS
transistor in the triode region [Fig. 7-(a)] where the
resistance is proportional to Vos-^th- Varying the gate
voltage adjusts the delay of the element. A non-linear device
such as a diode can also serve as a load resistance [Fig.
7-(b)]. Since the resistance varies with the current, varying
the bias current of the buffer would adjust the delay.
Similarly, a negative transconductance that changes with the
bias current can be placed in parallel with a fixed load
resistance
[Fig.
7-(c)]. The
varying
negative
transconductance changes the effective load resistance and
hence varies the delay. Because nonlinear elements have
resistances that depend on both voltage and current, they can
be more sensitive to supply noise.

17

180 Phase
Detect +
Filter

<*inO

& cI W>

Ckjnl

ck

yc

<*inl

X
ck

inO

<*outO

^Io

clock^

4*90,270

^outOl

^0,180

in0

^outl

^outl

cllWoi

0135,315
Figure 9: 180-locked DLL to generate intermediate phases that are
a fraction of a cycle.
^45,225

0-oc)I0

\ JPhaselntei^Iartor, ^
delay elements exhibit some nonlinearity. As a result, the
delay-line gain, K DL , is a function of the delay. Because a
DLL is unconditionally stable, the loop still functions with
the varying loop parameter. However, more linear elements
are better for designs that require a constant loop bandwidth.
To compensate for the variable K D L , designers add
programmability to the loop-filter capacitor.
The control signal for either type of delay elements can
be digital. In a digital implementation [11], the current
source is binary weighted and switched by a digital word.
For capacitively-controlled elements, the capacitance can be
binary weighted and switched. A nearly all-digital DLL is
then possible by using a simple counter to replace the analog
integrating filter.

Figure 10: Phase interpolator design by shorting of the output of


two integrators/buffers..

Multiplexers are needed to select the phases to interpolate


between. For example, with phases tapped from a 4-stage
delay line, if the desired output clock phase is 120, the
interpolator inputs would be from the second and third delay
elements.
Interpolators essentially perform a weighted average of
the input phases. As shown in Fig. 10, ideally, the two input
phases drive two integrators which charge a single output.
The weighting of the average is by the relative currents of
the two integrators. When <x=l, the output clock phase
depends only on ckinQ. When a=0.5, i.e. the current is split
equally between the two integrators, the output phase is
additionally delayed by half the phase difference. As
illustrated in Fig. 10, the phase of the interpolated output
(ckoutQj) falls between the phases of the non-interpolated

B. Phase Interpolation

Instead of only using the clock phase at the end of a


delay line, an earlier clock phase can be tapped from the
middle of a delay line. Some applications require the delay
line to produce a delay that is a fixed fraction of the
input-clock period. Figure 9 shows one implementation that
uses a DLL to lock the input clock to the output. An 180
phase detector would guarantee the absolute delay of a delay
line to be a half-cycle. Tapping from different points on the
delay line provides different phases. As shown in Fig. 9, for
a 45 phase shift, the clock can be tapped from the first delay
stage of a 4-stage differential delay line. If an arbitrary phase
is needed, each delay stage can be tapped and multiplexers
can select the nearest desired phase. The number of delay
elements quantizes the phase step and limits the resolution
[12]. Fine phase resolution requires longer delay lines. Yet,
the resolution is limited at high clock frequencies because
the maximum number of delay elements needed to span 180
is limited.
An arbitrary intermediate phase can be obtained by
"interpolating" between two clock phases that are tapped
from a delay line. Depending on the weighting, an
interpolator produces a clock that has a programmable
output phase in between the input clock phases. As long as
discrete clock phases that span the entire cycle are available
as inputs, any phase for the interpolator's output is possible.

outputs (ckout0 and ck0UtJ).

With ideal integrators, the interpolation is linear,


resulting in a constant KpL. Alternatively, an interpolator
can effectively be formed with buffer elements instead of
integrators. By weighting the drive strength or current of two
buffer elements whose outputs are shorted together, one can
adjust the output phase. Because the output is not integrated,
the resulting interpolation is slightly nonlinear and depends
on (1) the phase difference between the inputs and (2) the
slew rate (or time constant) of the input and output signals
[13]. Figure 11 depicts the linearity of the interpolation for
two different input phase separations, s=r and S=2T where x
is the buffer's time constant. The larger phase spacing results
in greater nonlinearity. Similar to RC delay elements, the
interpolation can be digitally controlled. Since the weighting
of the interpolation depends on the proportional current, the
current sources of the integrators or buffers can be digitally
weighted and programmed.
In a design for clock and data recovery by [3],
quadrature clocks are interpolated to generate an
intermediate clock phase within a quadrant. Figure 12
illustrates the mostly analog architecture. An analog control

18

clocks
Phase Generator

3.61

l^O

J<feo

J<Pl80 % 7 0
,Mux A

Control

I 2.6T

Interpolator
datajn

1.6T
0.0

0.2

Phase
Detect

docksamp

Figure 12: Infinite-range delay line based on phase rotation.

0.4

0.6
0.8
1.0
current partition (X)
Figure 11: Buffer based phase interpolator linearity.

Nc-^K^-K^I

voltage produced by the phase detector and filter determines


the interpolator currents. Comparators indicate when the
current is fully steered to one integrator. A finite state
machine driven by the comparators selects the appropriate
quadrant by switching the interpolator inputs such that all
360 phases are possible. The quadrature input clocks is
generated from an external reference clock through the use
of a divide-by-two circuit.
Interestingly, because the phase rotates from one
quadrant to the next, the architecture effectively has an
unlimited delay range. If the input data rate and the reference
clock frequency are slightly different, a DLL would
continually increase or decrease the delay in order to track
the accumulating input phase. A typical DLL with a finite
delay range would run out of delay or lose lock. On the other
hand, plesiochronous operation is possible with an
interpolator-based delay line since the phase smoothly
rotates between quadrants.
Interpolating between clocks with large phase spacings
such as quadrature clocks results in an output clock with
slow slew rate. Such waveforms are more susceptible to
noise and result in higher jitter. An enhancement uses more
closely-spaced phases that span the cycle. The finer phases
spacing is possible using a multi-stage ring oscillator. As
shown in Fig. 13, a 4-stage differential oscillator would
generate 8 phases 45 apart. To guarantee a correct period
for each clock phase, the ring oscillator is locked to the
external reference clock using a PLL. The role of the PLL is
solely for generating the phases. A purely DLL-based
architecture is also possible by replacing by using the DLL
in Fig. 9 that locks the delay-line output with a 180 phase
shift [13].
The architecture is commonly known as a dual-loop
design because the first loop, a PLL or DLL, generates the
phases and the second loop, the interpolation-based DLL,
recovers the data and phase. Since the first loop is not in the
feedback of the second loop (or vice versa), the overall
system is stable as long as each loop is individually stable. A
dual-loop design is possible with the second loop within the

t
4*45,225

^ ^ O

t
$135,315

0,180

Figure 13: Oscillator with tapped outputs for multiple phases.

feedback of the first loop [15] as long as the stability of the


loop is carefully considered.
The data recovery portion of a dual-loop design is
conducive to a digital implementation. The binary output of
the receiver-replica phase detector can be accumulated using
a digital counter. The counter output selects the appropriate
phase from the oscillator and controls the digitally
programmable interpolators [13],[14]. As long as the
quantized phase step is small, the small error only minimally
impacts the data recovery.
C. Overs amp led Implementation
An alternative purely digital approach to clock and data
recovery can be implemented by oversampling the data.
Figure 14 illustrates an example of a digital architecture.
Multiple finely-spaced clock phases oversample the data
input. The sampled results are digitally processed to
determine both the correct data value and the optimal phase
of the data sample. The digital processing can vary in
complexity. Simple implementations use the optimal data
sample as the received data [18] or take a majority vote from
the samples of a single bit [17]. The bit boundaries
determine the samples associated with a bit. Transitions that
are detected in the samples from the prior or current bits
indicate the bit boundaries.
The sampling rate limits the timing error margin.
Greater amount of oversampling reduces the data-recovery
timing error, but increases the number of clock phases. Low
data rate UARTs [16] typically use 8 to 16 times
oversampling. For high data rates, generating accurate clock
phases separated by sub-lOOps is very challenging. More

19

datait1

L L Lr
R.
C

>

>

cloclT l^o
[1:N]

>

Receiver
Samplers

do'
Decision
Logic

Transition
Detect

Startable Oscillator
dafcin

TDH

180P Delay Line

->\
r

18(f Delay Line

dock^p
Receiver

delay control

f received data
Figure 15: Clock/data recovery using startable oscillator.

received data
Figure 14: Oversampled data recovery architecture.

uses logical AND-ORs to combine the multiple phased


clocks into a single high-frequency clock [23]. Alternatively,
the method in [24] converts each phase into a small pulse
and ORs the pulses together to form the output clock. In
cases where the output capacitance of the logic gates limits
the output frequency, one design [6] uses phases to excite a
tuned LC tank to combine the clock phases.
Instead of edge combining, the multiplied clock can be
the direct output of a delay line. The architecture is similar to
a technique for clock and data recovery that uses a startable
oscillator [22]. As shown in Fig. 15, the architecture uses
data transitions to trigger startable oscillators: high-value
data triggers one oscillator and low-value data triggers
another. Each startable oscillator comprises of a delay line
and an AND gate. The data value enables the AND gate and
the triggered oscillator propagates an edge through the delay
elements and produces a clock edge delayed by a half-cycle.
The edge is used to sample the data. In the absence of input
transitions, the delay line is configured an oscillator and
generates a sampling edge every cycle. Whenever a new data
transition occurs, the oscillator resynchronizes its phase to
that of the input. In the implementation by [22], the natural
oscillation frequency of the oscillator is determined by an
external plesiochronous clock reference. The architecture
has not been widely applied to higher data rate designs
because the sampling phase is directly derived from the input
data without any filtering. The deterministic and random
jitter inherent in the data are effectively doubled and can be
considerable.
If the input is a low-jitter reference clock, a similar
architecture can be used for clock multiplication [5]. As
illustrated in Fig. 16, a lower frequency but clean reference
clock is one input to a multiplexer that feeds into a delay
line. The output of the delay line is fed back to the
multiplexer as the second input. When a reference clock
edge is available, the multiplexer selects the reference input.
Otherwise, the multiplexer configures the delay line as an
oscillator with the output frequency controlled by the delay.
The multiplexer inputs are selected by a counter circuit that
determines the number of cycles to oscillate before accepting
the next reference clock edge. A phase detector compares the

aggressive designs with the least amount of clocking


overhead and high data rates use a minimum of 3x
oversampling [17], [18].
Even though phase spacing scales with the gate delay of
a technology, so does the bit time in each generation of
applications. For oversampling of the data bits, finely-spaced
clock phases are needed. Tapping from a delay line produces
phases separated by a buffer delay. For even finer phases,
several techniques are commonly used. For example, several
interpolators can be used where each interpolator has slightly
different weighting to generate intermediate phases with
spacing less than a buffer delay [19]. An alternative method
uses a chain or array of coupled oscillators [20]. By taking a
chain of oscillators and coupling them such that the output
and input of the chain are separated by only one gate delay,
sub-gate-delay phase spacings result from the outputs of
each oscillator. Lastly, if the data can be delayed with a
chain of delay buffers along with the clock, the clock at each
delay stage can be used to sample the data of the
corresponding stage. As long as the data and the clock delay
lines have slightly different delays, the sub-sampled outputs
are effectively an oversampling of the data. The effective
phase spacing depends only on the difference between the
data delay and clock delay [21]. The architecture has a
drawback in that it requires delaying the data and clock by
long delays of several cycles, which can significantly
increase jitter.
IV. CLOCK MULTIPLICATION

With a dual-loop architecture, a DLL can produce a


frequency plesiochronous to the delay-line input. However,
the rate at which the interpolator weight changes limits the
frequency difference. Generating a significantly different or
multiplied frequency from a low-frequency input reference
is not possible with the architecture.
Recently designers have explored several methods of
using DLLs for frequency multiplication. One method uses a
delay line that is locked to 180. With the phases that span an
entire cycle, the tapped clock edges are combined to form a
clock with multiplied frequency. The most direct method

20

restore a clock's duty cycle, the output clock requires


correction circuitry. To use DLLs in plesiochronous systems,
the delay line must have even more circuitry to achieve an
unlimited delay range. In clock multiplication applications,
very careful matching in the DLL components is critical to
eliminate reference tones. In the many designs that have
addressed these subtleties, DLLs have demonstrated
low-jitter clock outputs for a variety of clock generation and
data recovery applications.

Counter+
Control
clock^

Delay Line
Nxf ref
Filter

Detect

REFERENCES

**

[I]

Figure 16: DLL-based clock multiplication.

[2]

reference input with the oscillator output and tunes the delay
of the delay elements. Once locked, the resulting output
clock frequency is a multiple of the input reference
frequency. Recent designs [26] extend the frequency range
and use an interpolator instead of a multiplexer to blend the
delay-line feedback and the low-frequency reference clocks.
Both edge-combining multiplication and delay-line
multiplication reduce the phase noise of the output clock
because the core DLL does not have an oscillator that
accumulates phase error. After N cycles, where N is the
divide ratio, a new clean reference clock edge arrives and
resets any accumulated phase error to zero. The architecture
potentially lowers jitter by eliminating the peaking in the
transfer function and allows a high tracking bandwidth.
However, matching is critical in these designs. Mismatches
in the phase detector or charge pump result in a static phase
error that modulates the output frequency at the input
reference frequency. Similarly, in the edge combining
implementations, if the delay line is mismatched, the output
clock would contain significant reference tones. Designers
either choose the reference frequency carefully so that the
tones do not impact the system performance or employ
additional circuitry to compensate for the mismatches.

[3]

[4]

[5]

[6]

[7]
[8]

[9]

[10]
V.

CONCLUSION

DLLs have been commonly used for generating precise


phase delays of a signal and have been increasingly popular
in clock generation and data recovery applications. Most
importantly, because of the first-order loop characteristics
that controls the phase directly, DLLs can be designed with
high tracking bandwidths and do not exhibit the phase
accumulation of an oscillator-based PLL.
The more simple loop characteristics belie many
subtleties in DLL design. The delay-line input clock must
have low-jitter and good duty-cycle. Furthermore, it must be
carefully received and coupled to the input of the delay line
to maintain good jitter performance. This source of jitter
counter-balances the jitter accumulation of PLLs and results
in less jitter improvement. Additional circuitry is often
needed to prevent false-locking. Since a delay line does not

[II]

[12]

[13]

[14]

[15]

21

Bazes, M., "A Novel Precision MOS Synchronous Delay


Line," IEEE Journal of Solid-State Circuits, vol sc-20, no
6, Dec. 1985, pp. 1265-71
Johnson, M.G., E.L. Hudson, "A Variable Delay Line
PLL for CPU-Coprocessor Synchronization," IEEE Journal of Solid-State Circuits, vol 23, no 5, Oct. 1988, pp.
1218-23
Lee, T.H., et. al., "A 2.5V CMOS Delay-Locked Loop for
an 18 Mbit, 500Megabytes/s DRAM," IEEE Journal of
Solid-State Circuits, vol 29, no 12, Dec. 1994, pp. 1491-6
Messerschmitt, D.G., "Synchronization in Digital System
Design," IEEE Journal on Selected Areas in Communications, Oct. 1990, pp. 1404-1420
Waizman, A., "A Delay Line Loop for Frequency Synthesis of De-Skewed Clock," IEEE ISSCC Dig. of Tech.
Papers, Feb. 1994, San Francisco, Session 18.5
Chien, G., P.R. Gray, "A 900-MHz Local Oscillator
Using a DLL-Based Frequency Multiplier Technique for
PCS Applications," IEEE Journal of Solid-State Circuits,
vol 35, no 12, Dec. 2000, pp. 1996-9
Alexander, J.D., "Clock Recovery from Random Binary
Data," Electronic Letters, vol 11, Oct. 1975, pp 541-2
D1 Andrea, N.A., F. Russo, "A binary quantized digital
phase locked loop: a graphical analysis," IEEE Transactions on Communications, vol.COM-26, (no.9), Sept.
1978.p.l355-64
Moon, Y,, "An All-Analog Multiphase Delay-Locked
Loop Using A Replica Delay Line for Wide-Range Operation and Low-Jitter Performance," IEEE Journal of
Solid-State Circuits, vol 35, no 3, Mar. 2000, pp. 377-84
Razavi, B., "Design of Monolithic Phase-Locked Loops
and Clock Recovery Circuits - A Tutorial," Monolithic
Phase-locked Loops and Clock Recovery Circuits, IEEE
Press 1996 New Jersey, pp. 1-28
Dunning, J., et. al. "An All-Digital Phase-Locked Loop
with 50-Cycle Lock Time Suitable for High-Performance
microprocessors," IEEE Journal of Solid-State Circuits,
vol 30, no 4, Apr. 1995, pp. 412-22
Efendovich, A., et. al., "Multifirequency Zero-Jitter
Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol 29, no 1, Jan. 1994, pp. 67-70
Sidiropoulos, S., M.A. Horowitz, "A Semidigital Dual
Delay-Locked Loop," IEEE Journal of Solid-State Circuits, vol 32, no 11, Nov. 1997, pp. 1683-92
Garlepp, B., et. al., "A Portable Digital DLL for
High-Speed CMOS Interface Circuits," IEEE Journal of
Solid-State Circuits, vol 34, no 5, May 1996, pp. 632-44
Larsson, P., "A 2-1600-MHz CMOS Clock Recovery
PLL with Low-Vdd Capability," IEEE Journal of

[16]

[ 17]

[18]

[19]

[20]

[21]

Solid-State Circuits, vol 34, no 12, Dec. 1999, pp.


1951-60
Cordell, R., "A 45-Mbit/s CMOS VLSI Digital Phase
Aligner," IEEE Journal of Solid-State Circuits, vol 23, no
2, Apr. 1988, pp. 323-28
Lee, K., et. al., "A CMOS Serial Link For Fully Duplexed
Data Communication," IEEE Journal of Solid-State Circuits, vol 30, no 4, Apr. 1995, pp. 353-64
Yang, C.K., et al., "A 0.5-|im CMOS 4.0-Gb/s Serial
Link Transceiver with Data Recovery Using Oversampling," IEEE Journal of Solid-State Circuits, vol 33, no 5,
May 1998, pp. 713-22
Weinlader, D., et al., "An Eight Channel 36-GS/s CMOS
Timing Analyzer," IEEE ISSCC Dig. of Tech. Papers,
Feb. 2000, San Francisco, pp. 170-1
Maneatis, J., M. Horowitz, "Precise Delay Generation
Using Coupled Oscillators," IEEE Journal of Solid-State
Circuits, vol 28, no 12, Dec. 1993, pp. 1273-82
Gray, C , et. al., "A Sampling Technique and Its CMOS
Implementation with lGb/s Bandwidth and 25ps Resolution", IEEE Journal of Solid-State Circuits, vol 29, no 3,
Mar. 1994, pp. 340

[22]

[23]

[24]

[25]

[26]

[27]

22

Ota, Y. et. al., "High-Speed, Burst-Mode, Packet Capable


Optical Receiver and Instantaneous Clock Recovery for
Optical Bus Operation," IEEE Journal of Lightwave
Technology, vol 12, no 2, Feb. 1994, pp. 325-330
Foley, D., M.P. Flynn, "CMOS DLL-Based 2-V 3.2ps
Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillaor," IEEE Journal of Solid-State
Circuits, vol 36, no 3, Mar. 2001, pp. 417-23
Kim, C , I. Hwang, S.M. Kang, "Low-Power Small-Area
+/-7.28ps Jitter lGHz DLL-Based Clock Generator,"
IEEE ISSCC Dig. of Tech. Papers, Feb. 2002, San Francisco, Session 8.3
Farjad-rad, R., et. al., "A 0.2-2GHz 12mW Multiplying
DLL for Low-Jitter Clock Synthesis in Highly-Integrated
Data Communication Chips," IEEE ISSCC Dig. of Tech.
Papers, Feb. 2002, San Francisco, Session 4.5
Ye, S., L. Jansson, I. Galton, "A Multiple-Crystal Interface PLL with VCO Realignment to Reduce Phase
Noise," IEEE ISSCC Dig. of Tech. Papers, Feb. 2002,
San Francisco, Session 4.6
Kim, J., et. al., "A Low-Jitter Mixed-Mode DLL for
High-Speed DRAM Applications," IEEE Journal of
Solid-State Circuits, vol 35, no 10, Oct. 2000, pp. 1430-3

Delta-Sigma Fractional-TV Phase-Locked Loops


Ian Galton
AbstractThis paper presents a tutorial on delta-sigma
fractional-TV PLLs for frequency synthesis. The presentation assumes the reader has a working knowledge of integer-TV PLLs. It builds on this knowledge by introducing
the additional concepts required to understand A fractional-TV PLLs. After explaining the limitations of integerTV PLLs with respect to tuning resolution, the paper introduces the delta-sigma fractional-TV PLL as a means of
avoiding these limitations. It then presents a selfcontained explanation of the relevant aspects of deltasigma modulation, an extension of the well known integerTV PLL linearized model to delta-sigma fractional-TV PLLs,
a design example, and techniques for wideband digital
modulation of the VCO within a delta-sigma fractional-TV
PLL.

Reference Signal
Generator
V

ref

<//v

Phase/
Frequency
Detector

U
Lowpass
Loop Filter

VCO ]

out

Charge Parop

f N

V
V

ref

div

Phase/
Frequency
Detector

rfv

I. INTRODUCTION

ref-

Over the last decade, delta-sigma (AS)fractional-TVphase


locked loops (PLLs) have become widely used for frequency Figure 1: A typical integer-N PLL.
synthesis in consumer-oriented electronic communications underlying fractional-TV PLLs in general and AS fractional-TV
products such as cellular phones and wireless LANs. Unlike PLLs in particular are presented in Section III. The primary
an integer-TV PLL, the output frequency of a AS fractional-TV innovation in ASfractional-TVPLLs relative to other types of
PLL is not limited to integer multiples of a reference fre- fractional-TV PLLs is the use of AS modulation. Therefore, a
quency. The core of a ASfractional-TVPLL is similar to an self-contained introduction to AS modulation as it relates to
integer-TV PLL, but it incorporates additional digital circuitry ASfractional-TVPLLs is presented in Section IV. A AS fracthat allows it to accurately interpolate between integer multi- tional-TV PLL linearized model is derived in Section V and
ples of the reference frequency. The tuning resolution de- compared to the corresponding model for integer-TV PLLs. A
pends only on the complexity of the digital circuitry, so con- design example is presented to demonstrate how the model is
siderable flexibility and programmability is achieved. A sin- used in practice. Design issues that arise in AS fractional-TV
gle AS fractional-TV PLL often can be used for local oscillator PLLs but not integer-TV PLLs are presented in Section VI, and
generation in applications that would otherwise require a cas- recently developed enhancements to AS fractional-TV PLLs
cade of two or more integer-TV PLLs. Moreover, the fine tun- that allow wideband digital modulation of the VCO are preing resolution makes it possible to perform digitally-controlled sented in Section VII.
frequency modulation for generation of continuous-phase
(e.g., FSK and MSK) transmit signals, thereby simplifying
II. INTEGER-TV PLL LIMITATIONS
wireless transmitters. These benefits come at the expense of
increased digital complexity and somewhat increased phase
An example of a typical integer-TV PLL for frequency synnoise relative to integer-TV PLLs. However, with the relentless thesis is shown in Figure 1 [1], [2]. Its purpose is to generate
progress in silicon VLSI technology optimized for digital cir- a spectrally pure periodic output signal with a frequency of TV
cuitry, this tradeoff is increasingly attractive, especially in /,/, where TV is an integer, and/ re /is the frequency of the referconsumer products which tend to favor cost reduction over ence signal. The example PLL consists of a phase-frequency
performance.
detector (PFD), a charge pump, a lowpass loop filter, a voltage
This paper presents a tutorial on ADfractional-TVPLLs. It controlled oscillator (VCO), and an TV-fold digital divider.
is assumed that the reader has a working knowledge of inte- The PFD compares the positive-going edges of the reference
ger-TV PLLs. The paper builds on this knowledge by present- signal to those from the divider and causes the charge pump to
ing the additional concepts required to understand AS frac- drive the loop filter with current pulses whose widths are protional-TV PLLs. The limitations of integer-TV PLLs with respect portional to the phase difference between the two signals. The
to tuning resolution are described in Section II. The key ideas pulses are lowpass filtered by the loop filter and the resulting
waveform drives the VCO. Within the loop bandwidth phase
The author is with the Department of Electrical and Computer Engi- noise from the VCO is suppressed and outside the loop bandwidth most of the other noise sources are suppressed, so the
neering, University of California at San Diego, La Jolla, CA, USA.

23

/ . - 4 0 kHz
^-492

Phase/
Freq.
Detector

fw
Charge
Pump

Loop
Filter

Phase/
Freq.
Detector

- 2.402 GHz
+ *MHz<wk

VCO

Charge
Pump

Loop
Filter

f**r2-403

GHz

(on average)

19.68 MHz
-MM+JPI-I

19.68 MHz
^60050 + 2 5 *

Figure 2: An example integer-N PLL for generation of the Bluetooth wireless


LAN RF channel frequencies.

PLL can be designed to generate a spectrally pure output signal at any integer multiple of the reference frequency,/*/.
As indicated by the timing diagram in Figure 1, the loop
filter is updated by the charge pump once every reference period. This discrete-time behavior places an upper limit on the
loop bandwidth of approximately fnj/l0 above which the PLL
tends to be unstable [1]. In integrated circuit PLLs, it is common to further limit the bandwidth to approximately f^/20 to
allow for process and temperature variations.
The output frequency can be changed by changing N, but
N must be an integer, so the output frequency can be changed
only by integer multiples of the reference frequency. If finer
tuning resolution is required the only option is to reduce the
reference frequency. Unfortunately, this tends to reduce the
maximum practical loop bandwidth, thereby increasing the
settling time of the PLL, the noise contributed by the VCO,
and the in-band portions of the noise contributed by the reference source, the PFD, the charge pump, and the divider.
This fundamental tradeoff between bandwidth and tuning
resolution in integer-Af PLLs creates problems in many applications. For example, a PLL that can be tuned from 2.402
GHz to 2.480 GHz in steps of 1 MHz is required to generate
the local oscillator signal in a direct conversion Bluetooth
transceiver [3]. An integer-N PLL capable of generating the
local oscillator signal from a commonly used crystal oscillator
frequency, 19.68 MHz, is shown in Figure 2. A reference frequency of fref = 40 kHzthe greatest common divisor of the
crystal frequency and the set of desired output frequenciesis
obtained by dividing the crystal oscillator signal by 492. The
resulting PLL output frequency is 60050 + 25k times the reference frequency, where k is an integer used to select the desired frequency step.
The PLL achieves the desired output frequencies, but its
bandwidth is limited to approximately 2 kHz, i.e.,/^/20. Unfortunately, with such a low bandwidth the settling time exceeds the 200 jiS limit specified in the Bluetooth standard, and
the phase noise contributed by the VCO would be unacceptably high if it were implemented in present-day CMOS technology. One solution is to use a 1 MHz reference signal, but
this requires the crystal frequency to be an integer multiple of
1 MHz, or another PLL to generate a 1 MHz reference frequency. Unfortunately, in low cost consumer electronics applications such as Bluetooth, it is often desirable to be compatible with all of the popular crystal frequencies, so restricting the crystal frequencies to multiples of 1 MHz is not always
an option. In such cases, an additional PLL capable of generating the 1 MHz reference signal with very little phase noise
from any of the crystal frequencies is required, or, as de-

Shift Register with 51


ones and 441 zeros

_r
y\\

Figure 3: A fractional-.^ PLL that generates non-integer multiples of the reference frequency, but has phase noise consisting of large spurious tones.

scribed in the next section, a singlefractional-TVPLL can be


used.

III. THE IDEA BEHIND AS FRACTIONAL-//PLLs


In this section, the example problem of generating the second Bluetooth channel frequency, 2.403 GHz, with a reference
frequency of 19.68 MHz is used as a vehicle with which to
explain the idea behind AE fractional-N PLLs. First, a pair of
"bad"fractional-TVPLLs are presented that achieve the desired
frequency but have poor phase noise performance. Then the
AEfractional-TVPLL technique is presented as a means of improving the phase noise performance.
The output frequency of an integer-N PLL with a reference
frequency of 19.68 MHz is 2.40096 GHz when the divider
modulus, N, is set to 122 and 2.42064 GHz when N is set to
123. The problem is that to achieve the desired frequency of
2.403 GHz, TV would have to be set to the non-integer value of
122 + 51/492. This cannot be implemented directly because
the divider modulus must be an integer value. However the
divider modulus can be updated each reference period, so one
option is to switch between N = 122 and N= 123 such that the
average modulus over many reference periods converges to
122 + 51/492. In this case, the resulting average PLL output
frequency is 2.403 GHz as desired. This is the fundamental
idea behind most fractional-// PLLs [4].
While dynamically switching the divider modulus solves
the problem of achieving non-integer multiples of the reference frequency, a price is paid in the form of increased phase
noise. During each reference period the difference between
the actual divider modulus and the average, i.e., ideal, divider
modulus represents error that gets injected into the PLL and
results in increased phase noise. As described below, the
amount by which the phase noise is increased depends upon
the characteristics of the sequence of divider moduli.
For example, in the fractional-JV PLL shown in Figure 3,
the divider modulus is set each reference period to 122 or 123
such that over each set of 492 consecutive reference periods it
is set to 122 a total of 441 times and 123 a total of 51 times.
Thus, the average modulus is 122-1- 51/492 as required. The
sequence of moduli is periodic with a period of 492, so it repeats at a rate of 40 kHz. Consequently, the difference between the actual divider moduli and their average is a periodic
sequence with a repeat rate of 40 kHz, so the resulting phase
noise is periodic and is comprised of spurious tones at integer
multiples of 40 kHz. Many of the spurious tones occur at low
frequencies, and they can be very large. Unfortunately, the

24

HOI-

19.68 MHz

Phase/
Freq.
Detector

Charge
Pump

Loop
Filter

rD>T
HDr-J

fvaT 2 - 4 0 3 G H z
(on average)

* Phase/
Freq.
r Detector

Charge
Pump

Loop
Filter

vco

2.403 GHz

19.68 MHz
122 +y\n\
-f- 122 +y[n)
Randomized Pulse
Density Modulator

fl with probability 51/492


10 with probability 441 /492

51/492

**U

Figure 4: A fractional-TV PLL that generates non-integer multiples of the reference frequency, but has a large amount of in-band phase noise.

only way to suppress the tones is have a very small PLL


bandwidth, which negates the potential benefit of the fractional-N technique.
One way to eliminate spurious tones is to introduce randomness to break up the periodicity in the sequence of moduli
while still achieving the desired average modulus. For example, as shown in Figure 4, a digital block can be used to generate a sequence, y[n], that approximates a sampled sequence of
independent random variables that take on values of 0 and 1
with probabilities 441/492 and 51/492, respectively. During
the n^ reference period the divider modulus is set to 122 +
y[n], so the sequence of moduli has the desired average yet its
power spectral density (PSD) is that of white noise. Thus,
instead of contributing spurious tones, the modified technique
introduces white noise. Unfortunately, the portion of the
white noise within the PLL's bandwidth is integrated by the
PLL transfer function, so the overall phase noise contribution
again can be significant unless the PLL bandwidth is small.
In each fractional-M PLL example presented above, the sequence, y[n], can be written as y[n] =x + em[n], where x is the
desired fractional part of the modulus, i.e., x = 51/492, and
em[n] is undesired zero-mean quantization noise caused by
using integer moduli in place of the ideal fractional value. In
the first example, em[n] is periodic and therefore consists of
spurious tones at multiples of 40 kHz. In the second example,
em[n] is white noise. Each PLL attenuates the portion of em[n]
outside its bandwidth, but the portion within its bandwidth is
not significantly attenuated. Unfortunately, in each example
em[n] contains significant power at low frequencies, so it contributes substantial phase noise unless the PLL bandwidth is
very low.
A AS fractional-N PLL avoids this problem by generating
the sequence of moduli such that the quantization noise has
most of its power in a frequency band well above the desired
bandwidth of the PLL [5], [6], [7]. An example AS fractionally PLL is shown in Figure 5. The PLL core is similar to those
of the previous fractional-N PLL examples, but in this case
y[n] is generated by a digital AS modulator. The details of
how the AS modulator works are presented in the next section,
but its purpose is to coarsely quantize its input sequence, x[n],
such that y[n] is integer-valued and has the form: y[n] = x[n 2] + em[ri], where em[n] is de-free quantization noise with most
of its power outside the PLL bandwidth. In this example, x[n]
consists of the desired fractional modulus value, 51/492, plus
a small, pseudo-random, 1-bit sequence. As described in the
next section, the pseudo-random sequence is necessary to
avoid spurious tones in the AS modulator's quantization noise,
but its amplitude is very small so it does not appreciably in-

25

Simulated PLL Phase Noise


500 kHz Loop Bandwidth

{0,217}

pseudo-random
bit sequence

-80
y[n
{-1,0,1,2}

100

g-120-140
" 16

I 50 kHz Loop Bandwidth \

-180

Figure 5: A A I fractional-A^ PLL example.

crease the phase noise of the PLL.


Also shown in Figure 5 are PSD plots of the output phase
noise arising from AS modulator quantization noise, em[n], in
two computer simulated versions of the example AS fractional-Af PLL, one with a 50 kHz loop bandwidth and the other
with a 500 kHz loop bandwidth. As shown in the next section,
the PSD of em[n] increases with frequency, so the phase noise
PSD corresponding to the 50 kHz bandwidth PLL is significantly smaller than that corresponding to the 500 kHz bandwidth PLL. For example, the former easily meets the requirements for a local oscillator in a direct conversion Bluetooth transceiver, but the latter falls short of the requirements
by at least 23 dB.
IV.

DELTA-SIGMA MODULATION OVERVIEW

As mentioned above, a digital AS modulator performs


coarse quantization in such a way that the inevitable error introduced by the quantization process, i.e., the quantization
noise, is attenuated in a specific frequency band of interest.
There are many different AS modulator architectures. Most
use coarse uniform quantizers to perform the quantization with
feedback around the quantizers to suppress the quantization
noise in particular frequency bands. Therefore, to illustrate
the AS modulator concept, first a specific uniform quantizer
example is considered in isolation, and then a specific AS
modulator architecture that incorporates the uniform quantizer
is presented.
A. An Example Uniform Quantizer
The input-output characteristic of the example uniform
quantizer is shown in Figure 6. It is a 9-level quantizer with
integer valued output levels. For each input value with a
magnitude less than 4.5, the quantizer generates the
corresponding output sample by rounding the input value to
the nearest integer. For each input value greater than 4.5 or
less than -4.5, the quantizer sets its output to 4 or - 4 , respectively; such values are said to overload the quantizer. By
defining the quantization noise as eg[n] = y[n]-r[n]9
the
quantizer can be viewed without approximation as an additive
noise source as illustrated in the figure.

-y

TI

43
2

9-Levei
Quantizer

1
-4 5 - 3 5 -2.5 -I 5

0.5 1.5 2.5 3.5 4 5

Delay

f]
\

Delay

/J

9-Level
Quantizer

Figure 8: A AX modulator example.

-Si

=y-r

0.5

can be used to circumvent this problem. The structure incorporates the same 9-level quantizer presented above, but in this
case the quantizer is preceded by two delaying discrete-time
integrators (i.e., accumulators), and surrounded by two feedback loops [8], [9]. Each discrete-time integrator has a transfer function of z~ 1 /(l-z~ 1 ) which implies that its * output
sample is the sum of all its input samples for times k < n.
With the quantizer represented as an additive noise source as
depicted in Figure 6, the AS modulator can be viewed as a
two-input, single-output, linear time-invariant, discrete-time
system. It is straightforward to verify that
y[n] = x[n-2] + em[n],
(1)
where em[n] is the overall quantization noise of the AS modulator and is given by
em[n] = eq[n]-2eq[n-l] + eq[n-2].
(2)

-0.5
M

No-ovcrload range"

Figure 6: A 9-level quantizer example.


48 kHz sinusoid plus white
noise (SNR = lOOdB)
sampled at 48 MHz

(a),(b)

9-level
Quantizer

I*

2
500

80

jj -

1500

2000

< -120

To illustrate the behavior of the AS modulator, suppose that


the same 48 Msample/s input sequence considered above is
applied to the input of the AS modulator, and that the discretetime integrators in the AS modulator are clocked at 48 MHz.
Figure 9(a) shows the PSD plot of the resulting AS modulator
output sequence, y[n], and Figure 9(b) shows a time domain
plot of y[n] over two periods of the sinusoid. Two important
differences with respect to the uniform quantization example
shown in Figure 7 are apparent: the quantization noise PSD is
significantly attenuated at low frequencies, and no spurious
tones are visible anywhere in the discrete-time spectrum. For
instance, the SNR in the zero to 500 kHz frequency band is
approximately 84 dB for this example as opposed to 14 dB for
the uniform quantization example of Figure 7. Consequently,
subjecting the AS modulator output sequence to a lowpass
filter with a cutoff frequency of 500 kHz results in a sequence
that is very nearly equal to the AS modulator input sequence
as demonstrated in Figure 9(c).
Below about 120 kHz, the PSD shown in Figure 9(a) is
dominated by the two components of the AS modulator input
sequence: the 48 kHz sinusoid component, and the input noise
component. Above 120 kHz, the PSD is dominated by the AS
modulator quantization noise, em[n], and rises with a slope of
40 dB per decade. It follows from (2) that em[n] can be
viewed as the result of passing the additive noise from the
quantizer, eq[n], through a discrete-time filter with transfer
function (1 - z"1 ) 2 . Since this filter has two zeros at dc, the
smooth 40 dB per decade increase of the PSD of em[n] indicates that eq[n] is very nearly white noise, at least for the example shown in Figure 9.
It can be proven that eq[n] is indeed white noise; it has a
variance of 1/12 and is uncorrelated with the AS modulator
input sequence [10]. Moreover, this situation holds in general
for the example AS modulator architecture provided that the

Ao
-2

-140
160

1000

(c)

|.100
II

-fc>

(b)

(a)

I-

Lowpass Filter
(BW = 500 kHz)

10*

10S

1O 9

107

Hz

500

1000

1500

time (units of 1/(48 MHz))

Figure 7: (a) A power spectral density plot of the quantizer output in dB,
relative to the quantization step-size of A = 1, per Hz, (b) a time domain plot
of the quantizer output, and (c) a time domain plot of the quantizer output
filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.

To illustrate some properties of the example quantizer,


consider a 48 Msample/s input sequence, x[n], consisting of a
48 kHz sinusoid with an amplitude of 1.7 plus a small amount
of white noise such that the input signal-to-noise ratio (SNR)
is 100 dB. Figure 7(a) shows the PSD plot of the resulting
quantizer output sequence, and Figure 7(b) shows a time domain plot of the quantizer output sequence over two periods of
the sinusoid. Given the coarseness of the quantization, it is
not surprising that the quantizer output sequence is not a precise representation of the quantizer input sequence. As evident in Figure 7(a), the quantization noise for this input sequence consists primarily of harmonic distortion as represented by the numerous spurious tones distributed over the
entire discrete-time frequency band. Even in the relatively
narrow frequency band below 500 kHz, significant harmonic
distortion corrupts the desired signal. To illustrate this in the
time domain, Figure 7(c) shows the sequence obtained by
passing the quantizer output sequence through a sharp lowpass
discrete-time filter with a cutoff frequency of 500 kHz. The
significant quantization noise power in the zero to 500 kHz
frequency band causes the sequence shown in Figure 7(c) to
deviate significantly from the sinusoidal quantizer input sequence.
B. An Example AS Modulator
The example AS modulator architecture shown in Figure 8

26

48 kHz sinusoid plus white


noise (SNR = lOOdB)
sampled at 48 MHz

Second-Order
AI Modulator

(a),(b)

Lowpass Filter
(BW = 500 kHz)

Phase/
Freq.
Detector

-fc>

Charge
Pump

vco
'*>

*<

c,

^')

'm

Ci

(a)

(b),

J .0

Loop Filter

tQ

f:

+ N + y\n\
V

y\n\-

\
0

.100
II

SO0

1OO0

1500

2000

ref-

div-

(C)
2

<-120

-2
10 4

10 s

10 e

Hz

10T

500

1000

1500

2000

'

*-'.

Figure 10: The AL fractional-N PLL with the details of a commonly used
loop filter and a timing diagram relating to the charge pump output.

time (units of 1/(48 MHz))

-180

thereby more aggressively suppressing quantization noise in


particular frequency bands relative to the example secondorder AS modulator. Some of these higher-order AS modulators incorporate a higher than second-order loop filter (e.g.,
input sequence satisfies two conditions: 1) its magnitude is more than two discrete-time integrators) and a single quantizer
sufficiently small that the quantizer within the AS modulator surrounded by one or more feedback loops [15], [16]. In
never overloads, and 2) it consists of a signal component plus many cases, these AS modulators are designed specifically to
a small amount of independent white noise. It can be shown allow one-bit quantization [7], [17], [18]. This simplifies the
that the first condition is satisfied if the input signal is design of the divider in that only two moduli are required, but
bounded in magnitude by 3 A where A is the step-size of the such AS modulators tend to have spurious tones in their quanquantizer (for this example, A = 1) [11]. Input sequences with tization noise that cannot be completely suppressed even with
values even slightly exceeding 3 A in magnitude generally elaborate dithering techniques. Others of these higher-order
cause the quantizer to overload with the result that eq[n] con- AS modulators, often referred to as MASH, cascaded, or multains spurious tones and the SNR in the frequency band of tistage AS modulators, are comprised of multiple lower-order
interest is degraded. For this reason, the range between -3A AS modulators, such as the second-order AS modulator preand 3 A is said to be the input no-overload range of the AS sented above, cascaded to obtain the equivalent of a single
modulator. For the second condition to be satisfied, the power higher-order AS modulator [5], [19], [20].
of the AS modulator input sequence's white noise component
may be arbitrarily small, but if it is absent altogether, eq[n], is
V. AS FRACTIONAL-N PLL DYNAMICS
not guaranteed to be white. For instance, in the example
shown in Figure 9 the input sequence contains a white noise
A AS fractional-TV PLL linearized model is derived in this
component with 100dB less power than the signal component. section in the form of a block diagram that describes the outIf this tiny noise component were not present, the resulting AS put phase noise in terms of the component parameters and
modulator output PSD would contain numerous spurious noise sources in the PLL. As in the case of an integer-Af PLL
tones. Since the AS modulators used in AD fractional-iV PLLs the model provides an accurate tool with which to predict the
are all-digital devices, the noise must be added digitally. As total phase noise, bandwidth, and stability of the PLL.
shown in [12], it is sufficient to add a 1-bit, sub-LSB, independent, white noise dither sequence with zero mean at the A. Derivation of a AS fractional-N PLL L inearized Model
input node. In practice, a 1-bit pseudo-random dither seIn PLL analyses it is common to assume that each periodic
quence is typically used in place of a truly random dither se- signal within the PLL has the form v(t) = A(t) sin(wt + 0(t)\
quence. Such a sequence can be generated easily using a lin- where A(t) is a positive amplitude function, co is a constant
ear feedback shift register, and has the desired result with re- center frequency in radians/sec, and 6{f) is zero-mean phase
spect to the quantization noise despite not being truly random noise in radians. In most cases of interest for PLL analysis,
[13], [14].
the amplitude is well modeled as a constant value, and the
phase noise is very small relative to TC with a bandwidth that is
C. Other AS Modulator Options
much lower than the center frequency. Solving for the time of
To this point, the AS modulation concept has been illus- the w* positive-going zero crossing, yn, of v(t) gives yn = [n trated via the particular example AS modulator architecture 0(yn)/(2n)]'T, where T = litlco is the period of the signal.
shown in Figure 8, namely a second-order multi-bit AS modu- Therefore, the sequence, yn, is a sampled version of the phase
lator. While this type of AS modulator is widely used in AS noise with very little aliasing, so knowing the sequence and T
fractional-N PLLs, there exist other types of AS modulators is approximately equivalent to knowing the phase noise. This
that can be applied to AS fractional-N PLLs. Most of the approximation is made throughout the following analysis.
The relationship between the charge pump output current
other architectures are higher-order AS modulators that perform higher than second-order quantization noise shaping, and the PFD input signals is shown in Figure 10. Ideally, dur-

Figure 9: (a) A power spectral density plot of the A I modulator output in dB,
relative to the quantization step-size of A = 1, per Hz, (b) a time domain plot
of the AE modulator output, and (c) a time domain plot of the A I modulator
output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.

27

ing the w* reference period the charge pump output is a current pulse of amplitude / or - / and duration \tn - rn\, where /
and rn are the times of the charge pump output transitions triggered by the positive-going edges of the divider output and
reference signal, respectively. Therefore, the average current
sourced or sunk by the charge pump during the * reference
period is /(/ - rn)ITnf. In practice, the PFD is usually designed such that, except for a possible constant offset, this
result holds even though the current sources have finite rise
Figure 11: The A fractional-N PLL linearized model. Except for the shaded
and fall times [2].
region the model is identical to the corresponding integer-Af PLL model.
The first step in deriving the model is to develop an expression for / xn. Ideally, rn = nTKf, but phase noise intro- tions in (5) can be neglected, and the charge pump output can
duced by the reference source and PFD cause it to have the be modeled as a smoothly varying function of time with an
average value over each reference period equal to that of (5).
form
With these approximations, (5) implies that
T = nT
' "m(j)-Kcoum{t)-kycoUVc^t)-vml)dt~^l
lE
where 6reJ(j) and OPFDO) a r e the reference source and PFD
LO) = I

cp
phase noise functions, respectively. If the VCO output were
N+a
ideal its positive-going edges would be spaced at uniform in(6)
1
tervals of Tnf I (N + a), where a is the fractional part of the
0
^,v(') , ~AO , &pFD(t)
modulus (e.g., a = 51/492 in Figure 5). Therefore, ideally,
2n + 2n + 2n

"

* "fj[MO+*m>(O]

(3)

^v^i{N+m'

but in practice it deviates because of VCO phase noise,


OyccAi), divider phase noise, #</,v(*X and instantaneous deviations of the VCO control voltage from its ideal average value
of
v^, =(N+a)l(TrefKyco),
where Kvco is the VCO gain in
units of Hz/Volt. As a result,

<.-iyg<"+**j)-^-(v-<<>-v,)*-^]
T

2n e*.(O,
which reduces to

4 ^ ^

#/(*)
where

tn=nTnf +^\y\(yW-<*)

where ujj) is the result of discrete-time integrating and converting to continuous-time the quantity, y[n] a.
The AL fractional-N PLL linearized model follows directly
from (6) and Figure 10. It is shown in Figure 11, where in(t)
represents the noise contributed by the charge pump current
sources and the loop filter, and z//s) is the transfer function of
the loop filter. The model specifies the phase noise transfer
functions and loop dynamics of the PLL. For example, the
model implies that

N + alh

+ aJ ) ^ - ,

T(s)=

'

and ^ l l

l + T(s)

eYCO{s)

vc

y Y

_L_

l + T(s)

(7)

(8)

-Kco f ( v ^ C ) - ^ ) * - ^ ] (4)is the loop gain of the PLL.

For the loop filter shown in Figure 10, the transfer function is
1
l + sRCt
z (J,
(9)
"
C^+C2 s[\ +sRClC2 /(C, + C 2 )]'

ref

0*(O-

2n
Subtracting (3) from (4) yields an expression for the average
current sourced or sunk by the charge pump during the * B. Differences Between the AS Fractional-N and Integer-N
reference period:
PLL Models
I(tn-Tn)/Tre/ =
The shaded region in Figure 11 indicates the part of the
model that is specific to AZ fractional-TV PLLs; except for the
shaded region the model is identical to the corresponding
_ _
(5)
Il
(5)
model
for integer-iV PLLs. Therefore, each phase noise transN+a
fer function in an integer-iVTLL is identical to the corresponding phase noise transfer function in a AZfractional-.AfPLL,
except
every occurrence of TV in the former is replaced by N+a
g,C.) , M O , <WO
in
the
latter.
In most cases, N \ and a<\, so
N+a~N
2n
In ' In
and the corresponding transfer functions in integer-N and AE
As mentioned above, the phase noise terms are assumed to fractional-AT PLLs are nearly identical in practice. Similarly,
have bandwidths that are much smaller than the reference fre- the loop dynamics and stability issues are nearly the same in
quency. Consequently, the sampling of the phase noise func- ASfractional-TVPLLs and integer-TV PLLs.

SUtj-^-^fMo-v,,)*-^

28

^ ^ /

L|QI I pi Debtor H

pum

pM

Fiiter

margin, and AS modulator quantization noise suppression.


The process is demonstrated below for the AS fractional-N
PLL presented in Section III to generate the local oscillator
frequencies in a direct conversion Bluetooth wireless LAN
transceiver. The PLL is shown in Figure 12 with additional
detail regarding the frequency plan. As described previously,
the desired output frequencies are/pco = 2.402 GHz + k MHz
for k = 0, ..., 78, and the crystal reference frequency is 19.68
MHz. Each of the 79 possible output frequencies is chosen by
selecting m and N as indicated in the figure. In each case, the
divider modulus is restricted to the set of four integers {N- 1,
N,N+ 1, N + 2}. The combinations of m and N were chosen
to achieve the desired output frequencies yet keep the signals
at the input of the AS modulator sufficiently small so as not to
overload the AS modulator [11].
Typical requirements for such a PLL are that the loop
bandwidth must be greater than 40 kHz, the phase margin
must be greater than 60, and the PLL phase noise be less than
-120 dBc/Hz at offsets from the carrier of 3 MHz and above.
Assume that the VCO, divider, PFD, and charge pump circuits
have been designed such that the overall PLL phase noise
specification can be met provided the phase noise contributed
by the AS modulator and loop filter are each less than -130
dBc/Hz at offsets from the carrier of 3 MHz and above. Furthermore, assume that the VCO and charge pump circuits are
such that Kvco and / are 200 MHz/V and 200 uA, respectively,
and that the loop filter has the form shown in Figure 10. Thus,
the remaining design task is to choose the loop filter components such that the bandwidth, phase margin, and phase noise
specifications are met.
The PLL phase margin, bandwidth, and phase noise arising
from AS modulator quantization noise can be derived from the
linearized model equations, (7) through (10). While this can
be done directly, it involves the solution of third order equations which can be messy. Alternatively, approximate solutions of the equations can be derived that provide better intuition [21]. A particularly convenient set of approximate solutions are

= 2.402 GHz

r V T r

19.68 MHz
| N+y\n\

2nd-Order~|
I
m/492*<>-*> Digital A
*
T
1 Modulator | y\n\ = {-1, 0,1, 2}
{0,2~i7} pseudo-random bit sequence
Frequency Plan:
Toget* = 0, I , . . . , o r l 8 :
To get* = 19, 2 1 , . . . , or 38:
To get * = 39, 4 1 , . . . , or 57:
To get k = 58,60,..., or 79:

set N=
set N=
settf=
set N=

122, m = k-25 + 26
123, m = (k- 19)25 + 9
124, m = (*-39)-25 + 17
125, m = (k- 58)25

Figure 12: The example A I fractional-N PLL and frequency plan for generation of the Bluetooth wireless LAN RF channel frequencies.

The primary difference between the AS fractional-^ and


integer-TV PLL models is the signal path corresponding to the
AX modulator shown in the shaded region of Figure 11. The
sequence, y[n] - a, consists of AS modulator quantization
noise, e m [], which, as described previously, gives rise to
phase error in the PLL output. For the example second-order
AS modulator it follows from the results presented in Section
IV and the AS fractional-AT PLL model equations presented
above that the PLL phase noise component resulting from
em[n] has a PSD given by

l0.J-*J2.J*tf

_j_Au(W] dBc/Hz

(10)
The argument of the log function has the form of a highpass
function times a lowpass function, which is consistent with the
claim in Section III that the PLL lowpass filters the primarily
high frequency quantization noise from the AS modulator. It
follows from (10) that the phase noise resulting from em[n] can
be decreased by reducing the PLL bandwidth or increasing the
reference frequency. If a higher-order AS modulator is used,
an equation similar to (10) results except that the exponent of
the sinusoid is greater than two. This reduces the in-band portion of the quantization noise, but increases the out-of-band
portion, which, depending upon the loop parameters of the
PLL, can result in a somewhat lower overall phase noise.
However, the PLL loop filter is highly constrained to maintain
PLL stability, so the phase noise reduction that can be
achieved by increasing the order of the AS modulator is limited in most applications [16].

=tan

iH)'

_ IKycoR b-\
~ 2nN " b '

hw

(11)

.
(

XJBW

and

S0 (/)|

C. A System Design Example

l 0 .logf^-sin 2 Mf^T] dBc/Hz,

(14)
where PM is the phase margin of the PLL, fBw is the 3 dB
bandwidth of the PLL, and b = 1 + C2IC\ is a measure of the
separation between the two loop filter capacitors [22]. The
derivations assume that b is greater than about 10, and (14) is
valid for frequencies greater than (C2+Ci)/(2^RC2Ci).
These equations are sufficient to determine appropriate
loop filter component values. For example, suppose b is set to

The PLL bandwidth and the phase margin both depend


upon the loop gain, 7\s)9 which, for the loop filter shown in
Figure 10, depends upon the parameters fn/, N, /, KVco, ^> C\9
and C2. Usually,/*,/and //are dictated by the application, and
/ and Kyco are, at least partially, dictated by circuit design
choices. This leaves the loop filter components as the main
variables with which to set the desired PLL bandwidth, phase

29

tion delay depends upon the divider modulus and the number
of AI modulator output levels is greater than two, the effect is
-80
that of a hard non-linearity applied to the AI modulator quantization noise. This tends to fold out-of-band AI modulator
-100
quantization noise to low frequencies and introduce spurious
-120
tones,
which can significantly increase the PLL phase noise.
N
The problem is analogous to that of multi-bit digital-to-analog
-140
O
converter step-size mismatches in analog AI data converters
CQ
o
[23]. Unfortunately, circuit simulations are required to evalu-160
ate the severity of the problem on a case by case basis as both
-180
the extent of any modulus-dependent delays and their affect
on the PLL phase noise are difficult to predict using hand
-200
analysis.
-220
There are two well-known solutions to this problem. One
105
106u
10?
108
solution is to resynchronize the divider output to the nearest
Hz
VCO edge or at least a higher-frequency edge obtained from
FigureB: Simulated and calculated PSD plots of the phase noise arising from
within
the divider circuitry [22], [24]. The ^synchronization
A I modulator quantization noise for the example A I fractional-N PLL.
erases memory of modulus-dependent delays and noise intro49, so, as indicated by (11), the phase margin is approximately duced within the divider circuitry, but care must be taken to
70. Solving (14) with the phase noise set to -130 dBc/Hz a t / ensure that the signal used for resynchronization is itself free
= 3 MHz indicates t h a t y ^ ~ 50 kHz. Therefore, the phase of modulus dependent delays. The primary drawback of the
noise resulting from AI modulator quantization noise is suffi- approach is that it increases power consumption.
ciently suppressed with a 50 kHz bandwidth and a phase marThe other solution is to use a AI modulator with single-bit
gin of 70. With this information (12) can be solved to find R (i.e., two level) quantization. In this case, modulus-dependent
= 960 Q. with which (13) and the definition of b can be used to delays give rise to phase error at the output of the divider that
calculate C2 = 23 nF and C\ = 480 pF. It is straightforward to consists of a constant offset plus a scaled version of the AI
verify that the phase noise introduced by the loop filter resistor modulator quantization noise. Since, by design, the AI modu(the only noise source in the loop filter) is well below -130 lator quantization noise has most of its power outside the PLL
dBc/Hz at offsets from the carrier of 3 MHz and above as re- bandwidth, the modulus-dependent delays increase the phase
quired.
noise only slightly. Unfortunately, AI modulators with singleFigure 13 shows PSD plots of the phase noise arising from bit quantization tend not to perform as well as AI modulators
AI modulator quantization noise for the example PLL with the with multi-bit (i.e., more than two-level) quantization. For
loop filter component values derived above. The heavy curve example, if the 9-level quantizer in the 48 Msample/s AI
was calculated directly from the linearized model equations modulator example presented in Section IV were replaced by
(7) through (10). The light curve was obtained through a a one-bit quantizer, the dynamic range of the AI modulator in
behavioral computer simulation of the PLL. As is evident the zero to 500 kHz band would be reduced from 88.5 dB to
from the figure, the two curves agree very well which suggests approximately 65 dB. Moreover, unlike the 9-level quantizer
that the approximations made in obtaining the linearized case, the additive noise from the single-bit quantizer would
not be white and would be correlated with the input sequence.
model are reasonable.
An effect that does not have a counterpart in integer-Af Its variance would be input dependent and it would contain
PLLs is the presence of zeros in the PSD of the phase noise spurious tones.
arising from AI modulator quantization noise at multiples of
These problems can be mitigated by using a higher-order
the reference frequency. These zeros are a result of the dis- AI modulator architecture to more aggressively suppress the
crete-to-continuous-time conversion of the AD modulator in-band portion of the additive noise from the two-level quanquantization noise; each zero is a sampling image of the dc tizer. However, to maintain stability in a higher-order AI
zero imposed on the quantization noise by the AI modulator.
modulator with single-bit quantization, the useful input range
of the AI modulator input signal must be reduced and more
poles
and zeros must be introduced within the feedback loop
VI. AS FRACTIONAL-TV PLL SPECIFIC PROBLEMS
as compared to a multi-bit design with a comparable dynamic
One of the most significant problems specific to AI frac- range. Even then, the problem of spurious tones persists, and
tional-AT PLLs is that they can be sensitive to modulus- it is difficult to predict where they will appear except through
dependent divider delays. In practice, each positive-going extensive simulation. Furthermore, to compensate for the
divider edge is separated from the VCO edge that triggered it restricted input range of the AI modulator the reference freby a propagation delay. Ideally, this propagation delay is in- quency must be large enough that all of the desired PLL outdependent of the corresponding divider modulus, in which put frequencies can be achieved. This can severely limit decase it introduces a constant phase offset but does not other- sign flexibility. For example, if the magnitude of the AI
wise contribute to the phase noise. However, if the propaga- modulator input signal were limited to less than 0.5 in the case
-60

"Exact" simulation
Linearized Model

30

of the Bluetooth local oscillator application considered above,


the reference frequency would have to be greater than 79
MHz. Otherwise, it would not be possible to generate all the
Bluetooth channel frequencies.
Another issue specific to AI fractional-TV PLLs is that
modulus switching increases the average duration over which
the charge pump current sources are turned on each period
relative to integer-TV PLLs. For comparison, consider a AI
fractional-TV PLL and an integer-TV PLL with the same TV
(where TV a), the samey^/, and identical loop components.
It follows from (5) that

(15)
The last term in (15), which is caused by having the AI modulator switch the divider modulus, represents a significant increase in the time during which the charge pump current
sources are turned on each reference period. Consequently,
the phase noise arising just from charge pump current source
noise is larger in the AIfractional-TVPLL by
T Averagefractional-TVPLL charge pump "on time"!
L Average integer-TV PLL charge pump "on time" J
where A is a constant between 10 and 20. The value of A depends upon the autocorrelation of the charge pump current
source noise. For example, if the current source noise in successive charge pump pulses is completely uncorrelated, then A
is 10. Near the other extreme, A is close to 20.
VII. TECHNIQUES TO WIDEN AE FRACTIONAL-N
PLL LOOP BANDWIDTHS
A transmitter with virtually any modulation format can be
implemented using D/A conversion to generate analog baseband or IF signals and upconversion to generate the final RF
signal. However, many of the commonly used modulation
formats in wireless communication systems such as MSK and
FSK involve only frequency or phase modulation of a single
carrier [25]. In such cases, the transmitted signal can be generated by modulating a radio frequency (RF) VCO, thereby
eliminating the need for conventional upconversion stages and
much of the attendant analog filtering. At least two approaches have been successfully implemented in commercial
wireless transmitters to date. One is based on open-loop VCO
modulation, and the other is based on AI fractional-TV synthesis.
An example of a commercial transmitter that uses the
open-loop VCO modulation technique is presented in [26] and
[27], in this case for a DECT cordless telephone. Between
transmit bursts, the desired center frequency is set relative to a
reference frequency by enclosing the VCO within a conventional PLL. During each transmit burst the VCO is switched
out of the PLL and the desired frequency modulation is applied directly to its input. The primary limitation of the approach is that it tends to be highly sensitive to noise and interference from other circuits. For example, in [27], the required
level of isolation precluded the implementation of a single-

chip transmitter. Furthermore, the modulation index of the


transmitted signal depends upon the absolute tolerances of the
VCO components which are often difficult to control in lowcost VLSI technologies and can also drift rapidly over time.
In principle, AI fractional-TV PLLs can avoid these problems by modulating the VCO within the PLL. This can be
done by driving the input of the digital AS modulator with the
desired frequency modulation of the transmitted signal. The
primary limitation is that bandwidth of the PLL must be narrow enough that the quantization noise from the AI modulator
is sufficiently attenuated, but sufficiently high to allow for the
modulation. For instance, the phase noise PSD of the example
ADfractional-TVPLL shown in Figure 5 with a 50 kHz loop
bandwidth meets the necessary phase noise specifications
when used as a local oscillator in a conventional upconversion
stage within a Bluetooth wireless LAN transmitter. However,
if the Bluetooth transmitter is to be implemented by modulating the VCO through the digital AI modulator, then the loop
bandwidth of the PLL must be approximately 500 kHz. Unfortunately, when the loop bandwidth of the fractional-// PLL
shown in Figure 5 is widened to 500 kHz, the resulting phase
noise becomes too large to meet the Bluetooth transmit requirements.
Nevertheless, commercial transmitters with VCO modulation through A I fractional-TV synthesizers are beginning to be
deployed, especially in low-performance, low-cost wireless
systems such as Bluetooth wireless LANs [28]. Facilitating
this trend are various solutions that have been devised in recent years to allow for wideband VCO modulation in AI fractional-TV PLLs without incurring the phase noise penalty mentioned above. One of the solutions is to keep the loop bandwidth relatively low, but pre-emphasize (i.e., highpass filter)
the digital phase modulation signal prior to the digital AI
modulator [29]. Unfortunately, this approach requires the
highpass response of the digital pre-emphasis filter to be a
reasonably close match to the inverse of the closed-loop filtering imposed by the largely analog PLL. Another of the solutions is to use a high-order loop filter in the PLL with a sharp
lowpass response [30]. Increasing the order of the loop filter
increases the attenuation of out-of-band quantization noise
which allows for higher-order AI modulation to reduce inband quantization noise thereby allowing the loop bandwidth
to be increased without increasing the total phase noise.
However, as described in [30], this necessitates the use of a
Type 1 PLL which significantly complicates the design of the
phase detector. Yet another solution is to use a narrow loop
bandwidth but modulate the VCO both through the digital AI
modulator and through an auxiliary modulation port at the
VCO input [28]. The idea is to apply the low-frequency
modulation components at the AI modulator input and the
high frequency modulation components directly to the VCO.
Again, matching is an issue, but it has proven to be manageable at least for low-end applications such as Bluetooth transceivers.

VIII. CONCLUSION
The additional concepts and issues associated with AI

31

fractional-^ PLLs for frequency synthesis relative to integer-Af


PLLs have been presented. It has been shown that AI fractionak/V PLLs provide tuning resolution limited only by digital logic complexity, and, in contrast to integer-^ PLLs, increased tuning resolution does not come at the expense of reduced bandwidth. Since one of the main innovations in a AS
fractional-^ PLL is the use of a AE modulator to control the
divider modulus, the relevant concepts underlying AI modulation have been described in detail. A linearized model has
been derived from first principles and a design example has
been presented to illustrate how the model is used in practice.
Techniques for wideband digital modulation of the VCO
within a delta-sigma fractional-TV PLL have also been presented.

ory, vol. 38, no.3, pp.1015-1028, May 1992.


12.

I. Galton, "One-bit dithering in delta-sigma modulatorbased D/A conversion," Proc. of the IEEE International
Symposium on Circuits and Systems, 1993.

13.

S. W. Golomb, Shift Register Sequences. Laguna Hills,


CA: Aegean Park Press, 1982

14. E. J. McCluskey, Logic Design Principles. Englewood


Cliffs, NJ: Prentice-Hall, 1986.
15.

16. W. Rhee, B. S. Song, A. AH, "A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order AI
modulator," IEEE Journal of Solid-State Circuits, vol.
35, no. 10 , pp. 1453-1460, October 2000.

ACKNOWLEDGEMENTS

17. W. L. Lee, C. G. Sodini, "A topology for higher order


interpolative coders," Proceedings of the 1987 IEEE International Symposium on Circuits and Systems, vol. 2,
pp.459-462, May 1987.

The author is grateful to Sudhakar Pamarti, Eric Siragusa,


and Ashok Swaminathan for their helpful discussions and advice regarding this paper.

18. K. C.-H. Chao, S. Nadeem, W. L. Lee, C. G. Sodini, "A


higher order topology for interpolative modulators for
oversampling A/D converters," IEEE Transactions on
Circuits and Systems, vol. 37, no.3, p.309-318, March
1990.

REFERENCES
1.

P. M. Gardner, "Charge-pump phase-lock loops," IEEE


Transactions on Communications, vol. COM-28, pp.
1849-1858, November 1980.

2.

B. Razavi, Design of Analog CMOS Integrated Circuits,


McGraw Hill, 2001.

3.

Bluetooth Wireless LAN Specification, Version 1.0,


2000.

4.

U. L. Rohde, Microvave and Wireless Synthesizers Theory and Design, John Wiley & Sons, 1997.

5.

B. Miller, B. Conley, "A multiple modulator fractional


divider," Annual IEEE Symposium on Frequency Control, vol. 44, pp. 559-568, March 1990.

6.

B. Miller, B. Conley, "A multiple modulator fractional


divider," IEEE Transactions on Instrumentation and
Measurement, vol. 40, no. 3, pp. 578-583, June 1991.

7.

T. A. Riley, M. A. Copeland, T. A. Kwasniewski,


"Delta-sigma modulation in fractional-N frequency synthesis," IEEE Journal of Solid-State Circuits, vol. 28, no.
5, pp. 553-559, May, 1993.

8.

S. K. Tewksbury, R. W. Hallock, "Oversampled, linear


predictive and noise-shaping coders of order N >1,"
IEEE Transactions on Circuits and Systems, vol. CAS25, pp. 436-447, July 1978.

9.

G. Lainey, R. Saintlaurens, P. Serin, "Switched-capacitor


second-order noise-shaping coder," IEE Electronics Letters, vol. 19, pp. 149-150, February 1983.

10. I. Galton, "Granular quantization noise in a class of


delta-sigma modulators," IEEE Transactions on Information Theory, vol. 40, no. 3, pp. 848-859, May 1994.

S. K. Tewksbury, R. W. Hallock, "Oversampled, linear


predictive and noise-shaping coders of order N >1,"
IEEE Transactions on Circuits and Systems, vol. CAS25, pp. 436-447, July 1978.

19. Y. Matsuya, K. Uchimura, A. Iwata, T. Kobayashi, M.


Ishikawa, T. Yoshitome, "A 16-bit oversampling A-to-D
conversion technology using triple integration noise
shaping," IEEE Journal of Solid-State Circuits, vol. SC22, pp. 921-929, December 1987.
20.

K. Uchimura, T. Hayashi, T. Kimura, A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage
noise shaping modulators," IEEE Transactions on
Acoustics, Speech, and Signal Processing, vol. AASP36, pp. 1899-1905, December 1988.

21.

J. Craninckx, M. S. J. Steyaert, "A fully integrated


CMOS DCS-1800 frequency synthesizer," IEEE Journal
of Solid-State Circuits, vol. 33, pp. 2054=2065, December 1998.

22.

S. Pamarti, "Techniques for Wideband Fractional-Af


Phase-Locked Loops," PhD Dissertation, University of
California, San Diego, 2003.

23.

S. R. Norsworthy, R. Schreier, G. C. Temes, Eds. DeltaSigmaData Converters, Theory, Design, and Simulation,
New York: IEEE Press, 1997.

24.

L. Lin, L. Tee, P. R. Gray, "A 1.4 GHz differential lownoise CMOS frequency synthesizer using a wideband
PLL architecture", IEEE ISSCC Digest of Technical Papers, pp. 204-205, Feb. 2000.

25.

J. G. Proakis, Digital
McGraw Hill, 2000.

26.

S. Heinen, S. Beyer, J. Fenk, "A 3.0 V 2 GHz transmitter


IC for digital radio communication with integrated
VCO's," Digest of Technical Papers, IEEE International
Solid-State Circuits Conference, vol. 38, pp. 150-151,

11. N. He, F. Kuhlmann, A. Buzo, "Multiloop sigma-delta


quantization," IEEE Transactions on Information The-

32

Communications,

fourth ed.,

Feb. 1995.
27. S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V.
Thomas, S. Weber, S. Beyer, J. Fenk, E. Matshke, "A 2.7
V 2.5 GHz bipolar chipset for digital wireless communication," Digest of Technical Papers, IEEE International
Solid-State Circuits Conference, vol. 40, pp. 306-307,
Feb. 1997.
28. N. Filiol, et. al., "A 22 mW Bluetooth RF transceiver
with direct RF modulation and on-chip IF filtering," Digest of Technical Papers, IEEE International Solid-State
Circuits Conference, vol. 43, pp. 202-203, Feb. 2001.
29. M. H. Perrott, T. L. Tewksbury III, C. G. Sodini, "A 27Ian Galton received the Sc.B. degree from Brown
University in 1984, and the M.S. and Ph.D. degrees
from the California Institute of Technology in 1989
and 1992, respectively, all in electrical engineering.
Since 1996 he has been a professor of electrical
engineering at the University of California, San
Diego where he teaches and conducts research in the
field of mixed-signal integrated circuits and systems
for communications. Prior to 1996 he was with UC
Irvine, the NASA Jet Propulsion Laboratory, Acuson, and Mead Data Central. His research involves the invention, analysis,
and integrated circuit implementation of key communication system blocks
such as data converters, frequency synthesizers, and clock recovery systems.
The emphasis of his research is on the development of digital signal processing techniques to mitigate the effects of non-ideal analog circuit behavior with
the objective of generating enabling technology for highly integrated, lowcost, communication systems. In addition to his academic research, he regularly consults at several communications and semiconductor companies and
teaches portions of various industry-oriented short courses on the design of
data converters, PLLs, and wireless transceivers. He has served on a corporate Board of Directors and several corporate Technical Advisory Boards, and
his is the Editor-in-Chief of the IEEE Transactions on Circuits and Systems II:
Analog and Digital Signal Processing.

33

mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation," IEEE Journal of Solid-State Circuits, vol. 32, no. 12, pp. 20482059, Dec. 1997.
30. S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek,
B. McFarland, "An integrated 2.5GHz LA frequency
synthesizer with 5 ns settling and 2Mb/s closed loop
modulation," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 200201, Feb. 2000.

Designing Bang-Bang PLLs for Clock and Data


Recovery in Serial Data Transmission Systems
Richard C. Walker
coaxial delay lines for setting the timing of the recovered sampling
clock with respect to the data eye [1].

Abstract - Clock recovery using phase-locked loops (PLL)


with binary (bang-bang) or ternary-quantized phase detectors
has become increasingly common starting with the advent of
fully monolithic clock and data recovery (CDR) Circuits in the
late 1980's. Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to
multi-phase sampling structures, and operation at the highest
speed at which a process can make a working flip-flop. This
paper gives insight into the behavior of the nonlinear bangbang PLL loop dynamics, giving approximate equations for
loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters. A novel
analysis shows that the bang-bang loop output jitter grows as
the square-root of the input jitter as contrasted with the linear
dependence of the linear PLL.

Early monolithic CDR designs imitated these discrete block


diagrams. The propagation delay differences between data and
clock paths could be ignored as long as the gate delay skew was a
negligible fraction of the total bit time, or unit interval. The need
for higher link speeds grew faster than Moore's law, and as clock
frequencies approached the effective fT of the active devices, it
became increasingly difficult to maintain an optimum sampling
phase alignment between the recovered clock and the data over
process, temperature, data-rate, and voltage variations.
A second problem was that most linear phase detectors produced narrow pulses with widths proportional to the phase error
between the timing of the data and the clock [2], [3]. These narrow
pulses required a process speed in excess of that required to simply sample data at a given rate. The timing skew and speed of linear phase detector circuits then became the limiting factor for
aggressive designs.

I. INTRODUCTION
Prior to the advent of fully monolithic designs, clock recovery
was traditionally performed with some variant of the circuit in Fig.
1. The clock frequency component was typically extracted from
Variable Delay Block 4
nput NRZ Data

Both these difficulties are eliminated by a family of circuits


which simultaneously retime data and measure phase error by
using matched flip-flops to sample both the middle of each data bit
and the transitions between the data bits. Fig. 2 shows such an

Retiming Latch

D Q

Retimed Data

/\

d
dt

X2

pulse
conditioning

samples of master transitions-,

BPF
or
PLL

Recovered Clock

samples of all transitions


Input
Data

frequency
extraction

D Q
X
A

Y
jt

Fig. 1. Traditional non-monolithic clock and data recovery architecture.

t D Q

2
divide by 20

V
Y
D Q Retimed Data

the data stream using some combination of differentiation, rectification and filtering. The bandpass frequency filtering was provided by LC tank, surface acoustic wave (SAW) filter, dielectric
resonator or PLL. Because the clock recovery path was separate
from the data retiming path, it was difficult to maintain optimum
sampling phase alignment over process, temperature, data-rate,
and voltage variations. Even the PLL techniques had the drawback
of using phase detectors with different set-up times than the retiming flip-flop so that the recovered clock was not intrinsically
aligned to the optimum sampling point in the data eye. Circuits
utilizing SAW resonator filtering typically required hand matching
of SAW and circuit temperature coefficients along with custom cut

vco
loop
filter

Fig. 2. A simple bang-bang loop using a flip-flop for a phase detector


to lock onto a data stream with a guaranteed "0" to " 1 " transition
every 20 bits.
early gigabit-rate monolithic example of such a circuit [4] which
samples data with two matched flip-flops. Flip-flop "Y" samples
the middle of each data bit on the rising edge of the VCO clock to
produce retimed data, while flip-flop "X" samples the transition of
each bit using the falling edge of the VCO clock.
The loop is designed to use the 16B/20B line code of Fig. 3
which guarantees a " 0 1 " "master transition" every 20 bits. The
divide by 20 circuit and associated flip-flop in Fig. 2 discard every

R. Walker is with Agilent Laboratories, 3500 Deer Creek Road, MS


26-U4, Palo Alto CA 94304. (e-mail: rick_walker@labs.agilent.com).

34

0.8

training sequence
16 data

data

means: 16 data
means: 16 data
means: Training Sequence
means: Control Word

0.6

:
:

:
:

[38](8x)^j|
[SOpx) 7 :

0.4

g - { 3 5 J ;

T> 2 9 1

:
[361 (4x)
#

J30J(4x)
'
"

[ 1 9 l ( 1 O x >

f13K10x>

*[27]

0.2
0.1
0.08
0.06
0.04

*
[25J

:
fo-n
^

i (15](2X)-" / ? ^ ( 2 X ) H 3 3 ]
i
: 1261(8*) B . i [22] !- Q . . .
l J
j .PI]
j ["]
: [32]

lij*i"

Trck] * H 7 3 i

:
I o

0.02

Master Transition

Linear PLL
BBPLL

1121.
0.01

Fig. 3. Format of 16B/20B line code used with bang-bang CDR


of.Fig. 2.

1988

1990

1992

1994

1996

1998

2000

2002

year of publication

Fig. 4. CDR PLL designs over time. The ratio of link speed to
effective process transit frequency is plotted vs year of publication.
Multi-phase BB PLLs predominate as data rate approaches the process transit frequency limit. (The number of retiming phases used
in each design is given in parentheses.)

transition sample except for this master transition sample. During


link start-up a training sequence is sent that has only one rising
transition at the location of the master transition. Once the loop is
locked, arbitrary data is allowed to be sent at the other 18 bits of
the frame, while the transition sampler pays attention only to the
data stream in the vicinity of the master transition. If the VCO frequency is too high, the transition flip-flop starts sampling prior to
the master transition and outputs a "0" to the loop filter. A slightly
lower VCO frequency, on the other hand, will cause the loop to be
driven by l's.

II. FIRST-ORDER LOOP DYNAMICS


Unfortunately, transition-sampling flip-flop-based phase detectors can provide only binary (early/late) or ternary (early/late +
hold) phase information. This amounts to a hard non-linearity in
the loop structure, leading to an oscillatory steady-state and rendering the circuit unanalyzable with standard linear PLL theory.
Precise loop behavior can be simulated efficiently with time-step
simulators, but this is cumbersome to use for routine design. Fortunately, simple approximate closed-form expressions can be
derived for performance parameters of interest, such as loop jitter
generation, recovered clock spectrum, and jitter tracking performance as a function of various design parameters.

The loop drives the falling edge of the VCO into alignment
with the data transitions based on the binary-quantized phase
error. Because the clock-to-Q delay of the retiming flip-flop is
monolithically matched with the phase detector flip-flop, the PLL
aligns the recovered clock precisely in the middle of the data eye
with nofirst-ordertiming skew over process and temperature variations. Because the narrowest pulse is the output of a flip-flop,
such detectors operate at the full speed at which a process is capable of building a functioning flip-flop. This ensures that the phase
detector will not be the limiting factor in building the fastest possible retiming circuit.

VCO

An additional advantage of flip-flop-based phase detectors is


that since they only require simple processing of digital values,
they easily generalize to multi-phase sampling structures allowing
CDR operation at frequencies in which it would be impossible to
build a working full-speed flip-flop. In contrast, most linear phase
detectors require at least some analog processing at the full bit
rate, limiting process speed and poorly generalizing to multi-phase
sampling architectures.

'update

Fig. 5. A simple bang-bang loop using aflip-flopfor a phase detectoi


to lock onto square-wave input.

A simple BB PLL is shown in Fig. 5. A flip-flop is used as a


phase detector to lock onto a square wave input signal. Depending
on whether the VCO phase samples slightly before or after the rising edge of the input square wave, the flip-flop output is either low
or high, adjusting the VCO period in such a way as to move the
sampling phase error back towards zero. The dynamics of such a
binary-quantized loop are equivalent to a data-driven phase detector operating on alternating 0,1 data with 100% transition density,
or a master-transition based loop similar to that shown in Fig. 2.
For simplicity, we assume that a valid binary phase determination
can be made at every timestep. The consequence of random data

Because of these compelling advantages, the bang-bang loop


has become a common design choice for state-of-the CDR designs
which are pushing the capability of available IC processes. Fig. 4
surveys CDR designs presented from 1988 to 2001 at the International Solid State Circuits Conference. Designs are plotted by year
of presentation against each design's ratio of link speed to effective fp The majority of current designs utilize a combination of
multiphase sampling structures and bang-bang PLLs. In addition,
all CDRs operating at data rates greater than 0.4 fT are bang-bang
designs.

35

and the introduction of a ternary hold mode are considered in a


later section.

frequency detector, these non-uniform sampling times must be


accounted for.

Thefirst-orderBB PLL of Fig. 5 can be rendered into a block


diagram for analysis as shown in Fig. 6. The loop phase error

With the uniform time step approximation, the VCO phase


changes
up
or
down
(or
"walks
off') by

ev

w
i
s

I
Q

Kv

fvco-fnom

fin = fnom V

^ a n s during each update period.

In summary, the first order loop obeys a simple set of discrete


time difference equations:

1
s

tn

ra(

bb ~ ^(fbb^fnom)

ee{l}

</(') = e</(0) + 28// B + <)>(')

+ z

fbb
%(tn+l)

Fig. 6. Block diagram offirstorder loop showing definition of signal


names.

Wn)

r,Qbt

(2)

en = sign[e^ M )-e v (^)]

0)

&e(tn) , is defined as the difference between the data phase


Bd(tn)

and the VCO phase 6v(tn)

As long as the VCO frequency step brackets the input signal


frequency error, the loop will remain phase locked. Assuming

at the nth sampling time

<|>(0 small, the lock range is: - / ^ < 8 / < fbb.

tn. For convenience, phase is measured with respect to an ideal


clock source running at fnom

The loop gen-

erates an excess hunting jitter with a peak-to-peak value of two

bang-bang phase steps Jpp=

4%(fbb/fnom).

The frequency of the incoming data signal differs from the


For the loop to be locked, the average VCO frequency must
equal the average data frequency. The phase detector duty cycle

VCO center frequency by 8 / , and has a zero mean phase jitter of


<K0 In other words, the data can be considered to have been

C, must satisfy the relation

generated by a pattern generator clocked on the rising edges of the


jittered clock signal sm[2n(fnom

phase Qd(tn) is then 2ndftn

+ 8 / > + ty(t)]. The data

/= C(fbb) +

+ Wn)-

(l-C)(-fbb).

The value of C is then given by

c = (l+

The phase detector binary-quantizes the loop phase error at

df

each sampling time to give w = s i g n [ 9 e ( / n ) ] . (Note: In the


The phase detector duty cycle, and therefore its average output
voltage are proportional to the loop frequency error. Fig. 7 shows a
simulated loop with a range of input frequencies. The loop is

case of a ternary data-driven phase detector, Zn may be set to 0


when it is not possible to make a determination of phase error due
to consecutive identical bits in the data stream. The consequence
of this "hold" state is treated in a later section). The error signal
drives the VCO through an attenuator p , to produce a change in
frequency of fbb P^vco' F r o m t * me *n unt** t * m e *n + 1 '
the VCO operates at one of the two frequencies given by
J nom

fvGO

249O.0

/-/

|
2484.C
200.0

nJ bb'

/,
/..-/

Fin
out of lock

In lock

out of lock

V:

Because the VCO frequency changes on each cycle, the system


has non-uniform sampling times. The time of phase sample

e8
-200.0

' + 1

*n + l/(fnom

Jb^

on the order of 0.1% of fnom


form time steps of t

In a ^ ^

CDR

> fbb

is

5.0

, so that an analysis assuming uni-

^ate = 1 /fnom

10.0

15.0

time (jiseconds)

Fig. 7. Simulated response offirst-orderPLL to a range of input frequencies.

is sufficiently accurate

for most purposes. However, for loop analyses requiring exact


charge pump balance, such as wide-range loop pull-in without a

"locked" whenever the input frequency is bracketed by the two


VCO frequencies. The rapid alternation between frequencies

36

slightly too high and slightly too low creates a bounded hunting
jitter (Jpp).

Proportional (BB) branch

The derivative of the input data phase deviation,


d[$(t)]/dt,
adds to the frequency error that must be tolerated by the loop.
Assuming 8 / = 0 , then for <|)(f) = Asin(2Kfmodt),
the

VCO

v$

Xs

>

maximum amplitude A of phase modulation at frequency fmod


before onset of slew-rate limiting is \f^A/

mod

Integral branch

Fig- 8. demon-

Fig. 9. Second-order bang-bang loop schematic.


2490.0

tered on the average incoming data frequency. If certain assumptions are met, as described later, we can consider the system to be
composed of two non-interacting loops. These are the loops
labeled "bang-bang branch" and "integral branch" If the center
frequency control loop is slow enough, the resulting loop behavior
will be very similar to a simple first-order loop, but with an
extended frequency lock range.

vco
fin
2486.CH

Mb

0.0

v.

-200.0

ee

0.0

-100.0

5.0

A. Stability Factor

6.0
7.0
time (^seconds)

To preserve the desirable qualities of the first order loop, it is


critical that the phase change due to the proportional branch dominate over the phase change from the integral branch.

8.0

Fig. 8. Simulated response of first-order PLL to sinusoidal input jitter


just slightly beyond the tracking capability of the loop.

The loop phase change in one update time due to the proportional connection is AQbb = (5 VK tupdate.

strates the loop at the onset of jitter-induced slew-rate limiting.


Although the average input frequency lies within the lock range of
the loop, the added sinusoidal jitter causes the instantaneous input

due to the integral branch is A 9 / r t / = V^JL f

The phase change


upda%e/{!%)

. The

ratio of these two is the stability factor of the loop

frequency deviation to exceed i / ^ The loop stops toggling and


goes into slew rate limiting, leading to a transient phase error.

qt __

^proportional

__

^^integral

A. Summary of First-Order Loop

2pT

'update

The reader should be careful not to confuse the bang-bang loop


The first-order bang-bang loop has only one degree of freedom.

stability factor t, with the linear loop damping factor [5].

Jitter generation, lock range, and jitter tolerance are all inconveniently controlled by one parameter, fbb.

This situation can be

The discrete time difference equations for the second-order


loop can be written as

improved by using a second control loop to dynamically adjust the


nominal VCO frequency fnom

to be equal to the incoming data

Qd(tn) = Qd(0) + 2n8ftn + Wn)

frequency. Because the phase detector duty cycle is proportional to


the loop frequency error, this dynamic centering of VCO frequency can be accomplished by adjusting the VCO center frequency in a feedback loop to drive the phase detector duty cycle
C to 50%. This decouples the lock range from jitter tolerance and
jitter generation, giving more design freedom.

8A+i> = W + ^

2" 1

+^ + ^ J

en = sign[e rf (/ n )-9 v (g]

III. SECOND-ORDER LOOP DYNAMICS

(4)

(3)

From this, it can be seen that the second-order loop has two
degrees of freedom, the loop phase step Qbb (or equivalently, the

To extend the loop tracking range independent of the jitter generation, an extra integrator is added between the phase detector
and the VCO as in Fig. 9. Since the first-order loop dynamic produces a phase detector duty cycle proportional to the loop frequency error, this added integrator can be viewed as an automatic
means for keeping the first-order portion of the loop properly cen-

loop frequency step fbb

) and the stability factor . The added

loop integrator extends the frequency tracking range, leaving


6 ^ free to control jitter tolerance and jitter generation.

37

B. Simulations of Second-Order Loop

ited to ^fbb,

Fig. 10 shows two block diagrams for the second-order loop.


The upper diagram is a straightforward translation of the schematic in Fig. 9. The lower diagram is a topological re-arrangement

sient at the sampling flip-flop.

then there is no jitter accumulation or phase tran-

2480 0 1
400.0

!_

jpTH

tT

Si

i" H

, v rElh

II

-!!^z)J7L^

?
,

,
*

T
,

*"i^

!
_

fi

| ? ! I j ye r^*\

4.0

KV =

.- LI* ..-' J

0.0

2.0

W 6

tint
f

i i

^^

I I ! ?

5.0
6.0
time (^seconds)

7.0

Fig. 12. Second-order loop response to instantaneous frequency

\K '' ''

step larger than fbb

Af
A6i A92 0 e
tp
V^
Fig. 10. Two equivalent second-order bang-bang loop block diagrams.
The proportional phase-control signal flow is highlighted with a
dashed line, and the integral frequency-control loop with a solid line.

Fig. 12 is a simulation in which the input frequency step is bigger than fbb,

so the loop goes into slew rate limiting, leading to a

transient phase error Qe at the sampler.


which places an inner first-order phase tracking loop inside an
outer frequency tracking loop. If one writes the transfer function
from the output of the non-linear quantizer block back to the input
of the quantizer, it can be shown that both diagrams are exactly
equivalent. Some of the signals in the second diagram do not correspond to actual physical variables in the circuit, but they are
helpful in understanding the operation of the loop.

C. Response to Phase Step


For a normalized transient phase step of A = G ^ p / Q ^ , a
first-order loop relocks in A update times. The total time for
relocking is then ^step/(2nfbb)

During the relocking transient of the second-order loop, the


loop integrator overshoots the correct steady-state VCO tune voltage. This causes a quadratic overshoot in the phase trajectory.

2.0

00

i
I

.
I

.
I

.
I

.
I

.
I

.
I

.
I

.
I

.
I

i
I

.
I

.
I

Fig. 13 shows the second-order phase step response with ^ as


a parameter. Up to the first zero crossing, the phase trajectory is
given by

1
I

IHfMffllfillil
i

4.0

5.0
6.0
time (^seconds)

7.0

bb

S /

Fig. 11. Second-order loop response to instantaneous frequency

with n = t/tupdate>

step smaller than fbb

approaches A as t, > o 9 consistent with a first-order loop. In


general, the second-order loop is quicker to reach zero phase error
than the first-order loop, but pays for this with an oscillatory overshoot. As a conservative rule of thumb, the magnitude of the oscillatory transient of a second-order step response can be considered
bounded by the simple linear transient of the first-order loop. The
time required to reach steady state, given a step of A is always
less than or equal to A timesteps, independent of .

Fig. 11 shows the second-order loop responding to a step


change in input frequency fin,

producing a slow response

in the outer integral loop. The resulting phase error A 0 j


tracked by the inner bang-bang loop 0 V

fint
is

to produce the final

sampler phase error Qe . Notice that, unlike linear PLLs, if the


power-supply noise-induced VCO frequency modulation is lim-

38

The time of the first zero crossing

200.0

5OO
400

I\ -S

^infinite

300
200

-200.0
100.0

g2d00

100

CO

CD

-10O

-200

-200
-300

-400

'\*z

-500
0

100

A91
AG2
9

0.0.

-100.0
2.0 I

S*20.

ot.tr
v

A0 2 V6-

0.0

200

300

400

500

600

700

time / typdafe

Fig. 13. Noise-free loop response to a phase step with stability

-2.04.0
1

5.0
6.0
time (jiseconds)

7.0

Fig. 15. Second-order loop response to large sinusoidal input jitter.

factor ^ as a parameter.

load so that a loop can be easily designed to never slew for signals
meeting a typicalfrequency-domainjitter tolerance specification.

IV. S L O P E O V E R L O A D

A. Delta-Sigma Analogy

Many systems, such as SONET, specify jitter tolerance in the


form of a sinusoidal jitter at various frequencies.
100.0

e i.
%.

0.0'
5

-50.0
2.0

SB

4(t)'

-AegS

-100.0
50.0

TJ

Before developing an analytic equation for slope overload, it is


helpful to introduce a further rearrangement of block diagram II
from Fig. 10. Fig. 16 transforms the loop by pulling two integra-

%.

s**(t)

-A92

fin

CE

fbb

[L

2fbb

0.0
%

-2.0

4.0

6.0

5.0

7.0

tn

first order AE on Af

Fig. 16. Redrawing of the loop to show inner AX inner modulator


operating on the loop frequency error.

time (n seconds)
Fig. 14. Second-order loop response to sinusoidal input jitter.

tors through the last summing node prior to the quantizer. The
update time interval is set to 1. The definition for bang-bang frequency
step
f^
= $KVV,
and
stability
factor

Fig. 14 shows the loop response with a sinusoidal input phase


jitter <|>(f) . The outer integral loop tracks the input jitter at AGj

^ = 2pT/ u p ( j a t e are also substituted in.

with a slight phase lag. The resulting phase error A 9 2 is tracked


The shaded area in Fig. 16 shows how the proportional feedback loop can be thought of as an inner AX modulator producing
a phase detector duty cycle proportional to the VCO frequency
error [6],[7].

by the inner bang-bang loop 6 y to produce the final sampler


phase error 0^ . The duty-cycle of the PD output F . varies with
the slope of A 9 2 which is proportional to the instantaneous fre-

Fig. 17 summarizes an analysis of the first order delta-sigma


(after [8]). When the loop is not in slew rate limiting, or in a periodic limit-cycle, the quantizer (e.g., PD) can be replaced with a

quency error of the outer loop.


In Fig. 15, the phase modulation is increased until the instantaneous frequency error exceeds the inner loop's ability to track.
Slew-rate limiting produces a tracking error at the sampler Qe . A

unity gain element and a noise source Q(z)

Asin(2ntft/tupda(e)/(2nft/tupdate)

with the same

noise characteris-

tics as a random binary bitstream. Both these constraints are met


in practice as the VCO phase noise is sufficient to eliminate any
deterministic limit cycles, and the loop is designed to never slew
rate limit on any conforming input signal. This insight is critical as

CDR would normally be designed such that slewing would never


occur for any valid signal allowed by a particular standard. The
next two sections develop an analytic expression for slope over-

39

maximum normalized input phase as a function of normalized frequency


max.

O7

H(z)

X(z)

Y(z)

Q(z)

^-TrmXU)+rniu)Q^
c

O)

CD

s+s 2^^/(

2 "\

')

This is a curious bootstrapped analysis, in that it assumes a lack


of slewing to justify the linearization which permits the computation of the onset of slew rate limiting.

// 2

-V-(( i)

(integration)

(s)

Fig. 18 shows a good agreement between this expression and


simulated loop performance in which slewing is defined as a contiguous sequence of ten or more identical phase-error indications.
This expression can be used to design a loop for a given jitter tol100G

freq

freq

6-0.1;

1G

Fig. 17. Simplified analysis of delta-sigma circuit.

fctQ.

10M

it allows linear analysis to be applied whenever the bang-bang


loop is not in slew rate limiting.

100k

0.1

in

10H

1m

100|A

10m

0.1

10

jitter frequency * t u p d a te

Fig. 18. Normalized amplitude of sinusoidal jitter just sufficient to


cause slope overload as a function of normalized jitter frequency
and with ^ as a parameter.

A closed-form analysis of slope overload can now be derived.


Referring to Fig. 16, the system slews when

|AF| >

f^-

Assuming no slew rate limiting, we can use the results from the
AZ analysis to justify replacing the loop quantizer with a unity
gain element. The maximum input phase jitter in UI as a function
(s) , normalized to 8 ^

can then be calcu-

erance. The tolerance plots are single-pole slope for high ^ and
high jitter frequency, becoming double-pole at lower frequencies
and small i;. At high frequencies, all of the curves become
asymptotic to the single-pole tolerance of a first-order bang-bang
PLL. The operating region below each of these curves is where the
AE approximation is valid, and where a linear loop analysis is
justified.

lated using Laplace transforms.


We want to find an input excitation F(s),

for which

at all frequencies. The inner AZ of Fig. 16 has a

linearized transfer function of \/{s

+f

b b

source
phase
noise

) . Using standard

feedback loop theory, the expression for AF can then be written


as

AF =

F\

Setting A F = fbb,
and tUpciate

P + JT

Kv
S

output

VCO open loop phase noise

Fig. 19. Loop redrawn replacing phase detector with unity gain
element and additive quantization noise.

1 ^

and normalizing the equation by letting


= 1 , we can solve for F(s)/s

L
i

m.

1+ ffbbV

BB phase noise
of form: Asin(x)/x

U* JU+/J

fbb

W.999

10

B. Expression for Slope Overload

|AF| = fbb

points shown
are from numerical
simulation

"e-iod

1k

With the AE substitution, the inner loop becomes a wide-band


unity-gain block as seen from the viewpoint of the outer integral
frequency control loop. The noise in the delta-sigma core is firstorder frequency shaped towards high frequencies. However, when
the frequency noise is converted to phase noise, the shaping is lost
and the noise becomes flat.

of frequency, O

(s2 + s + ?)/(s3 + s2)

S=3

to get the

40

V. JITTER GENERATION

VI. GAUSSIAN INPUT NOISE

With these insights, it is possible to accurately predict the loop


jitter generation in the frequency domain. Fig. 19 is a redrawing of
the loop replacing the phase detector by a unity gain element, and
an additive noise source. The forward loop gain is

Fig. 21 is a plot of output jitter vs input jitter with as a


1OM j

100k

From this can be calculated two transfer functions: the lowpass


seen by both the source phase noise and the PD noise to the output, A (5)= 1 / [ 1 + H(s)], and the high-pass transfer function
from
VCO
phase
noise
to
the
output,
B(s)= H(s)/[l
+ H(s)]. As shown in Fig. 20, with a source
phase noise P(s), a PD phase noise Q(s), and a VCO phase noise
R(s), the total loop jitter generation spectrum becomes the RMS
combination of each of the three weighted terms
J(PA)2

J(s)=

-80 1

" 90

1777(7) ...
JL
IV^^

TTJHT)

.--{
!
.

0.1

approximated
J

I
10k

_J
100k

s***^..:

I
1M

!
10M

"
100M
1G

Fig. 20. Example computation of loop jitter generation spectrum

with parametersfrom[11].
generally taken to be the spectrum of the clock driving the data
source or BERT, or in the case of a clock multiplying circuit, the
spectrum of the reference clock corrected by 20 times the log of
the loop frequency multiplication ratio.

max

W -

RMS

RMS
=

atan

jitter

in

(^M^)

/TC

unit

intervals

is

'
1k

10k

100k

1M

by
+ J

three

regions

of

+ J

InRe

the out

ls a

PP r o x i m a t e l Y

ion J

operation:

-In Region III, the output RMS jitter

ec ual t 0

0-7 * J<5j . This surprising

result says that loops with large ^ have output jitter which grows
as the square root of the input jitter. Contrast this with a linear PLL
which simply low-pass filters the input jitter and thus has an output jitter which grows linearly with the input jitter.
An approximate analysis of loop jitter can shed light on this curious square-root dependence of output jitter on input jitter. Assume

0
The

^walk

RMS = J

'
100

JUn * 2 a . / ( 1 + 7 | )

The phase noise power is given by


S

*
10

total " idle


linear
walk
g
>
P u t J itter
is independent of input jitter G .. This occurs when the self-generated hunting jitter exceeds the input jitter. The RMS jitter in this
region is empirically determined to be well approximated by
^idle ~ ^ + (1.65/2;) . In Region II, the output jitter is proportional to the input jitter. This occurs when the input jitter is so
high that, for a given , the bang-bang dynamic is unable to control the second-order portion of the loop. This leads to large quadratic trajectories in the phase domain, causing the loop phase to
"hunt" towards the limits of the input jitter distribution. As the
loop phase nears the limits of the input jitter distribution, the bangbang hunting has more effect on stabilizing the second-order loop.
In this region, the output jitter is proportional to the input jitter:

^^-r- measured phase noise ..J-.-.r?*^'


}
*
"J^toaaii,

-140 I
1k

*
1

^^r

0 , the loop phase step size. The total loop output jitter can be

-130 source phase noise TTfoiiaiMV-..


I..>K^...I
-140 I
! ^~in |
Tiihi ii MrtllH

"O -120
-130

parameter. For convenience, all jitter sigmas are normalized to

I120 11'" 1""_" 1 ? ii^jii^? 1 *^:!: 111111 11 iTTr^r^ =*> r>n&se noiso_35^

iS-110 ....A^.:...^w?!...i.i^[.;;

Fig. 21. Normalized output jitter vs input jitter sigma with ^ as a


parameter. Simulation is for a non-tristated loop, with square wave
data input, 10 timesteps per point, and ignoring phase wrapping.

."--.
1

.
,
.
} computed phase noise
.1
1...

'\^"y5\'^' j ^ ^ ^ " ^ ^ ^ ^ ^ ^ ^ $-1*3

0-1

TilEnmttiLy.J.y l-^^^^v.^ii^*' vco phase noise

-80 | ^
.
M
~E -90 - ^ y g ^ ^ i ^ - - - -{
75 - 1 0 0 ...^'yiWHUmiWiHL^.,. 1

1 +**^%&rmtrMiu i

+ (QA)2 + (RB)2 . The source phase noise is

-100 ^fff^Z.A
.12o L
1

then

It should be noted that the linearized loop model is only suitable for computation of the jitter spectrum but not for computing
the actual sampling point phase error or other time-domain transient response. The linearized response only covers the dynamics
of the outerfrequencytracking loop, but does not capture the extra

a zero-mean input jitter distribution with a sigma G -. Using a linearized approximation to the standard probability distribution
function, the probability of getting an "early" phase error indication for small loop phase deviation A 6 ,

tracking of the internal nonlinear A S core.

41

is approximately

VII. DATA-DRIVEN PHASE DETECTORS

1 A9
The expected phase change in the loop after one update time is

((i-")~"')-^e

Unless the data contains a guaranteed periodic transition, the


CDR will be required to lock onto random transitions embedded in
the data stream. The effects of runlength and transition density on
loop performance must then be considered. The effect of these two
data attributes is dependent on the type of phase detector used.
Most modern codes use some variation of Alexander's phase
detector [9] shown in Fig. 22.Two matched flip-flops form the

The discrete time equation for the average evolution of loop phase
under the condition of a small input phase error can then be
expressed as

Retimed Data

26 ^

a time constant of T = GJ2K/(2Q^^)

50% duty-cycle
clock from VCO

tune

given by OBLW = vppJtbi/&x)


' E x t e n d i n g t h i s analogy to
the loop, we can consider the output of the phase detector as a
50% duty-cycle random NRZ data stream. Given that the output
from each "bit" must cause a loop phase change of Qbb , we can
to satisfy our loop difference

We can then compute the loop jitter by

pump

Transition Samples

Fig. 22. Modified form of Alexander's ternary-quantized phase


detector for NRZ data along with a typical charge pump for driving
the VCO tuning input.

This "lowpass" loop characteristic is being driven by random


energy from the early/late phase detector output. A related problem is the computation of the baseline wander voltage generated
by passing a random NRZ data stream through a coupling capacitor. It can be shown that the sigma on the capacitor voltage is

equation must be JlTlO,.

DOWN

Input
Data

This equation has the same form as a discrete time approximation


to the capacitor voltage in an RC lowpass filter. By analogy, when
time is expressed in units of loop update times, any transient phase
error in the bang-bang loop can then be said to decay to zero with

compute that the effective V

UP

'pump

front-end of Alexander's phase detector, with the first flip-flop


driven on therisingedge of the 50% duty-cycle clock, and the second flip-flop driven on the falling edge of the same clock. (Using
a fully-differential monolithic ring-oscillator, it is possible to
achieve a very precise 50% duty-cycle clock source). When the
loop is locked, the rising-edge retiming flip-flop samples the center of each data bit and produces a retimed data bit at (A) and the
following retimed bit at (B). The falling-edge flip-flop functions as
a phase detector by sampling the transition (T) between the data
bits (A,B). To improve the circuit's operating speed, the (T) sample is delayed an extra half bit time by a latch so that the logic on
(A,T,B) has a full bit time for resolution.

using the analogous baseline wander expression with the effective


loop V

and t . The result is

/eTToJ^n

The transition sample is then compared to the surrounding data


bits to determine whether the clock sampling phase is early or late
to derive a binary-quantized (bang-bang) or ternary phase error
indication. A truth-table for the logic in Fig. 22 is given in Table 1.

TABLE 1. Truth table for logic in Fig. 22.


which is consistent with empirical analysis of simulation results.
One further insight into this behavior is offered. The secondorder loop drives the phase detector output to a steady-state 50%
duty-cycle. In this condition, the loop phase splits the input jitter
distribution into equal early and late halves. This means that the
bang-bang loop phase is servoed to the median of the input jitter
distribution rather than to the mean as would be the case with a
linear loop. Because of this, the bang-bang loop makes a constant
modest correction in response to large jitter outliers, rather than
the proportionally large overcompensation of a linear loop. This
insight supports the idea that the bang-bang loop jitter should only
be sub-linearly affected by the magnitude of the input jitter.

42

State

UP

DOWN

Meaning

hold

early

hold

3
4

0
1

1
1

1
0

late
late

5
6
7

1
1

0
1

0
1

0
1
1

hold
early
hold

The states 2 and 5 in Table 1 correspond to the normally impossible condition of sampling a " 1 " midway between two "0" bits. A
custom truth table can use these states to detect either a high biterror-rate condition [10], a VCO running grossly too slowly (eg:
lump these states into the "late" condition), or taken as an indication that a link has locked onto its own VCO crosstalk, perhaps by
amplification of power supply noise by pick up from a high-gain
optical transimpedance amplifier [11].

when Aj > AQ , for this implies exponential growth of the acquisition transient. The convergence is guaranteed

whenever

$>2X.
Although usable for tightly constrained block codes such as 8b/
10B, binary phase detectors are essentially unusable for codes
such as 10Gb Ethernet 64b/66b or SONET which can have very
long runlengths of up to 66 or 80 bits, respectively.

Since the mid-bit samples (A,B) straddle the (T) transition


sample, it is also possible to detect the lack of a transition. This
condition corresponds to states 0 and 7 in Table 1. This information can be used to create an extra ternary hold-state in the PD output, causing the charge pump to hold its value during long runlengths. Both binary and ternary PDs will be discussed in turn,
along with their implications on loop performance.

B. Ternary Phase-Detector
The 3-state, or ternary phase detector provides superior jitter
performance for data with long runs [12]. Ternary PDs neither
charge nor discharge the loop filter during long runs causing the
loop to hold the current estimate of the data frequency. Such loops
effectively "stop time" during long runs.

A. Run-length and Latency


If the charge pump does not have a hold-mode, it is possible to
emulate a ternary loop, with some loss of performance, by continuously toggling the phase-detector output to approximately maintain the current charge pump voltage during long runs.

Binary phase detectors have no hold state, so the PD continues


to put out the last valid phase error indication during long data
runlengths. In this situation, the loop idling jitter will be multiplied from the expected value by the maximum runlength of the
data. For example, an 8B/10B code has a maximum code runlength of 5 and will have a peak jitter walk-off five times the value
of that computed for a "10" repetitive data pattern. The average
RMS jitter will be a function of the runlength distributions of each
particular code. There is also a trade-off in effective stability factor
as a repetitive pattern such as "11110000" will be equivalent to a
loop with an effective update time 4 times larger than the expected

The peak idling jitter for ternary loops is unchanged from the
simple 100% transition density analysis. The RMS jitter will be
reduced by the average transition density. Because the loop phase
cannot change during hold mode, the jitter tolerance will be derated by the average transition density. This can easily be taken into
account by increasing 8 ^ appropriately for the characteristics of
the code to be used.

S i n c e t h e stabilitv factor is
update = l/fnominversely
dependent on update time, it is possible for binary PDs to become
unstable with data patterns containing very long runs due to the
delay in timely phase-error feedback.

C. VCO Tuning Bandwidth


The previous analyses all assumed an infinite VCO tuning
bandwidth for the proportional tuning input. A VCO time-constant

Slope(t=0) = S

tvco , can slightly reduce hunting jitter if it is small compared to


*1

X% - SX

the loop update time.

*i

Timeconstants larger than the loop update time prevent the


loop from reversing phase slope within an update period and
lengthen the loop limit cycle. If the extra pole is thought of as an

Fig. 23. Setup for computing onset of loop instability with latency X.

extra latency 2 T y c o , then the result of the previous section can be

used to give an approximate bound on loop stability. To avoid


divergence: ^vco<ktupdate^^'

Comparison with simulation

Fig. 23 shows the loop phase trajectory during an acquisition


transient. At t=0, the loop crosses zero phase error with

verifies this equation as a conservative limit on Xyco .

d/dt
= S. From this we can compute an overshoot AQ.
When the loop phase again crosses zero phase error, the phase
detector is late in responding by a time X. This time is a combination of runlength, latency in the phase detector logic, and highorder poles in the VCO tuning characteristic.

However, it cannot be recommended to flirt with this boundary.


Unless one meticulously checks performance by numerical simulation, it is safest to design the VCO to essentially respond fully in
one update time. This is usually very easy to achieve in ring-oscillators and possible with some care using low-Q LC VCOs.
VIII. CONCLUSION

Due to the loop latency X, the loop overshoots zero phase by

Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sam-

X /t,-SX
before the "braking" effect of the proportional
branch starts to act. The onset of catastrophic instability occurs

43

pling structures, and operation at the highest speed at which a


process can make a working flip-flop. Approximate equations for
loop jitter, recovered clock spectrum, and jitter tracking performance as a function of various design parameters have been
derived. The median-tracking property of the bang-bang loop
resulting in an output jitter equal to the square root of the input jitter has been presented.

[11]

R. C. Walker, C. Stout and C. Yen, "A 2.488Gb/s SiBipolar Clock and Data Recovery IC with Robust Loss of
Signal Detection," in ISSCC Digest of Technical Papers
pp. 246-247,466, Feb. 1997.

[12]

N. Ishihara and Y. Akazawa, "A Monolithic 156 Mb/s


Clock and Data Recovery PLL Circuit Using the Sampleand-Hold Technique," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, pp. 1566-1571, Dec. 1994.

[13]

D. Chen, and M. O. Baker, "A 1.25 Gb/s, 460mW


CMOS Transceiver for Serial Data Communication," in
ISSCC Digest of Technical Papers, pp. 242- 243,465 Feb.
1997.

[14]

L. DeVito, J. Newton, R. Goughwell, J. Bulzacchelli


and F.Benkley, "A 52MHz and 155 MHz Clock-Recovery
PLL," in ISSCC Digest of Technical Papers, pp. 142143,306, Feb. 1991.

[15]

J. F. Ewen, A. X. Widmer, M. Soyuer, K. R. Wrenner,


B. Parker and H. A. Ainspan, "Single-Chip 1062Mbaud
CMOS Transceiver for Serial Data Communication," in
ISSCC Digest of Technical Papers, pp. 32-33,336, Feb.
1995.

[16]

A. Fiedler, R. Mactaggart, J. Welch and S. Krishnan, "A


1.0625Gbps Transceiver with 2x-Oversampling and
Transmit Signal Pre-Emphasis," in ISSCC Digest of Technical Papers, pp. 238-239,464, Feb. 1997.

[17]

B. Guo, A. Hsu, Y. Wang and J. Kubinec, "125Mb/s


CMOS All-Digital Data Transceiver Using Synchronous
Uniform Sampling," in ISSCC Digest of Technical Papers,
pp. 112-113, Feb. 1994.

[18]

Y. M. Greshishchev, P. Schvan, J. L. Showell, M. Xu, J.


J. Ojha and J. E. Rogers, "A Fully Integrated SiGe
Receiver IC for 10-Gb/s Data Rate," IEEE Journal of Solid
State Circuits, vol. 35, no. 12, pp. 1949-1957, Dec. 2000.

ACKNOWLEDGMENT
The author is grateful to the contributions of Birdy Amrutur,
Bill Brown, John Corcoran, Craig Corsetto, Dave DiPietro, Brian
Donoghue, Jeff Galloway, Andrew Grzegorek, Tom Hornak, Jim
Homer, Tom Knotts, Benny Lai, Adolf Leiter, Bill McFarland,
Charles Moore, Rasmus Nordby, Cheryl Owen, Pat Petruno, Kent
Springer, Guenter Steinbach, Hugh Wallace, Bin Wu, J.T. Wu, and
Chu Yen for technical discussions and helpful insights into bangbang loop behavior.
REFERENCES

[1]

C. B. Armitage, "SAW Filter Retiming in the AT&T


432 Mb/s Lightwave Regenerator," in Conference Proceedings: AT&T Bell Labs, pp. 102-103, Sept. 3-6, 1984.

[2]

C. R. Hogge, Jr., "A Self Correcting Clock Recovery


Circuit," IEEE Transactions on Electron Devices, vol. ED32, no. 12, pp. 2704-2706, Dec. 1985.

[3]

J. Tani, Crandall, D., Corcoran, J. Hornak, T., "Parallel


Interface ICs for 120Mb/s Fiber Optic Links," in ISSCC
Digest oj Technical Papers, pp. 190-191,390, Feb. 1987.

[4]

R. C. Walker, T. Hornak, C. Yen and K. H. Springer, "A


Chipset for Gigabit Rate Data Communication," in Proceedings of the 1989 Bipolar Circuits and Technology
Meeting, pp. 288-290 September 18-19 1989.

[5]

F. Gardner, Phaselock Techniques, New York: John


Wiley & Sons, 1979, pp. 8-14.

[6]

I. Galton, "Higher-order Delta-Sigma Frequency-toDigital Conversion," in Proceedings of IEEE International


Symposium on Circuits and Systems, pp. 441-444, May 30
-June 2, 1994.

[19]

R. Gu, J. M. Tran, H. Lin, A. Yee and M. Izzard, "A 0.53.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," in ISSCC Digest of Technical Papers, pp. 352353,478, Feb. 1999.

[7]

I. Galton, "Analog-Input Digital Phase-Locked Loops


for Precise Frequency and Phase Demodulation," Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 42, no. 10, pp. 621-630, Oct. 1995.

[20]

[8]

M. W. Hauser, "Principles of Oversampling AID Conversion," J. Audio Eng. So. vol 39, no. 1/2, pp 3-26, Jan./
Feb. 1991.

J. Hauenschild, C. Dorshcky, T. W. Mohrenfels and R.


Seitz, "A lOGb/s BiCMOS Clock and Data Recovery 1:4Demultiplexer in a Standard Plastic Package with External
VCO," in ISSCC Digest of Technical Papers, pp. 202203,445, Feb. 1996.

[21]

T. He, and P. Gray, "A Monolithic 480 Mb/s AGC/Decision/Clock Recovery Circuit in 1.2 urn CMOS," IEEE
Journal of Solid State Circuits, vol. 28, no. 12, pp. 13141320, Dec. 1993.

[22]

P. Larsson, "A 2-1600MHz 1.2-2.5V CMOS ClockRecovery PLL with Feedback Phase-Selection and Averaging Phase-Interpolation for Jitter Reduction," in ISSCC
Digest of Technical Papers, pp. 356-357, Feb. 1999.

[9]

[10]

J. D. H. Alexander, "Clock Recovery from Random


Binary Signals," Electronics Letters, vol. 11, no. 22, pp.
541-542, Oct. 1975.
J. Hauenschild, D. Friedrich, J. Herrle, J. Krug, "A TwoChip Receiver for Short Haul Links up to 3.5Gb/s with
PIN-Preamp Module and CDR-DMUX " in ISSCC Digest
of Technical Papers, pp. 308-309,452, Feb. 1996.

44

[23]

B. Lai, and R. C. Walker, "A Monolithic 622Mb/s Clock


Extraction Data Retiming Circuit," in JSSCC Digest of
Technical Papers, pp. 144,145, Feb. 1991.
[24] T. H. Lee, and J. F. Bulzacchelli, "A 155MHz Clock
Recovery Delay- and Phase-Locked Loop," IEEE Journal
of Solid State Circuits vol. 27, no. 12, pp. 1736-1746, Dec.
1992.
[25]

[26]

R. H. Leonowich, and J. M. Steininger, "A 45-MHz


CMOS phase/frequency-locked loop timing recovery circuit," in ISSCC Digest of Technical Papers, pp. 14-15,278279, Feb. 1988.
I. Lee, C. Yoo, W. Kim, S. Chai and W. Song, "A
622Mb/s CMOS Clock Recovery PLL with Time- Interleaved Phase Detector Array," in ISSCC Digest of Technical Papers, pp. 198-199,444, Feb. 1996.

M. Meghelli, B. Parker, H. Ainspan and M. Soyuer, "A


SiGe BiCMOS 3.3V Clock and Data Recovery Circuit for
lOGb/s Serial Transmission Systems," in ISSCC Digest of
Technical Papers, pp. 56-57, Feb. 2000.
[28] T. Morikawa, M. Soda, S. Shiori, T. Hashimoto, F. Sato
and K. Emura, "A SiGe Single-Chip 3.3V Receiver IC for
lOGb/s Optical Communication System," in ISSCC Digest
of Technical Papers, pp. 380-381,481, Feb. 1999.
A. Pottbacker, and U. Langmann, "An 8GHz Silicon
Bipolar Clock-Recovery and Data-Regenerator IC," IEEE
Journal of Solid State Circuits vol. 29, no. 12, pp. 15721576, Dec. 1994.

[30]

M. Reinhold, C. Dorschky, F. Pullela, E. Rose, P.


Mayer, P. Paschke, Y. Baeyens, J. Mattia and F. Kunz, "A
Fully-Integrated 40Gb/s Clock and Data Recovery / 1:4
DEMUX IC in SiGe Technology," in ISSCC Digest of
Technical Papers, pp. 84-85,435, Feb. 2001.

[31]

M. Soyuer, and H. A. Ainspan, "A Monolithic 2.3 Gb/s


lOOmW Clock and Data Recovery Circuit," in ISSCC
Digest of Technical Papers, pp. 158-159,282, Feb. 1993.

[32]

S. Ueno, K. Watanabe, T. Kato, T. Shinohara, K.


Mikami, T. Hashimoto, A. Takai, K. Washio, R. Takeyar
and T. Harada, "A Single-Chip lOGb/s Transceiver LSI
using SiGe SOI/BiCMOS," in ISSCC Digest of Technical
Papers, pp. 82-83,435, Feb. 2001.

[33]

H. Wang, and R. Nottenburg, "A lGb/s CMOS Clock


and Data Recovery Circuit," in ISSCC Digest of Technical
Papers, pp. 354-355,477, Feb. 1999.

[34]

P. Wallace, R. Bayruns, J. Smith, T. Laverick and R.


Shuster, "A GaAs 1.5Gb/s Clock Recovery and Data
Retiming Circuit," in ISSCC Digest of Technical Papers,
pp. 192-193, Feb. 1990.

[35]

Z. Wang, M. Berroth, J. Seibel, P. Hofinann, A. Hulsmann, Kohler, B. Raynor and J. Schneider, "19GHz
Monolithic Integrated Clock Recovery Using PLL and
0.3um Gate-Length Quantum-Well HEMTs," in ISSCC
Digest of Technical Papers, pp. 118-119, Feb. 1994,

R. C. Walker, K. Hsieh, T. A. Knotts and C. Yen, "A


lOGb/s Si-Bipolar TX/RX Chipset for Computer Data
Transmission," in ISSCC Digest of Technical Papers, pp.
302-303,450, Feb. 1998.

[37]

R. C. Walker, J. Wu, C. Stout, B. Lai, C. Yen, T. Hornak


and P. Petruno, "A 2-Chip 1.5Gb/s Bus-Oriented Serial
Link Interface," in ISSCC Digest of Technical Papers, pp.
226-227,291, Feb. 1992.

[38]

C. K. Yang, and M. A. Horowitz, "0.8um CMOS 2.5Gb/


s Oversampled Receiver for Serial Links," IEEE Journal
of Solid State Circuits vol. 31, no. 12, pp. 20150-2023,
Dec. 1996.

Richard Walker was born in San Rafael


CA, in 1960. He received the B.S.
degree in Engineering and Applied
Science from the California Institute of
Technology in 1982, and an M.S.
degree in Computer SciencefromCalifornia State University, Chico, CA in
1992. Rick joined Agilent Laboratories
(formerly Hewlett-Packard Laboratories) in 1981, where he is currently a
Principal Project Engineer. Since that
time, he has worked in the areas of
broadband-cable modem design, solidstate laser characterization, phase-locked-loop theory, linecode
design, and gigabit-rate serial data transmission. He holds 15 U.S.
patents.

[27]

[29]

[36]

45

Predicting the Phase Noise and Jitter of PLL-Based


Frequency Synthesizers
Kenneth S. Kundert
Abstract Two methodologies are presented for predicting the
phase noise and jitter of a PLL-based frequency synthesizer
using simulation that are both accurate and efficient. The methodologies begin by characterizing the noise behavior of the
blocks that make up the PLL using transistor-level RF simulation. For each block, the phase noise or jitter is extracted and
applied to a model for the entire PLL.
I. INTRODUCTION

Phase-locked loops (PLLs) are used to implement a variety of


timing related functions, such as frequency synthesis, clock
and data recovery, and clock de-skewing. Any jitter or phase
noise in the output of the PLL used in these applications generally degrades the performance margins of the system in
which it resides and so is of great concern to the designers of
such systems. Jitter and phase noise are different ways of
referring to an undesired variation in the timing of events at
the output of the PLL. They are difficult to predict with traditional circuit simulators because the PLL generates repetitive
switching events as an essential part of its operation, and the
noise performance must be evaluated in the presence of this
large-signal behavior. SPICE is useless in this situation as it
can only predict the noise in circuits that have a quiescent
(time-invariant) operating point. In PLLs the operating point
is at best periodic, and is sometimes chaotic. Recently a new
class of circuit simulators has been introduced that are capable of predicting the noise behavior about a periodic operating
point [1]. SpectreRF is the most popular of this class of simulators and, because of the algorithms used in its implementation, is likely to be the best suited for this application [2].
These simulators can be used to predict the noise performance of PLLs. The ideas presented in this paper allow those
simulators to be applied even to those PLLs that have chaotic
operating points.
A. Frequency Synthesis
The focus of this paper is frequency synthesis. The block diagram of a PLL operating as a frequency synthesizer is shown
in Figure 1 [3]. It consists of a reference oscillator (OSC), a
phase/frequency detector (PFD), a charge pump (CP), a loop
filter (LF), a voltage-controlled oscillator (VCO), and two
Ken Kundert is with Cadence Design Systems, San Jose, California, kundert@cadence.com.

frequency dividers (FDs). The PLL is a feedback loop that,


when in lock, forces /ft, to be equal to/ r e f . Given an input frequency ^ n , the frequency at the output of the PLL is
'out

(1)

where M is the divide ratio of the input frequency divider, and


N is the divide ratio of the feedback divider. By choosing the
frequency divide ratios and the input frequency appropriately,
the synthesizer generates an output signal at the desired frequency that inherits much of the stability of the input oscillator. In RF transceivers, this architecture is commonly used to
generate the local oscillator (LO) at a programmable frequency that tunes the transceiver to the desired channel by
adjusting the value of N.
PU

OSC

/ref
PFD

/in

CP

LF

VCO

/fbp

/out

-f-N
Fig. 1. The block diagram of a frequency synthesizer.

B. Direct Simulation
In many circumstances, SpectreRF* can be directly applied to
predict the noise performance of a PLL. To make this possible, the PLL must at a minimum have a periodic steady state
solution. This rules out systems such as bang-bang clock and
data recovery circuits and fractional-Af synthesizers because
they behave in a chaotic way by design. It also rules out any
PLL that is implemented with a phase detector that has a dead
zone. A dead zone has the effect of opening the loop and letting the phase drift seemingly at random when the phase of
the reference and the output of the voltage-controlled oscillator (VCO) are close. This gives these PLLs a chaotic nature.
To perform a noise analysis, SpectreRF must first compute
the steady-state solution of the circuit with its periodic steady
state (PSS) analysis. If the PLL does not have a periodic solution, as the cases described above do not, then it will not converge. There is an easy test that can be run to determine if a
circuit has a periodic steady-state solution. Simply perform a
transient analysis until the PLL approaches steady state and
t Spectre is a registered trademark of Cadence Design Systems.

46

then observe the VCO control voltage. If this signal consists


of frequency components at integer multiples of the reference
frequency, then the PLL has a periodic solution. If there are
other components, it does not. Sometimes it can be difficult to
identify the undesirable components if the components associated with the reference frequency are large. In this case, use
the strobing feature of Spectre's transient analysis to eliminate all components at frequencies that are multiples of the
reference frequency. Do so by strobing at the reference frequency. In this case, if the VCO control voltage varies in any
significant way the PLL does not have a periodic solution.

jitter parameters for the corresponding behavioral models [8].


Once everything is ready, simulation of the PLL occurs with
the blocks of the PLL being described with behavioral models
that exhibit jitter. The actual jitter or phase noise statistics are
observed during this simulation. Generally tens to hundreds
of thousands of cycles are simulated, but the models are efficient so the time required for the simulation is reasonable.
This approach allows prediction of PLL jitter behavior once
the noise behavior of the blocks has been characterized. However, it requires the use of an experimental simulator that is
not readily available to characterize the jitter of the blocks.

If the PLL has a periodic solution, then in concept it is always


possible to apply SpectreRF directly to perform a noise analysis. However, in some cases it may not be practical to do so.
The time required for SpectreRF to compute the noise of a
PLL is proportional to the number of circuit equations needed
to represent the PLL in the simulator times the number of
time points needed to accurately render a single period of the
solution times the number of frequencies at which the noise is
desired. When applying SpectreRF to frequency synthesizers
with large divide ratios, the number of time points needed to
render a period can become problematic. Experience shows
that divide ratios greater than ten are often not practical to
simulate. Of course, this varies with the size of the PLL.

In an earlier series of papers [9, 10], the relevant ideas of


Demir were adapted to allow use of a commercial simulator,
Spectre [11], and an industry standard modeling language,
Verilog-A^ [12]. These ideas are further refined in the later
half of this paper.

For PLLs that are candidates for direct simulation using SpectreRF, simply configure the simulator to perform a PSS analysis followed by a periodic noise (PNoise) analysis. The period
of the PSS analysis should be set to be the same as the reference frequency as defined in Figure 1. The PSS stabilization
time (tstab) should be set long enough to allow the PLL to
reach lock. This process was successfully followed on a frequency synthesizer with a divide ratio of 40 that contained
2500 transistors, though it required several hours for the complete simulation [4].
C. When Direct Simulation Fails
The challenge still remains, how does one predict the phase
noise and jitter of PLLs that do not fit the constraints that
enable direct simulation? The remainder of this paper
attempts to answer that question for frequency synthesizers,
though the techniques presented are general and can be
applied to other types of PLLs by anyone who is sufficiently
determined.
D. Monte Carlo-Based Methods
Demir proposed an approach for simulating PLLs whereby a
PLL is described using behavioral models simulated at a high
level [5, 6]. The models are written such that they include jitter in an efficient way. He also devised a simulation algorithm
based on solving a set of nonlinear stochastic differential
equations that is capable of characterizing the circuit-level
noise behavior of blocks that make up a PLL [6, 7]. Finally,
he gave formulas that can be used to convert the results of the
noise simulations on the individual blocks into values for the

E. Predicting Noise in PLLs


There are two different approaches to modeling noise in
PLLs. One approach is to formulate the models in terms of
the phase of the signals, producing what are referred to as
phase-domain models. In the simplest case, these models are
linear and analyzed easily in the frequency domain, making it
simple to use the model to predict phase noise, even in the
presence of flicker noise or other noise sources that are difficult to model in the time domain. Phase-domain models are
described in the first half of this paper.
The process of predicting the phase noise of a PLL using
phase-domain models involves:
1. Using SpectreRF to predict the noise of the individual
blocks that make up the PLL.
2. Building high-level behavioral models of each of the
blocks that exhibit phase noise.
3. Assembling the blocks into a model of the PLL.
4. Simulating the PLL to find the phase noise of the overall
system.
The other approach formulates the models in terms of voltage, which are referred to as voltage-domain models. The
advantage of voltage-domain models is that they can be
refined to implementation. In other words, as the design process transitions to being more of a verification process, the
abstract behavioral models initially used can be replaced with
detailed gate- or transistor-level models in order to verify the
PLL as implemented.
A voltage-domain model is strongly nonlinear and never has a
quiescent operating point, making it incompatible with a
SPICE-Iike noise analysis. Often such models have a periodic
operating point and so can be analyzed with small-signal RF
noise analysis (SpectreRF), but it is also common for that not
to be the case. For example, a fractional-^ synthesizer does
t Verilog is a registered trademark of Cadence Design Systems
licensed to Accellera.

47

not have a periodic operating point. Occasionally, the circuit


is sensitive enough that the noise affects the large-signal
behavior of the PLL, such as with bang-bang clock-and-data
recovery PLLs, which invalidates any use of small-signal
noise analysis.

involves hundreds or thousands of cycles at the input to the


phase detector. With large divide ratios, this can translate to
hundreds of thousands of cycles of the VCO. Thus, the number of time points needed for a single simulation could range
into the millions.

Modeling large-signal noise in a voltage-domain model as a


voltage or a current is problematic. Such signals are very
small and continuously and very rapidly varying. Extremely
tight tolerances and small time steps are required to accurately resolve such signals with simulation. To overcome
these problems, the noise is instead represented using the
effect it has on the timing of the transitions within the PLL. In
other words, the noise is added to circuit in the form of jitter.
In this case there is no need for either small time steps or tight
tolerances.

This is all true when simulating the PLL in terms of voltages


and currents. When doing so, one is said to be using voltagedomain models. However, that is not the only option available. It is also possible to formulate models based on the
phase of the signals. In this case, one would be using phasedomain models. The high frequency variations associated
with the voltage-domain models are not present in phasedomain models, and so simulations are considerably faster. In
addition, when in lock the phase-domain-based models generally have constant-valued operating points, which simplifies
small-signal analysis, making it easier to study the closedloop dynamics and noise performance of the PLL using either
AC or noise analysis.

The process of predicting the jitter of a PLL with voltagedomain models involves:
1. Using SpectreRF to predict the noise of the individual
blocks that make up the PLL.
2. Converting the noise of the block to jitter.
3. Building high-level behavioral models of each of the
blocks that exhibit jitter.
4. Assembling the blocks into a model of the PLL.
5. Simulating the PLL to find the jitter of the overall system.

A linear phase-domain model of a frequency synthesizer is


shown in Figure 2. Such a model is suitable for modeling the
behavior of the PLL to small perturbations when the PLL is in
lock as long as you do not need to know the exact waveforms
and instead are interested in how small perturbations affect
the phase of the output. This is exactly what is needed to predict the phase noise performance of the PLL.

The simple linear phase-domain model described in the first


part of this paper, and the nonlinear voltage-domain model
described in the second part, represent the two ends of a continuum of models. Generally, the phase-domain models are
considerably more efficient, but the voltage-domain models
do a better job of capturing the details of the behavior of the
loop, details such as the signal capture and escape processes.
The phase-domain models can be made more general by making them nonlinear and by analyzing them in the time domain.
It is common to use such models with fractional-TV synthesizers. Conversely, simplifications can be made to the voltagedomain models to make them more efficient. It is even possible to use both voltage- and phase-domain models for different parts of the same loop. One might do so to retain as much
efficiency as possible while allowing part of the design to be
refined to implementation level. In general it is best to understand both approaches well, and use ideas from both to construct the most appropriate approach for your particular
situation.
II. PHASE-DOMAIN MODEL

It is widely understood that simulating PLLs is expensive


because the period of the VCO is almost always very short
relative to the time required to reach lock. This is particularly
true with frequency synthesizers, especially those with large
multiplication factors. The problem is that a circuit simulator
must use at least 10-20 time points for every period of the
VCO for accurate rendering, and the lock process often

osc
+^

FD/u

PFD

I
'ft

LF

CP
*det
271

//(CO)

VCO
^out

/CO

DN

1
N

Fig. 2. Linear time-invariant phase-domain model of the synthesizer


shown in Figure 1.
The derivation of the model begins with the identification of
those signals that are best represented by their phase. Many
blocks have large repetitive input signals with their outputs
being primarily sensitive to the phase of their inputs. It is the
signals that drive these blocks that are represented as phase.
They are identified using a (]) variable in Figure 2. Notice that
this includes all signals except those at the inputs of the LF
and VCO.
The models of the individual blocks will be derived by assuming that the signals associated with each of the phase variables
is a pulse train. Though generally the case, it is not a requirement. It simply serves to make it easier to extract the models.
Define ri(r0, T, T) to be a periodic pulse train where one of the
pulses starts at / 0 and the pulses have duration t and period T
as shown in Figure 3. This signal transitions between 0 and 1
if t is positive, and between 0 and - 1 if z is negative. The
phase of this signal is defined to be $ = 2%t^/T. In many cases,
the duration of the pulses is of no interest, in which case
n(r 0 , T) is used as a short hand. This occurs because the input

48

that the signal is driving is edge triggered. For simplicity, we


assume that such inputs are sensitive to the rising edges of the
signal, that r0 specifies the time of a rising edge, and that the
signal is transitioning between 0 and 1.

't-fhn , t.r - H

4 i w ' ;tura^r
T>0

T<0

Fig. 3. The pulse train waveform represented by FI(^Q, X, 7).


The input source produces a signal vin = n(/Q, T). Since this is
the input, t0 is arbitrary. As such, we are free to set its phase <>|
to any value we like.
Given a signal vj = n(/ 0 , T) a frequency divider will produce
an output signal v o = U(t0, NT) where N is the divide ratio.
The phase of the input isfa= 2nto/T and the phase of the output is <|>o = 2ntrf(NT) and so the phase transfer characteristic
of a divider is
o = *iW.
(2)
There are many different types of phase detectors that can be
used, each requiring a somewhat different model. Consider a
simple phase-frequency detector combined with a charge
pump [13]. In this case, the detector takes two inputs,
vj = n(*i, T) and v 0 = n(f 0 , T) and produces an output
*cp = /maxn(^o ?i~ro T) where / max is the maximum output
current of the charge pump. The output of the charge pump
immediately passes through a low pass filter that is designed
to suppress signals at frequencies of 1/7 and above, so in most
cases the pulse nature of this signal can be ignored in favor of
its average value, <i ) . Thus, the transfer characteristic of
the combined PFD/CP is
</cp> = 7max~Y"" = 7max

=
1%

~^^\

"*o)

The model of (3) is a continuous-time approximation to what


is inherently a discrete-time process. The phase detector does
not continuously monitor the phase difference between its
two input signals, rather it outputs one pulse per cycle whose
width is proportional to the phase difference. Using a continuous time approximation is generally acceptable if the bandwidth of the loop filter is much less than/ ref (generally less
than/ ref /10 is sufficient). In practical PLLs this is almost
always the case. It is possible to develop a detailed phasedomain PFD model that includes the discrete-time effects, but
it would run more slowly and the resulting phase-domain
model of the PLL would not have a quiescent operating point,
which makes it more difficult to analyze.
The voltage-controlled oscillator, or VCO, converts its input
voltage to an output frequency, and the relationship between
input voltage and output frequency can be represented as
/out = *Xv c )

(4)

The mapping from voltage to frequency is designed to be linear, so a first-order model is often sufficient,
/ o u t = ^vco v c-

(5)

It is the output phase that is needed in a phase-domain model,


outW = 2nJ* vco v c (r)dr

(6)

or in the frequency domain,


<t>out() = ^ ^ ( c o ) .

(7)

A. Small-Signal Stability
This completes the derivation of the phase-domain models for
each of the blocks. Now the full model is used to help predict
the small-signal behavior of the PLL. Start by using Figure 2
to write a relationship for its loop gain. Start by defining
G fwd = I2H1 = - J 2 f f ( ( D ) _ l E 2 =

<3)

dct

vc

(8)

where AT
Kdetdet = / m a x . Of course, this is only valid for to be the forward gain,
ost
^% a t t n e mmost.
- The behavior outside this range
l^l"" ^2! < 2fl
<I>4V
1
depends strongly on the type of phase detector used [3]. Even
G = =~
(9)
rCV
<>ou
*
^
within this range, the phase detector may be better modeled
with a nonlinear transfer characteristic. For example, there to be the feedback factor, and
can be a flat spot in the transfer characteristics near 0 if the
detector has a dead zone. However it is generally not producT ~ GfwdGrev
^
(10)
tive to model the dead zone in a phase-domain model.'
to be the loop gain. The loop gain is used to explore the smallsignal stability of the loop. In particular, the phase margin is
an important stability metric. It is the negative of the differt This phase-domain model is a continuous-time model that ignores ence between the phase shift of the loop at unity gain and
the sampling nature of the PFD. A dead zone interacts with the sam- 180, the phase shift that makes the loop unstable. It should
pling nature of the PFD to create a chaotic limit cycle behavior that
is not modeled with the phase-domain model. This chaotic behavior be no less than 45 [14]. When concerned about phase noise
creates a substantial amount of jitter, and for this reason, most mod- or jitter, the phase margin is typically 60 or more to reduce
ern phase detectors are designed such that they do not exhibit dead peaking in the closed-loop gain, which results in excess phase
zones.
noise.

49

B. Noise Transfer Functions


In Figure 4 various sources of noise have been added. These
noise sources can represent either the noise created by the
blocks due to intrinsic noise sources (thermal, shot, and
flicker noise sources), or the noise coupled into the blocks
from external sources, such as from the power supplies, the
substrate, etc. Most are sources of phase noise, and denoted
FDw

in"

PFD/CP

(g

LF

*det

//(co)

271

'det

^fdm

VCO
2TC

oui

^vco

j(O
TVCO

FD*

^fdn

Fig. 4. Linear time-invariant phase-domain model of the synthesizer


shown in Figure 2 with representative noise sources added. The <|)'s
represent various sources of noise.
in* fdm* ^fdn' anc* vcc because the circuit is only sensitive

to phase at the point where the noise is injected. The one


exception is the noise produced by the PFD/CP, which in this
case is considered to be a current, and denoted /det.
Then the transfer functions from the various noise sources to
the output are
r

- *out _

ref =

^"f
*

vco - i

NG

fwd _

i ^
_

tfa

Ml-T

'

MN-G^'

and by inspection,
Gfdn = j = * - "Graf
G fdm - j 2 * - ~GKf,
Tfdm
- *out _
^det - "j

n n

(11)

; f - TTTa

C = isH! = S*a* =
*

fwd

^-Gfwd'

Consider further the asymptotic behavior of the loop and the


VCO noise at low offset frequencies (co > 0). Oscillator
phase noise in the VCO results in the power spectral density
5(})vco being proportional to 1/co2, or fyvco~ 1/co2 (neglecting flicker noise). If the LF is chosen such that //(co) ~ 1,
then Gfwd - 1 /co, and contribution from the VCO to the output noise power, GvccAvco*s f*mte anc* nonzero. If the LF
is chosen such that //(co) - 1/co, as it typically is when a
true charge pump is employed, then Gfwd ~ 1/co2 and the
noise contribution to the output from the VCO goes to zero at
low frequencies.
C. Noise Model
One predicts the phase noise exhibited by a PLL by building
and applying the model shown in Figure 4. The first step in
doing so is to find the various model parameters, including
the level of the noise sources, which generally involves either
direct measurement or simulating the various blocks with an
RF simulator, such as SpectreRF. Use periodic noise (or
PNoise) analysis to predict the output noise that results from
stochastic noise sources contained within the blocks using
simulation. Use a periodic AC or periodic transfer function
(PAC or PXF) to compute the perturbation at the output of a
block due to noise sources outside the block, such as on supplies.

'

Once the model parameters are known, it is simply a matter of


computing the output phase noise of the PLL by applying the
(\3) equations in Section II-B to compute the contributions to (J)out
from every source and summing the results. Be careful to
account for correlations in the noise sources. If the noise
sources are perfectly correlated, as they might be if the ultid4) mate source of noise is in the supplies or substrate, then use a
direct sum. If the sources produce completely uncorrelated
noise, as they would when the ultimate source of noise is ran(15) dom processes within the devices, use a root-mean-square
sum.

27CG

ref
7?'

As co - 0, Gfwd - because of the 1 /(/G>) term from the


VCO. So at DC, G r e f , G f d m , G f d n ^ N , Cm->N/M and
G
vco -* 0 At low frequencies, the noise of the PLL is contributed by the OSC, PFD/CP, FD M and FD M and the noise
from the VCO is diminished by the gain of the loop.

,v

Alternatively, one could build a Verilog-A model and use simulation to determine the result. The top-level of such a model
A
'det
det
is shown in Listing 1. It employs noisy phase-domain models
On this last transfer function, we have simply referred *det to
for each of the blocks. These models are given in Listings 3-7
the input by dividing through by the gain of the phase detecand are described in detail in the next few sections (HI-VI). In
tor.
this example, the noise sources are coded into the models, but
These transfer functions allow certain overall characteristics the noise parameters are not set at the top level to simplify the
of phase noise in PLLs to be identified. As co->o, model. To predict the phase noise performance of the loop in
Gfwd - 0 because of the VCO and the low-pass filter, and so lock, simply specify these parameters in Listing 1 and perG
form a noise analysis. To determine the effect of injected
ref> Gdet> Gfdm> Gfdn> G i n ~> a n d G vco ~> l A t h i S h f r e "
quencies, the noise of the PLL is that of the VCO. Clearly this noise, first refer the noise to the output of one of the blocks,
must be so because the low-pass LF blocks any feedback at and then add a source into the netlist of Listing 1 at the approhigh frequencies.
priate place and perform an AC analysis.
r

n
u

50

Listing 1 Phase-domain model for a PLL configured as a


frequency synthesizer.

include "discipline.h"
module pll(out);
output out;
phase out;
parameter integer m = 1 from [1 :inf);
parameter real Kdet = 1 from (O:inf);
parameter real Kvco = 1 from (0:inf);
parameter real d = 1 n from (O:inf);
parameter real c2 = 200p from (0:inf);
parameter real r = 10K from (0:inf);
parameter integer n = 1 from [1 :inf);
phase in, ret, fb;
electrical c;

//input divide ratio


//detector gain
// VCO gain
//Loop filter C1
//Loop filter C2
//Loop filterR
// fb divide ratio

oscillator OSC(in);
divider #(.ratio(m)) FDm(in, ref);
phaseDetector #(.gain(Kdet)) PD(ref, fb, c);
loopFilter#(.c1(c1), .c2(c2), .r(r)) LF(c);
vco #(.gain(Kvco)) VCO(c, out);
divider #(.ratio(n)) FDn(out, fb);
endmodule

is because oscillators inherently tend to amplify noise found


near their oscillation frequency and any of its harmonics. The
reason for this behavior is covered next, followed by a
description of how to characterize and model the noise in an
oscillator. The origins of oscillator phase noise are described
in a conceptual way here. For a detailed description, see the
papers by Kaertner or Demir et al [15, 16, 17].
A. Oscillator Phase Noise
Nonlinear oscillators naturally produce high levels of phase
noise. To see why, consider the trajectory of a fully autonomous oscillator's stable periodic orbit in state space. In steady
state, the trajectory is a stable limit cycle, v. Now consider
perturbing the oscillator with an impulse and assume that the
deviation in the response due to the perturbation is Av, as
shown in Figure 5. Separate Av into amplitude and phase variations,
Av(r) = [\+a(t)]v(t + $&)-v(t).

(17)

where v represents the unperturbed T-periodic output voltage


of the oscillator, oc represents the variation in amplitude, is
the variation in phase, and/ o = \IT is the oscillation frequency.

Listings 1 and 3-7 have phase signals, and there is no phase


discipline in the standard set of disciplines provided by Verv2
ilog-A or Verilog-AMS in discipline.h. There are several different resolutions for this problem. Probably the best solution
'CL Av(0)
is to simply add such a discipline, given in Listing 2, either to
'6
'o
discipline.h as assumed here or to a separate file that is
A!>6 '51
h
'l
included as needed. Alternatively, one could use the rotav
l
tional discipline. It is a conservative discipline that includes
l
A
'6
torque as a flow nature, and so is overkill in this situation.
h
h
h
Finally, one could simply use either the electrical or the volth
h
K
age discipline. Scaling for voltage in volts and phase in radians is similar, and so it will work fine except that the units Fig. 5. The trajectory of an oscillator shown in state space with and
will be reported incorrectly. Using the rotational discipline without a perturbation Av. By observing the time stamps (?Q ..., fg)
would require that all references to the phase discipline be one can see that the deviation in amplitude dissipates while the
changed to rotational in the appropriate listings. Using either deviation in phase does not.
the electrical or voltage discipline would require that both the
Since the oscillation is stable and the duration of the disturname of the disciplines be changed from phase to either elecbance is finite, the deviation in amplitude eventually decays
trical or voltage, and the name of the access functions be
away and the oscillator returns to its stable orbit (oc(f) - 0 as
changed from Theta to V.
t - oo). In effect, there is a restoring force that tends to act
against
amplitude noise. This restoring force is a natural conListing 2 Signal flow discipline definition for phase signals (the
sequence
of the nonlinear nature of the oscillator that acts to
nature Angle is defined in discipline.h).
suppresses amplitude variations.
* include "discipline.h"
discipline phase
potential Angle;
enddiscipline
m.

OSCILLATORS

Oscillators are responsible for most of the noise at the output


of the majority of well-designed frequency synthesizers. This

The oscillator is autonomous, and so any time-shifted version


of the solution is also a solution. Once the phase has shifted
due to a perturbation, the oscillator continues on as if never
disturbed except for the shift in the phase of the oscillation.
There is no restoring force on the phase and so phase deviations accumulate. A single perturbation causes the phase to
permanently shift ((t) > A(|) as t > oo). If we neglect any
short term time constants, it can be inferred that the impulse
response of the phase deviation <|>(0 can be approximated with

51

a unit step s(t). The phase shift over time for an arbitrary input
disturbance u is
oo

<K0~ js(t-x)u(x)dx = J(x)rfr,


oo

(18)

at A/= 1 Hz and/ c is the flicker noise corner frequency. As


shown in Figure 6, n is extracted by simply extrapolating to
1 Hz from a frequency where the noise from the white
sources dominates.

oo

\\^3:

or the power spectral density (PSD) of the phase is

lHz

White sources dominate


\T2:1

External noise sources

f\ **
/o

Fig. 6. Extracting the noise parameters, n> a, and/ c , for an oscillator.


The parameter a is an alternative to n where n = afo2. It is used later.
The graph is plotted on a log-log scale.
Sty is not directly observable and often difficult to find, so now
Sty is related to L, the power spectral density of the output
voltage noise Sv normalized by the power in the fundamental
tone. Sv is directly available from either measurement with a
spectrum analyzer or from RF simulators, and i s defined as

SJ

> - ^f.

B. Characterizing Oscillator Phase Noise


Above it was shown that oscillators tend to convert perturbations from any source into a phase variation at their output
whose magnitude varies with I/A/ (or l / A / 2 i n power). Now
assume that the perturbation is from device noise in the form
of white and flicker stochastic processes. The oscillator's
response will be characterized first in terms of the phase noise
Sty, and then because phase noise is not easily measured, in
terms of the normalized voltage noise L. The result will be a
small set of easily extracted parameters that completely
describe the response of the oscillator to white and flicker
noise sources. These parameters are used when modeling the
oscillator.
Assume that the perturbation consists of white and flicker
noise and so has the form

SJL6f)~\+f.

\^
^v

This shows that in all oscillators the response to any form of


perturbation, including noise, is amplified and appears mainly
in the phase. The amplification increases as the frequency of
the perturbation approaches the frequency of oscillation in
proportion to l/A/(or I/A/ 2 in power).
Notice that there is only one degree of freedom the phase
of the oscillator as a whole. There is no restoring force when
the phase of all signals associated with the oscillator shift
together, however there would be a restoring force if the
phase of signals shifted relative to each other. This observation is significant in oscillators with multiple outputs, such as
quadrature or ring oscillators. The dominant phase variations
appear identically in all outputs, whereas relative phase variations between the outputs are naturally suppressed by the
oscillator or added by subsequent circuitry and so tend to be
much smaller [8].

Flicker sources dominate

(20)

where Vj is the fundamental Fourier coefficient of v, the output signal. It satisfies


oo

Vkei2kf'.

(,) =

(23)

In (41) of [15], Demir et al shows that for a free-running


oscillator perturbed only by white noise sources*

440 =I

n
2 2
2

A,2,

2w 7C + A /

(24)

which is a Lorentzian process with corner frequency of


/comer = " * / o At frequencies above the corner,

<25>

Then from (19) the response will take the form


which agrees with Vendelin [18],

where the factor of (2ri)2 in the denominator of (19) has been


absorbed into , the constant of proportionality. Thus, the
response of the oscillator to white and flicker noise sources is
characterized using just two parameters, n &ndfc, where n is
the portion of Sty attributable to the white noise sources alone

Use (21) to extract/ c . Then use both (21) and (26) to determine n by choosing JSf well above the flicker noise corner frequency,/^ and the corner frequency of (25),/ c o r n e p to avoid
ambiguity and well b e l o w / 0 to avoid the noise from other
sources that occur at these frequencies.
t Demir uses c rather than , where n = c/02.

52

C. Phase-Domain Models for the Oscillators

though they should not contain any white space, wpn was
chosen to represent white phase noise and/p/i stands for
flicker phase noise.

The phase-domain models for the reference and voltage-controlled oscillators are given in Listings 3 and 4. The VCO
model is based on (6). Perhaps the only thing that needs to be
explained is the way that phase noise is modeled in the oscillators. Verilog-AMS provides the flicker jioise function for
modeling flicker noise, which has a power spectral density
proportional to l / / a with a typically being close to 1. However, Verilog-AMS does not limit a to being close to one,
making this function well suited to modeling oscillator phase
noise, for which a is 2 in the white-phase noise region and
close to 3 in the flicker-phase noise region (at frequencies
below the flicker noise corner frequency). Alternatively, one
could dispense with the noise parameters and use the
noise jtable function in lieu of the flicker jfioise functions to
use the measured noise results directly. The "wpn" and "fpn"

When interested in the effect of signals coupled into the oscillator through the supplies or the substrate, one would compute the transfer function from the interfering source to the
phase output of the oscillator using either a PAC or PXF analysis. Again, one would simply assume that the perturbation in
the output of the oscillator is completely in the phase, which
is true except at very high offset frequencies. One then
employs (12) and (13) to predict the response at the output of
the PLL.
IV.

Even in the phase-domain model for the PLL, the loop filter
remains in the voltage domain and is represented with a full
circuit-level model, as shown in Listing 5. As such, the noise
behavior of the filter is naturally included in the phasedomain model without any special effort assuming that the
noise is properly included in the resistor model.

Listing 3 Phase-domain oscillator noise model.


include "discipline.h'
module oscillator(out);
output out;
phase out;
parameter real n = 0 from [O:inf);
// white output phase noise at 1 Hz (rad2/Hz)
parameter real fc = 0 from [O:inf);
// flicker noise corner frequency (Hz)

Listing 5 Loopfiltermodel.
include "discipline.h"
module loopFilter(n);
electrical n;
ground gnd;*
parameter real d = 1n from (0:inf);
parameter real c2 = 200p from (0:inf);
parameter real r = 10K from (O:inf);
electrical int;

analog begin
Theta(out) <+ flicker__noise(n, 2, "wpn")
+ flicker_noise(n*fc, 3, "fpn");
end
endmodule

capacitor #(.c(d)) C1(n, gnd);


capacitor #(.c(c2)) C2(n, int);
resistor #(.r(r)) R(int, gnd);

Listing 4 Phase-domain VCO noise model.


include "discipline.h"
include "constants.h"
module vco(in, out);
input in; output out;
voltage in;
phase out;
parameter real gain = 1 from (0:inf);
//transfer gain, Kvco (HzN)
parameter real n = 0 from [0:inf);
// white output phase noise at 1 Hz (rad2/Hz)
parameter real fc = 0 from [0:inf);
// flicker noise corner frequency (Hz)
analog begin
Theta(out) <+ 2*'M_PI*gain*ldt(V(in));
Theta(out) <+ flickerjioise(n, 2, "wpn")
+ flicker_nofse(n*fc, 3, "fpn");
end
endmodule
strings passed to the noise functions are labels for the noise
sources. They are optional and can be chosen arbitrarily,

LOOP FILTER

endmodule
t The ground statement is not currently supported in Cadence's Verilog-A implementation, so instead ground is explicitly passed into
the module.
V. PHASE DETECTOR AND CHARGE PUMP

As with the VCO, the noise of the PFD/CP as needed by the


phase-domain model is found directly with simulation. Simply drive the block with a representative periodic signal, perform a PNoise analysis, and measure the output noise current.
In this case, a representative signal would be one that produced periodic switching at the output. This is necessary to
capture the noise present during the switching process. Generally the noise appears as in Figure 7, in which case the noise
is parameterized with n and/ c . n is the noise power density at
frequencies above the flicker noise corner frequency,/ c , and
below the noise bandwidth of the circuit.
The phase-domain model for the PFD/CP is given in
Listing 6. It is based on (3). Alternatively, as before one could

53

A
SQ

A.
A Cyclostationary Noise.
Flicker sources dominate

P 0]
Formally,
the term cyclostationary implies that the autocorrea
lation
function
of a stochastic process varies with / in a peri^V^TJI
White sources dominate
* tJ
odic fashion [19, 20], which in practice is associated with a
periodic variation in the noise power of a signal. In general,
T"
TTS.
pei
/c
\.
the noise produced by all of the nonlinear blocks in a PLL is
strongly
cyclostationary. To understand why, consider the
1
N*^
str
noise produced by a logic circuit, such as the inverter shown
Noise bandwidth
noi
i n Figure 8. The noise at the output of the inverter, n out ,
Fig. 7. Extracting the noise parameters, n and / c , for the PFD/CP. in
comes
from different sources depending on the phase of the
The graph is plotted on a log-log scale.
coi
output signal, v out . When the output is high, the output is
ou
use the noise jtable function in lieu of the white_noise and jinsensitive
ng
to small changes on the input. The transistor A/p is
flickerjnoise
functions to use the measured noise results on
Q n and the noise at the output is predominantly due to the
directly.
the
thermal
noise from its channel. This is region A in the figure.
Wl
When the output is low, the situation is reversed and most of
Listing 6 Phase-domain phase detector noise model.
the output noise is due to the thermal noise from the channel
of
o f A/N. This is region B. When the output is transitioning,
'include "discipline.h"
thermal
noise from both Afp and M N contribute to the output.
* include "constants.h"
the
In addition, the output is sensitive to small changes in the
module phaseDetector(pin, nin, out);
input.
In fact, any noise at the input is amplified before reachm
input pin, nin; output out;
*
ing
the output. Thus, noise from the input tends to dominate
in
phase pin, nin;
*
over
the thermal noise from the channels of M P and M N in
ov
electrical out;
this
region.
Noise at the input includes noise from the previparameter real gain = 1 from (O:inf);
thi
ous
stage
and
noise from both devices in the form of flicker
// transfer gain (A/cycle)
ou
noise
and
thermal
noise from gate resistance. This is region C
parameter real n = 0 from [O:inf);
no
in
// white output current noise (A2/Hz)
[n the figure.
parameter real fc = 0 from [Orinf);
// flicker noise corner frequency (Hz)

Hr

analog begin
l(out) <+ gain * Theta(pin,nin) / (2**M_PI);
l(out) <+ white_noise(n, "wpn")
+ flicker_noise(n*fc, 1, "fpn");
end
endmodule

in

^p
out

outh

VI. FREQUENCY DIVIDERS

There are several reasons why the process of extracting the


noise produced by the frequency dividers is more complicated
than that needed for other blocks. First, the phase noise is
needed and, as of the time when this document was written,
SpectreRF reports on the total noise and does not yet make
the phase noise available separately. Secondly, the frequency
dividers are always followed by some form of edge-sensitive
thresholding circuit, in this case the PFD, which implies that
the overall noise behavior of the PLL is only influenced by
the noise produced by the divider at the time when the threshold is being crossed in the proper direction. The noise produced by the frequency divider is cyclostationary, meaning
that the noise power varies over time. Thus, it is important to
analyze the noise behavior of the divider carefully. The second issue is discussed first.

p^
Fig. 8. Noise produced by an inverter (nout) as a function of the
oui
output signal (vout). In region A the noise is dominated by the thermal
n0]

noise of Mp in region B its dominated by the thermal noise of A/^,

n< in region C the output noise includes the thermal noise from both
|jand

devices as well as the amplified noise from the input.

JThe
J challenge in estimating the effect of noise passing
m]
through
a threshold is the difficulty in estimating the noise at

tfa
the point where the threshold is crossed. There are several dif-

ferent ways of estimating the effect of this noise, but the simplest is to use the strobed noise feature of SpectreRF. * When
the strobed noise feature is active, the noise produced by the

54

circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power-spectral density of the sequence. The sample
time would be adjusted to coincide with the desired threshold
crossings. Since the T-periodic cyclostationary noise process
is sampled every T seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at
the sample points is completely ignored.

i Pi Pi P.

\J \J u

V (t)

S^f) = [2nfo/^Pjsn{f).

C. Phase-Domain Model for Dividers


To extract the phase noise of a divider, drive the divider with a
representative periodic input signal and perform a PSS analysis to determine the threshold crossing times and the slew rate
{dvldt) at these times. Then use SpectreRF's strobed PNoise
analysis to compute Sn(f). When running PNoise analysis,
assure that the maxsidebands parameter is set sufficiently
large to capture all significant noise folding. A large value
will slow the simulation. To reduce the number of sidebands
needed, use T as small as possible. S^(f) is then computed
from (32). Generally the noise appears as in Figure 10. Notice
that the noise is periodic in/with period 1/7 because n is a
discrete-time sequence with period T. The parameters n and/ c
for the divider are extracted as illustrated. The high frequency
roll-off is generally ignored because it occurs above the frequency range of interest.

Fig. 9. Strobed noise. The lower waveform is a highly magnified


view of the noise present at the strobe points in vn, which are chosen
to coincide with the threshold crossings in v.

B. Converting to Phase Noise


The act of converting the noise from a continuous-time process to a discrete-time process by sampling at the threshold
crossings makes the conversion into phase noise easier. If vn
is the continuous-time noisy response, and v is the noise-free
response (response with the noise sources turned off), then^
n;=v n (/7)~v0T).
(27>
Then if vn is noisy because it is corrupted with a phase noise
process 0, then

Vn{t) = 2itf/
v(t+m

5A

r\

(28)

M L D | p . (30)
lntn

Finally, <|>,- can be found from nt using

,dv(iT)

A
/ \
UT

With ripple counters, one usually only characterizes one stage


at a time and combines the phase noise from each stage by
assuming that the noise in each stage is independent (true for
device noise, would not be true for noise coupling into the
divider from external sources). The variation due to phase
noise accumulates, however it is necessary to account for the
increasing period of the signals at each stage along the ripple
counter. Consider an intermediate stage of a /sT-stage ripple
counter. The total phase noise at the output of the ripple
counter that results due to the phase noise 5 ^ at the output of
stage k is (TK/T02S^. So the total phase noise at the output
of the ripple counter is

and

. ,

White sources dominate

Fig. 10. Extracting the noise parameters, n and/ c , for the divider.

(29)

at

Flicker sources dominate

Noise bandwidth \ /

Assume the phase noise is small and linearize v using a Taylor series expansion

ni . sv(jT) + MiI)gZ)_ v ( j T ) =
'
at 2nf0n

(32)

where Sn(f) and S^(f) are the power spectral densities of the
ni and <(), sequences.

n. r

v is T periodic, which makes dv(iT)ldt a constant, and so

(31)

t The strobed-noise feature of SpectreRF is also referred to as its


time-domain noise feature.
t It is assumed that the sequence nt is formed by sampling the noise
at iT, which implies that the threshold crossings also occur at iT. In
practice, the crossings will occur at some time offset from iT. That
offset is ignored. It is done without loss of generality with the understanding that the functions v and vn can always be reformulated to
account for the offset.

Vut = 4 ^

(33)

*=o *
arem e

where S^ and 7Q
phase noise and signal period at the
input to the first stage of the ripple counter.
With undesired variations in the supplies or in the substrate
the resulting phase noise in each stage would be correlated, so
one would need to compute the transfer function from the sig-

55

nal source to the phase noise of each stage and combine in a


vector sum.

a frequency divider that implements non-integer divide ratio


except in a few very restrictive cases, so instead a divider that
is capable of switching between two integer divide ratios is
used, and one rapidly alternates between the two values in
such a way that the time-average is equal to the desired noninteger divide ratio [13]. A block diagram for a fractional-Af
synthesizer is shown in Figure 11. Divide ratios of N and
N + 1 are used, where N is the first integer below the desired
divide ratio, and N + 1 is the first integer above. For example,
if the desired divide ratio is 16.25, then one would alternate
between the ratios of 16 and 17, with the ratio of 16 being
used 75% of the time. Early attempts at fractional-N synthesis
alternated between integer divide ratios in a repetitive manner, which resulted in noticeable spurs in the VCO output
spectrum. More recently, AZ modulators have been used to
generate a random sequence with the desired duty cycle to
control the multi-modulus dividers [21]. This has the effect of
trading off the spurs for an increased noise floor, however the
AZ modulator can be designed so that most of the power in its
output sequence is at frequencies that are above the loop
bandwidth, and so are largely rejected by the loop.

Unlike in ripple counters, phase noise does not accumulate


with each stage in synchronous counters. Phase noise at the
output of a synchronous counter is independent of the number
of stages and consists only of the noise of its clock along with
the noise of the last stage.
The phase-domain model for the divider, based on (2), is
given in Listing 7. As before, one could use the noisejtable
function in lieu of the white_noise and flickerjnoise functions
to use the measured noise results directly.
Listing 7 Phase-domain divider noise model.
Include "discipline.h"
module divider(in, out);
input in; output out;
phase in, out;
parameter real ratio = 1 from (OAni);//divide ratio
parameter real n = 0 from [0:inf);
// white output phase noise (rads?/Hz)
parameter real fc = 0 from [0:inf);
// flicker noise corner frequency (Hz)

osc

analog begin
Theta(out) <+ Theta(in) / ratio;
Theta(out) <+ white_noise(n, "wpn")
+ flicker_noise(n*fc, 1, "fpn");
end
endmodule

/ref

CP

PFD

LF

VCO
/out

/J

FD
+N, N+l

Mod

Fig. 11. The block diagram of a fractional-N frequency synthesizer.


VII.

FRACTIONAL-N SYNTHESIS

One of the drawbacks of a traditional frequency synthesizer,


also known as an integer-N frequency synthesizer, is that the
output frequency is constrained to be N times the reference
frequency. If the output frequency is to be adjusted by changing N, which is constrained by the divider to be an integer,
then the output frequency resolution is equal to the reference
frequency. If fine frequency resolution is desired, then the reference frequency must be small. This in turn limits the loop
bandwidth as set by the loop filter, which must be at least 10
times smaller than the reference frequency to prevent signal
components at the reference frequency from reaching the
input of the VCO and modulating the output frequency, creating spurs or sidebands at an offset equal to the reference frequency and its harmonics. A low loop bandwidth is
undesirable because it limits the response time of the synthesizer to changes in N. In addition, the loop acts to suppress the
phase noise in the VCO at offset frequencies within its bandwidth, so reducing the loop bandwidth acts to increase the
total phase noise at the output of the VCO.
The constraint on the loop bandwidth imposed by the
required frequency resolution is eliminated if the divide ratio
N is not limited to be an integer. This is the idea behind fractional-N synthesis. In practice, one cannot directly implement

56

The phase-domain small-signal model for the combination of


a fractional-// divider and a AE modulator is given in
Listing 8. It uses the noisejtable function to construct a simple piece-wise linear approximation of the noise produced in
an rfi1 order AE modulator that is parameterized with the low
frequency noise generated by the modulator, along with the
corner frequency and the order.
VIII. JITTER

The signals at the input and output of a PLL are often binary
signals, as are many of the signals within the PLL. The noise
on binary signals is commonly characterized in terms of jitter.
Jitter is an undesired perturbation or uncertainty in the timing
of events. Generally, the events of interest are the transitions
in a signal. One models jitter in a signal by starting with a
noise-free signal v and displacing time with a stochastic process./. The noisy signal becomes
vn(0 = v(r+y(0)

(34)

withy assumed to be a zero-mean process and v assumed to be


a 7-periodic function, j has units of seconds and can be interpreted as a noise in time. Alternatively, it can be reformulated
as a noise is phase, or phase noise, using

between transitions. The next metric characterizes the correlations between transitions as a function of how far the transitions are separated in time.

Listing 8 Phase-domain fractional-N divider model.


include "discipline.h"
module divider(in, out);
input in; output out;
phase in, out;
parameter real ratio = 1 from (O:lnf);// divide ratio
parameter real n = 0 from [0:inf);
// white output phase noise (rads?/Hz)
parameter real fc = 0 from [0:inf);
// flicker noise corner frequency (Hz)
parameter real bw = 1 from (O:inf);//AX mod bandwidth
parameter integer order = 1 from (0:9);//AZ mod order
parameter real fmax = 10*bw from (bw:inf);
// maximum frequency of concern
analog begin
Theta(out) <+ Theta(in) / (ratio + noise_table([
0,
n,
bw,
n,
fmax,
n*pow((fmax/bw),order)
], "dsn"));
end
endmodule

Define Jk(i) to be the standard deviation of ti+k - th


7,(0 = Vvar(f,. + , - * , . ) .

(38)
1

Jk(i) is referred to as k-cycle jitter or long-term jitter '. It is a


measure of the uncertainty in the length of k cycles and has
units of time. 7j, the standard deviation of the length of a single period, is often referred to as the period jitter, and it
denoted J, where J = 7].
Another important jitter metric is cycle-to-cycle jitter. Define
7} = ft-+i - tx to be the period of cycle i. Then the cycle-tocycle jitter 7CC is
' c c = V Var < 7 '.- + l- 7 '*>-

<K0 = 27t/ o7 (0,

(35)

vn(0 = v(, + f | ) .

(36)

where/ o = 1/Fand

A. Jitter Metrics
Define {^} as the sequence of times for positive-going zero
crossings, henceforth referred to as transitions, that occur in
vn. The various jitter metrics characterize the statistics of this
sequence.
The simplest metric is the edge-to-edge jitter, 7 ee , which is
the variation in the delay between a triggering event and a
response event. When measuring edge-to-edge jitter, a clean
jitter-free input is assumed, and so the edge-to-edge jitter 7 ec
is

Cycle-to-cycle jitter is like edge-to-edge jitter in that it is a


scalar jitter metric that does not contain information about the
correlation in the jitter between distant transitions. However,
it differs in that it is a measure of short-term jitter that is relatively insensitive to long-term jitter [22]. As such, cycle-tocycle jitter is the only jitter metric that is suitable for use
when flicker noise is present. All other metrics are unbounded
in the presence of flicker noise.
If7(0 is either stationary or T-cyclostationary, then {t{\ is stationary, meaning that these metrics do not vary with i, and so
7 e e (0, J&(0> a n d Jcc(0 c a n b e shortened to 7 ee , Jk, and 7CC.
These jitter metrics are illustrated in Figure 12.

edge-to-edge jitter

Edge-to-edge jitter is also a scalar jitter metric, and it does not


convey any information about the correlation of the jitter

~ \

jeeu) = 7^8^j
fc-cycle jitter

Jk(i) = Jynr(tl + k-tii

(37)
Edge-to-edge jitter assumes an input signal, and so is only
defined for driven systems. It is an input-referred jitter metric,
meaning that the jitter measurement is referenced to a point
on a noise-free input signal, so the reference point is fixed. No
such signal exists in autonomous systems. The remaining jitter metrics are suitable for both driven and autonomous systems. They gain this generality by being self-referred,
meaning that the reference point is on the noisy signal for
which the jitter is being measured. These metrics tend to be a
bit more complicated because the reference point is noisy,
which acts to increase the measured jitter.

<39>

K_c

"*" p '
"~i

|I

H*
HI

cycle-to-cycle jitter

||

|i

k cycles
p

r~\

^\
~ti+k

H * ~H

'= jiTi+l-Ti) J l J l J U L f
Fig. 12. The various jitter metrics.
B. Types of Jitter
The type of jitter produced in PLLs can be classified as being
from one of two canonical forms. Blocks such as the PFD,
CP, and FD are driven, meaning that a transition at their output is a direct result of a transition at their input. The jitter
t Some people distinguish betweenfc-cyclejitter and long-term jitter
by defining the long-term jitter J^ as being thefc-cyclejitter Jk as
k ->.

57

exhibited by these blocks is referred to as synchronous jitter,


it is a variation in the delay between when the input is
received and the output is produced. Blocks such as the OSC
and VCO are autonomous. They generate output transitions
not as a result of transitions at their inputs, but rather as a
result of the previous output transition. The jitter produced by
these blocks is referred to as accumulating jitter, it is a variation in the delay between an output transition and the subsequent output transition. Table I previews the basic
characteristics of these two types of jitter. The formulas for
jitter given in this table are derived in the next two sections.
T A B L E I: THE TWO CANONICAL FORMS OF JITTER.

Jitter Type

Circuit Type

synchronous

driven
(pFD/cp ^

. .
accumulating |

autonomous
( Q S Q yep)

Period Jitter
,
/var( ( r ) )
J = ^ / < f t
|

Synchronous jitter is exhibited by driven systems. In the PLL,


the PFD/CP and FDs exhibit synchronous jitter. In these components, an output event occurs as a direct result of, and some
time after, an input event. It is an undesired fluctuation in the
delay between the input and the output events. If the input is a
periodic sequence of transitions, then the frequency of the
output signal is exactly that of the input, but the phase of the
output signal fluctuates with respect to that of the input. The
jitter appears as a modulation of the phase of the output,
which is why it is sometimes referred to as phase modulated
or PM jitter.

W) = ^

<>*(0 = Vvaraa + k)T+jsync(ti

+ k)]

ft-'i>'

- [iT+j^)])

< 43)

,(44)

y,(/) = 72varO sync (/ /) ).

(45)

Jk(i) = V27ee(0 .

(46)

Since 7Sync(0 is jT-cyclostationary ysync =; syn c('/) is independent of i, and so is 7 ee and Jk. The factor of 72 in (46) stems
from the length of an interval including the independent variation from two transitions. From (46), Jk is independent of ,
and so
Jk = J for k = 1,2, ...m.
(47)
Using similar arguments, one can show that with simple synchronous jitter,
Jcc = J,

(48)

A. Extracting Synchronous Jitter

IX. SYNCHRONOUS JITTER

v n (0 = Ht+jsync(O)

k = Vvar<'i +

Generally, the jitter produced by the PFD/CP and FDs is well


approximated by simple synchronous jitter if one can neglect
flicker noise.

,
7 = ToT

Let T| be a stationary or T-cyclostationary process, then

The jitter in driven blocks, such as the PFD/CP or FDs, occurs


because of an interaction between noise present in the blocks
and the thresholds that are inherent to logic circuits.
In systems where signals are continuous valued, an event is
usually defined as a signal crossing a threshold in a particular
direction. The threshold crossings of a noiseless periodic signal, v(0, are precisely evenly spaced. However, when noise is
added to the signal, vn(r) = v(i) + nv(t), each threshold crossing is displaced slightly. Thus, a threshold converts additive
noise to synchronous jitter.

The amount of displacement in time is determined by the


amplitude of the noise signal, nv(t) and the slew rate of the
(40) periodic signal, dv(tc)/dt, as the threshold is crossed, as shown
in Figure 13 [23]. If the noise /^ is stationary, then
(41)

var0
))s
(49)
exhibits synchronous jitter. If t| is further restricted to be a
white Gaussian stationary or T-cyclostationary process, then
vn(0 exhibits simple synchronous jitter. The essential charac- where tc is the time of a threshold crossing in v (assuming the
teristic of simple synchronous jitter is that the jitter in each noise is small).
event is independent or uncorrelated from the others, and (35)
shows that it corresponds to white phase noise. Driven cir'c
Av
cuits exhibit simple synchronous jitter if they are broadband
and if the noise sources are white, Gaussian and small. The
Noise
sources are considered small if the circuit responds linearly to
Threshold
Histogram
the noise, even though at the same time the circuit may be
responding nonlinearly to the periodic drive signal.

-^

^7^

For systems that exhibit simple synchronous jitter, from (37),


J

Jf>=
Similarly, from (38),

J<Jsynci-

At

(42)

Jitter Histogram

Fig. 13. How a threshold converts noise into jitter.

58

Generally nv is not stationary, but cyclostationary (refer back With ripple counters, one usually only characterizes one stage
to Section VI-A). It is only important to know when the noisy at a time. The total jitter due to noise in the ripple counter is
periodic signal vn(t) crosses the threshold, so the statistics of then computed by assuming that the jitter in each stage is
nv are only significant at the time when vn(t) crosses the independent (again, this is true for device noise, but not for
noise coupling into the divider from external sources) and
threshold,
taking
the square-root of the sum of the square of the jitter on
var(n ( O )
each stage.

"- < '< )) - T^pk>-

<50)

The jitter is computed from (42) using (49) or (50),


dv(tc)/dt

'

To compute var(nv(rc)), one starts by driving the circuit with a


representative periodic signal, and then sampling v(t) at intervals of 7to form the ergodic sequence {v(rl)} where tt = tc for
some i. Then the variance is computed by computing the
power spectral density for the sequence by integrating from
/ = ~/0/2 to/ o /2. Recall that the noise is periodic in/with
period/o = 1/7because n is a discrete-time sequence with rate
T.

Unlike in ripple counters, jitter does not accumulate with synchronous counters. Jitter in a synchronous counter is independent of the number of stages and consists only of the jitter of
its clock along with the jitter of the last stage.
2) Extracting the Jitter of the Phase Detector: The PFD/CP is
not followed by a threshold. Rather, it feeds into the LF,
which is sensitive to the noise emitted by the CP at all times,
not just during transitions. This argues that the noise of the
PFD/CP be modeled as a continuous noise current. However,
as mentioned earlier, doing so is problematic for simulators
and would require very tight tolerances and small time steps.
So instead, the noise of the PFD/CP is referred back to its
inputs. The inputs of the PFD/CP are edge triggered, so the
noise can be referred back as jitter.

In practice, this is done by using the strobed noise capability


of SpectreRF^ to compute the power spectral density of the
sequence. When the strobed noise feature is active, the noise To extract the input-referred jitter of a PFD/CP, drive both
produced by the circuit is periodically sampled to create a dis- inputs with periodic signals with offset phase so that the PFD/
crete-time random sequence, as shown in Figure 9. SpectreRF CP produces a representative output. Use SpectreRF's PNoise
then computes the power-spectral density of the sequence. analysis to compute the output noise over the total bandwidth
The sample time should be adjusted to coincide with the of the PFD/CP (in this case, use the conventional noise analydesired threshold crossings. Since the T-periodic cyclostation- sis rather than the strobed noise analysis). Choose the freary noise process is sampled every T seconds, the resulting quency range of the analysis so that the total noise at
noise process is stationary. Furthermore, the noise present at frequencies outside the range is negligible. Thus, the noise
should be at least 40 dB down and dropping at the highest fretimes other than at the sample points is completely ignored.
quency simulated. Integrate the noise over frequency and
1) Extracting the Jitter of Dividers: To extract the jitter of a apply Wiener-Khinchin Theorem [24] to determine
divider, drive the divider with a representative periodic input
var(n) = f Sn(f)df,
(53)
signal and perform a PSS analysis to determine the threshold
crossing times and the slew rate (dv/dt) at these times. Then
use SpectreRF's strobed PNoise analysis to compute (/). the total output noise current squared [19]. Then either calcuThe sample point should be set to coincide with the point late or measure the effective gain of the PFD/CP, K^cV Scale
where the output signal crosses the threshold of the subse- the gain so that it has the units of amperes per second. Then
quent stage (the phase detector) in the appropriate direction. divide the total output noise current by the gain and account
When running PNoise analysis, assure that the maxsidebands for there being two transitions per cycle to distribute the noise
parameter is set sufficiently large to capture all significant over to determine the input-referred jitter for the PFD/CP,
noise folding. A large value will slow the simulation. To
J
= T F**W
(54)
reduce the number of sidebands needed, use T as small as
ee
K }
PFD/cp
2nKdJ
2 '
possible. SpectreRF computes the power spectral density,
which is integrated to compute the total noise at the sample As before, when running PNoise analysis, assure that the
points,
maxsidebands parameter is set sufficiently large to capture all
significant noise folding. A large value will slow the simula//2
tion. To reduce the number of sidebands needed, use T as
var(nv(fc)) = J
Sn(f,tz)df.
(52) small as possible.
'o
oo

Then J^ is computed from (51).


X. ACCUMULATING JITTER

t The strobed-noise feature of SpectreRF is also referred to as its


timf-Hotnain nnisp. fpfltiire.

Accumulating jitter is exhibited by autonomous systems, such


as oscillators, that generate a stream of spontaneous output

59

transitions. In the PLL, the OSC and VCO exhibit accumulating jitter. Accumulating jitter is characterized by an undesired
variation in the time since the previous output transition, thus
the uncertainty of when a transition occurs accumulates with
every transition. Compared with a jitter free signal, the frequency of a signal exhibiting accumulating jitter fluctuates
randomly, and the phase drifts without bound. Thus, the jitter
appears as a modulation of the frequency of the output, which
is why it is sometimes referred to as frequency modulated or
FM jitter.
Again assume that T| be a stationary or T-cyclostationary process, then

W ) = f neorfc
J

(55)

Similarly,
Jcc = 727.

(59)

Generally, the jitter produced by the OSC and VCO are well
approximated by simple accumulating jitter if one can neglect
flicker noise.
A. Extracting Accumulating Jitter
The jitter in autonomous blocks, such as the OSC or VCO, is
almost completely due to oscillator phase noise. Oscillator
phase noise is a variation in the phase of the oscillator as it
proceeds along its limit cycle.
In order to determine the period jitter / of vn(f) for a noisy
oscillator, assume that it exhibits simple accumulating jitter
so that T| in (55) is a white Gaussian r-cyclostationary noise
process (this excludes flicker noise) with a power spectral
density of

v n (0 = v(/+y a c c (O)
(56)
exhibits accumulating jitter. While Tj is cyclostationary and so
S^(f)= a,
has bounded variance, (55) shows that the variance of y acc ,
and hence the phase difference between v(t) and v n (0, is and an autocorrelation function of
unbounded.
Rr](tvt2) = ab{tx-t2)y
If t| is further restricted to be a white Gaussian stationary or
T-cyclostationary random process, then v n (0 exhibits simple where 8 is a Kronecker delta function. Then

(60)

(61)

accumulating jitter. In this case, the process {yacc(*T)} that


AccW = f T\TW
(62>
results from sampling y a c c every T seconds is a discrete
Wiener process and the phase difference between v(/7) and
vn(/7) is a random walk [19]. As shown next, simple accumu- is a Wiener process [19], which has an autocorrelation funclating jitter corresponds to oscillator phase noise that results tion of
R
from white noise sources.
j (*!> l2> = amin(f 1? h^
( 63 >
The essential characteristic of simple accumulating jitter is
that the incremental jitter that accumulates over each cycle is The period jitter is the standard deviation of the variation in
independent or uncorrelated. Autonomous circuits exhibit one period, and so
Jl
simple accumulating jitter if they are broadband and if the
= 0'acc('+7Wacc).
(64)
noise sources are white, Gaussian and small. The sources are
^2 = E [ 0 a c c 0 + 7 ) - j a c c ( 0 ) 2 ]
(65)
considered small if the circuit responds linearly to the noise,
though at the same time the circuit may be responding nonlin2
2
2
J = E[/ acc (r + T) - 2jacc(t + 7); acc (0 +; a c c (0 ]
(66)
early to the oscillation signal. An autonomous circuit is considered broadband if there are no secondary resonant Jl = EL/acc + 7) 2 ] " 2 Et/ a c c (/ + 7)y acc (/)] + E[/ a c c (0 2 ] (67)
responses close in frequency to the primary resonance.*
J2 = R. (t + T,t + T)-2Rj
(/+7W) + * / . (M) (68)
For systems that exhibit simple accumulating jitter, each tran'ace
'ace
'ace
sition is relative to the previous transition, and the variation in
J2 = a(t + T) - 2at + at
(69)
the length of each period is independent, so the variance in
/ = Jaf
(70)
the time of each transition accumulates,
Jk= 4~kJ for k = 0, 1 , 2 , . . . ,
(57) We now have a way of relating the jitter of the oscillator to the
PSD of T|. However, x\ is not measurable, so instead the jitter
where
is related to the phase noise S. To do so, consider simple
accumulating
jitter written in terms of phase,
J = ^varO^. +^-varO-^,.)).
(58)
accW = 2nfohcc^
= 2%fo hOOrfC,
t Oscillators are strongly nonlinear circuits undergoing large periodic variations, and so signals within the oscillator freely mix up and
down in frequency by integer multiples of the oscillation frequency. where/ = 1/r. From (60) and (71) the PSD of ^
For this reason, any low frequency time constants or resonances in
(2rc/o)2 _ aft
supply or bias lines would effectively act like close-in secondary res5* (A/) = a
onances. In fact, this is the most likely cause of such phenomenon.
^acc
(2nAf)2
A/ 2 '

60

(71)
is
(72)

From (26)

XI. JITTER OF A PLL

If a PLL synthesizer is constructed from blocks that exhibit


simple synchronous and accumulating jitter, then the jitter
2 ^acc
2A/
behavior of the PLL is relatively easy to estimate [26].
a = 2UAf)^
.
(74) Assume that the PLL has a closed-loop bandwidth of/ L , and
that x L = l/2rc/L, then for k such that kT T L , jitter from the
VCO dominates and the PLL exhibits simple accumulating
Determine a by choosing A/well above the corner frequency,
jitter equal to that produced by the VCO. Similarly, at large k
t0
/comer avoid ambiguity and well below/ o to avoid the noise
(low frequencies), the PLL exhibits simple accumulating jitter
from other sources that occur at these frequencies.
equal to that produced by the OSC. Between these two
1) Example: To compute the jitter of an oscillator, an RF sim- extremes, the PLL exhibits simple synchronous jitter. The
ulator such as SpectreRF is used to find L &ndfo of the oscil- amount of which depends on the characteristics of the loop
lator. Given these, a is found with (74), J is found with (70) and the level of synchronous jitter exhibited by the FDs and
and Jk is found with (57). This procedure is demonstrated for the PFD/CP. The behavior of such a PLL is shown in
the oscillator shown in Figure 14. This is a very low noise Figure 15.
oscillator designed in O.35JI CMOS by of Rael and Abidi
Accumulating jitter
[25]. The frequency of oscillation is 1.1 GHz and the resonafrom OSC
tor has a loaded Q of 6.
Accumulating jitter

UAf) = fa (A/) = ^ - ,

(73)

logC/*)

from VCO ^
Synchronous jitter from
PFD/CP, FDs

AJ

log(*)

Fig. 15. Long-term jitter (Jk) for an idealized PLL as a function of


the number of cycles.

' <j -|jK<-t-. ; ' '

XII. MODELING A PLL WITH JITTER

'DD
Fig. 14. Differential LC oscillator.
The procedure starts by using an RF simulator such as SpectreRF to compute the normalized phase noise L. Its PNoise
analysis is used, with the maxsidebands parameter set to at
least 10 to adequately account for noise folding within the
oscillator.* In this case, = - 1 1 0 dBc at 100 kHz offset from
the carrier. Apply (74) to compute a from L, where ( A/) =
10"11, A/= 100 kHz, and/ 0 = 1.1 GHz,
a = 2 10" 11

1Q

= 165.3X10" 21 .

(75)

Vl.lxlOV
The period jitter J is then computed from (70),
fa
/165.3 x 10~ 21
,nf-,
1O~.
/ - = /
= 12.3 fs.
(76)
^/0
A/ 1.1 GHz
In this example, the noise was extracted for the VCO alone. In
practice, the LF is generally combined with the VCO before
extracting the noise so that the noise of the LF is accounted
for.
/
r*
J = JaT =

The basic behavioral models for the blocks that make up a


PLL are well known and so will not be discussed here in any
depth [27, 28]. Instead, only the techniques for adding jitter to
the models are discussed.
Jitter is modeled in an AHDL by dithering the time at which
events occur. This is efficient because it does not create any
additional activity, rather it simply changes the time when
existing activity occurs. Thus, models with jitter can run as
efficiently as those without.
A. Modeling Driven Blocks
A feature of Verilog-A allows especially simple modeling of
synchronous jitter. The transitionQ function, which is used to
model signal transitions between discrete levels, provides a
delay argument that can be dithered on every transition. The
delay argument must not be negative, so a fixed delay that is
greater than the maximum expected deviation of the jitter
must be included. This approach is suitable for any model that
exhibits synchronous jitter and generates discrete-valued outputs. It is used in the Verilog-A divider module shown in
Listing 9, which models synchronous jitter with (41) where
7 sync *s a stationary white discrete-time Gaussian random process. It is also used in Listing 10, which models a simple
PFD/CP.

t At one point it was mistakenly suggested in the documentation for


SpectreRF that maxsidebands should be set to 0 for oscillators. This
causes SpectreRF to ignore all noise folding and results in a signifi- 1) Frequency Divider Model: The model, given in Listing 9,
cant underestimation of the total noise.
operates by counting input transitions. This is done in the

61

Listing 9 Frequency divider that models synchronous jitter.

Listing 10 PFD/CP model with synchronous jitter.


include "discipline.h"

include "discipline.h"
module divider (out, in);

module pfd_cp (out, ret, vco);

input in; output out; electrical in, out;

input ref, vco; output out; electrical ref, vco, out;

parameter real Vlo=-1, Vhi=1;


parameter integer ratio=2 from [2:inf);
parameter integer dir=1 from [-1:1] exclude 0;
//dir=1 for positive edge trigger
//dir=-1 for negative edge trigger
parameter real tt=1n from (0:inf);
parameter real td=O from (0:inf);
parameter real jitter=O from [0:td/5);//edge-to-edge jitter
parameter real ttol=1p from (0:td/5);// ttoi jitter

parameter real lout=100u;


parameter integer dir=1 from [-1:1] exclude 0;
//dir=1 for positive edge trigger
//dir=-1 for negative edge trigger
parameter real tt=1n from (0:inf);
parameter real td=O from (0:inf);
parameter real jitter=O from [0:\d/5);//edge-to-edge jitter
parameter real ttol=1p from (0:td/5);//tfo/jitter

integer count, n, seed;


real dt;
analog begin
@(initial_step) seed = -311 ;
@(cross(V(in) - (Vhi + Vlo)/2, dir, ttol)) begin
//count input transitions
count = count + 1;
if (count >= ratio)
count = 0;
n = (2*count >= ratio);
//add jitter
dt = jitter*$dist_normal(seed,0,1);
end
V(out) <+ transition^ ? Vhi: Vlo, td+dt, tt);
end
endmodule

integer state, seed;


real dt;
analog begin
@(initiaLstep) seed = 716;
@ (cross(V(ref), dir, ttol)) begin
if (state > -1) state = state - 1 ;
dt = jitter*$dist_normal(seed,0,1);
end
@(cross(V(vco), dir, ttol)) begin

if (state < 1) state = state + 1;


dt = jitter*$dist_normal(seed,0,1);
end
l(out) <+ transition(lout*state, td + dt, tt);
end
endmodule

input in the direction dir, the output is decremented. If both


the VCO and reference inputs are at the same frequency, then
@ cross block. The cross function triggers the @ block at the
the average value of the output is proportional to the phase
precise moment when its first argument crosses zero in the
difference between the two, with the average being negative if
direction specified by the second argument. Thus, the @
the reference transition leads the VCO transition and positive
block is triggered when the input crosses the threshold in the
otherwise [3]. As before, the time of the output transitions is
user specified direction. The body of the @ block increments
randomly dithered by dt to model jitter. The output is modthe count, resets it to zero when it reaches ratio, then detereled as an ideal current source and a finite transition time promines if count is above or below its midpoint (n is zero if the
vides a simple model of the dead band in the CP.
count is below the midpoint). It also generates a new random
dither dT that is used later. Outside the @ block is code that
B. Modeling Accumulating Jitter
executes continuously. It processes n to create the output. The
value of the ?: operator is Vhi if n is 1 and Vlo if n is 0. Finally, 1) OSC Model: The delay argument of the transition^) functhe transition function adds a finite transition time of tt and a tion cannot be used to model accumulating jitter because of
delay of td + dt. The finite transition time removes the discon- the accumulating nature of this type of jitter. When modeling
tinuities from the signal that could cause problems for the a fixed frequency oscillator, the timerQ function is used as
simulator. The jitter is embodied in dt, which varies randomly shown in Listing 11. At every output transition, the next tranfrom transition to transition. To avoid negative delays, td must sition is scheduled using the timerQ function to be
always be larger than dt. This model expects jitter to be speci- T/K + Jb/Jk in the future, where 8 is a unit-variance zerofied as igg, as computed with (51).
mean random process and K is the number of output transitions per period. Typically, K = 2.
2) PFD/CP Model: The model for a phase/frequency detector
combined with a charge pump is given in Listing 10. It imple- C. VCO Model
ments a finite-state machine with a three-level output, - 1 , 0
and +1. On every transition of the VCO input in direction dir, A VCO generates a sine or square wave whose frequency is
the output is incremented. On every transition of the reference proportional to the input signal level. VCO models, given in

62

Listing 11 Fixed frequency oscillator with accumulating jitter.

AT isis aa random
random variable
variable with
with variance
variance
AT

include "discipline.h"
module osc (out);

var(AT) = 2 j p = Jl.

(78)

75,Axf. = lz

(79)

Therefore,

output out; electrical out;


parameter real freq=1 from (0:inf);
parameter real Vlo==-1, Vhi=1;

JK

where 8 is a zero-mean unit-variance Gaussian random process. The dithered frequency is

parameter real tt=O.O1/freq from (O:inf);

parameter real jitter=O from [0:0M1req);// period jitter


integer n, seed;
real next, dT;

f - if1^ fi

analog begin
@ (initiaLstep) begin
seed = 286;
next = 0.5/freq + $abstime;
end

Y,]

ram
(

''-ursj.-

<81>

Finally varCr,) = J2/K, so AT,- = JS/Jk

and AT1- = jKJdr

The final model given in Listing 12. This model can be easily
modified to fit other needs. Converting it to a model that generates sine waves rather than square waves simply requires
replacing the last two lines with one that computes and outputs the sine of the phase. When doing so, consider reducing
the number of jitter updates to one per period, in which case
the factor of 1.414 should be changed to 1.
Listing 13 is a Verilog-A model for a quadrature VCO that
exhibits accumulating jitter. It is an example of how to model
an oscillator with multiple outputs so that the jitter on the outputs is properly correlated.
D. Efficiency of the Models

Vv JU

Conceptually, a model that includes jitter should be just as


efficient as one that does not because jitter does not increase
the activity of the models, it only affects the timing of particular events. However, if jitter causes two events that would normally occur at the same time to be displaced so that they are
no longer coincident, then a circuit simulator will have to use
more time points to resolve the distinct events and so will run
more slowly. For this reason, it is desirable to combine jitter
sources to the degree possible.

mod 2rc
"Knit

75
Fig. 16.
jitter.

Ax, " l + t f A V c

The @ cross statement is used to determine the exact time


when the phase crosses the thresholds, indicating the beginning of a new interval. At this point, a new random trial S is
generated.

Listings 12 and 13, are constructed using three serial operations, as shown in Figure 16. First, the input signal is scaled to
compute the desired output frequency. Then, the frequency is
integrated to compute the output phase. Finally, the phase is
used to generate the desired output signal. The phase is computed with idtmod, a function that provides integration followed by a modulus operation. This serves to keep the phase
bounded, which prevents a loss of numerical precision that
would otherwise occur when the phase became large after a
long period of time. Output transitions are generated when the
phase passes -n/2 and n/2.

Vin

fc

Let A 7 . = ATAT. , then

V(out) <+ transition^ ? Vhl: Vlo, 0, tt);


end
endmodule

i +

@(timer(next)) begin
n = !n;
dT = jitter*$dist_normal(seed,0,1);
next = next + 0.5/freq + 0.707*dT;
end

0)

~ K\% + ti%) ~

Kx

Block diagram of VCO behavioral model that includes

The jitter is modeled as a random variation in the frequency


of the VCO. However, the jitter is specified as a variation in
the period, thus it is necessary to relate the variation in the
period to the variation in the frequency. Assume that without
jitter, the period is divided into K equal intervals of duration T
= T/K = l/Kf0. The frequency deviation will be updated
every interval and held constant during the intervals. With jitter, the duration of an interval is
%. = T + AT..
(77)

To make the HDL models even faster, rewrite them in either


Verilog-HDL or Verilog-AMS. Be sure to set the time resolution to be sufficiently small to prevent the discrete nature of
time in these simulators from adding an appreciable amount
of jitter.
1) Including Synchronous Jitter into OSC: One can combine
the output-referred noise of F D ^ and FD^ and the input-

63

Listing 12 VCO model that includes accumulating jitter.


include "discipline.h"
include "constants.h"
module vco (out, in);
input in; output out; electrical out, in;
parameter
parameter
parameter
parameter
parameter
parameter
parameter
parameter

real Vmin=0;
real Vmax=Vmin+1 from (Vmin:inf);
real Fmin=1 from (Orinf);
real Fmax=2*Fmin from (Fmin:inf);
real Vlo=-1, Vhi=1;
real tt=0.01/Fmax from (O:inf);
real jitter=O from [0:0.25/Fmax);// period jitter
real ttol=1u/Fmax from (0:1/Fmax);

real freq, phase, dT;


integer n, seed;
analog begin
(initlaLstep) seed = - 5 6 1 ;
//compute the freq from the input voltage
freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin)
+ Fmin;

Listing 13 Quadrature Differential VCO model that includes


accumulating jitter.
include "discipline.h"
Include "constants.h"
module quadVco (Plout.Nlout, PQout,NQout, Pin,Nin);
electrical Plout, Nlout, PQout, NQout, Pin, Nin;
output Plout, Nlout, PQout, NQout;
input Pin, Nin;
parameter
parameter
parameter
parameter
parameter
parameter
parameter
parameter

real freq, phase, dT;


integer i, q, seed;
analog begin
@(initial_step) seed = 133;
//compute the freq from the input voltage
freq = (V(Pin.Nin) - Vmin) * (Fmax - Fmin) / (Vmax - Vmin)
+ Fmin;

//bound the frequency (this is optional)


if (freq > Fmax) freq = Fmax;
if (freq < Fmin) freq = Fmin;

//bound the frequency (this is optional)


if (freq > Fmax) freq = Fmax;
if (freq < Fmin) freq = Fmin;

/ / add the phase noise


freq = f req/(1 + dT*freq);
//phase is the integral of the freq modulo 2K
phase = 2*^M_PI*idtmod(freq, 0.0,1.0, -0.5);

/ / add the phase noise


freq = freq/(1 + dT*freq);

/ / update jitter twice per period


// 1A14=sqrt(K), K=2 jitter updates/period
@(cross(phase + *M_PI/2, + 1 , ttol) or
cross(phase - 'M_PI/2, +1, ttol)) begin
d T = 1.414*jitter*$dist_jiormal(seed,0,1);
n = (phase >= - M_PI/2) && (phase < 'M_PI/2);
end

//phase is the integral of the freq modulo 2K


phase = 2* % MJ D l*idtmod(freq, 0.0,1.0, -0.5);
// update jitter where phase crosses n/2
//2=sqrt(K), K=4 jitter updates per period
@(cross(phase - 3**M_PI/4, +1, ttol) or
cross(phase - x M_PI/4, + 1 , ttol) or
cross(phase + 'lvLPI/4, + 1 , ttol) or
cross(phase + 3**M__PI/4, +1, ttol)) begin
dT = 2*jitter*$dist_normal(seed,0,1);
I = (phase >= -3*^M_PI/4) && (phase < %M_PI/4);
q = (phase >= - M_PI/4) && (phase < 3*%M_PI/4);
end

//generate the output


V(out) <+ transition^ ? Vhi: Vlo, 0, tt);
end
endmodule
referred noise of the PFD/CP with the output noise of OSC. A
modified fixed-frequency oscillator model that supports two
jitter parameters and the divide ratio M is given in Listing 14
(more on the effect of the divide ratio on jitter in the next section). The accJitter parameter is used to model the accumulating jitter of the reference oscillator, and the syncJitter
parameter is used to model the synchronous jitter of FD^,
FDN and PFD/CP. Synchronous jitter is modeled in the oscillator without using a nonzero delay in the transition function.
This is a more efficient approach because it avoids generating
two unnecessary events per period. To get full benefit from
this optimization, a modified PFD/CP given in Listing 15 is
used. This model runs more efficiently by removing support
for jitter and the td parameter.

real Vmin=0;
real Vmax=Vmin+1 from (Vmin:inf);
real Fmin=1 from (O:inf);
real Fmax=2*Fmin from (Fminrinf);
real Vlo=-1, Vhi=1;
real jitter=O from [0:0.25/Fmax);// period jitter
real ttol=1u/Fmax from (0:1/Fmax);
real tt=0.01/Fmax;

//generate the I and Q outputs


V(Plout) <+ transition(i ? Vhi: Vlo, 0, tt);
V(Nlout) <+ transition^ ? Vlo: Vhi, 0, tt);
V(PQout) <+ transition^ ? Vhi: Vlo, 0, tt);
V(NQout) <+ transition(q ? Vlo : Vhi, 0, tt);
end
endmodule
2) Merging the VCO and FDN: If the output of the VCO is
not used to drive circuitry external to the synthesizer, if the
divider exhibits simple synchronous jitter, and if the VCO
exhibits simple accumulating jitter, then it is possible to
include the frequency division aspect of the FD^ as part of the

64

Listing 14 Fixed-frequency oscillator with accumulating and


synchronous jitter.

Listing 15 PFD/CP without jitter.


include "discipline.h"

include "discipline.h"

module pfd_cp (out, ref, vco);

module osc (out);

input ref, vco; output out; electrical ref, vco, out;

output out; electrical out;


parameter real freq=1 from (0:inf);
parameter real ratio=1 from (0:inf);
parameter real Vlo=-1, Vhi=1;
parameter real tt=0.01*ratio/freq from (0:inf);
parameter real accJitter=O from [O:O.1/freq); //period jitter
parameter real syncJitter=O from [0:0.1 *ratlo/freq);
// edge-to-edge jitter

analog begin
@(initial_step) begin
accSeed = 286;
syncSeed = -459;
accSD = accJltter*sqrt(ratio/2);
syncSD = syncJitter;
next = 0.5/freq + $abstime;
end

l(out) <+ transition(lout * state, 0, tt);


end
endmodule

@(timer(next + dt)) begin


n = !n;
dT = accSD*$dist_normal(accSeed,0,1);
dt = syncSD*$dist_normal(syncSeed,0,1);
next = next + 0.5*ratio/freq + dT;
end

Thus, to merge the divider into the VCO, the VCO gain must
be reduced by a factor of N, the period jitter increased by a
factor of JN , and the divider model removed.
After simulation, it is necessary to refer the computed results,
which are from the output of the divider, to the output of
VCO, which is the true output of the PLL. The period jitter at
the output of the VCO, Jyco* c a n ^ e computed with (82).
To determine the effect of the divider on 5^(0)), square both
sides of (82) and apply (70)

V(out) <+ transition^ ? Vhi: Vlo, 0, tt);


end
endmodule
VCO by simply adjusting the VCO gain and jitter. If the
divide ratio of FDN is large, the simulation runs much faster
because the high VCO output frequency is never generated.
The Verilog-A model for the merged VCO and FDN is given
in Listing 16. It also includes code for generating a logfile
containing the length of each period. The logfile is used in
Section XIII when determining 5VCO the power spectral density of the phase of the VCO output.
Recall that the synchronous jitter of F D M and FD# has
already been included as part of OSC, so the divider model
incorporated into the VCO is noiseless and the jitter at the
output of the noiseless divider results only from the VCO jitter. Since the divider outputs one pulse for every N pulses at
its input, the variance in the output period is the sum of the
variance in N input periods. Thus, the period jitter at the output, /prj, is JN times larger than the period jitter at the input,
or

'FD

= JN+VCO-

integer state;
analog begin
@(cross(V(ref), dir, ttol)) begin
If (state > -1) state = state - 1;
end
@(cross(V(vco), dir, ttol)) begin
if (state < 1) state = state + 1;
end

integer n, accSeed, syncSeed;


real next, dT, dt, accSD, syncSD;

JVcO'

parameter real lout=100u;


parameter integer dir=1 from [-1:1] exclude 0;
//dir= 1 for positive edge trigger
// dir = -1 for negative edge trigger
parameter real tt=1n from (0:inf);
parameter real ttol=1p from (0:inf);

(82)

fl

FDrFD

aa
T
VCO7VCO

(83)

TvcO=TFD/N> and so

(84)

vco

From (72),

s^ VCO-r2

- ^FD f 2

(85)

Jvco
Finally,/vco = W/ FD , and so
^VCO2^

5prj>.

(86)

Once FDN is incorporated into the VCO, the VCO output signal is no longer observable, however the characteristics of the
VCO output are easily derived from (82) and (86), which are
summarized in Table II.
It is interesting to note that while the frequency at the output
of FDN is N times smaller than at the output of the VCO,
except for scaling in the amplitude, the spectrum of the noise
close to the fundamental is to a first degree unaffected by the
presence of FD#. In particular, the width of the noise spec-

65

T A B L E II: CHARACTERISTICS OF V C O OUTPUT RELATIVE TO THE

Listing 16 VCO with FD N .

OUTPUT OF FD/v ASSUMING THE V C O EXHIBITS SIMPLE


ACCUMULATING JITTER AND THE FDyy IS NOISE FREE.

Include "discipline.h"
module vco (out, in);

Frequency

Jitter

input in; output out; electrical out, in;


parameter
parameter
parameter
parameter
parameter
parameter
parameter
parameter

real
real
real
real
real
real
real
real

Vmin=0;
Vmax=Vmin+1 from (Vmin:inf);
Fmin=1 from (0:inf);
Fmax=2*Fmin from (Fminrinf);
ratio=1 from (0:inf);
Vlo=-1, Vhi=1;
tt=O.O1 *ratio/Fmax from (0:inf);
jitter=O from [0:0.25*ratio/Fmax);
//VCO period jitter
parameter real ttol=1u*ratio/Fmaxfrom (O:ratio/Fmax);
parameter real outStart=inf from (1/Fmin:inf);
real freq, phase, dT, delta, prev, Vout;
integer n, seed, fp;
analog begin
@ (initial_step) begin
seed = - 5 6 1 ;
delta = jitter * sqrt(2*ratio);
fp = $fopen("periods.m");
Vout = Vlo;
end

/vco

//apply the frequency divider, add the phase noise


freq = (freq / ratio)/(1 + dT * freq / ratio);
//phase is the integral of the freq modulo 1
phase = idtmod(freq, 0.0,1.0, -0.5);
/ / update jitter twice per period
@(cross(phase - 0.25, +1, ttol)) begin
dT = delta * $dist_normal(seed, 0,1);
Vout = Vhi;
end
@(cross(phase + 0.25, +1, ttol)) begin
dT = delta * $dist_normal(seed, 0,1);
Vout = Vlo;
if ($abstime >= outStart) $fstrobe( fp, "%0.10e",
$abstime - prev);
prev = $abstime;
end
V(out) <+ transition(Vout, 0, tt);
end
endmodule

trum is unaffected by FD#. This is extremely fortuitous,


because it means that the number of cycles we need to simulate is independent of the divide ratio N. Thus, large divide
ratios do not affect the total simulation time.

^/FD

j
V C 0

-'*>
"
^

54
=
^vco

N2S,

VFD

To understand why FD# does not affect the width of the noise
spectrum, recall that while we started with a jitter that varied
continuously with time, j(t) in (34), for either efficiency or
modeling reasons we eventually sampled it to end up with a
discrete-time version. The act of sampling the jitter causes the
spectrum of the jitter to be replicated at the multiples of the
sampling frequency, which adds aliasing. This aliasing is visible, but not obvious, at high frequencies in Figure 18. However, especially with accumulating jitter, the phase noise
amplitude at low frequencies is much larger than the aliased
noise, and so the close-in noise spectrum is largely unaffected
by the sampling. The effect of FD^ is to decimate the sampled
jitter by a factor of TV, which is equivalent to sampling the jitter signal, yCX at the original sample frequency divided by N.
Thus, the replication is at a lower frequency, the amplitude is
lower, and the aliasing is greater, but the spectrum is otherwise unaffected.

//compute the freq from the input voltage


freq = (V(in) - Vmin)*(Fmax - Fmin) / (Vmax - Vmin)
+ Fmin;
//bound the frequency (this is optional)
if (freq > Fmax) freq = Fmax;
if (freq < Fmin) freq = Fmin;

Phase Noise

XIII. SIMULATION AND ANALYSIS

The synthesizer is simulated using the netlist from Listing 18


and the Verilog-A descriptions in Listings 14-16, modifying
them as necessary to fit the actual circuit. The simulation
should cover an interval long enough to allow accurate Fourier analysis at the lowest frequency of interest {Fm^. With
deterministic signals, it is sufficient to simulate for K cycles
after the PLL settles if F m i n = \I(TK). However, for these signals, which are stochastic, it is best to simulate for \0K to
100AT cycles to allow for enough averaging to reduce the
uncertainty in the result.
One should not simply apply an FFT to the output signal of
the VCO/FDyy to determine (A/) for the PLL. The result
would be quite inaccurate because the FFT samples the waveform at evenly spaced points, and so misses the jitter of the
transitions. Instead, -(40 can be measured with Spectre's
Fourier Analyzer, which uses a unique algorithm that does
accurately resolve the jitter [11]. However, it is slow if many
frequencies are needed and so is not well suited to this application.
Unlike HAf), S^(Af) can be computed efficiently. The Verilog-A code for the VCO/FDN given in Listing 16 writes the
length of each period to an output file named periods.m. Writing the periods to the file begins after an initial delay, specified using outStart, to allow the PLL to reach steady state.
This file is then processed by Matlab from Math Works using
the script shown in Listing 17. This script computes S^(Af),

66

the power spectral density of <|), using Welch's method [28].


The frequency range is from/ out /2 to/out/nfft. The script cornListing 17 Matlab script used for computing S^Af). These results
must be further processed using Table II to map them to the output of
the VCO.
% Process period data to compute S^(Af)
echo off;
nfft=512;
% should be power of two
winLength=nfft;
overlap=nfft/2;
winNBW=1.5;
% Noise bandwidth given in bins
% Load the data from the file generated by the VCO
load periods.m;
% output estimates of period and jitter
T=mean(periods);
J=std(periods);
maxdT = max(abs(periods-T))/T;
fprintf(T = %.3gs, F = %.3gHz\n',T, 1/T);
fprintf('Jabs = %.3gs, Jrel = %.2g%%\n\ J, 100*J/T);
fprintf('max dT = %.2g%%\n\ 100*maxdT);
fprintf('periods = %d, nfft = %d\n\ length(periods), nfft);
% compute the cumulative phase of each transition
phases=2*pi*cumsum(periods)/T;
% compute power spectral density of phase
[Sphi,f]=psd(phases,nfftl1/T,winLength,overlap,'linear>);

XIV.

EXAMPLE

These ideas were applied to model and simulate a PLL acting


as a frequency synthesizer. A synthesizer was chosen with/ ref
= 25 MHz,/ 0 U t = 2 GHz, and a channel spacing of 200 kHz.
As such, M = 125 and N = 10,000.
The noise of OSC is -95 dBc/Hz at 100 kHz. Applying (74)
to compute a, where HAf) = 316 x 10"12, A/ = 100 kHz, and
fo = 25 MHz, gives a = 10"14. The period jitter J is then computed from (70), giving J = 20 ps.
The noise of VCO is -48 dBc/Hz at 100 kHz. Applying (74)
and (70) with (4/*) = 1.59 x 10"5, A/ = 100 kHz, and/ 0 =
2 GHz, gives a = 7.9 x 10~14 and an period jitter of J = 6.3 ps.
The period jitter of the PFD/CP and FDs was found to be
2 ns. The FDs were included into the oscillators, which suppresses the high frequency signals at the input and output of
the synthesizer. The netlist is shown in Listing 18. The results
(compensated for non-unity resolution bandwidth (-28 dB)
and for the suppression of the dividers (80 dB)) are shown in
Figures 17-20. The simulation took 7.5 minutes for 450k
time-points on a HP 9000/735. The use of a large number of
time points was motivated by the desire to reduce the level of
uncertainty in the results. The period jitter in the PLL was
found to be 9.8 ps at the output of the VCO.
Listing 18 Spectre netlist for PLL synthesizer.

% correct for scaling in PSD due to FFT and window


Sphi=winNBW*Sphi/nffi;

//PLL-based frequency synthesizer that models jitter


simulator lang=spectre

% plot the results (except at DC)


K = length(f);
semi!ogx(f(2:K),10*log10(Sphi(2:K)));
title('Power Spectral Density of VCO Phase');
xlabel('Frequency (Hz)');
ylabel('S phi (dB/Hz)');
rbw = winNBW/(T*nfft);
RBW=sprintf('Resolution Bandwidth = %.0f Hz (%.0f dB)\
rbw, 10*log10(rbw));
imtext(0.5,0.07, RBW);

ahdijnclude "osc.va"
//Listing 14
ahdLJnclude "pfd_cp.va" //Listing 15
ahdl_include "vco.va"
//Listing 16
Osc

freq=25MHz ratio=125\
accJitter=20ps syncJitter=2ns
PFD (err in fb) pfd__cp
lout=500ua
C1
(errc)
capacitor c=3.125nF
R
(c 0)
resistor
r=10k
C2
(c 0)
capacitor c=625pF
VCO (fb err)
vco
Fmin=1 GHz Fmax=3GHz \
Vmin=-4 Vmax=4 ratio=10000 \
jitter=6ps outStart=10ms

putes Sty(Af) with a resolution bandwidth of rbw.^ Normally,


S$(&f) is given with a unity resolution bandwidth. To compensate for a non-unity resolution bandwidth, broadband signals such as the noise should be divided by rbw. Signals with
bandwidth less than rbw, such as the spurs generated by leakage in the CP, should not be scaled. The script processes the
output of VCO/FD^. The results of the script must be further
processed using the equations in Table II to remove the effect
ofFDtf.

t The Hanning window used in the psd() function has a resolution


bandwidth of 1.5 bins [29]. Assuming broadband signals, Matlab
divides by 1.5 inside psd() to compensate. In order to resolve narrowband signals, the factor of 1.5 is removed by the script, and instead
included in the reported resolution bandwidth.

(in)

JitterSim

Osc&
+ 125

osc

tran

in

stop=60ms

PFD & CP
fb

err

VCO&
+ 10,000

r I

The low-pass filter LF blocks all high frequency signals from


reaching the VCO, so the noise of the phase lock loop at high
frequencies is the same as the noise generated by the openloop VCO alone. At low frequencies, the loop gain acts to stabilize the phase of the VCO, and the noise of the PLL is dom-

67

-10

VCO-OL

-20

-10

^-30

5 -40

I-

OL

S-50

^
t o- 3 0

*-

FD/CP,FD-OL

OSC-Ol>
-50

-80

300 Hz

-40

CL

-70

PLL-CL

1kHz 3 kHz

10 kHz 30 kHz

Fig. 17. Noise of the closed-loop PLL at the output of the VCO
when only the reference oscillator exhibits jitter (CL) versus the
noise of the reference oscillator mapped up to the VCO frequency
when operated open loop (OL).

1 kHz

3 kHz

10 kHz

30 kHz

100 kHz

Fig. 20. Closed-loop PLL noise performance compared to the openloop noise performance of the individual components that make up
the PLL. The achieved noise is slightly larger than what is expected
from the components due to peaking in the response of the PLL.

the loop. In this example, noise at the middle frequencies is


dominated by the synchronous jitter generated by the PFD/
CO and FDs. The measured results agree qualitatively with
the expected results. The predicted noise is higher than one
would expect solely from the open-loop behavior of each
block because of peaking in the response of the PLL from 5
kHz to 50 kHz. For this reason, PLLs used in synthesizers
where jitter is important are usually overdamped.

0
OL

-10

300 Hz

100 kHz

m-20

-30
XV.

CL
-40

300 Hz

1kHz 3 kHz

10 kHz 30 kHz

100 kHz

Fig. 18. Noise of the closed-loop PLL at the output of the VCO
when only the VCO exhibits jitter (CL) versus the noise of the VCO
when operated open loop (OL).

-25
-30
^.-35
N

5 -40
CO

OL

X
\

CL

3-45

-55

VV

A methodology for modeling and simulating the phase noise


and jitter performance of phase-locked loops was presented.
The simulation is done at the behavioral level, and so is efficient enough to be applied in a wide variety of applications.
The behavioral models are calibrated from circuit-level noise
simulations, and so the high-level simulations are accurate.
Behavioral models were presented in the Verilog-A language,
however these same ideas can be used to develop behavioral
models in purely event-driven languages such as VerilogHDL and Verilog-AMS. This methodology is flexible enough
to be used in a broad range of applications where phase noise
and jitter is important.
REFERENCES

[1] Ken Kundert. "Introduction to RF simulation and its application." Journal ofSolid-State Circuits, vol. 34, no. 9,
September 1999.

-60
1kHz

10 kHz

CONCLUSION

[2] Cadence Design Systems. "SpectreRF simulation option." www.cadence.com/datasheets/spectrerf.html.


[3] F. Gardner. Phaselock Techniques. John Wiley & Sons,
1979.

100kHz

Fig. 19. Noise of the closed-loop PLL at the output of the VCO
when only the PFD/CP, FDM, and FD^ exhibit jitter (CL) versus the
noise of these components mapped up to the VCOfrequencywhen
operated open loop (OL).

[4] D. Yee, C. Doan, D. Sobel, B. Limketkai, S. Alalusi, and


R. Brodersen. "A 2-GHz low-power single-chip CMOS
receiver for WCDMA applications." Proceedings of the
European Solid-State Circuits Conference, Sept. 2000.

inated by the phase noise of the OSC. There is some


contribution from the VCO, but it is diminished by the gain of

68

[5] A. Demir, E. Liu, A. Sangiovanni-Vincentelli, and I.


Vassiliou. "Behavioral simulation techniques for phase/
delay-locked systems." Proceedings of the IEEE Custom
Integrated Circuits Conference, pp. 453-456, May 1994.
[6] A. Demir, E. Liu, and A. Sangiovanni-Vincentelli.
"Time-domain non-Monte-Carlo noise simulation for
nonlinear dynamic circuits with arbitrary excitations."
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 5, pp. 493-505,
May 1996.
[7] A. Demir, A. Sangiovanni-Vincentelli. "Simulation and
modeling of phase noise in open-loop oscillators." Proceedings of the IEEE Custom Integrated Circuits Conference, pp. 445-456, May 1996.
[8] A. Demir, A. Sangiovanni-Vincentelli. Analysis and
Simulation of Noise in Nonlinear Electronic Circuits and
Systems. Kluwer Academic Publishers, 1997.
[9] Ken Kundert. "Modeling and simulation of jitter in
phase-locked loops." In Analog Circuit Design: RF Analog-to-Digital Converters; Sensor and Actuator Interfaces; Low-Noise Oscillators, PLLs and Synthesizers,
Rudy J. van de Plassche, Johan H. Huijsing, Willy M.C.
Sansen, Kluwer Academic Publishers, November 1997.
[10] Ken Kundert. "Modeling and simulation of jitter in PLL
frequency synthesizers." Available from www.designers-guide.com.
[11] Kenneth S. Kundert. The Designer's Guide to SPICE and
Spectre. Kluwer Academic Publishers, 1995.
[12] Verilog-A Language Reference Manual: Analog Extensions to Verilog-HDL, version 1.0. Open Verilog International, 1996. Available from
www.eda.org/verilogams.
[13] Ulrich L. Rohde. Digital PLL Frequency Synthesizers.
Prentice-Hall, Inc., 1983.
[14] Paul R. Gray and Robert G. Meyer. Analysis and Design
of Analog Integrated Circuits. John Wiley & Sons, 1992.
[15] A. Demir, A. Mehrotra, and J. Roychowdhury. "Phase
noise in oscillators: a unifying theory and numerical
methods for characterization." IEEE Transactions on
Circuits and Systems I: Fundamental Theory and Applications, vol. 47, no. 5, May 2000, pp. 655 -674.
[16] F. Kaertner. "Determination of the correlation spectrum
of oscillators with low noise." IEEE Transactions on Microwave Theory and Techniques, vol. 37, no. 1, pp. 90101, Jan. 1989.

[17] F. X. Kaertner. "Analysis of white and/^01 noise in oscillators." International Journal of Circuit Theory and Applications, vol. 18, pp. 485-519, 1990.
[18] G. Vendelin, A. Pavio, U. Rohde. Microwave Circuit
Design. J. Wiley & Sons, 1990.
[19] W. Gardner. Introduction to Random Processes: With
Applications to Signals and Systems. McGraw-Hill,
1989.
[20] Joel Phillips and Ken Kundert. "Noise in mixers, oscillators, samplers, and logic: an introduction to cyclostationary noise." Proceedings of the IEEE Custom Integrated
Circuits Conference, CICC 2000. The paper and presentation are both available from
www.designersguide, com.
[21] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski.
"Delta-sigma modulation in fractional-TV frequency synthesis." IEEE Journal of Solid-State Circuits, vol. 28 no.
5, May 1993, pp. 553 -559
[22] Frank Herzel and Behzad Razavi. "A study of oscillator
jitter due to supply and substrate noise." IEEE Transactions on Circuits and Systems - //; Analog and Digital
Signal Processing, vol. 46. no. 1, Jan. 1999, pp. 56-62.
[23] T. C. Weigandt, B. Kim, and P. R. Gray. "Jitter in ring
oscillators." 1994 IEEE International Symposium on
Circuits and Systems (ISCAS-94), vol. 4, 1994, pp. 2730.
[24] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[25] J. J. Rael and A. A. Abidi. "Physical processes of phase
noise in differential LC oscillators." Proceedings of the
IEEE Custom Integrated Circuits Conference, CICC
2000.
[26] J. McNeill. "Jitter in Ring Oscillators." IEEE Journal of
Solid-State Circuits, vol. 32, no. 6, June 1997.
[27] H. Chang, E. Charbon, U. Choudhury, A. Demir, E. Felt,
E. Liu, E. Malavasi, A. Sangiovanni-Vincentelli, and I.
Vassiliou. A Top-Down Constraint-Driven Methodology
for Analog Integrated Circuits. Kluwer Academic Publishers, 1997.
[28] A. Oppenheim, R. Schafer. Digital Signal Processing.
Prentice-Hall, 1975.
[29] F. Harris. "On the use of windows for harmonic analysis
with the discrete Fourier transform." Proceedings of the
IEEE, vol. 66, no. 1, January 1978.

69

331

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

A Study of Phase Noise in CMOS Oscillators


Behzad Razavi, Member, IEEE

Abstract-This paper presents a study of phase noise in two


inductorless CMOS oscillators. First-order analysis of a linear
oscillatory system leads to a noise shaping function and a new
definition of Q. A linear model of CMOS ring oscillators is used
to calculate their phase noise, and three phase noise phenomena,
namely, additive noise, high-frequency multiplicative noise, and
low-frequency multiplicativenoise, are identified and formulated.
Based on the same concepts, a CMOS relaxation oscillator is also
analyzed. Issues and techniques related to simulation of noise in
the time domain are described,and two prototypesfabricated in a
0.5-pm CMOS technology are used to investigate the accuracy of
the theoretical predictions. Compared with the measured results,
the calculated phase noise values of a 2-GHz ring oscillator and
a 900-MHz relaxation oscillator at 5 MHz offset have an error
of approximately 4 dB.

models, the analytical approach can predict the phase noise


with approximately 4 to 6 dB of error.
The next section of this paper describes the effect of
phase noise in wireless communications. In Section 111, the
concept of Q is investigated and in Section IV it is generalized
through the analysis of a feedback oscillatory system. The
resulting equations are then used in Section V to formulate
the phase noise of ring oscillators with the aid of a linearized
model. In Section VI, nonlinear effects are considered and
three mechanisms of noise generation are described, and in
Section VII, a CMOS relaxation oscillator is analyzed. In
Section VIII, simulation issues and techniques are presented,
and in Section IX the experimental results measured on the
two prototypes are summarized.

I. INTRODUCTION

OLTAGE-CONTROLLED oscillators (VCOs) are an


integral part of phase-locked loops, clock recovery circuits, and frequency synthesizers. Random fluctuations in the
output frequency of VCOs, expressed in terms of jitter and
phase noise, have a direct impact on the timing accuracy
where phase alignment is required and on the signal-to-noise
ratio where frequency translation is performed. In particular,
RF oscillators employed in wireless tranceivers must meet
stringent phase noise requirements, typically mandating the use
of passive LC tanks with a high quality factor (Q). However,
the trend toward large-scale integration and low cost makes it
desirable to implement oscillators monolithically. The paucity
of literature on noise in such oscillators together with a lack of
experimental verification of underlying theories has motivated
this work.
This paper provides a study of phase noise in two inductorless CMOS VCOs. Following a first-order analysis of a
linear oscillatory system and introducing a new definition of
Q, we employ a linearized model of ring oscillators to obtain
an estimate of their noise behavior. We also describe the
limitations of the model, identify three mechanisms leading
to phase noise, and use the same concepts to analyze a CMOS
relaxation oscillator. In contrast to previous studies where
time-domain jitter has been investigated [l], [2], our analysis
is performed in the frequency domain to directly determine the
phase noise. Experimental results obtained from a 2-GHz ring
oscillator and a 900-MHz relaxation oscillator indicate that,
despite many simplifying approximations, lack of accurate
MOS models for RF operation, and the use of simple noise
Manuscript received October 30, 1995; revised December 17, 1995.
The author was with AT&T Bell Laboratories, Holmdel, NJ 07733 USA.
He is now with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA.
Publisher Item Identifier S 0018-9200(96)02456-0.

11. PHASE NOISEIN WIRELESS COMMUNICATIONS


Phase noise is usually characterized in the frequency domain. For an ideal oscillator operating at W O , the spectrum
assumes the shape of an impulse, whereas for an actual
oscillator, the spectrum exhibits skirts around the center
or carrier frequency (Fig. 1). To quantify phase noise, we
consider a unit bandwidth at an offset Aw with respect to W O ,
calculate the noise power in this bandwidth, and divide the
result by the carrier power.
To understand the importance of phase noise in wireless communications, consider a generic transceiver as
depicted in Fig. 2, where the receiver consists of a lownoise amplifier, a band-pass filter, and a downconversion
mixer, and the transmitter comprises an upconversion
mixer, a band-pass filter, and a power amplifier. The
local oscillator (LO) providing the carrier signal for both
mixers is embedded in a frequency synthesizer. If the
LO output contains phase noise, both the downconverted
and upconverted signals are corrupted. This is illustrated
in Fig. 3(a) and (b) for the receive and transmit paths,
respectively.
Referring to Fig. 3(a), we note that in the ideal case, the
signal band of interest is convolved with an impulse and thus
translated to a lower (and a higher) frequency with no change
in its shape. In reality, however, the wanted signal may be
accompanied by a large interferer in an adjacent channel, and
the local oscillator exhibits finite phase noise. When the two
signals are mixed with the LO output, the downconverted band
consists of two overlapping spectra, with the wanted signal
suffering from significant noise due to tail of the interferer.
This effect is called reciprocal mixing.
Shown in Fig. 3(b), the effect of phase noise on the transmit
path is slightly different. Suppose a noiseless receiver is to

0018-9200/96$05.00 0 1996 IEEE

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

332

Aiw

Fig. 1. Phase noise in an oscillator.

Low-Noise
Amplifier
c

Band-Pass
Filter

Frequency

Synthesizer

Amplifier

Band-Pass

Fig. 2. Generic wireless transceiver.

detect a weak signal at w2 while a powerful, nearby tranmitter


generates a signal at w1 with substantial phase noise. Then,
the wanted signal is corrupted by the phase noise tail of the
transmitter.
The important point here is that the difference between w1
and w2 can be as small as a few tens of kilohertz while each of
these frequencies is around 900 MHz or 1.9 GHz. Therefore,
the output spectrum of the LO must be extremely sharp. In
the North American Digital Cellular (NADC) IS54 system,
the phase noise power per unit bandwidth must be about 115
dB below the carrier power (i.e., - I15 dBc/Hz) at an offset
of 60 kHz.
Such stringent requirements can be met through the use of
LC oscillators. Fig. 4 shows an example where a transconductance amplifier (G,) with positive feedback establishes a
negative resistance to cancel the loss in the tank and a varactor
diode provides frequency tuning capability. This circuit has a
number of drawbacks for monolithic implementation. First,
both the control and the output signals are single-ended,

(b)
Fig 3. Effect of phase noise on (a) receive and (b) transmt paths.

making the circuit sensitive to supply and substrate noise.


Second, the required inductor (and varactor) Q is typically
greater than 20, prohibiting the use of low-Q integrated
inductors. Third, monolithic varactors also suffer from large
series resistance and hence a low Q. Fourth, since the LO
signal inevitably appears on bond wires connecting to (or
operating as) the inductor, there may be significant coupling
of this signal to the front end (LO leakage), an undesirable
effect especially in homodyne architectures [ 3 ] .
Ring oscillators, on the other hand, require no external
components and can be realized in fully differential form, but

333

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

cc

llI

Freq.
Control

-L
Fig. 4.

(2)

LC oscillator,

= 2~

Energy
Energy Dissipated per Cycle

their phase noise tends to be high because they lack passive


resonant elements.

111. DEFINITIONS
OF Q

The quality factor, Q, is usually defined within the context


of second-order systems with (damped) oscillatory behavior.
Illustrated in Fig. 5 are three common definitions of Q. For an
RLC circuit, Q is defined as the ratio of the center frequency
and the two-sided -3-dB bandwidth. However, if the inductor
is removed, this definition cannot be applied. A more general
definition is: 27r times the ratio of the stored energy and the
dissipated energy per cycle, and can be measured by applying a
step input and observing the decay of oscillations at the output.
Again, if the circuit has no oscillatory behavior (e.g., contains
no inductors), it is difficult to define the energy dissipated
per cycle. In a third definition, an LC oscillator is considered
as a feedback system and the phase of the open-loop transfer
function is examined at resonance. For a simple LC circuit
such as that in Fig. 4, it can be easily shown that the Q of
the tank is equal to 0 . 5 ~ 0d@/dw, where W O is the resonance
frequency and d@/dw denotes the slope of the phase of the
transfer function with respect to frequency. Called the openloop & herein, this definition has an interesting interpretation
if we recall that for steady oscillations, the total phase shift
around the loop must be precisely 360. Now, suppose the
oscillation frequency slightly deviates from W O . Then, if the
phase slope is large, a significant change in the phase shift
arises, violating the condition of oscillation and forcing the
frequency to return to W O . In other words, the open-loop Q
is a measure of how much the closed-loop system opposes
variations in the frequency of oscillation. This concept proves
useful in our subsequent analyses.
While the third definition of Q seems particularlly wellsuited to oscillators, it does fail in certain cases. As an
example, consider the two-integrator oscillator of Fig. 6 , where
the open-loop transfer function is simply

-(?)
2

H(s)=

(1)

yielding CP = L H ( s = j w ) = 0, and Q = 0. Since this circuit


does indeed oscillate, this definition of Q is not useful here.

(3) Q=--00 dQ
2 do
Fig. 5. Common definitions of

&.

Fig. 6. Two-integrator oscillator.

Fig. 7. Linear oscillatory system.

IV. LINEAROSCILLATORY
SYSTEM
Oscillator circuits in general entail compressive nonlinearity, fundamentally because the oscillation amplitude is not
defined in a linear system. When a circuit begins to oscillate,
the amplitude continues to grow until it is limited by some
other mechanism. In typical configurations, the open-loop gain
of the circuit drops at sufficiently large signal swings, thereby
preventing further growth of the amplitude.
In this paper, we begin the analysis with a linear model. This
approach is justified as follows. Suppose an oscillator employs
strong automatic level control (ALC) such that its oscillation
amplitude remains small, making the linear approximation

LEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

334

Fig. 8. Noise shaping in oscillators.

valid. Since the ALC can be relatively slow, the circuit


parameters can be considered time-invariant for a large number
of cycles. Now, let us gradually weaken the effect of AJX
so that the oscillator experiences increasingly more selflimiting. Intuitively, we expect that the linear model yields
reasonable accuracy for soft amplitude limiting and becomes
gradually less accurate as the ALC is removed. Thus, the
choice of this model depends on the error that it entails
in predicting the response of the actual oscillator to various
sources of noise, an issue that can be checked by simulation
(Section VIII). While adequate for the cases considered here,
this approximation must be carefully examined for other types
of oscillators.
To analyze phase noise, we treat an oscillator as a feedback
system and consider each noise source as an input (Fig. 7).
The phase noise observed at the output is a function of: 1)
sources of noise in the circuit and 2) how much the feedback
system rejects (or amplifies) various noise components. The
system oscillates at w = W O if the transfer function

goes to infinity at this frequency, i.e., if H ( j w 0 ) = -1. For


frequencies close to the carrier, w = W O A w , the open-loop
transfer function can be approximated as

and the noise tranfer function is

Since H ( j w 0 ) = -1 and for most practical cases


<< 1, (4) reduces to

spectral density is shaped by

This is illustrated in Fig. 8. As we will see later, (6) assumes


a simple form for ring oscillators.
To gain more insight, let H ( j w ) = A ( w ) exp[j@(w)],and
hence
(7)

Since for w

M WO,

= 1, ( 6 ) can be written as

We define the open-loop Q as

Combining (8) and (9) yields

a familiar form previously derived for simple LC oscillators


[4]. It i s interesting to note that in an LC tank at resonance,
d A / & = 0 and (9) reduces to the third definition of Q given
in Section III. In the two-integrator oscillator, on the other
hand, d A / d w = 2/wo,d@./dw = 0 , and Q = 1. Thus, the
proposed definition of Q applies to most cases of interest.
To complete the discussion, we also consider the case
shown in Fig. 9, where H l ( j w ) H ~ ( ~=
wH
) ( j w ) . Therefore,
Y ( j w ) / X ( j w )is given by (5). For, Y ~ ( j w ) / X ( j w we
) , have

l a w dH/dwl

,[j(wo

+ Aw)] M

-1
.
aw-dH
dw

(5)

This equation indicates that a noise component at w = wo+Aw


is multiplied by - ( A w d H / d w ) - l when it appears at the
output of the oscillator. In other words, the noise power

giving the following noise shaping function:

335

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

y,(im)

Y(p)
Fig. 9. Oscillatory system with nonunity-gain feedback.

Fig. 11. Linearized model of CMOS VCO.

open-loop transfer function is thus given by


-8

H(jw)=

(a)

vDD

(13)

(1 + j f i-5~)~

Freq. .........:.I ......................


i..;
,.
....................... I I
0...........i..........................
i ..........................
i

Therefore, JdA/dwl = 9/(4wO) and Id@/dwl = 3&/(4w0).


It follows from (6) or (10) that if a noise current Inl is
injected onto node 1 in the oscillator of Fig. 11, then its power
spectrum is shaped by

This equation is the key to predicting various phase noise


components in the ring oscillator.
AND MULTIPLICATIVE
NOISE
VI. ADDITIVE

Control

(b)
Fig. 10. CMOS VCO: (a) block diagram and (b) implementation of one
stage.

V. CMOS RING OSCILLATOR


Submicron CMOS technologies have demonstrated potential
for high-speed phase-locked systems [ 5 ] ,raising the possibility
of designing fully integrated RF CMOS frequency synthesizers. Fig. 10 shows a three-stage ring oscillator wherein both
the signal path and the control path are differential to achieve
high common-mode rejection.
To calculate the phase noise, we model the signal path in
the VCO with a linearized (single-ended) circuit (Fig. 11). As
mentioned in Section IV, the linear approximation allows a
first-order analysis of the topologies considered in this paper,
but its accuracy must be checked if other oscillators are of
interest. In Fig. 11, R and C represent the output resistance
and the load capacitance of each stage, respectively, ( R M
l / g m 3 = l / g m 4 ) , and G,R is the gain required for steady
oscillations. The noise of each differential pair and its load
devices are modeled as current sources Inl-In3, injected onto
nodes 1-3, respectively. Before calculating the noise transfer
function, we note that the circuit of Fig. 11 oscillates if, at
W O , each stage has unity voltage gain and 120 of phase shift.
Writing the open-loop transfer function and imposing these
two conditions, we have W O = & / ( R C ) and G,R = 2. The

Modeling the ring oscillator of Fig. 10 with the linearized


circuit of Fig. 11 entails a number of issues. First, while the
stages in Fig. 10 tum off for part of the period, the linearized
model exhibits no such behavior, presenting constant values
for the components in Fig. 11. Second, the model does not
predict mixing or modulation effects that result from nonlinearities. Third, the noise of the devices in the signal path has a
cyclostationary behavior, i.e., periodically varying statistics,
because the bias conditions are periodic functions of time. In
this section, we address these issues, first identifying three
types of noise: additive, high-frequency multiplicative, and
low-frequency multiplicative.
A, Additive Noise
Additive noise consists of components that are directly
added to the output as shown in Fig. 7 and formulated by
(6) and (14).
To calculate the additive phase noise in Fig. 10 with the aid
of (14), we note that for w M wo the voltage gain in each stage
is close to unity. (Simulations of the actual CMOS oscillator
indicate that for W O = 27r x 970 MHz and noise injected at
w - W O = 27r x 10 MHz onto one node, the components
observed at the three nodes differ in magnitude by less than
0.1 dB.) Therefore, the total output phase noise power density
due to In1-Jns is

where it is assumed I& = I:2 = I& = I:. For the differential


stage of Fig. 10, the thermal noise current per unit bandwidth

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL 31, NO 3, MARCH 1996

336

Fig. 12. High-frequency multiplicative noise.

is equal to

E = 8kT(gml + gm3)/3 M

8 k T I R . Thus,

In this derivation, the thermal drain noise current of MOS


devices is assumed equal to
= 41;T(2gm/3). For shortchannel devices, however. the noise may be higher [6]. Using
a charge-based model in our simulation tool, we estimate the
factor to be 0.873 rather than 2/3. In reality, hot-electron
effects further raise this value.
Additive phase noise is predicted by the linearized model
with high accuracy if the stages in the ring operate linearly for
most of the period. In a three-stage CMOS oscillator designed
for the RF range, the differential stages are in the linear region
for about 90% of the period. Therefore, the linearized model
emulates the CMOS oscillator with reasonable accuracy. However, as the number of stages increases or if each stage entails
more nonlinearity, the error in the linear approximation may
increase.
Since additive noise is shaped according to (16), its effect is
significant only for components close to the carrier frequency.

- Fig. 13. Frequency modulation due to tail current noise

oscilPatory system. Simulations indicate that for the oscillator topologies considered here, these two components have
approximately equal magnitudes. Thus, the nonlinearity folds
all the noise components below W O to the region above and
vice versa, effectively doubling the noise power predicted by
(6). Such components are significant if they are close to W O
and are herein called high-frequency multiplicative noise. This
phenomenon is illustrated in Fig. 12. (Note that a component
at 3w0 A w is also translated to W O A w , but its magnitude
is negligible.)
This effect can also be viewed as sampling of the noise
by the differential pairs, especially if each stage experiences hard switching. As each differential pair switches twice
in every period, a noise component at w, is translated to
2w0 f wn. Note that for highly nonlinear stages, the Taylor
expansion considered above may need to include higher order
terms.

B. High-Frequency Multiplicative Noise

The nonlinearity in the differential stages of Fig. 10, especially as they turn off, causes noise components to be
multiplied by the carrier (and by each other). If the input/output
characteristic of each stage is expressed as VoUt= all,$n
CQV,; Q~V,:, then for an input consisting of the carrier and a
noise component, e.g., K n ( t ) = A0 cos wot A, cosw,t, the
output exhibits the following important terms:

wn)t
~&~~c
( tl)i , ~
cos(wo
, ~:2w,)t
votl(t) fx Q24OAn COS(W0 f

C. Low-Frequency Multiplicative Noise

Since the frequency of oscillation in Fig. 10 is a function


of
the tail current in each differential pair, noise components
vout3(t)a 3 ~ ;cos(2w0
~ ,
- Wn)t.
in this current modulate the frequency, thereby contributing
Note that Voutl(t) appears in band if w, is small, i.e., if phase noise [classical frequency modulation (FM)]. Depicted
it is a low-frequency component, but in a fully differential in Fig. 13, this effect can be significant because, in CMOS
configuration, Voutl(t) = 0 because a2 = 0. Also, Vouta(t)oscillators, W O must be adjustable by more than &20% to
is negligible because A, << Ao, leaving Vout3(t)
as the only compensate for process variations, thus making the frequency
significant cross-product.
quite sensitive to noise in the tail current. This mechanism is
This simplified one-stage analysis predicts the frequency of illustrated in Fig. 14.
the components in response to injected noise, but not their
To quantify this phenomenon, we find the sensitivity or
magnitude. When noise is injected into the oscillator, the gain of the VCO, defined as HVCO= dwOut/dIss in
magnitude of the observed response at w, and 2w0 - w,
Fig. 13, and use a simple approximation. If the noise per unit
depends on the noise shaping properties of the feedback bandwidth in ISS is represented as a sinusoid with the same

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

tit

e
.
.

00

..J
*-

00

M1

+ 1,s

then the output signal of the oscillator

(a)

R
4
L
k

+ ss
(b)

Fig. 15. Gain stage with (a) stationary and (b) cyclostationary noise.

wot + Kvco

For KvcoI,/w,

-4

*e*.

Fig. 14. Low-frequency multiplicative noise.

power: I, cos w,t,


can be written as

:qn
331

I , cos w,t

dt)

(17)

<< 1 radian (narrowband FM)

. [ C O S ( w o + b,)t

- COS(WO -

w,)t].

(19)

Thus, the ratio of each sideband amplitude to the carrier


amplitude is equal to ImKvc0/(2w,), i.e.,
IV,12(with respect to carrier) =

(20)

Since KVCOcan be easily evaluated in simulation or measurement, (20) is readily calculated.


It is seen that modulation of the carrier brings the low
frequency noise components of the tail current to the band
around WO.Thus, flicker noise in I, becomes particularly
important.
In the differential stage of Fig. 3(b), two sources of lowfrequency multiplicative noise can be identified: noise in Iss
and noise in Ms and Me. For comparable device size, these
two sources are of the same order and must be both taken
into account.
D. Cyclostationary Noise Sources

As mentioned previously, the devices in the signal path


exhibit cyclostationary noise behavior, requiring the use of periodically varying noise statistics in analysis and simulations.
To check the accuracy of the stationary noise approximation,
we perform a simple, first-order simulation on the two cases
depicted in Fig. 15. In Fig. 15(a), a sinusoidal current source
with an amplitude of 2 nA is connected between the drain and
source of M I to represent its noise with the assumption that
M I carries half of I S S .In Fig. 15(b), the current source is also
a sinusoid, but its amplitude is a function of the drain current
of M I . Since MOS thermal noise current (in the saturation
we use a nonlinear dependent
region) is proportional to 6,
source in SPICE [7] as In(t) = a q m s i n w , t , where
w, = 27r x 980 MHz. The factor Q is chosen such that
I,(t) = 2 nA x sinw,t when V,(t) = 1 x Iss/2 (balanced

Fig. 16. Addition of output voltages of N oscillators.

condition). Simulations indicate that the sideband magnitudes


in the two cases differ by less than 0.5 dB.
It is important to note that this result may not be accurate
for other types of oscillators.
E. Power-Noise Trade-off

As with other analog circuits, oscillators exhibit a tradeoff between power dissipation and noise. Intuitively, we note
that if the output voltages of N identical oscillators are added
in phase (Fig. 16), then the total carrier power is multiplied
by N2, whereas the noise power increases by N (assuming
noise sources of different oscillators are uncorrelated). Thus,
the phase noise (relative to the carrier) decreases by a factor
N at the cost of a proportional increase in power dissipation.
Using the equations developed above, we can also formulate
this trade-off. For example, from (16), since G,R M 2, we
have

To reduce the total noise power by N , G, must increase by the


same factor. For any active device, this can be accomplished
by increasing the width and the bias current by N . (To maintain
the same frequency of oscillation, the load resistor is reduced
by N . ) Therefore, for a constant supply voltage, the power
dissipation scales up by N .

IEEE JOURNAC OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

338

COMPARISON OF

TABLE I
THREE-STAGE
AND FOUR-STAGE
RINGOSCILLATORS
3-Stage VCO

&

&Stage VCO

Minimum Required DC Gain

Noise Shaping Function

Open-Loop Q

?! 4

(e 1.3)

Jz (M 1.4)

Ml M2

Total Additive Noise


Power Dissipation

1.8 mW

3.6 mW

Fig. 17. Substrate and supply noise in gain stage.

VII. CMOS RELAXATIONOSCILLATOR


In this section, we apply the analysis methodology described
thus far to a CMOS relaxation oscillator [Fig. 18(a)]. When
The choice of number of stages in a ring oscillator to
designed to operate at 900 MHz, this circuit hardly relaxes
minimize the phase noise has often been disputed. With
and the signals at the drain and source of MI and M2 are close
the above formulations, it is possible to compare rings with
to sinusoids. Thus, the linear model of Fig. 7 is a plausible
different number of stages (so long as the approximations
choice. To utilize our previous results, we assume the signals at
remain valid). For the cases of interest in RF applications,
the sources of M I and M2 are fully differentiall and redraw the
we consider three-stage and four-stage oscillators designed to
circuit as in Fig. 18(b), identifying it as a two-stage ring with
operate at the same frequency. Thus, the four-stage oscillator
capacitive degeneration (CA = 2C). The total capacitance
incorporates smaller impedance levels and dissipates more
seen at the drain of M I and MZ is modeled with C1 and C2,
power. Table 1 compares various aspects of the two circuits.
respectively. (This is also an approximation because the input
We make three important observations. 1) Simulations show
impedance of each stage is not purely capacitive.) It can be
that if the four-stage oscillator is to operate at the same speed
easily shown that the open-loop transfer function is
as the three-stage VCO, the value of R in the former must
be approximately 60% of that in the latter. 2) The Qs of
the two VCOs (10) are roughly equal. 3) The total additive
thermal noise of the two VCOs is about the same, because
where C1 = C2 = CD and gm denotes the transconductance of
the four-stage topology has more sources of noise, but with
each transistor. For the circuit to oscillate at W O , H ( j w 0 ) = 1,
lower magnitudes.
and each stage must have a phase shift of 180, with 90
From these rough calculations, we draw two conclusions.
contributed by each zero and the remaining 90 by the two
First, the phase noise depends on not only the Q, but the
poles at -gm/CA and -1/(RCD). It follows from the second
number and magnitude of sources of noise in the circuit.
condition that
Second, four-stage VCOs have no significant advantage
over three-stage VCOs, except for providing quadrature
outputs.
i.e., W O is the geometric mean of the poles at the drain and
source of each transistor. Combining this result with the first
condition, we obtain
G. Supply and Substrate Noise

F. Three-Stage Versus Four-Stage Oscillators

Even though the gain stage of Fig. 10 is designed as a differential circuit, it nonetheless suffers from some sensitivity to
supply and substrate noise (Fig. 17). Two phenomena account
for this. First, device mismatches degrade the symmetry of the
circuit. Second, the total capacitance at the common source of
the differential pair (i.e., the source junction capacitance of M I
and M2 and the capacitance associated with the tail current
source) converts the supply and substrate noise to current,
thereby modulating the delay of the gain stage. Simulations
indicate that even if the tail current source has a high dc output
impedance, a 1-mV,, supply noise component at 10 MHz
generates sidebands 60 dB below the canier at W O f (27r x 10
MHz).

gmR =

CA

CA - CD

After lengthy calculations, we have

and

This assumption is justified by decomposing C into two series capacitors,


each one of value 2C, and monitoring the midpoint voltage. The commonmode swing at this node is approximatley 18 dB below the differential swings
at the source of M I and M2.

339

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

LT2vDD

Ml

-100

1.1

12

t .3

1A

1.5

1.8
GI42

Fig. 19. Simulated oscillator spectrum with injected white noise.

(C)

(d)

Fig. 18. (a) CMOS relaxation oscillator, (b) circuit of (a) redrawn, (c) noise
current of one transistor, and (d) tranfformed noise current.

For C, = O.~CA,Qreaches its maximum value-unity. In


other words, the maximum Q occurs if the (floating) timing
capacitor is equal to the load capacitance. The noise shaping
function is therefore equal to ( ~ ~ / A w ) ~ / 4 .
Since the drain-source noise current of M I and M2 appears
between two internal nodes of the circuit [Fig. 18(c)], the
transformation shown in Fig. 18(d) can be applied to allow
the use of our previous derivations. It can be shown that

and the total additive thermal noise observed at each drain is


10
3
This power must be doubled to account for high-frequency
multiplicative noise.
RESULTS
VIII. SIMULATION

A. Simulation Issues
The time-varying nature of oscillators prohibits the use
of the standard small-signal ac analysis available in SPICE
and other similar programs. Therefore, simulations must be
performed in the time domain. As a first attempt, one may
generate a pseudo-random noise with known distribution,
introduce it into the circuit as a SPICE piecewise linear
waveform, run a transient analysis for a relatively large number
of oscillation periods, write the output as a series of points
equally spaced in time, and compute the fast Fourier transform
(FFT)of the output. The result of one such attempt is shown in
Fig. 19. It is important to note that 1) many coherent sidebands

appear in the spectrum even though the injected noise is white,


and 2) the magnitude of the sidebands does not directly scale
with the magnitude of the injected noise!
To understand the cause of this behavior, consider a much
simpler case, illustrated in Fig. 20. In Fig. 20(a), a sinusoid at
1 GHz is applied across a 1-k0 resistor, and a long transient
simulation followed by interpolation and FFT is used to obtain
the depicted spectrum. (The finite width results from the finite
length of the data record and the arches are attributed to
windowing effects.) Now, as shown in Fig. 20(b), we add a
30-MHz squarewave with 2 ns transition time and proceed as
before. Note that the two circuits share only the ground node.
In this case, however, the spectrum of the 1-GHz sinusoid
exhibits coherent sidebands with 15 MHz spacing! Observed in
AT&T s internal simulator (ADVICE), HSPICE, and Cadence
SPICE, this effect is attributed to the additional points that
the program must calculate at each edge of the squarewave,
leading to errors in subsequent interpolation.
Fortunately, this phenomenon does not occur if only sinusoids are used in simulations.

B. Oscillator Simulations
In order to compute the response of oscillators to each
noise source, we approximate the noise per unit bandwidth
at frequency w, with an impulse (a sinusoid) of the same
power at that frequency. As shown in Fig. 21, the sinusoidal
noise is injected at various points in the circuit and the output
spectrum is observed. This approach is justified by the fact that
random Gaussian noise can be expressed as a Fourier series of
sinusoids with random phase [8], [SI.Since only one sinusoid
is injected in each simulation, the interaction among noise
components themselves is assumed negligible, a reasonable
approximation because if two noise components at, say, -60
dB are multiplied, the product is at -120 dB.
In the simulations, the oscillators were designed for a center
frequency of approximately 970 MHz. Each circuit and its
linearized models were simulated in the time domain in steps

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL 31, NO 3, MARCH 1996

340

Fig. 21. Simulated configuration.

The vertical axis represents 10 log qzs.Note that the observed


magnitude of the 980-MHz component differs by less than 0.2
dB in the two cases, indicating that the linearized model is
indeed an accurate representation. As explained in Section VIB, the 960-MHz component originates from third-order mixing
of the carrier and the 980-MHz component and essentially
doubles the phase noise.
In order to investigate the limitation of the linear model, the
oscillator was made progressively more nonlinear. Shown in
Fig. 23 is the output spectra of a four-stage CMOS oscillator,
revealing approximately 1 dB of error in the prediction by the
linear model. The error gradually increases with the number of
stages in the ring and reaches nearly 6 dB for an eight-stage
oscillator.
For bipolar ring oscillators (differential pairs with no emitter
followers), simulations reveal an error of approximately 2 dB
for three stages and 7 dB for four stages in the ring.

IX, EXPERIMENTAL
RESULTS
A. Measurements

9.5

10

10.5

11

x 100 MHZ

(b)

Fig. 20 Simple simulation revealing effect of pulse waveforms, (a) single sinusoidal source and (b) sinusoidal source along with a square wave
generator.

of 30 ps for 8 ps, and the output was processed in MATLAB


to obtain the spectrum. Since simulations of the linear model
yield identical results to the equations derived above, we will
not distinguish between the two hereafter.
Shown in Fig. 22 are the output spectra of the linear model
and actual circuit of a three-stage oscillator in 0.5-pm CMOS
technology with a Z-nA, 980-MHz sinusoidal current injected
into the signal path (the drain of one of the differential pairs).

Two different oscillator configurations have been fabricated


in a 0.5-pm CMOS technology to compare the predictions in
this paper with measured results. Note that there are three sets
of results: theoretical calculations based on linear models but
including multiplicative noise, simulated predictions based on
the actual CMOS oscillators, and measured values.
The first circuit is a 2.2-GHz three-stage ring oscillator.
Fig. 24 shows one stage of the circuit along with the measured
device parameters. The sensitivity of the output frequency to
the tail current of each stage is about 0.43 MHzIpA. The
measured spectrum is depicted in Fig. 25(a) and (b) with
two different horizontal scales. Due to lack of data on the
flicker noise of the process, we consider only thermal noise
at relatively large frequency offsets, namely, 1 MHz and 5
MHz.
It is important to note that low-frequency flicker noise
causes the center of the spectrum to fluctuate constantly. Thus,
as the resolution bandwidth (RBW) of the spectrum analyzer is
reduced [from 1 MHz in Fig. 25(a) to 100 kHz in Fig. 25(b)],
the carrier power is subject to more averaging and appears to
decrease. To maintain consistency with calculations, in which

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

341

OL
...,."..........,
. . . . . . . . . . .l'1 ....... ....).
. . . . . . II . ......
!
or .......,.........
I
4
I

....... .....................

.. I . . . . .

..............................................

..: . . . ...... : . . . . . . . . . . .:. . . . . . . .:. . . . . . . . . ;

................................

a.............................
:..... .....................
a
.........................................................

4!

[_ :

:..............:..............

.... . . . . . . . .. ............. .. . . . . . . . .. ............. .. . . . .

-80 ......................................
-80
......................................

-6oc

-1
I

-100
-120
-140
-180
-1 80
9.4

92

..........................

j ....................

............ . . . . . . . . . . . . . . . .

9.8

II

-@+

9.8

10

10.2

10.4
x 100 MHz

I /I I

-100

-100
-120
-120
-140
-140
-160
-180

-180

-180

9.2

9.4

9.6

9.8

10

9.2

9.4

9.6

9.8

10

10

10.2
10.4
x 100 MHz

x 100 MHz

(b)

(b)
Fig. 22. Simulated output spectra of (a) linear model and (b) actual circuit
of a three-stage CMOS oscillator.

the phase noise is normalized to a constant carrier power,


this power (i.e., the output amplitude) is measured using an
oscilloscope.
The noise calculation proceeds as follows. First, find the
additive noise power in (16), and double the result to account
for third-order mixing (high-frequency multiplicative noise).
Next, calculate the low-frequency multiplicative noise from
(20) for one stage and multiply the result by three. We
assume (from simulations) that the internal differential voltage
and the drain noise
swing is equal to 1 V,, (0.353 V,,,)

current of MOSFET's is given by


= 4kT(0.863gm). For
Aw = 2n x 1 MHz, calculations yield

high-frequency multiplicative noise = - 100.1 dBc/Hz (29)


low-frequency multiplicative noise = - 106.3 dBc/Hz (30)
total normalized phase noise = -99.2 dBc/Hz. (31)

Fig. 23. Simulated output spectra of (a) linear model and (b) actual circuit
of a four-stage CMOS oscillator.

gm= 11214 U
Ms-Me: WILE 13.4u/0.5~
g m = 11630 U
M g : WIL = 1 3 . 4 ~ l 0 . 5 ~

I, 790 UA
g m = 11530 U

Fig. 24. Gain stage used in 2-GHz CMOS oscillator.

Simulations of the actual CMOS oscillator predict the total


noise to be -98.1 dBc/Hz. From Fig. 25(b), with the carrier
power of Fig. 25(a), the phase noise is approximately equal
to -94 dBc/Hz.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 31, NO. 3, MARCH 1996

342

Ml-M,:

W/L= 100u10.5~
g=
,, 1/84 Ili

R=275 R
C = 0.6 pF

Iss= 3 mA

Fig . 26. Relaxation oscillator parameters

(b)

F ig. 25. Measured output spectrum of ring oscillator (10 dB/div. vertical
SIsale). (a) 5 MHz/div. horizontal scale and 1 MHz resolution bandwidth, (b)
1 MHz horizontal scale and 100 lcHz resolution bandwidth.

Similarly, fcir A w = 27r x 5 MHz, calculations yield


high-frequency multiplicative noise = - 114.0 dBc/Hz (32)
low-frequency multiplicative noise = -120.2 dBc/Hz (33)
total normalized phase noise = - 113.1 dBc/Hz (34)
amd simulations predict - 112.4 dBc/Hz, while Fig. 25(a)
indicates a phase noise of - 109 dBc/Hz. Note that these values
correspond to a center frequency of 2.2 GHz and should be
lowered by approximately 8 dB for 900 MHz operation, as
shown in (9).
The second circuit is a 920-MHz relaxation oscillator,
depicted in Fig. 26. The measured spectra are shown in
Fig. 27. Since simulations indicate that the low-frequency
multiplicative noise is negligible in this implementation, we
consider only the thermal noise in the signal path. For A w =
27r x 1 MHz, calculations yield a relative phase noise of
-105 dBc/Hz, simulations predict -98 dB, and the spectrum
in Fig. 27 gives -102 dBc/Hz. For A w = 27r x 5 MHz,
the calculated and simulated results are -119 dBc/Hz and

Fig. 27. Measured output spectrum of relaxation oscillator (10 dB/div.


vertical scale). (a) 2 MJWdiv. horizontal scale and 100 kHz resolution
bandwidth and (b) 1 MHz horizontal scale and 10 1<Hzresolution bandwidth.

- 120 dBckIz, respectively, while the measured value is


dBc/Hz.

115

B. Discussion
Using the above measured data points and assuming a noise
shaping function as in (10) with a linear noise-power trade-off
(Fig. 16), we can make a number of observations.

343

RAZAVI: A STUDY OF PHASE NOISE IN CMOS OSCILLATORS

How much can the phase noise be lowered by scaling


device dimensions? If the gate oxide of MOSFETs is reduced indefinitely, their transconductance becomes relatively
independent of their dimensions, approaching roughly that
of bipolar transistors. Thus, in the gain stage of Fig. 24
the transconductance of A41 and A42 (for 1 ~ =
1 1 ~ 2=
395 pA) would go from (214 O)- to (66 s2)-. Scaling down
the load resistance proportionally and assuming a constant
oscillation frequency, we can therefore lower the phase noise
by 101og(214/66) M 5 dB. For the relaxation oscillator, on
the other hand, the improvement is about IO dB. These are,
of course, greatly simplified calculations, but they provide
an estimate of the maximum improvement expected from
technology scaling. In reality, short-channel effects, finite
thickness of the inversion layer, and velocity saturation further
limit the transconductance that can be achieved for a given
bias current.
It is also instructive to compare the measured phase noise
of the above ring oscillator with that of a 900-MHz three-stage
CMOS ring oscillator reported in [lo]. The latter employs
single-ended CMOS inverters with rail-to-rail swings in a 1.2pm technology and achieves a phase noise of -83 dBc/Hz at
100 kHz offset while dissipating 7.4 mW from a 5-V supply.
Assuming that
Relative Phase Noise K

(wg)zL
aw

Vswingz

__
1

IDD

(35)

where J&iIlg denotes the internal voltage swing and 100 is


the total supply current, we can utilize the measured phase
noise of one oscillator to roughly estimate that of the other.
With the parameters of the 2.2-GHz oscillator and accounting
for different voltage swings and supply currents, we obtain a
phase noise of approximately -93 dBc/Hz at 100 kHz offset
for the 900-MHz oscillator in [IO]. The 10 dB discrepancy is
attributed to the difference in the minimum channel length,
l/f noise at 100 kHz, and the fact that the two circuits
incorporate different gain stages.

ACKNOWLEDGMENT
The author wishes to thank V. Gopinathan for many illuminating discussions and T. Aytur for providing the
oscillator simulation and measurement results.

REFERENCES
[l] A. A. Abidi and R. G. Meyer, Noise in relaxation oscillators, IEEE
J. Solid-state Circuits, vol. SC-18, pp. 794-802, Dec. 1983.
[2] T. C. Weigandt, B. Kim, and P. R. Gray, Analysis of timing jitter in
cmos ring oscillators, in Proc. ISCAS, June 1994.
[3] A. A. Abidi, Direct conversion radio tranceivers for digital communications, in ZSSCC Dig. Tech. Papers, Feb. 1995, pp. 186187.
[4] D. B. Leeson, A simple model of feedback oscillator noise spectrum,
Proc. IEEE, pp. 329-330, Feb. 1966.
[5] B. Razavi, K. F. Lee, and R.-H. Yan, Design of high-speed low-power
frequency dividers and phase-locked loops in deep submicron CMOS,
IEEE J. Solid-state Circuits, vol. 30, pp. 101-109, Feb. 1995.
[6] Y. P. Tsividis, Operation and Modeling of the MOS Transistor. New
York McGraw-Hill, 1987.
[7] J. A. Connelly and P. Choi, Macromodeling with SPICE. Englewood
Cliffs, NJ: Prentice-Hall, 1992.
[8] S. 0. Rice, Mathematical analysis of random noise, Bell System Tech.
J., pp. 282-332, July 1944, and pp. 46156, Jan. 1945.
[9] P. Bolcato et al., A new and efficient transient noise analysis technique
for simulation of CCD image sensors or particle detectors, in Proc.
CICC, 1993, pp. 14.8.1-14.8.4.
[lo] T. Kwasniewski, et al., Inductorless oscillator design for personal
communications devices-A 1.2 pm CMOS process case study, in
Proc. CICC, May 1995, pp. 327-330.

Behzad Razavi (S87-M91) received the B.Sc.


degree in electrical engineering from Tehran (Sharif)
University of Technology, Tehran, Iran, in 1985, and
the M.Sc. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1988
and 1991, respectively
From 1992 to 1996, he was a Member of Technical Staff at AT&T Bell Laboratories, Holmdel,
NJ, where his research involved integrated circuit
design for communication systems He is now with
Hewlett-Packard Laboratories, Palo Alto, CA. His
current interests include wireless transceivers,data conversion, clock recovery,
frequency synthesis, and low-voltage low-power circuits. He has been a Visiting Lecturer at Princeton University, Princeton, NJ, and Stanford University.
He is also a member of the Technical Program Committee of the International
Solid-state Circuits Conference. He has served as Guest Editor to the IEEE
OF SOLID-STATE CIRCUITS and International Journal of High Speed
JOURNAL
Electronics and is currently an Associate Editor of JSSC. He is the author of
the book Principles of Data Conversion System Design (IEEE Press, 1995),
and editor of Monolothic Phase-Locked Loops and Clock Recovery Circuits
(IEEE Press, l996).
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence
at the 1994 ISSCC, the best paper award at the 1994 European Solid-State
Circuits Conference, and the best panel award at the 1995 ISSCC.

928

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 6, JUNE 1998

Correspondence
Corrections to A General Theory of
Phase Noise in Electrical Oscillators

Comments on A 64-Point Fourier Transform Chip for


Video Motion Compensation Using Phase Correlation1

Ali Hajimiri and Thomas H. Lee

Kevin J. McGee

The authors of the above paper1 have found an error in (19) on


p. 185. The factor of 8 in the denominator should be 4; therefore
(19) should read

Abstract The fast Fourier transform (FFT) processor of the above


paper,1 contains many interesting and novel features. However, bit reversed input/output FFT algorithms, matrix transposers, and bit reversers
have been noted in the literature. In addition, lower radix algorithms
can be modified to be made computationally equivalent to higher radix
algorithms. Many FFT ideas, including those of the above paper,1 can
also be applied to other important algorithms and architectures.

i2n 1 c2
n
1f
Lf1!g = 10 1 log 4q2 n=0
2 :
max 1!

I. INTRODUCTION
1

Noise power around the frequency n!0 + 1! causes two equal


sidebands at !0 6 1!: However, the noise power at n!0 0 1!
has a similar effect as mentioned in the paper. Therefore, twice the
power of noise at n!0 + 1! should be taken into account. This will
also change the 4 in the denominator of (21) to 2 to read
2
2
Lf1!g = 10 1 log 0q2rms 1 2in1 =1f
:
1!2
max

Similarly, (24) must change, and its correct form is

1!1=f = !1=f 1

c0
20rms

2
 !1=f 1 12 cc01 :

This will result in the factor of 1/2 becoming redundant in (29), i.e.,

!0
1
Lf1!g = 10 1 log VkT2 1 Rp 1 (C!
1 1!
0 )2
max

However, note that the discussion following (29) is still valid.


2
The factor c02 =20rms
should be changed to (c0 =20rms )2 in the
following instances:
1) p. 185, second column, last paragraph;
2) p. 190, second column, first paragraph;
3) p. 190, second column, second paragraph.
Nevertheless, the expression used to calculate the 0rms to predict
phase noise of ring oscillators is based on a simulation that takes
this effect into account automatically, and therefore the predictions
are still valid. The authors regret any confusion this error may have
caused.
Manuscript received February 27, 1998.
The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)03730-5.
1 A. Hajimiri and T. H. Lee, IEEE J. Solid-State Circuits, vol. 33, pp.
179194, Feb. 1998.

In the above paper, the authors present a fast Fourier transform


(FFT) processor that contains many interesting and novel features.
The mathematics in the above paper,1 describe a matrix computation
where both time inputs and frequency outputs are in bit-reversed
order. Bit-reversed input/output FFT algorithms, while not widely
known, are not new, having been previously described in [3]. Fig.
1, for example, is a 16-point, radix-4, undecimated, bit reversed
input/output, constant output geometry graph based on [3].
The algorithm1 is also described as a decimation-in-time-andfrequency (DITF) type, but the architecture appears to be based
on decimation-in-time (DIT). In the above paper,1 Figs. 4 and 10
show a first calculation stage with unity twiddles before the butterfly
and a second and third calculation stage with prebutterfly twiddles.
Although the butterfly implementation of Fig. 51 may be unique,
the use of prebutterfly twiddles in all three stages, along with unity
twiddles in the first, would seem to indicate DIT. The architecture1
is also a pipeline and contains many elements common to this type
of processor, such as matrix transposers and bit reversers, as will be
described below.
II. MATRIX TRANSPOSERS

AND

BIT REVERSERS

Block serial/parallel or parallel/serial converters, sometimes called


matrix transpose or corner turn buffers, are used in many systems.
They perform a matrix transpose on data blocks by exchanging rows
and columns. Fig. 2 (from [7]) shows, from upper left to lower right,
the flow of data through a 4 2 4 shift-based transposer. The rotator
lines show where data will be routed on the next clock cycle and
the output is the transpose of the input. The switching action was
noted in [7] and [8] and rotator designs can be found in [4], [7],
and [8]. Although Fig. 6(b)1 is also an 8 2 8 transposer, it is being
used in a somewhat unusual way. By providing a complex (real and

Manuscript received January 31, 1997; revised March 5, 1998.


The author was with the Naval Undersea Warfare Center, Newport, RI
02841 USA. He is now at 33 Everett Street, Newport, RI 02840 USA.
Publisher Item Identifier S 0018-9200(98)03731-7.
1 C. C. W. Hui, T. J. Ding, J. V. McCanny, and R. F. Woods, IEEE J.
Solid-State Circuits, vol. 31, pp. 17511761, Nov. 1996.

00189200/98$10.00 1998 IEEE

790

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Jitter and Phase Noise in Ring Oscillators


Ali Hajimiri, Sotirios Limotyrakis, and Thomas H. Lee, Member, IEEE

AbstractA companion analysis of clock jitter and phase noise


of single-ended and differential ring oscillators is presented. The
impulse sensitivity functions are used to derive expressions for the
jitter and phase noise of ring oscillators. The effect of the number
of stages, power dissipation, frequency of oscillation, and shortchannel effects on the jitter and phase noise of ring oscillators is
analyzed. Jitter and phase noise due to substrate and supply noise
is discussed, and the effect of symmetry on the upconversion of
1/f noise is demonstrated. Several new design insights are given
for low jitter/phase-noise design. Good agreement between theory
and measurements is observed.
Index TermsDesign methodology, jitter, noise measurement,
oscillator noise, oscillator stability, phase jitter, phase-locked
loops, phase noise, ring oscillators, voltage-controlled oscillators.

I. INTRODUCTION

UE to their integrated nature, ring oscillators have become an essential building block in many digital and
communication systems. They are used as voltage-controlled
oscillators (VCOs) in applications such as clock recovery
circuits for serial data communications [1][4], disk-drive read
channels [5], [6], on-chip clock distribution [7][10], and
integrated frequency synthesizers [10], [11]. Although they
have not found many applications in radio frequency (RF),
they can be used for some low-tier RF systems.
Recently, there has been some work on modeling jitter
and phase noise in ring oscillators. References [12] and [13]
develop models for the clock jitter based on time-domain
treatments for MOS and bipolar differential ring oscillators,
respectively. Reference [14] proposes a frequency-domain
approach to find the phase noise based on an linear timeinvariant model for differential ring oscillators with a small
number of stages.
In this paper, we develop a parallel treatment of frequencydomain phase noise [15] and time-domain clock jitter for ring
oscillators. We apply the phase-noise model presented in [16]
to obtain general expressions for jitter and phase noise of the
ring oscillators.
The next section briefly reviews the phase-noise model
presented in [16]. In Section III, we apply the model to timing
jitter and develop an expression for the timing jitter of oscillators, while Section IV provides the derivation of a closed-form
expression to calculate the rms value of the impulse sensitivity
function (ISF). Section V introduces expressions for jitter and
phase noise in single-ended and differential ring oscillators
Manuscript received April 8, 1998; revised November 2, 1998.
A. Hajimiri is with the California Institute of Technology, Pasadena, CA
91125 USA.
S. Limotyrakis and T. H. Lee are with the Center for Integrated Systems,
Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)04200-6.

in long- and short-channel regimes of operation. Section VI


describes the effect of substrate and supply noise as well as
the noise due to the tail-current source in differential structures. Section VII explains the design insights obtained from
this treatment for low jitter/phase-noise design. Section VIII
summarizes the measurement results.
II. PHASE NOISE
The output of a practical oscillator can be written as
(1)
is periodic in 2 and
and
where the function
model fluctuations in amplitude and phase due to internal
and external noise sources. The amplitude fluctuations are
significantly attenuated by the amplitude limiting mechanism,
which is present in any practical stable oscillator and is
particularly strong in ring oscillators. Therefore, we will
focus on phase variations, which are not quenched by such
a restoring mechanism.
As an example, consider the single-ended ring oscillator
with a single current source on one of the nodes shown in
Fig. 1. Suppose that the current source consists of an impulse
(in coulombs) occurring at time
of current with area
This will cause an instantaneous change in the voltage of that
node, given by
(2)
is the effective capacitance on that node at
where
the time of charge injection. This produces a shift in the
the change in the phase
is
transition time. For small
proportional to the injected charge
(3)
is the voltage swing across the capacitor and
The dimensionless function
is
the time-varying proportionality constant and is periodic in 2
It is large when a given perturbation causes a large phase shift
thus
and small where it has a small effect [16]. Since
represents the sensitivity of every point of the waveform to a
is called the impulse sensitivity function.
perturbation,
The time dependence of the ISF can be demonstrated by
considering two extreme cases. The first is when the impulse
is injected during a transition; this will result in a large phase
shift. As the other case, consider injecting an impulse while
the output is saturated to either the supply or the ground.
This impulse will have a minimal effect on the phase of the
oscillator, as shown in Fig. 2.

where

00189200/99$10.00 1999 IEEE

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

791

Fig. 1. Five-stage inverter-chain ring oscillator.

Fig. 2. Effect of impulses injected during transition and peak.

Being interested in its phase


we can treat an oscillator
as a system that converts voltages and currents to phase. As
is evident from the discussion leading to (3), this system is
linear for small perturbations. It is also time variant, no matter
how small the perturbations are.
Unlike amplitude changes, phase shifts persist indefinitely,
since subsequent transitions are shifted by the same amount.
Thus, the phase impulse response of an oscillator is a timevarying step. Also note that as long as the introduced change
in the voltage due to the current impulse is small, the resultant
phase shift is linearly proportional to the injected charge, and
hence the transfer function from current to phase is linear.
The unit impulse response of the system is defined as the
amount of phase shift per unit current impulse [16]. Based

on the foregoing argument, we obtain the following timedependent impulse response:


(4)
is a unit step.
where
Knowing the response to an impulse, we can calculate
in response to any injected current using the superposition
integral

(5)

792

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

where
represents the noise current injected into the node
of interest. Note that the integration arises from the closedloop nature of the oscillator. The single-sideband phase-noise
spectrum due to a white-noise current source is given by [16]1
(6)
is the rms value of the ISF,
is the singlewhere
sideband power spectral density of the noise current source,
is the frequency offset from the carrier. In the case
and
of multiple noise sources injecting into the same node,
represents the total current noise due to all the sources and is
given by the sum of individual noise power spectral densities
[17]. If the noise sources on different nodes are uncorrelated,
the waveform (and hence the ISF) of all the nodes are the same
except for a phase shift, assuming identical stages. Therefore,
noise sources is
times
the total phase noise due to all
the value given by (6) (or 2 times for a differential ring
oscillator).
From (5), it follows that the upconversion of low-frequency
noise, such as 1 noise, is governed by the dc value of the
and 1
regions in
ISF. The corner frequency between 1
and is related to
the spectrum of the phase noise is called
through the following equation [16]:
the 1 noise corner

Fig. 3. Clock jitter increasing with time.

(7)
is the dc value of the ISF. Since the height of the
where
positive and negative lobes of the ISF is determined by the
slope of the rising and falling edges of the output waveform,
respectively, symmetry of the rising and falling edges can
and hence the upconversion of 1 noise.
reduce
III. JITTER
In an ideal oscillator, the spacing between transitions is
constant. In practice, however, the transition spacing will
be variable. This uncertainty is known as clock jitter and
(i.e., the time deincreases with measurement interval
lay between the reference and the observed transitions), as
illustrated in Fig. 3. This variability accumulation (i.e., jitter
accumulation) occurs because any uncertainty in an earlier
transition affects all the following transitions, and its effect
persists indefinitely. Therefore, the timing uncertainty when
seconds have elapsed is the sum of the uncertainties
associated with each transition.
The statistics of the timing jitter depend on the correlations
among the noise sources involved. The case of each transitions being affected by independent noise sources has been
considered in [12] and [13]. The jitter introduced by each stage
is assumed to be totally independent of the jitter introduced
by other stages, and therefore the total variance of the jitter is
given by the sum of the variances introduced at each stage. For
ring oscillators with identical stages, the variance will be given
where is the number of transitions during
and
by
1 A more accurate treatment [17] shows that the phase noise does not grow
without bound as fo approaches zero (it becomes flat for small values of
fo ): However, this makes no practical difference in this discussion.

Fig. 4. RMS jitter versus measurement time on a loglog plot.

is the variance of the uncertainty introduced by one stage


is proportional to
during one transition. Noting that
the standard deviation of the jitter after
seconds is [13]
(8)
where is a proportionality constant determined by circuit
parameters.
Another instructive special case that is not usually considered is when the noise sources are totally correlated with one
another. Substrate and supply noise are examples of such noise
sources. Low-frequency noise sources, such as 1 noise, can
also result in a correlation between induced jitter on transitions
over multiple cycles. In this case, the standard deviations rather
than the variances add. Therefore, the standard deviation of the
seconds is proportional to
jitter after
(9)
is another proportionality constant. Noise sources
where
such as thermal noise of devices are usually modeled as
uncorrelated, while substrate and supply-noise sources, as
well as low-frequency noise, are approximated as partially
or fully correlated sources. In practice, both correlated and
uncorrelated sources exist in a circuit, and hence a loglog
versus the measurement delay
plot of the timing jitter
for an open-loop oscillator will demonstrate regions with
slopes of 1/2 and 1, as shown in Fig. 4.

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

793

Fig. 5. ISF for ring oscillators of the same frequency with different number of stages.

In most digital applications, it is desirable for


to
In practice, we wish
decrease at the same rate as the period
to keep constant the ratio of the timing jitter to the period.
Therefore, in many applications, phase jitter, defined as
(10)
is a more useful measure.
can be obtained using (5). As shown
An expression for
or
where is an
in Appendix A, for
integer, the phase jitter due to a single white noise source is
given by
(11)
Using (10) and (11), the proportionality constant
calculated to be

in (8) is

Fig. 6. Approximate waveform and ISF for ring oscillator.

(12)
IV. CALCULATION OF THE ISF

FOR

RING OSCILLATORS

To calculate phase noise and jitter using (6) and (12), one
needs to know the rms value of the ISF. Although one can
always find the ISF through simulation, we obtain a closedform approximate equation for the rms value of the ISF of ring
oscillators, which usually makes such simulations unnecessary.
It is instructive to look at the actual ISF of ring oscillators to
gain insight into what constitutes a good approximation. Fig. 5
shows the shape of the ISF for a group of single-ended CMOS
ring oscillators. The frequency of oscillation is kept constant
(through adjustment of channel length), while the number of
stages is varied from 3 to 15 (in odd numbers). To calculate the
ISF, a narrow current pulse is injected into one of the nodes
of the oscillator, and the resulting phase shift is measured a
few cycles later in simulation.
As can be seen, increasing the number of stages reduces the
peak value of the ISF. The reason is that the transitions of the
Since the
normalized waveform become faster for larger
sensitivity during the transition is inversely proportional to the
slope, the peak of the ISF drops. It should be noted that only
the peak of the ISF is inversely proportional to the slope, and

Fig. 7. Relationship between rise time and delay.

this relation should not be generalized to other points in time.


Also, the widths of the lobes of the ISF decrease as becomes
larger, since each transition occupies a smaller fraction of the
period. Based on these observations, we approximate the ISF
as triangular in shape and with symmetric rising and falling
edges, as shown in Fig. 6. The case of nonsymmetric rising
and falling edges is considered in Appendix B.
where
is the
The ISF has a maximum of 1
maximum slope of the normalized waveform in (1). Also, the
, and hence the
width of the triangles is approximately 2
slopes of the sides of the triangles are 1. Therefore, assuming
can be estimated as
equality of rise and fall times,

(13)

794

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Fig. 8. RMS values of the ISFs for various single-ended ring oscillators versus number of stages.

On the other hand, stage delay is proportional to the rise time


(14)
is the normalized stage delay and is a proportionwhere
ality constant, which is typically close to one, as can be seen
in Fig. 7.
The period is 2 times longer than a single stage delay
(15)
Using (13) and (15), the following approximate expression for
is obtained:
(16)
dependence of
is independent of
Note that the 1
Fig. 8 illustrates
for the ISF shown in
the value of
Fig. 5 with plus signs on loglog axes. The solid line shows
which is obtained from (16) for
the line of
To verify the generality of (16), we maintain a
fixed channel length for all the devices in the inverters while
varying the number of stages to allow different frequencies of
is calculated, and is shown in Fig. 8
oscillation. Again,
with circles. We also repeat the first experiment with a different
supply voltage (3 V as opposed to 5 V), and the result is shown
are almost
with crosses. As can be seen, the values of
identical for these three cases.
is primarily a function
It should not be surprising that
because the effect of variations in other parameters,
of
and device noise, have already been decoupled
such as
, and thus the ISF is a unitless, frequency- and
from
amplitude-independent function.

Equation (16) is valid for differential ring oscillators as


well, since in its derivation no assumption specific to singleended oscillators was made. Fig. 9 shows the
for three
sets of differential ring oscillators, with a varying number of
stages (416). The data shown with plus signs correspond to
oscillators in which the total power dissipation and the drain
voltage swing are kept constant by scaling the tail-current
changes. Members of the
sources and load resistors as
second set of oscillators have a fixed total power dissipation
and fixed load resistors, which result in variable swings and
for whom data are shown with circles. The third case is
that of a fixed tail current for each stage and constant load
resistors, whose data are illustrated using crosses. Again, in
spite of the diverse variations of the frequency and other
dependency of
and its
circuit parameters, the 1
independence from other circuit parameters still holds. In the
which
case of a differential ring oscillator,
is the best fit approximation for
corresponds to
This is shown with the solid line in Fig. 9. A similar result
can be obtained for bipolar differential ring oscillators.
decreases as the number of stages increases,
Although
one should not prematurely conclude that the phase noise can
be reduced using a larger number of stages because the number
of noise sources, as well as their magnitudes, also increases for
a given total power dissipation and frequency of oscillation.
In the case of asymmetric rising and falling edges, both
and
will change. As shown in Appendix B, the 1
corner of the phase-noise spectrum is inversely proportional
corner can be
to the number of stages. Therefore, the 1
reduced either by making the transitions more symmetric in
terms of rise and fall times or by increasing the number of
stages. Although the former always helps, the latter has other
region, as will be
implications on the phase noise in the 1
shown in the following section.

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

795

Fig. 9. RMS values of the ISFs for various differential ring oscillators versus number of stages.

where

V. EXPRESSIONS FOR JITTER AND


PHASE NOISE IN RING OSCILLATORS

(19)

In this section, we derive expressions for the phase noise


and jitter of different types of ring oscillators. Throughout
this section, we assume that the symmetry criteria required to
(and hence the upconversion of 1 noise) are
minimize
already met and that the jitter and phase noise of the oscillator
are dominated by white noise. For CMOS transistors, the drain
current noise spectral density is given by
(17)
is the zero-bias drain source conductance,
is
where
is the gate-oxide capacitance per unit area,
the mobility,
and
are the channel width and length of the device,
is the gate voltage overdrive. The
respectively, and
coefficient is 2/3 for long-channel devices in the saturation
region and typically two to three times greater for shortchannel devices [18]. Equation (17) is valid in both shortand long-channel regimes as long as an appropriate value for
is used.
A. Single-Ended CMOS Ring Oscillators
We start with a single-ended CMOS ring oscillator with
equal-length NMOS and PMOS transistors. Assuming that
the maximum total channel noise from NMOS
and PMOS devices, when both the input and output are at
is given by
(18)

and
(20)
and

is the gate overdrive in the middle of transition, i.e.,

During one period, each node is charged to


and
then discharged to zero. In an -stage single-ended ring
oscillator, the power dissipation associated with this process
However, during the transitions, some extra
is
current, known as crowbar current, is drawn from the supply,
which does not contribute to charging and discharging the
capacitors and goes directly from supply to ground through
both transistors. In a symmetric ring oscillator, these two
components are approximately equal, and their difference will
depend on the ratio of the rise time and stage delay. Therefore,
the total power dissipation is approximately given by
(21)
to make the waveforms symmetric
Assuming
to the first order, we have
(22)
is the delay of each stage and
and
are the
where
rise and fall time, respectively, associated with the maximum
slope during a transition.
Assuming that the thermal noise sources of the different
devices are uncorrelated, and assuming that the waveforms
(and hence the ISF) of all the nodes are the same except for a
noise sources is
phase shift, the total phase noise due to all

796

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

times the value given by (6). Taking only these inevitable


noise sources into account, (6), (16), (18), (21), and (22) result
in the following expressions for phase noise and jitter:

Using (28) and (29), we obtain the same expressions for


phase noise and jitter as given by (23) and (24), except for
a new

(23)

(30)

(24)
is the characteristic voltage of the device. For
where
long-channel mode of operation, it is defined as
Any extra disturbance, such as substrate and supply
noise, or noise contributed by extra circuitry or asymmetry in
the waveform will result in a larger number than (23) and (24).
Note that lowering threshold voltages reduces the phase noise,
in agreement with [12]. Therefore, the minimum achievable
phase noise and jitter for a single-ended CMOS ring oscillator,
assuming that all symmetry criteria are met, occurs for zero
threshold voltage

which results in a larger phase noise and jitter than the longAgain, note the absence
channel case by a factor of
of any dependency on the number of stages.
B. Differential CMOS Ring Oscillators
Now consider a differential MOS ring oscillator with resistive load. The total power dissipation is
(31)
is the number of stages,
is the tail bias current
where
is the supply voltage. The
of the differential pair, and
frequency of oscillation can be approximated by

(25)

(32)

(26)

Surprisingly, tail-current source noise in the vicinity of


does not affect the phase noise. Rather, its low-frequency noise
as well as its noise in the vicinity of even multiples of the
oscillation frequency affect the phase noise. Tail noise in the
vicinity of even harmonics can be significantly reduced by a
variety of means, such as with a series inductor or a parallel
capacitor. As before, the effect of low-frequency noise can
be minimized by exploiting symmetry. Therefore, only the
noise of the differential transistors and the load are taken into
account. The total current noise on each single-ended node is
given by

As can be seen, the minimum phase noise is inversely proportional to the power dissipation and grows quadratically with
the oscillation frequency. Further, note the lack of dependence
on the number of stages (for a given power dissipation
and oscillation frequency). Evidently, the increase in the
number of noise sources (and in the maximum power due
to the higher transition currents required to run at the same
as
frequency) essentially cancels the effect of decreasing
increases, leading to no net dependence of phase noise on
This somewhat surprising result may explain the confusion
that exists regarding the optimum , since there is not a strong
dependence on the number of stages for single-ended CMOS
ring oscillators. Note that (25) and (26) establish the lower
bound and therefore should not be used to calculate the phase
noise and jitter of an arbitrary oscillator, for which (6) and
(12) should be used, respectively.
We may carry out a similar calculation for the short-channel
case. For such devices, the drain current may be expressed as
(27)
is the critical electric field and is defined as the value
where
of electric field resulting in half the carrier velocity expected
from low field mobility. Combining (17) with (27), we obtain
the following expression for the drain current noise of a MOS
device in short channel:
(28)

(33)
is the load resistor,
for a
where
balanced stage in the long-channel limit and
in the short-channel regime. The phase noise and jitter due
to all 2 noise sources is 2 times the value given by (6)
and (12). Using (16), the expression for the phase noise of the
differential MOS ring oscillator is

(34)
and is given by
(35)

The frequency of oscillation can be approximated by

(29)

Equations (34) and (35) are valid in both long- and shortchannel regimes of operation with the right choice of
Note that, in contrast with the single-ended ring oscillator,
a differential oscillator does exhibit a phase noise and jitter
dependency on the number of stages, with the phase noise

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

797

degrading as the number of stages increases for a given frequency and power dissipation. This result may be understood
as a consequence of the necessary reduction in the charge
swing that is required to accommodate a constant frequency
increases. At the
of oscillation at a fixed power level as
same time, increasing the number of stages at a fixed total
power dissipation demands a proportional reduction of tail,
current sources, which will reduce the swing, and hence
by a factor of 1
C. Bipolar Differential Ring Oscillator
A similar approach allows us to derive the corresponding
results for a bipolar differential ring oscillator. In this case,
the power dissipation is given by (31) and the oscillation
frequency by (32). The total noise current is given by the
sum of collector shot noise and load resistor noise

Fig. 10. Phasors for noise contributions from each source.

(36)
is the electron charge,
is the collector
where
Using these
current during the transition, and
relations, the phase noise and jitter of a bipolar ring oscillator
are again given by (34) and (35) with the appropriate choice
of
VI. OTHER NOISE SOURCES
Other noise sources, such as tail-current source noise in a
differential structure, or substrate and supply noise sources,
may play an important role in the jitter and phase noise of
ring oscillators. The low-frequency noise of the tail-current
source affects phase noise if the symmetry criteria mentioned
in Section II are not met by each half circuit. In such cases,
the ISF for the tail-current source has a large dc value, which
increases the upconversion of low-frequency noise to phase
noise. This upconversion is particularly prominent if the tail
device has a large 1 noise corner.
Substrate and supply noise are among other important
sources of noise. There are two major differences between
these noise sources and internal device noise. First, the power
spectral density of these sources is usually nonwhite and often
demonstrates strong peaks at various frequencies. Even more
important is that the substrate and supply noise on different
nodes of the ring oscillator have a very strong correlation.
This property changes the response of the oscillator to these
sources.
To understand the effect of this correlation, let us consider
the special case of having equal noise sources on all the nodes
of the oscillator. If all the inverters in the oscillator are the
same, the ISF for different nodes will only differ in phase by
as shown in Fig. 10. Therefore, the total
multiples of
phase due to all the sources is given by superposition of (5)

(37)

Fig. 11. Sideband power below carrier for equal sources on all five nodes
at nf0
f
:

+ m

Expanding the term in brackets in a Fourier series, we can


i.e.,
show that it is zero except at dc and multiples of
(38)
where is the th Fourier coefficient of the ISF. Equation (38)
means that for identical sources, only noise in the vicinity of
affects the phase.
integer multiples of
To verify this effect, sinusoidal currents with an amplitude
of 10 A were injected into all five nodes of the five-stage ring
oscillator of Fig. 1 at different offsets from integer multiples
of the frequency of oscillation, and the induced sidebands were
measured. The measured sideband power with respect to the
carrier is plotted in Fig. 11.
As can be seen in Fig. 11, only injection at low frequency
and in the vicinity of the fifth harmonic are integrated, and
show a 20 dB/dec slope. The effect of injection in the vicinity
is much
of harmonics that are not integer multiples of
smaller than at the integer ones. Ideally, there should be no
sideband induced by the injection in the vicinity of harmonics
that are not integer multiples of ; however, as can be seen
in Fig. 11, there is some sideband power due to the amplitude
response.
Low-frequency noise can also result in correlation between
uncertainties introduced during different cycles, as its value
does not change significantly over a small number of periods.

798

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Therefore, the uncertainties add up in amplitude rather than


power, resulting in a region with a slope of one in the loglog
plot of jitter even in the absence of external noise sources
such as substrate and supply noise.

VII. DESIGN IMPLICATIONS


One can use (23) and (34) to compare the phase-noise
performance of single-ended and differential MOS ring osstages, the phase noise
cillators. As can be seen for
of the differential ring oscillator is approximately
times larger than the phase noise of a singleand
Since the minimum
ended oscillator of equal
for a regular ring oscillator is three, even a properly
designed differential CMOS ring oscillator underperforms its
single-ended counterpart, especially for a larger number of
stages. This difference is even more pronounced if proper
precautions to reduce the noise of the tail current are not
taken. However, the differential ring oscillator may still be
preferred in ICs because of the lower sensitivity to substrate
and supply noise, as well as lower noise injection into other
circuits on the same chip. The decision to use differential
versus single-ended ring oscillators should be based on both
of these considerations.
The common-mode sensitivity problem in a single-ended
ring oscillator can be mitigated to some extent by using two
identical ring oscillators laid out close to each other that
oscillate out of phase because of small coupling inverters
[19]. Single-ended configurations may be used in a less noisy
environment to achieve better phase-noise performance for a
given power dissipation.
As shown in Appendix B, asymmetry of the rising and
falling edges degrades phase noise and jitter by increasing
corner frequency. Thus, every effort should be taken
the 1
to make the rising and falling edges symmetric. By properly
adjusting the symmetry properties, one can suppress or even
eliminate low-frequency-noise upconversion [16]. As shown in
[16], differential symmetry is insufficient, and the symmetry of
each half circuit is important. One practical method to achieve
this symmetry is to use more linear loads, such as resistors or
linearized MOS devices. This method reduces the 1 noise
upconversion and substrate and supply coupling [20]. Another
revealing implication, shown in Appendix A, is the reduction
corner frequency as
increases. Hence for a
of the 1
process with large 1 noise, a larger number of stages may
be helpful.
One question that frequently arises in the design of ring
oscillators is the optimum number of stages for minimum jitter
and phase noise. As seen in (23), for single-ended oscillators,
region is not a strong
the phase noise and jitter in the 1
function of the number of stages for single-ended CMOS ring
oscillators. However, if the symmetry criteria are not well
noise, a larger
satisfied and/or the process has a large 1
will reduce the jitter. In general, the choice of the number
of stages must be made on the basis of several design criteria,
such as 1 noise effect, the desired maximum frequency of
oscillation, and the influence of external noise sources, such
as supply and substrate noise, that may not scale with

The jitter and phase noise behavior are different for differential ring oscillators. As (34) suggests, jitter and phase
noise increase with an increasing number of stages. Hence
if the 1 noise corner is not large, and/or proper symmetry
measures have been taken, the minimum number of stages
(three or four) should be used to give the best performance.
This recommendation holds even if the power dissipation is
not a primary issue. It is not fair to argue that burning more
power in a larger number of stages allows the achievement of
better phase noise, since dissipating the same total power in a
smaller number of stages results in better jitter/phase noise as
long as it is possible to maximize the total charge swing.
Another insight one can obtain from (34) and (35) is
that the jitter of a MOS differential ring oscillator at a
and
is smaller than that of a differential
given
bipolar ring oscillator, at least for todays range of circuit
and process parameters. As we go to shorter channel lengths,
the characteristic voltage for the MOS devices given by (30)
becomes smaller, and thus phase noise degrades with scaling.
Bipolar ring oscillators do not suffer from this problem.
LC oscillators generally have better phase noise and jitter
compared to ring oscillators for two reasons. First, a ring
oscillator stores a certain amount of energy in the capacitors
during every cycle and then dissipates all the stored energy
during the same cycle, while an LC resonator dissipates only
of the total energy stored during one cycle. Thus, for a
2
given power dissipation in steady state, a ring oscillator suffers
Second, in a ring
from a smaller maximum charge swing
oscillator, the device noise is maximum during the transitions,
which is the time where the sensitivity, and hence the ISF, is
the largest [16].

VIII. EXPERIMENTAL RESULTS


The phase-noise measurements in this section were performed using three different systems: an HP 8563E spectrum
analyzer with phase-noise measurement capability, an RDL
NTS-1000A phase-noise measurement system, and an HP
E5500 phase-noise measurement system. The jitter measurements were performed using a Tektronix CSA 803A communication signal analyzer.
Tables IIII summarize the phase-noise measurements. All
the reported phase-noise values are at a 1-MHz offset from
the carrier, chosen to achieve the largest dynamic range in
the measurement. Table I shows the measurement results for
three different inverter-chain ring oscillators. These oscillators
are made of the CMOS inverters shown in Fig. 12(a), with no
frequency tuning mechanism. The output is taken from one
node of the ring through a few stages of tapered inverters.
Oscillators number 1 and 2 are fabricated in a 2- m 5-V
CMOS process, and oscillator number 3 is fabricated in a 0.25m 2.5-V process. The second column shows the number of
ratios of the NMOS
stages in each of the oscillators. The
and PMOS devices, as well as the supply voltages, the total
measured supply currents, and the frequencies of oscillation
are shown next. The phase-noise prediction using (23) and
(6), together with the measured phase noise, are shown in the
last three columns.

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

799

TABLE I
INVERTER-CHAIN RING OSCILLATORS

TABLE II
CURRENT-STARVED INVERTER-CHAIN RING OSCILLATORS

As an illustrative example, we will show the details of


phase-noise calculations for oscillator number 3. Using (16) to
the phase noise can be obtained from (6). We
calculate
calculate the noise power when the stage is halfway through
a transition. At this point, the drain current is simulated to be
of 4
10 V/m and a of 2.5 is used in
3.47 mA. An
A Hz
(28) to obtain a noise power of
The total capacitance on each node is
fF, and
fC. There is one such noise source on
hence
times the value
each node; therefore, the phase noise is
MHz
dBc/Hz.
given by (6), which results in
Table II summarizes the data obtained for current-starved
ring oscillators with the cell structure shown in Fig. 12(b),
all implemented in the same 0.25- m 2.5-V process. Ring
oscillators with a different number of stages were designed
with roughly constant oscillation frequency and total power
dissipation. Frequency adjustment is achieved by changing
the channel length, while total power dissipation control is
ratios of the
performed by changing device width. The
inverter and the tail NMOS and PMOS devices are shown
is kept at
while node
in Table II. The node
is at 0 V. The measured total current dissipation and
the frequency of oscillation can be found in columns 7 and
8. Phase-noise calculations based on (23) and (6) are in good

agreement with the measured results. The die photo of the chip
containing these oscillators is shown in Fig. 13. The slightly
superior phase noise of the three-stage ring oscillator (number
4) can be attributed to lower oscillation frequency and longer
channel length (and hence smaller ).
Table III summarizes the results obtained for differential
ring oscillators of various sizes and lengths with the inverter
topology shown in Fig. 12(c), covering a large span of frequencies up to 5.5 GHz. All these ring oscillators are implemented
in the same 0.25- m 2.5-V process, and all the oscillators,
except the one marked with N/A, have the tuning circuit
shown. The resistors are implemented using an unsilicided
polysilicon layer. The main reason to use poly resistors is to
reduce 1 noise upconversion by making the waveform on
each node closer to the step response of an RC network, which
is more symmetrical. The value of these load resistors and the
ratios of the differential pair are shown in Table III. A
fixed 2.5-V power supply is used, resulting in different total
power dissipations. As before, the measured phase noise is in
good agreement with the predicted phase noise using (34) and
(6). The die photo of oscillator number 26 can be found in
Fig. 14.
To illustrate further how one obtains the phase-noise predictions shown in Table III, we elaborate on the phase-noise

800

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

TABLE III
DIFFERENTIAL RING OSCILLATORS

(a)
Fig. 12.

(b)

(c)

Inverter stages for (a) inverter-chain ring oscillators, (b) current-starved inverter-chain ring oscillators, and (c) differential ring oscillators.

calculations for oscillator number 12. The noise current due


to one of differential pair NMOS devices is given by (28).
The total capacitance on each node in the balanced case is
fF, and the simulated voltage swing is 1.208 V;
fC. In the balanced case, this current
therefore,
mA, and therefore
is half of the tail current, i.e.,
the noise current of the NMOS device has a single-sideband
A Hz The thermal
spectral density of
noise due to the load resistor is
A Hz;
therefore, the total current noise density is given by
A Hz For a differential ring oscillator with
stages, there is one such noise source on each node;
therefore, the phase noise is 2 times the value given by
MHz
dBc/Hz The total
(6), which results in
mW, and
power dissipation is

Therefore, with an of 0.9, (34) predicts a phase noise of


MHz
dBc/Hz
Timing jitter for oscillator number 12 can be measured using
the setup shown in Fig. 15. The oscillator output is divided
into two equal-power outputs using a power splitter. The CSA
803A is not capable of showing the edge it uses to trigger,
as there is a 21-ns minimum delay between the triggering
transition and the first acquired sample. To be able to look
at the triggering edge and perhaps the edges before that, a
delay line of approximately 25 ns is inserted in the signal
path in front of the sampling head. This way, one may look
at the exact edge used to trigger the signal. If the sampling
head and the power splitter were noiseless, this edge would
show no jitter. However, the power splitter and the sampling
head introduce noise onto the signal, which cannot be easily

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

Fig. 13.

Die photograph of the current-starved single-ended oscillators.

Fig. 14.

Die photograph of the 12-stage differential ring oscillator.

801

The effect of this excess jitter should be subtracted from the


jitter due to the DUT. Assuming no correlation between the
jitter of the DUT and the sampling head, the equivalent jitter
due to the DUT can be estimated by
(39)

Fig. 15.

Timing jitter measurement setup using CSA803A.

distinguished from the device under test (DUT)s jitter. This


extra jitter can be directly measured by looking at the jitter on
the triggering edge. This edge can be readily identified since
it has lower rms jitter than the transitions before and after it.

is the effective rms timing jitter,


is the
where
after the triggering edge,
measured rms jitter at a delay
is the jitter on the triggering edge.
and
Fig. 16 shows the rms jitter versus the measurement delay
for oscillator number 12 on a loglog plot. The best fit for the
Equations (12)
data shown in Fig. 16 is
and
and (35) result in
respectively. The region of the jitter plot with the slope of one
can be attributed to the 1 noise of the devices, as discussed
at the end of Section VI.
In a separate experiment, the phase noise of oscillator
and
number 7 is measured for different values of

802

Fig. 16.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

RMS jitter versus measurement interval for the four-stage, 2.8-GHz differential ring oscillator (oscillator number 12).

on the phase noise and jitter at a given total power dissipation


and frequency of oscillation was shown for single-ended and
differential ring oscillators using the general expression for
the rms value of the ISF. The upconversion of low-frequency
1 was analyzed showing the effect of waveform asymmetry
and the number of stages. New design insights arising from
this approach were introduced, and good agreement between
theory and measurements was obtained.
APPENDIX A
RELATIONSHIP BETWEEN JITTER AND PHASE NOISE
The phase jitter is
(40)
where
Fig. 17.

Phase noise versus symmetry voltage for oscillator number 7.

(41)
These bias voltages are chosen in such a way as
to keep a constant oscillation frequency while changing only
corner of the
the ratio of rise time to fall time. The 1
phase noise is measured for different ratios of the pullup and
pulldown currents while keeping the frequency constant. One
can observe a sharp reduction in the corner frequency at the
point of symmetry in Fig. 17.
IX. CONCLUSION
An analysis of the jitter and phase noise of single-ended
and differential ring oscillators was presented. The general
noise model, based on the ISF, was applied to the case of ring
oscillators, resulting in a closed-form expression for the phase
noise and jitter of ring oscillators [(6), (23), (34)]. The model
was used to perform a parallel analysis of jitter and phase
noise for ring oscillators. The effect of the number of stages

Therefore

(42)
For a white-noise current source, the autocorrelation func; therefore
tion is
(43)
which is
for

(44)

HAJIMIRI et al.: JITTER AND PHASE NOISE IN RING OSCILLATORS

803

Analog and digital designers prefer using phase noise and


timing jitter, respectively. The relationship between these two
parameters can be obtained by noting that timing jitter is the
standard deviation of the timing uncertainty

(45)
represents the expected value. Since the autocorwhere
is defined as
relation function of
(46)
the timing jitter in (45) can be written as

Fig. 18. Approximate waveform and the ISF for asymmetric rising and
falling edges.

APPENDIX B
NONSYMMETRIC RISING AND FALLING EDGES
We approximate the ISF in this Appendix by the function
depicted in Fig. 18. The rms value of the ISF is

(47)
The relation between the autocorrelation and the power spectrum is given by the Khinchin theorem [21], i.e.,
(48)

(52)

represents the power spectrum of


Therewhere
fore, (47) results in the following relationship between clock
jitter and phase noise:

and
are the maximum slope during the rising
where
and falling edge, respectively, and represents the asymmetry
of the waveform and is defined as

(49)

(53)

can be approximated
It may be useful to know that
for large offsets [22]. As can be seen from the
by
foregoing, the rms timing jitter has less information than the
phase-noise spectrum and can be calculated from phase noise
using (49). However, unless extra information about the shape
of the phase-noise spectrum is known, the inverse is not
possible in general.
In the special case where the phase noise is dominated
and
are given by (6) and (12).
by white noise,
Therefore, can be expressed in terms of phase noise in the
region as
1

noting that
(54)
Combining (52) and (54) results in the following:
(55)
i.e.,
which reduces to (16) in the special case of
symmetric rising and falling edges. The dc value of the ISF,
can be calculated from Fig. 18 in a similar manner and
is given by

(50)
is the phase noise measured in the 1
where
region at an offset frequency of
and
is the oscillation
frequency. Therefore, based on (8), the rms cycle-to-cycle jitter
will be given by
(51)
Note that for (50) and (51) to be valid, the phase noise at
should be in the 1
region.

(56)
Using (7), the 1

corner is given by
(57)

corner
As can be seen for a constant rise-to-fall ratio, the 1
decreases inversely with the number of stages; therefore, ring
oscillators with a smaller number of stages will have a larger
noise corner. As a special case, if the rise and fall time
1
are symmetric,
, and the 1
corner approaches zero.

804

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

ACKNOWLEDGMENT
The authors would like to thank M. A. Horowitz, G. Nasserbakht, A. Ong, C. K. Yang, B. A. Wooley, and M. Zargari for
helpful discussions and support. They would further like to
thank Texas Instruments, Inc., and Stanford Nano-Fabrication
facilities for fabrication of the oscillators.

[19] T. Kwasniewski, M. Abou-Seido, A. Bouchet, F. Gaussorgues, and J.


Zimmerman, Inductorless oscillator design for personal communications devicesA 1.2  CMOS process case study, in Proc. CICC,
May 1995, pp. 327330.
[20] J. G. Maneatis and M. A. Horowitz, Precise delay generation using coupled oscillators, IEEE J. Solid-State Circuits, vol. 28, pp. 12731282,
Dec. 1993.
[21] W. A. Gardner, Introduction to Random Processes. New York:
McGraw-Hill, 1990.
[22] W. F. Egan, Frequency Synthesis by Phase Lock. New York: Wiley,
1981.

REFERENCES
[1] L. DeVito, J. Newton, R. Croughwell, J. Bulzacchelli, and F. Benkley,
A 52 and 155 MHz clock-recovery PLL, in ISSCC Dig. Tech. Papers,
Feb. 1991, pp. 142143.
[2] A. W. Buchwald, K. W. Martin, A. K. Oki, and K. W. Kobayashi, A 6GHz integrated phase-locked loop using AlCaAs/Ga/As heterojunction
bipolar transistors, IEEE J. Solid-State Circuits, vol. 27, pp. 17521762,
Dec. 1992.
[3] B. Lai and R. C. Walker, A monolithic 622 Mb/s clock extraction data
retiming circuit, in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 144144.
[4] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. H. Lee, A 0.4 mm
CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter, in Symp.
VLSI Circuits Dig. Tech Papers, June 1998, pp. 198199.
[5] W. D. Llewellyn, M. M. H. Wong, G. W. Tietz, and P. A. Tucci, A
33 Mbi/s data synchronizing phase-locked loop circuit, in ISSCC Dig.
Tech. Papers, Feb. 1988, pp. 1213.
[6] M. Negahban, R. Behrasi, G. Tsang, H. Abouhossein, and G. Bouchaya,
A two-chip CMOS read channel for hard-disk drives, in ISSCC Dig.
Tech. Papers, Feb. 1993, pp. 216217.
[7] M. G. Johnson and E. L. Hudson, A variable delay line PLL for CPUcoprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, pp.
12181223, Oct. 1988.
[8] I. A. Young, J. K. Greason, and K. L. Wong, A PLL clock generator
with 5110 MHz of lock range for microprocessors, IEEE J. Solid-State
Circuits, vol. 27, pp. 15991607, Nov. 1992.
[9] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, A widebandwidth low-voltage PLL for PowerPCTM microprocessors, IEEE
J. Solid-State Circuits, vol. 30, pp. 383391, Apr. 1995.
[10] I. A. Young, J. K. Greason, J. E. Smith, and K. L. Wong, A PLL clock
generator with 5110 MHz lock range for microprocessors, in ISSCC
Dig. Tech. Papers, Feb. 1992, pp. 5051.
[11] M. Horowitz, A. Chen, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung,
W. Richardson, T. Thrush, and Y. Fujii, PLL design for a 500 Mb/s
interface, in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 160161.
[12] T. C. Weigandt, B. Kim, and P. R. Gray, Analysis of timing jitter in
CMOS ring oscillators, in Proc. ISCAS, June 1994.
[13] J. McNeill, Jitter in ring oscillators, IEEE J. Solid-State Circuits, vol.
32, pp. 870879, June 1997.
[14] B. Razavi, A study of phase noise in CMOS oscillators, IEEE J.
Solid-State Circuits, vol. 31, pp. 331343, Mar. 1996.
[15] A. Hajimiri, S. Limotyrakis, and T. H. Lee, Phase noise in multigigahertz CMOS ring oscillators, in Proc. Custom Integrated Circuits
Conf., May 1998, pp. 4952.
[16] A. Hajimiri and T. H. Lee, A general theory of phase noise in electrical
oscillators, IEEE J. Solid-State Circuits, vol. 33, pp. 179194, Feb.
1998.
, The Design of Low Noise Oscillators. Boston, MA: Kluwer
[17]
Academic, 1999.
[18] A. A. Abidi, High-frequency noise measurements of FETs with small
dimensions, IEEE Trans. Electron Devices, vol. ED-33, pp. 18011805,
Nov. 1986.

Ali Hajimiri received the B.S. degree in electronics


engineering from Sharif University of Technology,
Tehran, Iran, in 1994 and the M.S. and Ph.D.
degrees in electrical engineering from Stanford University, Stanford, CA, in 1996 and 1998, respectively.
He was a Design Engineer with Philips, where
he worked on a BiCMOS chipset for GSM cellular
units from 1993 to 1994. During the summer of
1995, he was with Sun Microsystems, where he
worked on the UltraSparc microprocessors cache
RAM design methodology. During the summer of 1997, he was with Lucent
Technologies (Bell Labs), where he investigated low-phase-noise integrated
oscillators. In 1998, he joined the Faculty of the California Institute of
Technology, Pasadena, as an Assistant Professor. His research interests are
high-speed and RF integrated circuits. He is coauthor of The Design of Low
Noise Oscillators (Boston, MA: Kluwer Academic, 1999).
Dr. Hajimiri was the Bronze Medal Winner of the 21st International
Physics Olympiad, Groningen, the Netherlands. He was a corecipient of the
International Solid-State Circuits Conference 1998 Jack Kilby Outstanding
Paper Award.

Sotirios Limotyrakis was born in Athens, Greece,


in 1971. He received the B.S. degree in electrical
engineering from the National Technical University
of Athens in 1995 and the M.S. degree in electrical engineering from Stanford University, Stanford,
CA, in 1997, where he currently is pursuing the
Ph.D. degree.
In the summer of 1993, he was with K.D.D.
Corp., Saitama R&D Labs, Japan, where he worked
on the design of communication protocols. During
the summers of 1996 and 1997, he was with the
Texas Instruments Inc. R&D Center, Dallas, TX, where he focused on
LNA, low-phase-noise oscillator design, and GSM mobile unit transmit path
architectures. His current research interests include the design of mixed-signal
circuits for high-speed data conversion and broad-band communications.
Mr. Limotyrakis received the W. Burgess Dempster Memorial Fellowship
from the School of Engineering, Stanford University, in 1995.

Thomas H. Lee (S87M87), for a photograph and biography, see p. 585 of


the May 1999 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

483

On-Chip Measurement of the Jitter Transfer


Function of Charge-Pump Phase-Locked Loops
Benot R. Veillette, Student Member, IEEE, and Gordon W. Roberts, Member, IEEE

Abstract An all-digital technique for the measurement of


the jitter transfer function of charge-pump phase-locked loops
(PLLs) is introduced. Input jitter may be generated using one
of two methods. Both rely on deltasigma modulation to shape
the unavoidable quantization noise to high frequencies. This
noise is filtered by the low-pass characteristic of the device
and has little impact on the test results. For an inputoutput
response measurement, the output jitter is compared against
a threshold. As the stimulus generation and output analysis
circuits are digital, do not require calibration, and demand a
small area overhead, this jitter transfer function measurement
scheme may be placed on the die to adaptively tune a PLL after
fabrication. The technique can also implement built-in self-test
(BIST) for the characterization or manufacture test of PLLs.
The validity of the scheme was verified experimentally with offthe-shelf components.
Index TermsMixed analog-digital integrated circuits, phaselocked loops, self-testing, semiconductor device testing, sigma
delta modulation.

I. INTRODUCTION

HASE-LOCKED loops (PLLs) operating on digital signals are fundamental in microelectronic systems. Indeed,
they realize essential functions such as clock distributions and
clock recovery. They can thus be found in large digital circuits
such as microprocessors and on mixed-signal ICs for digital
communications. These devices process digital signals where
the phase information is contained in the transition times. They
usually make use of sequential phase detectors which require
charge-pumps and are thus called charge-pump PLLs [1].
The specifications for these PLLs are extremely aggressive,
especially when embedded in high-speed digital communication systems. Process variations can adversely affect circuit
performance and result in low yield. To increase the number
of good parts, some of the components can be trimmed. This
is, however, a very expensive process. Furthermore, aging or
different operating conditions can later affect circuits such that
they no longer meet specifications. The solution is to allow
the PLL to self-calibrate. With this property, the expensive
trimming stage can be avoided and changes in the operating
conditions can be tracked. However, the challenge now is to
integrate on-chip a characterization scheme. This involves two
functional blocks: a stimulus source and a response analyzer.

Manuscript received July 15, 1997; revised October 20, 1997. This work
was supported by NSERC and the Micronet, a Canadian federal network
of centers of excellence dealing with microelectronic devices, circuits, and
systems for ultra large scale integration.
The authors are with Microelectronics and Computer Systems Laboratory,
McGill University, Montreal, PQ H3A 2A7, Canada.
Publisher Item Identifier S 0018-9200(98)01021-X.

Fig. 1. Jitter transfer function test setup.

These circuits should not necessitate calibration themselves.


Furthermore, the silicon area overhead should be small.
Fig. 1 shows a typical test setup for the measurement
of the jitter transfer function in a laboratory [2]. A signal
source generates a high-frequency carrier which is phase
modulated with the source output of a spectrum analyzer
using an Armstrong phase modulator [3]. This jittery signal
is fed to the clock input of a data generator. The recovered
signal from the PLL is down-modulated and observed on the
spectrum analyzer. It can be seen that this procedure requires
precision analog signal sources and instruments. Looking
for shortcuts, engineers are tempted to break the task and
measure components independently [4] to infer the device
characteristics. However, the nature of the phase-locked loop
renders this solution unattractive as the tight feedback and
the sensitivity of some nodes to parasitics makes it difficult
to relate the values of components to the PLL behavior.
A significant level of accuracy may only be achieved by
measuring the system as a whole.
In this paper we propose to verify charge-pump PLLs
characteristics using mostly synchronous digital circuits. It
implies that the stimulus signals can only change at the clock
edges and that the output signals may only be sampled at
the same clock edges. This would seem like an unbearable
constraint as jitter, both created and measured, is quantized
to the test clock period. The results of any test would thus
be severely limited in precision. However, this hurdle can be
overcome using low-pass deltasigma (
) modulation [5].
This technique can encode high quality signals, in this case
a sinewave, on one or a few bits. The quantization noise
introduced in the operation is shaped to high frequencies.
Since PLLs are low-pass, high-frequency jitter is filtered

00189200/98$10.00 1998 IEEE

484

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

Fig. 2. Digital phase locked loop functional block diagram.

out. The output will thus exhibit a sufficiently high SNR to


be considered a pure sinewave. Therefore, the input highfrequency jitter noise does not affect test results. This filtering
principle was demonstrated for a voice codec, another low-pass
circuit [6].
Two methods will be presented to create jitter. The first
one is the digital modulation of the edges of a reference
signal using a higher frequency clock. The alternative is the
injection of a sinusoidal signal in the loop through a second
charge pump. The second method requires a lower clock
frequency but the first one is nonintrusive. Contrary to the
usual measurement procedure, a fixed output jitter is selected
and the amplitude of the input jitter at a given frequency is
varied. Ultimately, the test signal amplitude that results in the
selected output jitter is obtained and used to compute the jitter
transfer function. It is important to note that while both jitter
creation schemes are digital, the jitter loop injection method
does not require a test clock frequency larger than the PLL
operating frequency. Nevertheless, as will be shown, the test
accuracy can be increased with the digital phase modulation
and a higher clock frequency.
The outline of the paper is the following. First, an overview
of the PLL will be provided in Section II to establish the
notation. In Sections III and IV, the two methods for stimulating the device will be explained. Section V will study signal
generation using deltasigma modulation. The analysis of the
output signal as well as the test methodology will be the topics
of Section VI. Section VII will briefly look at the issue of
accuracy. Experimental setup and results will be discussed in
Sections VIII and IX, respectively. Overhead will be addressed
in Section X. Finally, conclusions will be drawn.
II. PHASE-LOCKED LOOP OVERVIEW
A block diagram of a simple charge pump PLL is shown
in Fig. 2. To the left, a sequential phase detector compares
the transitions of the reference input and voltage-controlled
oscillator (VCO) output signals. A two-output phase-frequency
detector (PFD) is illustrated, but other types of sequential
phase detectors may also be employed. The output of the PFD
can be any of three logic states, and thus a charge pump is
required for digital-to-analog conversion. The charge pump
can be of the current or voltage type. The jitter signal injection
technique of Section V relies on current charge pumps, but it
could also be adapted for a voltage charge pump as long as the
filter allows summation of voltages. Referring back to Fig. 2,
the low-pass filter removes short term variations and shapes
the PLL characteristics. The VCO in turn generates a square
wave whose frequency depends on the level of its analog input.
A counter may be inserted in the feedback loop to lock the
VCO clock to a lower frequency reference signal.

(a)

(b)
Fig. 3. Models of charge-pump phase-locked loop: (a) continuous time and
(b) discrete time.

The continuous-time linear model of this PLL in steady-state


operation is illustrated in Fig. 3(a). The variables
and
are the phase of the reference signal and the VCO output
signal, respectively. It should be understood that while these
variables represent jitter in a signal, the other variables in the
circuit stand for either voltage or current. The phase detector
converts phase to one of the two analog quantities while the
VCO performs the complementary operation. In Fig. 3(a), the
parameter
is the composite gain of the phase detector and
charge-pump circuits expressed in A/rad or V/rad depending
on the type of charge pump. The transfer function of the loop
filter is denoted
. The gain of the VCO is labeled
and is stated in rad/V. If a counter is present, then its effect
is lumped into this constant by dividing the VCO intrinsic
gain by the counter length ( in Fig. 2). With this operation,
a counter in the PLL becomes transparent for the proposed
measurement method.
The continuous-time model allows satisfactory predictions
of the PLL behavior. On the other hand, the phase of digital
signals is contained in the signal transitions and is thus
better represented as a discrete-time sequence. Therefore,
the analysis of PLL operating on digital signals should be
performed using difference equations or -transform tools.
Indeed, it has been shown that the discrete-time model is more
accurate, especially at high jitter frequency [7]. This model is
shown in Fig. 3(b). The closed-loop equation governing the
operation of the PLL is
(1)
is
In this equation, the discrete-time transfer function
the impulse invariant transform of the series combination of
the loop filter and VCO transfer functions (
). The
function
is labeled the jitter transfer function. Many
PLL specifications can be extracted from this transfer function
such as the jitter bandwidth and the jitter peaking. The latter
.
measure corresponds to the maximum value of
Process variations can significantly alter the charge pump
and VCO gains as well as the filter passive components values.
The method we propose can be used to automatically trim
components for compliance with a desired
. Alterna-

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

485

tively, it can be used to screen devices by comparing the


measured magnitude of
against a mask.
III. INPUT DIGITAL PHASE MODULATION
The first method that we will consider to generate the
test stimulus is the digital phase modulation of the PLL
input. Sinusoidal jitter of arbitrary frequency and amplitude
can be generated by modifying the edges of the reference
signal digitally. A test clock frequency which is a multiple of
the reference signal frequency is a prerequisite. This method
therefore may not be suitable for high-frequency PLLs. The
principle is to control the instantaneous phase of a 1-b digital
signal by delaying it by an amount set by a multibit digital
signal. Fig. 4 illustrates a circuit that can realize this operation.
The input signal to the delay cells string is a pulse whose
period is an integer multiple of the test clock period. The
digital signal generator output, denoted
, is updated at
every PLL cycle and delays the input signal by a variable
amount. Fig. 5 shows a typical waveform at the output of this
circuit for a test clock ( ) eight times the reference signal
frequency ( ). The dotted lines represent the test clock edges
with the thick ones used as zero jitter marks. The PLL input
signal jitter can thus be expressed as
rad

(2)

Obviously, the number of inputs to the multiplexer is limited


by the silicon area available but also by the ratio of the frequency of the clock operating the delay cells to the frequency
of the reference signal. The limited number of delay cells is
likely to make the jitter of the input signal
coarsely
quantized such that its SNR is unacceptably low. However,
PLLs are frequency selective with respect to jitter. Indeed,
the jitter transfer function is low-pass and the device filters
high-frequency jitter. When incorporating a
modulator
in the signal generator, quantization noise can be shaped to
high frequencies [5]. Therefore, the encoded signal from a
low-pass
modulator, such as
in Fig. 5, contains
a high-quality low-frequency sinewave and high-frequency
is
quantization noise as illustrated in Fig. 6(a). While
a multibit signal as it controls an
multiplexer,
modulation can ultimately encode signals on a single bit.
Because the PLL is low-pass and is designed to reject highfrequency components in the loop, the quantization noise will
be suppressed, leaving only the sinusoid in the output jitter as
shown in Fig. 6(b). Section V will examine how such a signal
can be generated.
IV. SIGNAL INJECTION

IN THE

LOOP

The second option for the generation of jitter is the injection


of a test signal at the input of the loop filter as shown in
Fig. 7. This signal source, represented here by the variable
, is injected through a second charge pump with gain
. It should be understood that this signal source is not a
jittery digital signal but an analog signal embedded in a 1-b
digital signal encoded using a
modulator. However, this
signal, when referred back to the input, is equivalent to input

Fig. 4. Digital phase modulation circuit.

Fig. 5. Typical digital phase modulated waveform.

(a)

(b)

Fig. 6. Phase spectrum: (a) input signal jitter and (b) output signal jitter.

Fig. 7. Discrete-time model of phase-locked loop modified for signal injection.

jitter. The PLL reference signal meanwhile is a square wave


and therefore the input jitter
will be zero. This setup
will be used to evaluate the characteristics of the PLL through
the measure of its
transfer function. Examining
the model of Fig. 7, it can be seen that this transfer function
will be equal to
(3)
It is thus equivalent within a multiplicative constant to the jitter
. For a spectral test such as the
transfer function
jitter transfer function, the test signal is a sinewave encoded
into a single bit. The quantization noise is concentrated at high
frequencies and is filtered out as explained in the previous
section. It is important to note that the clock period for signal
injection must be an integer multiple of the reference signal
period to prevent aliasing of the quantization noise back in
the PLL passband. This condition implies that the signal

486

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

(a)

Fig. 8. Injecting a signal into the PLL.

injection frequency cannot be higher than the reference signal


frequency. Converting the 1-b digital stream to an analog
signal and summing it with the output of the phase detector
is quite simple. A second current charge pump is placed
in parallel with the phase detector charge pump and both
outputs are connected together, forming a current summing
node as shown in Fig. 8. The accuracy of this analog-todigital conversion is a function of the matching of the two
current sources
and , typically ranging between 0.11%
in monolithic form. In this schematic, the impedance
implements the loop filter and
is the controlling voltage
of the VCO.

(b)
Fig. 9. Spectrum of test jitter signals: (a) signal band and (b) Nyquist
interval.

V. NOISE-SHAPED SINUSOID GENERATION


The basis of the jitter generation methods described in the
previous two sections is low-pass deltasigma modulation.
This technique allows the encoding of a bandlimited signal
represented by a large number of bits onto a very small number
of bits. An error is created by this quantization operation but it
is filtered such that it appears mostly at high frequencies. This
feat is realized by feeding back the quantization error to a filter
and summing it with the input before the next quantization.
Using deltasigma modulation, arbitrary signals, such as a
sinusoid, can be represented by a square wave (1-b signal)
with the difference being composed mostly of high-frequency
noise.
Fig. 9 shows the spectrum of a sinusoid for jitter generation
coded using different quantizers. Two curves show the power
density spectrum of multibit signals for the digital phase
modulation method. They were generated for different test
clock ratios (
) and thus different quantization steps ( ).
The last curve shows a 1-b signal for the signal injection
modulator was
method. The same second-order low-pass
used to produce each curve, only the quantizer was changed.
The input signal amplitude was kept the same in each case. As
explained previously, the signal is located at low frequencies
where the noise power is small. It can also be seen that the
quality of the signal improves with the number of quantization
steps (smaller step size ). This is evident by the drift in

the curve downward in Fig. 9 as decreases. Three possible


methods, illustrated in Fig. 10, may be employed on-chip to
generate any of the signals of Fig. 9. The selection of one of
these for implementation is based on the available resources
in the system, silicon area, PLL speed, and required accuracy.
Each method is described below.
A. Direct Frequency Synthesis
The most straightforward and versatile signal generation
scheme is a ROM-based digital frequency synthesizer [8]
followed by a low-pass
modulator. The output of the
modulator may be a 1-b signal or a multibit signal [9]
as required by the jitter creation technique used. However,
the direct frequency synthesis solution requires a large silicon
area and is thus usually not a good choice for built-in selftest (BIST). However, if a large block of RAM is present in
the system, then it could be enrolled to store data for signal
generation during the PLL test phase.
B. DeltaSigma Oscillator
An alternative is to use a circuit called a low-pass
oscillator [10]. The circuit, illustrated in Fig. 10(b), is a
digital resonator where the frequency setting multiplier has
been replaced by the combination of a
modulator and

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

(a)

taken. Some form of optimization is necessary to obtain good


results [12]. Also, a different data stream is required for each
frequency and amplitude desired. Nonetheless, considerable
speed can be achieved with this technique, and the overhead
is the smallest of the three methods.
VI. EVALUATION

(b)

(c)
Fig. 10. Generating deltasigma encoded sinusoids: (a) direct digital frequency synthesis, (b) deltasigma oscillator, and (c) fixed-length periodic
byte stream.

Fig. 11.

Comparing waveform against jitter threshold.

a multiplexer. To avoid large multiplexers, the output of the


oscillator is usually a single bit. Therefore, a second
modulator may be required for multibit signal generation.
However, as the structure is similar to the first one except for
the quantizer, hardware can be time-shared. Because a ROM
or a multiplier is not required, the silicon implementation of
oscillators is very efficient. However, the presence of
a nonlinear block in the feedback loop makes this device
difficult to predict using a linear model. Off-line simulations
are required to achieve maximum accuracy. Nevertheless,
oscillators can be implemented quickly and may possibly use
available on-chip computing resources such as a digital signal
processor (DSP).
C. Fixed-Length Bit Streams
The last method consists in generating a data stream from a
software sinewave generator and low-pass
modulator and
then selecting a subset of this stream. This subset is stored in
memory and the resulting data stream is then repeated [11]
as illustrated in Fig. 10(c). One can view this approach as
a special case of the ROM-based digital frequency synthesis
scheme presented above. This method is particularly useful for
single-bit signals as they can be represented using a very small
number of bits, on the order of a hundred. The downside is
that signal quality will vary widely with different subsets of
the same length of a given
modulator output if care is not

487

OF THE

OUTPUT JITTER

The exact gauging of the jitter response


is rather
difficult. However, a measure that can be made with good
accuracy is the point when
reaches a predetermined
value or threshold. The edges of the test clock, assumed to
be jitter free, will be used to establish this threshold. Fig. 11
illustrates how this can be accomplished. The dashed lines
represent the rising edge of the test clock with the bold ones
indicating the rising transition of the reference clock. The
output signal is sampled at positive test clock edges until two
adjacent samples yield a zero followed by a one, indicating
a rising edge of the signal. If this rising edge occurred in
intervals immediately before or after the reference test clock
edge, then jitter is below threshold; otherwise, an error is
generated. Over a time interval, the number of errors are
counted and a bit error rate (BER) measure can be obtained.
This averaging is done to prevent glitches and noise signals
from significantly affecting the final result. In Fig. 11, a ratio
of the test clock frequency over the reference signal frequency
of eight allows a minimum value for the threshold of
.
Larger values could also be used for the threshold by allowing
more test clock cycles in the valid interval.
The previous method requires a test clock frequency at least
three times the reference frequency. Indeed, three periods of
the test clock for a PLL period is the limiting case as two of
the clock periods must compose the valid interval, leaving one
to catch jitter above threshold. A different scheme, however,
can be used that compares jitter against a
rad threshold
using a 50% duty cycle test clock of the same frequency
as the reference clock. Fig. 12(a) illustrates how it can be
implemented with a few gates and registers. The threshold
is verified by sampling a data signal with both the reference
clock, used at the input of the PLL, and the recovered clock
at the VCO output. This data signal will toggle between one
and zero at the reference signal falling edges, resulting in a
frequency half that of the reference clock. When the output
jitter exceeds rad, sampling errors will occur with the VCO
output clock because it will sample a different data than the
reference clock. This is shown in Fig. 12(b) where an error
occurs when the jitter goes from 0.9 to 1.1 . This circuit
is somewhat similar to the circuit typically used for a jitter
tolerance test [2].
A single frequency point test is thus performed as follows:
an input jitter (
or
) is applied to the PLL and its
amplitude is modified until the maximum value resulting in
output jitter below threshold is found. The fastest procedure
for obtaining
or
is to use a binary search algorithm.
The initial amplitude ( ) is zero and the initial increment
.
( ) is 0.5. A test is performed with amplitude
If the VCO output jitter is lower than the threshold, then
the next amplitude
will be
. Otherwise, the

488

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

(a)

(b)
Fig. 12.

(a) Circuit to evaluate  rad jitter threshold. (b) Typical waveforms.

amplitude remains the same (


). The increment is
then divided by two and the procedure is repeated until the
desired accuracy is achieved. The uncertainty associated with
the signal amplitude following a step measure will be
which equals 2
. However, arbitrary accuracy cannot be
achieved as noise sources are always present.
VII. ACCURACY
The accuracy of the measured results will depend on many
factors. Foremost is the residual input signal jitter quantization
noise which passes through the PLL. The jitter creation clock
frequency along with the
modulator noise transfer function, the number of quantization bits, and the PLL bandwidth
will influence the effective SNR of the measured jitter at the
output. It is interesting to note that for each frequency point
on the jitter transfer function, the SNR of the measured output
jitter will remain constant. This is because, under a white noise
assumption for the quantizer error, the noise present in the PLL
input signal jitter does not vary for a selected
modulator
implementation. It is only a function of the quantizer step
and does not change with the signal amplitude and frequency.
Neglecting internal noise sources, the PLL output jitter noise
power density is a result of the input jitter noise power density
shaped by the jitter transfer function. It can thus be seen
that the PLL output jitter noise power is independent of the
parameters of the test sinusoid. Furthermore, as the threshold
is fixed, the PLL output sinewave jitter amplitude must also
be constant. In the measurement procedure, the input jitter
sinewave amplitude is varied to account for the response of
the PLL at a given frequency. Consequently, for a given PLL,
the SNR of the measured output jitter is fully determined from
the quantizer step size for the input signal jitter and the jitter
threshold at the output. Fig. 13 shows the theoretical SNR of
the output jitter with respect to the ratio of the cutoff frequency
of a second-order PLL over the PLL lock frequency, referred
to here as the PLL relative bandwidth. A curve is displayed
for each of the three input jitter quantization granularities.
A second-order
modulator is used to generate the input
signal, and the output jitter amplitude is held constant at . For

Fig. 13. Signal-to-noise ratio of the output jitter versus the PLL relative
bandwidth.

reference, digital data communication systems such as SONET


mandate a relative loop bandwidth of about 0.1%.
Also of concern is the amount of jitter present in the test
clock. However, the clock signal should be generated by a
tester and, provided a sound floorplan, this source of error
should be negligible. A more significant problem is the jitter,
both static and random, generated internally. It will add to the
jitter from the input signal and thus modify the effective output
jitter threshold. Its effect can only be reduced by using a jitter
threshold much larger than the jitter noise. Alternatively, it
could be accounted for and subtracted from the final results.
Finally, for the signal injection method, the matching of the
two charge pumps is obviously a cause of errors.
VIII. EXPERIMENTAL SETUP
A test setup, whose schematic is shown in Fig. 14, was
implemented on a breadboard using off-the-shelf components.
The device under test (DUT) is centered around the VCO from
a 74HC4046 monolithic phase-locked loop. However, because
this IC uses a voltage charge pump followed by a passive
filter and since its phase detector could not be separated from
this block, an XC4010 FPGA was programmed to implement
the classical phase-frequency detector [13]. The charge pump
is built out of discrete NPN and PNP transistors, a resistor,
and analog switches from a 74HC4066. A circuit is also
required to maintain the transistors inside their linear region
when the switches are open, but it is not shown here. The
PLL was operated at 100 kHz as parasitics of the board and
the time constant of some components do not allow for a
higher frequency. However, the measurement scheme should
be extendable to much higher frequency as the test circuits are
similar in nature to the PLL components.
Both digital signals for jitter creation (
and
) are
generated by a low-pass
oscillator programmed on the
same field-programmable gate array (FPGA). It uses 24-b
buses to achieve a tunability of 55 parts per million of
its programmable clock. A 3-b quantizer in addition to the
standard 1-b quantizer makes the signal generator capable of

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

Fig. 14.

489

Experimental setup.

multibit output [refer to Fig. 10(b)] for the purpose of digital


phase modulation. It should be noted that apart from the
modulator circuitry is not duplicated as it will
quantizer,
be operated in time-shared mode at double speed for multibit
operation.
The input to the PLL can be set to accommodate both jitter
generation methods. For the loop jitter injection scheme, a 100kHz square wave is presented to the input of the phase detector.
The input can also be the same signal phase modulated with the
help of an 800-kHz test clock. Eight jitter steps are therefore
possible, resulting in a
quantization.
Various jitter threshold circuits are also implemented on
the FPGA. The
jitter threshold circuit of Fig. 12(a) will
be employed in conjunction with the loop jitter injection. On
the other hand, thresholds of
and
are implemented
for the digital phase modulation method, making use of the
higher frequency test clock.
For each test, a warm-up stage of 214 data cycles is executed
to remove transients before a 216 data cycle test stage is
performed. The error threshold is set to 64, corresponding
to a BER of 10 3 . A control module built around a finite
state machine selects the amplitude of the input jitter for the
ensuing test according to the output of the jitter threshold
circuit, using the binary search algorithm. At each frequency
point, the amplitude is resolved to an accuracy of 15 b within
13 s. The entire digital circuitry for all the experiments requires
81% of the resources of an XC4010 FPGA. This experimental
setup is connected to a workstation through I/O modules to
allow a driving software to set the low-pass
oscillator
frequency as well as read the amplitude.
IX. EXPERIMENTAL RESULTS
The jitter transfer function measurement was carried out
for both the jitter injection and the digital phase modulation
techniques on two different PLL configurations with different
bandwidths and damping values. Table I summarizes the main
parameters of these experiments. The same current amplitude
was used for both charge pumps (
). The transfer
functions are presented in the continuous-time domain as this
is more typical of what can be found in industry.

TABLE I
EXPERIMENT PARAMETERS

The results of the experiments on the first configuration


are shown in Fig. 15. A measured jitter transfer function for
each jitter generation method is displayed. The dotted line
represents the theoretical jitter transfer function as predicted
from the direct measurement of the components. The phase
modulation scheme used a
threshold for this experiment.
The curve shows a 0.4-dB offset which can be attributed
mostly to the static jitter of the PLL. For the other jitter
creation scheme, the signal injection clock was chosen to be
50 kHz, that is half the PLL rate, in order to demonstrate
the flexibility in selecting this parameter. The offset is larger,
possibly because of mismatch between the two charge pumps
realized out of discrete transistors. The first two columns of
Table II summarize the features of the curves after removal
of the offsets. Both methods yield similar results for the PLL
bandwidth and jitter peaking. The theoretical predictions are
slightly off, most probably because of the parasitics of the
setup which were not accounted for in the calculations.
The jitter transfer functions measured in the second experiment are shown in Fig. 16. This PLL exhibits a larger
bandwidth and is more damped. It can be seen that the curve
obtained here with the jitter injection technique is of lesser
quality. This came about because the larger bandwidth yields
a lower output jitter SNR. From the graph of Fig. 13, it can
be seen that this SNR is barely over 20 dB. On the other
hand, the digital phase modulation still shows a smooth curve
because of the 3-b quantization which results in lower jitter

490

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 3, MARCH 1998

sought. However, in many applications, a smaller number of


signal frequencies and amplitudes are necessary and a fixedlength periodic bit stream could generate the signals for the
cost a few kilobits of RAM. For example, to verify that the
bandwidth of the PLL is smaller than some value, only one test
is required. Specific values for overhead or measurement time
depend heavily on the PLL application. Moreover, one should
not have a dogmatic stance about overhead as the addition of
the on-chip measurement circuits add value to a system. The
economic gains from a BIST may far outweigh the cost of
extra silicon.

XI. CONCLUSIONS

Fig. 15.

Jitter transfer function for experiment 1.


TABLE II
RESULTS SUMMARY

We have presented a PLL jitter transfer function measurement technique which is entirely digital except for the possible
addition of a charge pump. The technique is suitable for onchip measurement since it does not require trimming and the
silicon overhead is small. Two methods were introduced for
the creation of jitter, allowing tradeoffs between test clock
frequency on one side and loading, complexity, and accuracy
on the other side. Experimental results were presented which
suggest this scheme could be successfully implemented on
silicon.
ACKNOWLEDGMENT
The authors acknowledge the suggestions of B. Gerson from
PMC Sierra.
REFERENCES

Fig. 16.

Jitter transfer function for experiment 2.

noise levels at the output. Again, the meaningful parameters


are summarized in the two right-most columns of Table II.
X. IMPLEMENTATION
For any integrated measurement scheme, the area overhead
is obviously a major concern. While the digital portion of
the experimental setup uses a large portion of the FPGA, a
much more compact implementation is possible. Indeed, a
oscillator was selected as the signal generator because
of its versatility as a complete jitter transfer function was

[1] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans. Commun.,


vol. COMM-28, pp. 18491857, Nov. 1980.
[2] L. DeVito, A versatile clock recovery architecture and monolithic implementation, in Monolithic Phase-Locked Loops and Clock Recovery
Circuits: Theory and Design, B. Razavi, Ed. New York: IEEE Press,
1996.
[3] E. H. Armstrong, A method of reducing disturbances in radio signaling
by a system of frequency modulation, in Proc. IRE, May 1936, vol.
24, no. 5, pp. 689740.
[4] P. Goteti, G. Devarayanadurg, and M. Soma, DFT for embedded
charge-pump PLL systems incorporating IEEE 1149.1, in Proc. IEEE
1997 CICC, Santa Clara, CA, May 1997, pp. 210213.
[5] M. W. Hauser, Principles of oversampling A/D conversion, J. Audio
Eng. Soc., vol. 39, nos. 1/2, pp. 326, Jan./Feb. 1991.
[6] M. F. Toner and G. W. Roberts, A BIST scheme for an SNR, gain
tracking, and frequency response test of sigmadelta ADC, IEEE Trans.
Circuits Syst.II, vol. 41, pp. 115, Jan. 1995.
[7] J. P. Hein and J. W. Scott, z -domain model for discrete-time PLLs,
IEEE Trans. Circuits Syst., vol. 35, pp. 13931400, Nov. 1988.
[8] J. Tierney, C. M. Rader, and B. Gold, A digital frequency synthesizer,
IEEE Trans. Audio Electroacoustic, vol. 19, pp. 4857, 1971.
[9] J. G. Kenney and L. R. Carley, Design of multi-bit noise shaping data
converter, Analog Integrated Circuits and Signal Processing J., May
1993, vol. 3, no. 3, pp. 259272.
[10] A. K. Lu, G. W. Roberts, and D. A. Johns, A high-quality analog
oscillator using oversampling D/A conversion techniques, IEEE Trans.
Circuits Syst.II, vol. 41, pp. 437444, July 1994.
[11] E. M. Hawrysh and G. W. Roberts, An integration of memory-based
analog signal generation into current DFT architectures, in Proc. 1996
ITC, Washington, DC, Oct. 1996, pp. 528537.
[12] B. Dufort and G. W. Roberts, Signal generation using periodic single
and multi bit sigmadelta modulated streams, in Proc. IEEE 1997 ITC,
Washington, DC, Nov. 1997, pp. 396405.
[13] C. A. Sharpe, A 3-state phase detector can improve your next PLL
design, EDN Mag., pp. 5559, Sept. 1976.

VEILLETTE AND ROBERTS: ON-CHIP MEASUREMENT OF THE JITTER TRANSFER FUNCTION

Benot R. Veillette (S97) was born in TroisRivieres, Quebec, Canada, on January 1, 1971.
He received the B.Eng. (Honors) degree and the
M.Eng. degree from McGill University, Montreal,
PQ, Canada, in 1993 and 1995, respectively. He
is now completing the Ph.D. degree in electrical
engineering from the same institution.
His current research interests are in deltasigma
modulation, analog integrated circuits for communications, and mixed-signal testing.

491

Gordon W. Roberts (S85M85) was born in


Toronto, Canada, in 1959. He received the B.A.Sc.
degree in electrical engineering from the University
of Waterloo in 1983 and the M.Eng. and Ph.D.
degrees also in electrical engineering from the University of Toronto in 1986 and 1989.
In 1989 he joined the faculty of McGill University
where he is presently an Associate Professor. He
co-authored several text books and has contributed
seven chapters to various edited volumes related to
analog IC design and test.
He is presently an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS
AND SYSTEMS and Editor of the IEEE Design and Test Magazine. He has
received numerous department and faculty awards for teaching, as well as
several IEEE awards for his work related to mixed-signal testing.

Physical Processes of Phase Noise

Differential LC Oscillators

Systems Laboratory
University o f California
Los Angeles,CA 90095-1594

Introduction
There is an unprecedented interest among circuit designers today
to obtain insight into mechanisms of phase noise in LC oscillators. For only with this insight is it possible to optimize oscillator
circuits using low-quality integrated resonators to comply with
the exacting phase noise specifications of modern wireless systems. Various numerical simulators are now available to assist the
circuit designer [ 11, [2], [3], in some cases accompanied by qualitative interpretations [4]. At present, therefore, the situation of the
oscillator designer is similar to the designer of amplifiers who is
equipped only with SPICE, but who lacks physical insight and
methods for simple yet accurate analysis with which to optimize a
circuit.
Over the years, various attempts at phase noise analysis have produced results that are variations on Leesons classic heuristic derivation without formal proof [SI, [6]. These analyses are based
on a linear model of an LC resonator in steady-state oscillation
through application of either feedback or negative conductance.
The results confirm Leeson by showing that phase noise is proportional to noise-to-carrier ratio and inversely to the square of resonator quality factor. However, without knowledge of the constant
of proportionality, which Leeson leaves as an unspecified noise
factor, the actual phase noise cannot be predicted.

The results are validated against SpectreRF simulations and measurements on two differential CMOS oscillators tuned by resonators with very different Qls.

Recognizing Phase Noise


For the purposes of analysis, a noise spectrum is considered as
consisting of uncorrelated sinewaves in a 1 Hz bandwidth at any
given frequency. Voltage or current noise produces amplitude and
phase fluctuations when superimposed on a periodic signal (from
now on, a large sinewave V0sin(2nf,t)). This is clearly seen [lo] by
isolating one sinewave vn in the noise spectrum, say at a frequency
offset +fmfrom the sinewave frequency f,. Figure 1 shows this as a
phasor vn rotating relative to the sinewave phasor V,, which is then
decomposed into two equal collinear phasors at +fm, and two antiphase conjugate phasors which are assigned a negative relative frequency -fm . Grouping the phasors pairwise as ?fm, it is seen that
one pair modulates the amplitude of the sinewave with time (AM),
while the other sweeps its phase (PM). Thus, half of any additive
noise on a sinewave produces phase noise, the other half amplitude
noise. When sin(w,t) is accompanied either by noise sinewave phasors +sin(wO+wm)t,+sin(wO-wm)t or by fcos(a,+am)t, +cos(o,a,)t, then phase noise alone is present.

Simple Model of the Differential Oscillator

It is now well understood that the large-signal periodic switching


of a self-limited oscillator [7] underpins this noise factor [SI. At
first sight, an accurate noise analysis of an oscillator subject to periodic bias currents appears intractable, however by using sensible
approximations Huang has solved this problem for a Colpitts oscillator [9] and obtained good agreement between analysis and measurements of thermally induced phase noise. The mechanisms of
flicker noise upconversion, which are important in CMOS oscillators, remain obscure.

This paper treats the well-known tail-current biased differential


L C oscillator (Figure 2). In steady state, the differential pair acts as
a negative conductance that switches the tail current I, into the LC
resonator. Owing to filtering in the L C circuit, the square wave of
current creates a sinusoidal voltage across the resonator of amplitude (4/z)I,R. This voltage drives the differential pair into switching, thus sustaining oscillation. In a CMOS oscillator the amplitude may build up to several volts, eventually limited by the supply
voltage.

In this paper we concentrate on an understanding of the popular


differential LC oscillator. We introduce simple models to capture
the nonlinear processes that convert voltage or current thermal
noise in resistors or transistors into phase noise in the oscillator.
The analysis does not require hypothetical elements, such as limiters or amplitude control loops, to fully explain phase noise. A
simple expression at the end accurately specifies thermally induced
phase noise, and lends substance to Leesons original hypothesis.
Next, the upconversion of flicker noise into phase noise is traced
to mechanisms first identified in the 1930s, but apparently since
forgotten. Unlike thermally induced phase noise, which appears as
phase modulation sidebands, flicker noise is shown to upconvert by
bias-dependent frequency modulation.

In previous work on noise in mixers [ll], we have shown how


a simple model of the switching differential pair is sufficient to
explain all frequency translations of noise. This model is used here.
Suppose that some noise (v) accompanies the resonator sinewave.
Assuming that a small fraction of the resonator voltage around the
zero crossing is enough to fully switch the differential pair, then
the noise simply advances or retards the instant of zero crossing
(Figure 3(a)). The randomly pulse-width modulated current at the
switch output may be decomposed into the original periodic square
wave in the absence of noise, superimposed with pulses of constant height but random width (Figure 3(b)). In turn, these pulses
may be approximated by a train of impulses at twice the oscillation
frequency multiplying the original noise waveform vn(t) (Figure
3(c)).

25-1-1
0-7803-5809-O/OO/$lO.OO
0 2000 IEEE IEEE 2000 CUSTOM INTEGRATED CIRCUITS CONFERENCE

569

Thermally Induced Phase Noise


Resonator Noise
Now consider a current source insin((w,+wm)t+@) representing
noise in the loss conductance of the resonator, where i:=4kT/R.
According to the model above, this modulates the zero crossing
instants of the differential pair, producing a current which, in addition to the usual square wave, also consists of current pulses sampling this noise at 20,. After sampling, frequency components
appear at O,?O,, 3w,+0,, ... However, usually the resonator will
filter the 3"' and higher harmonics, leaving o ~ ~ was, the only
important terms. These will induce a symmetric voltage response
in the resonator, and through feedback arrive at steady state. The
steady-state oscillation, in general, is of the form:
uOut= V,sinw,,t + Asin(w, - w,)t
Bcos(w,, - ~ , ~ ) t

+Csin(w,,
w,")t Dcos(w,,
w,,,)t
and here A=-C= i"x(L0,2/40,), while BzD-0. The relative signs
of A and C prove that the steady-state response to current noise in
the resonator's resistor is phase noise in the oscillator. The singlesideband phase noise density is found by the ratio of the sideband
power at a given frequency to the power in the fundamental oscillation frequency. Thus, the thermally induced phase noise density
due to resonator loss is:

where N,=2, the number of loss sources (in the left and right resonators) and N,=4 because uncorrelated quadrature noise originating at o,+o, contributes to SSB phase noise at offset w,.

Tail Current Noise


The switching action of the differential pair commutates noise in
the tail currents like a single-balanced mixer. The noise is translated up and down in frequency, and enters the resonator. The
resulting voltage drives the differential pair, the noise components
modulating the zero crossing instants. The resulting impulses of
current feed back into the resonator. The steady-state solution is
found by solving simultaneous equations of a form that anticipates
the end result, much like in any feedback circuit.
The single-balanced mixer shows the largest conversion gain
around the fundamental switching frequency, 1/3'd the current
conversion gain around the 3'" harmonic, and so on. Therefore,
only mixing by the fundamental at is important. Noise originating in the tail current at W, upconverts to w0+w,. Similarly, noise
at 20,,f0, downconverts to o , ~ o , .
Analysis shows that the upconversion produces coefficients A=C,
B=-D, both of which indicate AM only. It should be noted that
AM noise superimposed on the resonator fundamental frequency
does not modulate the zero crossings of the switching differential
pair, and therefore does not propagate in the feedback loop back
into the resonator. However, the downconversion results in phase
noise only, with A=-C, and B=D=O. The phase noise caused by
thermal noise originally at 20, is:

where y is the noise factor of a single FET, classically 2/3. It is


important to note that the AM noise resulting from upconversion,
if impiessed across a varactor at the resonator, will modulate the
varactor, thus the oscillation frequency by AM-to-FM conversion
[E!].
Although the process is different, the resulting sidebands
are indistinguishable from PM noise sidebands. Unlike the other
mechanisms of phase noise, this effect depends on the varactor
characteristics and VCO tuning range and it may be significant
only in certain situations.

DifferentialPair Noise
Noise originating in the differential pair is unlike the previous two
cases. There, only certain parts of the noise spectrum contributed
significantly to the total phase noise. White noise in the resonator is
filtered at harmonics of the resonant frequency. White noise in the
tail current only experiences a significant conversion gain around
the second harmonic of the oscillation frequency. However, the
simple model says that an impulse train samples white noise in the
differential pair, which if true, will cause it to accumulate without
bound at any specified offset frequency om.
In reality, any practical differential pair requires a non-zero input
voltage excursion to switch, and this is provided by the oscillation
waveform across the resonator. Therefore, noise in the differential
pair is actually not sampled by impulses, but by time windows of
finite width. The window height is proportional to transconductance, and width is set by tail current, and slope of the oscillation
waveform at zero crossing. The input-referred noise spectral density of the differential pair is inversely proportional to transconductance. Thus, the narrower the sampling window, that is, the
larger the sampling bandwidth, the lower the noise spectral density
[ 111. Analysis shows that the noise bandwidth product is constant,
and produces pure phase noise. After taking into account the accumulation of frequency translations throughout the sampling bandwidth, the following compact yet exact expression is reached:

We note that [8]has arrived at a similar analysis for the first two
sources of noise, but was unable to obtain a closed-form expression
for this last term.

Proving Leeson's Hypothesis


Leeson originally postulated that thermally induced phase noise in
any oscillator takes the form:

where F is an unspecified noise factor. By summing the expressions


obtained above for thermally induced phase noise arising from the
resonator, differential pair and tail bias current, respectively, for
the differential oscillator Leeson's noise factor is:

We emphasize that this simple expression captures all nonlinear


effects and frequency translations. At low bias currents while the

570

25-1-2

amplitude of oscillation is smaller than the power supply, the differential pair acts as a pure current switch driving the resonator
and V,=(4/x)RIT [13]. Then the second term comprising F simplifies to 2y. This means that as tail current increases and assuming
gmblrSR
is held constant, the noise factor remains constant and phase
noise improves as V i , that is, as I,. This has been observed by
others [131. However, beyond a critical tail current the amplitude
Vuis pegged constant, limited by supply voltage. Further increases
in I, will cause the differential pairs contribution to noise factor to
rise, degrading phase noise proportionally to I, (Figure 4). Therefore, for least phase noise the tail current should be just enough to
drive the amplitude to its maximum possible value.

However, this is not the only mechanism of indirect FM. At RF,


active device capacitance is also significant, and it no longer appears
as a pure negative resistance to the resonator. For example, the differential pair commutates current flowing in the capacitor C, at
the tail, which presents a negative capacitor (or, equivalently, an
inductor in a narrowband sense) at the differential output (Figure
6). This speeds up the oscillation frequency. Flicker noise in the
differential pair FETs modulates the duty cycle of commutation,
and therefore the effective negative capacitance. Here, too, Groszkowski gives a method of systematic analysis [16], which captures
the reactive components in the active devices by measuring the area
n enclosed by hysteresis in the dynamic negative resistance curve.

Flicker Noise Upconversion

Aw _
- --

Close-in to the oscillation frequency, the slope of the phase noise


spectrum in all CMOS VCOs turns from -20 to -30 dB/decade.
This is ascribed to the upconversion of flicker noise in FETs. To
understand this, let us first see if the analysis above explains this
upconversion.
Flicker noise in the tail current source at frequency W, indeed
upconverts to O&O, and enters the resonator, but as AM, not PM
noise. Therefore, in the absence of a high gain varactor to convert
AM to FM, flicker noise in the tail current will not appear as phase
noise. Next consider flicker noise in the differential pair. T h e preceding analysis says that this modulates zero crossings, and injects
a noise current into the resonator consisting of flicker noise sampled by an impulse train with frequency 20,. Thus noise originating at frequency O, produces currents at O, and at 20,f0, . Both
frequencies are strongly attenuated in the resonator, and neither
explains flicker-induced phase noise at w,+o,. One can only conclude that the mechanisms of flicker noise upconversion are quite
different than for thermally induced phase noise.

FundamentalSources of FM in Oscillators
In 1934, Groszkowski [151 while studying electronic oscillators
realized that the steady-state oscillation frequency seldom coincides with the natural frequency of the resonator which tunes
the oscillator. He found that the discrepancy arises because the
active device in the oscillator, such as the differential pair current
switch in the circuit considered here, drives the resonator with a
harmonic-rich waveform. T h e harmonics will flow into the lower
impedance capacitor (Figure 5) and upset the exact reactive power
balance between the Land the C required for steady state. Now the
frequency of oscillation must shift down until the reactive power in
the inductor increases to equal the reactive power in the capacitor
due to the fundamental and all harmonics. T h e shift, Am, is:

Aw _
--1
w,
c

2Q

n2(1-n2)
(1 - n2) n2/ Q

where mnis the normalized level of the nthharmonic. AO is the sum


of all negative terms, which means that oscillation frequency slows
down with more harmonic content. Now the harmonic content at
the output of a periodically switching differential pair is a function
of the tail current. In the autonomous oscillator, the drive to the
differential pair is also a function of tail current. The sensitivity
a ~ / a I ,is responsible for an indirect F M [7] due to flicker noise
in I,

w,

2Q2w,L

+q
-

n2(1- n)
.m:
2Q2 n=2 (1 n)
n2/Q

Thus the sensitivity of the reactance to bias current or offset voltage in the differential pair is estimated, which is another means
whereby flicker noise modulates the frequency of oscillation.

Validation of Analysis
T h e phase noise model was validated on two CMOS differential
L C oscillators. One oscillator uses a low Q, on-chip inductor, while
the other uses off-chip inductors with large Q Flicker noise is
modelled as a bias-independent, gate-referred voltage source [ 141.
T h e measured data and SpectreRF simulations are plotted with
predictions based on this paper. Excellent agreement (Figure 7) is
found across the entire spectrum, which encompasses thermally
induced phase noise and upconverted flicker noise.
K. S. Kundert, Introduction to RF simulation and its application, IEEE
Journulof SolidSture Circuitr, pp. 1298-319, 1999.

A. Demir, A. Mehrotra, and J. Roychowdhury, Phase noise in oscillators: a


unifying theory and numerical methods for characterisation, in Derign und
Automution Confirence, San Francisco, p p 26-3 1, 1998.
B. De Smedt and G. Gielen, Accurate simulation of phase noise in oscillators,
in Europun Solid-Sture Circuits Confirence, p p 208-1 1, 1997.
A. Hajimiri and T H. Lee, A general theory of phase noise in electrical oscillators, IEEEJournul of 3olid-Stute Circuirr, vol. 33, no. 2, p p 179-94, 1998.
D. B. Leeson, A Simple Model of Feedback Oscillator Noise Spectrum,
Proceedings of the IEEE, vol. 54, pp. 329-330, 1966.
J. Craninckx and M. Steyaert, Low-noise voltage-controlled oscillators using
enhanced LC-tanks, I E E E Trumucrionr on Circuirr und Syslems 11: Anulog und
DigiiulSignulProcesring, vol. 42, no. 12, p p 794-804, 1995.
K. K. Clarke and D. T Hess, Cummuniculion Circurrr: Anulysis undDerign.
Malabar, FL: Krieger, 1971.
C. Samori, A. L. Lacaita, E Villa, and E Zappa, Spectrum folding and phase
noise in LC tuned oscillators, IEEE Trunructionr on Circuits andSysremr 11:
Anulog und DigitulSignul Processing, vol. 45, no. 7, pp. 781-90, 1998.
Q Huang, On the exact design of R F oscillators, CICCProceedings, pp. 4 1 4 ,
1998.
W. P. Robins, Phure Noire in SignulSuurcer. London: Peter Peregrinus, 1982.
H. Darabi and A. Abidi, Noise in CMOS Mixers: A Simple Physical Model,
IEEEJournuluf S o l i d S r ~ r eCircuirr, vol. 35, no. 1, in press, 2000.
C. Samori, A. L. Lacaita, A. Zanchi, S. Levantino, and E Torrisi, Impact of
Indirect Stability on Phase Noise Performance of Fully-Integrated LC Tuned
VCOs, in Europeun Solid-Stute Circuits Confirence, Duisburg, Germany, p p
202-205, 1999.
A. Hajimiri and T H . Lee, Phase Noise in CMOS Differential L C oscillators,
in Symporium on VLSI Cirnrirr, Honolulu, HI, pp. 48-51, 1998.
J. Chang, A. A. Abidi, and C. R. Viswanathan, Flicker Noise in CMOS
Transistors from Subthreshold to Strong Inversion at Various Temperatures,
IEEE Trunructionr on Electron Dcuicer, vol. 41, pp. 1965-1971,1994.
J. Groszkowski, The Interdependence of Frequency Variation and Harmonic
Contcnt, and the problem of Constant-Frequency Oscillators, Proc. of the I R E ,
vol. 21, no. 7, pp 958-981, 1934.
J. Groszkowski, Frequency of Sel/-OrciNu~ions.Oxford: Pergamon Press, 1964.

25-1-3

571

Figure 5. Harmonics of oscillating current flow into capacitor, increasing its reactive energy.
Steady state frequency shifts
down until inductor energy balances.

Figure 1. Noise phasor added to a


sinewave decomposes into PM and
AM sidebands.

T
Figure 2. Differential LC oscillator
biased by tail current.

Nonlinear active
/Y~LV~/(~L)~

Figure 6. Capacitors associated


with active device appear as
reactances across the resonator,
shifting frequency.

-40

LO Voltage

-60

til

l%

% -80

.-

v)

z
al

2-100
a
c

-120
Approximate Model of Noise Pulses
J

Samdina
Impdsetain
Jime
N o i s h

Figure 3. (a) Noise at input of differential pair modulates instants


of zero crossing. (b) Output current consists of square wave,
plus random noise pulses. (c) Noise pulses modelled as a train of
impulses sampling noise waveform.

-140
1

10
100
Offset Frequency,kHz

1000

10
100
Offset Frequency,kHz

1000

-60

-70
N

5 -80
I%

cm -90

.-v)

:-100
0

Phase
Noise

Oscillation

'

Figure 4. Increasing tail current


first causes amplitude to rise,
until limited by supply. Phase
noise diminishes with rising
amplitude, then worsens due to
higher noise factor.

(ZI
v)

E -110

Bias Current, IT

-120
-130

Figure 7. Validation of the analysis presented in this paper. Measured phase noise is compared with predictions
from analysis, and with SpectreRF simulations. (a) 0.35-pm CMOS 1.1 GHz oscillator using resonator with
loaded Q o f 6. (b) 0.25-pm CMOS 830 MHz oscillator using discrete inductor with loaded of Q o f 25.

572

25-1-4

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

179

A General Theory of Phase Noise


in Electrical Oscillators
Ali Hajimiri, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

Abstract A general model is introduced which is capable


of making accurate, quantitative predictions about the phase
noise of different types of electrical oscillators by acknowledging
the true periodically time-varying nature of all oscillators. This
new approach also elucidates several previously unknown design
criteria for reducing close-in phase noise by identifying the mechanisms by which intrinsic device noise and external noise sources
contribute to the total phase noise. In particular, it explains the
details of how 1=f noise in a device upconverts into close-in
phase noise and identifies methods to suppress this upconversion.
The theory also naturally accommodates cyclostationary noise
sources, leading to additional important design insights. The
model reduces to previously available phase noise models as
special cases. Excellent agreement among theory, simulations, and
measurements is observed.
Index TermsJitter, oscillator noise, oscillators, oscillator stability, phase jitter, phase locked loops, phase noise, voltage
controlled oscillators.

I. INTRODUCTION

HE recent exponential growth in wireless communication


has increased the demand for more available channels in
mobile communication applications. In turn, this demand has
imposed more stringent requirements on the phase noise of
local oscillators. Even in the digital world, phase noise in the
guise of jitter is important. Clock jitter directly affects timing
margins and hence limits system performance.
Phase and frequency fluctuations have therefore been the
subject of numerous studies [1][9]. Although many models
have been developed for different types of oscillators, each
of these models makes restrictive assumptions applicable only
to a limited class of oscillators. Most of these models are
based on a linear time invariant (LTI) system assumption
and suffer from not considering the complete mechanism by
which electrical noise sources, such as device noise, become
phase noise. In particular, they take an empirical approach in
describing the upconversion of low frequency noise sources,
such as
noise, into close-in phase noise. These models
are also reduced-order models and are therefore incapable of
making accurate predictions about phase noise in long ring
oscillators, or in oscillators that contain essential singularities,
such as delay elements.

Manuscript received December 17, 1996; revised July 9, 1997.


The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)00716-1.

Since any oscillator is a periodically time-varying system,


its time-varying nature must be taken into account to permit
accurate modeling of phase noise. Unlike models that assume
linearity and time-invariance, the time-variant model presented
here is capable of proper assessment of the effects on phase
noise of both stationary and even of cyclostationary noise
sources.
Noise sources in the circuit can be divided into two groups,
namely, device noise and interference. Thermal, shot, and
flicker noise are examples of the former, while substrate and
supply noise are in the latter group. This model explains
the exact mechanism by which spurious sources, random
or deterministic, are converted into phase and amplitude
variations, and includes previous models as special limiting
cases.
This time-variant model makes explicit predictions of the
relationship between waveform shape and
noise upconversion. Contrary to widely held beliefs, it will be shown
that the
corner in the phase noise spectrum is smaller
than
noise corner of the oscillators components by a
factor determined by the symmetry properties of the waveform.
This result is particularly important in CMOS RF applications
because it shows that the effect of inferior
device noise
can be reduced by proper design.
Section II is a brief introduction to some of the existing
phase noise models. Section III introduces the time-variant
model through an impulse response approach for the excess
phase of an oscillator. It also shows the mechanism by which
noise at different frequencies can become phase noise and
expresses with a simple relation the sideband power due to
an arbitrary source (random or deterministic). It continues
with explaining how this approach naturally lends itself to the
analysis of cyclostationary noise sources. It also introduces
a general method to calculate the total phase noise of an
oscillator with multiple nodes and multiple noise sources, and
how this method can help designers to spot the dominant
source of phase noise degradation in the circuit. It concludes
with a demonstration of how the presented model reduces
to existing models as special cases. Section IV gives new
design implications arising from this theory in the form of
guidelines for low phase noise design. Section V concludes
with experimental results supporting the theory.
II. BRIEF REVIEW OF EXISTING MODELS AND DEFINITIONS
The output of an ideal sinusoidal oscillator may be expressed as
, where is the amplitude,

00189200/98$10.00 1998 IEEE

180

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 2. A typical RLC oscillator.

Fig. 1. Typical plot of the phase noise of an oscillator versus offset from
carrier.

is the frequency, and is an arbitrary, fixed phase reference. Therefore, the spectrum of an ideal oscillator with no
random fluctuations is a pair of impulses at
. In a practical
oscillator, however, the output is more generally given by

The semi-empirical model proposed in [1][3], known also


as the LeesonCutler phase noise model, is based on an LTI
assumption for tuned tank oscillators. It predicts the following
:
behavior for

(1)

(3)

where
and
are now functions of time and is a
periodic function with period 2 . As a consequence of the
fluctuations represented by
and
, the spectrum of a
practical oscillator has sidebands close to the frequency of
oscillation,
.
There are many ways of quantifying these fluctuations (a
comprehensive review of different standards and measurement
methods is given in [4]). A signals short-term instabilities are
usually characterized in terms of the single sideband noise
spectral density. It has units of decibels below the carrier per
hertz (dBc/Hz) and is defined as

is an empirical parameter (often called the device


where
is the
excess noise number), is Boltzmanns constant,
is the average power dissipated in
absolute temperature,
is the oscillation frequency,
the resistive part of the tank,
is the effective quality factor of the tank with all the
is the offset
loadings in place (also known as loaded ),
is the frequency of the corner
from the carrier and
and
regions, as shown in the sideband
between the
region can be
spectrum of Fig. 1. The behavior in the
obtained by applying a transfer function approach as follows.
, is easily
The impedance of a parallel RLC, for
calculated to be

1 Hz

(2)
(4)

where
1 Hz represents the single sidefrom the carrier with a
band power at a frequency offset of
measurement bandwidth of 1 Hz. Note that the above definition
includes the effect of both amplitude and phase fluctuations,
and
.
The advantage of this parameter is its ease of measurement.
Its disadvantage is that it shows the sum of both amplitude and
phase variations; it does not show them separately. However, it
is important to know the amplitude and phase noise separately
because they behave differently in the circuit. For instance,
the effect of amplitude noise is reduced by amplitude limiting
mechanism and can be practically eliminated by the application of a limiter to the output signal, while the phase noise
cannot be reduced in the same manner. Therefore, in most
applications,
is dominated by its phase portion,
, known as phase noise, which we will simply
denote as
.

is the parallel parasitic conductance of the tank.


where
should
For steady-state oscillation, the equation
be satisfied. Therefore, for a parallel current source, the closedloop transfer function of the oscillator shown in Fig. 2 is given
by the imaginary part of the impedance
(5)
The total equivalent parallel resistance of the tank has an
equivalent mean square noise current density of
. In addition, active device noise usually contributes
a significant portion of the total noise in the oscillator. It is
traditional to combine all the noise sources into one effective
noise source, expressed in terms of the resistor noise with

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

181

Fig. 3. Phase and amplitude impulse response model.

a multiplicative factor, , known as the device excess noise


number. The equivalent mean square noise current density can
therefore be expressed as
. Unfortunately,
it is generally difficult to calculate a priori. One important
reason is that much of the noise in a practical oscillator
arises from periodically varying processes and is therefore
cyclostationary. Hence, as mentioned in [3],
and
are
usually used as a posteriori fitting parameters on measured
data.
Using the above effective noise current power, the phase
region of the spectrum can be calculated as
noise in the

(a)

(b)

(c)
Fig. 4. (a) Impulse injected at the peak, (b) impulse injected at the zero
crossing, and (c) effect of nonlinearity on amplitude and phase of the oscillator
in state-space.

(6)
III. MODELING
Note that the factor of 1/2 arises from neglecting the contribution of amplitude noise. Although the expression for the
noise in the
region is thus easily obtained, the expression
portion of the phase noise is completely empirical.
for the
As such, the common assumption that the
corner of the
phase noise is the same as the
corner of device flicker
noise has no theoretical basis.
The above approach may be extended by identifying the
individual noise sources in the tuned tank oscillator of Fig. 2
[8]. An LTI approach is used and there is an embedded
assumption of no amplitude limiting, contrary to most practical
cases. For the RLC circuit of Fig. 2, [8] predicts the following:
(7)
is yet another empirical fitting parameter, and
where
is the effective series resistance, given by

OF

PHASE NOISE

A. Impulse Response Model for Excess Phase


inputs
An oscillator can be modeled as a system with
(each associated with one noise source) and two outputs
that are the instantaneous amplitude and excess phase of the
oscillator,
and
, as defined by (1). Noise inputs to this
system are in the form of current sources injecting into circuit
nodes and voltage sources in series with circuit branches. For
each input source, both systems can be viewed as singleinput, single-output systems. The time and frequency-domain
fluctuations of
and
can be studied by characterizing
the behavior of two equivalent systems shown in Fig. 3.
Note that both systems shown in Fig. 3 are time variant.
Consider the specific example of an ideal parallel LC oscillator
shown in Fig. 4. If we inject a current impulse
as shown,
the amplitude and phase of the oscillator will have responses
similar to that shown in Fig. 4(a) and (b). The instantaneous
voltage change
is given by

(8)
(9)
,
,
, and
are shown in Fig. 2. Note that it
where
from circuit parameters.
is still not clear how to calculate
Hence, this approach represents no fundamental improvement
over the method outlined in [3].

is the total injected charge due to the current


where
impulse and
is the total capacitance at that node. Note
that the current impulse will change only the voltage across the

182

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

(a)

(b)

Fig. 5. (a) A typical Colpitts oscillator and (b) a five-stage minimum size
ring oscillator.

capacitor and will not affect the current through the inductor.
It can be seen from Fig. 4 that the resultant change in
and
is time dependent. In particular, if the impulse is applied
at the peak of the voltage across the capacitor, there will be no
phase shift and only an amplitude change will result, as shown
in Fig. 4(a). On the other hand, if this impulse is applied at the
zero crossing, it has the maximum effect on the excess phase
and the minimum effect on the amplitude, as depicted in
Fig. 4(b). This time dependence can also be observed in the
state-space trajectory shown in Fig. 4(c). Applying an impulse
at the peak is equivalent to a sudden jump in voltage at point
, which results in no phase change and changes only the
amplitude, while applying an impulse at point results only
in a phase change without affecting the amplitude. An impulse
applied sometime between these two extremes will result in
both amplitude and phase changes.
There is an important difference between the phase and
amplitude responses of any real oscillator, because some
form of amplitude limiting mechanism is essential for stable
oscillatory action. The effect of this limiting mechanism is
pictured as a closed trajectory in the state-space portrait of
the oscillator shown in Fig. 4(c). The system state will finally
approach this trajectory, called a limit cycle, irrespective of
its starting point [10][12]. Both an explicit automatic gain
control (AGC) and the intrinsic nonlinearity of the devices
act similarly to produce a stable limit cycle. However, any
fluctuation in the phase of the oscillation persists indefinitely,
with a current noise impulse resulting in a step change in
phase, as shown in Fig. 3. It is important to note that regardless
of how small the injected charge, the oscillator remains time
variant.
Having established the essential time-variant nature of the
systems of Fig. 3, we now show that they may be treated as
linear for all practical purposes, so that their impulse responses
and
will characterize them completely.
The linearity assumption can be verified by injecting impulses with different areas (charges) and measuring the resultant phase change. This is done in the SPICE simulations of
the 62-MHz Colpitts oscillator shown in Fig. 5(a) and the fivestage 1.01-GHz, 0.8- m CMOS inverter chain ring oscillator
shown in Fig. 5(b). The results are shown in Fig. 6(a) and (b),
respectively. The impulse is applied close to a zero crossing,

(a)

(b)

Fig. 6. Phase shift versus injected charge for oscillators of Fig. 5(a) and (b).

where it has the maximum effect on phase. As can be seen, the


current-phase relation is linear for values of charge up to 10%
of the total charge on the effective capacitance of the node
of interest. Also note that the effective injected charges due
to actual noise and interference sources in practical circuits
are several orders of magnitude smaller than the amounts of
charge injected in Fig. 6. Thus, the assumption of linearity is
well satisfied in all practical oscillators.
It is critical to note that the current-to-phase transfer function is practically linear even though the active elements may
have strongly nonlinear voltage-current behavior. However,
the nonlinearity of the circuit elements defines the shape of
the limit cycle and has an important influence on phase noise
that will be accounted for shortly.
We have thus far demonstrated linearity, with the amount
of excess phase proportional to the ratio of the injected charge
to the maximum charge swing across the capacitor on the
node, i.e.,
. Furthermore, as discussed earlier, the
impulse response for the first system of Fig. 3 is a step whose
amplitude depends periodically on the time when the impulse
is injected. Therefore, the unit impulse response for excess
phase can be expressed as
(10)
where
is the maximum charge displacement across the
capacitor on the node and
is the unit step. We call
the impulse sensitivity function (ISF). It is a dimensionless,
frequency- and amplitude-independent periodic function with
period 2 which describes how much phase shift results from
applying a unit impulse at time
. To illustrate its
significance, the ISFs together with the oscillation waveforms
for a typical LC and ring oscillator are shown in Fig. 7. As is
shown in the Appendix,
is a function of the waveform
or, equivalently, the shape of the limit cycle which, in turn, is
governed by the nonlinearity and the topology of the oscillator.
Given the ISF, the output excess phase
can be calculated using the superposition integral

(11)

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

(a)

183

(b)

Fig. 7. Waveforms and ISFs for (a) a typical LC oscillator and (b) a typical
ring oscillator.

where
represents the input noise current injected into the
node of interest. Since the ISF is periodic, it can be expanded
in a Fourier series
(12)
where the coefficients
are real-valued coefficients, and
is the phase of the th harmonic. As will be seen later,
is not important for random input noise and is thus
neglected here. Using the above expansion for
in the
superposition integral, and exchanging the order of summation
and integration, we obtain

Fig. 8. Conversion of the noise around integer multiples of the oscillation


frequency into phase noise.

consists of two impulses at


as shown in Fig. 8.
This time the only integral in (13) which will have a low
frequency argument is for
. Therefore
is given by
(16)
in
.
which again results in two equal sidebands at
More generally, (13) suggests that applying a current
close to any integer multiple of the
oscillation frequency will result in two equal sidebands at
in
. Hence, in the general case
is given by
(17)

(13)
B. Phase-to-Voltage Transformation
Equation (13) allows computation of
for an arbitrary input
current
injected into any circuit node, once the various
Fourier coefficients of the ISF have been found.
As an illustrative special case, suppose that we inject a low
frequency sinusoidal perturbation current
into the node of
interest at a frequency of
(14)
where
is the maximum amplitude of
. The arguments
of all the integrals in (13) are at frequencies higher than
and are significantly attenuated by the averaging nature of
the integration, except the term arising from the first integral,
which involves . Therefore, the only significant term in
will be
(15)
As a result, there will be two impulses at
in the power
spectral density of
, denoted as
.
As an important second special case, consider a current at a
frequency close to the carrier injected into the node of interest,
given by
. A process similar to that
of the previous case occurs except that the spectrum of

So far, we have presented a method for determining how


using (13).
much phase error results from a given current
Computing the power spectral density (PSD) of the oscillator
output voltage
requires knowledge of how the output
voltage relates to the excess phase variations. As shown in
Fig. 8, the conversion of device noise current to output voltage
may be treated as the result of a cascade of two processes.
The first corresponds to a linear time variant (LTV) currentto-phase converter discussed above, while the second is a
nonlinear system that represents a phase modulation (PM),
which transforms phase to voltage. To obtain the sideband
power around the fundamental frequency, the fundamental
harmonic of the oscillator output
can be used
as the transfer function for the second system in Fig. 8. Note
as the input.
this is a nonlinear transfer function with
Substituting
from (17) into (1) results in a single-tone
phase modulation for output voltage, with
given by (17).
Therefore, an injected current at
results in a pair
with a sideband power relative
of equal sidebands at
to the carrier given by
(18)

184

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

(a)

(b)

Fig. 9. Simulated power spectrum of the output with current injection at (a)
f
MHz and (b) f0 f
:
GHz.

m = 50

+ m = 1 06

This process is shown in Fig. 8. Appearance of the frequency


deviation
in the denominator of the (18) underscores that
the impulse response
is a step function and therefore
behaves as a time-varying integrator. We will frequently refer
to (18) in subsequent sections.
Applying this method of analysis to an arbitrary oscillator,
a sinusoidal current injected into one of the oscillator nodes
at a frequency
results in two equal sidebands at
, as observed in [9]. Note that it is necessary to use
an LTV because an LTI model cannot explain the presence of
a pair of equal sidebands close to the carrier arising from
sources at frequencies
, because an LTI system
cannot produce any frequencies except those of the input and
those associated with the systems poles. Furthermore, the
amplitude of the resulting sidebands, as well as their equality,
cannot be predicted by conventional intermodulation effects.
This failure is to be expected since the intermodulation terms
arise from nonlinearity in the voltage (or current) input/output
characteristic of active devices of the form
. This type of nonlinearity does not directly
appear in the phase transfer characteristic and shows itself only
indirectly in the ISF.
It is instructive to compare the predictions of (18) with
simulation results. A sinusoidal current of 10 A amplitude at
different frequencies was injected into node 1 of the 1.01-GHz
ring oscillator of Fig. 5(b). Fig. 9(a) shows the simulated
power spectrum of the signal on node 4 for a low frequency
MHz. This power spectrum is obtained using
input at
the fast Fourier transform (FFT) analysis in HSPICE 96.1. It
is noteworthy that in this version of HSPICE the simulation
artifacts observed in [9] have been properly eliminated by
calculation of the values used in the analysis at the exact
points of interest. Note that the injected noise is upconverted
and
, as predicted
into two equal sidebands at
by (18). Fig. 9(b) shows the effect of injection of a current at
GHz. Again, two equal sidebands are observed
and
, also as predicted by (18).
at
Simulated sideband power for the general case of current
can be compared to the predictions of
injection at

Fig. 10. Simulated and calculated sideband powers for the first ten coefficients.

(18). The ISF for this oscillator is obtained by the simulation


method of the Appendix. Here,
is equal to
,
where
is the average capacitance on each node of the
circuit and
is the maximum swing across it. For this
oscillator,
fF and
V, which results in
fC. For a sinusoidal injected current of amplitude
A, and an
of 50 MHz, Fig. 10 depicts the
simulated and predicted sideband powers. As can be seen
from the figure, these agree to within 1 dB for the higher
power sidebands. The discrepancy in the case of the low
power sidebands (
) arises from numerical noise in
the simulations, which represents a greater fractional error at
lower sideband power. Overall, there is satisfactory agreement
between simulation and the theory of conversion of noise from
various frequencies into phase fluctuations.
C. Prediction of Phase Noise Sideband Power
Now we consider the case of a random noise current
whose power spectral density has both a flat region and a
region, as shown in Fig. 11. As can be seen from (18) and the
foregoing discussion, noise components located near integer
multiples of the oscillation frequency are transformed to low
frequency noise sidebands for
, which in turn become
close-in phase noise in the spectrum of
, as illustrated in
Fig. 11. It can be seen that the total
is given by the sum
of phase noise contributions from device noise in the vicinity
of the integer multiples of
, weighted by the coefficients
. This is shown in Fig. 12(a) (logarithmic frequency scale).
The resulting single sideband spectral noise density
is
plotted on a logarithmic scale in Fig. 12(b). The sidebands in
the spectrum of
, in turn, result in phase noise sidebands
in the spectrum of
through the PM mechanism discuss
in the previous subsection. This process is shown in Figs. 11
and 12.
The theory predicts the existence of
,
, and flat
regions for the phase noise spectrum. The low-frequency noise
sources, such as flicker noise, are weighted by the coefficient
and show a
dependence on the offset frequency, while

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

185

(a)

(b)

Fig. 11.
bands.

Conversion of noise to phase fluctuations and phase-noise sideFig. 12. (a) PSD of
spectrum, Lf ! g.

the white noise terms are weighted by other


coefficients
and give rise to the
region of phase noise spectrum. It is
apparent that if the original noise current
contains
low frequency noise terms, such as popcorn noise, they can
appear in the phase noise spectrum as
regions. Finally,
the flat noise floor in Fig. 12(b) arises from the white noise
floor of the noise sources in the oscillator. The total sideband
noise power is the sum of these two as shown by the bold line
in the same figure.
To carry out a quantitative analysis of the phase noise
sideband power, now consider an input noise current with a
white power spectral density
. Note that
in (18)
for
represents the peak amplitude, hence,
Hz. Based on the foregoing development and (18),
the total single sideband phase noise spectral density in dB
below the carrier per unit bandwidth due to the source on one
node at an offset frequency of
is given by

(t)

and (b) single sideband phase noise power

obvious from the foregoing development that the


corner
of the phase noise and the
corner of the device noise
should be coincident, as is commonly assumed. In fact, from
Fig. 12, it should be apparent that the relationship between
these two frequencies depends on the specific values of the
various coefficients . The device noise in the flicker noise
dominated portion of the noise spectrum
can
be described by
(22)
is the corner frequency of device
noise.
where
Equation (22) together with (18) result in the following
expression for phase noise in the
portion of the phase
noise spectrum:
(23)

(19)

Now, according to Parsevals relation we have


(20)
where

is the rms value of

corner,
, is the frequency where
The phase noise
the sideband power due to the white noise given by (21) is
equal to the sideband power arising from the
noise given
by (23), as shown in Fig. 12. Solving for
results in the
following expression for the
corner in the phase noise
spectrum:

. As a result
(24)
(21)

This equation represents the phase noise spectrum of an


region of the phase noise spectrum.
arbitrary oscillator in
For a voltage noise source in series with an inductor,
should be replaced with
, where
represents the maximum magnetic flux swing in the inductor.
We may now investigate quantitatively the relationship
corner and the
corner of the
between the device
phase noise. It is important to note that it is by no means

This equation together with (21) describe the phase noise


spectrum and are the major results of this section. As can
be seen, the
phase noise corner due to internal noise
sources is not equal to the
device noise corner, but is
smaller by a factor equal to
. As will be discussed
later,
depends on the waveform and can be significantly
reduced if certain symmetry properties exist in the waveform
device noise need not imply
of the oscillation. Thus, poor
poor close-in phase noise performance.

186

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 14.

0(x), 0e (x), and (x) for the Colpitts oscillator of Fig. 5(a).

Fig. 13. Collector voltage and collector current of the Colpitts oscillator of
Fig. 5(a).

D. Cyclostationary Noise Sources


In addition to the periodically time-varying nature of the
system itself, another complication is that the statistical properties of some of the random noise sources in the oscillator
may change with time in a periodic manner. These sources are
referred to as cyclostationary. For instance, the channel noise
of a MOS device in an oscillator is cyclostationary because the
noise power is modulated by the gate source overdrive which
varies with time periodically. There are other noise sources
in the circuit whose statistical properties do not depend on
time and the operation point of the circuit, and are therefore
called stationary. Thermal noise of a resistor is an example of
a stationary noise source.
A white cyclostationary noise current
can be decomposed as [13]:
(25)
is a white cyclostationary process,
is a
where
is a deterministic periodic
white stationary process and
function describing the noise amplitude modulation. We define
to be a normalized function with a maximum value of
is equal to the maximum mean square noise
1. This way,
power,
, which changes periodically with time. Applying
the above expression for
to (11),
is given by

used in all subsequent calculations, in particular, calculation


of the coefficients .
Note that there is a strong correlation between the cyclostationary noise source and the waveform of the oscillator. The
maximum of the noise power always appears at a certain point
of the oscillatory waveform, thus the average of the noise may
not be a good representation of the noise power.
Consider as one example the Colpitts oscillator of Fig. 5(a).
The collector voltage and the collector current of the transistor
are shown in Fig. 13. Note that the collector current consists
of a short period of large current followed by a quiet interval.
The surge of current occurs at the minimum of the voltage
across the tank where the ISF is small. Functions
,
,
and
for this oscillator are shown in Fig. 14. Note that,
in this case,
is quite different from
, and hence
the effect of cyclostationarity is very significant for the LC
oscillator and cannot be neglected.
The situation is different in the case of the ring oscillator
of Fig. 5(b), because the devices have maximum current
during the transition (when
is at a maximum, i.e., the
sensitivity is large) at the same time the noise power is large.
,
, and
for the ring oscillator of
Functions
Fig. 5(b) are shown in Fig. 15. Note that in the case of the
ring oscillator
and
are almost identical. This
indicates that the cyclostationary properties of the noise are
less important in the treatment of the phase noise of ring
oscillators. This unfortunate coincidence is one of the reasons
why ring oscillators in general have inferior phase noise
performance compared to a Colpitts LC oscillator. The other
important reason is that ring oscillators dissipate all the stored
energy during one cycle.

(26)
E. Predicting Output Phase Noise with Multiple Noise Sources
As can be seen, the cyclostationary noise can be treated as
a stationary noise applied to a system with an effective ISF
given by
(27)
can be derived easily from device noise characterwhere
istics and operating point. Hence, this effective ISF should be

The method of analysis outlined so far has been used to


predict how much phase noise is contributed by a single noise
source. However, this method may be extended to multiple
noise sources and multiple nodes, as individual contributions
by the various noise sources may be combined by exploiting
superposition. Superposition holds because the first system of
Fig. 8 is linear.

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

187

In particular, consider the model for LC oscillators in [3], as


well as the more comprehensive presentation of [8]. Those
models assume linear time-invariance, that all noise sources
are stationary, that only the noise in the vicinity of
is
important, and that the noise-free waveform is a perfect
sinusoid. These assumptions are equivalent to discarding all
but the
term in the ISF and setting
. As a specific
example, consider the oscillator of Fig. 2. The phase noise
due solely to the tank parallel resistor
can be found by
applying the following to (19):

Fig. 15.

(28)

0(x), 0e (x), and (x) for the ring oscillator of Fig. 5(b).

The actual method of combining the individual contributions


requires attention to any possible correlations that may exist
among the noise sources. The complete method for doing so
may be appreciated by noting that an oscillator has a current
noise source in parallel with each capacitor and a voltage noise
source in series with each inductor. The phase noise in the
output of such an oscillator is calculated using the following
method.
1) Find the equivalent current noise source in parallel with
each capacitor and an equivalent voltage source in series
with each inductor, keeping track of correlated and
noncorrelated portions of the noise sources for use in
later steps.
2) Find the transfer characteristic from each source to the
output excess phase. This can be done as follows.
a) Find the ISF for each source, using any of the
methods proposed in the Appendix, depending on
the required accuracy and simplicity.
b) Find
and
(rms and dc values) of the ISF.
3) Use
and coefficients and the power spectrum of
the input noise sources in (21) and (23) to find the phase
noise power resulting from each source.
4) Sum the individual output phase noise powers for uncorrelated sources and square the sum of phase noise rms
values for correlated sources to obtain the total noise
power below the carrier.
Note that the amount of phase noise contributed by each
noise source depends only on the value of the noise power
density
, the amount of charge swing across the effective capacitor it is injecting into
, and the steady-state
oscillation waveform across the noise source of interest. This
observation is important since it allows us to attribute a definite
contribution from every noise source to the overall phase noise.
Hence, our treatment is both an analysis and design tool,
enabling designers to identify the significant contributors to
phase noise.
F. Existing Models as Simplified Cases
As asserted earlier, the model proposed here reduces to
earlier models if the same simplifying assumptions are made.

is the parallel resistor, is the tank capacitor, and


where
is the maximum voltage swing across the tank. Equation
(19) reduces to
(29)
Since [8] assumes equal contributions from amplitude and
phase portions to
, the result obtained in [8] is
two times larger than the result of (29).
Assuming that the total noise contribution in a parallel tank
oscillator can be modeled using an excess noise factor
as
in [3], (29) together with (24) result in (6). Note that the
generalized approach presented here is capable of calculating
the fitting parameters used in (3), ( and
) in terms of
coefficients of ISF and device
noise corner,
.
IV. DESIGN IMPLICATIONS
Several design implications emerge from (18), (21), and (24)
that offer important insight for reduction of phase noise in the
oscillators. First, they show that increasing the signal charge
displacement
across the capacitor will reduce the phase
noise degradation by a given noise source, as has been noted
in previous works [5], [6].
In addition, the noise power around integer multiples of the
oscillation frequency has a more significant effect on the closein phase noise than at other frequencies, because these noise
components appear as phase noise sidebands in the vicinity
of the oscillation frequency, as described by (18). Since the
contributions of these noise components are scaled by the
Fourier series coefficients
of the ISF, the designer should
seek to minimize spurious interference in the vicinity of
for values of such that
is large.
Criteria for the reduction of phase noise in the
region
are suggested by (24), which shows that the
corner of
the phase noise is proportional to the square of the coefficient
. Recalling that
is twice the dc value of the (effective)
ISF function, namely
(30)
it is clear that it is desirable to minimize the dc value of
the ISF. As shown in the Appendix, the value of
is
closely related to certain symmetry properties of the oscillation

188

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

(a)

(b)

(a)

(b)

Fig. 17. Simulated power spectrum with current injection at f


for (a) asymmetrical node and (b) symmetrical node.

m = 50 MHz

(c)

(d)

Fig. 16. (a) Waveform and (b) ISF for the asymmetrical node. (c) Waveform
and (d) ISF for one of the symmetrical nodes.

waveform. One such property concerns the rise and fall


times; the ISF will have a large dc value if the rise and
fall times of the waveform are significantly different. A
limited case of this for odd-symmetric waveforms has been
observed [14]. Although odd-symmetric waveforms have small
coefficients, the class of waveforms with small
is not
limited to odd-symmetric waveforms.
To illustrate the effect of a rise and fall time asymmetry,
consider a purposeful imbalance of pull-up and pull-down
rates in one of the inverters in the ring oscillator of Fig. 5(b).
of the
This is obtained by halving the channel width
of the PMOS
NMOS device and doubling the width
device of one inverter in the ring. The output waveform
and corresponding ISF are shown in Fig. 16(a) and (b). As
can be seen, the ISF has a large dc value. For comparison, the waveform and ISF at the output of a symmetrical
inverter elsewhere in the ring are shown in Fig. 16(c) and
(d). From these results, it can be inferred that the close-in
phase noise due to low-frequency noise sources should be
smaller for the symmetrical output than for the asymmetrical
one. To investigate this assertion, the results of two SPICE
simulations are shown in Fig. 17. In the first simulation,
a sinusoidal current source of amplitude 10 A at
MHz is applied to one of the symmetric nodes of the

oscillator. In the second experiment, the same source is applied


to the asymmetric node. As can be seen from the power
spectra of the figure, noise injected into the asymmetric
node results in sidebands that are 12 dB larger than at the
symmetric node.
Note that (30) suggests that upconversion of low frequency
noise can be significantly reduced, perhaps even eliminated,
by minimizing , at least in principle. Since
depends
on the waveform, this observation implies that a proper
choice of waveform may yield significant improvements in
close-in phase noise. The following experiment explores this
concept by changing the ratio of
to
over some range,
while injecting 10 A of sinusoidal current at 100 MHz into
one node. The sideband power below carrier as a function
of the
to
ratio is shown in Fig. 18. The SPICEsimulated sideband power is shown with plus symbols and
the sideband power as predicted by (18) is shown by the
solid line. As can be seen, close-in phase noise due to
upconversion of low-frequency noise can be suppressed by
an arbitrary factor, at least in principle. It is important to note,
however, that the minimum does not necessarily correspond to
equal transconductance ratios, since other waveform properties
influence the value of . In fact, the optimum
to
ratio
in this particular example is seen to differ considerably from
that used in conventional ring oscillator designs.
The importance of symmetry might lead one to conclude
that differential signaling would minimize . Unfortunately,
while differential circuits are certainly symmetrical with respect to the desired signals, the differential symmetry disappears for the individual noise sources because they are
independent of each other. Hence, it is the symmetry of
each half-circuit that is important, as is demonstrated in the
differential ring oscillator of Fig. 19. A sinusoidal current of
100 A at 50 MHz injected at the drain node of one of
the buffer stages results in two equal sidebands, 46 dB
below carrier, in the power spectrum of the differential output.
Because of the voltage dependent conductance of the load
devices, the individual waveform on each output node is not
fully symmetrical and consequently, there will be a large

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

Fig. 18. Simulated and predicted sideband power for low frequency injection
versus PMOS to NMOS W=L ratio.

Fig. 19.

Four-stage differential ring oscillator.

upconversion of noise to close-in phase noise, even though


differential signaling is used.
Since the asymmetry is due to the voltage dependent conductance of the load, reduction of the upconversion might be
achieved through the use of a perfectly linear resistive load,
because the rising and falling behavior is governed by an
RC time constant and makes the individual waveforms more
symmetrical. It was first observed in the context of supply
noise rejection [15], [16] that using more linear loads can
reduce the effect of supply noise on timing jitter. Our treatment
shows that it also improves low-frequency noise upconversion
into phase noise.
Another symmetry-related property is duty cycle. Since the
ISF is waveform-dependent, the duty cycle of a waveform
is linked to the duty cycle of the ISF. Non-50% duty cycles
generally result in larger
for even . The high- tank of
an LC oscillator is helpful in this context, since a high will
produce a more symmetric waveform and hence reduce the
upconversion of low-frequency noise.
V. EXPERIMENTAL RESULTS
This section presents experimental verifications of the model
to supplement simulation results. The first experiment ex-

189

m = 100
3 + m = 16 3 MHz.

Fig. 20. Measured sideband power versus injected current at


kHz, f0 f
: MHz, f0
f
: MHz, f0
f

+ m=55

2 + m = 10 9

amines the linearity of current-to-phase conversion using a


five-stage, 5.4-MHz ring oscillator constructed with ordinary
CMOS inverters. A sinusoidal current is injected at frequencies
kHz,
MHz,
MHz, and
MHz, and the sideband powers
at
are measured as the magnitude of the injected
current is varied. At any amplitude of injected current, the
sidebands are equal in amplitude to within the accuracy of
the measurement setup (0.2 dB), in complete accordance with
the theory. These sideband powers are plotted versus the
input injected current in Fig. 20. As can be seen, the transfer
function for the input current power to the output sideband
power is linear as suggested by (18). The slope of the best
fit line is 19.8 dB/decade, which is very close to the predicted
slope of 20 dB/decade, since excess phase
is proportional
, and hence the sideband power is proportional to ,
to
leading to a 20-dB/decade slope. The behavior shown in
Fig. 20 verifies that the linearity of (18) holds for injected
input currents orders of magnitude larger than typical noise
currents.
The second experiment varies the frequency offset from
an integer multiple of the oscillation frequency. An input
,
sinusoidal current source of 20 A (rms) at
, and
is applied to one node and the output
is measured at another node. The sideband power is plotted
in Fig. 21. Note that the slope in all four cases is
versus
20 dB/decade, again in complete accordance with (18).
The third experiment aims at verifying the effect of the
on the sideband power. One of the predictions
coefficients
is responsible for the upconverof the theory is that
is
sion of low frequency noise. As mentioned before,
a strong function of waveform symmetry at the node into
which the current is injected. Noise injected into a node with
an asymmetric waveform (created by making one inverter
asymmetric in a ring oscillator) would result in a greater
increase in sideband power than injection into nodes with
more symmetric waveforms. Fig. 22 shows the results of an
experiment performed on a five-stage ring oscillator in which
one of the stages is modified to have an extra pulldown

190

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 21. Measured sideband power versus f , for injections in vicinity of


multiples of f0 .

Fig. 22. Power of the sidebands caused by low frequency injection into
symmetric and asymmetric nodes of the ring oscillator.

NMOS device. A current of 20 A (rms) is injected into this


asymmetric node with and without the extra pulldown device.
For comparison, this experiment is repeated for a symmetric
node of the oscillator, before and after this modification. Note
that the sideband power is 7 dB larger when noise is injected
into the node with the asymmetrical waveform, while the
sidebands due to signal injection at the symmetric nodes are
essentially unchanged with the modification.
The fourth experiment compares the prediction and measurement of the phase noise for a five-stage single-ended ring
oscillator implemented in a 2- m, 5-V CMOS process running
at
MHz. This measurement was performed using a
delay-based measurement method and the result is shown in
Fig. 23. Distinct
and
regions are observed. We
first start with a calculation for the
region. For this
process we have a gate oxide thickness of
nm
and threshold voltages of
V and
V.
All five inverters are similar with
m
m
and
m
m, and a lateral diffusion of
m. Using the process and geometry information, the total
capacitance on each node, including parasitics, is calculated
to be
fF. Therefore,

Fig. 23. Phase noise measurements for a five-stage single-ended CMOS ring
oscillator. f0 = 232 MHz, 2-m process technology.

fC. As discussed in the previous section, noise current


injected during a transition has the largest effect. The current noise power at this point is the sum of the current
noise powers due to NMOS and PMOS devices. At this bias
point,
A2 /Hz and (
2
A /Hz. Using the methods outlined in the Appendix,
it may be shown that
for ring oscillators.
Equation (21) for
identical noise sources then predicts
. At an offset of
kHz,
this equation predicts
kHz
dBc/Hz, in good
agreement with a measurement of 114.5 dBc/Hz. To predict
the phase noise in the
region, it is enough to calculate
corner. Measurements on an isolated inverter on the
the
same die show a
noise corner frequency of 250 kHz,
when its input and output are shorted. The
ratio is
calculated to be 0.3, which predicts a
corner of 75 kHz,
compared to the measured corner of 80 kHz.
The fifth experiment measures the phase noise of an 11stage ring, running at
MHz implemented on the same
die as the previous experiment. The phase noise measurements
are shown in Fig. 24. For the inverters in this oscillator,
m
m and
m
m, which
results in a total capacitance of 43.5 fF and
fC.
The phase noise is calculated in exactly the same manner as
the previous experiment and is calculated to be
, or 122.1 dBc/Hz at a 500-kHz offset.
The measured phase noise is 122.5 dBc/Hz, again in good
agreement with predictions. The
ratio is calculated
corner of 43 kHz, while the
to be 0.17 which predicts a
measured corner is 45 kHz.
The sixth experiment investigates the effect of symmetry
on
region behavior. It involves a seven-stage currentstarved, single-ended ring oscillator in which each inverter
stage consists of an additional NMOS and PMOS device
in series. The gate drives of the added transistors allow
independent control of the rise and fall times. Fig. 25 shows
the phase noise when the control voltages are adjusted to
achieve symmetry versus when they are not. In both cases the
control voltages are adjusted to keep the oscillation frequency

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

Fig. 24. Phase noise measurements for an 11-stage single-ended CMOS ring
oscillator. f0 = 115 MHz, 2-m process technology.

191

Fig. 26. Sideband power versus the voltage controlling the symmetry of the
waveform. Seven-stage current-starved single-ended CMOS VCO. f0 = 50
MHz, 2-m process technology.

Fig. 27. Phase noise measurements for a four-stage differential CMOS ring
oscillator. f0 = 200MHz, 0.5-m process technology.
Fig. 25. Effect of symmetry in a seven-stage current-starved single-ended
CMOS VCO. f0 = 60 MHz, 2-m process technology.

A2 /Hz. Using these numbers


, the phase noise in the
region is predicted to be
, or 103.2 dBc/Hz at an offset
of 1 MHz, while the measurement in Fig. 27 shows a phase
noise of 103.9 dBc/Hz, again in agreement with prediction.
Also note that despite differential symmetry, there is a distinct
region in the phase noise spectrum, because each half
circuit is not symmetrical.
The eighth experiment investigates cyclostationary effects
in the bipolar Colpitts oscillator of Fig. 5(a), where the conduction angle is varied by changing the capacitive divider
ratio
while keeping the effective parallel
capacitance
constant to maintain
an
of 100 MHz. As can be seen in Fig. 28, increasing
decreases the conduction angle, and thereby reduces the
, leading to an initial decrease in phase noise.
effective
However, the oscillation amplitude is approximately given by
, and therefore decreases for large
values of . The phase noise ultimately increases for large as
a consequence. There is thus a definite value of (here, about
0.2) that minimizes the phase noise. This result provides a
theoretical basis for the common rule-of-thumb that one should

is
for
constant at 60 MHz. As can be seen, making the waveform
more symmetric has a large effect on the phase noise in the
region without significantly affecting the
region.
Another experiment on the same circuit is shown in Fig. 26,
which shows the phase noise power spectrum at a 10 kHz
offset versus the symmetry-controlling voltage. For all the
data points, the control voltages are adjusted to keep the
oscillation frequency at 50 MHz. As can be seen, the phase
noise reaches a minimum by adjusting the symmetry properties
of the waveform. This reduction is limited by the phase noise
in
region and the mismatch in transistors in different
stages, which are controlled by the same control voltages.
The seventh experiment is performed on a four-stage differential ring oscillator, with PMOS loads and NMOS differential
stages, implemented in a 0.5- m CMOS process. Each stage is
tapped with an equal-sized buffer. The tail current source has
a quiescent current of 108 A. The total capacitance on each
fF
of the differential nodes is calculated to be
and the voltage swing is
V, which results in
fF. The total channel noise current on each node

192

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 28. Sideband power versus capacitive division ratio. Bipolar LC Colpitts
oscillator f0 = 100 MHz.

use
ratios of about four (corresponding to
Colpitts oscillators [17].

Fig. 29. State-space trajectory of an nth-order oscillator.

) in

VI. CONCLUSION
This paper has presented a model for phase noise which
explains quantitatively the mechanism by which noise sources
of all types convert to phase noise. The power of the model
derives from its explicit recognition of practical oscillators
as time-varying systems. Characterizing an oscillator with the
ISF allows a complete description of the noise sensitivity
of an oscillator and also allows a natural accommodation of
cyclostationary noise sources.
This approach shows that noise located near integer multiples of the oscillation frequency contributes to the total
phase noise. The model specifies the contribution of those
noise components in terms of waveform properties and circuit
parameters, and therefore provides important design insight by
identifying and quantifying the major sources of phase noise
degradation. In particular, it shows that symmetry properties
of the oscillator waveform have a significant effect on the
upconversion of low frequency noise and, hence, the
corner of the phase noise can be significantly lower than
device noise corner. This observation is particularly
the
important for MOS devices, whose inferior
noise has been
thought to preclude their use in high-performance oscillators.
APPENDIX
CALCULATION OF THE IMPULSE SENSITIVITY FUNCTION
In this Appendix we present three different methods to
calculate the ISF. The first method is based on direct measurement of the impulse response and calculating
from
it. The second method is based on an analytical state-space
approach to find the excess phase change caused by an impulse
of current from the oscillation waveforms. The third method
is an easy-to-use approximate method.

for a few cycles afterwards. By sweeping the impulse injection time across one cycle of the waveform and measuring
the resulting time shift
,
can calculated noting
that
, where
is the period of oscillation.
Fortunately, many implementations of SPICE have an internal
feature to perform the sweep automatically. Since for each
impulse one needs to simulate the oscillator for only a few
cycles, the simulation executes rapidly. Once
is
found, the ISF is calculated by multiplication with
. This
method is the most accurate of the three methods presented.
B. Closed-Form Formula for the ISF
An th-order system can be represented by its trajectory in
an -dimensional state-space. In the case of a stable oscillator,
the state of the system, represented by the state vector, ,
periodically traverses a closed trajectory, as shown in Fig. 29.
Note that the oscillator does not necessarily traverse the limit
cycle with a constant velocity.
In the most general case, the effect of a group of external
impulses can be viewed as a perturbation vector
which
. As
suddenly changes the state of the system to
discussed earlier, amplitude variations eventually die away,
but phase variations do not. Application of the perturbation
impulse causes a certain change in phase in either a negative
or positive direction, depending on the state-vector and the
direction of the perturbation. To calculate the equivalent time
shift, we first find the projection of the perturbation vector on
a unity vector in the direction of motion, i.e., the normalized
velocity vector

(31)

where is the equivalent displacement along the trajectory, and


A. Direct Measurement of Impulse Response
In this method, an impulse is injected at different relative
phases of the oscillation waveform and the oscillator simulated

is the first derivative of the state vector. Note the scalar


nature of , which arises from the projection operation. The
equivalent time shift is given by the displacement divided by

HAJIMIRI AND LEE: GENERAL THEORY OF PHASE NOISE IN ELECTRICAL OSCILLATORS

193

the speed
(32)
which results in the following equation for excess phase caused
by the perturbation:
(33)
In the specific case where the state variables are node
voltages, and an impulse is applied to the th node, there will
be a change in
given by (10). Equation (33) then reduces
to
(34)

Fig. 30. ISFs obtained from different methods.

is the norm of the first derivative of the waveform


where
vector and
is the derivative of the th node voltage. Equation (34), together with the normalized waveform function
defined in (1), result in the following:

identical stages. The denominator may then be approximated


by

(35)

Fig. 30 shows the results obtained from this method compared


with the more accurate results obtained from methods and
. Although this method is approximate, it is the easiest to
use and allows a designer to rapidly develop important insights
into the behavior of an oscillator.

where represents the derivative of the normalized waveform


on node , hence

(38)

(36)
ACKNOWLEDGMENT
It can be seen that this expression for the ISF is maximum
during transitions (i.e., when the derivative of the waveform
function is maximum), and this maximum value is inversely
proportional to the maximum derivative. Hence, waveforms
with larger slope show a smaller peak in the ISF function.
In the special case of a second-order system, one can use
the normalized waveform
and its derivative as the state
variables, resulting in the following expression for the ISF:
(37)
where
represents the second derivative of the function . In
the case of an ideal sinusoidal oscillator
, so that
, which is consistent with the argument
of Section III. This method has the attribute that it computes
the ISF from the waveform directly, so that simulation over
only one cycle of is required to obtain all of the necessary
information.
C. Calculation of ISF Based on the First Derivative
This method is actually a simplified version of the second
approach. In certain cases, the denominator of (36) shows little
variation, and can be approximated by a constant. In such a
case, the ISF is simply proportional to the derivative of the
waveform. A specific example is a ring oscillator with

The authors would like to thank T. Ahrens, R. Betancourt, R.


Farjad-Rad, M. Heshami, S. Mohan, H. Rategh, H. Samavati,
D. Shaeffer, A. Shahani, K. Yu, and M. Zargari of Stanford
University and Prof. B. Razavi of UCLA for helpful discussions. The authors would also like to thank M. Zargari, R.
Betancourt, B. Amruturand, J. Leung, J. Shott, and Stanford
Nanofabrication Facility for providing several test chips. They
are also grateful to Rockwell Semiconductor for providing
access to their phase noise measurement system.
REFERENCES
[1] E. J. Baghdady, R. N. Lincoln, and B. D. Nelin, Short-term frequency
stability: Characterization, theory, and measurement, Proc. IEEE, vol.
53, pp. 704722, July 1965.
[2] L. S. Cutler and C. L. Searle, Some aspects of the theory and
measurement of frequency fluctuations in frequency standards, Proc.
IEEE, vol. 54, pp. 136154, Feb. 1966.
[3] D. B. Leeson, A simple model of feedback oscillator noises spectrum,
Proc. IEEE, vol. 54, pp. 329330, Feb. 1966.
[4] J. Rutman, Characterization of phase and frequency instabilities in
precision frequency sources; Fifteen years of progress, Proc. IEEE,
vol. 66, pp. 10481174, Sept. 1978.
[5] A. A. Abidi and R. G. Meyer, Noise in relaxation oscillators, IEEE
J. Solid-State Circuits, vol. SC-18, pp. 794802, Dec. 1983.
[6] T. C. Weigandt, B. Kim, and P. R. Gray, Analysis of timing jitter in
CMOS ring oscillators, in Proc. ISCAS, June 1994, vol. 4, pp. 2730.
[7] J. McNeil, Jitter in ring oscillators, in Proc. ISCAS, June 1994, vol.
6, pp. 201204.
[8] J. Craninckx and M. Steyaert, Low-noise voltage controlled oscillators
using enhanced LC-tanks, IEEE Trans. Circuits Syst.II, vol. 42, pp.
794904, Dec. 1995.

194

[9] B. Razavi, A study of phase noise in CMOS oscillators, IEEE J.


Solid-State Circuits, vol. 31, pp. 331343, Mar. 1996.
[10] B. van der Pol, The nonlinear theory of electric oscillations, Proc.
IRE, vol. 22, pp. 10511086, Sept. 1934.
[11] N. Minorsky, Nonlinear Oscillations. Princeton, NJ: Van Nostrand,
1962.
[12] P. A. Cook, Nonlinear Dynamical Systems. New York: Prentice Hall,
1994.
[13] W. A. Gardner, Cyclostationarity in Communications and Signal Processing. New York: IEEE Press, 1993.
[14] H. B. Chen, A. van der Ziel, and K. Amberiadis, Oscillator with oddsymmetrical characteristics eliminates low-frequency noise sidebands,
IEEE Trans. Circuits Syst., vol. CAS-31, Sept. 1984.
[15] J. G. Maneatis, Precise delay generation using coupled oscillators,
IEEE J. Solid-State Circuits, vol. 28, pp. 12731282, Dec. 1993.
[16] C. K. Yang, R. Farjad-Rad, and M. Horowitz, A 0.6mm CMOS 4Gb/s
transceiver with data recovery using oversampling, in Symp. VLSI
Circuits, Dig. Tech. Papers, June 1997.
[17] D. DeMaw, Practical RF Design Manual. Englewood Cliffs, NJ:
Prentice-Hall, 1982, p. 46.

Ali Hajimiri (S95) was born in Mashad, Iran, in


1972. He received the B.S. degree in electronics
engineering from Sharif University of Technology in
1994 and the M.S. degree in electrical engineering
from Stanford University, Stanford, CA, in 1996,
where he is currently engaged in research toward
the Ph.D. degree in electrical engineering.
He worked as a Design Engineer for Philips on a
BiCMOS chipset for the GSM cellular units from
1993 to 1994. During the summer of 1995, he
worked for Sun Microsystems, Sunnyvale, CA, on
the UltraSparc microprocessors cache RAM design methodology. Over the
summer of 1997, he worked at Lucent Technologies (Bell-Labs), where he
investigated low phase noise integrated oscillators. He holds one European
and two U.S. patents.
Mr. Hajimiri is the Bronze medal winner of the 21st International Physics
Olympiad, Groningen, Netherlands.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Thomas H. Lee (M83) received the S.B., S.M.,


Sc.D. degrees from the Massachusetts Institute of
Technology (MIT), Cambridge, in 1983, 1985, and
1990, respectively.
He worked for Analog Devices Semiconductor,
Wilmington, MA, until 1992, where he designed
high-speed clock-recovery PLLs that exhibit zero
jitter peaking. He then worked for Rambus Inc.,
Mountain View, CA, where he designed the phaseand delay-locked loops for 500 MB/s DRAMs. In
1994, he joined the faculty of Stanford University,
Stanford, CA, as an Assistant Professor, where he is primarily engaged in
research into microwave applications for silicon IC technology, with a focus
on CMOS ICs for wireless communications.
Dr. Lee was recently named a recipient of a Packard Foundation Fellowship
award and is the author of The Design of CMOS Radio-Frequence Integrated
Circuits (Cambridge University Press). He has twice received the Best Paper
award at ISSCC.

56

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

Transactions Briefs
A Study of Oscillator Jitter Due
to Supply and Substrate Noise
Frank Herzel and Behzad Razavi

(a)

AbstractThis paper investigates the timing jitter of single-ended and


differential CMOS ring oscillators due to supply and substrate noise.
We calculate the jitter resulting from supply and substrate noise, show
that the concept of frequency modulation can be applied, and derive
relationships that express different types of jitter in terms of the sensitivity
of the oscillation frequency to the supply or substrate voltage. Using
examples based on measured results, we show that thermal jitter is
typically negligible compared to supply- and substrate-induced jitter in
high-speed digital systems. We also discuss the dependence of the jitter
of differential CMOS ring oscillators on transistor gate width, power
consumption, and the number of stages.
Index TermsJitter, oscillator, phase-locked loops, supply noise.

(b)
Fig. 1. Single-ended ring oscillator: (a) block diagram and (b) implementation of one stage.

I. INTRODUCTION
High-speed digital circuits such as microprocessors and memories
employ phase locking at the board-chip interface to suppress timing
skews between the on-chip clock and the system clock [1][3].
Fabricated on the same substrate as the rest of the circuit, the
phase-locked loop (PLL) must typically operate from the global
supply and ground busses, thus experiencing both substrate and
supply noise. The noise manifests itself as jitter at the output of the
PLL, primarily through various mechanisms in the voltage-controlled
oscillator (VCO). As exemplified by measured results reported in the
literature, we show that the contribution of device electronic noise
to jitter is typically much less than that due to supply and substrate
noise.
This paper describes the effect of supply and substrate noise on
the performance of single-ended and differential ring oscillators,
providing insights that prove useful in the design of other types of
oscillators as well. Section II summarizes the oscillators studied in
this work and Section III defines various types of jitter. Sections IV
and V quantify the jitter due to thermal noise in the oscillation
loop and frequency-modulating noise, respectively. Sections VI and
VII apply the developed results to the analysis of supply and
substrate noise, and Section VIII presents the dependence of jitter
upon parameters such as device size, the number of stages, and power
dissipation.
II. RING OSCILLATORS

UNDER INVESTIGATION

In this paper, we investigate both single-ended ring oscillators


(SEROs) and differential ring oscillators (DROs). The latter are
much more important in digital circuit applications, since DROs are
less affected by supply and substrate noise. The circuit topologies are
shown in Fig. 1 for the SERO and in Fig. 2 for the DRO.
Manuscript received October 1, 1997; revised August 2, 1998. This paper
was recommended by Associate Editor B. H. Leung.
F. Herzel was with the Electrical Engineering Department, University of
California at Los Angeles, Los Angeles, CA 90095, USA, on leave from the
Institute for Semiconductor Physics, Frankfurt, Oder, Germany.
B. Razavi is with the Electrical Engineering Department, University of
California, Los Angeles, CA 90095 USA.
Publisher Item Identifier S 1057-7130(99)01471-8.

(a)

(b)
Fig. 2. Differential ring oscillator: (a) block diagram and (b) implementation
of one stage.

The simulations were performed with the SPICE parameters of a


0.6-m CMOS technology. We employed the minimum gate length
throughout the paper. Furthermore, unless indicated otherwise, we
use the following parameters for the differential stage: W
m,
RL
k , ISS
mA, CL
; VDD
V. The rms value of
VDD was chosen to be 71 mV, corresponding to a peak amplitude
of 100 mV for a sinusoidal perturbation.

=1

=1

=0

III. DEFINITIONS

= 80

=3

OF

()

JITTER

We consider the output voltage Vout t of an oscillator in the steady


state. The time point of the nth minus-to-plus zero crossing of Vout t
is referred to as tn . The nth period is then defined as Tn tn+1 0 tn .
For an ideal oscillator, this time difference is independent of n, but in
reality it varies with n as a result of noise in the circuit. This results
in a deviation Tn Tn 0 T from the mean period T . The quantity
Tn is an indication of jitter.

1 =

10577130/99$10.00 1999 IEEE

()

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

57

because the latter type hardly changes when the oscillator is placed
in the loop.
A more general quantification of the jitter is possible by means of
the steady-state autocorrelation function (ACF) defined as
N
C1T m
Tn+m Tn :
(4)
N !1 N
n=1
To obtain an intuitive understanding of this quantity, we insert (4)
with m
in (2), obtaining

( ) = lim 1

=0

(1

1 )

1Tc2 = C1T (0):

(a)

(5)

Equation (5) states that the ACF with zero argument is the squared
cycle jitter. For a nonzero argument, the ACF decreases with increasing m, finally approaching zero for m ! 1. This indicates that
the timing error Tn has a finite memory. In order to express the
cycle-to-cycle jitter by the ACF, we rewrite (3) as
N
Tcc2
Tn+1 0 Tn 2
N !1 N
n=1
C1T 0 C1T :
(6)

(b)

1 = lim 1 (1
1 )
= 2 (0) 2 (1)

Fig. 3. Illustration of (a) long-term jitter and (b) cycle-to-cycle jitter.

More specifically, absolute jitter or long-term jitter


N
Tabs N
Tn
n=1

( )=

(1)

is often used to quantify the jitter of phase-locked loops. Modeling


the total phase error with respect to an ideal oscillator [Fig. 3(a)],
absolute jitter is nonetheless illsuited to describing the performance
of oscillators because, as shown later, the variance of Tabs diverges
with time.
A better figure of merit for oscillators is cycle jitter, defined as the
rms value of the timing error Tn 1

1
1Tc = Nlim
!1
N

N
n=1

1Tn2 :

(2)

Cycle jitter describes the magnitude of the period fluctuations, but it


contains no information about the dynamics.
The third type of jitter considered here is cycle-to-cycle jitter
[Fig. 3(b)] given by

1
1Tcc = Nlim
!1
N

(Tn+1 0 Tn )2

n=1

(3)

representing the rms difference between two consecutive periods.


Note the difference between the cycle jitter and the cycle-to-cycle
jitter: the former compares the oscillation period with the mean period
and the latter compares the period with the preceding period. Hence,
in contrast to cycle jitter, cycle-to-cycle jitter describes the shortterm dynamics of the period. The long-term dynamics, on the other
hand, are not characterized by cycle-to-cycle jitter. For example, if
=f noise modulates the frequency slowly, Tcc does not reflect the
result accurately. With respect to the zero crossings, the cycle-to-cycle
jitter is a double-differential quantity in that three zero crossings of the
output voltage are related to each other. As discussed in Section V,
this results in a completely different dependence on the modulation
frequency than for the cycle jitter.
We should note that an oscillator embedded in a phase-locked loop
periodically receives correction pulses from the phase detector and
charge pump, and hence its long-term jitter strongly depends on the
PLL dynamics. Thus, for the analysis of a free-running oscillator,
cycle jitter and cycle-to-cycle jitter are more meaningful, particularly

1 In this paper, we use a time average definition of jitter which is equivalent


to the stochastic average if and only if the process T 2 is ergodic.

This expression will be used for an analytical calculation of the


cycle-to-cycle jitter in Section V.
IV. JITTER DUE TO DEVICE ELECTRONIC NOISE
The electronic noise of the devices in an oscillator loop leads to
phase noise and jitter [5], [7], [8]. Our objective is to express jitter
in terms of phase noise and vice versa. These relationships are useful
as they relate two measurable quantities.
In this paper, we neglect the effect of =f noise because it
introduces only slow phase variations in the oscillator. Such variations
are suppressed by the large loop bandwidth of PLLs used in todays
digital systems.
As derived in the Appendix, for white noise sources in the
oscillator, the single-sideband phase noise S (phase noise with
respect to the carrier) can be expressed in terms of the cycle-to-cycle
jitter according to

S (!) =

!03 =4 1Tcc2


(! 0 !0)2 + !03 =8 2 1Tcc4

3
2
 !(0!=40!10 )T2 cc (7)
and ! 0 !0 is the offset

where !0 is the oscillation frequency


frequency. The Appendix also shows that the cycle-to-cycle jitter
can be deduced from the phase noise according to

1Tcc2  4!3 S (!)(! 0 !0 )2 :


0

(8)

To obtain an estimate of the thermal jitter, we consider the differential


CMOS ring oscillator in [5]. For the 2.2-GHz oscillator with a phase
noise of 094 dBc/Hz at 1 MHz offset, we obtain from (8) a thermal
cycle-to-cycle jitter of 0.3 ps, i.e., less than 0.3 . Similar values are
obtained for the 900-MHz CMOS ring oscillators reported in [6].
In most timing applications, such small values are negligible with
respect to other sources of random jitter.
The thermal absolute jitter is proportional to the square root of the
measurement interval t. As derived in the Appendix, the absolute
jitter is given by

1Tabs = f20 1Tcc 1t:

(9)

In [9], the rms value of absolute jitter has been divided by the
square root of the measurement time to obtain a time-independent
figure of merit. This is not possible for supply and substrate noise,
since (7)(9) are derived for white noise in the feedback loop. Supply
and substrate noise, however, are generally not white.

58

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

in the remainder. The deviation of the period from the mean is

1T (t) = f0 + 11 f0 (t) 0 f10

(12)

0
 0 VmfK
2 cos !m t:

(13)

1 (+ )

Multiplying this expression by T t  and averaging the result


with respect to t, we obtain the steady-state ACF
2

1T (t +  )1T (t) = Vm2fK4 0 cos !m :


0

Fig. 4. Illustration of frequency modulation through changes of drain junction capacitances.

(14)

1 ()

This quantity represents the ACF of the process


T t in a
continuous-time description. For the evaluation of the jitter according
and C
of the discrete-time
to (5) and (6), we need the values C
ACF. If the ACF does not change significantly during one oscillation
period, these values can be determined from the continuous-time
ACF at time points 
2 T . A numerical
2 T and 
verification of this approach is given below.
Inserting (14) with 
in (5), we find the cycle jitter

(1)

(0)

=0

=1 
=0 
=0
1Tc = Vpm2Kf 20 :

(15)

= T = 1=f0 , we obtain from (6) and (14) the

For 
and 
cycle-to-cycle jitter

1Tcc = VmfK2 0 1 0 cos(!m=f0 ):


0

Fig. 5. Illustration of the VCO model of an oscillator.

V. JITTER OF A FREQUENCY-MODULATED OSCILLATOR


The frequency of an oscillator generally depends on the supply
and substrate voltage. The variation of the oscillation frequency with
a voltage may be described by a sensitivity function, also called the
gain of the VCO and denoted by KVCO .
For example, as shown in Fig. 4, the drain junction capacitance of
M1 and M2 varies with VDD and VSub , thus modulating the frequency
of the ring oscillator. In some cases, KVCO itself may be a function
of the modulating frequency. In Fig. 4, for example, high-frequency
supply noise results in fast changes in VP and hence substantial
displacement current through CS (the capacitance contributed by M1 ,
M2 , and the current source). Note that, in general, the frequency
of an oscillator depends on various bias and supply voltages, as
conceptually illustrated in Fig. 5.
An oscillator subject to supply and substrate noise may be considered as a VCO with different control voltages each having a
different sensitivity. In this section, we attribute the cycle jitter and the
cycle-to-cycle jitter of a frequency-modulated oscillator to the static
sensitivity K0 . As shown in Sections VI and VII, these expressions
describe supply and substrate noise quite accurately.
Let the modulating control voltage be a small sinusoidal perturbation

1Vm (t) = Vm cos !m t:

Equations (15) and (16) express the jitter in terms of the lowfrequency sensitivity and the modulation frequency. They will be
verified numerically in Sections VI and VII. The main benefit of
these equations is that the calculation of the jitter is reduced to the
calculation or measurement of the oscillation frequency as a function
of the supply or substrate voltage.
From (15), we note that cycle jitter is independent of frequency
so long as the quasi-static approximation (11) holds. By contrast,
cycle-to-cycle jitter increases with frequency. For fm  f0 we find
from (16)

1Tcc  VmpK2f0 !3 m :
(17)
0
Note that the cycle-to-cycle jitter 1Tcc is approximately proportional
to the modulation frequency fm . This can be interpreted by noting
that 1Tcc is a double-differential quantity, as is evident from (5)

and (6).
Having reduced the jitter calculation to the static sensitivity K0 ,
we need to extract this quantity from simulations. For this purpose,
we apply a dc voltage perturbation to the supply. Fig. 6 shows the
oscillation frequency of the SERO and the DRO as a function of the
supply voltage. The frequency varies linearly with the supply voltage
over a relatively wide range of VDD . The slope of the curves in Fig. 6
represents the low-frequency sensitivity K0 , indicating that the SERO
is much more sensitive than the DRO. Using these values in (15) and
(16), we can predict the jitter from supply and substrate noise easily.

(10)
VI. JITTER DUE

We assume, as an approximation, that the frequency change follows


the control voltage according to

1f0 (t) = VmK0 cos !mt

(16)

(11)

where K0 is the static sensitivity, also called low-frequency VCO


gain. This approximation is referred to as quasi-static approximation

TO

SUPPLY NOISE

The supply and substrate noise created in a digital system is quite


complex. In addition to components at the clock frequency and
harmonics and subharmonics thereof, the noise spectrum generally
exhibits random signals resulting from the activities of each building
block as well. A rigorous treatment requires that the noise spectrum
be measured in a realistic environment and subsequently incorporated
in the analysis as explained in Section V.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

59

(a)

(a)

(b)

(b)

Fig. 6. Oscillation frequency of (a) the single-ended ring oscillator and (b)
the differential ring oscillator as a function of static supply voltage.

Fig. 7. Cycle jitter and cycle-to-cycle jitter of (a) the SERO and (b) the DRO
as a function of supply voltage noise frequency. The solid lines represent the
quasi-static FM expressions.

In the following, we investigate the jitter due to sinusoidal supply


voltage perturbations. The calculation of the jitter consists of the
following steps: interpolation of the voltage waveform to find the
zero crossings; calculation of the periods Tn and subtraction of the
mean period T to obtain Tn ; and calculation of the cycle jitter (2)
and the cycle-to-cycle jitter (3) by performing time averaging.
We should also mention that simulations indicate that jitter has
a relatively linear dependence on the noise amplitude for supply
variations as large as a few hundred millivolts.
Fig. 7 plots the analytical and simulated cycle and cycle-to-cycle
jitter of single-ended and differential ring oscillators. As can be seen,
the analytical results of Section V predict the jitter with reasonable
accuracy.

Fig. 8. Substrate noise modeling.

VII. JITTER DUE

TO

SUBSTRATE NOISE

Substrate noise can be treated in the same fashion as supply noise.


For the numerical simulation of substrate noise, the bulk terminal of
the transistors is driven by a noise source (Fig. 8). Fig. 9 shows the
calculated jitter of the DRO as a function of the noise frequency.
Comparison with Fig. 7 indicates that for the DRO, a supply voltage
perturbation is almost equivalent to a substrate voltage perturbation of
opposite sign. To understand this, note from Fig. 10 that, with an ideal
tail current source, a change of V in VDD is equivalent to a change

of 0 V in Vsub . Simulations confirm that the static sensitivity K0


is indeed equal for supply and substrate noise, apart from the sign.
Fig. 9 also demonstrates that the quasi-static FM approach is suited
to describing the jitter introduced by substrate noise. Furthermore,
it suggests that a substantial fraction of the jitter results from the
voltage dependence of Cdb and Csb .
VIII. OSCILLATOR DESIGN

FOR

LOW JITTER

The simulation results presented thus far indicate the superior


performance of differential oscillators with respect to single-ended
topologies. Nonetheless, even differential configurations have a wide
design space; device size, voltage swings, power dissipation, and the
number of stages in a ring oscillator influence the overall sensitivity
to supply and substrate noise.

60

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

Fig. 11. Jitter of the DRO versus gate width.


Fig. 9. Cycle jitter and cycle-to-cycle jitter of the DRO versus substrate
voltage noise frequency. Solid lines represent the quasi-static FM expressions.
The empty symbols show the jitter with the drain-bulk and source-bulk
capacitances set to zero.

(a)

Fig. 10.
DRO.

Illustration of the equivalence of supply and substrate noise for the

In this section, we study jitter as a function of three parameters:


transistor gate width, power dissipation, and the number of stages. To
make meaningful comparisons, the circuit is modified in each case
such that the frequency of oscillation remains constant. These parameters also affect the thermal jitter to some extent, but, considering the
vastly different designs reported in [5] and [6], we note that this type
of jitter still remains negligible.
A. Effect of Transistor Gate Width
The differential three-stage ring oscillator of Fig. 2 begins to
oscillate for W  30 m.
Fig. 11 shows the effect of the gate width on the jitter, where the
oscillation frequency is kept constant by adjusting CL in Fig. 2. The
jitter reaches a minimum for W  80 m. For large W , the value of
CL must be reduced so as to maintain the same oscillation frequency,
yielding a larger voltage-dependent fraction due to drain and source
junctions of each device and hence a higher sensitivity to noise.

(b)
Fig. 12. Illustration of the relationship between power consumption and
noise for (a) device electronic noise and (b) supply noise.

By contrast, the effect of supply and substrate noise on the jitter of a


given oscillator topology is relatively independent of the power drain.
This can be understood with the aid of the conceptual illustrations
in Fig. 12, where the output voltages of N identical oscillators are
added in phase. In Fig. 12(a), only the device electronic noise is
considered [5]. Since thepnoise in each oscillator is uncorrelated, the
output noise voltage is N times that of each oscillator, whereas
the output signal voltage is N 2Vj . In Fig. 12(b), on the other hand,
all oscillators are disturbed by the same noise source, thus exhibiting
completely correlated noise. That is, both the noise voltage and the
signal voltage are increased by a factor of N .
To confirm the above observation, the gate width and tail current
were decreased while the load resistance was increased proportionally. Table I shows that the jitter is quite constant.

B. Effect of Power Consumption

C. Effect of Number of Stages

The jitter resulting from device electronic noise generally exhibits


an inverse dependence upon the oscillator power dissipation [5], [10].

In applications where the required oscillation frequency is considerably lower than the maximum speed of the technology, a ring

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

IMPACT

OF

61

TABLE I
POWER CONSUMPTION

TABLE II
THREE-STAGE VERSUS SIX-STAGE OSCILLATOR

Fig. 14. Grounded shield used under the capacitor to block substrate noise.

an n-well, grounded by a low-resistance n+ ring, is placed under the


capacitor so as to block the noise produced in the substrate.
IX. CONCLUSION
We have investigated the timing jitter in oscillators subject to
supply and substrate noise. For digital timing applications, the effect
of supply and substrate noise on the jitter is typically much more
pronounced than that of thermal noise. For supply and substrate noise,
we have derived analytical relationships between the cycle-to-cycle
jitter and the low-frequency sensitivity of the oscillation frequency
to supply or substrate noise. These relationships have been verified
by means of numerical calculations for single-ended and differential
CMOS ring oscillators. For differential ring oscillators, we have
investigated the dependence of the jitter on the transistor gate width,
power consumption, and the number of stages. As a special result,
we have found that in applications where the required oscillation
frequency is lower than the maximum speed of the technology, a
three-stage ring oscillator with additional load capacitances gives the
lowest jitter.
APPENDIX
JITTER AND PHASE NOISE DUE TO THERMAL AND SHOT NOISE
The output voltage of an oscillator can be written as

( ) = V0 cos[!0t + (t)]

V t

(18)

()

where V0 is the amplitude, !0 is the oscillation frequency, and  t


is the slowly varying excess phase. The excess frequency is

Fig. 13. Jitter of the three-stage and the six-stage DRO versus gate width
for an oscillation frequency of 500 MHz.

1!(t) = dtd (t)


and hence

( )=

 t

oscillator may incorporate more than three stages. Thus, the optimum
number of stages with respect to the jitter is of interest.
Shown in Table II and plotted in Fig. 13 is the jitter of three-stage
and six-stage oscillators designed for a frequency of 500 MHz with
constant tail current and voltage swings. We note that the minimum
values of cycle jitter and cycle-to-cycle jitter are smaller in a threestage topology. This is because for the three-stage oscillator, the
reduction of the oscillation frequency to the desired value is obtained
by means of the fixed capacitances CL rather than by the voltagedependent capacitances of the transistors. Hence, a smaller fraction
of the total load capacitance is subject to variations with supply and
substrate noise.
The addition of a fixed capacitor to each stage nonetheless entails
the issue of substrate noise coupling to the bottom plate of the
capacitor. In order to minimize this effect, a grounded shield must
isolate the capacitor from the substrate, as illustrated in Fig. 14. Here,

1!(u) du + (0):

(19)

(20)

Thermal and shot noise may be considered as white noise since


their cutoff frequencies are typically much higher than the oscillation
frequency. White noise in the feedback loop of the oscillator results
in phase diffusion, a phenomenon described by a Wiener process
[11]. Extensive investigations of phase noise indicate that white noise
sources in all types of oscillators give rise to a phase noise power
spectrum proportional to = ! 2 , where ! is the offset frequency
with respect to the carrier frequency [4], [5]. This trend is valid for
offset frequencies as high as several percent of the carrier frequency.
Thus, frequency noise, ! t , can be assumed white in such a band.
The autocorrelation of ! t is given by

1 (1 )

1 ()
1 ()
1!(t +  )1!(t) = 2D( )
(21)
where D is the diffusivity and  ( ) the Delta function. The
probability density of (t) represents a Gaussian distribution centered

62

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMSII: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999

at 

(0) with the variance

2 = 2D t:

(22)

On the other hand, the phase noise can be expressed by the cycle-tocycle jitter by inserting (32) in (25), yielding

As evident from (22), the variance diverges with time. The autocorrelation of V t is known [12] and reads

()
2
hV (t +  )V (t)i = V20 exp(0Dj j)cos(!0  ):

(24)
(! 0 !0)2 + D2 :
This quantity is often normalized to V02 =2 and referred to as relative

phase noise with respect to the carrier [5] or as single-sideband phase


noise [8], given by

2D
(! 0 !0 )2 + D2 :

 D , we obtain from (25)


2D
S (!) 
(! 0 !0 )2 :

()

=0

The excess phase change during the nth cycle is referred to as


The nth oscillation period is defined by the relation

2f0Tn = 2 + 1n :

(27)

1n .
(28)


1Tn = 21fn0 = 1n 2T :
(29)
Hence, the cycle jitter 1Tc of the period during one cycle is related
to 1c according to

1Tc = 1c 2T :
(30)
For white noise sources, two successive periods are uncorrelated.
Since cycle-to-cycle jitter represents the difference between two
periods, the variance of cycle-to-cycle jitter is twice as large as the
variance of one period, yielding

(31)

Combining (27), (30), and (31), we obtain

1Tcc2 = 8!3 D
0

with

!0 =

2 :
T

(32)

(33)

The cycle-to-cycle jitter can now be expressed in terms of the singlesideband phase noise by inserting (32) in (26) to give

1Tcc2  4!3 S (!)(! 0 !0 )2 :


0

S =

f03 1Tc2
(f 0 f0 )2

(36)

1abs = 2D 1t =  1t:

=1

1Tcc = 21Tc :

(35)

(26)

For the deviation of the nth period Tn from the mean period
=f0 , we then find

T

(25)

=

1c = 2DT:

1Tcc4

where f0
!0 = . Equation (36) turns out to be a special case of
(35) for ! 0 !0  D .
The absolute jitter increases proportionally to the square root of the
measurement interval t as evident from (22). Hence, the absolute
phase jitter is

Next, we will relate the cycle-to-cycle jitter and the single-sideband


phase noise to each other. Note that the stationary Wiener process
has no memory and the increments in different time intervals are
statistically independent [11]. Therefore, the rms mean increment of
the excess phase  t within one cycle, i.e., the cycle jitter of the
phase, equals the increment of  t between t
and t T . Thus,
from (22), we obtain the phase cycle jitter as

()

A similar expression has been derived for ring oscillators in [10] and
reads in our notation

D

SV (!) = V02

For ! 0 !0

!03 =4 1Tcc2


(! 0 !0)2 + !03=8

(23)

Performing the Fourier transformation, we obtain the one-sided power


spectral density

S (!) =

S =

(34)

(37)

Using (32), the proportionality constant  can be related to the


cycle-to-cycle jitter according to

=

2D = 2f03=2 1Tcc :

(38)

REFERENCES
[1] I. A. Young, J. K. Greason, and K. L. Wong, A PLL clock generator
with 5 to 110 MHz of lock range for microprocessors, IEEE J. SolidState Circuits, vol. 27, pp. 15991607, Nov. 1992.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, A widebandwidth low-voltage PLL for powerPCTM microprocessors, IEEE
J. Solid-State Circuits, vol. 30, pp. 383391, Apr. 1995.
[3] R. Bhagwan and A. Rogers, A 1 GHz dual-loop microprocessor PLL
with instant frequency shifting, in IEEE Proc. ISSCC, San Francisco,
CA, Feb. 1997, pp. 336337.
[4] D. B. Leeson, A simple model of feedback oscillator noise spectrum,
in Proc. IEEE, pp. 329330, Feb. 1966.
[5] B. Razavi, A study of phase noise in CMOS oscillators, IEEE J.
Solid-State Circuits, vol. 31, pp. 331343, Mar. 1996.
[6] T. Kwasniewski et al., Inductorless oscillator design for personal
communications devicesA 1.2 m CMOS process case study, in
Proc. CICC, May 1995, pp. 327330.
[7] F. X. Kartner, Analysis of white and f 0 noise in oscillators, Int. J.
Circuits Theory, Appl., vol. 18, pp. 485519, Sept. 1990.
[8] W. Anzill and P. Russer, A general method to simulate noise in oscillators based on frequency domain techniques, IEEE Trans. Microwave
Theory Tech., vol. 41, pp. 22562263, Dec. 1993.
[9] J. A. McNeill, Jitter in ring oscillators, IEEE J. Solid-State Circuits,
vol. 32, pp. 870879, June 1997.
[10] T. C. Weigandt, B. Kim, and P. R. Gray, Analysis of timing jitter in
CMOS ring oscillators, in Proc. IEEE Int. Symp. Circuits and Systems
(ISCAS94), London, U.K., June 1994, vol. 4, pp. 2730.
[11] C. W. Gardiner, Handbook of Stochastic Methods. Berlin: SpringerVerlag, 1983.
[12] R. L. Stratonovich, Topics in the Theory of Random Noise. New York:
Gordon and Breach, 1967.

204

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

A 2-V 900-MHz Monolithic CMOS Dual-Loop


Frequency Synthesizer for GSM Receivers
William S. T. Yan and Howard C. Luong, Member, IEEE

AbstractA 900-MHz monolithic CMOS dual-loop frequency


synthesizer suitable for GSM receivers is presented. Implemented
in a 0.5- m CMOS technology and at a 2-V supply voltage, the
dual-loop frequency synthesizer occupies a chip area of 2.64 mm2
and consumes a low power of 34 mW. The measured phase noise of
the synthesizer is 121.8 dBc/Hz at 600-kHz offset, and the measured spurious levels are 79.5 and 82.0 dBc at 1.6 and 11.3 MHz
offset, respectively.
Index TermsFrequency synthesis, frequency synthesizer,
phase-locked loop, radio frequency, voltage-controlled oscillator.
Fig. 1. Block diagram of the GSM-receiver front-end.

I. INTRODUCTION

ODERN transceivers for wireless communication consist of low-noise amplifiers, power amplifiers, mixers,
DSP chips, filters, and frequency synthesizers. These building
blocks have been realized using hybrid technologies and require
interfacing circuits, which increases the power consumption and
limits the maximum operating speed of the transceivers. For
this reason, it has become increasingly attractive to design and
monolithically integrate all these building blocks on a single
chip.
Designing fully integrated frequency synthesizers for this integration is always desirable but most challenging. The first
requirement is to achieve high-frequency operation with reasonable power consumption. However, the most critical challenges for the frequency synthesizer are the phase-noise and
spurious-level performance. Finally, small chip area is essential
to monolithic system integration.
In recent years, monolithic frequency synthesizers with
good phase-noise performance have been reported [1][3].
However, those designs operate at supply voltages of at least
2.7 V and power consumption of more than 50 mW. Moreover,
frequency synthesizers suffer from fractional
fractionalspurs which degrade their spurious-tone performance.
This paper presents a monolithic dual-loop frequency
synthesizer for GSM 900 system, which is implemented in a
0.5- m CMOS process, that achieves high operating frequency
(935.2959.8 MHz), low power consumption (34 mW), low
phase noise ( 121.8 dBc/Hz at 600kHz), low spurious level
( 82.0 dBc at 11.3MHz), and fast switching time (830 s).
Section II derives the design specification of the frequency
synthesizer for GSM 900. Section III describes the architecture for the proposed dual-loop design. In Section IV,
Manuscript received December 29, 1999; revised October 5, 2000.
The authors are with the Department of Electrical and Electronic Engineering,
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,
Hong Kong (e-mail: eetak@ee.ust.hk; eeluong@ee.ust.hk).
Publisher Item Identifier S 0018-9200(01)00927-1.

circuit implementation of critical building blocks is discussed.


Section V presents the measurement results of the synthesizer
including its phase noise, spurious level, and switching time
of the frequency synthesizer together with a comprehensive
performance evaluation.
II. DESIGN SPECIFICATION
The performance of frequency synthesizers is mainly specified by their output frequency, phase noise, spurious level, and
switching time. This section derives the specifications of a frequency synthesizer for GSM receivers.
A. Output Frequency
In GSM-900 systems, the receiver-channel frequencies are
expressed as follows:
MHz

(1)

is the channel number. To receive


where
signals in different channels, a GSM-receiver front-end, shown
in Fig. 1, is adopted. The receiver front-end consists of a lownoise amplifier (LNA) and an RF filter for filtering out-of-band
noise and blocking signals. The received signal is then mixed
down to an IF frequency ( ) of 70 MHz for base-band signal
processing. To extract information from the desired channel, the
) of the frequency
local oscillator (LO) output frequency (
synthesizer is changed accordingly, as follows:

MHz

(2)

which is the output-frequency range of the frequency synthesizer to be achieved.


B. Phase Noise
The blocking-signal specification for GSM 900 receivers is
shown in Fig. 2, where the desired signal power can be as low

00189200/01$10.00 2001 IEEE

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

205

Fig. 2. SNR degradation due to the phase noise and spurious level.

as 102 dBm. At 600-kHz offset frequency, the power of the


blocking signal can be as high as 43 dBm [4]. With a correct
LO frequency, the desired channel signal is downconverted to IF
frequency. However, blocking signals are also downconverted
with the LO signal and its phase noise. Since the power of the
blocking signal is much larger than that of the desired signal,
the phase-noise power falls into the IF frequency and degrades
the signal to noise ratio (SNR). The phase-noise specification
can be expressed as follows:
Fig. 3. GSM 900 receive and transmit time.

SNR
dBc/Hz at 600 kHz

(3)

of 9 dB is the SNR specification for the whole


where SNR
dBm and
dBm are
receiver, and
the power levels of the minimum desired signal and maximum
blocking signal, respectively.
C. Spurious Tones
Because of the feedthrough and modulation of the reference
away from the
signal, two spurious tones appears at the
desired output frequency, as shown in Fig. 2. The derivation of
the spurious-tone specification is similar to that of the phase
noise except that the channel bandwidth is not considered in this
can be expressed as
case. The spurious-tone specification
follows:
SNR
dBc at 1.6 MHz
dBc for offset

MHz.

(4)

respectively. For system monitoring purposes, a time slot between slot #6 and slot #7 is adopted. Therefore, the most critical
switching time is from the transmission period (slot #4) to the
system monitoring period (slot #6.5), which is equal to 865 s.
However, to take care of the settling time of the other components, the switching time of the frequency synthesizer is recommended to be kept within one time slot (577 s) [5].
III. DUAL-LOOP DESIGN
To reduce the switching time and the chip area of a synthesizer, a high loop bandwidth and a high reference frequency
are desired. Moreover, to suppress the phase-noise contribution of the reference signal and improve frequency-divider complexity, a lower frequency-division ratio is desirable. Therefore,
a dual-loop frequency synthesizer is proposed [6]. As shown in
Fig. 4, the dual-loop design consists of two reference signals and
two phase-locked loops (PLLs) in cascade configuration. In the
feedback path of the high-frequency loop, a mixer is adopted to
provide the frequency shift. The output frequency of the synthesizer is expressed as follows:

D. Switching Time
In GSM 900 systems, time-division multiple-access (TDMA)
is adopted within each frequency channel. As shown in Fig. 3,
each frequency channel is divided into eight time slots. Signals are received and transmitted in time slot #1 and slot #4,

(5)
and
are frequencies of the two reference
where
, and
are frequency division ratios.
signals, and

206

Fig. 4.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

Proposed dual-loop frequency synthesizer.

Due to the dual-loop architecture, the comparison frequencies of the low-frequency and high-frequency loops are scaled
up from 200 kHz to 1.6 and 11.3 MHz, respectively. Therefore,
the loop bandwidths of both PLLs can be increased so that the
switching time and the chip area can be reduced. Compared
to single-loop integer- designs, the frequency-division ratio
is reduced from 42364449
of the programmable divider
to 226349. Such a reduction in the division ratio significantly
simplifies the frequency-divider design and reduces phase-noise
contribution of the input reference signal.
In the proposed dual-loop synthesizer, the divide-by-32 diand the high-frequency loop together greatly attenuvider
ates the phase noise and the spurious tones of the low-frequency
loop. As such, the low-frequency loop can be designed to have
a larger loop bandwidth and a loop filter as small as one-fifth
of the loop filter in the high-frequency loop. The low-frequency
loop requires additional components, including the phase-frequency detector (PFD1), the charge pump (CP1), and the fre, but they are all quite small and have very
quency divider
little impact on the chip area. In additional, VCO1 is implemented by a ring oscillator, which occupies a much smaller chip
area compared to VCO2. Altogether, the dual-loop design requires no more than 25% overhead in the chip area compared to
a fraction- design with the same loop bandwidth.
of the
Although the input-reference frequency
low-frequency loop is scaled up by 8 times; the required
frequency range of the oscillator VCO1 in the low-frequency
loop is also scaled up from 25 to 200 MHz. On the other
hand, the phase-noise of the ring oscillator is attenuated by the
and is then amplified by the high-frefrequency divider
quency loop; the total phase-noise attenuation from VCO1
output to the synthesizer output is 18 dB. Consequently, this
voltage-controlled oscillator (VCO) requires a high operating
frequency (600 MHz), a wide frequency range (200 MHz), and
a low phase noise ( 103 dBc/Hz at 600 kHz). A novel ring
VCO design that meets all of these tough specifications will be
presented in the next section.

IV. CIRCUIT IMPLEMENTATION


This section discusses the design consideration and circuit
implementation of the major building blocks that are unique and
critical to the proposed dual-loop synthesizer, namely the two
VCOs, the frequency dividers, the charge pump, and the loop
filters. Detailed analysis and design of other building blocks will
not be presented, either because they can be found somewhere
else or they are too obvious.
A. Ring Oscillator VCO1
The schematic of the proposed two-stage ring oscillator and
its delay cell to meet the required specification as described in
Section III are shown in Fig. 5(a) and (b), respectively. The delay
as input transconductors,
cell consists of nMOS transistors
for maintaining oscillacross-coupled pMOS transistors
, and a bias trantion, diode-connected pMOS transistors
for frequency tuning. The source nodes of transistors
sistor
are connected to supply to maximize its output amplitude
, which also helps suppress noise sources by turning them off
more often [7] and thus further enhances the phase-noise performance.
The half circuit of the delay cell is shown in Fig. 5(c). By
equating the delay-cell voltage gain to be unity, the oscillating
frequency of the ring oscillator can be expressed as follows:

(6)
is transconductance,
is channel conductance, and
where
is the total capacitance at output node. Oscillation starts
is large enough to overcome the output load
when the
.
V, transistors
are turned
When control voltage
, and the oscillator operates at maximum freon to cancel
, transistor
are
quency. When control voltage

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

207

Fig. 5. Circuit implementation of the ring oscillator VCO1. (a) Ring oscillator. (b) Delay cell. (c) Half circuit of delay cell.

turned off
, and the oscillator operates at minimum
, and frequency range
are
frequency. By (6),
expressed as follows:

(7)

is proportional to
, nMOS transistors
Since
are adopted as the input devices to minimize power consumption. From (7), 50% tuning range can be achieved when
.
Based on the approximate impulse-stimulus function (ISF)
and the analysis presented in [8], the phase noise of the oscillator is estimated to be approximately 107 dBc/Hz at 600-kHz
offset. On the other hand, using SpectreRF [9], the phase noise
is simulated to be 111.7 dBc/Hz at 600 kHz.
B. LC Oscillator VCO2
As the far-offset phase noise is dominated by the VCO2, an
LC oscillator is adopted to meet the stringent phase-noise specification. Fig. 6 shows the schematic of the LC oscillator. Crossare used to start and to maintain oscilcoupled transistors
lation with lower parasitics. PN-junction varactors implemented
by p diffusion on the n-well are used for frequency-tuning purpose. The common-mode output voltage is designed at 1.1 V
. To reduce
to enhance the driving of the frequency divider
phase-noise contribution due to flicker noise, pMOS transistors
are used as the current source.
To design an LC oscillator which satisfies the phase-noise requirement with minimum power consumption, inductors with
large inductance and small series resistance are desired. Therefore, two-layer inductors are adopted [10] for which the inductance and the quality factor can be scaled up by 4 and 2 times,
respectively. For the same reason, pn-junction varactors are interdigitized with p islands surrounded by n-well contacts to
enhance the quality factor. Finally, the transconductance
of transistor
is designed so that it does not overcompen-

Fig. 6.

Circuit implementation of the LC oscillator VCO2.

sate the LC tank too much (only twice) to reduce phase-noise


.
contribution by transistors
Based on the method described in [11], the phase noise of
the LC VCO is estimated to be 124.0 dBc/Hz at 600-kHz
frequency offset, which agrees well with the simulation using
SpectreRF.
C. Frequency Dividers
As the divide-by-4 frequency divider
needs to convert
sinusoidal signals from the VCO2 output into square-wave
signals, the first stage of the divider is implemented by
pseudo-nMOS logic while the second divide-by-2 divider is
implemented by the TSPC-logic divide-by-2 divider [12]. The
first divider is shown in Fig. 7 and consists of a pseudo-nMOS
amplifier and a divide-by-2 divider. Since the pseudo-nMOS
logic is a ratioed logic, the ratio between pMOS and nMOS
transistors is designed to be less than 1.6 to make sure output
logic 0 turns off the next stage.
D. Programmable-Frequency Divider
Fig. 8 shows the block diagram of the programmable-fre[13]. At reset state, the prescaler divides
quency divider

208

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

TABLE I
SYSTEM DESIGN OF PROGRAMMABLE-FREQUENCY DIVIDER

Fig. 7. Circuit implementation of the pseudo-nMOS divide-by-2 frequency


divider.

control signal
1, the gated inverter is by-passed,
and the prescaler is a divide-by-12 divider. When control signal
0, the final state
010 will be de0 and the input signal is delayed by
tected, at which
one clock cycle. Thus the function of divide-by-13 is achieved.
The back-carrier-propagation approach allows low-frequency
signals (more significant bits) to switch to the final state much
earlier than high-frequency signals (less significant bits) and
thus reduces power consumption for a given speed.
E. Charge Pumps and Loop Filters

Fig. 8. Block diagram of the programmable-frequency divider

input signal by
, and its output is counted by both and
counters. After the counter has counted pulses, the
counter changes the state of the modulus control line
and
counter counts
the prescaler divides input by . Then the
the remaining cycles to reach overflow. As a whole, the
programmable divider generates one complete cycle for every
input cycles. The operation
repeats after the counter is reset.
As the frequency-division ratio (226 349) can be achieved
with different combinations of , , and , the most optimal
combination in terms of performance needs to be identified and
chosen as the design. As the counter must finish before the
counter resets it, the division ratio of the counter should
be larger than that of the counter. To optimize the power consumption, the operating frequencies and number of bits of both
the and counters should be minimized.
Table I shows the different combinations of , , and
which can implement the desired division ratio. Case 1 requires
the highest operating frequencies and number of bits for and
counters, so it is not adopted. Case 4 has the problem that
the value is larger than the value. It seems that Case 3 is
the best one, but Case 2 is chosen as the final design because it
is much easier to implement an asynchronous divide-by-12 frequency divider than a divide-by-14 divider.
The dual-modulus prescaler is implemented by the back-carrier-propagation approach as shown in Fig. 9 [14]. When

Fig. 10 shows the circuit implementation of the charge


pumps used in the two loops. Each charge pump consists of
two cascode-current sources for both the pull-up and pull-down
currents, four complementary switches, and a unity-gain
amplifier. By using high-swing cascode current sources, the
output impedance is increased for effective current injection.
Minimum-size complementary switches are adopted to minimize clock feedthrough and charge injection of the switches.
The unity-gain amplifier keeps the voltages of nodes VCO and
to be equal so that charge sharing between nodes VCO, ,
can be minimized [15].
and
The design of the loop filters in the two PLLs is a secondorder low-pass filter which is implemented using linear capacitors and silicide-blocked polysilicon resistors. The values of
capacitance, resistance, and charge-pump current are optimally
designed to satisfy simultaneously the phase-noise, spurioustone, and switching-time requirements with minimum chip area
[16]. The loop bandwidth of the low-frequency and high-frequency loops are 40 and 27 kHz, respectively.
F. Phase Noise of the Dual-Loop Frequency Synthesizer
Based on the linearized model shown in Fig. 11, the transfer
to output phase
of both
function from the input phase
the low-frequency and high-frequency loops can be expressed
as follows:

(8)
where
and ,

is the phase-detector gain,


is the VCO gain,
and are the total capacitance, zero time constant,

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

209

Fig. 9. Circuit implementation of the dual-modulus prescaler.

Fig. 10.

Circuit implementation of the charge pump and the loop filter.

and pole time constant of the corresponding loop filters, respecis the phase-detector gain,
is the VCO gain,
tively.
, and are the total capacitance, zero time constant,
and
and pole time constant of the corresponding loop filters, respectively. Since the transfer function is a low-pass function, the
reference phase noise is highly attenuated at high offset frequency. It also shows that the close-in phase noise of the reference signals is amplified by the frequency-division ratio. In
this work, the division ratio is reduced from 4449 to 349, and
the phase-noise contribution from the reference signals is supdB.
pressed by
Another important source of the close-in phase noise is
the charge-pump noise. The transfer functions between the

charge-pump noise current to the output phase noise can be


derived to be

(9)

210

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

Fig. 11. Linearized model of the dual-loop frequency synthesizer.

It shows that small frequency-division ratio and large phasedetector gain or large charge-pump current are preferred for
phase-noise consideration. Another factor not included in (9)
which also affects the charge-pump noise is its turn-on time.
In this proposed synthesizer, the charge-pump turn-on time is
designed to be equal to 1/10 of the input period so that it is
long enough to eliminate the phase-frequency detector (PFD)
dead-zone problem, but at the same time is short enough to minimize the charge-pump phase-noise contribution.
The phase-noise contribution of the loop-filter resistors in
both PLLs can be estimated using their equivalent noise currents as follows:

which are high-pass functions. Therefore, the far-offset phase


noise of the synthesizer is dominated by the VCO phase noise.
Since the loop bandwidths of both PLLs are designed in a range
of tens of kilohertz to achieve spurious-level specification, the
far-offset phase noise of the PLLs only depends on the VCO
phase noise itself.
To evaluate the overall phase-noise performance, the relationship between the phase noise of the low-frequency loop and that
of the synthesizer output can be written as

(12)
(10)
which are bandpass functions with peaks appearing between the
zero and the pole of the loop filter. To suppress the phase-noise
peaking, large loop-filter capacitors are desired at the cost of
large chip area.
For the phase-noise contribution of the VCOs, the transfer
function between the VCO phase noise and output phase noise
can be found to be

dB close-in
which shows that there exists
phase-noise suppression for the low-frequency loop.
The estimated phase noise of the whole synthesizer is
81.4 dBc/Hz at 20.9 kHz and 123.8 dBc/Hz at 600 kHz.
The contribution of each component is shown in Fig. 12, which
shows that the close-in phase noise ( 100 kHz) is dominated
by the charge pump CP1 and loop filter LF1, while the far-offset
phase noise ( 100 kHz) is dominated by the LC oscillator.
V. EXPERIMENTAL RESULTS

(11)

The dual-loop frequency synthesizer is implemented in a


standard 0.5- m CMOS technology. Linear capacitors are put
under all the bias pins to serve as on-chip bypass capacitors.

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

211

B. Measurement of Varactors
The measurement results of the pn-junction varactor at
900 MHz are shown in Fig. 15. As the p diffusion of the
varactors used in the LC oscillator are connected to the output
of the LC oscillator core, they are biased at 1.16 V, which is
the dc bias of the oscillator core during the measurement. The
measured capacitance is close to the estimated results in the
is around 2
reverse-biased region. The series resistance
due to the minimum junction spacing and the nonminimum
junction width. The quality factor is around 30 in the operating
region of the oscillator.
C. Measurement of Ring Oscillator VCO1

Fig. 12. Estimated phase noise of the whole dual-loop frequency synthesizer
and contribution of each components at the synthesizer output.

The phase noise of the oscillators are measured by a directphase-noise measurement [17]. First, the carrier power is determined at large video (VBW) and resolution bandwidths (RBW).
Then, the resolution bandwidth is reduced until the noise edges
and not the envelope of the resolution filter are displayed. Finally, the phase noise is measured at the corresponding frequency offset from the carrier. To make sure that the measured
phase noise is valid, the displayed values must be at least 10 dB
above the intrinsic noise of the analyzer.
Fig. 16 shows the measurement results of the ring oscillator
VCO1. The operating frequency is measured to be between
324.0 and 642.2 MHz, over which the measured phase noise
is between 111 and 108 dBc/Hz at 600 kHz. The power
consumption is around 10 mW.
D. Measurement of LC Oscillator VCO2

Fig. 13.

Die photo of the dual-loop frequency synthesizer.

Fig. 13 shows the die photo of the dual-loop frequency synthesizer, and the active area of the synthesizer is 2.64 mm .
For characterization and measurement of passive devices,
testing structures for spiral inductors and varactors are included
on the same die with the synthesizer and are measured by a
network analyzer. To de-embed the probing-pad parasitics, an
open-pad structure is also measured.
A. Measurement of Inductors
, and
Fig. 14 shows the inductance , series resistance
of the on-chip spiral inductor. The measured
quality factor
inductance is close to simulation results and drops at frequencies close to the self-resonant frequency. However, the series
(30.2 ) is almost three times larger than the exresistance
pected value (11.6 ). The increase in series resistance is mainly
caused by eddy current induced within substrate and n-well fingers [16]. As series resistance increases significantly, the port-1
quality factor is limited to be 1.6 at 900 MHz.

Fig. 17 shows the measurement results of the LC oscillator


VCO2. Due to the quality-factor degradation of the spiral inductor, the bias current of the oscillator is increased by 15%
above its designed value to achieve the phase noise specification ( 121 dBc/Hz at 600 kHz). The measured operating frequency range is between 725.0 and 940.5 MHz. The oscillation
stops when the VCO control voltage is below 0.6 V because the
varactors become forward-biased. Over the desired frequency
range between 865.2 and 889.8 MHz, the achieved phase noise
is below 121 dBc/Hz at 600 kHz.
E. Measured Phase Noise of the Frequency Synthesizer
Fig. 18 shows the phase-noise measurement results of the
dual-loop frequency synthesizer at 889.8 MHz. The measured
phase noise is 121.8 dBc/Hz at 600 kHz which satisfies
the GSM requirement. At offset frequencies between 10 and
100 Hz, the phase noise is mainly contributed by the flicker
noise of the charge pump. However, the peak phase noise of
65.67 dBc/Hz at 15 kHz is measured, which is 15 dB higher
than the estimation presented in Fig. 12.
At offset frequencies above 100 kHz, where the phase
noise should be dominated by VCO2 and should go down
by 20 dB/dec, the measured phase noise goes down at a
rate of 40 dB/dec. It is believed that the increase in the
close-in phase noise is mainly due to the charge pump CP2
and the loop filter LF2, for the following reasons. First,
according to Fig. 12, only CP2 and LF2 have the phase-noise
slope of 40 dB/dec above 100-kHz frequency offset. Second,

212

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

Fig. 14.

Measurement results and equivalent circuit model of the spiral inductors at 900 MHz.

Fig. 15.

Measurement results and bias condition of the pn-junction varactors at 900 MHz.

the measured peak-to-flat close-in phase noise in Fig. 18 is


around 15 dB, which is quite close to that of the estimated
value in Fig. 12. Lastly, it is observed experimentally that the
close-in phase noise is changed as the charge-pump current
of the high-frequency loop is adjusted. Unfortunately, the
phase-noise contribution of CP2 and LF2 cannot be measured
individually.

F. Measured Spurious Tones of the Frequency Synthesizer


Fig. 19 shows the measured spurious level of the dual-loop
frequency synthesizer at 865.2 MHz, which are 79.5 dBc at
1.6 MHz, 82.0 dBc at 11.3 MHz, and 82.83 dBc at 16 MHz.
At 11.3 MHz, the spurious level is only 6 dB above the requirement. However, the predicted spurious level at 1.6 MHz should
be below 90 dBc and the one at 16 MHz should not exist [16].

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

Fig. 16.

213

Measurement results of the ring oscillator VCO1.

Fig. 19.

Measured spurious level of the proposed synthesizer.

G. Switching Time of the Frequency Synthesizer

Fig. 17.

Measurement results of the LC oscillator VCO2.

To determine the worst-case switching time of the frequency


is switched from
synthesizer, the frequency division ratio
226 to 349, and the control voltages of both VCO1 and VCO2
are measured. As shown in Fig. 20, the measurable switching
time of the proposed synthesizer is 830 s for a frequency
error of approximately 10 kHz due to the limited resolution of
our oscilloscope. Since the VCO gain is 160 MHz/V, in order
to achieve a measurement accuracy of a 100-Hz frequency
error, the resolution of the oscilloscope would need to be better
than 100 nV, which unfortunately is not obtainable with our
equipment. On the other hand, the synthesizer suffers from a
slew-rate problem during the channel switching due to a small
charge-pump current (1.6 A) and a large loop-filter capacitor
(1.1 nF) in the high-frequency loop.
H. Performance Evaluation

Fig. 18.

Phase-noise measurement results of the proposed synthesizer.

During the measurement, the 1.6-MHz reference signal is


generated by a 16-MHz crystal oscillator and a decade counter.
Therefore, the 16-MHz spur is caused by the substrate coupling
between the crystal oscillator and the synthesizer. To verify the
reason of the increased spurious level at 1.6 MHz, the low-frequency loop is disabled, and the spurious level is still 75.1 dBc
at 1.6 MHz, which implies that the increase in spurious level at
1.6 MHz is mainly caused by the substrate coupling.

Table II summarizes the measured performance of the proposed frequency synthesizer, and Table III lists the performance
of other fully integrated synthesizers for comparison. The proposed synthesizer operates at a single 2-V supply while all other
designs require supply voltages of at least 2.7 V. Note that since
the designs [1] and [3] operate at higher frequencies, their power
consumption should be scaled down accordingly for a fair comparison. With this frequency normalization, the power consumption of the proposed dual-loop synthesizer is still comparable to
that of the other designs.
The synthesizer presented in [1] is a fractional- design
with a 26.6-MHz comparison frequency and a loop bandwidth
of 45 kHz. As comparison, the low-frequency loop of this
work has a comparison frequency of only 1.6 MHz but a loop
bandwidth of up to 40 kHz because of the relaxed requirement
of the spurious tones. On the other hand, the high-frequency
loop uses a 11.3-MHz comparison frequency but a loop
bandwidth limited to 27 kHz, which is the real limiting factor
of the switching-time performance. Therefore, it is believed
that a switching time close to that in [1] can be achieved by

214

Fig. 20.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

Switching-time measurement results of the proposed synthesizer.


TABLE II
PERFORMANCE SUMMARY OF THE PROPOSED SYNTHESIZER

respectively. For the same reason, the total loop-filter capacitance can be smaller than 60 pF, which greatly reduces chip area.
However, the situation would be much different if channel programmability is included.
Although the proposed synthesizer consists of two loop filters, but the chip area is just a little bit larger than that of the
design in [3] due to the use of linear capacitors and silicideblocked resistors. Compared to the designs in [1] and [3], the
spurious levels are between 75 and 85 dBc, which indicates
that CMOS designs suffer the same problem from the substrate
coupling between the reference signal and VCO. However, for
the close-in phase noise, the proposed dual-loop synthesizer suffers from the 15-dB increase at the peak due to the charge pumps
and loop filters as discussed in Section V-E.
I. Generation of the Second Reference Sources

eliminating the slew-limiting problem in the high-frequency


loop.
The work described in [2] is an integer- bipolar junction
transistor (BJT) design with channel spacing of 600 kHz, and
its comparison frequency and loop bandwidth are limited to be
600 and 4 kHz, respectively. With such a low bandwidth, the
settling time is still less than 600 s. It implies that if a larger
charge-pump current or an active loop filter is adopted in the
high-frequency loop of the dual-loop design, slew limiting can
be suppressed and the switching time performance can be enhanced. Since the density of the BJT transistors is not as good
as the CMOS transistor, the chip area is relatively large even
though it does not include an on-chip loop filter.
The synthesizer in [3] is also an integer- design but without
channel programmability, and as such, its comparison frequency
and loop bandwidth can be as high as 61.5 MHz and 200 kHz,

The main drawback of our dual-loop synthesizer is that it


requires two reference sources. In reality, if a single reference
signal is preferred for the whole frequency synthesizer, the
second reference signal (204.8 MHz instead of 205 MHz) can
be generated from the 1.6-MHz reference signal by a third
PLL with frequency division ratio of 128. Since this third PLL
has a fixed division frequency, its frequency divider can be
implemented simply by cascading seven divide-by-2 dividers.
Without the divide-by-32 divider between the third PLL
output and the high-frequency loop, the close-in phase-noise
requirement of this PLL would be in fact 30 dB more stringent
( 92 dBc/Hz) than that of the low-frequency loop. However,
since the division ratio is only 128, it would offer a relaxation in
requirement. The remaining 21.3 dB phase-noise suppression
could be achieved by increasing the charge-pump current of
the third PLL.

YAN AND LUONG: MONOLITHIC CMOS DUAL-LOOP FREQUENCY SYNTHESIZER

215

TABLE III
PERFORMANCE COMPARISON OF RECENT WORK ON FULLY INTEGRATED FREQUENCY SYNTHESIZERS

In addition to the close-in phase noise, the far-offset phasenoise requirement would also be more stringent by the same
amount. Assuming that VCO1 is adopted in the third PLL, its
phase noise could be improved to be 116 dBc/Hz at 600 kHz
at a 204-MHz operation. With a 30.6-dB filtering effect of the
high-frequency loop, the phase-noise contribution by the VCO
of the third PLL would become 146.6 dBc/Hz at 600 kHz,
which would have negligible effect on the overall phase noise.
Basically, the implementation of the third PLL would be similar to that of the low-frequency loop, and its chip area would be
less than 10% of the total area of the dual-loop synthesizer. Since
the third VCO operates at half the frequency of VCO1, its power
consumption would be 25% of that of VCO1 ( 2.5 mW). Similarly, since the divide-by-128 divider also operates at half of the
frequency, it would consume only half the power as compared
( 0.3 mW). Although the charge-pump curto the divider
rent should be increased by 100 times to 640 A, the average
power current would be only 64 A since the turn-on time is
only around 1/10 of the input period. In conclusion, by introducing the third PLL to generate the second reference signal,
the additional power required would be less than 3 mW, and the
increase in the total chip area would still be less than 10%.
VI. CONCLUSION
A 900-MHz monolithic CMOS dual-loop frequency synthesizer with good phase-noise performance for GSM receivers is
presented. Compared to other fully integrated synthesizer designs, this proposed synthesizer operates at much lower supply
voltage and consumes approximately the same power with
frequency normalization. Implemented in a standard 0.5- m
CMOS technology and at 2-V supply voltage, the synthesizer
has a power consumption of 34 mW. At 900 MHz, the measured
phase noise is 121.8 dBc/Hz at 600-kHz frequency offset,

and the spurious level is 82 dBc at 11.3 MHz. Due to the


substrate coupling and testing setup, additional spurious levels
are measured to be 79.5 dBc at 1.6 MHz and 82.8 dBc
at 16 MHz. The chip area is less than 2.64 mm . Even if a
third PLL is implemented to generate the second reference
frequency, the increase in the total power consumption and the
total chip area would be negligibly small.
REFERENCES
[1] J. Craninckx and M. Steyaert, A fully integrated CMOS DCS-1800
frequency synthesizer, IEEE J. Solid-State Circuits, vol. 33, pp.
20542065, Dec. 1998.
[2] A. Ali and J. L. Tham, A 900-MHz frequency synthesizer with integrated LC voltage-controlled oscillator, Proc. IEEE Int. Solid-Stage
Circuits Conf., vol. 1, pp. 390391, 1996.
[3] J. F. Parker and D. Ray, A 1.6-GHz CMOS PLL with on-chip loop
filter, IEEE J. Solid-State Circuits, vol. 33, pp. 337343, Mar. 1998.
[4] Digital cellular telecommunications system (Phase 2 ); Radio transmission and reception (GSM 5.05), European Telecommunications
Standards Institute, 1996.
[5] D. Craninckx and D. Steyaert, Wireless CMOS Frequency Synthesizer
Design. Norwell, MA: Kluwer, 1998, pp. 201202.
[6] T. Aytur and J. Khoury, Advantages of dual-loop frequency synthesizers for GSM applications, in Proc. IEEE Int. Symp. Circuits and Systems, 1997.
[7] C. H. Park and B. Kim, A low-noise 900-MHz VCO in 0.6-m CMOS,
in Proc. Symp. VLSI Circuits, 1998.
[8] A. Hajimiri, S. Limotyrakis, and T. H. Lee, Phase noise in multi-gigahertz CMOS ring oscillators, in Proc. IEEE 1998 Custom Integrated
Circuit Conf., 1998, pp. 4952.
[9] Oscillator noise analysis in SpectreRF, application note to SpectreRF,
CADENCE, 1998.
[10] R. B. Merrill, T. W. Lee, H. You, R. Rasmussen, and L. A. Moberly, Optimization of high-Q integrated inductors for multilevel metal CMOS,
in Proc. Int. Electronic Device Meeting, 1995, pp. 983986.
[11] A. Hajimiri and T. H. Lee, A general theory of phase noise in electrical
oscillators, IEEE J. Solid-State Circuits, pp. 179194, Feb. 1998.
[12] J. Yuan and C. Svenson, High-speed CMOS circuit technique, IEEE
J. Solid-State Circuits, vol. 24, pp. 6270, Feb. 1989.
[13] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall,
1997.

216

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 2, FEBRUARY 2001

[14] P. Larsson, High-speed architecture for a programmable frequency divider and a dual-modulus prescaler, IEEE J. Solid-State Circuits, vol.
31, pp. 744748, May 1996.
[15] I. A. Young, J. K. Greason, and K. L. Wong, PLL clock generator with
5 to 110 MHz of lock range for microprocessors, IEEE J. Solid-State
Circuits, vol. 27, pp. 15991607, Nov. 1992.
[16] W. S. T. Yan, A 2-V 900-MHz monolithic CMOS dual-loop
frequency synthesizer for GSM receivers, M.Phil. thesis, Hong
Kong University of Science and Technology. [Online.] Available:
http://www.ee.ust.hk/~eetak, 1999.
[17] T. Fredrich, Direct phase noise measurements using a modern spectrum
analyzer, Microwave J., vol. 35, pp. 94114, Aug. 1992.

William S. T. Yan received the Bachelor and


Masters degrees in electrical and electronics
engineering from the Hong Kong University of
Science and Technology, Hong Kong, in 1996 and
1999, respectively.
He is currently with Maxim Integrated Products,
Sunnyvale, CA. His current interests are in the areas
of high-frequency integrated-circuit design.

Howard C. Luong (M91) received the B.S. (high


honors), M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University
of California, Berkeley, in 1988, 1990, and 1994,
respectively. For his Masters thesis, he worked on
MOS analog multipliers with scaling technologies.
For his Ph.D. dissertation, he designed and fabricated a superconductive flash-type analog-to-digital
converter that operated at multi-gigahertz clock and
input frequencies.
Since September 1994, he has been with the electrical and electronics engineering faculty at the Hong Kong University of Science and Technology, where he has been the Faculty-In-Charge of the Analog
Research Lab and the Associate Director of the EEE Undergraduate Program
Committee. His research interests are in high-performance analog and RF integrated circuits for wireless and portable communications.
Dr. Luong has served as an Associate Editor for IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS II. He received the Faculty Teaching Excellence Appreciation Award from the Hong Kong University of Science and Technology
School of Engineering in 1995, 1996, and 2000.

2048

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

A 27-mW CMOS FractionalSynthesizer Using Digital Compensation


for 2.5-Mb/s GFSK Modulation
Michael H. Perrott, Student Member, IEEE, Theodore L. Tewksbury III, Member, IEEE,
and Charles G. Sodini, Fellow, IEEE

Abstract A digital compensation method and key circuits


are presented that allow fractional-N synthesizers to be modulated at data rates greatly exceeding their bandwidth. Using this
technique, a 1.8-GHz transmitter capable of digital frequency
modulation at 2.5 Mb/s can be achieved with only two components: a frequency synthesizer and a digital transmit filter.
A prototype transmitter was constructed to provide proof
of concept of the method; its primary component is a custom
fractional-N synthesizer fabricated in a 0.6-m CMOS process
that consumes 27 mW. Key circuits on the custom IC are an onchip loop filter that requires no tuning or external components, a
digital MASH modulator that achieves low power operation
through pipelining, and an asynchronous, 64-modulus divider
(prescaler). Measurements from the prototype indicate that it
meets performance requirements of the digital enhanced cordless
telecommunications (DECT) standard.

(a)

61

(b)

Index Terms Compensation, continuous phase modulation,


digital radio, frequency modulation, frequency shift keying, frequency synthesizers, phase locked loops, sigmadelta modulation,
transmitters.
(c)

I. INTRODUCTION

HE use of wireless products has been rapidly increasing


the last few years, and there has been worldwide development of new systems to meet the needs of this growing market.
As a result, new radio architectures and circuit techniques are
being actively sought that achieve high levels of integration
and low power operation while still meeting the stringent
performance requirements of todays radio systems. Our focus
is on the transmitter portion of this effort, with the objective of
achieving over 1-Mb/s data rate using frequency modulation.
To achieve the goals of low power and high integration,
it seems appropriate to develop a transmitter architecture
that consists of the minimal topology that accomplishes the
required functionality. All digital, narrowband radio transmitters that are spectrally efficient require two operations to be
performed. The baseband modulation data must be filtered to
limit the extent of its spectrum, and the resulting signal must
Manuscript received July 7, 1997; revised August 4, 1997. This work was
supported by DARPA Contract DAAL-01-95-K-3526.
M. H. Perrott was with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is now
with Hewlett-Packard Laboratories, Palo Alto, CA 94304-1392 USA.
T. L. Tewksbury III was with Analog Devices, Wilmington, MA 01887
USA. He is now with IBM Microelectronics, Waltham, MA 02254 USA.
C. G. Sodini is with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA.
Publisher Item Identifier S 0018-9200(97)08270-X.

Fig. 1. Methods of frequency modulation upconversion: (a) mixer based, (b)


direct modulation of VCO, and (c) indirect modulation of VCO.

be translated to a desired RF band. This paper will focus on


the issue of frequency translation, which can be accomplished
in at least three different ways for frequency modulation. As
illustrated in Fig. 1, the modulation signal can be (a) multiplied
by a local oscillator (LO) frequency using a mixer, (b) fed into
the input of a voltage controlled oscillator (VCO), or (c) fed
into the input of a frequency synthesizer.
Approach (a) can theoretically be accomplished with either
a heterodyne or homodyne approach. The heterodyne approach
offers excellent radio performance but carries a high cost
in implementation due to the current inability to integrate
the high- , low-noise, low-distortion bandpass filters required
at intermediate frequencies (IF) [1]. As a result, the direct
conversion approach has recently grown in popularity [2][4].
In this case, two mixers and baseband A/D converters are
required to form in-phase/quadrature (I/Q) channels and a
frequency synthesizer to obtain an accurate carrier frequency.
Approach (b) is referred to as direct modulation of a VCO
and has appeared in designs for the digital enhanced cordless
telecommunications (DECT) standard [5], [6]. A frequency
synthesizer is used to achieve an accurate frequency setting
and then disconnected so that modulation can be fed into the

00189200/97$10.00 1997 IEEE

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

VCO unperturbed by its dynamics. This technique allows a significant reduction in components; no mixers are required since
the VCO performs the frequency translation, and only one D/A
converter is required to produce the modulation signal. Power
savings are thus achieved, as demonstrated by the fact that
the design in [5] appears to consume nearly half the power
of the mixer based designs in [2][4]. Unfortunately, since
the synthesizer is inactive during modulation, the nominal
frequency setting of the VCO tends to drift as a result of
leakage currents. In addition, undesired perturbations, such
as the turn-on transient of the power amp, can dramatically
shift the output frequency. As stated in [6], the isolation
requirements for this method exclude the possibility of a onechip solution. Therefore, while the approach offers a significant
advantage in terms of power dissipation, the goal of high
integration is lost.
Finally, approach (c) can be viewed as indirect modulation
of the VCO through appropriate control of a frequency synthesizer that sets the VCO frequency and yields the simplest
transmitter solution of those presented. The synthesizer has a
digital input which allows elimination of the D/A converter
that is required when directly modulating the VCO. Since the
synthesizer controls the VCO during modulation, the problem
of frequency drift during modulation is eliminated. Also,
isolation requirements at the VCO input are greatly reduced at
frequencies within the PLL bandwidth. The primary obstacle
faced with this architecture is that a severe constraint is placed
on the maximum achievable data rate due to the reliance on
feedback dynamics to perform modulation.
This paper presents a compensation method and key circuits
that allow modulation of a frequency synthesizer at rates that
are over an order of magnitude faster than its bandwidth.
Application of the technique allows a high data rate ( 1 Mb/s)
transmitter with good spectral efficiency to be realized with
only two components: a frequency synthesizer and a digital
transmit filter. By avoiding additional components such as
mixers and D/A converters in the modulation path, a low
power transmitter architecture is achieved. Since off-chip filters are not required, high integration is accomplished as well.
The technique can be used in transmitter applications where
frequency modulation is desired, and a moderate tolerance is
allowed on the modulation index. (When using compensation,
the accuracy of the modulation index, which is defined as the
ratio of the peak-to-peak frequency deviation of the transmitter
output to its data rate, is limited by variations in the openloop gain of the PLL [7].) To provide proof of concept of the
technique, we present results from a 1.8-GHz prototype that
supports Gaussian frequency shift keying (GFSK) modulation,
the same modulation method used in DECT, at data rates in
excess of 2.5 Mb/s.
We begin by reviewing a fractional- synthesizer method
presented in [8][11] that provides a convenient structure with
which to apply the technique. It is shown that high data rates
and good noise performance are difficult to achieve with this
topology. A method is proposed to overcome these problems,
followed by discussion of issues that ensue from its use. A
description of key circuits in the prototype is then given,
which include an on-chip loop filter, a 64-modulus divider,

2049

N modulator.

Fig. 2. A spectrally efficient, fractional-

and a pipelined, digital modulator. Finally, experimental


results are presented and conclusions made.
II. BACKGROUND
The fractional- approach to frequency synthesis enables
fast dynamics to be achieved within the phase-locked loop
(PLL) by allowing a high reference frequency [8]; a very
useful benefit when attempting to modulate the synthesizer.
High resolution is achieved with this approach by allowing
noninteger divide values to be realized through dithering; it
has been shown that low spurious noise can be obtained by
using a high-order modulator to perform this operation
[8], [10], [11]. This approach leads to a simple synthesizer
structure that is primarily digital in nature, and is referred to
as a fractional- synthesizer with noise shaping.
Using this fractional- approach, it is straightforward to
realize a transmitter that performs phase/frequency modulation in a continuous manner by direct modulation of the
synthesizer. Fig. 2 illustrates a simple transmitter capable of
Gaussian minimum shift keying (GMSK) modulation [9]. The
binary data stream is first convolved with a digital finite
impulse response (FIR) filter that has a Gaussian shape.
(Physical implementation of this filter can be accomplished
with a ROM whose address lines are controlled by consecutive
samples of the data and time information generated by a
counter.) The digital output of this filter is then summed with
a nominal divide value and fed into the input of a digital
converter, the output of which controls the instantaneous
divide value of the PLL. The nominal divide value sets the
carrier frequency, and variation of the divide value causes
the output frequency to be modulated according to the input
data. Assuming that the PLL dynamics have sufficiently high
bandwidth, the characteristics of the modulation waveform
are determined primarily by the digital FIR filter and thus
accurately set.
Fig. 3 depicts a linearized model of the synthesizer dynamics in the frequency domain. The digital transmit filter confines
the modulation data to low frequencies, the modulator
adds quantization noise that is shaped to high frequencies, and
the PLL acts as a low-pass filter that passes the input but
attenuates the quantization noise. In the figure,
is

2050

Fig. 3. Linearized model of fractional-

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

N modulator.

calculated as
(1)
and
are the loop filter transfer funcwhere
tion, the VCO gain (in Hz/V), and the nominal divide value,
respectively. (See [7] for modeling details.) An analogy between the fractional- modulator and a D/A converter
can be made by treating the output frequency of the PLL as
an analog voltage.
A key issue in the system is that the modulator adds
quantization noise at high frequency offsets from the carrier.
In general, the noise requirements for a transmitter are very
strict in this range to avoid interfering with users in adjacent
channels. In the case of the DECT standard, the phase noise
density can be no higher than 131 dBc/Hz at a 5-MHz
offset [6]. Noise at low frequency offsets is less critical for
a transmitter and need only be below the modulation signal
by enough margin to insure an adequate signal-to-noise ratio.
Sufficient reduction of the quantization noise can be
accomplished through proper choice of the sample rate,
which is assumed to be equal to the reference frequency, and
the PLL transfer function,
(Note that this problem is
analogous to that encountered in the design of D/A
converters, except that the noise spectral density at high frequencies, rather than the overall signal-to-noise ratio, is the key
parameter.) One way of achieving a low spectral density for
the noise is to use a high sample rate for the so that the
quantization noise is distributed over a wide frequency range
and its spectral density reduced. Alternatively, the attenuation
offered by
can be increased; this is accomplished by
decreasing its cutoff frequency, , or increasing its order,
Unfortunately, a low value of
carries a penalty of lowering the achievable data rate of the transmitter. This fact can be
observed from Fig. 3; the modulation data must pass through
the dynamics of the PLL so that its bandwidth is restricted by
that of
Given this constraint, the achievement of low
noise must be achieved through proper setting of the
sample rate and PLL order.
It is worthwhile to quantify required values of these parameters for a given data rate and noise specification. To do so,
we first choose
to be a Butterworth response of order

The above expression is chosen for the sake of simplicity in


calculations; other filter responses could certainly be implemented. The spectral density of the noise at the transmitter

Fig. 4. Achievable data rates versus PLL order and


noise from is 136 dBc/Hz at 5 MHz offset.

61 0

61 sample rate when

output due to quantization noise is expressed as


(2)
where is the sample period, and a multistage (MASH)
structure [12] of order is assumed for the modulator. By
choosing the order of the MASH to be the same as
the order of
, the rolloff of (2) and the VCO noise are
matched at high frequencies ( 20 dB/dec). Fig. 4 displays the
resulting parameters at different data rates; these values were
calculated by setting (2) to 136 dBc/Hz at 5 MHz and the
ratio
to 0.7, where
is the data rate. (The noise
specification was chosen to achieve less than 131 dBc/Hz at
5 MHz offset after adding in VCO phase noise.)
The figure reveals that the achievement of high data rates
and low noise must come at the cost of power dissipation
and complexity when attempting direct modulation of the
synthesizer. In particular, the power consumed by the digital
circuitry is increased at a high sample rate by virtue
of the increased clock rate of the modulator and the
digital FIR filter,
The power consumed by the
analog section is increased for high values of PLL order since
additional poles and zeros must be implemented. This issue is
aggravated by the need to set these additional time constants
with high accuracy in order to avoid stability problems in the
PLL. If tuning circuits are used to achieve such accuracy [13],
spurious noise problems can also be an issue.
III. PROPOSED METHOD
The obstacles of high data rate modulation discussed above
are greatly mitigated if the modulation bandwidth is allowed
to exceed that of the PLL. In this case, the bandwidth of
can be set sufficiently low that an excessively high PLL order
or reference frequency is not necessary to achieve the required
noise performance. Fig. 5 illustrates the proposed method that
achieves this goal. By cascading a compensation filter,
,
with the digital FIR filter, the transfer function seen by the
modulation data can be made flat by setting

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

2051

TABLE I
THEORETICALLY ACHIEVABLE DATA RATES
USING COMPENSATION FOR SECOND-ORDER PLL

Fig. 5. Proposed compensation method.

This new filter is simple to implement in a digital mannerby


combining it with the FIR filter, we need only alter the ROM
storage values. In fact, savings in area and power of the ROM
can be achieved over the uncompensated method since the
number of time samples that need to be stored are dramatically
reduced [7].
To illustrate the technique, we consider the case where
is chosen to implement GFSK modulation with
as used in the DECT standard, and
is second
order. Under these assumptions, the time domain version of
is described as samples of
(3)
where is the convolution operator, is the sample
period,
is the period of the data stream, and
equals
for
and zero elsewhere. Since
is the inverse of
, we write
(4)
In the time domain, the digital compensated FIR filter is
then calculated by taking samples of the expression

(5)
For
described in (3), these derivatives are well defined
and can be calculated analytically. A final form is derived by
substituting (3) into (5) to yield

Fig. 6. The effect of mismatch.

signal. If the order of


is increased to
the resulting
signal swings will be amplified according to
The achievable data rates using compensation are limited by
the ability of the PLL to accommodate this increased signal
swing. PLL components that are particularly affected are the
modulator, the divider, and the charge pump. Assuming
an appropriate multibit structure and multimodulus divider topology are used, the bottleneck in dynamic range will
be set by the limited duty cycle range of the charge pump.
Table I displays the achievable data rates at different
sample rates using compensation; the noise specification was
identical to that used to generate Fig. 4. In light of the signal
swing limitation and our goal of simplicity, we have restricted
our attention to second-order PLL dynamics. Calculations were
based on the assumption that the duty cycle of the charge pump
is limited only by its transient response, which was assumed
to be 5 ns. Comparison of this information with Fig. 4 reveals
that compensation allows high data rates to be achieved with
relatively low power and complexity. In the actual prototype,
data rates as high as 2.85 Mb/s are achieved with a secondorder PLL with
kHz, and a sample rate of
20 MHz.
B. Matching Issues

(6)
A. Achievable Data Rates
increases
Equation (6) reveals that the signal swing of
in proportion to
for large values of
Since
is the ratio of the modulation data rate to
the bandwidth of the PLL, we see that high data rates lead
to large signal swings of the modulation signal when using
compensation. Intuitively, this behavior makes sense since the
attenuation of
must be overcome by the compensated

In practice, mismatch will occur between the compensation


filter and PLL dynamics. While the compensation filter is
digital and therefore fixed, the PLL dynamics are analog in
nature and sensitive to process and temperature variations.
Fig. 6 illustrates that a parasitic pole/zero pair occurs when
the bandwidth of the PLL is too high; a similar situation
occurs when its bandwidth is too low. As will be seen in the
results sections, the parasitic pole/zero pair causes intersymbol
interference (ISI) and modulation deviation error. To mitigate
this problem in the prototype, an on-chip loop filter with
accurate time constants was implemented, and open-loop gain

2052

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

Fig. 7. Prototype system.

control was used to accurately place the overall pole and zero
positions of the PLL transfer function.
An additional issue related to be mismatch arises from practical concerns in the PLL implementation. The achievement
of a large dynamic range in the charge pump is aided by
including an integrator in the loop filter (see Section IV-B2),
which yields an overall PLL transfer function as

(7)

, is now added that occurs


A parasitic pole/zero pair, and
well below
in frequency. Unfortunately, taking the inverse
of (7) leads to a compensation filter that is IIR in nature and
cannot be implemented with a ROM. To avoid such difficulties,
we can ignore the parasitic pole/zero pair and use
as
described in (4). The resulting ISI is negligible since
and
are close to each other and low in value. However, the
digital compensation filter must be modified to be samples of
to accommodate the increased gain of
at
frequencies greater than

Fig. 8. Die photo.


TABLE II
POWER DISSIPATION OF IC CIRCUITS

IV. IMPLEMENTATION
To show proof of concept of the proposed compensation
method, the system depicted in Fig. 7 was built using a
custom CMOS fractional- synthesizer that contains several
key circuits. Included are an on-chip, continuous-time filter
that requires no tuning or external components, a digital
MASH modulator with six output bits that achieves
low power operation through pipelining, and a 64-modulus
divider that supports any divide value between 32 and 63.5
in half cycle increments. An external divide-by-two prescaler
is used so that the CMOS divider input operates at half the
VCO frequency, which modifies the range of divide values to
include all integers between 64 and 127.

Fig. 8 displays a die photograph of the custom IC, which


was fabricated in a 0.6- m, double-poly, double-metal, CMOS
V and
process with threshold voltages of
V. The entire die is 3 mm by 3 mm, and its power
dissipation is 27 mW. Table II lists the power consumed by
individual circuits. The power supply values given in Table
II were chosen to be as low as possible to minimize power
dissipation; at the cost of higher power dissipation, all circuits
could be powered by a single 3.3-V supply.

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

2053

Fig. 10. An asynchronous, 64-modulus divider implementation.

Fig. 9. An asynchronous, eight-modulus divider topology.

The 64-modulus divider and six-output-bit modulator


provide a dynamic range for the compensated modulation
data that is wide enough to support data rates in excess of
2.5 Mb/s. The on-chip loop filter allows an accurate PLL
transfer function to be achieved by tuning just one PLL
parameterthe open-loop gain. A brief overview of each of
these components is now presented.
A. Divider
To achieve a low-power design, it is desirable to use an
asynchronous divider structure to minimize the amount of
circuitry operating at high frequencies. As such, a multimodulus divider structure was designed that consists of cascaded
divide-by-2/3 sections [14]; this architecture is an extension of
the common dual-modulus topology [15]. The eight-modulus
example in Fig. 9 shows the proposed structure which allows
a wide range of divide values to be achieved by allowing a
variable number of input cycles to be swallowed per output
cycle. Each divide-by-2/3 stage normally divides its input by
two in frequency, but will swallow an extra cycle per OUT
period when its control input,
, is set to one. As shown
for the case where all control bits are set to one, the number
of IN cycles swallowed per OUT period is binary weighted
according to the stage position. For instance, setting
causes one cycle of IN to be swallowed, while setting
causes four cycles of IN to be swallowed. Proper selection of
allows any integer divide value between 8 and 15
to be achieved.
The 64-modulus divider that was developed for the prototype system uses a similar principle to that discussed above,
but has a modified first stage to achieve high-speed operation.
Specifically, the implemented architecture consists of a highspeed divide-by-4/5/6/7 state machine followed by a cascaded
chain of divide-by-2/3 state machines as illustrated in Fig. 10.
The divide-by-4/5/6/7 stage accomplishes cycle swallowing
by shifting between four phases of a divide-by-two circuit.
Each of the four phases is staggered by one IN cycle, which
allows single cycle pulse swallowing resolution despite the fact
that two cascaded divide-by-two structures are used; details of
this approach are discussed at length in [7]. The important
point to make about the phase shifting approach, which is

Fig. 11. PFD, charge pump, and loop filter.

also advocated in [15], is that it allows a minimal number of


components to operate at high frequenciesthe first two stages
are simply divide-by-two circuits, not state machines. Also, the
fact that control signals are not fed into the first divide-by-two
circuit allows it to be placed off-chip in the prototype.
B. Analog Section
The achievement of accurate PLL dynamics is accomplished
in the prototype system with the variable gain loop filter
topology depicted in Fig. 11. The input to the filter is the
instantaneous phase error between the reference frequency and
divider output and is manifested as the deviation of the phase
frequency detector (PFD) output duty cycle from its nominal
value of 50%. As modulation data is applied, the duty cycle
is swept across a range of values; the shaded region in the
figure corresponds to the deviation that occurs when GFSK
modulation at 2.5 Mb/s is applied. A 50% nominal duty cycle
is desired to avoid the dead-zone of the PFD and thus reduce
distortion of the modulation signal. The prototype used a PFD
design from [16] to achieve this characteristic.
To produce a signal that is a filtered version of the phase
error, the output of the PFD is converted to complementary
current waveforms by a charge pump before being sent into
the inputs of an on-chip loop filter. The conversion to current
allows the filtering operation to be performed without resistors
and also provides a convenient means of performing gain
control of the resulting transfer function. An integrator is
included in the loop filter which forces the average current
from the charge pump to be zero and the nominal duty cycle
to be, ideally, 50% when the PLL is locked.
A PFD design with 50% nominal duty cycle is seldom used
in PLL circuits due to power consumption and spurious noise
issuesthe charge pump is always driving current into the
loop filter under such conditions, and spurs at multiples of
the reference frequency are produced due to the square wave
output of the PFD. Fortunately, these problems are greatly
mitigated in the prototype transmitter since the charge pump

2054

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

(a)

(b)

Fig. 12.

(c)

Loop filter implementation.

Fig. 13. Effect of transient time and mismatch on duty cycle range.

output current is very small (at its largest setting, it toggles


between 3.5 and 3.5 A), and the loop filter bandwidth is
very low (84 kHz) in comparison to the reference frequency
(20 MHz). The resulting spur at the transmitter output is less
than 60 dBc at 20 MHz when measuring the transmitter in an
unmodulated state without an RF bandpass filter at its output.
When modulated, this spur is convolved with the modulation
signal and thus turned into phase noise [7]; it is reasonable to
assume that this noise is reduced to a negligible level when the
RF bandpass filter is included due to its high frequency offset.
1) Loop Filter: The on-chip loop filter uses an opamp to
integrate one of the currents and add it to a first-order filtered
version of the other current. This topology, shown in Fig. 12,
realizes the transfer function

kHz

kHz

(8)

The open loop gain,


, is adjusted by varying the charge
pump output current, The first-order pole is created using
a switched capacitor technique, which reduces its sensitivity
to thermal and process variations and removes any need
for tuning. Note that, although this time constant is formed
through a sampling operation, the output of the switched
capacitor filter is a continuous-time signal. Finally, the value of
the zero,
is determined primarily by the ratio of capacitors
and
under the assumption that the complementary
charge pump currents are matched.
A particular advantage of the filter topology is that the rate
of sampling
and
can be set high since it is independent
and
of the settling dynamics of the opamp. As such,
are set to the PFD output frequency, 20 MHz, to avoid
aliasing problems.
The opamp is realized with a single-ended, two-stage topology chosen for its simplicity and wide output swing. Its unity
gain frequency was designed to be 6 MHz; this value is
sufficiently higher than the bandwidth of the GFSK modulation signal at 2.5 Mb/s to avoid significantly affecting it.
It is recognized that the single-ended structure has higher
sensitivity to substrate noise than a differential counterpart.
However, little would be gained in this case by making it fully

differential since the output of the opamp is connected directly


to the varactor of an LC-based VCO, which is inherently single
ended. Fortunately, measured eye diagrams and spectral plots
presented at the end of this paper conform to calculations
that exclude substrate noise, thereby showing that it has
negligible impact on the modulation and noise performance
of the prototype system. However, as even higher levels of
integration are sought in future radio systems, the impact of
substrate noise will need to be carefully considered.
The limited dynamics of the opamp prevent it from following the fast transitions of its input current waveforms.
To prevent these waveforms from adversely affecting the
performance of the opamp, the voltage swing that appears at its
input terminals is reduced to a low amplitude (less than 40 mV
peak-to-peak) by capacitors
and
In the case of , this
capacitor also serves as part of the switched capacitor filter.
2) Charge Pump: Proper design of the charge pump is
critical for the achievement of high data rates since it forms the
bottleneck in dynamic range that is available to the modulation
signal. Fig. 13 illustrates the fundamental issues that need to be
considered in its design. To avoid distortion of the modulation
signal, the variation in duty cycle should be limited to a range
that allows the output of the charge pump to settle close to
its final value following all positive and negative transitions.
Fig. 13(a) shows the dynamic range available for a welldesigned charge pump; the nominal duty cycle is 50% and the
transition times are fast. Fig. 13(b) demonstrates the reduction
in dynamic range that occurs when the nominal duty cycle is
offset from 50%. This offset is caused by a mismatch between
positive and negative currents produced by the charge pump.
(The type II PLL dynamics force an average current of zero.)
Finally, Fig. 13(c) illustrates a case in which the charge pump
has slow transition times, the result again being a reduction
in dynamic range.
The charge pump topology was designed with the above issues in mind and is illustrated in Fig. 14. The core component
and
that is
of the architecture is a differential pair
and , and from
fed from the top by two current sources,
Ideally,
and
are equal
the bottom by a tail current,
to
where is adjusted by a 5-b D/A that
to and

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

Fig. 14.

Charge pump implementation.

Fig. 15.

A second-order, digital MASH structure.

controls the node


Transistors
and
are switched
which ideally causes
and
on and off according to
to switch between and
To achieve a close match between the positive and negative
currents of each charge pump output, the design strives to
set
and
In the first case,
and
are implemented as cascoded PMOS devices whose
layout is optimized to achieve high levels of device matching.
Unfortunately, device matching cannot be used to achieve a
close match between
and
since they are generated
by different types of devices. To circumvent this obstacle, a
feedback stage is used to adjust
by comparing currents
produced by a replica stage. This technique allows
to
be matched to
to the extent that the replica stage is
matched to the core circuit.
A low transient time in the charge pump response is
obtained by careful design of signal and device characteristics
at the source nodes of
and
First, the parasitic
capacitance at this node is minimized by using appropriate
layout techniques to reduce the source capacitance of
and
, the drain capacitance of
, and the interconnect
capacitance between each of the devices. Second, the voltage
deviation is minimized at this node that occurs when
switches. The level converter depicted in Fig. 14 accomplishes
this task by reducing the voltage variation at nodes
and
to less than 350 mV and setting an appropriate dc bias.

2055

Fig. 16. A pipelined adder topology.

C.

Modulator

Fig. 15 shows the second-order MASH topology used


in the prototype. This structure is well known [12] and has
properties that are well suited to our transmitter application.
The MASH topology is unconditionally stable over its entire
input range and is readily pipelined by using a technique
described in this section.
The spectral density at the output of a second-order MASH
modulator is described by the equation
(9)
In the presence of a sufficiently active input,
can
be considered a white noise source with spectral density

2056

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

Fig. 17.

A pipelined, second-order, digital MASH structure.

Fig. 18.

Pipelined digital data path to divider input.

This assumption is reasonable while the


modulation signal is applied; we have found that setting the
least significant bit (LSB) of the modulator high also helps
to achieve this condition by forcing the internal states of the
MASH structure to constantly change.
A fact that does not appear to have been appreciated in
the literature is that the digital MASH structure is
highly amenable to pipelining. This is a useful technique when
seeking a low power implementation since it allows the supply
voltage to be reduced by virtue of the fact that the required
throughput can be achieved with lower circuit speed.
To pipeline the MASH structure, we apply a well-known
technique that has been used for adders and accumulators [17],
[18]. Fig. 16 illustrates a 3-b example. Since the critical path
in these structures is their carry chain, registers are inserted in
this path. To achieve time alignment between the input and the
delayed carry information, registers are also used to skew the
input bits. As indicated in the figure, we refer to this operation
as pipe shifting the input. The adder output is realigned
in time by performing an align shift of its bits as shown.
(Note that shading is applied to the adder block in Fig. 16 as a
reminder that its bits are skewed in time.) The same pipelining
approach can be applied to digital accumulators since there is
no feedback from higher to lower bits.
Since its basic building blocks are adders and accumulators,
a MASH modulator of any order can be pipelined using
this technique. Using the symbols introduced in the previous
two figures, Fig. 17 depicts a pipelined, second-order MASH
topology. Each first-order is realized as a pipelined
accumulator with feedback removed from the most significant
bits in its output. The output of the second stage is fed into the

filter
, which is implemented with two pipelined adders
and a delay element,
A delay is inserted between these
two adders in order to pipeline their sum path, which requires
a matching delay
in the path above for time alignment.
Also, a delay
must also be included in the output path of
the first stage to compensate for the time delay incurred
through the second stage. Since a signal once placed in the
pipe shifted domain can be sent through any number of
cascaded, pipelined adders and/or integrators, only one pipe
shift and align shift are needed in the entire structure.
Fig. 18 illustrates the implementation of the overall digital
path using pipelining. To save area, the circuits were pipelined
every two bits as opposed to one, and pipe shifting was not
applied to the carrier frequency signal since it is constant
during modulation. To achieve flexibility, the compensated
digital transmit filter was implemented in software, as opposed
to a ROM, and the resulting digital data stream fed into the
custom CMOS IC.
V. MEASURED PERFORMANCE
The primary performance criteria by which a transmitter is
judged are its accuracy in modulation and its noise performance. We now describe the characterization of the prototype
in relation to these issues.
Fig. 19 shows measured eye diagrams from the prototype
using an HP 89441A modulation analyzer. To illustrate the
impact of mismatch between the compensation filter and PLL
dynamics, measurements were taken under three different
values of open-loop gain. These results indicate that the
modulation performance of the transmitter is quite good even
when the open-loop gain is in error by 25%; the effects of

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

(a)
Fig. 19.

(b)

2057

(c)

Measured eye diagrams at 2.5 Mb/s for three different open-loop gain settings: (a)

025% gain error, (b) 0% gain error, and (c) 25% gain error.
VALUES

OF

TABLE III
NOISE SOURCES WITHIN PLL

where
is 30 MHz/V. The value of the kT/C noise current produced by the switched capacitor operation,
is
calculated as
(11)
Fig. 20.

Expanded view of PLL system.

this gain error are to produce a moderate amount of ISI and


an error in the modulation deviation.
An explanation of the observed ISI and deviation error is
given in [7]. In brief, the resulting mismatch creates a parasitic
pole/zero pair that occurs near the cutoff frequency of the PLL
(84 kHz in this case); the resulting transfer function seen by
the data can be viewed as the sum of a low-pass and an all-pass
filter. ISI is introduced as data excites the impulse response
of the low-pass filter, and modulation deviation error occurs
since the magnitude of the all-pass is changed according to
the amount of mismatch present.
Fig. 20 displays the dominant sources of noise in the
prototype; their values are displayed in Table III. Many of
these values were obtained through ac simulation of the
relevant circuits in HSPICE. Note that all noise sources other
than
are assumed to be white, so that the values of their
variance suffice for their description. This assumption is only
approximate for the VCO noise in the prototype, as will be
seen in the measured data.
Based on measurements, the input referred noise of the VCO
was calculated in the table from the expression
dBc/Hz at

MHz
(10)

where is Boltzmanns constant, and


is temperature in
degrees Kelvin.
Assuming that each of the noise sources in Fig. 20 are
independent of each other, we can express the overall phase
noise spectral density at the transmitter output
as
(12)
where
and
are the noise contributions
from the dominant voltage, current, and quantization noise
sources. Based on the values in Table III and the model in
Fig. 20, we obtain

(13)
where

In the case of
the division of
by two is an
approximation based on the fact that the dominant charge
pump noise source is switched in and out at each opamp input
with a nominal duty cycle of 50%. Note that
is given
by (2).
A plot of the spectra in (13) is shown in Fig. 21(a).
Computation of these spectra assumed the parameter values

2058

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

(a)

(b)
Fig. 21. Noise spectra of synthesizer: (a) calculated: (1) charge pump induced, S8
(4) overall, S8 f and (b) measured synthesizer and open-loop VCO noise.

()

listed in Fig. 20 and Table III, and


kHz

described by (7) with


kHz

kHz
As seen in this diagram, the noise from the charge pump
dominates at low frequencies, and the influence of the
quantization noise dominates at high frequencies.
Fig. 21(b) shows measured noise results from the transmitter
prototype taken with an HP 3048A phase noise measurement

()

f ;

(2) VCO and opamp induced,

()

f ;

(3)

61 induced,

()

f ;

system. (The spurious content of the modulator was


reduced to negligible levels by feeding a binary data stream
into the LSB of the modulation path so that the internal states
of the were randomized; the binary data stream was
designed to have relatively flat spectral characteristics and
negligible levels of spurious energy at frequencies greater
than 10 kHz.) The resulting spectrum compares quite well
with the calculated curve in Fig. 21(a), especially at high
frequency offsets close to 5 MHz. At lower frequencies in the
range of 100 kHz, the measured noise is within about 3 dB

PERROTT et al.: CMOS FRACTIONAL-

SYNTHESIZER USING DIGITAL COMPENSATION

of the predicted value; the higher discrepancy in this region


was calculated without
might be attributed to the fact that
considering the offset or transient response of the charge pump
and/or the possible inaccuracy of the HSPICE device models
at low currents. Note that the spur at 20-MHz offset (the
reference frequency), which is due to the 50% nominal duty
cycle of the PFD, is less than 60 dBc.
Fig. 21(b) demonstrates that the unmodulated transmitter
has an output spectrum
of 132 dBc/Hz at 5-MHz
offset from the carrier. At this frequency offset, simulations
reveal that the output spectrum of the modulated transmitter
is equal to
when its data rate is close to the DECT rate
of 1 Mb/s [7]. This being the case, the transmitter satisfies the
DECT noise specification of 131 dBc/Hz at 5-MHz offset;
eye diagrams for data rates close to 1 Mb/s are found in [7].
VI. CONCLUSION
A digital compensation method and key circuits were presented that allow modulation of a frequency synthesizer at
rates over an order of magnitude faster than its bandwidth.
Using this technique, a transmitter prototype was built that
achieves 2.5-Mb/s data rate modulation using GFSK modulation at a carrier frequency of 1.8 GHz. Measured results
indicate that the architecture can achieve the modulation and
noise performance required by the DECT standard with a
structure that is highly integrated and has low power dissipation. In particular, the mostly digital design requires no
off-chip filters, no mixers, and no D/A converters in the
modulation path. Further, the structure contains only the core
components required of a narrowband, spectrally efficient
transmitter: a frequency synthesizer and a digital transmit filter.
ACKNOWLEDGMENT
The authors thank G. Dawe and J. Mourant for guidance
in RF issues, A. Chandrakasan for discussion on low power
methods, R. Weiner for bonding the die, B. Broughton for aid
in phase noise measurements, and M. Trott, P. Ferguson, P.
Katzin, Z. Zvonar, and D. Fague for advice.
REFERENCES
[1] P. Gray and R. Meyer, Future directions in silicon ICs for RF personal
communications, in IEEE Custom IC Conf., 1995, pp. 8390.
[2] T. Stetzler, I. Post, J. Havens, and M. Koyama, A 2.74.5 V single-chip
GSM transceiver RF integrated circuit, in Proc. IEEE Int. Solid-State
Circuits Conf., Feb. 1995, pp. 150151.
[3] J. Min, A. Rofougaran, H. Samueli, and A. A. Abidi, An all-CMOS architecture for a low-power frequency-hopped 900 MHz spread spectrum
transceiver, in IEEE Custom IC Conf., 1994, pp. 16.1/1-4.
[4] S. Sheng, L. Lynn, J. Peroulas, K. Stone, I. ODonnell, and R. Brodersen,
A low-power CMOS chipset for spread-spectrum communications, in
Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1996, pp. 346347.
[5] S. Heinen, S. Beyer, and J. Fenk, A 3.0 V 2 GHz transmitter IC for
digital radio communication with integrated VCOs, in Proc. IEEE Int.
Solid-State Circuits Conf., Feb. 1995, pp. 150151.
[6] S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber,
S. Beyer, J. Fenk, and E. Matschke, A 2.7 V 2.5 GHz bipolar chipset for
digital wireless communication, in Proc. IEEE Int. Solid-State Circuits
Conf., Feb. 1997, pp. 306307.
[7] M. H. Perrott, Techniques for high data rate modulation and low power
operation of fractional-N frequency synthesizers, Ph.D. dissertation,
MIT, 1997.

2059

[8] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, Delta-sigma


modulation in fractional-N frequency synthesis, IEEE J. Solid-State
Circuits, vol. 28, pp. 553559, May 1995.
[9] T. A. Riley and M. A. Copeland, A simplified continuous phase modulator technique, IEEE Trans. Circuits Syst. II, vol. 41, pp. 321328,
May 1994.
[10] B. Miller and B. Conley, A multiple modulator fractional divider, in
Proc. 44th Annual Symp. on Frequency Control, May 1990, pp. 559567.
[11] B. Miller and B. Conley, A multiple modulator fractional divider,
IEEE Trans. Instrum. Meas., vol. 40, pp. 578583, June 1991.
[12] J. Candy and G. Temes, Oversampling Delta-Sigma Data Converters.
New York: IEEE Press, 1992.
[13] Y. Tsividis and J. Voorman, Integrated Continuous-Time Filters. New
York: IEEE Press, 1993.
[14] T. Kamoto, N. Adachi, and K. Yamashita, High-speed multi-modulus
prescaler IC, in 1995 Fourth IEEE Int. Conf. Universal Personal
Communications. Record. Gateway to the 21st Century, 1995, pp. 991,
325-8.
[15] J. Craninckx and M. S. Steyaert, A 1.75-GHz/3-V dual-modulus divideby-128/129 prescaler in 0.7-m CMOS, IEEE J. Solid-State Circuits,
vol. 31, pp. 890897, July 1996.
[16] M. Thamsirianunt and T. A. Kwasniewski, A 1.2 m CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer, in
IEEE Custom IC Conf., 1994, pp. 16.2/1-4.
[17] S.-J. Jou, C.-Y. Chen, E.-C. Yang, and C.-C. Su, A pipelined multiplieraccumulator using a high-speed, low-power static and dynamic full
adder design, IEEE J. Solid-State Circuits, vol. 32, pp. 114118, Jan.
1997.
[18] F. Lu and H. Samueli, A 200-MHz CMOS pipelined multiplieraccumulator using a quasidomino dynamic full-adder cell design, IEEE
J. Solid-State Circuits, vol. 28, pp. 123132, Feb. 1993.

Michael H. Perrott (S97) was born in Austin, TX,


in 1967. He received the B.S.E.E. degree from New
Mexico State University, Las Cruces, in 1988, and
the M.S. and Ph.D. degrees in electrical engineering
and computer science from Massachusetts Institute
of Technology, Cambridge, in 1992 and 1997, respectively.
He currently works at Hewlett-Packard Laboratories, Palo Alto, CA. His interests include signal
processing and circuit design applied to communication systems.

Theodore L. Tewksbury III (S86M87) received


the S.B. degree in architecture in 1983 and the
M.S. and Ph.D. degrees in electrical engineering and
computer science in 1987 and 1992, respectively,
all from the Massachusetts Institute of Technology,
Cambridge. His doctoral dissertation consisted of
an experimental and theoretical investigation of the
effects of oxide traps on the large-signal transient
performance of analog MOS circuits.
He joined Analog Devices, Inc., in 1987 as Design Engineer for the Converter Group, where he
worked on high-speed, high-resolution data acquisition circuits for video,
instrumentation, and medical applications. From 1992 to 1994, as Senior Characterization Engineer, he was involved in the development of high-accuracy
analog models for advanced bipolar, BiCMOS, and CMOS processes, with
emphasis on the statistical modeling of manufacturing variations. In December
1994, he joined the newly formed Communications Division at Analog
Devices as RF Design Engineer. He is presently involved in the design of
RF integrated circuits for wireless communications, including GSM, DECT,
and DBS. He is also actively involved in the development and modeling of
advanced semiconductor technologies for RF applications, including ADRF
(Analog Devices bipolar RF process) and silicon germanium.

2060

Charles G. Sodini (S80M82SM90F95) was


born in Pittsburgh, PA, in 1952. He received the
B.S.E.E. degree from Purdue University, Lafayette,
IN, in 1974 and the M.S.E.E. and the Ph.D. degrees
from the University of California, Berkeley, in 1981
and 1982, respectively.
He was a Member of the Technical Staff at
Hewlett-Packard Laboratories from 1974 to 1982,
where he worked on the design of MOS memory
and later, on the development of MOS devices with
very thin gate dielectrics. He joined the faculty of
the Massachusetts Institute of Technology, Cambridge, in 1983, where he
is currently a Professor in the Department of Electrical Engineering and
Computer Science. His research interests are focused on IC fabrication, device
modeling, and device level circuit design, with emphasis on analog and
memory circuits and systems.
Dr. Sodini held the Analog Devices Career Development Professorship of
Massachusetts Institute of Technologys Department of Electrical Engineering
and Computer Science and was awarded the IBM Faculty Development
Award from 19851987. He has served on a variety of IEEE Conference
Committees, including the International Electron Device Meeting where he
was the 1989 General Chairman. He was the Technical Program Co-Chairman
for the 1992 Symposium on VLSI Circuits and the 1993-1994 Co-Chairman of
the Symposium. He has served on the Electron Device Society Administrative
Committee from 198894 and is currently a member of the Solid-State Circuits
Council.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997

780

IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A CMOS Frequency Synthesizer with


an Injection-Locked Frequency Divider
for a 5-GHz Wireless LAN Receiver
Hamid R. Rategh, Student Member, IEEE, Hirad Samavati, Student Member, IEEE, and Thomas H. Lee, Member, IEEE

AbstractA fully integrated 5-GHz phase-locked loop (PLL)


based frequency synthesizer is designed in a 0 24 m CMOS
technology. The power consumption of the synthesizer is significantly reduced by using a tracking injection-locked frequency
divider (ILFD) as the first frequency divider in the PLL feedback
loop. On-chip spiral inductors with patterned ground shields are
also optimized to reduce the VCO and ILFD power consumption
and to maximize the locking range of the ILFD. The synthesizer
consumes 25 mW of power of which only 3.8 mW is consumed
by the VCO and the ILFD combined. The PLL has a bandwidth
of 280 kHz and a phase noise of 101 dBc/Hz at 1 MHz offset
frequency. The spurious sidebands at the center of adjacent
channels are less than 54 dBc.
Index TermsCMOS RF circuits, frequency synthesizers, injection-locked frequency dividers, wireless LAN.

Fig. 1. (a) U-NII and HIPERLAN frequency bands and (b) channel allocation
in our U-NII band WLAN system.

I. INTRODUCTION

we present our proposed architecture of the frequency synthesizer which takes advantage of an injection-locked frequency
divider (ILFD) to reduce the overall power consumption.
Section IV-A is dedicated to the design of the VCO and
demonstrates how on-chip spiral inductors can be optimized to
reduce the VCO power consumption and to improve the phase
noise performance at the same time. Section IV-B describes
the design issues of ILFDs as well as the optimization of
on-chip spiral inductors for wide-locking-range and low-power
ILFDs. The pulse swallow frequency divider, charge pump,
and loop filter are the subjects of Sections IV-C, IV-D, and
IV-E, respectively. The measurement results are presented in
Section V and conclusions are made in Section VI.

HE DEMAND for wireless local area network (WLAN)


systems which can support data rates in excess of 20 Mb/s
with very low cost and low power consumption is rapidly increasing. The newly released unlicensed national information
infrastructure (U-NII) frequency band in the United States is primarily intended for wideband WLAN and provides 300 MHz of
spectrum at 5 GHz [Fig. 1(a)]. The lower 200 MHz of this band
(5.155.35 GHz) overlaps the European high-performance radio
LAN (HIPERLAN) frequency band. The upper 100 MHz of the
spectrum which overlaps the industrial, scientific, and medical
(ISM) band is not used in our system. To stay compatible with
HIPERLAN the lower 200 MHz of the spectrum is divided into
eight channels which are 23.5 MHz wide [Fig. 1(b)]. The minimum signal level at the receiver is 70 dBm while the maximum strength of the received signal is 20 dBm. The large
dynamic range and wide channel bandwidths set very stringing
requirements for the synthesizer phase noise and spurious sideband levels.
In this paper we describe the design of a fully integrated
integer- frequency synthesizer as a local oscillator (LO) for
a U-NII band WLAN receiver. The front end of the receiver is
described in [9].
Section II describes some of the synthesizer design challenges and reviews previously existing solutions. In Section III
Manuscript received August 2, 1999; revised November 29, 1999. This work
was supported by the Stanford Graduate Fellowship program and IBM Corporation.
The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305 USA (e-mail: hamid@smirc.stanford.edu).
Publisher Item Identifier S 0018-9200(00)02988-7.

II. FREQUENCY SYNTHESIZERS


Frequency synthesizers are an essential part of wireless receivers and often consume a large percentage (2030%) of the
total power (Table I). A typical PLL-based frequency synthesizer comprises both high and low frequency blocks. The high
frequency blocks, mainly the VCO and first stage of the frequency dividers, are the main power consuming blocks, especially in a CMOS implementation. Therefore, BiCMOS technology has often been chosen over CMOS, where the VCO and
the prescaler are designed with bipolar transistors and the low
frequency blocks are CMOS [1]. Off-chip VCOs and dividers
have also been used as an alternative [4]. However, because of
the increased cost neither of these two solutions is suitable for
many applications, and a fully integrated CMOS solution is favorable. A dividerless frequency synthesizer [11] which eliminates powerhungry frequency dividers is one solution for such
low-power and fully integrated systems. In this technique an

00189200/00$10.00 2000 IEEE

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER

781

TABLE I
POWER CONSUMPTION OF FULLY INTEGRATED
WIRELESS RECEIVERS

aperture phase detector is used to compare the phase of the reference signal and the VCO output at every rising edge of the reference signal for only a time window which is a small fraction
of the reference period. Thus no frequency divider is required
in this PLL. The idea of a dividerless frequency synthesizer, although suitable for systems such as a GPS receiver where only
one LO signal is required, is not readily applied to wireless systems which require multiple LO frequencies with a small frequency separation.
Fig. 2. Frequency synthesizer block diagram.

III. PROPOSED SYNTHESIZER ARCHITECTURE


Our proposed architecture (Fig. 2) is an integer- frequency
synthesizer with an initial low power divide-by-two in the PLL
feedback loop. The prescaler follows the fixed frequency divider
and operates at half the output frequency, thus, its power consumption is reduced significantly. Furthermore, the first divider
is an injection-locked frequency divider [6], [7] which takes advantage of the narrowband nature of the system and trades off
bandwidth for power via the use of resonators. To further reduce
the power consumption, optimization techniques are used to design the on-chip spiral inductors of the VCO and ILFD.
Because of the fixed initial divide-by-two in the loop the reference frequency in our system is half of the LO spacing and is
11 MHz. Consequently, the loop bandwidth is reduced to maintain the loop stability. This bandwidth reduction helps to filter
harmonics of the reference signal, mainly the second harmonic,
which generate spurs in the middle of the adjacent channels.
The drawbacks of a reduced loop bandwidth are an increased
settling time and a higher in-band VCO phase noise. The higher
in-band VCO phase noise is not a limiting factor as the in-band
noise is dominated by the upconverted noise of the reference
signal. The slower settling time is only a problem in very fast
frequency-hopped systems.
The synthesized LO frequency in our system is 16/17 of the
received carrier frequency. This choice of LO frequency not
only eases the issue of image rejection in the receiver [9], but
also facilitates the generation of the second LO, which is 1/16
of the first LO, with the same synthesizer.
IV. SYNTHESIZER BUILDING BLOCKS
A. Voltage-Controlled Oscillator
Fig. 3 shows the schematic of the VCO. Two cross-coupled
transistors M1 and M2 generate the negative impedance
required to cancel the losses of the RLC tank. On-chip spiral
inductors with patterned ground shields [15] are used in this
design. The two main requirements for the VCO are low phase
noise and low power consumption. If the inductors were the
main source of noise, maximizing their quality factor would
improve the phase noise significantly. However, in multi-GHz
VCOs with short channel transistors, inductors are not the

main source of noise and a better design strategy is to maximize


the effective parallel impedance of the RLC tank at resonance.
This choice increases the oscillation amplitude for a given
power consumption and hence reduces the phase noise caused
by the noise injection from the active devices. Since inductors
product should
are the main source of loss in the tank, the
be maximized to maximize the effective parallel impedance of
the tank at resonance, where is the inductance and is the
quality factor of the spiral inductors. It is important to realize
alone does not necessarily maximize the
that maximizing
product, and it is the latter that matters here.
To design the spiral inductors, we use the same inductor
model reported in [14]. The inductance is first approximated
with a monomial expression as in [3]. Optimization is used
product. The
next to find the inductor with the maximum
inductors in this design are 2.26 nH each with an estimated
quality factor of 5.8 at 5 GHz. It is worth mentioning that at
5 GHz, the magnetic loss in the highly doped substrate of the
epi process reduces the inductor quality factor significantly.
Approximate calculations show that substrate inductive loss
is proportional to the cube of the inductors outer diameter.
Therefore, a multilayer stacked inductor which has a smaller
area compared to a single-layer inductor with the same inductance may achieve a larger quality factor. We should mention
that in our design, inductors are laid out using only the top-most
metal layer.
The varactors in Fig. 3 are accumulation-mode MOS capacitors [5], [12]. The quality factor of these varactors can be substantially degraded by gate resistance if they are not laid out
properly. In our design each varactor is laid out with 14 fingers
which are 3 m wide and 0.5 m long. The quality factor of this
varactor at 5 GHz is estimated to exceed 60. The losses of the
RLC tank are thus dominated by the inductors, as expected.
B. Injection-Locked Frequency Divider
Fig. 4 shows the schematic of the voltagecontrolled ILFD
used in the frequency synthesizer. The incident signal (the VCO
output) is injected into the gate of M3 and is delivered with
a subunity voltage gain to Vx, the common source connection
of M1 and M2. Transistor M4 is used to provide a symmetric

782

IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 3. Schematic of the VCO.


Fig. 4. Schematic of the differential ILFD.

load for the VCO. The output signal is fed back to the gates of
M1 and M2 and is summed with the incident signal across the
gates and sources of M1 and M2. The nonlinearity of M1 and
M2 generates intermodulation products which allow sustained
oscillation at a fraction of the input frequency [6]. As shown
in [6] in the special case of a divide-by-two and a third-order
ILFD
nonlinearity, the phase-limited locking range of an
can be expressed as

(1)

where
freerunning oscillation frequency;
frequency offset from ;
incident amplitude;
impedance of the RLC tank at resonance;
quality factor of the RLC tank;
second-order coefficient of the nonlinearity.
As (1) suggests, a larger incident amplitude as well as a larger
result in a larger achievable
which we refer to as the
oscillator
, so the largest
locking range. In an
practical inductance should be used to maximize the locking
range.
A larger quadratic nonlinearity ( ) also increases the locking
range. So a circuit architecture with a large second-order nonlinearity is favorable for a divide-by-two ILFD and in fact the
circuit in Fig. 4 has such a characteristic. The common source
connection node of the differential pair moves at twice the frequency of the output signal even in the absence of the incident
signal. So this circuit has a natural tendency for divide-by-two
operation when the incident signal is effectively injected into
node Vx.
To further extend the locking range, the ILFD is designed
such that the resonant frequency of its output tank tracks the
input frequency. Accumulation mode MOS varactors are used to
tune the ILFD and its control voltage is tied to the VCO control
voltage (Fig. 2). The locking range of the ILFD therefore does
not limit the tuning range of the PLL beyond what is determined
by the VCO.

As in the VCO design, on-chip spiral inductors with patterned


ground shields are used in the ILFD, but with a different optimization objective. As mentioned earlier the largest practical
inductance maximizes the locking range. However, reduction of power consumption demands maximization of the
product. The inductor has its largest value when the total capacitance that resonates with it is minimized. To reduce its parasitic bottom plate capacitance the inductor should be laid out
with narrow topmost metal lines. However, the large series resistance of narrow metal strips degrades the inductor quality factor
product significantly. Therefore, both and
and reduces the
product may not be maximized simultaneously for an
the
on-chip spiral inductor resonating with a fixed capacitance. Optimization is thus used to design for the maximum inductance
product is large enough to satisfy the specisuch that the
fied power budget. The inductors resulting from this trade-off
are 9.5 nH each with an estimated quality factor of 4.2 at the
divider output frequency (2.5 GHz).
C. Pulse Swallow Frequency Divider
) consists of a
The pulse swallow frequency divider (
prescaler followed by a program and pulse swallow
counter. Only one CMOS logic ripple counter is used for both
program and pulse swallow counters. The program counter
generates one output pulse for every ten input pulses. The
output of the pulse swallow counter is controlled by three
channel select bits. The overall division ratio is 220227. At
the beginning of the cycle the prescaler divides by 23. As soon
as the first three bits of the ripple counter match the channel
select bits, the prescaler begins to divide by 22. The next cycle
starts after the ripple counter counts to ten.
The prescaler consists of three dual-modulus divide-by-2/3
and one divide-by-2 frequency divider made of source-coupled
logic (SCL) flip-flops and gates (Fig. 5). The modulus control
(MC) input selects between divide-by-22 and divide-by-23. Except for the second dual modulus all other dividers including
the CMOS counters are triggered by the falling edges of their
input clocks, allowing a delay of as much as half the period of
the input of each divider. With this arrangement we guarantee
, and
(Fig. 5) and prevent a race
overlap between
condition.

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER

783

Fig. 6. Simplified schematic of the charge pump and loop filter.


Fig. 5. Block diagram of the prescaler.

D. Charge Pump
Fig. 6 shows the circuit diagram of the charge pump and loop
filter. The charge pump has a differential architecture. However,
, drives the loop filter. To preonly a single output node,
from drifting to the rails when neither of the
vent the node
up and down signals (U and D) is active, the unity gain buffer
shown in Fig. 6 is placed between the two output nodes. This
buffer keeps the two output nodes at the same potential and
thus reduces the charge pump offset. The power of the spurious
sidebands in the synthesized output signal is thereby reduced.
In this charge pump the current sources are always on and the
PMOS and NMOS switches are used to steer the current from
one branch of the charge pump to the other.

Linearized PLL model.

where is the crossover frequency. By differentiating (4) with


it can be shown that the maximum phase margin
respect to
is achieved at
(5)
and the maximum phase margin is

E. Loop Filter
and capacitor
in the loop filter (Fig. 6) genResistor
. Capacitor
erate a pole at the origin and a zero at
and the combination of
and
are used to add extra poles at
frequencies higher than the PLL bandwidth to reduce reference
feedthrough and decrease the spurious sidebands at harmonics
and
,
of the reference frequency. The thermal noise of
although filtered by the loop, directly modulates the VCO control voltage and can cause substantial phase noise in the VCO
if the resistors are not sized properly. The capacitors and resistors of the loop filter should be properly chosen to perform the
required filtering function and maintain the stability of the loop
without introducing too much noise. Fig. 7 shows a linearized
phased-locked loop model. In a third-order loop, the loop filter
, and
and its impedance can be written
contains only
as

Notice that the maximum phase margin is only a function of


(ratio of
and ) and for less than 1 the phase margin is
less than 20 which makes the loop practically unstable.
to
To complete our loop analysis we force
be the crossover frequency of the loop and get

(7)

is the charge pump

(4)

(8)

. The open loop transfer

(3)
is the VCO gain constant and
where
current. The phase margin of the loop is

(6)

Now we can define a loop filter design recipe as follows.


from the VCO simulation.
1) Find
2) Choose a desired phase margin and find from (6).
3) Choose the loop bandwidth and find from (5).
and such that they satisfy (7).
4) Select
. If the calculated
5) Calculate the noise contribution of
noise is negligible the design is complete, otherwise go
back to step four and increase .
The same loop analysis can be repeated for a fourth-order
loop. In this case the phase margin is

(2)
and
where
function of the third-order PLL is

Fig. 7.

784

IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 8. Die micrograph.

where

and
. The crossover frequency for the maximum
phase margin is shown in (9), at the bottom of the page.
to be the crossover frequency it should satisfy
Finally for
(10), shown at the bottom of the page.
As in the third-order loop the maximum phase margin is not a
function of the absolute values of the s and s and is only a
, and
). The loop
function of their ratios (
filter design recipe for the fourth-order loop is modified as follows.
from the VCO simulation.
1) Find
,
2) Choose a desired phase margin and find
from (8) and (9).
and

3) Choose the loop bandwidth and find from (9).


and
such that they satisfy (10).
4) Select
and
.If their
5) Calculate the noise contribution of
noise contribution is negligible the design is complete,
otherwise go back to step four and increase .
Notice that in a fourth-order loop there are two degrees of
and
to achieve a defreedom in choosing
sired phase margin. Therefore, the suppression of the spurious
sidebands can be improved without reducing the phase margin
or the loop bandwidth.
In our system the maximum VCO gain constant is
500 MHz/V. With this VCO gain, and loop filter values
k
pF
pF
of
k
pF, and
A, the crossover frequency
is about 280 kHz with a 46 phase margin. The calculated
contribution to VCO phase noise at 10 MHz offset frequency
is 137 dBc/Hz, which is negligible compared to the intrinsic
noise of the VCO.
V. MEASUREMENT RESULTS
The frequency synthesizer is designed in a 0.24- m CMOS
technology. Fig. 8 shows the die micrograph of the synthesizer
with an area of 1 mm 1.6 mm, including pads.

(9)

(10)

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER

Fig. 9.

785

Fig. 11.

ILFD phase noise measurements.

Fig. 12.

Phase noise of the synthesizer output signal.

VCO tuning range.

Fig. 10. ILFD locking range and power consumption as a function of incident
amplitude.

TABLE II
ILFD PERFORMANCE SUMMARY

The analog blocks (VCO, ILFD, and prescaler) are supplied


by 1.5 V while the digital portions of the synthesizer are supplied by 2 V. The reason for this choice of supplies is to achieve a
larger tuning range for the VCO. The accumulation mode MOS
)
capacitors in this technology have a flatband voltage (
around zero volts. Thus to get the full range of capacitor variation the control voltage should exceed the VCO supply to produce a net negative voltage across the varactors in Fig. 3. To
eliminate a need for multiple supplies the VCO can be biased
with a PMOS current source, and by connecting the sources of
M1 and M2 to ground. More than 500 MHz (10% of the center
frequency) of VCO tuning range is achieved for a 1.5-V control
voltage variation (Fig. 9).
The free-running oscillation frequency of the ILFD changes
of the center frequency) for a 1.5-V
more than 110 MHz (
control voltage variation.
Fig. 10 shows the locking range of the ILFD as a function
of the incident amplitude for two different control voltages. As
expected, changing the control voltage only changes the operation frequencies and not the locking range. The ILFDs average
power consumption is also shown on the same figure. Increasing
the incident amplitude increases the locking range and the average power consumption. The average power at 1-V incident

amplitude is less than 0.8 mW while the locking range exceeds


of the center frequency).
1000 MHz (
The ILFD phase noise measurement results are shown in
Fig. 11. The solid line shows the phase noise of the HP83732B
signal generator used as the incident signal. The dashed line is
the phase noise of the freerunning ILFD. The two other curves
are the phase noise of the ILFD when locked to two different
incident frequencies. The curve marked as middle frequency
is measured when the incident frequency is in the middle of
the locking range and the edge frequency curve is measured at
the lower edge of the locking range. At low offset frequencies
the output of the frequency divider follows the phase noise of
the incident signal and is 6 dB lower due to the divide-by-two
operation. However, at larger offset frequencies the added noise
from the divider itself, the external amplifier, and measurement
tools reduces the 6 dB difference between the incident and

786

IEEE JOURNAL ON SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

TABLE III
MEASURED SYNTHESIZER PERFORMANCE

output phase noise. The ILFD phase noise measurements for


offset frequencies higher than 200 kHz are not accurate due to
the dominance of noise from the external amplifier.
The spurious tones at 11-MHz offset frequency from the
center frequency are more than 45 dB below the carrier. The
spurs at the 22-MHz offset frequency are at 54 dBc. Since
the LO spacing is twice the reference frequency, the spurs at
11-MHz offset frequency fall at the edge of each channel and
are less critical than the 22-MHz spurs which are located at
the center of adjacent channels. With the 54 dBc spurs at
22 MHz offset frequency, an undesired adjacent channel may
be 44 dB stronger than the desired channel for a minimum 10
dB signal-to-interference ratio.
Phase noise measurements of the complete synthesizer output
signal are shown in Fig. 12. The phase noise at small offset frequencies is mainly determined by the phase noise of the reference signal. The phase noise measured at offset frequencies
beyond the PLL bandwidth is the inherent VCO phase noise.
The phase noise at 1-MHz offset frequency is measured to be
101 dBc/Hz. The phase noise at 22 MHz offset frequency is
extrapolated to be 127.5 dBc/Hz. Therefore the signal in the
adjacent channel can be 43 dB stronger than that of the desired
channel for a 10 dB signaltointerference ratio.
VI. CONCLUSION
In this work we demonstrate the design of a fully integrated,
5-GHz CMOS frequency synthesizer designed for a U-NII band
WLAN system. The tracking injection-locked frequency divider
used as the first divider in the PLL feedback loop reduces the
power consumption considerably without limiting the performance of the PLL. Table II summarizes the performance of the
ILFD. The power consumption of two flip-flop based frequency
dividers at 5 GHz are also listed for comparison purposes. In
a 0.24- m CMOS technology a simulated SCL flip-flop based

frequency divider loaded with the same capacitance as in the


ILFD consumes almost an order of magnitude more power than
the ILFD with a 600-MHz locking range. The measurement results of a fast flip-flop based divider in an advanced 0.1- m
CMOS technology show a power consumption of 2.6 mW at
5 GHz [8] which is more than four times the power of the ILFD
with a 600 MHz locking range.
Table III summarizes the performance of the synthesizer. The
spurious sidebands at offset frequencies of twice the reference
signal are more than 54 dB below the carrier. The spurs are
and
signals to
mainly due to charge injection from the
the loop, and can be reduced significantly by using a cascode
structure for transistors M1M4 (Fig. 6). Better matching between the up and down current sources also improves the sideband spurs. Of the 25-mW total power consumption, less than
3.8 mW is consumed by the VCO and ILFD combined. This low
power consumption is achieved by the optimized design of the
spiral inductors in the VCO and ILFD. The prescaler operates
at 2.5 GHz and consumes 19 mW, of which about 40% is consumed in the first 2/3 dual modulus divider. Therefore the ILFD,
which takes advantage of narrowband resonators, consumes an
order of magnitude less power than the first 2/3 dual modulus
divider, while operating at twice the frequency.
ACKNOWLEDGMENT
The authors would like to thank Dr. M. Hershenson, Dr. S.
Mohan, and T. Soorapanth for their valuable technical discussions and help. They also thank National Semiconductor for fabricating the chip.
REFERENCES
[1] T. S. Aytur and B. Razavi, A 2-GHz, 6 mW BiCMOS frequency synthesizer, IEEE J. Solid-State Circuits, vol. 30, pp. 14571462, Dec. 1995.
[2] J. Craninckx and M. Steyaert, A fully integrated CMOS DCS-1800 frequency synthesizer, in ISSCC Dig., 1998, pp. 372373.
[3] M. Hershenson, S. S. Mohan, S. P. Boyd, and T. H. Lee, Optimization
of inductor circuits via geometric programming, in Design Automation
Conf. Dig., June 1999, pp. 994998.
[4] C. G. S. M. H. Perrott and T. L. Tewksbury, A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK
modulation, IEEE J. Solid-State Circuits, vol. 32, pp. 20482059, Dec.
1997.
[5] A. S. Porret, T. Melly, and C. C. Enz, Design of high-Q varactors for
low-power wireless applications using a standard CMOS process, in
Custom Integrated Circuits Conf. Dig., May 1999, pp. 641644.
[6] H. R. Rategh and T. H. Lee, Superharmonic injection-locked frequency
dividers, IEEE J. Solid-State Circuits, vol. 34, pp. 813821, June 1999.
[7] H. R. Rategh, H. Samavati, and T. H. Lee, A 5GHz, 1mW CMOS
voltage controlled differential injection-locked frequency divider, in
Custom Integrated Circuits Conf. Dig., May 1999, pp. 517520.
[8] B. Razavi, K. F. Lee, and R. H. Yan, Design of high-speed, low-power
frequency dividers and phase-locked loops in deep submicron CMOS,
IEEE J. Solid-State Circuits, vol. 30, pp. 101109, Feb. 1995.
[9] H. Samavati, H. R. Rategh, and T. H. Lee, A 5GHz CMOS wireless-LAN receiver front-end, IEEE J. Solid-State Circuits, vol. 35, pp.
xxxxxx, May 2000.
[10] D. Shaeffer, A. Shahani, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, A 115-mW, 0.5-m
CMOS GPS receiver with wide dynamic-range active filters, IEEE J.
Solid-State Circuits, vol. 33, pp. 22192231, Dec. 1998.
[11] A. Shahani, D. Shaeffer, S. Mohan, H. Samavati, H. Rategh, M. Hershenson, M. Xu, C. Yue, D. Eddleman, and T. Lee, Low-power dividerless frequency synthesis using aperture phase detector, IEEE J.
Solid-State Circuits, vol. 33, pp. 22322239, Dec. 1998.

RATEGH et al.: CMOS FREQUENCY SYNTHESIZER

[12] T. Soorapanth, C. P. Yue, D. K. Shaeffer, T. H. Lee, and S. S. Wong,


Analysis and optimization of accumulation-mode varactor for RF ICs,
in Symp. VLSI Circuits Dig., 1998, pp. 3233.
[13] M. Steyaert, M. Borremans, J. Janssens, B. D. Muer, N. Itoh, J.
Craninckx, J. Crols, E. Morifuji, H. S. Momose, and W. Sansen, A
single-chip CMOS transceiver for DCS-1800 wireless communications, in ISSCC Dig., 1998, pp. 4849.
[14] C. P. Yue, C. Ryu, J. Lau, T. H. Lee, and S. S. Wong, A physical model
for planar spiral inductors on silicon, in IEDM Tech. Dig., 1996, pp.
6.5.16.5.4.
[15] C. P. Yue and S. S. Wong, On-chip spiral inductors with patterned
ground shields for Si-Based RF ICs, in Symp. VLSI Circuits Dig., 1997,
pp. 8586.

Hamid R. Rategh (S99) was born in Shiraz, Iran


in 1972. He received the B.S. degree in electrical
engineering from Sharif University of Technology,
Tehran, Iran, in 1994 and the M.S. degree in
biomedical engineering from Case Western Reserve
University, Cleveland, OH, in 1996. He is currently
pursuing the Ph.D. degree in the Department
of Electrical Engineering, Stanford University,
Stanford, CA.
During the summer of 1997, he was with Rockwell
Semiconductor System, Newport Beach, CA, where
he was involved in the design of a CMOS dual-band, GSM/DCS1800, direct
conversion receiver. His current research interests are in low-power radio frequency (RF) integrated circuits design for high-data-rate wireless local area network systems.
Mr. Rategh received the Stanford Graduate Fellowship in 1997. He was
a member of the Iranian team in the 21st International Physics Olympiad,
Groningen, the Netherlands.

787

Hirad Samavati (S99) received the B.S. degree


in electrical engineering from Sharif University of
Technology, Tehran, Iran, in 1994, and the M.S.
degree in electrical engineering from Stanford
University, Stanford, CA, in 1996. He currently is
pursuing the Ph.D. degree at Stanford University.
During the summer of 1996, he was with Maxim
Integrated Products, where he designed building
blocks for a low-power infrared transceiver IC. His
research interests include RF circuits and analog
and mixed-signal VLSI, particularly integrated
transceivers for wireless communications.
Mr. Samavati received a departmental fellowship from Stanford University in
1995 and a fellowship from the IBM Corporation in 1998. He is the winner of
the ISSCC Jack Kilby outstanding student paper award for the paper Fractal
Capacitors in 1998.

Thomas H. Lee (M96) received the S.B., S.M. and


Sc.D. degrees in electrical engineering, all from the
Massachusetts Institute of Technology (MIT), Cambridge, in 1983, 1985, and 1990, respectively.
He joined Analog Devices in 1990, where he
was primarily engaged in the design of high-speed
clock recovery devices. In 1992, he joined Rambus,
Inc., Mountain View, CA, where he developed
high-speed analog circuitry for 500 megabyte/s
CMOS DRAMs. He has also contributed to the
development of PLLs in the StrongARM, Alpha,
and K6/K7 microprocessors. Since 1994, he has been an Assistant Professor
of electrical engineering at Stanford University, where his research focus
has been on gigahertz-speed wireline and wireless integrated circuits built
in conventional silicon technologies, particularly CMOS. He holds 12 U.S.
patents and is the author of a textbook, The Design of CMOS Radio-Frequency
Integrated Circuits (Cambridge, MA: Cambridge Press, 1998), and is a
coauthor of two additional books on RF circuit design. He is also a cofounder
of Matrix Semiconductor.
Dr. Lee has twice received the Best Paper award at the International SolidState Circuits Conference, was coauthor of a Best Student Paper at ISSCC,
and recently won a Packard Foundation Fellowship. He is a Distinguished Lecturer of the IEEE Solid-State Circuits Society, and was recently named a Distinguished Microwave Lecturer.

536

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

A Fully Integrated CMOS Frequency Synthesizer


With Charge-Averaging Charge Pump and Dual-Path
Loop Filter for PCS- and Cellular-CDMA
Wireless Systems
Yido Koo, Hyungki Huh, Yongsik Cho, Jeongwoo Lee, Joonbae Park, Kyeongho Lee, Deog-Kyoon Jeong, and
Wonchan Kim

AbstractA fully integrated CMOS frequency synthesizer


for PCS- and cellular-CDMA systems is integrated in a 0.35- m
CMOS technology. The proposed charge-averaging charge pump
scheme suppresses fractional spurs to the level of noise, and
the improved architecture of the dual-path loop filter makes
it possible to implement a large time constant on a chip. With
current-feedback bias and coarse tuning, a voltage-controlled
oscillator (VCO) enables constant power and low gain of the VCO.
Power dissipation is 60 mW with a 3.0-V supply. The proposed
frequency synthesizer provides 10-kHz channel spacing with
phase noise of 121 dBc/Hz in the PCS band and 127 dBc/Hz
in the cellular band, both at 1-MHz offset frequency.
Index TermsBonding-wire inductor, CMOS RF, coarse tuning,
dual-path loop filter, fractional- -type prescaler, frequency synthesizer, phase noise, phase-locked loop.
Fig. 1.

Dual-band CDMA RF transceiver.

I. INTRODUCTION

IRELESS systems, such as PCS-CDMA, cellular


CDMA, and JSTD-018PCS, require the frequency
synthesizer to have precise channel spacing and low phase
noise to meet the overall noise specification and to prevent unwanted signal mixing of the interferer. Most existing frequency
synthesizers are implemented in silicon germanium (SiGe) or
bipolar technologies, and use several external devices such as
temperature-compensated crystal oscillator (TCXO) and loop
filter. Because of cost and power consumption requirements,
fully integrated CMOS RF building blocks are crucial and have
been widely explored [1], [2].
Fig. 1 shows an example of a dual-band RF transceiver architecture for PCS- and cellular CDMA. A local oscillator (LO)
signal from a dual-band frequency synthesizer is fed to the first
mixer of the receiver for downconversion and it is also used in
the transmitter for upconversion. The noise requirement of the
frequency synthesizer is determined by the blocking profile of
the system, which is calculated from the power of signal and interferer, minimum signal-to-noise ratio (SNR), and bandwidth

Manuscript received July 28, 2001; revised November 9, 2001.


Y. Koo, H. Huh, Y. Cho, D.-K. Jeong, and W. Kim are with the School of
Electrical Engineering and Computer Science, Seoul National University, Seoul
151-742, Korea (e-mail: ydkoo@iclab.snu.ac.kr).
J. Lee, J. Park, and K. Lee are with GCT Semiconductor, Inc., San Jose, CA
95131 USA.
Publisher Item Identifier S 0018-9200(02)03676-4.

specification [3]. The lower the phase noise of the LO signal is,
the less unwanted signal around the carrier is modulated within
the in-band channel.
Table I shows worldwide mobile frequency standards and RF
phase-locked loop (PLL) requirements. CDMA systems require
fast switching time with precise accuracy of the channel frequency. The channel raster is 30 kHz in cellular and 50 kHz
in PCS systems and, to support the dual-band solution of the
CDMA system, the frequency resolution of the synthesizer must
be 10 kHz. This is a major limiting factor in the reduction of
the locking time and root mean square (rms) phase error. It also
makes it difficult to achieve single-chip integration due to the
loop filter that has a large time constant.
In Section II, the special features of the proposed frequency
synthesizer are discussed. Section III describes several building
blocks of the synthesizer. The measurement results are given in
Section IV, and conclusions are presented in Section V.
II. SYSTEM ARCHITECTURE
The proposed PLL is a monolithic integrated circuit that performs dual-band RF synthesis for CDMA wireless communication applications without any external device. Fig. 2 shows the
block diagram of the fractional- -type frequency synthesizer
is mainly
architecture. The external reference frequency
19.68 MHz, and 19.8 and 19.2 MHz are also supported. The
voltage-controlled oscillator (VCO) oscillates at 1.7 GHz for

0018-9200/02$17.00 2002 IEEE

KOO et al.: FULLY INTEGRATED CMOS FREQUENCY SYNTHESIZER

537

TABLE I
WORLDWIDE MOBILE FREQUENCY STANDARDS AND RF-PLL REQUIREMENTS

Fig. 3. Timing diagram of reference and VCO inputs of PFD in locked state
when the fraction is 1/3.

Fig. 2.

N frequency synthesizer architecture.

Fractional-

the PCS band, and at 900 MHz for the cellular band, and employs bonding wire as the inductor of the inductancecapacitance (LC) tank.
is 10 kHz
To meet the frequency resolution of 10 kHz,
and loop bandwidth is only 1 kHz with an integer- -type
prescaler. To reduce rms phase error, it is very important for the
PLL to have a wide bandwidth. The fractional- architecture,
compared with its integer- counterpart, has a wider loop
bandwidth with the same frequency resolution. However, it
suffers from a major drawback called fractional spur.
The fractional- structure is employed for the prescaler in
the proposed architecture. Channel selection and other control
signals are provided through a serial interface. To suppress the
fractional spur, a special type of charge pump is designed. The
new scheme of the dual-path loop filter enables flexible filter
design for on-chip integration. The VCO combined with current-feedback bias and coarse tuning enables constant power
and low gain of the VCO.
III. SYNTHESIZER BUILDING BLOCKS
A. Charge Pump With Charge-Averaging Scheme
While the reference spur occurs in integer- synthesis due
to charge-pump mismatch, fractional- synthesis suffers from
the fractional spur caused by the phase difference in the locked
state as well as charge-pump mismatch. Fig. 3 shows the timing
diagram of the reference and the VCO inputs of the phase/frequency detector (PFD) in the locked state.
The fractional spur mainly stems from the variation of the
prescaler division factor, in other words, the phase difference

between reference and VCO inputs at every cycle of operation


of the PFD. For example, to achieve a locking at 1/3 fractional
frequency, as shown in Fig. 3, the prescaler division factor is
in one cycle and
in the other two cycles in every successive three cycles. It produces voltage ripples in the control
signal of the VCO and therefore a fractional spur occurs. However, the average phase error during one circulation is zero in
the locked state. As seen in Fig. 3, the sum of successive phase
errors during three clock cycles is zero. This is the motivation
of the proposed scheme.
Fig. 4 shows the proposed charge-averaging scheme and its
operation. The charge pump is composed of four current sources
and four sampling capacitors. The same up/dn signals are fed to
each current source, which has 1/3 of the total current. Since
the fractional coefficient of fractional- is three in this design,
we use four pairs of switches and four capacitors in the charge

and capacitor stores


pump. Each pair of switches
and then
the charge that is injected from the pump during
dumps it in the next period in turn. In the locked state, the sum
of the phase errors in the three cycles is zero; therefore, the
after charge summing during
is the same
voltage of
. This results in no voltage ripple in the dump mode. The
as
switching noise due to charge sharing and clock feedthrough
may affect the amount of charge in the capacitors. These could
be static errors. However, they influence each capacitor equally.
This type of static dc error does not cause voltage ripples, i.e., a
fractional spur.
Fig. 5 shows the behavioral simulation results of the conventional fractional- architecture and the proposed charge-averaging scheme. In simulation, the VCO gain is set as perfectly
linear and neither current mismatch nor clock feedthrough in
the switch is assumed. Other characteristics are the same in both
cases, except for the charge pump. In the conventional scheme

538

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

Fig. 6.

Fig. 4.

Mode change of charge-pump operation.

Charge-averaging scheme. (a) Block diagram. (b) Its operation.

Fig. 5. Behavioral simulation of spur in fractional- architecture. (a) With


conventional scheme. (b) With proposed charge-averaging scheme.

shown in Fig. 5(a), approximately 50 dBc of fractional spur is


found, but in the proposed scheme shown in Fig. 5(b), the fractional spur is suppressed to the noise level.
With respect to the locking time, the charge-averaging
scheme can exhibit an undesirable effect. If the size of the
sampling capacitor is very small compared to that of the loop
filter capacitor, the locking time is increased. To solve this
problem, in the acquisition mode, the charge pump operates in
the same way as the normal charge pump, which means that
the charge pump is directly connected to the loop filter. After
locking to the desired frequency, the charge-averaging mode is
employed (Fig. 6). Therefore, there is no additional burden in
locking time.
In the ac analysis of the loop characteristic, time delay degrades the phase margin. As the time delay goes larger, the
system becomes more unstable. The averaging method in this
scheme results in an added time delay in the loop characteristic.

Fig. 7. Loop filter characteristics. (a) Conventional second-order loop filter.


(b) Dual-path loop filter.

Therefore, there exists a tradeoff between loop bandwidth and


loop stability.
B. Dual-Path Loop Filter
Most PLLs use a second-order loop filter to suppress the
control voltage ripple and to guarantee an appropriate phase
margin. Fig. 7(a) shows a conventional second-order loop filter
and the expression for control voltage in ac analysis. As described above, the frequency resolution is 10 kHz and the bandwidth of the PLL is a few kilohertz. This means that more than
1 nF of capacitance should be integrated on a chip, which is

KOO et al.: FULLY INTEGRATED CMOS FREQUENCY SYNTHESIZER

Fig. 8.

539

Proposed architecture of dual-path loop filter.

the major limiting factor for on-chip integration. In addition,


the thermal noise generated in a large resistor is modulated to
phase noise via the control signal. The dual-path loop filter in
Fig. 7(b) can be a solution for this problem. It separates the
and , so it is possible to design the loop filter
loop with
more freely in conjunction with the pump current while keeping
the loop transfer function nearly the same as that of a normal
second-order loop filter.
An example of the dual-path loop filter was proposed by
Craninckx and Steyaert [4]. In spite of many advantages inherent in this architecture, it has two active devices, an amplifier
and a current adder. These inject additional noise into the control voltage, and after modulation in the VCO, the phase noise
may increase. In addition, the floating capacitor across the amplifier is implemented with the metal-to-metal capacitor, so it
requires a large area.
Fig. 8 shows the new dual-path loop filter implementation that
is proposed in this paper. A unity-gain buffer is inserted between
and to separate and . If continuously follows
,
the operation is the same as that of the normal second-order loop
filter. It is less noisy because there is only one active device and
requires a smaller area since there is no floating capacitor.
and
are implemented using two pMOS transistors, whose
source, drain, and bulk are tied to a separate, quiet supply.

Fig. 9. Bonding wire for inductor of VCO. (a) Pad and lead frame. (b)
Modeling.

C. Voltage-Controlled Oscillator
Two major issues in the design of the VCO are low phase
noise to meet the overall noise figure criteria and high gain
linearity for robust stability. Phase noise is mainly dependent
of the LC tank [5]. Although an
on the quality factor
on-chip spiral inductor has recently been widely explored [6], a
bonding-wire inductor is superior to a spiral inductor in terms
of resistance, i.e., quality factor. In addition, the bonding-wire
inductor has constant inductance over a wide frequency range.
Fig. 9 shows bonding-wire inductor modeling. Two pads are
connected to the differential output of the VCO and the ends
of two lead frames are connected as a short or by external
inductance, according to the operating band, PCS or cellular.
factor of the inductance is expressed as
The
with parasitic components ignored. If the parameters of the
nH,
,
pH,
bonding wire are
m , which are typical values in the QFN20 package,
the factor of the inductor at 1.7 GHz is 43.
Fig. 10 shows the circuit diagram of the VCO. The oscillation frequency is controlled by the combination of fixed and

Fig. 10. Circuit diagram of (a) VCO and (b) bias circuit.

variable capacitor. There are two methods that have been previously reported for implementation of the fixed capacitor. One
is a metal-to-metal capacitor [1] and the other is a MOS transistor [7]. The former is used since it is superior in terms of VCO
pushing characteristics. Metal-to-metal capacitors and switches
are used for coarse tuning, which are controlled by the coarse
should be
tuning controller. The size of the switch
sufficiently large in order to avoid degradation of the factor
of the LC tank. As a variable capacitor, an accumulation-mode
MOS transistor is used for fine tuning. The MOS capacitor has
an inherent nonlinear capacitance. But, with the coarse tuning
scheme, the control voltage moves within 0.2 V around half
of the supply voltage, thereby obtaining almost linear gain.

540

Fig. 11.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

Output swing of VCO versus variations of inductance.

Fig. 13. Coarse tuning controller. (a) Block diagram. (b) Timing diagram of
operation.

Fig. 14.

Die microphotograph.

of 40 MHz/V, and 30 MHz/V is typical. The total range is divided into 64 levels by coarse tuning, and the frequency spacing
of two adjacent curves is approximately 7 MHz in the PCS band.
D. Coarse Tuning Architecture
Fig. 12.

VCO tuning range of (a) cellular band, and (b) PCS band.

The output swing level of the VCO is another key issue,


since the receiver and transmitter chips expect an LO signal of
constant power. Generally bonding-wire inductance varies by
10 and to compensate, the total capacitance varies accordingly. This produces the variation of the VCO swing level. The
bias circuit in Fig. 10(b) is designed to have a constant output
swing regardless of capacitance. It monitors the operating status
of the fixed and variable capacitors and provides current in the
direction of compensating for the output swing. Fig. 11 shows
the VCO swing variation for the conventional and proposed
scheme when the inductance varies.
Fig. 12 shows the measured frequency tuning range in the
cellular and PCS band. More than 25% of the tuning range is
obtained in each case. It is sufficient to compensate for the variation of bonding-wire inductance. The VCO gain is a maximum

Small VCO gain is important for reduction of both spur and


phase noise. The frequency spur is directly proportional to the
VCO gain. Also, as the VCO gain is reduced, the fluctuation
of control and supply voltages are less modulated to the phase
noise of the VCO. To meet the requirement of the wide frequency range and the small VCO gain, the coarse tuning controller, shown in Fig. 13, is designed to control the fixed capacitor in the VCO.
The coarse tuning controller, shown in Fig. 13, is composed
of an edge detector, counter/comparator, lock filter, and shift
are
register. During coarse tuning, the rising edges of
and the result is compared
counted in one period of
to a predetermined value of desired frequency. Starting from
the center frequency, a proper level is found by a successive
approach. The lock-detection filter determines when to start the
fine tuning process by monitoring the up/dn signal. The total
elapsed time in coarse tuning is less than 100 s.

KOO et al.: FULLY INTEGRATED CMOS FREQUENCY SYNTHESIZER

541

TABLE II
SUMMARY OF SYNTHESIZER PERFORMANCE

Fig. 15.

Measured carrier spectrum.

at one output of the VCO is 1.05 nH. All control signals are
fed through a serial interface. The VCO at the bottom and the
loop filter on the left have a common analog supply and ground,
and the others are all connected to the digital supply and ground.
The power consumption of the total chip is 60 mW and the VCO
alone dissipates 12.3 mW.
Fig. 15 shows the measured carrier spectrum with the center
frequency of 980 MHz. The output power is 1.2 dBm with an
inductive load, which is sufficient for the output power requirements. Fig. 16 shows the measured phase noise in the cellular
and PCS band. Phase noise is 106 dBc/Hz at 100-kHz offset
and 127 dBc/Hz at 1-MHz offset in the cellular band, and
104 dBc/Hz at 100-kHz offset and 121 dBc/Hz at 1-MHz
offset in the PCS band. Fractional spurs are suppressed to the
phase noise level. Table II shows the performance summary.
V. CONCLUSION
In this paper, we demonstrate a fully integrated CMOS
frequency synthesizer designed for PCS- and cellular-CDMA
wireless systems. A charge-averaging scheme for reducing
fractional spurs and a dual-path loop filter architecture are
proposed. The new bias circuit of the VCO compensates for the
variation of output swing of the VCO caused by the variation
of bonding-wire inductance, and the proposed coarse tuning
technique achieves a small VCO gain and a wide operating
frequency range of the VCO simultaneously. The frequency
synthesizer fabricated in a 0.35- m CMOS technology offers
127-dBc/Hz and 121-dBc/Hz phase noise at 1-MHz offset
with 980 MHz and 1.76 GHz of carrier frequency, respectively.
Fig. 16.

Measured PLL output phase noise. (a) Cellular band. (b) PCS band.

REFERENCES
IV. EXPERIMENTAL RESULTS AND SUMMARY
The proposed frequency synthesizer has been fabricated in
a 0.35- m CMOS technology. Fig. 14 shows the die photograph of the synthesizer with an area of 2.5 mm 2.0 mm including pads. The circuit has been measured with a nominal
3.0-V supply and a 2.7-V worst case. The bonding wire of the
QFN20 package used in the VCO has 1.36 nH of self-inductance and 0.31 nH of mutual inductance, so the total inductance

[1] A. Kral, F. Behbahani, and A. A. Abidi, RF-CMOS oscillators with


switched tuning, in Proc. IEEE Custom Integrated Circuits Conf., May
1998, pp. 555558.
[2] C. Lam and B. Razavi, A 2.6-GHz/5.2-GHz frequency synthesizer in
0.4-m CMOS technology, in Symp. VLSI Circuits Dig. Tech. Papers,
June 1999, pp. 117120.
[3] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: Prentice
Hall, 1998.
[4] J. Craninckx and M. Steyaert, A fully integrated CMOS DCS-1800
frequency synthesizer, IEEE J. Solid-State Circuits, vol. 33, pp.
20542065, Dec. 1998.

542

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 5, MAY 2002

[5] D. B. Leeson, A simple model of feedback oscillator noise spectrum,


Proc. IEEE, vol. 54, pp. 329330, Feb. 1966.
[6] S. Mohan, M. Hershenson, S. Boyd, and T. H. Lee, Simple accurate
expressions for planar spiral inductances, IEEE J. Solid-State Circuits,
vol. 34, pp. 14191424, Oct. 1999.
[7] J.-M. Mourant, J. Imbornone, and T. Tewksbury, A low phase noise
monolithic VCO in SiGe BiCMOS, in IEEE Radio Frequency Integrated Circuits (RFIC) Symp. Dig., June 2000, pp. 6568.

Yido Koo was born in Seoul, Korea, in 1973. He received the B.S. and M.S. degrees from the School
of Electrical Engineering, Seoul National University,
Seoul, Korea, in 1996 and 1998, respectively, where
he is currently working toward the Ph.D. degree.
His research interests include RF building blocks
and systems for wireless communication and highspeed interface for data communications. Currently,
he is developing a low-noise frequency synthesizer
for CDMA and GSM applications.

Hyungki Huh was born in Seoul, Korea. He


received the B.S. and M.S. degree in electrical
engineering from Seoul National University, Seoul,
Korea, in 1998 and 2001, respectively, where he
is currently working toward the Ph.D. degree in
electrical engineering.
His research interests are in the area of RF circuits and systems with emphasis on the fractional frequency synthesizer.

Yongsik Cho was born in Daegu, Korea. He received


the B.S. degree in electrical engineering from Seoul
National University, Seoul, Korea, in 2000, where he
is currently working toward the M.S. degree in electrical engineering.
His research interests are in the area of RF circuits
and systems.

Jeongwoo Lee received the B.S. and M.S. degrees


in electronics engineering and the Ph.D. degree in
electrical engineering from Seoul National University, Seoul, Korea, in 1994, 1996, and 2000, respectively.
He is currently a Manager with the W-CDMA team
of GCT Semiconductor Inc., San Jose, CA. His current research interests include CMOS transceiver circuitry for highly integrated radio applications.

Joonbae Park received the B.S. and M.S. degrees


in electronics engineering and the Ph.D. degree in
electrical engineering from Seoul National University, Seoul, Korea, in 1993, 1995, and 2000, respectively.
In 1998, he joined GCT Semiconductor Inc., San
Jose, CA, as Director of the Analog Division. He is
currently involved in the development of CMOS RF
chip sets for WLL, W-CDMA, and wireless LAN.
His other research interests include data converters
and high-speed communication interfaces.
Dr. Park received the Best Paper Award of VLSI Design99, Goa, India.

Kyeongho Lee was born in Seoul, Korea, in 1969.


He received the B.S. and M.S. degrees in electronics
engineering and the Ph.D. degree in electrical
engineering from Seoul National University, Seoul,
Korea, in 1993, 1995, and 2000, respectively.
He was with Silicon Image, Inc., Sunnyvale, CA,
as a Member of Technical Staff, where he worked on
CMOS high-bandwidth low-EMI transceivers. He is
currently with GCT Semiconductor Inc., San Jose,
CA, as a Co-Chief Executive Officer. His research interests include various CMOS high-speed circuits for
wire/wireless communication systems and integrated CMOS RF systems.

Deog-Kyoon Jeong received the B.S. and M.S. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering
and computer sciences from the University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Texas Instruments
Inc., Dallas, TX, where he was a Member of Technical Staff and worked on the modeling and design of
BiCMOS gates and the single-chip implementation
of the SPARC architecture. He joined the faculty of
the Department of Electronics Engineering and Inter-University Semiconductor
Research Center, Seoul National University, as an Assistant Professor in 1991.
He is currently an Associate Professor of the School of Electrical Engineering,
Seoul National University. His main research interests include high-speed I/O
circuits, VLSI systems design, microprocessor architectures, and memory systems.

Wonchan Kim was born in Seoul, Korea, on


December 11, 1945. He received the B.S. degree in
electronics engineering from Seoul National University, Korea, in 1972. He received the Dip.-Ing. and
Dr.-Ing. degrees in electrical engineering from the
Technische Hochschule Aachen, Aachen, Germany,
in 1976 and 1981, respectively.
In 1972, he was with Fairchild Semiconductor
Korea as a Process Engineer. From 1976 to 1982, he
was with the Institut fr Theoretische Electrotecnik
RWTH Aachen. Since 1982, he has been with the
School of Electrical Engineering, Seoul National University, where he is currently a Professor. His research interests include development of semiconductor
devices and design of analog/digital circuits.

1028

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

61

A Modeling Approach for Fractional-N


Frequency Synthesizers Allowing
Straightforward Noise Analysis
Michael H. Perrott, Mitchell D. Trott, Member, IEEE, and Charles G. Sodini, Fellow, IEEE

AbstractA general model of phase-locked loops (PLLs) is derived which incorporates the influence of divide value variations.
The proposed model allows straightforward noise and dynamic
analyses of fractional- frequency synthesizers and other
PLL applications in which the divide value is varied in time. Based
on the derived model, a general parameterization is presented that
further simplifies noise calculations. The framework is used to analyze the noise performance of a custom synthesizer implemented in a 0.6- m CMOS process, and accurately predicts the
measured phase noise to within 3 dB over the entire frequency
offset range spanning 25 kHz to 10 MHz.

61

61

Index TermsDelta, dithering, divider, fractional- , frequency, modeling, noise, phase-locked loop, PLL, quantization
noise, sigma, synthesizer.
Fig. 1. Block diagram of a

I. INTRODUCTION

HE USE OF wireless products has been rapidly increasing


in the last decade, and there has been worldwide development of new systems to meet the needs of this growing market.
As a result, new radio architectures and circuit techniques are
being actively sought that achieve high levels of integration and
low-power operation while still meeting the stringent performance requirements of todays radio systems. One such technique is the use of modulation to achieve high-resolution
frequency synthesizers that have relatively fast settling times, as
described by Riley et al. in [1], Copeland in [2], and Miller and
Conley in [3], [4]. This method has now been used in a variety
of applications ranging from accurate frequency generation [1],
[5][7] to direct frequency modulation for transmitter applications [8][12].
However, despite its increasing use, a general model of
fractional- synthesizers to encompass dynamic and noise performance has not previously been presented. The primary obstacle to deriving such a model is that, in contrast to classical
phase-locked loop (PLL) systems, a synthesizer dynamically varies the divide value in the PLL according to the output
of a modulator. Traditional methods of PLL analysis assume a static divide value, and the step toward allowing for dynamic variations is not straightforward. As a result, the impact
Manuscript received November 14, 2000; revised March 14, 2002. This work
was supported in part by the Defense Advanced Research Projects Agency under
Contract DAAL-01-95-K-3526.
M. H. Perrott and C. G. Sodini are with the Microsystems Technology Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
(e-mail: perrott@mit.edu).
M. D. Trott is with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA.
Publisher Item Identifier 10.1109/JSSC.2002.800925.

61 frequency synthesizer.

of the divide value variations is often treated in isolation of other


influences on the PLL [1], such as noise in the phase detector
and voltage-controlled oscillator (VCO), and overall analysis of
the synthesizer becomes cumbersome.
In this paper, we develop a simple model for the synthesizer that allows straightforward analysis of its dynamic and
noise performance. The predictions of the model compare extremely well to simulated and experimental results of implemented synthesizers [9], [10], [13]. In addition, we present
a PLL parameterization that simplifies calculation of the PLL
dynamics and assessment of the synthesizer noise performance.
To develop the synthesizer model, we first derive a general model of the PLL that incorporates the influence of divide
value variations. The derivation is done in the time domain and
then converted to a frequency-domain block diagram. We parameterize the resulting PLL model in terms of a single funcand illustrate its usefulness in determining the noise
tion
performance of the PLL. The modulator is then included
in the generalized PLL model and its impact on the PLL is analyzed. Finally, the modeling approach is used to calculate the
noise performance of a custom synthesizer integrated in a
0.6- m CMOS process and then compared to measured results.
II. BACKGROUND
Fig. 1 displays a block diagram of a frequency synthesizer, along with a snapshot of the signals associated with
various nodes in this system. A PLL in essence, the synthesizer achieves accurate setting of its output frequency by locking
to a reference frequency. This locking action is accomplished
through feedback by dividing down the VCO output frequency
and comparing its phase to the phase of the reference source

0018-9200/02$17.00 2002 IEEE

PERROTT et al.: MODELING APPROACH FOR

FRACTIONAL-

FREQUENCY SYNTHESIZERS

1029

to produce an error signal. The phase comparison operation is


done through the use of a phase/frequency detector (PFD) which
also acts as a frequency discriminator when the PLL is out of
lock. The loop filter attenuates high-frequency components in
the PFD output so that a smoothed error signal is sent to the
VCO input. It consists of an active or passive network, and is
typically fed by a charge pump which converts the error signal
to a current waveform. The charge pump is not necessary, but
provides a convenient means of setting the gain of the loop filter
and simplifies implementation of an integrator when required.
As illustrated in the figure, a key characteristic of synthesizers is that the divide value is dynamically changed in time
according to the output of a modulator. By doing so, much
higher frequency resolution can be achieved for a given PLL
bandwidth setting than possible with classical integer- frequency synthesizers [1].
III. TIME-DOMAIN PLL MODEL
We now derive time-domain models for each individual PLL
block shown in Fig. 1. The primary focus of our effort is on
obtaining a divider model incorporating dynamic changes to its
value. However, the derivation of this model requires careful attention to the way we model the PFD. In particular, we will parameterize signals associated with a tristate PFD with sequences
that can be directly related to the divider operation. This approach is extended to an XOR-based PFD by relating its output
to that of a tristate PFD. Following a brief derivation of the VCO
model, we then obtain the divider model by relating its operation to the VCO model and the PFD sequences discussed above.
Finally, the charge pump and loop filter models are described,
and the overall PLL model constructed.
A. Tristate PFD
The tristate PFD and its associated signals are shown in Fig. 2.
, is characterized as a series of
The output of the detector,
pulses whose widths are a function of the relative phase differand
. We paramence between rising edges of
and
with
eterize the phase difference between
and
, respectively.
the discrete-time sequences
is nominally zero, and
is defined in (1). The seare parameterized by the following
ries of pulses that form
discrete-time sequences.
: time instants at which the rising edges of the reference
clock occur.
: time instants at which the rising edges of the

divider output occur.


: time difference between rising edges of
and

.
Assuming a constant reference frequency, consecutive values
for are related for all as

where is the reference period. We will make use of the


parameterization in deriving the PFD model; the other sequences
will be used when deriving the divider model.
Since phase detection is a memoryless operation, its influence
on the PLL dynamics is sufficiently modeled by its gain. How-

Fig. 2. Tristate phase-frequency detector and associated signals.

ever, the pulsed behavior of the PFD output adds some complexity in deriving the value of that gain, so our derivation will
consist of two steps. The first step relates the input phase difsequence. The second step relates the
ference to the
sequence to an impulse approximation of the
waveform.
to the phase difference,
The relationship of
, is defined as
(1)
To verify the above definition, one observes from Fig. 2 that a
to be
.
phase error of causes
sequence on the PLL dynamics is cumThe impact of the
bersome to model analytically since the pulse-width modulated
PFD output has a nonlinear influence on the PLL dynamics.
However, a simple approximation greatly eases our effortswe
simply represent the PFD output as an impulse sequence rather
than a modulated pulse sequence. Fig. 3 illustrates this approxare represented as impulses with area
imation; pulses in
equal to their corresponding pulse, as described by
(2)
We discuss the significance of the above expression when we
derive the frequency-domain model of the PLL in Section IV.
Our justification for the impulse approximation is
heuristiceach PFD output pulse has much smaller width
than the loop filter impulse response, and therefore acts like an
impulse when the two are convolved together. Obviously, the
accuracy of this approximation depends on how much smaller
the PFD output pulse widths are compared to the dominant
time constant of the loop filter. Since the PFD pulses must
be smaller than a reference period, high accuracy is achieved

1030

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

Fig. 3. Impulse sequence approximation of PFD output.

when the reference frequency is much higher than the loop


filter (PLL) bandwidth. Fortunately, this condition is satisfied
when dealing with synthesizers since a high reference
frequency to PLL bandwidth ratio is required to adequately
suppress the quantization noise. For additional discussion
on this issue, see [13].
B.

XOR-Based

PFD

An XOR-based PFD is shown in Fig. 4 [13][15], along with


associated signals that will be discussed later. Assuming the
is
PFD is not performing frequency acquisition, the signal
, so that the detector operates
simply passed to the output,
as an XOR phase detector. As such, the detector outputs an
and
are in quadrature,
average error of zero when
is nominally a two-level square wave rather than the
and
trilevel short-pulse waveform obtained with the tristate design.
The combination of having wide pulses and only two output
levels allows the XOR-based PFD to achieve high linearity,
which is desirable for synthesizer applications to avoid
folding down quantization noise [13].
To model the XOR-based PFD, we simply relate its associated signals to the tristate detector so that the previous results
can be readily applied. Fig. 4 displays the signals associated
can be decomwith this PFD, and reveals that the output
, and a trilevel
posed into the sum of a square wave,
. The first component is independent of
pulse waveform,
the input phase difference to the detector and presents a spurious
noise signal to the PLL; its influence can be made negligible
, captures the
with proper design. The second component,
, on the
impact of the input phase difference,
PFD output, and can be parameterized according to the width of
its pulses, where

As with the tristate detector, the impulse approximation can be


applied to obtain

which, if we ignore
, is the tristate expression multiplied by a factor of 2. Thus, if we ignore the phase offset of
and the square wave
, the XOR-based PFD has an iden-

Fig. 4.

XOR-based

PFD, associated signals, and E (t) decomposition.

tical model to that of the tristate topology except that its gain is
increased by a factor of 2.
C. Voltage-Controlled Oscillator
For our purposes, only two equations are needed to model
the VCO. The first relates deviations in the VCO phase, defined
, to changes in the VCO input voltage,
. Since
as
VCO phase is the integral of VCO frequency, and deviations in
, where
is in units
VCO frequency are calculated as
of hertz per volt, we have
(3)
The second equation relates the absolute VCO phase, defined as
, to deviations in the VCO phase and the nominal VCO
:
frequency
(4)
Our modeling efforts will be primarily focused on deviations
in the VCO phase, so that (3) is of the most interest. However,
(4) is required in the divider derivation that follows.
D. Divider
Modeling of the divider will be accomplished by first re, to the VCO phase deviations,
lating the PFD pulse widths,
, and the divide value sequence,
. Given this relationship, the divider model is backed out using the PFD gain
expression in (1).
We begin by noting that the divider output edges occur
, completes
whenever the absolute VCO phase,
radian increments of phase. As stated in (4),
is
, and phase variations,
composed of a ramp in time,
. These statements are collectively illustrated in Fig. 5.
occur at the rising edges of the
Note that changes in
divider.

PERROTT et al.: MODELING APPROACH FOR

FRACTIONAL-

FREQUENCY SYNTHESIZERS

1031

Fig. 6. Time-domain model of PLL.

Fig. 5.

Relationship of divider edges to instantaneous VCO phase, 8

(t).

Carrying out the summation operation, we obtain

Now, we can relate


to the VCO phase signal and divider
sequence using (4) and Fig. 5. The first of two key equations is
derived from Fig. 5 as
(5)

Assuming initial conditions are zero, this last expression becomes

The second key equation is obtained by evaluating (4) at time


and
and subtracting the resulting
instants
expressions:

(8)

which, since
written as

and

, is equivalently

(6)
We combine the two key equations into one formulation by substitution of (6) into (5):

The final form of the desired equation is obtained by modifying (8) according to the following statements:
,
,
Define
.
.
Approximate
As such, we obtain

(9)
with the
We obtain the desired divider model by replacing
is zero.
PFD gain expression in (1) and assuming
(10)

Rearrangement of this last expression then produces

(7)
Equation (7) is a difference equation relating all variables of
interest; to remove the differences we sum the formulation over
all positive time samples up to sample :

It is important to note that the only approximation made in de. Essentially, we


riving (10) is that
are ignoring the nonuniform time sampling of the VCO phase
deviations. As discussed in [13] and verified by actual implementations [9], [10], this approximation is quite accurate in
practice even when the PLL is modulated.
E. Charge Pump and Loop Filter
The charge pump and loop filter relate the PFD output
to the VCO input
. We model the charge pump as a simple
of value . The time domain model
scaling operation on
.
of the loop filter is characterized by its impulse response,
F. Overall Model
We now combine the results of Section III-AE to obtain
the overall time-domain PLL model shown in Fig. 6. The PFD
model is obtained from (1) and (2), the divider model from
(10), and the VCO model from (3). As discussed earlier, the

1032

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

XOR-based PFD has a factor of two larger gain than the tristate design, which is captured by the factor in the PFD model.
For convenience in analysis to follow, we also define an abstract
, as the output of the divider accumulation action.
signal,
Some observations are in order. First, the divider effectively
samples the continuous-time output phase deviation of the
, and then divides its value by
. The output
VCO,
, is influenced by the integration
phase of the divider,
. The integration of
of deviations in the divider value,
is a consequence of the fact that the divider output is a
causes an incremental change in
phase signal, whereas
the frequency of the divider output. Second, the PFD, charge
pump, and loop filter translate the discrete-time error signal
and
to the continuous-time input of
formed by
. These elements, along with the divider, also
the VCO,
to
.
act as a D/A converter for mapping changes in

IV. FREQUENCY-DOMAIN PLL MODEL


Derivation of a frequency-domain model of the PLL is
complicated by the sampling operation and impulse train
modulator shown in Fig. 6. We discuss a simple approximation
for the sampling operation and impulse train modulator that
results in a linear time-invariant PLL model. This method,
known as pseudocontinuous analysis [16], takes advantage of
the fact that the impulsive output of the PFD is low-pass filtered
in continuous time by the loop filter.

Fig. 7. Pseudocontinuous method of modeling a sampling operation in the


frequency domain.

A. Pseudocontinuous Approximation
that is sampled with period and then
Consider a signal
, as described by
converted to an impulse sequence

Fig. 8.

where
. The frequency-domain relationship
and
is found by taking the Fourier transform
between
of the above expression, which leads to

This expression reveals that the Fourier transform of


,
, is composed of multiple copies of the Fourier transform
,
, that are scaled in magnitude by
and shifted
of
. We assume
in frequency from one another with spacing
is confined to frequencies
that the frequency content of
and
, so that negligible aliasing
between
within
.
occurs between the copies of
to
Developing a frequency-domain model relating
is complicated by the many copies of
in
that occur due to the sampling operation. However, if we
is fed into a continuous-time low-pass filter
assume that
with sufficiently low bandwidth, we can obtain a simple
and
.
approximation of the relationship between
Fig. 7 graphically illustrates a frequency-domain view of the
sampling operation and the impact of following it with a
.
continuous-time low-pass filter of bandwidth less than
The low-pass filter significantly attenuates all of the replicated

Frequency-domain model of PLL.

copies of
within
except for the baseband copy,
which allows us to approximate the relationship between
and
in the frequency domain as a simple scaling operation
. In so doing, we ignore aliasing effects that will occur
of
at frequencies beyond
if there is frequency content in
to
. However, our analysis will
the range of
be reasonably accurate when performing closed-loop analysis
for most frequencies of interest in our application. The double
outline of the box in the figure is meant to serve as a reminder
that a sampling operation is taking place.
B. Resulting Model
The time-domain block diagram in Fig. 6 is now readily
converted to the frequency domain by taking the -transform
of the discrete-time blocks, the Fourier transform of the
continuous-time blocks, and by applying the approximation
of the sampling operation discussed above. Fig. 8 displays
the resulting model. Note that all blocks are parameterized
by the common variable , which denotes frequency in hertz,
under the assumption that all discrete-time sequences interact
with the continuous-time blocks as modulated impulse trains
of period . Also note that all the signals in the PLL are
still denoted in the time domain even though they interact

PERROTT et al.: MODELING APPROACH FOR

FRACTIONAL-

FREQUENCY SYNTHESIZERS

1033

Fig. 9. Detailed view of PLL noise sources and examples of their respective
spectral densities.

through frequency-domain blocks. The reason for this notation


convention is that, in practice, these signals are stochastic and
do not have defined Fourier transforms, but rather are described
by their power spectral densities.
Fig. 10. Parameterized model of PLL for dynamic response and noise
calculations.

V. PARAMETERIZATION OF PLL
We now parameterize the PLL dynamics depicted in Fig. 8 in
. Using this
terms of a single function which we will call
parameterization, we then develop a general noise model for frequency synthesizers in which all the relevant transfer functions
.
are described in terms of
A. Derivation
To parameterize the PLL dynamics, it is convenient to define
a base function that provides a simple description of all the PLL
transfer functions of interest. It turns out that the following definition works well for this purpose.
(11)
where

is the open-loop transfer function of the PLL:


(12)

is low pass in nature with infinite gain at dc,


Since
has the following properties:
as
as

Divider/reference jitter,
, corresponds to noise-induced
variations in the transition times of the Reference or Divider
is caused
output waveforms. A periodic reference spur
by use of the XOR-based PFD, or by the tristate PFD when its
output duty cycle is nonzero. Charge-pump noise is caused by
noise produced in the transistors that compose the charge-pump
circuit. Finally, VCO noise includes the intrinsic noise of the
VCO and voltage noise at the output of the loop filter. For convenience in later discussion, we have lumped these noise sources
into two categories, VCO noise and detector noise, as shown in
Fig. 9.
Fig. 10 displays the transfer function relationships from each
of the above noise sources to the synthesizer output. The derivation of these transfer functions is straightforward based on Fig. 9
parameterization derived earlier. Note that two
and the
different parameterizations are shown to describe the impact of
divide value variations on the PLL output phase. The alternate
, more directly to
model relates changes in the divide value,
the PLL output frequency. Its derivation follows by noting that
the order of linear time-invariant blocks can be switched, and
that

(13)

is a low-pass filter with a low frequency gain


implying that
of one.
in terms of
One may try to tie an intrinsic meaning to
PLL behavior. However, it is meant only as a convenient vehicle
for compactly describing the PLL transfer functions of interest,
as will be shown later in this section.
B. Application to Noise Analysis
The derived parameterization allows straightforward calculation of the noise performance of a synthesizer as a function of
various noise sources in the PLL, which are shown in Fig. 9.

for
Note that the validity of the dynamic model, and its alternate,
presented in Fig. 10, has been verified in previous work discussed in [9], [13]. The validity of the noise model will be verified in Section VII.
Calculation of spectral noise densities using Fig. 10 is
complicated by the fact that both discrete-time (DT) and

1034

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

continuous-time (CT) signals are present. Three cases are of


significance, and their respective spectral noise calculations are
as follows [17]:
fed into CT filter
to produce a
Case 1) CT input
:
CT output
(14)
fed into DT filter
Case 2) DT input
:
duce a DT output

to pro(15)

fed into CT filter


:

Case 3) DT input
CT output

to produce a

(16)
In Case (3), we assume that the DT input interacts with the CT
filter as a modulated impulse train of period .
The above spectral density calculations and Fig. 10 allow us
to accurately calculate the influence of the various noise sources
on the PLL output. A few qualitative observations are also in
order. Detector noise is low-pass filtered by the PLL dynamics,
while VCO noise is high-pass filtered by the PLL dynamics. The
overall noise power in the PLL output, whose integral over frequency corresponds to the time-domain jitter of the PLL output,
is a function of the PLL bandwidth. If the PLL bandwidth is very
low, VCO noise will dominate over a wide frequency range due
to the abundant suppression of detector noise. Likewise, a high
PLL bandwidth will suppress VCO noise over a wide frequency
range at the expense of allowing more detector noise through.
VI.

SYNTHESIZER MODEL

We are now ready to incorporate the modulator into the


general PLL model. We do so by first providing a brief description of modulator fundamentals, and then provide intuition to the means by which they increase the frequency resolution of a synthesizer compared to a classical implementation in
which the divider value is held constant. Finally, we present a
frequency-domain model of the synthesizer and use it to
calculate the impact of the quantization noise on the PLL
output phase.
A.

Modulator

A modulator achieves a high-resolution signal using


only a few output levels. To do this, the modulator dithers
its output at a high rate such that the average value of the
dithered sequence corresponds to a high-resolution input signal
whose energy is confined to low frequencies. Appropriate
filtering of the output sequence removes quantization noise
produced by the dithering, which yields a high-resolution
signal closely matching that of the input.
In synthesizer applications, it is important to note that
the modulator is purely digital in its implementation. Thus,
structures that are difficult to implement in the analog
world due to high matching requirements, such as the MASH
(or cascaded) architecture [18], [19], are trivial to implement in

Fig. 11.

Illustration of dithering action of

61 modulator.

this application due to the precise matching offered by digital


circuits.
In general, modeling of a modulator is accomplished by
assuming its quantization noise is independent of its input [19].
This leads to a linear time-invariant model that is parameterized
by transfer functions from the input and quantization noise to the
output. For instance, a MASH modulator structure [19] of
, and output
is described by
order , input
(17)
Thus, the modulator passes its input to the output along with
, that is shaped by the filter
.
quantization noise,
is white and uniformly distributed between 0 and 1
Ideally,
[20], [21].
so that its spectrum is flat and of magnitude
It is convenient to parameterize the modulator in terms
of two transfer functions. The signal transfer function (STF) of
to output
,
the modulator is defined from the input
while the noise transfer function (NTF) is defined from the base
to the output. Inspection of (17) reveals
quantization noise
that a MASH structure of order is parameterized as
STF:
NTF:
B. Application to PLL
To understand the impact of using a modulator to control the divide value in a frequency synthesizer, Fig. 11 contrasts
the way the divide value is varied in classical versus fractional- frequency synthesizers based on the alternate model
in Fig. 10. Note that the divide value variations are cast as continuous-time signals to get the proper scale factor such that a
unit change in divide value yields an output frequency change
Hz. In the classical case, the divide value is static except
of
when the output frequency is changed, and the PLL output frequency responds to the change according to the low-pass nature
. In contrast, a fractionalof the PLL dynamics

PERROTT et al.: MODELING APPROACH FOR

Fig. 12.

Parameterized model of a

FRACTIONAL-

FREQUENCY SYNTHESIZERS

1035

61 synthesizer.

synthesizer constantly dithers the divide value at a high rate


such that
extracts
compared to the bandwidth of
out its low-frequency content. The low frequency content of the
output is, in turn, set by the input
, which can
have arbitrarily high resolution. Thus, the modulator allows the PLL output frequency to be controlled to a very high
resolution independent of the reference frequencya high reference frequency can be used while simultaneously achieving
high-frequency resolution.
C. Frequency-Domain Model
To obtain the frequency-domain model of a synthesizer,
we simply extend the PLL model in Fig. 10 to include the
modulator, as shown in Fig. 12. This figure depicts a general
model of a modulator which is characterized by its STF
is assumed ideal (i.e.,
and NTF. The base quantization noise
white) in the illustration.
Fig. 12 offers several insights to the fundamentals of
frequency synthesis. First, we see that the shaped quantization noise passes through a digital accumulator and then the
, before impacting the output phase of the
PLL dynamics,
PLL. The digital accumulator, a consequence of the integrating
nature of the divider, effectively reduces the noise-shaping order
, act to remove the
of the by one. The PLL dynamics,
high-frequency quantization noise produced by the modulator. The quantization noise adds an additional noise
source to those already present in the PLL, but the relationship
from each noise source to the output phase remains purely a
and the nominal divide value.
function of
D. Quantization Noise Impact on PLL
As Fig. 12 reveals, a synthesizers noise performance
is impacted by the quantization noise in addition to the
intrinsic detector and VCO noise sources found in the classical
PLL. Calculation of this impact is straightforward using the presented modeling approach. For example, given the NTF of an
th order MASH structure is
, we calculate the impact of its quantization noise on the PLL output using Fig. 12
and (16) as

Fig. 13.

Block diagram of prototype system.

which is also expressed as

(18)
If the quantization noise spectra of

is white, then

as previously discussed. In many cases,


is not white and
must be computed numerically by simulating the modu.
lator at a given value of
Equation (18) shows that the quantization noise is
reduced in order by one due to the integrating action of the
is white, the shaped noise rises at
divider. Assuming
dB/decade for frequencies
. Therefore, if
is chosen to be the same as the order of the
the order of
, the quantization noise seen at the PLL output will roll
off at 20 dB/decade outside the PLL bandwidth. This rolloff
characteristic matches that of the VCO noise.
VII. RESULTS
The above methodology is now used to analyze the noise
performance of a prototype system described in [9], [13].
Fig. 13 displays a block diagram of the prototype, which
consists of a custom CMOS fractional- synthesizer IC that
includes an XOR-based PFD, an on-chip loop filter that uses
switched capacitors to set its time constant, a second-order digital MASH modulator, and an asynchronous 64-modulus
divider that supports any divide value between 32 and 63.5
in half-cycle increments. An external divide-by-2 prescaler is
used so that the CMOS divider input operates at half the VCO
frequency, which modifies the range of divide values to include
all integers between 64 and 127. A computer interface is used
to set the digital frequency value that is fed into the input of the
modulator.
A. Modeling
A linearized frequency-domain model of the prototype
system is shown in Fig. 14. The open-loop transfer function of
and a zero
the system consists of two integrators, a pole at
at . Additional poles and zeros occur in the system due to
the effects of finite opamp bandwidth and other nonidealities,

1036

Fig. 14.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

Linearized frequency-domain model of prototype system.

but are not significant for the analysis to follow. The


parameterization is calculated from Fig. 14 and (11) as

(19)
Fig. 15.

The parameters of the system were set such that the PLL had a
bandwidth of 84 kHz:

Expanded view of PLL System.


TABLE I
VALUES OF NOISE SOURCES WITHIN PLL

kHz
kHz
kHz
(20)
Fig. 15 expands the block diagram of the prototype to indicate the circuits of relevance and their respective noise contributions. A few comments are in order. First, a reference frequency
of 20 MHz was chosen to achieve an acceptably low impact
of quantization noise while still allowing low-power implementation of the digital logic. This choice of reference freto achieve an output
quency, in turn, required that
was set to
carrier frequency of 1.84 GHz. The value of
was chosen
30 MHz/V by the external VCO. The value of
as large as practical in order to obtain good noise performance;
it was constrained to 30 pF due to area constraints on the die of
the custom IC.
B. Noise Analysis
Table I displays the value of each noise source shown
in Fig. 15. Many of these values were obtained through ac
simulation of the relevant circuits in HSPICE. Note that all
are assumed to be white, so that
noise sources other than
the values of their variance suffice for their description. This
,
assumption holds for the input-referred VCO noise,
provided that the output phase noise of the VCO rolls off at
20 dB/dec [22], [23]; the 20 dB/dec rolloff is achieved in
, which has a flat spectral density,
the model since
passes through the integrating action of the VCO. The actual
VCO deviates from the 20 dB/dec rolloff at low frequencies
noise, and at high frequencies due to a finite noise
due to
floor. However, the assumption of 20 dB/dec rolloff suffices
for the frequency offsets of interest.

The input-referred noise of the VCO was calculated from an


open-loop VCO phase noise measurement (shown in Fig. 17) at
5-MHz frequency offset as
dBc/Hz
at

MHz

is 30 MHz/V. The value of the


where
produced by the switched-capacitor operation
lated as

(21)
noise current
was calcu-

(22)
is Boltzmanns constant, and
is temperature
where
in degrees Kelvin. Finally, the spectral density of the
quantization noise was calculated as
(23)
is the order of the modulator.
where
The noise sources in Table I can be classified as either
charge-pump noise, VCO noise, or quantization noise,
,
, and
, respectively. For
which we denote as
is referred to the
convenience, we will assume that
input of the VCO, so that it passes through the transfer function
before influencing the VCO output phase. Given the

PERROTT et al.: MODELING APPROACH FOR

FRACTIONAL-

FREQUENCY SYNTHESIZERS

1037

values of these sources, the overall noise spectral density at the


is described as
synthesizer output
(24)
,
, and
are the contributions
where
,
, and
, respectively.
is given by
from
.
and
are calculated from
(18) with
Fig. 10 and (14) as

(25)
and
are white, and
Note that we have assumed that
since an XOR-based PFD is used.
that
and
The task that remains is to determine the values of
. Examination of Fig. 15 reveals that charge-pump noise
is a function of the following noise sources:

Fig. 16.

Calculated noise spectra of synthesizer compared to measured results.

(26)
while VCO noise is a function of the noise sources
(27)
and
We will quickly infer the value of the functions
in this paper; the reader is referred to [13] for more detail.
. Examination of Table I reveals
Let us first determine
is an order of magnitude larger than
,
, and
that
. Since the
noise source is switched alternately between
the positive and negative terminals of OP1, its contribution to
will be pulsed in nature. At a nominal duty cycle of 50%,
to be split equally between
we would expect the energy of
is then
the positive and negative terminals of OP1. As such,
. This intuitive argument was verified using a detailed
C simulation of the PLL [24]. Note that a more accurate estiwill take into account any offset in the nominal duty
mate of
cycle of the phase detector output, and the transient response of
the charge pump.
. Since Table I reveals that
Now let us determine
is of the same order of
, we simply add these components
. This expression is accurate at
to obtain
frequencies less than the unity gain bandwidth of OP1; the
noise source is passed to its output with a gain of approximately
one in this region. At frequencies beyond OP1s bandwidth, the
is attenuated in this
expression is conservatively high since
frequency range.
Based on the above information, plots of the spectra in (24)
are shown in Fig. 16. For convenience, we have also overlapped
measured results from Fig. 17 for easy comparison, which will
be discussed shortly. As shown in Fig. 16, the influence of detector noise dominates at low frequencies, and the influence of
VCO and quantization noise dominate at high frequencies.
described by (19) with the
Note that the calculations use
parameter values specified in (20).

Fig. 17. Measured closed-loop synthesizer noise and open-loop VCO noise.

Fig. 17 shows measured plots of


and the open-loop
phase noise of the VCO from the synthesizer prototype; the plots
were obtained from an HP 3048A phase-noise measurement
system. It should be noted that the LSB of the modulator
was dithered to reduce spurious content, which was necessary
due to the low order of the modulator. The resulting spectra
compare quite well with the calculated curve in Fig. 16 over the
frequency offset range of 25 kHz to 10 MHz. Above 10 MHz,
the phase-noise measurement was limited by the sensitivity of
the measurement equipment. Note that the 60 dBc spur at
20-MHz offset is due to the 50% nominal duty cycle of the PFD;
no effort was made to reduce it below this level during the design
process since it was acceptable for the intended application of
the prototype.
VIII. CONCLUSION
In this paper, we developed a general model of a PLL that
incorporates the influence of divide value variations. A model
for fractional- synthesizers was obtained by simply
incorporating a modulator model into this framework.
The PLL model was parameterized by a single transfer func, which further simplifies noise calculations. The
tion
framework was used to calculate the noise performance of a
custom synthesizer, and was shown to accurately predict

1038

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

measured results within 3 dB over a frequency offset range


from 25 kHz to 10 MHz.

[24] M. H. Perrott, Fast and accurate behavioral simulation of fractional-N


frequency synthesizers and other PLL/DLL circuits, in Proc. Design
Automation Conf. (DAC), June 2002, pp. 498503.

ACKNOWLEDGMENT
The authors would like to thank the Hong Kong University of
Science and Technology, and in particular, J. Lau, P. Chan, and
P. Ko, for their support in the writing of this paper.
REFERENCES
[1] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, Deltasigma
modulation in fractional- frequency synthesis, IEEE J. Solid State
Circuits, vol. 28, pp. 553559, May 1993.
[2] M. A. Copeland, VLSI for analog/digital communications, IEEE
Commun. Mag., vol. 29, pp. 2530, May 1991.
[3] B. Miller and B. Conley, A multiple modulator fractional divider, in
Proc. 44th Annu. Symp. Frequency Control, May 1990, pp. 559567.
, A multiple modulator fractional divider, IEEE Trans. Instrum.
[4]
Meas., vol. 40, pp. 578583, June 1991.
[5] W. Rhee, B.-S. Song, and A. Ali, A 1.1-GHz CMOS fractional- frequency synthesizer with 3-b third-order sigmadelta modulator, IEEE
J. Solid-State Circuits, vol. 35, pp. 14531460, Oct. 2000.
[6] B. Miller, Technique enhances the performance of PLL synthesizers,
Microw. RF, pp. 5965, Jan. 1993.
[7] T. Kenny, T. Riley, N. Filiol, and M. Copeland, Design and realization of a digital deltasigma modulator for fractional- frequency synthesis, IEEE Trans. Veh. Technol., vol. 48, pp. 510521, Mar. 1999.
[8] T. A. Riley and M. A. Copeland, A simplified continuous phase modulator technique, IEEE Trans. Circuits Syst. II, vol. 41, pp. 321328,
May 1994.
[9] M. Perrott, T. Tewksbury, and C. Sodini, A 27-mW CMOS fractional- synthesizer using digital compensation for 2.5-Mb/s GFSM
modulation, IEEE J. Solid-State Circuits, vol. 32, pp. 20482060,
Dec. 1997.
[10] S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, and W. McFarland, An integrated 2.5-GHz sigmadelta frequency synthesizer with
5 microseconds settling and 2-Mb/s closed-loop modulation, in Proc.
IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2000, pp. 200201.
[11] N. Filiol, T. Riley, C. Plett, and M. Copeland, An agile ISM band
frequency synthesizer with built-in GMSK data modulation, IEEE J.
Solid-State Circuits, vol. 33, pp. 9981008, July 1998.
[12] N. Filiol, C. Plett, T. Riley, and M. Copeland, An interpolated
frequency-hopping spread-spectrum transceiver, IEEE Trans. Circuits
Syst. II, vol. 45, pp. 312, Jan. 1998.
[13] M. H. Perrott, Techniques for high data rate modulation and low power
operation of fractional- frequency synthesizers with noise shaping,
Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1997.
[14] A. Hill and A. Surber, The PLL dead zone and how to avoid it, RF
Design, pp. 131134, Mar. 1992.
[15] M. Thamsirianunt and T. A. Kwasniewski, A 1.2-m CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer, in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 1994,
p. 16.2.
[16] J. A. Crawford, Frequency Synthesizer Handbook. Norwood, MA:
Artech, 1994.
[17] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd
ed. Norwell, MA: Kluwer, 1994.
[18] J. Candy and G. Temes, Oversampling DeltaSigma Data Converters. New York: IEEE Press, 1992.
[19] S. Norsworthy, R. Schreier, and G. Temes, DeltaSigma Data Converters: Theory, Design, and Simulation. New York: IEEE Press,
1997.
[20] A. Sripad and D. Snyder, A necessary and sufficient condition for quantization errors to be uniform and white, IEEE Trans. Acoust. Speech
Signal Proc., vol. ASSP-25, pp. 442448, Oct. 1977.
[21] W. Bennett, Spectra of quantized signals, Bell Syst. Tech. J., vol. 27,
pp. 446472, July 1948.
[22] D. Leeson, A simple model of feedback oscillator noise spectrum,
Proc. IEEE, vol. 54, pp. 329330, Feb. 1966.
[23] A. Hajimiri and T. Lee, A general theory of phase noise in electrical oscillators, IEEE J. Solid-State Circuits, vol. 33, pp. 179194, Feb. 1998.

Michael H. Perrott received the B.S. degree in electrical engineering from New Mexico State University,
Las Cruces, in 1988, and the M.S. and Ph.D. degrees
in electrical engineering and computer science from
the Massachusetts Institute of Technology (M.I.T.),
Cambridge, in 1992 and 1997, respectively.
From 1997 to 1998, he was with Hewlett-Packard
Laboratories, Palo Alto, CA, working on high-speed
circuit techniques for synthesizers. In 1999,
he was a visiting Assistant Professor at the Hong
Kong University of Science and Technology, where
he taught a course on the theory and implementation of frequency synthesizers.
From 1999 to 2001, he was with Silicon Laboratories, Austin, TX, where he developed circuit and signal-processing techniques to achieve high-performance
clock and data recovery circuits. He is currently an Assistant Professor in the
Department of Electrical Engineering and Computer Science at M.I.T., where
his research focuses on high-speed circuit and signal processing techniques for
data links and wireless applications.

61

Mitchell D. Trott (S90M92) received the B.S.


and M.S. degrees in systems engineering from Case
Western Reserve University, Cleveland, OH, in
1987 and 1988, respectively, and the Ph.D. degree
in electrical engineering from Stanford University,
Stanford, CA, in 1992.
He was an Assistant and Associate Professor in the
Department of Electrical Engineering and Computer
Science at the Massachusetts Institute of Technology,
Cambridge, from 1992 until 1998. He was Director
of Research with ArrayComm, Inc., San Jose, CA,
from 1998 to 2002. He is currently with Hewlett-Packard Laboratories, Palo
Alto, CA. His research interests include multiuser communication, information
theory, and coding theory.

Charles G. Sodini (S80M82SM90F94) was


born in Pittsburgh, PA, in 1952. He received the
B.S.E.E. degree from Purdue University, Lafayette,
IN, in 1974, and the M.S.E.E. and Ph.D. degrees
from the University of California, Berkeley, in 1981
and 1982, respectively.
He was a Member of the Technical Staff with
Hewlett-Packard Laboratories from 1974 to 1982,
where he worked on the design of MOS memory
and, later, on the development of MOS devices with
very thin gate dielectrics. He joined the faculty of
the Massachusetts Institute of Technology (M.I.T.), Cambridge, MA, in 1983,
where he is currently a Professor in the Department of Electrical Engineering
and Computer Science. His research interests are focused on integrated circuit
and system design with emphasis on analog, RF, and memory circuits and
systems. Along with Prof. R. T. Howe, he is a coauthor of an undergraduate
text on integrated circuits and devices entitled Microelectronics: An Integrated
Approach (Englewood Cliffs, NJ: Prentice-Hall, 1996).
Dr. Sodini held the Analog Devices Career Development Professorship at
M.I.T.s Department of Electrical Engineering and Computer Science and was
awarded the IBM Faculty Development Award from 1985 to 1987. He has served
on a variety of IEEE Conference Committees, including the International Electron Device Meeting, of which he was the 1989 General Chairman. He was the
Technical Program Co-Chairman in 1992 and the Co-Chairman for 19931994
of the Symposium on VLSI Circuits. He served on the Electron Device Society
Administrative Committee from 1988 to 1994. He has been a member of the
Solid-State Circuits Society (SSCS) Administrative Committee since 1993 and
is currently President of the SSCS.

888

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

A Stabilization Technique for Phase-Locked


Frequency Synthesizers
Tai-Cheng Lee and Behzad Razavi, Fellow, IEEE

AbstractA stabilization technique is presented that relaxes the


tradeoff between the settling speed and the magnitude of output
sidebands in phase-locked frequency synthesizers. The method introduces a zero in the open-loop transfer function through the use
of a discrete-time delay cell, obviating the need for resistors in
the loop filter. A 2.4-GHz CMOS frequency synthesizer employing
the technique settles in approximately 60 s with 1-MHz channel
spacing while exhibiting a sideband magnitude of 58.7 dBc. Designed for Bluetooth applications and fabricated in a 0.25- m digital CMOS technology, the synthesizer achieves a phase noise of
112 dBc/Hz at 1-MHz offset and consumes 20 mW from a 2.5-V
supply.
Index TermsCharge pumps, feedforward, loop stability, oscillators, phase-locked loops (PLLs), prescalers, synthesizers.

I. INTRODUCTION

HE design of phase-locked loops (PLLs) must generally


deal with a tight tradeoff between the settling time and the
amplitude of the ripple on the oscillator control line. For phaselocked RF synthesizers, this tradeoff limits the performance in
terms of the channel switching speed and the magnitude of the
reference sidebands that appear at the output.
This paper describes a loop stabilization technique that yields
a small ripple while achieving fast settling [1]. Using a discrete-time delay cell, the PLL architecture creates a zero in the
open-loop transfer function. Another important advantage of the
technique is that it uses no resistors in the loop filter, lending itself to digital CMOS technologies. Also, it amplifies the value
of the loop filter capacitor, thus saving a great deal of silicon
area. Realized in a 2.4-GHz CMOS synthesizer, the proposed
method provides a settling time of approximately 60 reference
cycles with an output sideband level of 59 dBc.
Section II of the paper develops the foundation for the proposed technique. Section III describes the 2.4-GHz synthesizer
architecture and the design of its building blocks and Section IV
proposes fast simulation techniques for RF synthesizers. Section V summarizes the experimental results.
II. STABILIZATION TECHNIQUE
Consider the PLL shown in Fig. 1(a), where a voltage-controlled oscillator (VCO) is driven by a charge pump (CP) and a
Manuscript received April 23, 2002; revised February 18, 2003.
T.-C. Lee was with the Department of Electrical Engineering, University
of California, Los Angeles, CA 90095 USA. He is now with the Department
of Electrical Engineering, National Taiwan University, Taipei 10617, Taiwan,
R.O.C.
B. Razavi is with the Department of Electrical Engineering, University of
California, Los Angeles, CA 90095 USA (e-mail: razavi@ee.ucla.edu)
Digital Object Identifier 10.1109/JSSC.2003.811879

Fig. 1. (a) Conventional PLL architecture. (b) Proposed PLL architecture with
delayed charge pump circuit.

phase/frequency detector (PFD). Resistor


provides the stabisuppresses the glitch generated by
lizing zero and capacitor
the charge pump at every phase comparison instant. The glitch
arises from: 1) the mismatch between the arrival times of the
Up and Down pulses; 2) the mismatch between the widths of
the Up and Down pulses; 3) the mismatch between the charge
pump current sources (both random and due to channel-length
modulation); and 4) the mismatch between the charge injection
and clock feedthrough of the pMOS and nMOS switches in the
charge pumps. Charge sharing also exacerbates the ripple [2].
deterThe principal limitation of this architecture is that
lowers the ripple on the control
mines the settling whereas
must remain below
by roughly a factor of
voltage. Since
10 so as to avoid underdamped settling, the loop must inevitably
if
is to sufficiently suppress
be slowed down by a large
the ripple. It is, therefore, desirable to seek methods of creating
the stabilizing zero without the resistor so that the capacitor that
defines the switching speed also directly suppresses the ripple.
A number of approaches to realizing a zero in PLLs have been
reported [3][5]. The circuits in [3] and [4] require a transconductance amplifier, whose design for large output swings (necessary for maximizing the tuning range of LC VCOs) and low
flicker noise becomes difficult. The synthesizer in [5] employs
a voltage-controlled delay line but it mandates a large delay and
a nearly rail-to-rail control voltage.

0018-9200/03$17.00 2003 IEEE

LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS

It is important to note that the problem of ripple becomes increasingly more serious as the supply voltage is scaled down
and/or the operating frequency goes up. The relative magnitude
of the primary sidebands at the output of the VCO is given by
where
is the peak amplitude of the
is the gain of the VCO, and
first harmonic of the ripple,
is the synthesizer reference frequency. For a given relative
tuning range (e.g., 10 ), the gain of LC VCOs must increase
MHz/V and
if the supply voltage goes down. If
MHz, then the fundamental ripple amplitude must be
less than 63 V to guarantee sidebands 60 dB below the carrier.
In order to arrive at the stabilization technique, consider the
PLL architecture shown in Fig. 1(b). Here, the primary charge
, drives a single capacitor
while a secondary
pump,
, injects charge after some delay
. The
charge pump,
is thus equal to
total current flowing through
(1)
(2)
is assumed to be much smaller than the loop
where
time constant. Consequently, the transfer function of the
PFD/CP/LPF combination can be expressed as
(3)
Assuming

, we have

889

each stage is wide enough to support such pulses, then a very


large number of stages is required to obtain the necessary
,
demanding a high power dissipation.
with process
The second issue relates to the variation of
, such
and temperature. Since is directly proportionally to
variations can greatly affect the loop stability.
To resolve the above difficulties, the architecture is modified
as shown in Fig. 2(a), where a discrete-time analog delay line
and
. The delay network is realized as
is placed after
depicted in Fig. 2(b), consisting of two interleaved master-slave
sample-and-hold branches operating at half of the reference freas follows. When
is
quency. The circuit emulates
shares a charge packet corresponding to the previous
high,
while
samples a level proporphase comparison with
and
tional to the present phase difference. In the next period,
exchange roles. The interleaved sampling network, there.
fore, provides a delay equal to the reference period
The discrete-time delay technique of Fig. 2 allows a precise
definition of the zero frequency without the use of resistors.
To quantify the behavior of a PLL incorporating this method,
we assume the loop settling time is much greater than
so that the delay network can be represented by the continuous-time model shown in Fig. 2(b). Here,
approximates the interleaved branches. Equation (4) can then be
rewritten as
(8)

(4)
obtaining a zero at
(5)
can, therefore, stabilize the loop.
Proper choice of
The damping factor and the settling time of the loop can be
written, respectively, as
(6)
(7)
In order to achieve a sufficiently low zero frequency,
must be large or close to unity. Since the accuracy in the definition of is limited by mismatches between the two charge
must still be a large value. For example, if
pumps,
MHz,
pF,
A,
MHz/V,
, and
, then a
of approximately 500 ns
is required to ensure a well-behaved loop response.
The architecture of Fig. 1(b) suffers from two critical
drawbacks. First, it requires that the delay stage provide a very
and accommodate a wide range of Up and Down
large
pulsewidths. Specifically, when the loop is locked, the Up and
Down pulses are less than 500 ps wide. The tradeoff between
delay and bandwidth, therefore, makes the design of the delay
line difficult. As depicted in Fig. 1(b), if the bandwidth of each
stage in the delay line is reduced so as to yield a large delay,
then the narrow Up and Down pulses are heavily attenuated,
giving rise to a dead zone. Conversely, if the bandwidth of

and the current through


is newhere it is assumed
glected. This equation exhibits two interesting properties. First,
, then
and the value
if
is amplified by
. For example, if
,
of
is multiplied by a factor of 10, saving substantial area.
then
Second, the zero frequency is equal to
(9)
a value independent of process and temperature. Assuming
, we obtain the damping factor and the
settling time constant of the loop as
(10)
(11)
Note that the damping factor exhibits much less process and
temperature dependence than in the conventional loop of
, the proposed circuit resemFig. 1(a). Interestingly, for
bles the topology of Fig. 1(a) but with the resistor replaced by
a switched-capacitor network.
While providing insight and serving as design guidelines, the
above results are obtained by a continuous-time approximation
of the loop and their validity must be verified. Simulations using
A,
MHz/V,
pF,
the values
yield
and
s. Equations
and
(10) and (11) predict these parameters to be 0.31 and 13 s,
respectively. Thus, the continuous-time approximation provides
a reasonably accurate estimate of the loop behavior even for

890

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

Fig. 2. Actual implementation of PLL with delay sampling circuit and continuous-time approximation of delay network.

underdamped settling (where the loop time constant is relatively


short).
For RF synthesis, the delay network of Fig. 2(b) must be designed carefully so as to minimize ripple on the control voltage.
Since in the locked condition, the voltages at nodes and are
or
and
creates
nearly equal, charge sharing between
only a small ripple. Furthermore, the switches in the delay stage
are realized as small, complementary devices to introduce negligible charge injection and clock feedthrough.
Comparison With Conventional Architecture In order to
quantify the advantage of the proposed architecture over the
in
conventional PLL topology, we note that capacitor
or
. Since the sampling
Fig. 2 appears in parallel with
capacitors are typically two to three times larger than , they
suppress the charge pump nonidealities by about 9 to 12 dB.
The behavioral model shown in Fig. 3(a) is simulated in
MATLAB for the two cases. As explained in Section IV, the
reference frequency and the divide ratio are scaled by a factor
of 100 to speed up the simulation. The nonideality of the charge
that is
pump is modeled by a constant current mismatch
injected into the loop filter at each phase comparison instant.
Fig. 3(b) depicts the settling behavior and the output spectrum
for the two cases. (The plots are deliberately offset for clarity).
For approximately equal settling times, the proposed topology
(Type A) achieves 10 dB lower sidebands than the conventional
loop does.

III. SYNTHESIZER DESIGN


A 2.4-GHz CMOS synthesizer targeting Bluetooth applications has been designed using the stabilization technique
described above. This section presents the architecture and
building blocks of the synthesizer.
Shown in Fig. 4, the synthesizer uses an integer- architecture with a feedback divider whose modulus is given by
, where
,
, and

.
MHz, the output frequency covers the 2.4-GHz
With
ISM band. The output of the swallow counter is pipelined by the
to allow a relaxed design for the level converter
flip-flop
and the swallow counter. The buffer following the VCO suppresses the kickback noise of the prescaler when the modulus
changes. It also avoids limiting the tuning range of the VCO by
the input capacitance of the prescaler.
A. VCO Design
The VCO topology is shown in Fig. 5(a). To provide both
negative and positive voltages across the MOS varactors, the
and
are grounded and the circuit is biased on
sources of
. The inductors are realized as shown in Fig. 5(b),
top by
with the bottom spiral moved down to metal 2 so as to reduce
the parasitic capacitance [7]. Each inductor is about 14 nH, occupies an area of 180 m 180 m, and exhibits a of 4 and
a parasitic capacitance of 100 fF.

LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS

891

Fig. 3. (a) MATLAB behavioral simulations for the ripples on the control lines. (b) Time-domain settling and VCO output spectrum during lock for Type A
(delay-sampling loop filter) and Type B (conventional loop filter).

The prescaler must divide the 2.4-GHz signal while consuming a small power dissipation. Depicted in Fig. 6(b),
the circuit employs three current-steering flip-flops with
diode-connected loads. The use of NOR gates obviates the need
for power- and headroom-hungry level shift circuits (or large
input swings) required in NAND gates. The program counter
and the swallow counter incorporate static flip-flops to ensure
reliable operation at low frequencies.
IV. SIMULATION TECHNIQUES

Fig. 4. Synthesizer architecture.

The varactors are implemented as accumulation-mode nMOS


devices (placed inside n-well). In this design, a 160-fF varactor
is employed to allow a tuning range of about 12%. The measured
phase noise of the VCO is 120 dBc/Hz at 1-MHz offset.

A 2.4-GHz synthesizer with a reference frequency of 1 MHz


requires a transient simulation step of approximately 20 ps for
a total settling time on the order of 100 s, i.e., five million
points. The simulation, therefore, requires an extremely long
time (about 5 days on an Ultra 10 Sun Workstation) owing to
both the vastly different time scales and the large number of
devices (especially in the divider).
This section describes a number of techniques that reduce the
simulation time by several orders of magnitude while revealing
the loop dynamics with reasonable accuracy.
A. Linear Discrete-Time Model

B. Pulse-Swallow Counter
Shown in Fig. 6(a), the pulse-swallow counter consists of
a prescaler, a program counter and a swallow counter. The
pipelining in the swallow counter allows the use of a small
divider ratio in the prescaler. Simulations suggest that a 4/5
topology minimizes the overall divider power dissipation.

The voltages at nodes and in Fig. 2(b) can be, respectively, expressed by the following discrete-time equations

(12)

892

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

Fig. 5.

(a) LC oscillator. (b) Two-layer stacked inductor.

Fig. 6.

(a) Divider. (b) Prescaler.

(13)
denotes the input phase error. Even though the
where
transform of
and
can be derived, the high order of the
resulting polynomials makes it difficult to derive the close-loop
response in analytical form. Thus, the two equations, along with
the rest of the PLL, are realized in MATLAB. The VCO is modeled as a phase accumulator, with each new value of phase obtained as the previous phase plus the product of the time interval and the new frequency. The phase detector is simply a
for the above equations. The linear
subtractor, generating
discrete-time model facilitates the choice of the charge pump
for fast settling and
current, the value of , and the value of
minimum ripple on the oscillator control line.
Fig. 7 shows the settling behavior of the synthesizer as predicted by MATLAB and transistor-level implementation. The
simple discrete-time model yields a moderate accuracy while
requiring orders of magnitude less simulation time.
B. Transistor-Level Model
The impact of various PFD, CP, and VCO nonidealities upon
the loop dynamics must ultimately be studied in a realistic transistor-level implementation. We present two techniques that reduce the simulation time from days to minutes.
The first method is based on time contraction, whereby the
reference frequency is scaled up by a factor of 100 and the main
loop filter capacitor ( in Fig. 2) and the divide ratio are scaled

Fig. 7.

Settling behavior of MATLAB and transistor-level simulations.

down by the same factor. All other loop parameters remain unand
by
changed. From (10) and (11), we note that scaling
100 maintains a constant damping factor while scaling the settling time by 100. Since the PFD operates reliably at 100 MHz
with no dead zone, this method directly reduces the simulation
time by a factor of 100.
Fig. 8 depicts an example of time contraction by a factor of
10. Note that the time axis has a logarithmic scale. It can be
observed that the loop settling behavior scales accurately by the
same factor.
In the second method, the divider is realized as a simple behavioral model in HSPICE that uses a handful of ideal devices
and its complexity is independent of the divide ratio. Illustrated
in Fig. 9, the principle of the behavioral divider is to pump a

LEE AND RAZAVI: STABILIZATION TECHNIQUE FOR PHASE-LOCKED FREQUENCY SYNTHESIZERS

Fig. 8.

893

Time contraction.
Fig. 10. Die photo.

Fig. 9.

Divider behavioral model.


TABLE I
FAST SIMULATION SUMMARY

well-defined charge packet into an integrator in every period


and reset the integrator when its output exceeds a certain level
. Using an ideal op amp, comparator, and switches with
, the circuit can achieve arbitrarily
proper choice of and
large divide ratios. (The duty cycle of output can be controlled
.) This technique yields another factor of 20 reduction in
by
the simulation speed, allowing the synthesizer to be simulated
in less than 3 min on an Ultra 10 Sun Workstation. Table I summarizes the results of the two simulation techniques.
V. EXPERIMENTAL RESULTS
The frequency synthesizer has been fabricated in a digital
0.25- m CMOS technology. Shown in Fig. 10 is a photograph

Fig. 11.

Measured output spectrum of the synthesizer.

of the die, whose active area measures 0.65 mm 0.45 mm.


The circuit has been tested in a chip-on-board assembly while
running from a 2.5-V power supply. The power dissipation is
20 mW.
Fig. 11 shows the output spectrum in the locked condition.
The phase noise is equal to 112 dBc/Hz at 1 MHz offset,
well exceeding the Bluetooth requirement. The primary reference sidebands are at approximately 58.7 dBc. This level is
lower than that achieved in [8] with differential VCO control
and an 86.4-MHz reference frequency. Similarly, the designs in
[9] and [10] exhibit an inferior tradeoff between the settling time
and the sideband magnitudes.
Fig. 12 plots the measured settling behavior of the synthesizer
when its channel number is switched by 64. Here, the channel
select input is periodically switched between the two end channels and the oscillator control voltage is monitored. The settling
time is about 60 s, i.e., 60 input cycles. Table II summarizes
the measured performance of the synthesizer.
VI. CONCLUSION
A PLL stabilization technique is introduced that relaxes the
tradeoff between the settling time and the ripple on the control voltage, while obviating the need for resistors in the loop
filter. The proposed approach creates a zero in the open loop

894

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 6, JUNE 2003

[5] A. Zolfaghari, A. Chan, and B. Razavi, A 2.4-GHz 34-mW CMOS


transceiver for frequency-hopping and direct-sequence applications, in
IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2001, pp.
12591265.
[6] C. Lam and B. Razavi, A 2.6-GHz/5.2-GHz frequency synthesizer in
0.4-m CMOS technology, IEEE J. Solid-State Circuits, vol. 35, pp.
788794, May 2000.
[7] A. Zolfaghari, A. Chan, and B. Razavi, Stacked inductors and
transformers in CMOS technology, IEEE J. Solid-State Circuits, pp.
620628, Apr. 2001.
[8] L. Lin, L. Tee, and P. R. Gray, A 1.4-GHz differential low-noise CMOS
frequency synthesizer using a wideband PLL architecture, in IEEE Int.
Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2000, pp. 204205.
[9] T. K. K. Kan, G. C. T. Leung, and H. C. Luong, A 2-V 1.8-GHz fully
integrated CMOS dual-loop frequency synthesizer, IEEE J. Solid-State
Circuits, vol. 37, pp. 10121020, Aug. 2002.
[10] C.-W. Lo and H. C. Luong, A 1.5-V 900-MHz monolithic CMOS
fast-switching frequency synthesizer for wireless applications, IEEE
J. Solid-State Circuits, vol. 37, pp. 459470, Apr. 2002.
Fig. 12.

Control voltage during loop settling.


TABLE II
SYNTHESIZER PERFORMANCE SUMMARY

transfer function by adding two consecutive phase comparison


results and can be extended to more consecutive samples as in
a transversal filter. The method also amplifies the loop filter
capacitor by a large number (e.g., 10), saving substantial chip
area. The proposed concepts are demonstrated in a 2.4-GHz RF
CMOS synthesizer.
The stabilization technique finds applications in other phaselocked systems as well. Examples include clock generators and
clock and data recovery circuits.
REFERENCES
[1] T. C. Lee and B. Razavi, A stabilization technique for phase-locked
frequency synthesizers, in VLSI Symp. Dig. Tech. Papers, June 2001,
pp. 3942.
[2] M. G. Johnson and E. L. Hudson, A variable delay line PLL for CPUcoprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, pp.
12181223, Oct. 1988.
[3] I. I. Novof, J. Austin, R. Kelkar, D. Strayer, and S. Wyatt, Fully integrated CMOS phase-locked loop with 15 to 240 MHz locking range
and 50 ps jitter, IEEE J. Solid-State Circuits, vol. 11, pp. 12591265,
Nov. 1995.
[4] S. Sidiropoulos, D. Liu, J. Kim, G. Wei, and M. Horowitz, Adaptive
bandwidth DLLs and PLLs using regulated supply CMOS buffers, in
VLSI Symp. Dig. Tech. Papers, June 2000, pp. 124127.

Tai-Cheng Lee was born in Taiwan, R.O.C., in


1970. He received the B.S. degree from National
Taiwan University, Taipei, Taiwan, R.O.C., in
1992, the M.S. degree from Stanford University,
Stanford, CA, in 1994, and the Ph.D. degree from
the University of California, Los Angeles, in 2001,
all in electrical engineering.
He was with LSI Logic from 1994 to 1997 as a Circuit Design Engineer. He served as an Adjunct Assistant Professor with the Graduate Institute of Electronics Engineering (GIEE), National Taiwan University, from 2001 to 2002. Since 2002, he has been with the Department of
Electrical Engineering and GIEE, National Taiwan University, where he is an
Assistant Professor. His main research interests are in high-speed mixed-signal
and analog circuit design, data converters, PLL systems, and RF circuits.

Behzad Razavi (S87M90SM00F03) received


the B.Sc. degree in electrical engineering from Sharif
University of Technology, Tehran, Iran, in 1985 and
the M.Sc. and Ph.D. degrees in electrical engineering
from Stanford University, Stanford, CA, in 1988 and
1992, respectively.
He was an Adjunct Professor at Princeton
University, Princeton, NJ, from 1992 to 1994, and
at Stanford University in 1995. He was with AT&T
Bell Laboratories and Hewlett-Packard Laboratories
until 1996. Since September 1996, he has been an
Associate Professor and subsequently Professor of electrical engineering at
the University of California, Los Angeles. He is the author of Principles of
Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), Design of Analog
Integrated Circuits (New York: McGraw-Hill, 2001), Design of Integrated
Circuits for Optical Communications (New York: McGraw-Hill, 2002), and the
editor of Monolithic Phase-Locked Loops and Clock Recovery Circuits (New
York: IEEE Press, 1996). His current research includes wireless transceivers,
frequency synthesizers, phase-locking and clock recovery for high-speed data
communications, and data converters.
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the
1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits
Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom
Integrated Circuits Conference in 1998. He was the corecipient of the Jack Kilby
Outstanding Student Paper Award at the 2002 ISSCC. He served on the Technical Program Committee of the International Solid-State Circuits Conference
(ISSCC) from 1993 to 2002. He has also served as Guest Editor and Associate
Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS
ON CIRCUITS AND SYSTEMS, and the International Journal of High Speed Electronics. He is recognized as one of the top ten authors in the 50-year history of
ISSCC. He is also an IEEE Distinguished Lecturer.

490

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

An Adaptive PLL Tuning System Architecture


Combining High Spectral Purity and
Fast Settling Time
Cicero S. Vaucher, Member, IEEE

AbstractAn adaptive phase-locked loop (PLL) architecture


for high-performance tuning systems is described. The architecture combines contradictory requirements posed by different performance aspects. Adaptation of loop parameters occurs continuously, without switching of loop filter components, and without
interaction from outside of the tuning system. The relationship of
performance aspects (settling time, phase noise, and spurious signals) to design variables (loop bandwidth, phase margin, and loop
filter attenuation at the reference frequency) are presented, and
the basic tradeoffs of the new concept are discussed. A circuit implementation of the adaptive PLL, optimized for use in a multiband (global) car-radio tuner IC, is described in detail. The realized tuning system achieved state-of-the-art settling time and spectral purity performance in its class (integer- PLLs): a signal-tonoise ratio of 65 dB, a 100-kHz spurious reference breakthrough
signal under 81 dBc, and a residual settling error of 3 kHz after
1 ms, for a 20-MHz frequency step. It simultaneously fulfills the
speed requirements for inaudible frequency hopping and the heavy
signal-to-noise ratio specification of 64 dB.
Index TermsAdaptive systems, FM noise, frequency synthesizers, phase-locked loops.

I. INTRODUCTION

AST settling timefrequency synthesizers are essential


building blocks of modern communication systems.
Typical examples are digital cellular mobile systems, which
employ a combination of time-division duplex (TDD) and
frequency-division duplex (FDD) techniques. In these systems,
the downlink frequencies (base station to handsets) are placed
in different bands with respect to uplink frequencies. In order
to save cost and decrease the size of the handset, it is desirable
to use the same frequency synthesizer to generate uplink and
downlink frequencies. Requirements are that the synthesizer
has to switch between bands and settle to another frequency
within a predetermined time ( 1.7 ms for GSM and DCS-1800
systems [1]).
Car-radio receivers with optimal radio data system (RDS)
performance ask for fast-settling-time tuning systems as well
[2]. The RDS network transmits a list of (nationwide) alternative frequencies carrying the same program. The tuner performs
a background scanning of these frequencies, so that optimum

Manuscript received July 23, 1999; revised November 29, 1999.


The author is with Philips Research Laboratories, Eindhoven 5656 AA The
Netherlands (e-mail: Cicero.Vaucher@philips.com).
Publisher Item Identifier S 0018-9200(00)02861-4.

reception condition is provided when the receiver is displaced


within different coverage regions. For the system to be effective, the background scanning has to be performed in a transparent (inaudible) way to the listener. A possible but expensive
way to do that is to use two tuners in the receiver, with one of
them being used for checking on alternative frequencies only.
Single-tuner solutionswhich have a much better price/performance ratiorequire a tuning system architecture able to do
frequency hopping in an inaudible way [2]. In other words, a
fast-settling-time architecture is required for these applications.
Communication systems often pose severe requirements on
the spectral purity of the tuning system local oscillator (LO)
signal. There are two main reasons for this. First, to avoid problems with reciprocal mixing of adjacent channels. Reciprocal
mixing decreases the receiver's selectivity and disturbs the reception of weak signals. Second, because the mixing process,
which is used for down-conversion of the radio-frequency (RF)
signals, superposes the phase noise of the LO on the modulation of the RF signal. Hence, the signal-to-noise ratio (SNR) at
the output of the demodulator is a function of LO's phase noise
level [3].
This paper describes an adaptive tuning system architecture
that combines fast settling time with excellent spectral purity
performance. The architecture was optimized to be used in
a global car-radio tuner IC with inaudible RDS background
scanning. The integer- frequency synthesizer has an SNR of
65 dB and a 100-kHz spurious reference breakthrough under
81 dBc at the voltage-controlled oscillator (VCO) ( 87 dBc
at the mixer). Residual settling error for a 20-MHz frequency
step is 3 kHz after 1 ms. These results are similar to those of a
fractional- implementation [4]. The complexity of our tuning
system, however, is much smaller. The adaptive phase-locked
loop (PLL) was integrated in a 5-GHz, 2- m bipolar technology. The tuning system works with 8.5-V supply voltage for
the charge pumps and with 5 V for the logic functions. Total
current consumption is 21 mA from the 5-V supply and 12 mA
from the 8.5-V supply.
The architecture of the multiband tuner IC is described in
Section II. Section III presents relationships of settling time,
phase noise, and spurious signals to the design variables, namely
loop bandwidth, phase margin, and loop filter attenuation at the
reference frequencies. Section IV introduces the adaptive PLL
architecture and discusses the advantages and tradeoffs of the
concept. Section V describes the circuit implementation, and
Section VI presents a summary of measured results.

00189200/00$10.00 2000 IEEE

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

491

Fig. 1. Simplified block diagram of the global car-radio tuner IC.

TABLE I
RECEPTION BANDS WITH CORRESPONDING TUNING SYSTEM PARAMETERS

II. MULTIBAND TUNER ARCHITECTURE


The block diagram of the global tuner IC with inaudible background scanning is shown in Fig. 1. The receiver and tuning
system architectures have been defined such that all reception
bands can be accessed with a single VCO and a single loop filter,
without changes to the application. Mapping the frequency of
the VCO to the different input bands is achieved by dividing its
output frequency by different ratios, depending on the band to
be received. The division is accomplished in the FM DIV and

AM DIV dividers, which are set in between the VCO output


and the RF mixers. Table I presents the VCO frequency and
tuning system parameter settings for various reception bands,
including the American Weather Band. By dividing the VCO
output, the tuning resolution is 1 kHz in AM mode and 50 kHz in
FM mode, despite the fact that reference frequencies are 20 kHz
and 100 kHz, respectively.
Combining the different reception bands in one single applicationthe same VCO and same loop filtercomplicates the
design of the tuning system. A reception band with worst case

492

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 2. Open-loop frequency response (Bode plot) of a type-2, third-order


charge-pump PLL for different values of phase margin  .

(a)

spectral purity requirements determines the loop filter design.


Nonetheless, robustness for variations in tuning system parameters, for all reception bands, has to be insured. The relationships
between different performance aspects on system level are discussed in the following section.
III. SETTLING TIME AND SPECTRAL PURITY PERFORMANCE
The properties of a PLL are strongly related to its phase detector implementation [5]. Present-day PLL frequency synthesizers usually employ the tristate, sequential phase frequency
detector (PFD), combined with a charge pump (CP) [6]. The
analysis of the PLL properties presented in this paper assumes
the use of a PFD/CP in the loop.
A. Settling Time, Loop Bandwidth, and Loop Phase Margin
Bode diagrams are a powerful tool for designing PLL tuning
systems [7], [8] because they enable direct assessment of the
and open-loop bandwidth (0-dB freloop's phase margin
are obquency ). Accurate and reliable results for and
tained with ease to implement behavioral models [9] and with
fast ac simulation runs. In spite of the advantages of the ac
method, design equations relating the settling performance of a
type-2, third-order charge-pump PLL1 [6] to its open-loop bandwidth and phase margin have, to the best of our knowledge, not
yet been published in the open literature.
Fig. 2 presents Bode plots of a type-2, third-order loop for
. Fig. 3(a) displays the trandifferent values of phase margin
sient response of such a loop for three different values of phase
, normalized
margin. The responses are plotted as
.
is the remaining frequency error with respect to
for
is the amplitude of the frequency jump.
the final value and
, so that
Fig. 3(b) presents the responses as
on the long-term transient response is easily
the impact of
observed.
The influence of the phase margin on the settling time, obtained with transient simulations similar to those of Fig. 3, is
presented in Fig. 4. The figure shows the time necessary for
to reach a numerical value of
the value of
10. The settling time decreases with increasing phase margin,
1The

most widely used configuration in synthesizer applications.

(b)
Fig. 3. Setting transient for different values of  , normalized for f t. (a)
Setting error (represented as f f t =f
) versus f t. (b) Setting error
(represented as
j
f f t j=f
) versus f t.

ln( 1 ( )

1( )
)

reaching a minimum for


values of around 50 . Increasing
the phase margin further leads to a sharp increase in the settling
time.
The relationship of settling time and phase margin, displayed
in Fig. 4, can be understood with the help of Fig. 5. It presents
the pole and zero locations of the closed-loop transfer function
of a third-order loop with different values of phase margin (Bode
plots presented in Fig. 2). The real part of the dominant (comfor values of
of about 50 . When
plex) poles approach
equals 53 , all three poles lie at
. That is the location
with the fastest damping of the transient error. The fastest response, however, is obtained with 51 . The complex parts of the
poles speed up the settling transient a bit further (25%). For
higher values of phase margin, the dominant real pole moves to
the right on the real axis. This pole is responsible for the slowing
53 . Fig. 5
down of the PLL response for values of
shows that the dominant pole, for 60 phase margin, lies at about
0.4 . Hence, it may be concluded that the usual practice of
designing critically damped loopswhich have a phase margin
of about 70 [5]is not appropriate for fast-settling-time applications.
Let us consider Fig. 3(b) again. One sees that the (envelope
of the) curves can be approximated by straight lines. The ap-

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

Fig. 4. Setting time as function of the phase margin for f

493

=f

=e

( ) for a 1(ln( )) of ten.

Fig. 6. Average values of  

In (3):

Fig. 5. Position of the closed-loop poles and zeros of a third-order PLL


corresponding to different values of  , as displayed in Fig. 2.

proach proposed here takes


into account with the help of an
. By so doing, we arrive at
effective damping coefficient
the following approximation for the envelope of the curves of
Fig. 3(b):
(1)
can be obtained from tranNumerical estimations for
sient simulations with the help of the following expression:
(2)
The settling time results presented in Fig. 4 leads to the nudisplayed in Fig. 6. These values repmerical values for
, as they are obtained from a
resent an average value for
of ten.
Manipulation of (1) results in an equation describing the minimum loop bandwidth required to achieve given settling speciand
fications
(3)

locking time(s);
amplitude of the frequency jump (Hz);
maximum frequency error (Hz) at
;
can be read from Fig. 6.
Two points about the present treatment of the transient response need further explanation. First, the presented results are
based on a linear continuous-time model for the discrete-time
charge-pump PLL. It is known in the literature [6] that the
continuous-time approach is a good approximation for the
discrete-time PLL if the reference (sampling) frequency
of the loop is at least a factor of ten higher than its open-loop
bandwidth . Therefore, the value of , calculated with (3),
. If
has to be checked against the loop's reference frequency
is smaller than ten, then actual settling
the target ratio
behavior will deviate from the calculations.
The second point is that usual implementations of the phase
frequency detector have a limited linear phase error detection
range, namely, from 2 to 2 [9]. When the instantaneous
becomes larger than 2 , the PFD interprets
phase error
2 . This effect leads to a
the error information as
longer settling time than predicted with (3). The maximum value
, denoted
, was found to obey the following relaof
, where is the main ditionship:
is a fitting factor for the influence of the
vider ratio and
. Numerical values for
, obtained
phase margin on
from transient simulations, lie in the range [0.7,0.8]. Hence, the
maximum phase error is contained in the interval 2 , when
2
. If this condition is satisfied, then the
(discrete-time) transient response is accurately predicted by the
continuous-time linear model.
Inaudible RDS background scanning requires settling times
of 1 ms, defined as a residual settling error of 6 kHz for a
20-MHz frequency jump. The nominal loop phase margin is set
of five. On the other
to 50 , which corresponds to a
in the
hand, it is appropriate to use a lower value for
calculations (e.g., 2.5), to provide enough margin for variations
in the nominal values of loop bandwidth and phase margin.
Solving (3) for these settling specifications leads to a nominal
value of 3.2 kHz for the loop bandwidth .

494

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 7. FM noise density and residual FM for loop bandwidths of 800 Hz and 3 kHz.

The loop bandwidth that satisfies different settling requirements can be calculated with the help of (3). Settling specifications, however, often require loop bandwidths that are not optimal with respect to spectral purity performance, as will become clear in the next subsection.
B. Phase Noise Performance and Loop Bandwidth
The dependency of the total phase noise of a PLL tuning
system on the phase noise of the loop components is well known
in the literature [3], [5], [10]. The phase noise of the VCO is suppressed inside the loop bandwidth, whereas the (phase) noise
from the other building blocks is transferred to the VCO output,
multiplied by the closed-loop transfer function of the PLL: a
low-pass function that suppresses their noise contribution outside the loop bandwidth. There is a crossover point for the
loop bandwidth, where the noise contribution from the dividers
and charge pump becomes dominant with respect to the noise
from the VCO.
For terrestrial FM reception, the LO signal residual frequency
noise (residual FM) determines the ultimate receiver's SNR performance. The SNR specification for the application is 64 dB,
defined for a reference level of 22.5-kHz peak deviation with
50- s deemphasis. Complying to the specification requires the
residual FM in the LO signal to be less than 10 Hz rms.
The frequency (FM) noise density of the LO signal
is linked to its phase noise power density
by
[5].
equals
, the single-sideband noise-to-carrier ratio, so that
2
. Finally, the residual FM can be calculated
(4)
in (4) depend on the signal
The integration limits and
bandwidth of the application [3]. For terrestrial FM reception,
the lower limit is 20 Hz and the higher is 20 kHz. Fig. 7 presents

the simulated frequency noise (FM noise) power density and the
residual FM, which is plotted as function of , with fixed at
20 Hz. The FM noise density and the residual FM are plotted for
values of loop bandwidth of 800 Hz and of 3 kHz. For 3 kHz,
the residual FM amounts to 40 Hz rms, which is 12 dB higher
than the specification. A loop bandwidth of 800 Hz, on the other
hand, leads to a residual FM of 8 Hz rms, which satisfies the
SNR requirement.
The contributions of different noise sources to the total frequency noise density, in the case of an 800-Hz loop bandwidth,
are displayed in Fig. 8. The contribution of the VCO to the
residual FM equals that of the other synthesizer building blocks.
This is a good compromise, and 800 Hz was chosen as the nominal loop bandwidth for in-lock situations.
The settling specification requires a bandwidth of 3.2 kHz.
The SNR constraint, on the other hand, asks for 800 Hz. These
conflicting requirements can be combined when the loop bandwidth is made adaptive as a function of the operating mode: frequency jump or in-lock.
Adapting the value of the loop bandwidth during frequency
jumps is easily accomplished by switching the nominal value of
the charge-pump current [6], [13]. This method, however, often
causes disturbances in the VCO tuning voltagethe so-called
secondary glitch-effectat the moment the current is switched
from high to low values. These disturbances are highly undesirable, as they have to be corrected by the loop in small bandwidth mode. What is more, the secondary glitches may cause
audible disturbances in analog systems and increase the bit error
rate in digital systems.
To provide stability for a small bandwidth loop requires a
transfer function zero located at low frequencies (large time constant). A low-frequency zero, however, is undesirable for operation in high bandwidth mode. It causes the phase margin to be
too high, which increases the settling time. Note that the efdecreases for high values of
fective damping coefficient
phase margin (see Fig. 6).

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

495

Fig. 8. Contributions from different noise sources to the total FM noise density and residual FM (20 Hz20 kHz) with 800-Hz loop bandwidth.

Therefore, for optimal settling time and phase noise, one has
not only to switch the value of the loop bandwidth but also to
change the location of the zero in the transfer function.
C. Reference Spurious Signals and Loop Filter Attenuation
The use of phase frequency detectors yields the minimum
levels of spurious breakthrough at the reference frequency [11].
The spurious signals are due to compensation of leakage currents or to imperfections in the charge pumps implementation.
Standard FM modulation theory and the small angle approximation lead to the following equation for the amplitude of the
from
spurious signal (in dBc), which is at an offset frequency
the carrier:

Fig. 9.

Adaptive PLL tuning system architecture.

spurious
(5)
where
offset frequency from the carrier (Hz);
amplitude of ac current component with frequency
(A);
impedance of the loop filter at
(V/A);
VCO gain (Hz/V).
is twice the value of the loop-filter
The value of
dc leakage current [12] in loops operating with well-designed
charge pumps. In cases where the charge pump has chargesharing problems and/or charge injection into the loop filter,
may become dominated by these second-order effects.
The imperfections can lead to spurious components with (much)
higher amplitudes than would be expected based on the leakage
current alone.
Rearranging the above equation leads to a formula that relates
to the specified maximum
the required filter attenuation at
, to the dc leakage
level of spurious signals
, and to the VCO gain
current
(6)

Fig. 10. Loop-filter configuration, charge-pump currents, and component


values used in the global car-radio tuner IC.

The relevant values of


equal
and its harmonics in a
Hz.
standard PLL operating with a reference frequency of
Therefore, the required loop-filter (trans)impedance for these
frequencies can be readily calculated. The VCO gain, the spurious specification, and the expected (maximum) leakage current are known.

496

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 11.

Bode plots of the adaptive loop during frequency jumps and in-lock.

Fig. 12.

Implementation of the DZ building block.

An important conclusion to be taken from the above equations


is that the amplitude of the spurious signals is not dependent on
the absolute value of loop bandwidth. Instead, it is determined
by the (trans)impedance of the loop filter. This means that, at
least in principle, any spurious specification can be achieved
simply by decreasing the impedance level of the loop filter. In
practice, this is not a viable option because the PLL loop bandwidth is proportional to the value of the loop-filter resistor and
to the charge-pump current [6].
For a constant value of the loop bandwidth, a decrease of
the loop-filter impedance level requires a proportional increase
of the nominal charge-pump current. This leads to difficulties
in the charge-pump design and to higher power dissipation. To
avoid these difficulties, more RC sections are added to the basic
loop-filter configuration, so that the filter attenuation at higher
frequencies is increased. Additional RC sections, however, inevitably cause phase lag at lower frequencies. The phase lag de-

creases the loop phase margin and increases the settling time in
high-bandwidth mode.
Therefore, to provide optimal settling, low-power dissipation,
and good spurious performance, one has not only to switch the
value of the loop bandwidth but also to bypass (some) RC sections of the loop filter. The PLL architecture presented here
complies with these requirements.

IV. ADAPTIVE PLL ARCHITECTURE


A. Basic Architecture
The basic idea is to have two loops working in parallel, as
depicted in Fig. 9. Loop 1, built around PFD1 and CP1, is dimensioned for in-lock operation. Loop 2, built around PFD2,
DZ, and CP2, is dimensioned for fast settling time. Loop 1 operates all the time, whereas Loop 2 is only active during tuning

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

actions. Loop 1 and Loop 2 share the crystal oscillator, the reference divider, and the main divider.
A smooth takeover from Loop 1, after a frequency jump,
avoids secondary glitch effects. The high-current charge
pump CP2 is only active during tuning. CP2 is controlled by the
dead-zone (DZ) block. DZ generates a smooth transition into a
well-defined dead zone for CP2 when lock is achieved, so that
sudden disturbances of the VCO tuning voltage are avoided.
Additional freedom for optimization of the loop parameters
is obtained by using two separate charge-pump outputs and by
applying the charge-pump currents to different nodes of the loop
filter. In this way, the location of the zeros for frequency jumps
and in-lock can be set in a continuous way, without switching
of loop componentswhich is a source of secondary glitch
problems. Furthermore, the path from Icpl to Vtune may contain additional filtering sections for, e.g., attenuation of spurious
signals and/or fractional- quantization noise [14]. These filter
sections may be bypassed by Icph to increase the phase margin
in high-bandwidth mode.
B. Loop-Filter Implementation
The ideas described above are demonstrated with the help
of Figs. 10 and 11. Fig. 10 presents the loop-filter configuration and component values used in the global tuner IC (Fig. 1).
Fig. 11 shows the optimized Bode diagrams of the adaptive PLL
(in FM mode) with the loop filter of Fig. 10.
During frequency jumps both CP1 and CP2 are active; the
loop filter zero frequency is 1/2 RbCa and lies at a high frequency, matching the 0-dB open-loop frequency. It enables stability and fast tuning to be achieved. The nominal loop bandwidth in this mode is 3.2 kHz, and the phase margin is 50 . After
the frequency jump only CP1 is active. The zero of the loop filter
moves to a lower frequency (1/2 Ra Rb Ca), without the
switching of loop-filter components. The low-frequency zero increases the phase margin in-lock.
When the loop is in-lock, an extra pole is introduced
(1/2 RcCc), which increases the 100-kHz reference suppression by about 20 dB. During frequency jumps, these
elements are bypassed by CP2, increasing the phase margin in
high-bandwidth mode. If the loop bandwidth were increased
by simply switching the amplitude of CP1, one would end up
with an unstable loop, because of a phase margin of less than
10 in high-bandwidth mode.
C. Dead-Zone Implementation
The new element in the adaptive PLL architecture is the combination of the DZ block with the high-current charge pump
CP2. The function of DZ is to provide CP2 with a well-des. The dead zone is centered symmetfined dead zone of
rically around the locking position of charge pump CP1 [see
Fig. 13(a)].
The logic diagram of the DZ/CP2 combination is depicted in
Fig. 12. The figure shows how the different logic functions influence the duty cycle of the up and dn signals from the phase
frequency detector (PFD2). At the input of DZ, the up and dn
signals have a finite duty cycle, even for an in-lock situation
. The finite duty cycle eliminates dead-zone problems
in CP1. The XOR and AND gates are used to cancel the finite

497

(a)

(b)

(c)
Fig. 13.

Shift in locking position as function of VCO tuning voltage.

in-lock duty cycle. The processed up and dn signals are then applied to low-pass filters and slicers, whose function is to prevent
pulses that have too small a duty cycle from reaching CP2. The
cutoff frequency of the low-pass filters, the discrimination level
of the slicers, and the turn-on time of CP2 determine the size of
s.
the dead zone around the lock position
A tradeoff among settling performance, circuit implementation, and robustness arises, when the magnitude of the dead zone
has to be determined. Let us start discussing circuit aspects.
The dead zone of charge pump CP2 should be centered
around the locking position of the loop for optimum settling and
spectral purity performance. The locking position, however, is
a function of the output voltage of charge pump CP1. The effect
is depicted in Fig. 13. One sees that, as the tuning voltage Vtune
increases, there is a shift of the locking position to positive
. The reason lies in the finite output resistance
values of
of the active element used in CP1. Different current gains in
CP1's UP and DOWN branches need to be compensated by up
and dn signals with different duty cycles at the locking point.
Different duty cycles are accomplished by a shift in the loop's
locking position.
Fig. 13 shows situations where the gain in the UP branch of
the pump decreases as Vtune increases. The ideal operating situation is depicted in Fig. 13(a). Situation (b) is still allowed from
the point of view of spectral purity but has asymmetrical settling
performance. Finally, (c) depicts a situation that should never
happen: the locking position shifts so much that the high-current charge pump CP2 becomes active and degrades the in-lock
spectral purity. Therefore, increasing the size of CP2's dead zone
s) eases the design of charge pump CP1 and increases the
(
robustness of the system.
On the other hand, the size of CP2's dead zone influences the
settling performance of the adaptive loop. The influence of
on the transient response was simulated with behavioral models.
The results are displayed in Fig. 14, together with the settling
requirements that ensure inaudible background scanning functionality. Table II presents the settling time for different settling
. A dead-zone value of
accuracies and different values of
infinity corresponds to the situation where only CP1 is active

498

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

Fig. 14.

Detail of settling transient for different values of  .

TABLE II
SIMULATED IN-LOCK SNR AND SETTLING TIME (ms) FOR A 20-MHz
FREQUENCY JUMP FOR DIFFERENT VALUES OF THE DEAD ZONE AND
DIFFERENT SETTLING ACCURACIES

(nonadaptive loop). Table II shows that by using the adaptive


loop architecture, it is possible to combine fast settling time with
leaves more residual phase
good SNR in-lock. Increasing
(and frequency) error to be corrected by the small bandwidth
loop. The closer one comes to the locking point in high bandwidth mode, the shorter the total settling transient will be. A
dead-zone value of 15 ns is a good compromise for the intended application.

Fig. 15.

Micrograph of the tuner IC.

V. CIRCUIT IMPLEMENTATION
A die micrograph of the total tuner IC is displayed in Fig. 15.
The adaptive PLL has been integrated with the other functional
blocks of Fig. 1 in a 5-GHz, 2- m bipolar technology [15].
Fig. 16. Architecture of the main programmable divider.

A. Programmable Dividers
The architecture of the main divider is depicted in Fig. 16.
The high-frequency part of the programmable divider is based
on the programmable prescaler concept described in [12] and
consists of a chain of 2/3 divider cells. The modular architecture
enables easy optimization of power dissipation and robustness
for process variations. The division range of the basic prescaler
configuration is extended by the low-frequency programmable
counter. The logic functions of the PLL were implemented with

current routing logic techniques (CRL) [12], [16]. The low-frequency part of the main and reference dividers operate with low
current levels to limit total power dissipation. To decrease the
phase noise of the reference signal going to the phase detectors,
this signal is reclocked in a high-current D-flip-flop (D-FF). The
clean crystal signal is used to clock the D-FF. The total main divider current consumption is 5 mA. The first 2/3 cell consumes
2.1 mA.

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

Fig. 17.

Simplified circuit diagram of charge pump CP1.

Fig. 18.

CP1 and CP2 charge-pump currents as a function of

499

1t .

B. Oscillators
The LC VCO uses an external tank circuit. It can be tuned
from 150 to 250 MHz, with a voltage tuning range from 0.5
to 8 V. The VCO phase noise is 100 dBc/Hz at 10 kHz, for
a carrier frequency of 237 MHz. The VCO core consumes
1.5 mA. The 20.5-MHz reference crystal oscillator operates
in linear mode, to avoid harmonics interfering in the FM
reception bands. Quadrature generation for the image rejection
FM mixers (see Fig. 1) is accomplished in a divider-by-two
(FM DIV), with the exception of reception in the American
Weather Band (WX). In that case, I/Q signals are generated
with a RC-CR network directly from the VCO. This avoids the
need to have the VCO operating at 346 MHz, and a change in
the LC VCO tuned circuit during WX reception.
C. Charge Pumps
Fig. 17 shows the simplified circuit diagram of the low-current charge pump CP1. The up and dn signals from the phase

Fig. 19.

Settling transient for a 20-MHz tuning step.

500

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

(a)

(b)
Fig. 20.

Spectral purity measurements in FM mode: (a) reference spurious breakthrough and (b) close to the carrier.

detector drive the input differential pairs, which set the currents
in the PNP current switches Q1 and Q2 on and off. The collector
outputs of Q1 and Q2 are kept at equal dc levels by the dc feed-

back arrangement provided by Q3 and Q4. This prevents asymmetry in the source and sink currents, ensuring good centring of
the charge-pump characteristics for all tuning voltages. Q5 and

VAUCHER: ADAPTIVE PLL TUNING SYSTEM ARCHITECTURE

501

Fig. 21. Evaluation of the FM channelVCO purity determines SNR for V


kHz; 26 dB = 2:0 V. THD meas.: FMdev = 75 kHz.

> 300 V. Fin = 97:1 MHz, AF freq = 1 kHz. SNR meas.: FMdev = 22:5

Q6 provide means for stabilization of currents and for speeding


up the switching of Q1 and Q2. The reset circuits monitor the
currents in Q1 and Q2 and generate the reset signals RST Up
and RST Dn. These signals are fed back to reset the phase detectors. The high-current charge pump CP2 is a scaled-up version of the CP1 circuit, without the reset circuits.

VII. CONCLUSION

VI. MEASUREMENTS
The measured charge-pump currents as a function of the
time difference between the phase detector inputs are shown
in Fig. 18. Good centering of the two charge-pump outputs
is observed, and there is enough margin for variations in
the in-lock position of CP1. The measured settling transient
response is displayed in Fig. 19. The settling performance
complies to the settling requirements and enables inaudible
background scanning in single-tuner RDS applications.
The frequency spectrum of the VCO in FM mode is presented
in Fig. 20(a) and (b). Fig. 20(a) shows the spurious reference
breakthrough at 100 kHz to be under 81 dBc. There is yet a
6-dB improvement in noise and spurious breakthrough before
the VCO signal reaches the FM mixers, due to the division by
two in the FM DIV divider (see Fig. 1). Fig. 20(b) displays the
phase noise spectrum close to the carrier. Spectrum measurements done in AM mode showed a reference spurious breakthrough of 57 dBc, at an offset of 20 kHz from the carrier. For
AM, the improvement in phase noise and spurious performance
amounts to 26 dB, due to the division by 20 in between the VCO
and the AM mixers.
Finally, the SNR and THD of the total FM receiver chain are
displayed in Fig. 21 as a function of the antenna input signal
. For low values of
, the noise is dominated by RF
level
input noise and by the quality of the building blocks in the signal
processing chain: low-noise amplifier, mixers, and demodulator.
( 300 V), the dominant noise source
For high values of
becomes the LO signal. The excellent measured FM sensitivity,
2.0 V for 26-dB SNR, and the ultimate SNR of 65 dB verify
the spectrum purity of the tuning system and of the RF channel.

This paper described an adaptive PLL architecture for


high-performance tuning systems. The relationships of performance aspects to design variables were presented. It is
demonstrated that design for spectral purity performance
often leads to suboptimal settling performance, because of
different requirements on the loop bandwidth and on the
location of the zeros and poles of the closed-loop transfer
function. The adaptive architecture described here resolves
these contradictory requirements, without the necessity of
switching circuit elements in the loop filter. The adaptation of
loop bandwidth occurs continuously, as a function of the phase
error in the loop, and without interaction from outside of the
tuning system. During frequency jumps, high bandwidth and
high phase margin are obtained by bypassing filter sections.
When the loop is locked, the architecture allows heavy filtering
of spurious signals. The implementation of the dead-zone
block was presented, and the basic tradeoffs of the concept
were discussed. The adaptive PLL was optimized for use in a
multiband (global) car-radio tuner IC, which features inaudible
background scanning. Design and architecture of the PLL
building blocks were discussed, and measurement results were
presented. The integrated adaptive PLL tuning system achieved
state-of-the-art settling and spectral purity performance in its
class (integer- PLLs). It fulfills simultaneously the speed
requirements for inaudible frequency hopping and the heavy
SNR specification of 64 dB.
ACKNOWLEDGMENT
The author wishes to thank D. Kasperkovitz for technical support during the project, K. Kianush for his tireless disposition
in bringing the car-radio project to a successful end, H. Vereijken for the optimization and layout of the synthesizer building
blocks, B. Egelmeers for the implementation and evaluation
of the concept in a bread-board functional model, and G. van
Werven for the measurements.

502

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 4, APRIL 2000

REFERENCES
[1] B. Razavi, A 900 MHz/1.8 GHz CMOS transmitter for dual-band applications, IEEE J. Solid-State Circuits, vol. 34, pp. 573579, May 1999.
[2] K. Kianush and C. S. Vaucher, A global car radio IC with inaudible
signal quality checks, in IEEE Int. Solid-State Circuits Conf. Dig. Tech.
Papers, 1998, pp. 130131.
[3] W. P. Robins, Phase Noise in Signal Sources, 2nd ed, ser. 9. London,
U.K.: Inst. Elect. Eng., 1996.
[4] H. Adachi, H. Kosugi, T. Awano, and K. Nakabe, High-speed frephase-locked loop,
quency-switching synthesizer using fractional
IEICE Trans. Electron., pt. 2, vol. 77, no. 4, pp. 2028, 1994.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New
York: Wiley, 1997.
[6] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans.
Commun., vol. 28, no. 11, pp. 18491858, Nov. 1980.
[7] H. Meyr and G. Ascheid, Synchronization in Digital Communications. New York: Wiley, 1990.
[8] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[9] B. Razavi, Ed., Monolithic Phase-Locked Loops and Clock Recovery
Circuits. New York: IEEE Press, 1996.
[10] V. F. Kroupa, Noise properties of PLL systems, IEEE Trans.
Commun., vol. C-30, pp. 22442552, Oct. 1982.
[11] C. S. Vaucher, Synthesizer architectures, in Analog Circuit Design, R.
J. van de Plassche, Ed. Norwell, MA: Kluwer, 1997.

[12] C. Vaucher and D. Kasperkovitz, A wide-band tuning system for fully


integrated satellite receivers, IEEE J. Solid-State Circuits, vol. 33, no.
7, pp. 987998, July 1998.
[13] K. Nagaraj, Adaptive charge pump for phase-locked loops, U.S. Patent
5 208 546, 1993.
[14] B. Miller and B. Conley, A multi-modulator fractional divider, in Proc.
IEEE 44th Annu. Symp. Frequency Control, 1990, pp. 559567.
[15] Philips Semiconductors, TEA6840H global car-radio tuner datasheet,
1999.
[16] W. G. Kasperkovitz, Digital shift register, U.S. Patent 5 113 419, 1992.

Cicero S. Vaucher (M98) was born in So Francisco de Assis, Brazil, in 1968. He graduated in electrical engineering from the Universidade Federal do
Rio Grande do Sul, Porto Alegre, Brazil, in 1989.
He joined the Integrated Transceivers group
of Philips Research Laboratories, Eindhoven,
The Netherlands, in 1990, where he works on
implementations of low-power building blocks for
frequency synthesizers, on synthesizer architectures
for low-noise/high-tuning-speed applications, and
on CAD modeling of PLL synthesizers.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

1445

Fast-Switching Frequency Synthesizer with a


Discriminator-Aided Phase Detector
Ching-Yuan Yang, Student Member, IEEE, and Shen-Iuan Liu, Member, IEEE

AbstractA phase-locked loop (PLL) with a fast-locked


discriminator-aided phase detector (DAPD) is presented. Compared with the conventional phase detector (PD), the proposed
fast-locked PD reduces the PLL pull-in time and enhances the
switching speed, while maintaining better noise bandwidth. The
synthesizer has been implemented in a 0.35- m CMOS process,
and the output phase noise is 99 dBc/Hz at 100-kHz offset.
Under the supply voltage of 3.3 V, its power consumption is 120
mW.

II. BASIC IDEA AND MODEL


A simple charge-pump PLL consists of four major blocks: the
phase detector (PD), the charge-pump circuit, the loop filter, and
the voltage-controlled oscillator (VCO) [3][6]. Fig. 2 shows
the linear model of a charge-pump PLL-based frequency synthesizer. The closed-loop transfer function can be represented
as

Index TermsBandwidth adjusting, fast acquisition, fast


locking, frequency synthesizers, phase detectors, phase-locked
loops.

I. INTRODUCTION

HASE-LOCKED loop (PLL) circuits have been found to


be useful wherever there is a need to synchronize a local oscillator with an independent incoming signal, such as serial data
links and RF wireless communications. In order to optimize the
loop performance, some features should be taken care of [1], [2].
First, to minimize output phase jitter due to external noise, the
loop bandwidth should be made as narrow as possible. Second,
to minimize output jitter due to internal oscillator noise, or to obtain best tracking and acquisition properties, the loop bandwidth
should be made as wide as possible. These principles obviously
oppose each other; and therefore some compromises between
these two principles are always inevitable. The block diagram
of a PLL with a discriminator-aided phase detector (DAPD) is
shown in Fig. 1. One could leave the discriminator connected
permanently and/or merely weight the relative contributions of
the system so as to obtain the desired damping. The discriminator-aided path adds to lock the PLL quickly. Once the PLL
is in lock, a better bandwidth can be maintained while the discriminator is disconnected.
In this paper, a novel DAPD is presented to reduce pull-in
and to enhance the switching speed of the PLL, while
time
maintaining the same noise bandwidth and avoiding modulation damping. Section II describes the basic concept of the proposed structure. Sections III and IV present the realization and
the measurement of the system, respectively, and Section V concludes the paper.
Manuscript received November 30, 1999; revised April 28, 2000. This
work was sponsored by the National Science Council under Contract
88-2219-E-002-024.
C.-Y. Yang was with the Department of Electrical Engineering, National
Taiwan University, Taipei, Taiwan 10617, R. O. C. He is now with the
Department of Electronic Engineering, HuaFan University, Taipei, Taiwan
223, R.O.C.
S.-I. Liu is with the Department of Electrical Engineering, National Taiwan
University, Taipei, Taiwan 10617, R. O. C.
Publisher Item Identifier S 0018-9200(00)08697-2.

(1)
The conventional PD is implemented in conjunction with a
charge-pump loop filter in the PLL, as illustrated in Fig. 3.
To determine the transfer function of the PD, assume there is
and
in the
a time interval between two input signals
PD, the output current of the charge-pump circuit is a pulse of
duration , and the amplitude of the charge-pump current is
. In the continuous-time approximation, the average value
per input signal period can be given as
(2)
The transfer function curve of a linear PD is shown in Fig. 4(a),
where the vertical axis represents the charge injected into the
loop filter during one period of the input signal. The characteristic of a nonlinear PD, as shown in Fig. 4(b), can be divided
into two regions [7]. It has the same characteristic within the
locked-in region as that of the linear PD, but the acquisition
time will be reduced with the steeper characteristic outside the
lock-in region. When designing a PLL with the nonlinear PD,
first the central slope is determined to fulfill the requirement of
noise and modulation for the PLL with a standard PD. Then,
is gradually increased to improve acthe slope near
quisition speed. The proposed nonlinear PD can be built with
delay cells and standard PD circuits, as shown in Fig. 4(c). The
standard PD is a digital circuit, triggered by the positive edge
of the input reference signal and the output feedback signal
. Considering the delay cells with
, the PDs decide the
among these regions. Acposition of the phase difference
cording to the value of , the charge pump will output the corresponding current controlled by the up signals or the down
signals . The behavior model of the nonlinear PD can be explained by the waveforms of Fig. 4(d). According to the time
difference between both input signals and , the up signals
are used to increase and the down signals are used to decrease the frequency of signal . The nonlinear PD always generates the right signal to equalize the frequency of both input
signals as the conventional PD. The time interval is positive

00189200/00$10.00 2000 IEEE

1446

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

Fig. 1.

Block diagram of PLL with discriminator-aided phase acquisition.

Fig. 2.

Linear model of PLL frequency synthesizer.

and a resistor
with a capacitor
added in parallel as shown
in Fig. 2. The impedance of this filter can be
(4)

and
with
open-loop gain of the PLL equals

. The

(5)
Fig. 3. Phase detector with charge-pump filter.

which has a crossover frequency of

(negative) where leads (lags) . When is larger than


,
may appear high level; when is smaller
may appear high level. As the nonlinear PD
than
is applied, two cases can occur during different time interval :
, the injected charge
.
Case 1:
, the injected charge
Case 2:
, which can be approxias
, is very small.
mated
Generally, the total transfer function of the PD with the current-pump circuit and the loop filter can be expressed as [8]

(6)
The open-loop gain of this third-order PLL can be calculated in
terms of the frequency , as follows:

(7)
and its phase margin can be determined in terms of

(3)
(8)
is the pump current of the charge-pump circuit. The
where
is the series connection of a capacitor
impedance

In order to maintain the same loop gain and phase margin for

YANG AND LIU: FAST-SWITCHING FREQUENCY SYNTHESIZER

1447

Fig. 4. (a) Characteristic of the conventional linear phase detector. (b) Characteristic of the nonlinear phase detector. (c) Block diagram of the nonlinear phase
detector. (d) Operation of the nonlinear phase detector.

Fig. 5.

Block diagram of the frequency synthesizer.

the sake of stability, the charge-pump current becomes


instead of
outside the locked-in region, and the loop-filter
instead of
while increases
resistor would become
times, i.e., the loop bandwidth increases times. It may speed
up the switching capability of the PLL. Once it is locked on
the correct frequency, the PLL will then return to the low-noise
operation.

III. CIRCUIT REALIZATION


A. Architecture
The designed frequency synthesizer integrates the proposed
DAPD, the charge-pump circuit, a prescaler, and a VCO in a
single CMOS chip. It is similar to the structure of a conventional
integer-N frequency synthesizer, as shown in Fig. 5. By adding

1448

Fig. 6.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

Schematic of phase detector with DAPD and charge-pump filter.

the frequency-doubling block, the output frequency can be up


to 900 MHz from a 450-MHz VCO.
B. Phase Detector with DAPD and Charge-Pump Filter
A schematic diagram of the DAPD is shown in Fig. 6. The phase
frequency detectors are used to compare the phase difference of
) of the DAPD deboth input signals. The output signal (
pends on the phase difference of both input signals whether it is
or not. Considering the delay cells with delay
larger than
, which is very small but never negligible, the DAPD decides
the operating bandwidth of the loop filter. When leads , the
, and
is lowand
is
time difference islarger than
is smaller than
, and
high. Otherwise, when lags
is high and
is low. In a word, if the absolute value
,
of the time difference between input signals is larger than
may appear high level. Also, the charge-pump current beandtheresistoroftheloopfilterbecomes
,i.e.,
comes
. Until the absolute value of is within
and
are both high, thus
is brought to low level, then
and
, rethe charge-pump current and the resistor return to
spectively, with a narrower bandwidth for better noise rejection.
However, the delay cell is adopted according to the VCOs noise.
,
Assuming that the phase characteristic of the signal is
should be larger than
to make the DAPD work.
In our design, the loop bandwidth of the PLL equals about
krad/s, and the loop gain zero
and the loop pole

are placed on a factor of four below and above ,


of 560 A is applied
respectively. In addition, a pump current
is chosen. The values of the resistors
and the parameter
and
and the capacitors
and
are 470 , 235 , 33 nF,
and 2.2 nF, respectively. The open-loop gain response is depicted
in Fig. 7. Curve (a) is the characteristic of the PLL with the DAPD
while the bandwidth is 120 kHz. However, the PLL will return
curve (b) with the bandwidth of 40 kHz when it is near in lock.
These curves give the same phase margin of approximately 60 .
Thus the PLL would be usually stable.
Currently, most frequency synthesizers use phase-frequency
detectors (PFDs) as their PDs. A PFD is a sequential circuit which
can not only detect the phase error but also provides a frequencysensitive signal to aid acquisition when the loop is out of lock.
The drawback of some conventional PFDs is a dead zone in the
phase characteristic, which generates the phase error in the output
signals. To solve this problem, a dynamic CMOS PFD is adopted
as shown in Fig. 8(b), which is similar to the one proposed in
[10]. The PFD consists of two half-transparent registers, shown
in Fig. 8(a), [9] and a NAND gate. It is triggered by the negative
edge of input signals. The timing diagram of the PFD is shown in
Fig. 8(c). Even though the input signals are in-phase, the glitches
caused by the reset path always exist. So, extra filters are added
in the DAPD to remove the effect of the glitches.
So far, the positive gain of the VCO is applied from the above
discussion. However, since the gain of the VCO is negative as

YANG AND LIU: FAST-SWITCHING FREQUENCY SYNTHESIZER

1449

Fig. 7. Simulated open-loop gain Bode plot.

(a)

(b)

(c)
Fig. 8. Implementation of phase-frequency detector. (a) Half-transparent cell. (b) Phase-frequency detector (PFD). (c) Timing diagram of PFD.

described later,
and
of the PD connected to the charge
pump should be interchanged. The charge pump, which is based
on one described in [11], is adopted. It suppresses the charge
sharing from the parasitic capacitance by a pair of switchedcurrent sources.
C. Dual-Modulus Prescaler
The dual-modulus prescaler is the high-frequency building
block in the frequency synthesizer. This circuit shown in Fig. 9
divides the frequency of the VCO output signal by a factor of
32 or 33 depending on the logic value of the controlled signal

mode [12][15]. It consists of a synchronous divide-by-4/5


counter as the first stage and an asynchronous divide-by-8
counter as the second stage. The circuits in the first stage
are fully differential, while the single-ended logic circuits
are used in the second stage. To reduce the supply noise, an
emitter-coupled logic (ECL)-like differential logic is used in
the high-speed stage [16]. In the divide-by-4/5 circuit, the
DFF is a differential flip-flop. Fig. 10 shows the schematic
diagram of a NAND-gate logic flip-flop. Merging the logic gates
to a flip-flop saves power and increases the operating speed.
The toggle flip-flops are made by true single-phase clocking
(TSPC) DFFs of [12] behind a differential-to-single buffer.

1450

Fig. 9.

Fig. 10.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

Fig. 11.

VCO schematic.

Fig. 12.

IC microphotograph.

Fig. 13.

Experimentally measured VCO transfer curve.

Functional block diagram of the dual-modulus prescaler.

Schematic of the differential NAND-gate flip-flop.

This buffer is used to achieve the rail-to-rail output signal in


the low-speed stage.
D. VCO
The VCO is another high-frequency building block in a frequency synthesizer. Still, an ECL-like current-mode differential
pair, as shown in Fig. 11, is used as a delay cell [17], [18] to
achieve high common-mode rejection in a four-stage ring oscillator. The coarse tuning of the ring-oscillators center frequency
is achieved by the bias Vbpo1 (or through the use of a digital-to-analog converter), and a fine tuning technique is needed
for the PLL voltage-control path. The gain required for the oscillator is easily determined by the ratio of M1 and M2 as the
current gain. The proposed delay cell has the better noise performance because the operation of the circuit is carried out by
the differential signal immune to the power-supply-injected and
substrate-injected noise sources. The replica bias circuit adjusts
the load over a wide range in response to a swept supply current. It insures the output swing of delay cells maintain fixed
and takes a changeable bias current to cover a suitable range of
different output frequencies. Bypass capacitors are also an important consideration for the replica bias and voltage reference
circuits. On-chip bypass capacitors can be used to help reduce
their noise contribution to the ring-oscillator delay cells.

IV. MEASUREMENT RESULTS


The synthesizer is implemented in a 0.35- m CMOS process.
The microphotograph of the fabricated frequency synthesizer
is shown in Fig. 12. The loop filter is off-chip, and the output
signal of the VCO is connected to a source follower. The frequency synthesizer is measured at a supply voltage of 3.3 V.
The frequency of the reference signal is 14 MHz. Fig. 13 shows
the measured VCO transfer function by varying the controlled
voltage. The measured VCO has a monotonic frequency range
of 435485 MHz. The gain of the VCO is 32.4 MHz/V at the
center frequency of 460 MHz. Fig. 14 shows the output signal
spectrum (using HP8560A Spectrum Analyzer after locked) of
448 MHz with the phase noise 99 dBc/Hz at 100-kHz offset.
By adding an external frequency doubler, however, the phase
noise is 91 dBc/Hz at 100-kHz offset from 896-MHz carrier
as shown in Fig. 15. Also, the measured waveform in the time
domain is also shown in Fig. 16, and its rms and peak-to-peak
jitter measured by CSA803 (Communication Signal Analyzer)

YANG AND LIU: FAST-SWITCHING FREQUENCY SYNTHESIZER

1451

Fig. 14. Measured output spectrum of the frequency synthesizer. (a) With span
50 MHz. (b) With span 1 MHz.

Fig. 16.

Measured waveform. (a) In time domain. (b) Jitter performance.

Fig. 17.

Measured frequency jump waveform of the frequency synthesizer.

Fig. 15. Measured output spectrum of the frequency synthesizer with added
frequency doubler.

are 18.9 and 110 ps, respectively. Another important parameter


is the time that the PLL takes to lock in to a new frequency when
channel switches. Fig. 17 shows the switching waveforms for
a frequency jump from 448 to 462 MHz from the HP53310A
Modulation Domain Analyzer. Obviously, the DAPD can improve the switching speed of the PLL. The power consumption
is 120 mW and the chip area is 40 2.0 mm including the pad
areas.

1452

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000

V. CONCLUSION
In this paper, a PLL with the DAPD is implemented in a
0.35- m CMOS process. The proposed DAPD can be applied
to enhance the switching speed of the PLL, but maintain better
noise bandwidth. When adding the DAPD in the PLL, it will
control the charge pump and loop filter and still maintain
the loop stablity with the same phase margin as in the steady
state. The prototype frequency synthesizer using this structure
is also implement at 448 MHz, and the output waveform
is 99 dBc/Hz at 100-kHz offset. By adding a frequency
doubler, the synthesizer can operate at 896 MHz, and the output
waveform is 91 dBc/Hz at 100-kHz offset from carrier.
ACKNOWLEDGMENT
The authors would like to thank the SHARP Technology
Company, Japan, for the fabrication of the chip.
REFERENCES
[1] F. M. Gardner, Phaselock Techniques, 2nd ed. New York, NY: Wiley,
1979.
[2] P. Larsson, Reduced pull-in time of phase-locked loops using a simple
nonlinear phase detector, IEE Proc. Commun., vol. 142, no. 4, pp.
221226, Aug. 1995.
[3] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs,
NJ: Prentice-Hall, 1991.
[4] F. M. Gardner, Charge-pump phase-locked loops, IEEE Trans.
Commun., vol. COM-28, pp. 18491858, Nov. 1980.
[5] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York, NY: McGraw-Hill, 1984.
[6] M. V. Paemel, Analysis of a charge-pump PLL: A new model, IEEE
Trans. Commun., vol. 42, pp. 24902498, July 1994.
[7] C. Y. Yang, W. C. Chung, and S. I. Liu, Effectively reduced pull-in
time of PLL with nonlinear phase comparator, in 8th VLSI/CAD Symp.,
Taiwan, R.O.C., Aug. 1997, pp. 205208.
[8] D. Byrd, C. Davis, and W. O. Keese, A fast locking scheme for PLL
frequency synthesizer, National Semiconductor, Santa Clara, CA, Application Note, July 1995.
[9] J. Yuan and C. Svensson, Fast CMOS nonbinary divider and counter,
Electron. Lett., vol. 29, pp. 12221223, June 1993.
[10] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi, and H. K. Kim, A 960Mb/s/pin interface for skew-tolerant bus using low jitter PLL, IEEE J.
Solid-State Circuits, vol. 32, pp. 691700, May 1997.
[11] I. A. Young, J. K. Greason, and K. L. Wong, A PLL clock generator with
5- to 110-MHz of lock range for microprocessors, IEEE J. Solid-State
Circuits, vol. 27, pp. 15991607, Nov. 1992.

[12] Q. Huang and R. Rogenmoser, Speed optimization of edge-triggered


CMOS circuits for gigahertz single-phase clocks, IEEE J. Solid-State
Circuits, vol. 31, pp. 456465, Mar. 1996.
[13] B. Chang, J. Park, and W. Kim, A 1.2- GHz CMOS dual-modulus
prescaler using new dynamic D-type flip-flop, IEEE J. Solid-State Circuits, vol. 31, pp. 749752, May 1996.
[14] P. Larsson, High-speed architecture for a programmable frequency divider and a dual-modulus prescaler, IEEE J. Solid-State Circuits, vol.
31, pp. 744748, May 1996.
[15] C. Y. Yang, G. K. Dehng, J. M. Hsu, and S. I. Liu, New dynamic
flip-flops for high-speed dual-modulus prescaler, IEEE J. Solid-State
Circuits, vol. 33, pp. 15681571, Oct. 1998.
[16] F. Piazza and Q. Huang, A low-power CMOS dual-modulus prescaler
for frequency synthesizer, IEICE Trans. Electron., vol. E80-C, pp.
314319, Feb. 1997.
[17] S. J. Lee, B. Kim, and K. Lee, A fully integrated low-noise 1-GHz
frequency synthesizer design for mobile communication application,
IEEE J. Solid-State Circuits, vol. 32, pp. 760765, May 1997.
[18] D. Y. Jeong, S. H. Chae, W. C. Song, and G. H. Cho, High-speed differential-voltage clamped current-mode ring oscillator, Electron. Lett.,
vol. 33, pp. 11021103, June 1997.

Ching-Yuan Yang (S97) was born in Miaoli,


Taiwan, R.O.C., in 1967. He received the B.S.
degree in electrical engineering from the Tatung
Institute of Technology, Taipei, Taiwan, in 1990, and
the M.S. and Ph.D. degrees in electrical engineering
from National Taiwan University, Taipei, in 1996
and 2000, respectively.
He is currently and Assistant Professor with the
Department of Electronic Engineering, Huafan University, Taiwan. His research interests are in the area
of integrated circuits and systems for high-speed interfaces and wireless communications.

Shen-Iuan Liu (S88M93) was born in Keelung,


Taiwan, R.O.C., on April 4, 1965. He received the
B.S. and Ph.D. degrees in electrical engineering from
National Taiwan University, Taipei, Taiwan, in 1987
and 1991, respectively.
During 1991 to 1993, he served as a Second
Lieutenant in the Chinese Air Force. During 1991 to
1994, he was an Associate Professor in the Department of Electronic Engineering, National Taiwan
Institute of Technology. He joined the Department of
Electrical Engineering, National Taiwan University,
Taipei, Taiwan in 1994 and has been a Professor since 1998. He holds nine
U.S. patents and fourteen R.O.C. patents, with some pending. His research
interests are in analog and digital integrated circuits and systems.

2232

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Low-Power Dividerless Frequency


Synthesis Using Aperture Phase Detection
Arvin R. Shahani, Derek K. Shaeffer, Student Member, IEEE, S. S. Mohan, Student Member, IEEE,
Hirad Samavati, Student Member, IEEE, Hamid R. Rategh, Student Member, IEEE,
Maria del Mar Hershenson, Student Member, IEEE, Min Xu, Student Member, IEEE,
C. Patrick Yue, Student Member, IEEE, Daniel J. Eddleman, Student Member, IEEE,
Mark A. Horowitz, Senior Member, IEEE, and Thomas H. Lee, Member, IEEE

AbstractA phase-locked-loop (PLL)-based frequency synthesizer incorporating a phase detector that operates on a windowing
technique eliminates the need for a frequency divider. This new
loop architecture is applied to generate the 1.573-GHz local
oscillator (LO) for a Global Positioning System receiver. The
LO circuits in the locked mode consume only 36 mW of the
total 115-mW receiver power, as a result of the power saved by
eliminating the divider. The PLLs loop bandwidth is measured
to be 6 MHz, with a reference spurious level of 047 dBc. The
front-end receiver, including the synthesizer, is fabricated in a
0.5-m, triple-metal, single-poly CMOS process and operates on
a 2.5-V supply.
Index TermsFrequency synthesizers, Global Positioning System, phase detection, phase-locked loops, radio-frequency integrated circuits, radio receivers.
Fig. 1. GPS receiver architecture.

I. INTRODUCTION

HE growing demand for portable, low-cost wirelesscommunication devices has spurred interest in radiofrequency integrated circuits. Part of offering a completely
integrated solution involves identifying a low-power, monolithic gigahertz local oscillator (LO) implementation. A quartzcrystal-based oscillator cannot be used directly for the LO,
since the fundamental modes of inexpensive quartz crystals
are limited to approximately 30 MHz [1], and overtone orders
of 50 are impractical. However, a crystal oscillator can be
used as the reference in a static-modulus phase-locked-loop
(PLL) frequency synthesizer. As is well known, the stability of
the frequency-multiplied reference is retained by a wideband
loop. This ability to synthesize a stable high-frequency source
is beneficial, but it comes at the expense of significant power
consumption. This paper addresses the power issue by introducing a new type of phase detector capable of phaselocking
the synthesizers frequency-multiplied output to its reference
input, without the use of a divider. Eliminating the need for
the divider allows the synthesis of a 1.573-GHz output on only
36 mW of power in this technology.
Section II examines the PLL-based LO used for the Global
Positioning System (GPS) receiver architecture shown in
Fig. 1 [2] and introduces the element that eliminates the need
Manuscript received May 7, 1998; revised August 4, 1998.
The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(98)09432-3.

Fig. 2. Integer-

synthesizer block diagram.

for a divider: the aperture phase detector (APD). Treatment


begins at the architectural level and descends into the APDs
detailed nature. Both the theory and the implementation of
an APD are covered. Section III presents experimental results
on the APD PLL.
II. PLL
A. Architecture
The conventional and widely used implementation of the
PLL frequency synthesizer with static modulus is the integersynthesizer [3]. The traditional divide-by- block shown
in Fig. 2 can be realized with a single counter. However,
there are two drawbacks associated with the divider: power
consumption and switching noise. Power consumption is large,
particularly at high frequencies, because of the well-known
relationship. For example, a recently published 1.6-

00189200/98$10.00 1998 IEEE

SHAHANI et al.: FREQUENCY SYNTHESIS USING APD

2233

(a)

Fig. 4. APD synthesizer block diagram.

(b)

(c)

Fig. 5. Idealized APD state diagram.

(d)
Fig. 3. Phaselock techniques. (a) Phaselocked signals. (b) Phaselock with a
divider and PFD. (c) PFD along; negative charge pump current commands
the VCO to decrease its frequency, breaking phaselock. (d) Phaselock with
an APD.

GHz integer- synthesizer built in a 0.6- m CMOS technology reported a total power consumption of 90 mW, of which
22.5 mW were used by the divider [4]. A further disadvantage
of the divider is the on-chip interference generated by its highspeed digital transitions. This is particularly worrisome if the
synthesizer is to be integrated with the front ends sensitive
low-noise amplifier.
To reduce power consumption and high-frequency noise, a
windowing technique that eliminates the divide-by- block
for phase comparisons is investigated here. To appreciate how
windowing may be of benefit, it is worthwhile to revisit
the phenomenon of locking in a conventional PLL. To retain phaselock, it is necessary to align every th rising or
falling edge of the voltage controlled oscillator (VCO) with
a corresponding reference edge. Phaselock is demonstrated in
, where every fourth rising VCO edge
Fig. 3(a) for
lines up with a rising reference edge. A divider with a phasefrequency detector (PFD) accomplishes edge alignment by first
dividing down the VCO by the right multiple so that edge

alignment is unambiguous, as pictured in Fig. 3(b). Because


the PFD compares phase over the entire reference cycle, a
PFD cannot phaselock two inputs at different frequencies. In
fact, it is precisely this property that makes the PFD popular.
Now consider using a PFD without a divider. Clearly, there
would be an edge ambiguity problem, rendering the PFD
quite ineffective, as seen in Fig. 3(c). The reason is that the
PFD responds to every edge of the VCO, evidenced by the
charge pump currents net negative value. This erroneously
commands the VCO to decrease its frequency. However, by restricting the time interval during which phase is examined, one
may eliminate the edge ambiguity, and hence the frequency
divider. The dashed boxes in Fig. 3(d) define the window
during which phase may be compared, even if the two inputs
are of different frequency. The window can be controlled by
the reference time base, since it periodically opens at that
rate. Furthermore, the window need only be wide enough
so that a VCO edge falls within it, which is equivalent to
requiring that the window be active for a time longer than the
instantaneous VCO period. No dividers are thus necessary to
maintain phaselock, and this phase detector, called an APD,
can operate with two inputs that are at different frequencies,
as shown in Fig. 4.
A more substantive description of the APDs operation is
provided in Fig. 5, which illustrates the state diagram for an
idealized APD. When the window opens, the phase detector
becomes active. The -input rising edge sets the (denoting
late) terminal true, and the -input rising edge sets the

2234

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Fig. 7. APD PLL block diagram in lock.

where

is the reference phase and


(3)

is the VCO phase and


is the angular VCO frewhere
quency. The average charge pump current over one reference
cycle can thus be written as

N FAA.

Fig. 6. APD synthesizer block diagram with integer-

(denoting early) terminal true. Subsequent edges of the


-input are ignored until the next window opens. The time
and
signals
difference between the rising edges of the
is proportional to the phase error between the reference phase
and the VCO phase. If is set first, the VCO phase is late;
and conversely, if is set first, the VCO phase is early. When
the window closes, the and terminals are reset (to false).
Fig. 1 shows that some type of frequency acquisition aid
(FAA) is required to bring an APD-based loop initially into
lock. This necessity is a consequence of restricting phase
comparisons to a window, which eliminates the phase detectors ability to perform frequency detection. This issue is
discussed in further detail in Section II-D. For this work, an
external acquisition aid was used for experimental purposes.
An integrated implementation of the acquisition aid, Fig. 6,
uses the traditional divider with PFD to lock the loop and
then powers down the acquisition aid, transferring control to
the low-power APD. An APD can be used once in lock because
the reference is derived from a stable crystal oscillator.

(4)
, giving

When the loop is in lock,

(5)
is the phase-detector gain constant. Note that even
where
though there is no explicit divider in the loop, the VCO phase
in (5), just as in a conventional loop.
is divided by
This model can be used in place of the APD in Fig. 4, and
the other blocks in the same figure can be replaced by their
corresponding linear time-invariant (LTI) models, yielding
the overall system model shown in Fig. 7. Fig. 7 is an LTI
representation of the APD PLL in lock, from which the phase
transfer function is readily found to be
(6)
is the VCO gain constant and
where
filters impedance, expressed in the -domain.

B. Loop Theory
Having provided an overview of APD operation, we now
develop a linearized APD PLL model relating input and output
phase. This model is important for quantitative loop design and
ensures that the synthesized output has the desired stability
and noise performance.
From the description of the late and early APD signals given
in the previous subsection, the average charge pump current
over one reference cycle is given by
(1)
is the magnitude of the charge pump current,
is
where
the time of the first VCO rising edge in the window, is the
is the
time of the reference rising edge in the window, and
angular reference frequency. The current can be expressed
as a function of the reference and VCO phases by relating these
phases to and , assuming small phase errors. Expressions
relating edge time to signal phase are
(2)

C. APD Characteristic (

Versus

is the loop

The derivation in the previous subsection treats the APD


for small phase errors. For completeness, it is instructive to
examine the response of the APD to arbitrary phase errors.
Now, the delay between the time the window opens and the
time at which the reference edge occurs becomes important.
This delay is designated by , which is a positive quantity
, where
is
whose least restrictive range is limited to
the reference period. However, the loop can lock if and only
, where
is the VCO period.
if is in the interval
Otherwise, the first VCO edge within the window will always
precede the reference edge.
From Fig. 8, it is apparent that the characteristic will be
periodic in VCO phase, because when the VCO waveform has
moved one VCO period to the right, the situation is identical
to the start. As the VCO waveform moves to the right, the
varies proportionally with phase error
time difference
. Therefore, to find the APDs characteristic, and need
to be calculated at only two points, and the remainder of the

SHAHANI et al.: FREQUENCY SYNTHESIS USING APD

2235

Fig. 10. APD characteristic for

= (Tv )=2 and

= 4.

Fig. 8. Position of VCO and reference edge in window.

Fig. 11.
Fig. 9. APD characteristic over (2 )=N interval.

= 2;

= 7 subharmonic-lock mode.

phase errors periodicity, with larger values increasing the


periodicity. Fig. 10 shows the complete APD characteristic (a
variation in ) for the specific case where
and
.

characteristic is generated by connecting these endpoints.


and
are first calculated for
(7)
(8)
Next,

and

are calculated at the other extreme where

(9)
(10)
From this information, the portion of the APD characteristic
shown in Fig. 9 can be constructed.
The influence of two parameters , the delay between
the time the window opens and when the reference edge
occurs, and , the ratio between the VCO and reference
frequencieswarrants special attention. Decreasing shifts
the characteristic diagonally up (along the line of the characteristic), and increasing shifts the characteristic diagonally
down. It is desirable to have equal to half the synthesized
frequencys period. By designing for this condition, the APD
to provide a
characteristic will be centered about
affects the
symmetrical correction range. The parameter

D. Subharmonic-Lock Modes
The existence of subharmonic-lock modes explains the need
for an acquisition aid. During each window, which opens
periodically at the reference rate, the APD makes a single
phase comparison. It is this property that allows an APD to
phaselock the VCOs output to an integer multiple of the
reference input. But the ability to examine the phase of two
signals at different frequencies introduces more modes than
just the desired integer-lock modes. Additional subharmoniclock modes occur if the net current delivered over multiple
cycles of the reference is zero, allowing the loop to stay locked
at an undesired frequency [5].
the number of reference cycles over
If we designate by
which the net charge delivered to the loop filter is zero, then
an expression relating the reference frequency to the VCO
. Fig. 11
frequency when phaselock occurs is
displays the points on the APD characteristic between which
, and
the loop ping-pongs for the specific case where
. Because
, the charge pump alternates between
pumping up on one cycle and pumping down on the next cycle,
balancing the charge to the loop filter over two cycles.

2236

Fig. 12.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

APD circuit diagram.

These subharmonic-lock modes are problematic because


they are spaced, in frequency, closer than the neighboring
integer-lock modes. However, the APD favors integer over
subharmonic modes for two reasons. First, the loops bandwidth imposes a limit on . If the number of reference cycles
over which the charge pump current averages to zero grows
too large, the loop will act on partial information because the
loop responds to signals averaged over a loop period. The loop
period is the reciprocal of the closed-loop bandwidth. Another
reason the APD favors integer over subharmonic modes is that
because
the subharmonic modes have a lower detector gain
the VCO edge arrives at a different time in each of the
cycles. If the APD characteristic is nonlinear, then the overall
individual linearized
detector gain is the average of the
detector gains.
Using an FAA to ensure frequency lock eliminates the
concern of locking in a subharmonic mode. Once lock has
been achieved, and control transferred to the APD, the APD
is capable of maintaining lock at the desired frequency.
E. APD Circuit Implementation
Fig. 12 shows a circuit implementation of an APD. The
reference clock (which has about a 50% duty cycle) is shaped
by the structure preceding the delay to have fast falling edges,
since these are the edges that enable the precharged gates.
is off and
is on,
When the reference input is low,
causing the output to be high. After the reference rises,
shuts off before
turns on due to the two inverter delays.
does not fight
to pull the output low, and
Therefore,
creates a fast falling edge. The window opens on this fast
falling edge. The delay between the opening of the window
and the reference edge is determined by two inverters with a
capacitor in the middle. The APD uses two precharged gates
to evaluate the reference and VCO phases. An advantage of
precharged gates is that they only respond once while active.
In this case, the precharged gates are precharged low, and rise
on detection of low levels. Also, the precharge action does not
affect the loop to first order, because the state
has the same action as the state
.

The behavior of this APD circuit differs somewhat from


the ideal APD discussed in Section II-A. In particular, the
circuit implementation responds to falling edges instead of
rising edges, and more precisely, the precharged gates act
as level detectors of a low voltage level instead of as edge
detectors.
A simulation of the APD characteristic over a
interval
. The phase error
is
is shown in Fig. 13, where
plotted against the average charge pump current over one
reference cycle , and includes the nonidealities of the charge
pump as well. The flat section near 0.2 rad is where the
signal driving the charge pump is compressing due to
the level detection nature of the precharged gates. Another
imperfection in this circuits APD characteristic is the section
with finite negative slope instead of a discontinuity. From the
in a state
characteristic, the phase detectors gain constant
is evaluated to be 7.4
of zero static phase error
A/rad.
F. PLL Circuit Implementation
In Section II-B, a general model for a locked APD PLL was
developed, expressing the closed-loop phase transfer function
in terms of the loop filters -domain impedance and an
idealized VCO. We now provide specific expressions for the
loop as actually implemented.
The loop filter used is the conventional network shown in
Fig. 14, whose -domain impedance is
(11)
A single-pole amplifier was used to interface with the VCOs
must
varactor, thus the ideal VCO transfer function
be modified to
(12)
is the 3-dB bandwidth of the VCOs preamplifier.
where
Using (11) and (12) in (6) enables us to write the complete
phase transfer function of the implemented APD PLL as shown
in (13) at the bottom of the page. In the next section, we
compare measured data to (13).
III. EXPERIMENTAL RESULTS
A test chip (see Fig. 15) containing a copy of the APD
PLL used in the complete GPS receiver is used to evaluate
the APD PLL. Two separate tests are performed; one to verify
the derived closed-loop transfer function of the APD PLL and
the other to observe the synthesized LO spectrum for the GPS
receiver. In the second test, the synthesized LO is also checked

(13)

SHAHANI et al.: FREQUENCY SYNTHESIS USING APD

Fig. 13.

Simulated APD circuit characteristic.

Fig. 14.

Loop filter used in APD PLL.

2237

Fig. 16. PLL test setup number 1.

Fig. 15.

Die photograph of PLL on GPS receiver test chip.

with a microwave frequency counter to verify its long-term


stability.
Fig. 16 shows the experimental setup for the first test. Phase
noise is measured for offsets from 1 kHz to 10 MHz with the
HP8563E spectrum analyzer, which has special phase-noisemeasurement software. Ten MHz is used as the upper limit
since the loop is designed to have a bandwidth less than 10
MHz. Beyond the loops bandwidth, the PLLs phase noise is

determined by the VCOs phase noise, making measurement


of the PLLs transfer function difficult. One of the largest
factors affecting measurement accuracy is the noise floor of
the instrument. To minimize this error source, measurements
of the floor with a clean source are performed first. These
results are later used to calibrate the data. Reference phase
noise and PLL phase noise are also both measured. After some
data processing, the PLLs closed-loop phase transfer function
is determined.
and the predicted
Fig. 17 shows the measured
, from (13), for the case where the reference frequency
.
is 143 MHz and the VCO frequency is 1.573 GHz
is
The seven loop parameters in (13) are set as follows:
is taken from measured VCO data;
, and
known;

2238

Fig. 17.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Measured and predicted

H (f ) .
j

Fig. 19. LO spectrum.


TABLE I
MEASURED APD PLL PERFORMANCE

Fig. 18.

PLL test setup number 2.

are taken to be their designed loop filter values;


is
is fit. The fit value
calculated from the technology data; and
, 6.6 A/rad, is a little less than the simulated value noted
of
for an ideal
in Section II-E, 7.4 A/rad. Since
is due to
APD, one could argue that the discrepancy in
an actual pump current that is lower than the pump current
is measured, it is found to
used in simulations. But when
is readily explained.
be correct. Still, the discrepancy in
The simulation in Fig. 13 establishes an upper bound on
because it is measured in a state of zero static phase error.
The simulated APD circuit characteristic illustrates that the
detector gain (i.e., the slope) decreases the farther that one
departs from zero radians. The charge pump is known to have
some offset; thus, the loop has some static phase error in lock
.
to overcome the offset, resulting in a slightly lower
Fig. 18 shows the experimental setup for the second test.
The LO spectrum is measured with the HP8563E spectrum
analyzer, and the frequency is checked with an HP5350B
microwave frequency counter. Fig. 19 displays the synthesized
output spectrum, in which the PLLs ability to track the
low close-in phase noise of the reference can be seen. The
visible skirts are due to the VCOs phase noise outside the
6-MHz bandwidth of the PLL. Spurious tones at 47 dBc
are primarily due to control-line ripple resulting from charge
pump leakage. In GPS applications, the measured spurious
level is acceptable because of the absence of blockers at
the corresponding offset frequencies. In more demanding
applications, one may reduce ripple through improved charge
pump design and the use of analog phase interpolation [3].
Table I provides a summary of the APD PLLs performance.

The PLL has a wide bandwidth of 6 MHz, and the APD circuit
consumes only one-quarter of the total synthesizer power. With
the elimination of the divider, the main power consumer in
the synthesizer is now the VCO.
IV. CONCLUSION
A new method for performing phase detection that eliminates the divide-by- function within a PLL has been presented. A frequency acquisition aid circuit, which can be
powered down once lock is established, is required. By using
an aperture phase detector, a 1.573-GHz local oscillator can be
synthesized on roughly half the power of a loop containing a
conventional divider. Additionally, elimination of the divider
also reduces the frequency of transitions that might cause
substrate and supply bounce. The power savings and noise
reduction make the APD PLL an attractive design for lowpower, integrated frequency synthesizers.
ACKNOWLEDGMENT
The authors gratefully acknowledge Rockwell International
for fabricating the receiver and Dr. C. Hull and Dr. P. Singh
for their valuable assistance. In addition, the authors acknowledge Tektronix, Inc., for supplying simulation tools and E.
McReynolds for his invaluable support of, and assistance with,
CMOS modeling issues. Last, the authors thank IBM for
generous student support through IBM fellowships.

SHAHANI et al.: FREQUENCY SYNTHESIS USING APD

REFERENCES
[1] M-tron Engineering Notes, Dec. 1997.
[2] D. K. Shaeffer, A. R. Shahani, S. S. Mohan, H. Samavati, H. R. Rategh,
M. Hershenson, M. Xu, C. P. Yue, D. J. Eddleman, and T. H. Lee, A
115-mW, 0.5-m CMOS GPS receiver with wide dynamic-range active
filters, IEEE J. Solid-State Circuits, vol. 33, pp. 22192231, Dec. 1998.
[3] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits.
Cambridge: Cambridge Univ. Press, 1998.
[4] J. F. Parker and D. Ray, A 1.6 GHz CMOS PLL with on-chip loop
filter, IEEE J. Solid-State Circuits, vol. 33, pp. 337343, Mar. 1988.
[5] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.

2239

Min Xu (S97), for a photograph and biography, see this issue, p. 2231.

C. Patrick Yue (S93), for a photograph and biography, see this issue, p. 2231.

Daniel J. Eddleman (S98), for a photograph and biography, see this issue,
p. 2231.
Arvin R. Shahani, for a photograph and biography, see this issue, p. 2041.

Derek K. Shaeffer (S98), for a photograph and biography, see this issue,
p. 2230.

S. S. Mohan (S98), for a photograph and biography, see this issue, p. 2231.

Hirad Samavati (S98), for a photograph and biography, see this issue, p.
2041.

Hamid R. Rategh (S98), for a photograph and biography, see this issue,
p. 2231.

Maria del Mar Hershenson (S98), for a photograph and biography, see this
issue, p. 2231.

Mark A. Horowitz (S77M78SM95) received the B.S. and M.S. degrees


in electrical engineering from the Massachusetts Institute of Technology,
Cambridge, in 1978 and the Ph.D. degree from Stanford University, Stanford,
CA, in 1984.
He is the Yahoo Founders Professor of Electrical Engineering and Computer
Science at Stanford. His research area is in digital system design. He has led
a number of processor designs including MIPS-X, one of the first processors
to include an on-chip instruction cache; TORCH, a statistically scheduled,
superscalar processor; and FLASH, a flexible DSM machine. He has also
worked on a number of other chip design areas, including high-speed memory
design, high-bandwidth interfaces, and fast floating point. In 1990, he took
a leave from Stanford to help start Rambus, Inc., a company designing
high-bandwidth memory interface technology. His current research includes
multiprocessor design, low-power circuits, memory design, and high-speed
links.
Dr. Horowitz received a 1985 Presidential Young Investigator Award and
an IBM Faculty Development Award, as well as the 1993 Best Paper Award
from the International Solid-State Circuits Conference.

Thomas H. Lee (S87M87), for a photograph and biography, see this issue,
p. 2041.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

777

A 1.8-GHz Self-Calibrated Phase-Locked Loop with


Precise I/Q Matching
Chan-Hong Park, Student Member, IEEE, Ook Kim, Member, IEEE, and Beomsup Kim, Senior Member, IEEE

AbstractThis paper describes a 1.8-GHz self-calibrated


phase-locked loop (PLL) implemented in 0.35- m CMOS
technology. The PLL operates as an edge-combining type fractionalfrequency synthesizer using multiphase clock signals
from a ring-type voltage-controlled oscillator (VCO). A self-calibration circuit in the PLL continuously adjusts delay mismatches
among delay cells in the ring oscillator, eliminating the fractional
spur commonly found in an edge-combing fractional divider due
to the delay mismatches. With the calibration loop, the fractional
spurs caused by the delay mismatches are reduced to 55 dBc,
and the corresponding maximum phase offsets between the
multiphase signals is less than 0.2 . The frequency synthesizer
PLL operates from 1.7 to 1.9 GHz and the closed-loop phase noise
is 105 dBc/Hz at 100-kHz offset from the carrier. The overall
circuit consumes 20 mA from a 3.0-V power supply.
Index TermsDelay mismatch, fractional- frequency synthesizer, I/Q signal generation, PLL, ring oscillator, self-calibration.

I. INTRODUCTION

N A MODERN wireless digital data transmission system,


both in-phase (I) and quadrature-phase (Q) channels are used
for channel efficiency. Therefore, precise I and Q clock signals for the modulation/demodulation of both I and Q channels
are needed for high-performance digital transceivers. Since any
gain or phase imbalance between I and Q signals reduces the
dynamic range and degrades the bit-error rate (BER) of the receivers, the accuracy of the 90 phase difference between I and
Q clock signals generated from a local oscillator (LO) must be
maintained as far as possible. Especially for homodyne or image
rejection receivers, the effects of I/Q mismatch become more
critical [1].
A lossy phase shifter utilizing resistors and capacitors [2],
[3] or a quadrature generator using a frequency divider [4],
[5] is widely used to derive quadrature-phase clock signals
from a single-phase oscillator. However, the RC phase shifter
circuit often produces phase and amplitude errors due to the
mismatches of components, and the quadrature generator may
suffer from phase imbalances due to the inexact input clock
duty cycle. For the correction of these I/Q phase errors, some
analog or digital calibration techniques have been adopted [6],
[7]; however, increased phase noise or complexity caused by
the added calibration system limits the performance of the
system.
Manuscript received June 15, 2000; revised October 15, 2000.
C.-H. Park and B. Kim are with the Department of Electrical Engineering
and Computer Sciences, Korea Advanced Institute of Science and Technology,
Taejon 305-701, Korea (e-mail: bkim@ee.kaist.ac.kr).
O. Kim was with SK Telecom, Sungnam Kunggi-do 463-020, Korea. He is
now with Silicon Image, Inc., Sunnyvale, CA 94085 USA.
Publisher Item Identifier S 0018-9200(01)03031-1.

Fig. 1.

I/Q signal generation from a ring oscillator.

On the other hand, a ring oscillator may be used to produce


the quadrature clock signals without such a phase shifter [8],
[9]. The multiphase clock signals from a multistage ring oscillator are easily converted into the I/Q clock signals, as shown in
Fig. 1. However, in practice, the mismatches between the delay
cells cause significant I/Q phase errors, and an additional phase
shifter is therefore required.
A self-calibration technique to eliminate the delay mismatches between the delay cells in a ring oscillator is proposed
in this paper. A delay calibration loop in the phase-locked loop
(PLL) measures the delay mismatch in each delay cell at a
time and eliminates it through an extra control line attached to
the delay cells. The calibration loop automatically operates in
the background and barely interferes with the main PLL loop
behavior.
freA prototype 1.8-GHz edge-combining fractionalquency synthesizer equipped with the calibration circuit is
implemented. Fig. 2 shows the block diagram. Thanks to
the calibration technique, the fractional spur caused by the
mismatches is attenuated by 25 dB and the I/Q phase offset is
maintained within 0.2 .
The structure of the proposed fractional- frequency synthesizer is presented in Section II. Section III describes the
delay mismatch problem of the edge-combining-type synthesizer PLL. Section IV shows the underlined algorithm for the
proposed mismatch calibration scheme. Section V presents
the circuit implementation of the self-calibrated frequency
synthesizer PLL. Finally, experimental results are shown in
Section VI, and conclusions are given in Section VII.
II. EDGE-COMBINING FRACTIONALSYNTHESIZER

FREQUENCY

The frequency resolution of a conventional integer- frequency synthesizer is the same as the reference frequency of

00189200/01$10.00 2001 IEEE

778

Fig. 2. Block diagram of the proposed fractionalPLL with a self-calibration loop.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

frequency synthesizer

the synthesizer PLL. Therefore, narrow channel spacing is accompanied by a small loop bandwidth, which leads to slow dynamics [10]. In case of a fractional- frequency synthesizer,
the output frequency is a fractional multiple of the reference frequency, so that narrow channel spacing is achieved along with a
higher phase detector frequency. Consequently, the loop bandwidth can be widened, and faster settling time and lower close-in
phase noise of the frequency synthesizer is achieved [10].
The noninteger dividing values in a fractional- synthesizer
can be realized by the periodic dithering of the dividing ratio
between integer values [11]. However, the dithering leads to
a periodic phase error and introduces spurious tones in the
output spectrum. To resolve the spurious noise problem, several
methods, such as phase interpolation using a digital-to-analog
(D/A) converter [12], or altering the dithering pattern using a
modulator [10], [11] have been proposed and used, but
they still have limitations such as increased power dissipation
and spurious noise.
On the other hand, if multiphase clock signals are available,
the noninteger divider is directly implemented without dithering
or interpolation. Fig. 3 shows the operation of the fractionaldivider using the proposed edge-combining technique. The
output of an integer frequency divider is sequentially latched
by the eight-phase voltage-controlled oscillator (VCO) output
signals, as shown in Fig. 3(a). As a result, a set of phase-shifted
waveforms is obtained, and the amount of the phase shift is
1/8 of the VCO period. By manipulating the waveform set, a
waveform whose period is a fractional multiple of the VCO
period is generated. For example, if the desired fractional
value is 1/8, the pulse-switching block multiplexes the delayed
waveforms with following periodic sequence:

and the output period of the fractional divider becomes


, as shown in Fig. 3(b). Here, the division ratio may
periodically change in order to create the noninteger division
ratio. For example, if the fractional ratio of 1/8 is required as
shown in Fig. 3(b), the division ratio should periodically switch

Fig. 3. (a) Block diagram of the fractional divider. (b) Operation of the
fractional divider when = 1.

Fig. 4. Edges of the multiphase signals. (a) With delay mismatches. (b)
Without delay mismatches.

from
to
when the output pulse is multiplexed from
out8 to out1.
When the PLL is locked, the output frequency of the PLL is
, and the synthesizer operates as a modulo-8
fractional- frequency synthesizer. The value of determines
the switching sequence of the pulse-switching block. The
switching sequence and division ratio are controlled by a
separated control logic.
III. DELAY MISMATCH PROBLEM AND FRACTIONAL SPURS
Although multiphase clocks from a ring oscillator can be used
to implement a fractional- frequency synthesizer, an important problem still exists: how to deal with the delay mismatches
between the delay cells in the VCO. Ideally, the phase differences among the multiphase signals from a ring oscillator are
precisely equal, as shown in Fig. 4(a). However, in practice,
the edges of the multiphase clocks are not uniformly spaced,
as shown in Fig. 4(b). The delay mismatches arise from several
mismatch, device size mismatch, and so on.
causes such as

PARK et al.: SELF-CALIBRATED PHASE-LOCKED LOOP WITH PRECISE I/Q MATCHING

Fig. 5.

779

s-domain model of a PLL.

Since each edge of the fractional- divider output is periodically synchronized with one of the multiphase signals from
the VCO, the timing information on the delay mismatches is
contained in the divider output. Therefore, when the PLL is
locked, the delay mismatches introduce periodic phase errors
at the input of the phase-frequency detector (PFD). Due to the
periodic phase errors, fractional spurs appear in the output spectrum of the synthesizer. Also, if I and Q signals are tapped from
the VCO, the I/Q phase offset will appear. Therefore, the delay
mismatches must be eliminated to realize low phase-noise frequency synthesizer.
The relationship between the phase errors and the fractional
spurs is derived from the noise transfer function of the PLL.
Fig. 5 shows the -domain model of the PLL. If only the effect of
the divider noise is considered, the output phase noise becomes

(1)

Fig. 6. Simulated output spectrum of the PLL with the maximum phase offset
of 2.5 .
TABLE I
PARAMETERS FOR SPURIOUS NOISE SIMULATION OF THE PLL

where
PFD gain;
loop filter transfer function;
VCO gain;
dividing ratio;
output phase noise;
phase noise generated from the fractional divider.
is
For the edge-combining frequency synthesizer,
mainly caused by the delay mismatches between the delay
cells, and is periodic because the fractional- dividing is performed by periodic combining of the phase-shifted waveforms.
is a sinusoidal, the output voltage of the
Assuming that
PLL is presented as

and the output spectrum of the PLL appears at


is given by
relative power of the spurious tones

dBc

. The

(3)

Fig. 6 shows the simulated output spectrums of the frequency


synthesizer. When the maximum phase offset is set to 2.5 ,
30-dBc spurious tones are shown near the carrier. During the
simulation, the design parameters of the PLL are selected as described in Table I.
IV. MISMATCH CALIBRATION ALGORITHM

(2)

Fig. 7 shows the input waveforms of the PFD when the PLL
is
is locked and the fractional dividing ratio is set to 1/8.
the relative phase error corresponding to the th output signal of

780

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 7. Periodic phase error at the PFD input when PLL is locked without calibration.

the VCO. By adjusting the rising phases of the corresponding


outputs, the phase errors due to the delay mismatches can be
eliminated.
Since a PLL makes the average phase error zero in the locking
mode, the sum of the individual phase error becomes zero when
the PLL is locked. In other words, when the number of the multiphase clocks is eight

where
is the phase error due to the th output signal after
cycles of calibration, and
is the amount of the calibration
for the th VCO output in the th iteration.
If the above iteration is performed continuously until
is satisfied for all delay cells, the final value of the phase
error due to the 1st VCO output becomes

(4)
If the calibration circuit shifts the rising phase of the first output
, the phase errors are temporarily changed to
by
(5)

(9)

When the PLL is locked again


(6)
,
based on (4), and by assuming that the phase disturbance,
is equally distributed to all delay cells to satisfy (6), the phase
errors become

(7)
If this operation is performed on each output of the VCO one
by one, the phase errors are changed as shown at the bottom of
the page, and this is the completion of one iteration of the calibration procedure. The resulted phase errors after first iteration
are given by

(8)

Similarly
(10)
Consequently, all the phase errors become zero after finishing
the completion of the calibration. Note that the calibration algorithm performs correctly even if the amount of the individual
phase correction is different and even if the order of the calibration is changed. Fig. 8 shows the trend of phase error during the
calibration, simulated by MATLAB. Here, maximum 10-mV
mismatches are assumed.
V. CIRCUIT IMPLEMENTATION
A. Overall Structure
As shown in Fig. 2, a loop for the calibration is combined
with the main fractional- synthesizer. The calibration loop periodically measures the phase error due to delay mismatch at
the PFD, and compensates for the mismatches by updating the
offset control voltage of delay cells one after another. This update operation must be performed only when the PLL is locked,

initially:
1st step:
2nd step:
..
.
8th step:

..
.

..
.

..
.

PARK et al.: SELF-CALIBRATED PHASE-LOCKED LOOP WITH PRECISE I/Q MATCHING

Fig. 8. Simulated behavior of phase error during the compensation.

Fig. 10.

781

Detail structure of the self-calibration loop.

one of the offset control signals is updated. The


signal
is periodically asserted, and the period is much longer than the
locking time of the PLL. The output of the charge pump is sequentially connected to the one of the offset-control nodes in the
VCO. Since the sequence of the calibration is identical to the
pulse-combining sequence in the fractional- divider, the measured phase error affects only the corresponding VCO output.
When the fractional dividing ratio is zero, the frequency synthesizer operates as an integer- type, and only one output of
the VCO is used for phase comparison. The digital logic controls the sequence of the signal updating through the switches
.
in the capacitor array,
VI. MEASUREMENTS
Fig. 9. Schematic of the delay cell having offset control capability.

because the phase error due to the mismatches can be accurately


measured only in the locking mode. If the calibration interval is
shorter than the lock-in time of the main loop, the locking behavior of the main loop becomes disturbed and even unstable.
Therefore, it is important to make not only the calibration interval long enough but also the amount of phase change, , generated by the individual calibrating operation, small enough to
make sure that the main loop quickly responds. In this work,
the loop gain of the calibration loop is chosen to be 1/10 of
the main-loop loop gain. The updated offset control signals are
maintained until the next update by a capacitor array connected
to each delay cell. The delay cell used in this work is of the same
type as the low-noise delay cell used in [13]. However, to control the rising phase of each output, four transistors are added,
is low, the rising of
as shown in Fig. 9. For example, if
is pulled earlier.
B. Calibration Loop
Fig. 10 shows the circuit for the mismatch calibration. The
calibration circuit consists of a PFD shared with the main loop,
an additional charge pump, and a capacitor array. When
is high, the PFD output signal is driven to the charge pump, and

A self-calibrated fractional- frequency synthesizer PLL has


been fabricated in 0.35- m CMOS technology. The microphotograph of the fabricated chip is shown in Fig. 11, and its active
mm . Both frequency synthesizers with
area is about
and without a calibration loop have been integrated in the same
chip to demonstrate the proper operation of the mismatch calibration scheme. In both cases, an external 25-MHz crystal oscillator is used as a reference clock. The bandwidth of the PLL
is set to 1 MHz.
Fig. 12 shows the measured output spectrum of the fractional- RF synthesizer. Fig. 12(a) is the output spectrum of the
frequency synthesizer without a calibration loop. In this figure,
the fractional dividing ratio is set to 1/8. Without calibration
circuit, 30-dBc spurious noise appears at 3.125 ( 25/8) MHz
offset from the carrier frequency. In this case, the maximum
phase offset is estimated as about 2.5 by the equations in
Section II. On the other hand, in the output spectrum of the
self-calibrated frequency synthesizer, the power of the spurious
tones is attenuated to 55 dB, as shown in Fig. 12(b), and
the calculated maximum phase offset is less than 0.2 . Initial
settling of the calibration loop takes about 5.0 ms. Fig. 13 shows
the measured phase noise of the frequency synthesizer. The
closed-loop phase noise at 100-kHz offset from the 1.8-GHz
carrier is 105 dBc/Hz. Table II summarizes the measured
characteristics of the PLL.

782

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 11.

Microphotograph of the self-calibrated PLL.

Fig. 12.

Output spectrum of the synthesizer PLL. (a) Without calibration. (b) With calibration.

VII. CONCLUSION
A self-calibrated 1.8-GHz PLL for fractional- frequency
synthesizing is fabricated in a 0.35- m CMOS process. A
ring-type oscillator is used to generate the multiphase signals,
and a self-calibration loop reduces the output fractional spurs
caused by delay mismatches between the delay cells. The phase
offset of the I/Q signals from the ring oscillator is also relieved.
With this calibration scheme, the fractional spur on the PLL is
attenuated by 25 dB and the maximum phase offset is thereby
reduced to less than 0.2 .
REFERENCES
Fig. 13.

Measured phase noise of the PLL.

TABLE II
PERFORMANCE SUMMARY OF THE SELF-CALIBRATED PLL

[1] B. Razavi, RF Microelectronics. Englewood Cliffs, NJ: Prentice Hall,


1998.
[2] C. D. Hull, J. L. Tham, and R. R. Chu, A direct-conversion receiver
for 900-MHz (ISM band) spread-spectrum digital cordless telephone,
IEEE J. Solid-State Circuits, vol. 31, pp. 19551963, Dec. 1996.
[3] M. Steyaert, M. Borremans, J. Janssens, B. D. Muer, N. Itoh, J.
Craninckx, J. Crols, E. Morijuji, H. S. Momose, W. Sansen, T. Yamaji,
H. Tanimoto, and H. Kokatsu, A single-chip CMOS transceiver for
DCS-1800 wireless communications, in ISSCC Dig. Tech. Papers, San
Francisco, CA, Feb. 1998, pp. 4849.
[4] A. Montalvo, A. Holden, W. Suter, C. Angell, S. White, N. Klemmer,
and D. Homol, A 22-mW NADC receiver IF Chip with integrated
second IF channel filtering, in ISSCC Dig. Tech. Papers, San Francisco,
CA, Feb. 1999, pp. 4849.
[5] J. L. Tham, M. A. Margarit, B. Pregardier, C. D. Hull, R. Magoon, and
F. Carr, A 2.7-V 900-MHz/1.9-GHz dual-band transceiver IC for digital wireless communication, IEEE J. Solid-State Circuits, vol. 34, pp.
286291, Mar. 1999.

PARK et al.: SELF-CALIBRATED PHASE-LOCKED LOOP WITH PRECISE I/Q MATCHING

[6] B. Razavi, Design considerations for direct-conversion receivers,


IEEE Trans. Circuits Syst. II, vol. 44, pp. 428435, June 1997.
[7] L. Yu and W. M. Snelgrove, A novel adaptive mismatch cancellation
system for quadrature IF radio receivers, IEEE Trans. Circuits Syst. II,
vol. 46, pp. 789801, June 1999.
[8] Y. Sugimoto and T. Ueno, The design of a 1-V 1-GHz CMOS VCO
circuit with in-phase and quadrature-phase outputs, in Proc. Int. Symp.
Circuits and Systems, Hong Kong, June 1997, pp. 269272.
[9] A. A. Abidi, Direct-conversion radio transceivers for digital communications, IEEE J. Solid-State Circuits, vol. 30, pp. 13991410, Dec.
1995.
[10] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski, Deltasigma
modulation in fractional-N frequency synthesis, IEEE J. Solid-State
Circuits, vol. 28, pp. 553559, May 1993.
[11] M. H. Perrot, Techniques for high data rate modulation and low power
operations of fractional-N frequency synthesizers, Ph.D. dissertation,
Mass. Inst. of Technol., Cambridge, MA, 1997.
[12] U. L. Rohde, Digital PLL Frequency Synthesizers. Englewood Cliffs,
NJ: Prentice Hall, 1983.
[13] C.-H. Park and B. Kim, A low-noise 900-MHz VCO in 0.6-m
CMOS, IEEE J. Solid-State Circuits, vol. 34, pp. 586591, May 1999.

Chan-Hong Park (S92) received B.S. and M.S. degrees in electrical engineering from Korea Advanced
Institute of Science and Technology (KAIST),
Taejon, Korea, in 1994 and 1996, respectively. He
is currently working toward the Ph.D. degree in
electrical engineering at KAIST.
From 1994, he has been with the Department
of Electrical Engineering, KAIST, as a Graduate Researcher, where he has been involved in
designing 100Base-T transceiver ICs, low-noise
phase-locked loops, and RF front-ends for wireless
communications. His research interests include CMOS RF circuits for
wireless communication, high-frequency analog IC design, and mixed-mode
signal-processing IC design.

783

Ook Kim (M86) received the M.S. and Ph.D. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1988 and 1994, respectively.
He was with the Electronics and Telecommunications Research Institute, Taejon, Korea, from 1994 to
1998, and with SK Telecom, Seoul, Korea, from 1998
to 1999. Since 1999, he has been with Silicon Image
Inc., Sunnyvale, CA. He was a Visiting Researcher
at the Department of Electrical and Electronic Engineering, Adelaide University, Adelaide, Australia,
during 1992, and a Visiting Scholar at the Department of Electrical Engineering,
Stanford University, Stanford, CA, during 1999. His research interests are in
CMOS mixed mode circuit design, high-speed data conversion, wireless circuit
technology, and high-speed data communication.

Beomsup Kim (S87M90SM95) received the


B.S. and M.S. degrees in electronic engineering
from Seoul National University, Seoul, Korea, in
1983 and 1985, respectively, and the Ph.D. degree in
electrical engineering and computer sciences from
the University of California, Berkeley, in 1990.
He worked as a Graduate Researcher and Graduate Instructor at the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, from 1986 to 1990. From 1990 to
1991, he was with Chips and Technologies, Inc., San
Jose, CA, where he was involved in designing high-speed signal processing
ICs for disk drive read/write channel. From 1991 to 1993, he was with Philips
Research, Palo Alto, CA, conducting research on digital signal processing for
video, wireless communication, and disk drive applications. During 1994, he
was a Consultant, developing the partial response maximum likelihood detection scheme of the disk drive read/write channel. In 1994, he became an Assistant Professor with the Department of Electrical Engineering, Korea Advanced
Institute of Science and Technology (KAIST), Taejon, Korea, and is currently
an Associate Professor. During 1999, he took sabbatical leave at Stanford University, Stanford, CA, and at the same time, consulted for Marvell Semiconductor Inc., San Jose, on the Gigabit Ethernet and wireless LAN DSP architecture. His research interests include mixed-mode signal processing IC design
for telecommunication, disk drive, and LAN, high-speed analog IC design, and
VLSI system design.
Dr. Kim is a corecipient of the Best Paper Award for 19901991 from the
IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received the Philips Employee
Reward in 1992. Between June 1993 and June 1995, he served as an Associate
Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II.

788

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

A 2.6-GHz/5.2-GHz Frequency Synthesizer in


0.4-m CMOS Technology
Christopher Lam and Behzad Razavi, Member, IEEE

AbstractThis paper describes the design of a CMOS frequency


synthesizer targeting wireless local-area network applications in
the 5-GHz range. Based on an integer- architecture, the synthesizer produces a 5.2-GHz output as well as the quadrature phases
of a 2.6-GHz carrier. Fabricated in a 0.4- m digital CMOS technology, the circuit provides a channel spacing of 23.5 MHz at 5.2
GHz while exhibiting a phase noise of 115 dBc/Hz at 2.6 GHz and
100 dBc/Hz at 5.2 GHz (both at 10-MHz offset). The reference
sidebands are at 53 dBc at 2.6 GHz, and the power dissipation
from a 2.6-V supply is 47 mW.
Index TermsFrequency dividers, oscillators, phase-locked
loops, RF circuits, synthesizers, wireless transceivers.
Fig. 1.

Transceiver architecture.

I. INTRODUCTION

IRELESS local area networks (WLANs) provide great


flexibility in the communication infrastructure of environments such as hospitals, factories, and large office buildings.
While WLAN standards in the 2.4-GHz range have recently
emerged in the market, the data rates supported by such systems are limited to a few megabits per second. By contrast, a
number of standards have been defined in the 5-GHz range that
allow data rates greater than 20 Mb/s, offering attractive solutions for real-time imaging, multimedia, and high-speed video
applications. One of these standards is high-performance radio
LAN (HIPERLAN) [1].
HIPERLAN operates across 5.155.30 GHz and provides a
channel bandwidth of 23.5 MHz with Gaussian minimum shift
keying (GMSK) modulation. The receiver sensitivity must exceed 70 dBm.
This paper presents the design of a frequency synthesizer
for 5-GHz WLAN applications. To target realistic specifications, HIPERLAN is chosen as the framework. Employing an
integer- architecture, the circuit generates a 5.2-GHz output
for the transmit path and the quadrature phases of a 2.6-GHz
carrier for the receive path. Realized in a 0.4- m CMOS technology, the synthesizer provides a channel spacing of 23.5 MHz
while dissipating 47 mW from a 2.6-V supply. The phase noise
at 10-MHz offset is equal to 115 dBc/Hz at 2.6 GHz and 100
dBc/Hz at 5.2 GHz.
Section II of this paper describes the synthesizer environment
and general issues, and Section III introduces the synthesizer
architecture. Section IV presents the design of each building
block, and Section V summarizes the experimental results.

Manuscript received July 30, 1999; revised December 1, 1999.


The authors are with the Department of Electrical Engineering, University of
California, Los Angeles, CA 90095 USA (e-mail: razavi@ee.ucla.edu).
Publisher Item Identifier S 0018-9200(00)02987-5.

II. SYNTHESIZER ENVIRONMENT


The design of a 5-GHz synthesizer in a 0.4- m CMOS technology presents many difficulties at both the architecture and
the circuit levels. The high center frequency of the voltage-controlled oscillator (VCO), the poor quality of inductors due to
skin effect and substrate loss, the limited tuning range, the nonlinearity of the VCO input/output characteristic, the high speed
required of the feedback divider, the mismatches in the charge
pump, and the implementation of the loop filter are among the
issues encountered in this design.
A 0.4- m-long NMOS transistor in this technology achieves
of less than 15 GHz with a gatesource overdrive voltage
an
of about 400 mV, a typical value in this design.
Also, a 5-nH inductor exhibits a self-resonance frequency of
of 5 at this frequency, indicating that skin
6.5 GHz and a
effect and substrate loss are much more significant at 5.2 GHz
than at 2.6 GHz. The technology offers no high-density linear
capacitors, creating difficulty in the design of the loop filter.
The foregoing limitations make it necessary that the transceiver and the synthesizer be designed concurrently so as to
relax some of the synthesizer requirements. Fig. 1 shows the
transceiver architecture and its interface with the synthesizer.
The receive path consists of two downconversion stages, each
using a local oscillator (LO) frequency of 2.6 GHz, and the
transmit path modulates the VCO by the Gaussian-filtered baseband data, producing a GMSK output.
An important feature of this architecture is that the synthesizer is shared between the transmitter and the receiver, reducing
the system complexity substantially. This is possible because
HIPERLAN incorporates time-division duplexing (TDD). Also,
the transceiver requires the generation of the quadrature phases
of the 2.6-GHz carrier rather than the 5.2-GHz output, a task
readily accomplished by the synthesizer itself.

00189200/00$10.00 2000 IEEE

LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER

Fig. 2. Synthesizer architecture.

Fig. 3. Position of reference sidebands.

III. SYNTHESIZER ARCHITECTURE


The synthesizer is based on an integer- phase-locked loop
architecture (Fig. 2). The feedback divider senses the 2.6-GHz
output because it is not possible to design a dual-modulus
divider in 0.4- m CMOS technology that operates at 5.2 GHz
reliably. Controlled by the digital channel-select input, the
220225 circuit generates frequency steps of
MHz in the 2.6-GHz band and 23.5 MHz in the 5.2-GHz band.
A critical issue in the architecture of Fig. 2 is the nonlinearity
of the VCO characteristic, i.e., the variation of the VCO gain,
, with the control voltage
. This effect manifests itself in the loop settling behavior as well as the magnitude of the
phase noise and reference sidebands at the output. The problem
is partially resolved through the use of a correction circuit that
adjusts the charge-pump current according to the value of
[2].
An interesting property of the architecture of Fig. 2 is the
position of the reference spurs with respect to the main carrier.
Since the reference frequency is half the channel spacing, such
spurs fall at the edge of the channel rather than at the center of
the adjacent channel for both 2.6- and 5.2-GHz outputs (Fig. 3).
Since the interference energy received by the antenna is small
at the edge, the maximum allowable magnitude of the spurs can
be quite higher than if the reference frequency were equal to
23.5 MHz.
IV. BUILDING BLOCKS
A. VCO
The VCO core is based on two 2.6-GHz coupled oscillators
operating in quadrature, as shown in Fig. 4(a) [3], [4]. The fully
differential topology of each oscillator raises the possibility of
sensing the common-source nodes A, B, C, or D as the 5.2-GHz
output. In fact, since the 2.6-GHz oscillators operate in quadrature, the waveforms at A and B (or C and D) are 180 out of
phase, thereby serving as a differential output at 5.2 GHz. With

789

proper choice of device dimensions and bias current, a differential swing of 0.5 V can be achieved at this port. Note that if a
frequency doubler were used, the output would be single-ended
and difficult to convert to differential form at such a high frequency.
The tuning of the oscillator poses several difficulties: the varactor diode must exhibit a small series resistance and remain
reverse-biased even with large swings in the oscillator, and the
varactor capacitance must be large enough to yield the required
tuning range, but at the cost of increasing the power dissipation
or the phase noise. This design incorporates a p -n diode inside an n-well and strapped with metal to reduce the n-well series resistance [4]. Such a structure suffers from a large parasitic
n-well/substrate capacitance, making it desirable to connect the
anode of the diode to the oscillator. This is accomplished as illustrated in Fig. 4(b), where only one of the two oscillators is
shown for clarity. Here, the control voltage varies the dc poten.
tial at nodes and by varying the on-resistance of
leads
However, the sharp variation of the on-resistance of
to significant change in the gain of the VCO. To make the tran, in series with a resistor
sition smoother, another transistor,
serves as a clamp,
is added as shown in Fig. 4(c). Transistor
keeping the tail current source in saturation. Otherwise, the oscillator may turn off during synthesizer loop transients.
Since the minimum voltage at node is only a few hundred
millivolts above ground, an NMOS differential pair cannot
directly sense the 5.2-GHz signal at this node. Instead, a
is constant,
common-gate stage is used [Fig. 4(d)]. But if
turns off for low values of
. Modifying the circuit
then
as shown in Fig. 4(e) ensures that the common-gate stage
carries a constant bias current across the full tuning range.
The choice of the inductors and capacitance of the varactors
entails a compromise between the phase noise and the tuning
range. In this design, 7-nH inductors are used, each contributing
a parasitic capacitance of 120 fF. The cross-coupled transistors
are relatively wide to ensure startup, yielding approximately
175 fF of gate-source capacitance. The differential pairs coupling the oscillators also load the tank. As a result, the varactor
capacitance for 2.6-GHz operation must not exceed 160 fF.
The inductors are realized as stacked spirals [5] made of metal
4 and metal 3 with a width of 6 m. Since the tuning range
is inevitably narrow, it is critical to predict the oscillation frequency accurately. A distributed model is used for each inductor,
yielding an error of only a few percent in the measured frequency of oscillation.
B. Frequency Divider
The design of a 2.6-GHz programmable divider with a reasonable power dissipation in 0.4- m CMOS technology is quite
difficult. A number of circuit techniques are introduced in this
work to ameliorate the powerspeed tradeoff.
The divider is based on a pulse-swallow topology. Shown in
Fig. 5(a) is a conventional implementation, consisting of a dualmodulus prescaler, a fixed-ratio program counter, and a programmable swallow counter. The RS latch is typically included
in the swallow counter and is drawn explicitly here for clarity.
1 until
The prescaler begins the operation by dividing by
the swallow counter is full. The RS latch is then set, changing

790

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Fig. 4. Evolution of the VCO topology.

the prescaler modulus to and disabling the swallow counter.


The division continues until the program counter is full and the
RS latch is reset. The overall divide ratio is therefore equal to
.
The pulse-swallow divider used in this work is shown in
Fig. 5(b). Here, the RS latch is followed by a D flip-flop to
allow pipelining of the prescaler modulus control signal. This
modification is justified below. The overall divide ratio is now
1. A critical decision in the design of the
equal to
divider is the choice between low-swing current-steering logic
and rail-to-rail CMOS logic. Simulations of the circuit with
various values of , , and indicate that the minimum power
dissipation occurs if the prescaler incorporates current steering,
its output is converted to rail-to-rail swings, and the remainder
of the circuit incorporates standard dynamic and static CMOS
logic. The use of current steering in the prescaler also obviates
the need for large oscillator swings, saving power in the VCO
buffer.
The design of the 8/9 prescaler for 2.6-GHz operation
presents a great challenge. Shown in Fig. 6, the prescaler
consists of a synchronous 2/3 circuit and two asynchronous

2 circuits. In a conventional 2/3 realization [Fig. 7(a)],


flip-flop FF is loaded by an OR gate, whereas FF is loaded
by FF , an AND gate, and an output buffer. Since FF limits the
speed, the fanout of three inherent to this topology translates
to substantial power dissipation. Furthermore, if the divider is
implemented by current-steering circuits, the AND gate requires
stacked logic and hence level-shift source followers. Both of
these issues intensify the powerspeed tradeoff.
The 2/3 circuit used in this work is shown in Fig. 7(b).
Here, FF is loaded by a NOR gate and FF by a NOR gate and
a buffer. Simulations indicate that the reduction of the load capacitance of FF increases the maximum operating speed by approximately 40%.
The NOR/flip-flop combination is realized as depicted in
Fig. 8. The resistors are made of n-well, and the bias voltage
is generated to fall midway between the high and low levels of
inputs and . The output of the prescaler drives a differential
to single-ended converter, producing rail-to-rail swings for the
remainder of the divider.
The divider of Fig. 5 incorporates pipelining for the prescaler
modulus control, thereby relaxing the minimum delay require-

LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER

791

Fig. 7. Divide-by-2/3 circuit: (a) conventional topology and (b) circuit used in
this work.

Fig. 5. Pulse swallow divider. (a) Conventional topology. (b) Addition of


pipelining in the prescaler modulus control path.

Fig. 8. Implementation of NOR/flip-flop combination.

Fig. 6.

Prescaler.

ment in this path. Fig. 9 illustrates the issue. When the 9 operation of the prescaler is finished, the circuit would have at most
to change the modulus to eight. In this parseven cycles of
ticular prescaler, the timing budget is actually about five input
cyclesapproximately 1.9 ns. Thus, with no pipelining, the last
pulse generated by the prescaler in the 9 mode must propagate
through the level converter, the first 2 stage in the swallow
counter, the subsequent logic, the RS latch, and the three-input
NOR gate in less than 1.9 ns. Such a delay constraint necessitates the use of current steering in this path, raising the power
dissipation and complicating the design. With pipelining, on
the other hand, the maximum tolerable delay increases to about
eight input cyclesapproximately 3.1 ns.

Fig. 9. Pipelining in the prescaler modulus control path.

C. Charge Pump and Loop Filter


Fig. 10 shows the charge pump [6] and the loop filter. Here,
and
rather than
and
operate as switches.
Thus, the problem of transistor charge injection and clock
feedthrough to the output is somewhat alleviated. In addition
to these errors, up and down currents produced by the charge
pump may also create ripple on the control voltage. Since in
and
turn on at every phase comparlocked condition,
ison instant, any mismatch between their magnitudes, duration,
or absolute timing results in a net current that is drawn from
the loop filter.

Fig. 10.

Charge pump and loop filter.

To appreciate the significance of these effects, let us consider


some typical values in this design. If the reference sidebands are
GHz/V
to be 50 dB below the carrier, then with
MHz, the ripple amplitude must not exceed
and

792

Fig. 11.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

(a) Addition of correction circuit to charge pump. (b) Simple folding circuit. (c) Folding circuit with one reference voltage.

75 V.1 This indicates that great attention must be paid to the


design of the phase/frequency, the charge pump, and the loop
filter so as to minimize the above errors.
Another source of ripple in the control voltage is the low
and
in Fig. 10, especially as
output impedance of
reaches within a few hundred millivolts of the rails. This effect creates additional mismatch between the up and down cur, potentially leading to larger referrents as a function of
ence sidebands near the ends of the tuning range. Transistors
and
degenerate
and
, respectively, alleviating
this issue (another advantage of this topology over the standard
charge-pump configuration).
in the circuit of Fig. 10 to suppress the
The addition of
ripple potentially degrades the stability of the loop. Simulations
, the settling time increases neglisuggest that for
pF,
pF, and
k .
gibly. In this design,
The two capacitors can be realized by either MOSFETs or polymetal sandwiches, a choice determined by the control voltage
must aprange. To achieve the maximum tuning range,
proach the supply and ground rails, demanding a reasonable capacitor linearity across this range. MOS capacitors, however, exhibit substantial change as their gate-source voltage falls below
the threshold. Even a parallel combination of an NMOS capacitor (connected to ground) and a PMOS capacitor (connected to
) suffers from a two-fold variation as
goes from zero
1The ripple is approximated by a sinusoid here. In a more rigorous method,
the ripple can be expressed as a Fourier series [7].

to
. For this reason,
and
are formed as poly-metal
sandwiches (albeit with much less density than MOS capacitors).
Another issue in the design of the loop filter of Fig. 10 relates
. Low-pass filtered by
to the thermal noise produced by
and , this noise modulates the VCO, raising the output phase
noise. The thermal noise on the control voltage per unit bandwidth is given by
(1)

denotes the noise density of


. From the
where
narrow-band frequency modulation theory [8], we know that
and frequency
if a sinusoid with a peak amplitude
modulates a VCO, the output sidebands fall at
rad/s below
and above the carrier frequency and exhibit a peak amplitude
. Approximating the noise per unit bandof
width in (1) by a sinusoid, we obtain the output relative phase
as
noise per unit bandwidth at an offset frequency

(2)

LAM AND RAZAVI: 2.6-GHz/5.2-GHz FREQUENCY SYNTHESIZER

Fig. 12.

793

Die photograph.

With the values chosen in this design, the output phase noise
reaches 138 dBc/Hz at 10-MHz offset for
GHz/V. While it is desirable to reduce the value of
, the releads to a severe area penalty because of
quired increase in
the low density of the poly-metal capacitors. Note that since the
, if
is,
stability factor
must be quadrupled to maintain constant
say, halved, then
(for a given charge-pump current).
D. Correction Circuit
The gain of the VCO varies substantially across the tuning
range, resulting in considerable change in the settling behavior.
As depicted in Fig. 11(a), it is desirable to vary the charge-pump
, such that the product of
and
and
current,
hence remain relatively constant. Rather than use piecewise
linearization [2], this work incorporates an analog folding techand
nique. Fig. 11(b) shows a possible solution. Here
are off if
is well below 1.1 V and hence
. As
approaches 1.1 V,
turns on while
is off. Thus,
drops,
carries most of
and
a negreaching a low value as
approaches and exceeds 1.3 V,
turns
ligible current. As
eventually returns to
. This design actually ution and
lizes the topology shown in Fig. 11(c), where only one reference
voltage is required and each differential pair provides a built-in
offset by virtue of skewed device dimensions. The characteristic
driving the curis similar to that shown for Fig. 11(b), with
rent mirrors in the charge pump.
The reference voltage of 1.2 V in Fig. 11(c) assumes that
V.
the gain of the VCO reaches its maximum at
This value is somewhat process- and temperature-dependent,
limiting (according to simulations) the suppression of the VCO
nonlinearity to about one order of magnitude.

Fig. 13. Measured spectra at 2.6 and 5.2 GHz in locked condition.

Fig. 14.

Measured spectrum at 2.6 GHz.

Fig. 15.

Setup for settling time measurement.

V. EXPERIMENTAL RESULTS
The frequency synthesizer has been fabricated in a 0.4- m
digital CMOS technology. All of the inductors and capacitors
are included on the chip. Fig. 12 is a photograph of the die,
which measures 1.75 1.15 mm . The circuit has been tested
with a 2.6-V supply.
Figs. 13(a) and (b) depict the output spectra in the locked
condition. The phase noise at 10-MHz offset is equal to 115
dBc/Hz at 2.5 GHz and 100 dBc/Hz at 5.2 GHz. A significant
part of the phase noise at 5.2 GHz is attributed to the considerable loss of the output 50- buffer. Fig. 14 shows the 2.6-GHz
output along with the reference sidebands. The sidebands are

approximately 53 dB below the carrier. For the 5.2-GHz output,


the sidebands are buried under the noise floor.
The settling behavior of the synthesizer has also been studied.
Fig. 15 illustrates the setup, where the modulus of the feedback

794

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 5, MAY 2000

Designing a multigigahertz synthesizer in 0.4- m CMOS


technology necessitates circuit techniques such as: 1) a quadrature VCO with inherent frequency doubling, 2) a dual-modulus
divider with equalized fanout, 3) pipelining in pulse-swallow
counters, and 4) use of folding stages to compensate for
nonlinearity in the VCO characteristic.
REFERENCES
[1] Radio equipment and systems (RES); High performance radio local
area network (HIPERLAN); Functional specification, ETSI, Sophia
Antipolis, France, ETSI TC-RES, July 1995.
[2] J. Craninckx and M. S. J. Steyaert, A fully integrated CMOS
DCS-1800 frequency synthesizer, IEEE J. Solid-State Circuits, vol.
33, pp. 20542065, Dec. 1998.
[3] A. Rofougaran et al., A 900-MHz CMOS LC oscillator with quadrature
outputs, in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 392393.
[4] B. Razavi, A 1.8 GHz CMOS voltage-controlled oscillator, in ISSCC
Dig. Tech. Papers, Feb. 1997, pp. 388389.
inductors for multi-level
[5] R. B. Merril et al., Optimization of high
metal CMOS, in Proc. IEDM, Dec. 1995, pp. 38.7.138.7.4.
[6] J. Alvarez, H. Sanchez, and G. Gerosa, A wide-band low-voltage PLL
for PowerPC microprocessors, IEEE J. Solid-State Circuits, vol. 30, pp.
383391, Apr. 1995.
[7] B. Razavi, RF Microelectronics. Upper Saddle River, NJ: PrenticeHall, 1998.
[8] L. W. Couch, Digital and Analog Communication Systems, 4th
ed. New York: Macmillan, 1993.

Fig. 16.

Control voltage during loop settling.

TABLE I
SYNTHESIZER PERFORMANCE

Christopher Lam received the B.Sc. and M.Sc. degrees in electrical engineering from the University of
California, Los Angeles, in 1997 and 1999, respectively.
He is currently with the Wireless Communication
Group, National Semiconductor, Santa Clara,
CA. His interests include phase-locked loops and
communication circuits.

divider is switched periodically and the control voltage is monitored. The 0.8-pF capacitor results from the trace on the printed
circuit board, and the active probe presents an input capacitance
pF and
pF, the addition of these
of 2 pF. Since
parasitics markedly degrades the stability. Therefore, a 100-k
resistor is placed in series with the active probe to mimic the
and . The low-pass filter thus formed has a corner
role of
frequency comparable to the loop bandwidth, and the 0.8-pF capacitor still produces ringing in the time response. Fig. 16 shows
the measured control voltage, indicating a settling time on the
order of 40 s.
Table I summarizes the measured performance of the synthesizer.
VI. CONCLUSION
The speed and quality of the devices available in an IC technology directly affect the choice of transceiver architectures,
synthesizer topologies, and circuit configurations. In order to
optimize the overall system performance, the transceiver and
the synthesizer must be designed concurrently, with particular
attention to the frequency planning.

Behzad Razavi (S87M90) received the B.Sc. degree from Sharif University of Technology, Tehran,
Iran, in 1985 and the M.Sc. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1988 and 1992,
respectively, all in electrical engineering.
He was with AT&T Bell Laboratories, Holmdel,
NJ, and subsequently Hewlett-Packard Laboratories,
Palo Alto, CA. Since September 1996, he has been
an Associate Professor of electrical engineering at the
University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995.
He is a member of the Technical Program Committees of the Symposium on
VLSI Circuits and the International Solid-State Circuits Conference (ISSCC),
in which he is Chair of the Analog Subcommittee. He is the author of Principles
of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), and Design of Analog
CMOS Integrated Circuits (New York: McGraw-Hill, 2000), and the editor of
Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE
Press, 1996).
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the
1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits
Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom
Integrated Circuits Conference in 1998. He has also served as Guest Editor and
Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

835

16

A CMOS Monolithic
-Controlled Fractional-N
Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

16

AbstractA monolithic 1.8-GHz


-controlled fractionalphase-locked loop (PLL) frequency synthesizer is implemented in a
standard 0.25- m CMOS technology. The monolithic fourth-order
type-II PLL integrates the digital synthesizer part together with
a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz
2 mm2 . To investigate
dual-path loop filter on a die of only 2
the influence of the
modulator on the synthesizers spectral
purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in
fractional- synthesizers. The design of the zero-dead zone
PFD and the dual charge pump is optimized toward linearity and
spurious suppression. The frequency synthesizer consumes 35 mA
from a single 2-V power supply. The measured phase noise is as
low as 120 dBc/Hz at 600 kHz and 139 dBc/Hz at 3 MHz.
The measured fractional spur level is less than 100 dBc, even
for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity
constraints.

16

16

16

Index TermsCMOS RF integrated circuits,


modulator,
fractional- frequency synthesis, phase-locked loop, phase noise.

I. INTRODUCTION

HE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main
cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the
implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic
CMOS integration of RF circuits both by academics and industry [1][3].
The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the
full integration of a transceiver front end in CMOS, including
a low-IF receiver and a direct upconversion transmitter [1]. To
achieve a high degree of integratability and fast settling under
fractional- synthesizer topology
low-noise constraints, a
fractional- synthesis circumhas been chosen [4] (Fig. 1).
vents the severe speedspectral purityresolution tradeoff of the
classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened
action and ultimately filtered by
and noise shaped by the
the loop filter. To prevent degradation of the spectral purity by
Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department
Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.

Fig. 1. Principle of

16 fractional-N synthesis.

digital noise coupling, the


modulator is scheduled for integration on the digital baseband signal processing IC of the full
transceiver system.
The paper describes the design of a monolithic 1.8-GHz
-controlled fractional- PLL frequency synthesizer. In
noise on PLL bandwidth
Section II, the influence of
requirements is theoretically analyzed for multistage noise
modulators.
shaping (MASH) and multibit single-loop
Next, a fast nonlinear analysis method is presented, which
predicts possible degradation of the PLL spectral purity by
in-band noise leakage and re-emerging of spurious tones.
The nonlinearities in the phase-frequency detector (PFD)
charge pumps are identified as the main trouble spots. The
fourth-order type-II PLL building-block design is discussed in
Section IV, focusing on integrated filter and voltage-controlled
oscillator (VCO) design and on the realization of a linear phase
error-to-charge-pump current conversion. In Section V, the
experimental results of the fractional- synthesizer prototype
are presented and compared to the simulations, showing good
correspondence.
II. THE FRACTIONAL-

SYNTHESIZER

A. Introduction
fractional- synthesizer is shown
A block diagram of a
modulator output controls the instantaneous
in Fig. 1. The
division modulus of the prescaler, such that the mean division
, with the number of bits of the
modulus is
modulator and the input word. The corresponding phase
changes at the prescaler output are quantized, leading to possible
spurious tones and quantization noise. By selecting higher order
modulators, the spurious energy is whitened and shaped to
high-frequency noise, which can be removed by the low-pass
loop filter. As a result, for a given frequency resolution, an arcan be chosen, by assigning the proper number
bitrary high

0018-9200/02$17.00 2002 IEEE

836

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 2. Third-order multibit single-loop


reasons.

16 modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability

of bits to the modulator. The loop bandwidth is not restricted


by the reference spur suppression, resulting in faster settling and
higher integratability. Additionally, the division modulus is de(with
the minimum number of
creased by a factor
bits for the frequency resolution, i.e., 7.02 in this case), so that
noise of the PLL blocks, except for the VCO, is less amplified.
B. The

Modulators

The influence of both third-order MASH and multibit


modulators on the spectral purity of the
single-loop
fractional- synthesizer is investigated. Since the order of
the integrated PLL loop filter is three, the order of the
modulators must also be three or higher to ensure that
noise has at least a 20-dB/dec rolloff at intermediate offset
frequencies, causing no degradation of the output phase noise.
Both modulators have an internal accuracy of 16 bit and 1 LSB
dithering is applied to further randomize any spurious energy.
The dithering sequence is third-order noise shaped to avoid an
increased noise floor.
modulator is chosen beThe MASH or cascade 1-1-1
cause it is easy to integrate in CMOS and is unconditionally
stable. The noise transfer function (NTF) of the MASH moduand contains three poles at the
lator is
origin of the plane. The result is harsh LF noise shaping and
and substantial HF noise. In the time domain, this is reflected
in the intensive prescaler modulus switching. To synthesize a
, all moduli between 64 and 71 are
frequency of
employed.
modulator is shown in Fig. 2.
The multibit single-loop
For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid
overlap of the intended input operating range and the unstable
input regions. The NTF of the presented modulator is given in
(1) and contains only one pole at the origin of the plane and
, with a passband
two low- Butterworth poles at
gain of 3.2.
(1)
modulator is more complex than
Although the single-loop
the MASH modulator, it offers a higher flexibility in terms of
noise shaping. The HF quantization noise of the modulator
can be spread out by proper pole positioning. As a result, the
prescaler modulus switching is less intense. Only the moduli
.
between 66 and 69 are needed to synthesize
The reduced HF switching has advantageous effects on noise

Fig. 3. Maximum PLL bandwidth f


versus the reference frequency and
different
modulator orders, for the type-II fourth-order PLL. The dashed
curve is for the third-order single-loop modulator. The targeted phase-noise
specification is 136 dBc/Hz at 3 MHz for DCS-1800.

16

coupling and sensitivity to PLL nonlinearities, as will be


discussed in Section III.
C. Theoretical Analysis
control on the specTo theoretically model the impact of
tral purity of the synthesizer, a linear-time-invariant (LTI) PLL
quantization noise as an admodel is employed, with the
at the prescaler output. The prescaler
ditive noise source
control can be looked upon as a digital-to-phase (D/P)
with
converter. Every reference cycle, the prescaler subtracts
rad from its input signal, with
determined
modulator output. The resulting quantization noise
by the
on the division modulus, and thus output phase, is approximated
by uniformly distributed white noise [5]. The quantization noise
with
for both modupower is
the modulus range and the number of signiflators with
output bits. The phase noise contribution of the
icant
modulator at the output of the synthesizer is found in (2) [6],
the closed-loop transfer function of the fourth-order
with
type-II PLL.
(2)
fractional- synthesizers
Since the main advantage of
and the PLL
is the decoupling of the reference frequency
noise on the bandwidth
bandwidth , the influence of the

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

837

III. FAST NONLINEAR ANALYSIS METHOD

Fig. 4. Maximum PLL bandwidth f


different
modulator orders for
third-order single-loop modulator.

16

18

versus the reference frequency and


:
. The dashed curve is for the

<

15

requirement is examined. To comply with the most stringent


133 dBc/Hz
DSC-1800 phase noise specification, i.e.,
phase noise is
at 3 MHz offset [7], the target
(3 MHz)
dBc/Hz. In Fig. 3, the maximum PLL
is plotted versus the reference frequency
bandwidth
for different MASH modulator orders. The dashed line is the
modumaximum bandwidth for the single-loop multibit
lator of Section II-B. For a reference frequency of 26 MHz, not
much is gained from increasing the modulator order. For a high
bandwidth and thus a fast PLL, the reference frequency and/or
the modulator order should be increased leading to an increased
power consumption and circuit complexity. The maximum
bandwidth is 87 kHz for the third-order MASH modulator and
62 kHz for the single-loop multibit modulator.
Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, determining the rms phase error
of the PLL is of importance. To be sure that the
does not corrupt the rms phase error, the dynamic range of the
modulator must be higher than the dynamic range of the PLL
is given by
[8]. The integrated in-band frequency noise
with
the noise bandwidth of the PLL
the in-band phase noise in dBc/Hz. The noise
and 10
. The maxbandwidth of the presented PLL is
imum bandwidth of the PLL is calculated in (3) [8].
(3)
is plotted versus the refThe maximum PLL bandwidth
erence frequency of the PLL for different MASH modulator
modulators
orders in Fig. 4. For the single-loop multibit
(dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles.
In the case of a third-order modulator, a 1.5 rms phase error (to
of
ensure at least an overall rms phase error of 2 ) and a
26 MHz, the maximum bandwidth is 810 and 614 kHz, respecmodulator
tively. Obviously, the constraint posed on the
noise due to in-band noise contributions is much less severe than
the constraint due to the out-of-band phase noise at 3 MHz.

The theoretical analysis suggested that applying


control
to the prescaler would not cause any problems for the spectral
purity of the PLL. Practice, however, proves this wrong. A fast
nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis
method is at the same time sufficiently fast to sweep simulations
over different degrees of nonlinearities and operating points, and
is capable of performing sufficiently long transient simulations
to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete
time and in open loop under locked conditions to avoid drift of
the phase error. To further speed up the simulation, the building
blocks are represented by high-level models with parameters to
model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].
modulation of
To find the phase error, generated by the
the division modulus, the variation of the number of RF pulses,
, at the output of the divider is monitored. Every reference
cycle, the number of RF pulses at the divider output is detercontrol,
mined by the number of pulses swallowed by the
:
(4)
The resulting quantized phase changes are compared with the
phase that would be expected when the loop would be in lock,
i.e., the phase corresponding to the fractional part of the divi. The result is the instantaneous accumusion modulus
:
lated phase error
(5)
, in the
The phase error is converted to current pulses,
(phase-error charge-pump curcharge pump. The
rent) conversion is modeled to contain any PFD nonlinearity.
Mismatch in the up and down current sources, resulting in gain
mismatch for positive and negative phase errors is modeled by
. The occurrence of a dead zone is modeled by
(6)
By taking an FFT of the current pulses, the current noise
spectrum is obtained. The current noise spectrum is modeled
as a phase-noise source which is subjected to its corresponding
closed-loop transfer function, obtained from the LTI PLL
model. This means that the filter is modeled by its linear
transfer function, which includes parasitic gain and pole
position changes. The nonlinear conversion from voltage to
frequency/phase in the VCO is modeled by the variation of the
VCO gain, when changing the operating point of the PLL.
The analysis tool enables the evaluation and comparison of
noise on the PLL.
the effect of MASH and single-loop
This analysis is performed with the following nonlinearities: a
0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency

838

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 5. Simulation results. The phase error  for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses CP
(c) the MASH modulator and (d) the single-loop multibit modulator.

is 26 MHz and the fractional division number is 67.92. The


output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from
. In Fig. 5(a) and (b), the time-domain
an integer multiple of
is plotted for both modulators. Note that the
phase error
fractional- PLL frequency synthesizer can hardly be called a
phase-locked loop, since the loop is never in lock! Due to the
modulator, the inshaping of the HF noise in the single-loop
stantaneous phase error is smaller than for a MASH modulator.
This has two important consequences. First, the on-time of the
charge pumps is smaller for the single-loop modulator, making it
less sensitive to noise coupling from the substrate and the power
consupply. Second, the sensitivity to the nonlinear
version in terms of noise leakage is reduced.
To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses
are plotted in Fig. 5(c) and (d). A noise floor appears in
the output spectrum as well as spurious tones, although the
output is perfectly randomized and dithered. Due to the nonfolds
linear mixing in the PFD charge pump, noise at
back to lower offset frequencies, similar to the effect of a nonADC. Since the noise at
is
linear DAC in a multibit
modulator, its noise leakage
much lower for the single-loop
due to the nonlinear mixing in the PFD is also lower. In the time

[i] for

domain, this effect corresponds to the smaller phase excursions.


The difference in phase error between MASH and single-loop
modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear
in the output spectrum at
with
.
noise of both modulators as it appears at
Fig. 6 shows the
the PLL output for an ideal (dotted) and a nonlinear
conversion (solid). The results of the ideal case closely match
the theoretical results of Section II-C (solid light gray). Due
to nonlinearity, the simulated output spectrum of the integerPLL (the dash-dotted line) is seriously deteriorated by
noise
. Especially,
in the PLL noise bandwidth, increasing the
the MASH converter is critical in terms of in-band noise due
to the higher phase error [see Fig. 5(a)], despite the inherently
noise of the MASH modulator. Note that the simlower LF
ulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consefractionalquence, the actual spurious performance of the
PLL could be worse than simulated. The presented simulation
results are for a division modulus 67.92, close to an integer mul. When analyzing division moduli in between integer
tiple of
, noise leakage is still observed, but the spurious
multiples of
tones are well below the phase noise.

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

839

Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a)
the MASH modulator and (b) the single-loop multibit modulator.

16

Fig. 6. Simulation results. The


noise at the output of the PLL for (a) the
MASH modulator and (b) the single-loop multibit modulator. The results are
plotted for an ideal PFD (dotted), which closely corresponds to the theoretical
results (solid light gray) and for a nonlinear PFD (solid). They are compared to
the simulated integer PLL phase noise (the dash-dotted line).

PFD. This effect can be worsened by substrate and power.


supply coupling with signals at
IV. PLL BUILDING-BLOCK CIRCUIT DESIGN
A. The Fourth-Order Type-II PLL

The explanation for the re-emerging of spurious tones is that


the modulator is unable to sufficiently decorrelate the successive
modulator
output samples. To quantify the correlation in the
output, the discrete time autocorrelation estimate is calculated
and plotted for both modulators for inputs close to an integer
value (see Fig. 7). The autocorrelation calculations show correlation, although 1LSB noise-shaped dithering is applied. The
modulator shows large
autocorrelation of the single-loop
correlation peaks, explaining the higher spurious tones in the
output phase-noise spectrum of the PLL. With the autocorrelamodtion estimate, the necessary internal accuracy of the
ulators is found to be at least 13 bits for MASH and 16 bit
for single-loop modulators to sufficiently decorrelate the
modulator output for inputs close to integers. A second possible
source of tones is the downconversion of tones which are inher[5], by the nonlinear mixing in the
ently present around

A fourth-order type-II PLL is integrated, including a 4-bit


prescaler, a zero-dead-zone PFD, a dual charge pump, and a
3-step equalizer, together with an on-chip LC-tank VCO and a
third-order dual-path 35-kHz low-pass loop filter (see Fig. 8).
The equalizer performs a 3-step piecewise equalization of the
loop gain, by keeping the product of the VCO gain and the
charge-pump current constant. To prevent switching between
different equalization states, the state transitions exhibit hysteresis.
B. The 4-Bit Prescaler
The first high-speed division of the prescaler is done
with two differential single-transistor-clocked (DSTC) logic
n-latches [10], forming a differential dynamic D-flip-flop. The
flip-flop operates with rail-to-rail internal signals to minimize
the residual prescaler phase noise [11] to levels insignificant to

840

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

the overall phase-noise performance. The 16-modulus division


(64 79) is implemented with the phase-switching topology
[12]. The division moduli are generated by switching between
the 90 -spaced output phases of the second D-flip-flop. When
the 90 spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of
the PLL reference frequency. It takes careful layout and circuit
design to equalize the delays of the different quadrature paths,
such that these spurious tones are suppressed to negligible
levels.

SUMMARY

TABLE I
LOOP PROPERTIES AND PERFORMANCE
FOURTH-ORDER TYPE-II PLL

OF THE

OF THE

C. The Voltage-Controlled Oscillator


The LC VCO with on-chip inductor combines a 30% tuning
range at only 2 V and an excellent phase-noise performance
over a large frequency range. To minimize the VCO phase
noise, a simulator-optimizer program has been developed
which searches the optimal inductor geometry for a given
technology. The resulting hollow octagonal balanced inductor
as high as 9 with an inductance of 2.86 nH, for a
has a
standard 0.25- m CMOS technology with only two metal
layers (0.6 and 1.0 m) [13].
The VCO is implemented as a single differential pMOS-only
topology, leading to an enhanced tuning range, without in[13].
creasing the power consumption and the VCO gain,
is between 100 and
For the frequency range of interest,
200 MHz/V, explaining the need for equalization of the loop
gain. The VCO output is buffered from the prescaler input to
prevent kickback noise from entering the tank. The measured
phase noise is as low as 127.5 dBc/Hz at 600 kHz and
142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.
D. The 35-kHz Dual-Path Loop Filter
To achieve full integration, a dual-path filter topology has
been implemented (Fig. 8). Two filter paths, one active integraare added
tion ( ) and one passive low-pass filter

with a multiplication factor in the dual charge pumps. The


addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12].
The total number of capacitors is the same as in a classical
fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the
rather high VCO gain, the integrated capacitance is still 1.4 nF
to be able to comply with the DCS-1800 phase-noise requireis added at 210 kHz to ensure
ments. An extra pole
enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero
positions and the capacitanceresistance tradeoff to obtain low
noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise
is without the
noise. The MASH and single-loop (SL)
noise contributions result from the nonlinear analysis. As

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

841

Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left)
the dummy current branch, denoted by the suffix d, and the output branch.

seen in Section II-C, the loop bandwidth needs to be smaller than


noise suppression. However, to ensure sufficient
62 kHz for
suppression of the low-frequency fractional spurious tones for
inputs close to the integers, the bandwidth is designed to 35 kHz.
Despite the rather low loop bandwidth for a fractional- synthesizer, a settling time of less than 293 s for a 104-MHz step
is simulated.
E. The

Conversion

The nonlinear analysis of Section III identified nonlinearity


conversion as the main cause of noise leakage
of the
and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as
such and toward a highly linear phase-error detection for
spurious suppression.
First, the reference spur generation by the PFD charge-pump
circuit is carefully minimized. The integration in the first path of
the loop filter is done actively to keep the charge-pump output

at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least a magnitude larger than the fixed
parasitic charge injection of the switch transistors. The current
switches are implemented with pMOS and nMOS transistors to
compensate charge injection. Finally, a timing control scheme
[Fig. 9(a)] is developed to control the charge-pump switches.
The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch
and the dummy current branch of the charge pump [Fig. 9(b)].
Fig. 9(a) shows the dummy and output control signals. The
is delayed versus the output control
by
dummy control
modifying the thresholds of the second inverter-string (indicated
always flows, preby high and low) such that the current
venting hard on/off switching of the current sources. To equalize
rise and fall times and force a perfect rad relation between
nMOS and pMOS control signals, latches at the outputs of both
inverter strings are implemented. Capacitors at the control outputs lower the rise and fall times to prevent large charge injections by fast switching.

842

Fig. 10.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

IC microphotograph and the measurement setup in which it is embedded.

To linearize the
conversion, the phase detection
is performed by a zero-dead-zone PFD [15], to prevent a hard
nonlinearity around 0 phase error. Due to the delay added in
the PFD, both the up and down current sources are on, for small
or zero phase errors, enabling the PFD to react to very small
phase errors. The on-time fraction of the charge pump due to
the delay is less than 10%. This value is a tradeoff between
dead-zone prevention and sensitivity to noise coupling, when
the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events
modulator and prescaler are offset in phase. Consein the
quently, the phase-error decision making is done in a relatively
quiet environment. To make sure that the gains for positive and
negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side efnoise, which can seriously affect the
fect, the current source
in-band noise, is decreased. Additionally, the timing control of
Fig. 9(a) provides synchronization between the two filter paths
and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain.
HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal
transistor matching.

V. EXPERIMENTAL RESULTS
Fig. 10 shows the IC microphotograph and the measurement
fractional measuresetup in which it is embedded. The
ments are performed by controlling the PLL divider moduli with
an HP80000 data generator, which generates the 4-bit control
output bit stream is generated using Matlab.
word. The 4-bit
modThis provides a flexible way to test different kinds of
ulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at

16

Fig. 11. Measured output spectrum of the


fractionalGHz. All spurious tones are well below 75 dBc/Hz.

N PLL at 1.76592

1.76592 GHz, i.e., for a fractional division by 67.92 for comparmodulators


ison with the simulated results. The input to the
), resulting in a frequency resolution of
is a 16-bit word (
around 400 Hz. The power-supply voltage is only 2 V. Fig. 11
shows the output spectrum of the fractional- PLL over a span
of 55 MHz. The reference spurs are well below 75 dBc, due
to the careful charge-pump timing control.
To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory.
Unfortunately, the maximum memory capacity is only 128 kbit,
leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is
performed by the phase-noise measurement system every offset
frequency decade, such that accurate measurements of the phase
noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

SUMMARY

843

OF

TABLE II
MEASURED SPECIFICATIONS COMPARED
DCS-1800 SPECIFICATIONS

TO THE

16

Fig. 12. Phase-noise measurement with the


single-loop multibit converter
at 1.76592 GHz compared to the phase noise at integer division (light).

Fig. 13. Phase noise measurement with the MASH converter at 1.76592 GHz
compared to the simulated
noise at the output of the PLL (dashed), and
control (dash-dotted).
with the simulated PLL output without

16

16

single-loop multibit modulator is presented in Figs. 12 and 13.


Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below 100 dBc, due to
careful PFD charge-pump design. The phase noise at 600 kHz
is lower than 120 dBc/Hz.
In Fig. 12, the measured phase noise of the PLL with a
multibit single-loop modulator (dark) is compared to the phase
noise at integer division (light). Noise at lower offsets origimodulator due to noise folding in the PFD,
nates from the
as predicted by the simulations. As a result, the rms phase error
is increased from 1.7 to 3 . Note that the phase noise
of the PLL at integer divisions is as low as 124 dBc/Hz
at 600 kHz, which is only 0.3 dB higher than predicted by
the PLL simulations (see Table I). The measured results for
fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz
is increased due to the limited memory of the data generator.
The noise at higher offset frequencies is corrupted by noise
coupling from the data generator. As can be seen in Fig. 10,
-control bonding wires, which conduct rail-to-rail, very
the

noisy control pulses are close to the LC tank and the bonding
wires of the VCO power supply. Without proper shielding, the
VCO phase noise is seriously degraded by this noise coupling.
noise and the
noise as simIn Fig. 13, the measured
ulated in Section III (dashed) is compared. The dash-dotted line
control. The
is the simulated phase noise of the PLL without
noise leakage closely matches the measured resimulated
sults, except at very low offsets due to the limited memory. The
phase noise at high offsets is increased versus the simulated PLL
results due to noise coupling. Second-order tones are larger in
measurements, since the models in the simulator do not include
second-order effects and noise coupling. Tones at 520 kHz are
believed to come from subharmonic tones present in the
modulator output [5], which are amplified by mixing through
noise coupling. When comparing the results for the MASH and
the single-loop modulator, the measured results are less pronounced than the simulated results (see Fig. 6). The measured
phase noise for the single-loop modulator is however a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples
.
of
The measured settling time of the PLL is 226 s for a
104-MHz frequency step. The power consumption of the PLL
is 70 mW from a 2-V power supply. The fully integrated
low-phase-noise VCO is responsible for almost 66% of the
total power consumption. The IC area is 2 2 mm , including
bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications
[1]. The specifications of the IC prototype comply with the
is degraded due to the limited
DCS-1800, only the
resolution of the measurement setup.
VI. CONCLUSION
A monolithic 1.8-GHz
-controlled fractional- PLL
frequency synthesizer is implemented in a standard 0.25- m
CMOS technology. The monolithic fourth-order type-II PLL

844

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

integrates the digital synthesizer part together with a fully


integrated LC VCO, a high-speed prescaler, and a 35-kHz
dual-path loop filter on a die of only 2 2 mm . To investigate
modulator on the synthesizers spectral
the influence of the
purity, a fast nonlinear analysis method is developed, showing
good correspondence with measurements, in contrast to the
results of the theoretical analysis. Nonlinear mixing in the
phase-frequency detector and the VCO is identified as the main
fractional- synthesizers.
source of spectral pollution in
modulators are compared
MASH and single-loop multibit
for use in fractional- synthesis. Although the MASH is stable
and easy to integrate, the single-loop modulator presents a
better solution, showing less sensitivity to noise leakage and
noise coupling and providing more flexibility. The measured
phase noise is lower than 120 dBc/Hz at 600 kHz and
139 dBc/Hz at 3 MHz. The measured fractional spur level is
lower than 100 dBc, satisfying the DCS-1800 spectral purity
requirements. All measurements are performed for frequencies
close to integer multiples of the reference frequency, where the
synthesizer is most sensitive to spurious tones.
REFERENCES
[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, A
2-V CMOS cellular transceiver front-end, IEEE J. Solid-State Circuits,
vol. 35, pp. 18951907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C.
Nilson, L. Plouvier, and S. Rabii, A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,
in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San
Francisco, CA, Feb. 1999, pp. 228229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P.
J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli,
A single-chip 900-MHz spread-spectrum wireless transceiver in 1-m
CMOSPart II: Receiver design, IEEE J. Solid-State Circuits, vol. 33,
pp. 547555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, Deltasigma modulation
in fractional-N frequency synthesis, IEEE J. Solid-State Circuits, vol.
28, pp. 553559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Themes, DeltaSigma Data
Converters: Theory, Design and Simulation. New York: IEEE Press,
1997.
[6] B. Miller and R. Conley, A multiple modulator fractional divider,
IEEE Trans. Instrum. Meas., vol. 40, pp. 578583, June 1991.
[7] Digital cellular communication system (Phase 2+); Radio transmission
and reception, Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM
05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, A 1.1-GHz CMOS fractional-N
modulator, IEEE J.
frequency synthesizer with a 3-b third-order
Solid-State Circuits, vol. 35, pp. 14531460, Oct. 2000.
[9] The Mathworks Inc., Matlab Users Guide, Version 5. Englewood
Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, New single-clock CMOS latches and flipflops with improved speed and power savings, IEEE J. Solid-State Circuits, vol. 32, pp. 6269, Jan. 1997.

16

[11] B. De Muer and M. S. J. Steyaert, A single-ended 1.5-GHz 8/9 dualmodulus prescaler in 0.7-m CMOS with low phase-noise and high
input sensitivity, in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC),
The Hague, Sept. 1998, pp. 256259.
[12] J. Craninckx and M. S. J. Steyaert, Low-phase-noise fully integrated
CMOS frequency synthesizers, Ph.D. dissertation, Katholieke Univ.
Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, A 1.8-GHz
highly tunable low-phase-noise CMOS VCO, in Proc. IEEE Custom
Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585588.
[14] B. De Muer and M. S. J. Steyaert, Fully integrated CMOS frequency
synthesizers for wireless communications, in Analog Circuit Design,
W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell,
MA: Kluwer, 2000, pp. 287323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc.
degree in electrical engineering in 1996 from the
Katholieke Universiteit Leuven, Belgium, where
he is currently working toward the Ph.D. degree
on high frequency low-noise integrated frequency
synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with
ESAT-MICAS laboratories since 1996. His research
is focused on integrated low-phase-noise VCOs with
on-chip planar inductors and high-speed prescaler
design, leading to fully integrated
fractional-N synthesizers in CMOS
technology.

16

Michel S. J. Steyaert (S85A89SM92) was born


in Aalst, Belgium, in 1959. He received the M.S.
degree in electrical-mechanical engineering and
the Ph.D. degree in electronics from the Katholieke
Universiteit Leuven (K.U. Leuven), Heverlee,
Belgium, in 1983 and 1987, respectively.
From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial
Research) which allowed him to work as a Research
Assistant at the Laboratory ESAT at K.U. Leuven.
In 1987, he was responsible for several industrial
projects in the field of analog micropower circuits at the Laboratory ESAT as
an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor
at the University of California, Los Angeles. In 1989, he was appointed by
the National Fund of Scientific Research (Belgium) as a Research Associate,
in 1992 as a Senior Research Associate, and in 1996 as a Research Director
at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also
a part-time Associate Professor and since 1997 an Associate Professor at
the K.U. Leuven. His current research interests are in high-performance and
high-frequency analog integrated circuits for telecommunication systems and
analog signal processing.
Dr. Steyaert received the 1990 European Solid-State Circuits Conference
Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the
1999 IEEE Circuit and Systems Society GuilleminCauer Award, and the
1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated
circuits for telecommunications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

691

A 960-Mb/s/pin Interface for Skew-Tolerant Bus


Using Low Jitter PLL
Sungjoon Kim, Student Member, IEEE, Kyeongho Lee, Student Member, IEEE, Yongsam Moon, Student Member, IEEE,
Deog-Kyoon Jeong, Member, IEEE, Yunho Choi, and Hyung Kyu Lim, Member, IEEE
AbstractThis paper describes an I/O scheme for use in a highspeed bus which eliminates setup and hold time requirements
between clock and data by using an oversampling method. The
I/O circuit uses a low jitter phase-locked loop (PLL) which
suppresses the effect of supply noise. Measured results show peakto-peak jitter of 150 ps and rms jitter of 15.7 ps on the clock line.
Two experimental chips with 4-pin interface have been fabricated
with a 0.6-m CMOS technology, which exhibits the bandwidth
of 960 Mb/s per pin.
Index Terms Skew-tolerant, high speed bus, oversampling,
phase locked loop, jitter, CMOS, phase frequency detector, voltage controlled oscillator.

I. INTRODUCTION

S the speed of high-speed digital systems tends to be


limited by the bandwidth of pins, new I/O architectures
are gaining momentum over conventional ones. The advent
of 64 Mb and 256 Mb DRAMs and faster logic chips also
propels the need for high-speed I/O interface while reducing
the number of pins and hence the system cost. Synchronous
DRAMs increased chip bandwidth up to 220 Mb/s/pin [1]. A
revolutionary architecture using delay-locked loops (DLLs) or
phase-locked loops (PLLs) was also successful in providing
over 500 Mb/s/pin bandwidth [2], [3]. Such a narrow, highspeed bus provides large bandwidth in a small, low pin-count
package, but such high-speed bus architectures inevitably
require strict phase relationships between clock and data.
A phase-tolerant I/O scheme was also developed previously
for a point-to-point link [4]. This paper describes an I/O
scheme for use in a high-speed bus which eliminates setup
and hold time margins by using blind 3 oversampling and
data recovery. In the new scheme, the clock line delivers
only frequency information. The data receiving circuits extract
phase information from the data itself. An 8-b data bus
employing this skew insensitive scheme can deliver over
960 MB/s. Two experimental chips with 4-pin interface were
fabricated.
In Section II, the chip architecture and the skew-tolerant I/O
scheme will be presented. The circuit design techniques for
low jitter PLL and other circuits are discussed in Section III.

Manuscript received August 20, 1996; revised December 3, 1996.


S. Kim, K. Lee, Y. Moon, and D.-K. Jeong are with the Inter-University
Semiconductor Research Center, Seoul National University, Seoul 151-742,
Korea.
Y. Choi and H. Lim are with Samsung Electronics Co., Yongin-City,
Kyungki-Do, Korea.
Publisher Item Identifier S 0018-9200(97)02850-3.

The chip layout and experimental results are presented in


Section IV followed by a conclusion in the final section.

II. SYSTEM ARCHITECTURE


Two chips, bus master and bus slave, were designed. Bus
masters in a system bus initiate bus transactions, and slaves respond to the tenured master. For example, a memory controller
works as the master chip and a memory with a high-speed
interface works as the slave chip. A simplified block diagram
of the two chips is shown in Fig. 1. The bus signals are
composed of 4-b wide data lines, a clock line, and a reference
line. A charge pump PLL multiplies the external clock by
two and generates two sets of multiphase clocks for both bit
serialization and data oversampling. The relationship between
internal 12-phase clocks and external clock is shown in Fig 2.
First set of multiphase clocks are 12 multiphase clocks with
30 of phase separation. These 12 clocks are shown in Fig 2(a)
as PCK[0] to PCK[11]. These multiphase clocks were laid out
to minimize the interference. Fig 2(b) shows the multiphase
clock distribution. Ground lines were inserted between each
multiphase clock to minimize the interference. When one
clock is switching, the adjacent clocks are guaranteed to
be in stable state. This configuration minimizes coupling
between clocks. The second set of multiphase clocks are
four multiphase clocks with 90 of phase separation. This
second set of multiphase clocks, TCK[0] to TCK[3], are in
phase with PCK[0], PCK[3], PCK[6], PCK[9], respectively.
We generate these two separate sets of clocks to equalize
loading conditions.
An 8-b parallel data stream is first converted to a 4-b
data stream by an internal clock and then serialized with a
serialization circuit. The serializer circuit used is the same
type of circuit reported in [4]. The only difference is that
four phase clocks instead of ten phase clocks of the previous
design are used in this design, thereby reducing area and
parasitic capacitance at high-speed nodes. The serial stream
is driven by a current controlled open-drain output driver.
The second set of multiphase clocks, TCK[0] to TCK[3],
are used by the transmitter to serialize 4 b of data. Each
pin connected to a high-speed bus has 12 oversamplers and
a output driver. In [6], 32 clock phases are generated to
oversample the incoming data. The decision on the degree
of oversampling is a tradeoff between input data phase jitter
tolerance, power, and area. If too many clock phases are used
per bit period, power consumption and chip area will increase.
But low oversampling ratio may affect the tolerance of phase

00189200/97$10.00 1997 IEEE

692

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Fig. 1. Simplified block diagram of master and slave chip.

(a)

(b)
Fig. 2. (a) External clock and 12 multiphase clocks relationship. (b) Multiphase clock layout.

jitter on the incoming data. If the phase jitter on the incoming


data is low and the PLL has low jitter characteristics, the
oversampling ratio can be as low as three [7]. The oversampler
oversamples the bus data three times per bit using 12 phase
clocks provided by a PLL. To extract correct phase information
from the data stream, the high-to-low transition is inserted in
each head of a packet on each pin for correct data sampling.
The slaves of the bus keep oversampling the bus signals to
catch the start of a bus transfer. This process is illustrated in

Fig. 3. The serial input data is sampled at the rising edges


of each multiphase clock. The receiver samples the serial
data blindly without any constraint on setup and hold time
margins. The sampled data is amplified again regeneratively to
reduce possible metastability. Fig. 3 shows two high-speed bus
signals, bus signal 0 and bus signal 1, with skew between them.
When the signal receiver detects the first 1-to-0 transition, it
selects the next bit as the first valid data. The third bit after
the first valid bit is also selected as valid. It is assumed

KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS

693

Fig. 3. Skew-insensitive bus operation.

Fig. 4. Byte skew handling operation.

Fig. 5. Functional block diagram of charge pump PLL.

Fig. 7. Implemented phase frequency detector.

Fig. 6. Conventional phase frequency detector.

that the next oversampled bit after the first 1-to-0 transition
was sampled near the center of data eye pattern. Each pin
of the data bus tracks the start phase of a data transfer
separately. After each pin catches the start of a data transfer,
the demultiplexed data of each pin is retimed into a single
internal clock domain. Since this process can be done in one
clock cycle, the masters can respond quickly as distance from
the signal source changes.
Since this scheme allows skew not only in clock line but
also among data lines, there is a possibility that some of
the demultiplexed parallel data are one internal clock cycle
earlier or later than the other demultiplexed data after retiming.

694

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Fig. 8. (a) PFD dead zone and (b) PLL jitter.

Fig. 9. Voltage controlled oscillator circuit diagram.

The skew handler examines the parallel output of each pin


and checks whether every pin is aligned properly. If some
of the parallel outputs are not aligned, skew handler delays
the parallel outputs which arrived earlier. Fig 4 explains the
operation of the interpin skew handler operation.
III. CIRCUIT DESCRIPTION
A. Low Jitter PLL
The performance of the PLL or DLL is one of the limiting
factors of the high-speed interface or serial communications.
The jitter characteristics become more important especially for
such applications that require integration of PLL or DLL with
noisy digital circuits. Integration with digital circuits induces
noise on the supply rails or on the substrate. Since the charge
pump PLL used in this design generates multiple phase clocks
to divide one external clock period into many equally spaced
intervals, the accuracy and the jitter characteristics become
more important.
Fig. 5 shows the functional block diagram of a charge
pump PLL clock generator. It consists of a phase frequency

Fig. 10. Simulated UP/DOWN pulse width difference as a function of input


phase difference.

detector, charge pump, loop filter, clock divider, and a voltagecontrolled oscillator (VCO). With a six-stage differential VCO,
12 clock phases are available to oversample the incoming data
and to serialize parallel data into serial bit stream.
One of the critical building blocks of the PLL is the phase
frequency detector (PFD). A low precision PFD has a wide
dead zone (undetectable phase difference range), which results
in increased jitter. The jitter caused by the large dead zone can
be reduced by increasing the precision of the phase frequency
detector. Fig. 6 shows a conventional implementation of a
static PFD [8]. This conventional PFD is an asynchronous state
machine. The delay time to reset all internal nodes determines
the circuit speed. The critical path of the conventional PFD
is shown in bold lines in Fig. 6. The critical path forms a
feedback path with six gate delays. The dead-zone occurs when
the loop is in a lock mode and the output of the charge pump

KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS

Fig. 11.

695

Voltage controlled oscillator circuit diagram.

does not change for small changes in the input signals at the
PFD. Any width of the dead-zone directly translates to jitter
in the PLL and must be avoided.
To overcome the speed limitation and to reduce the dead
zone, a new dynamic logic style PFD was designed. A
similar dynamic comparator was reported before [9]. But our
implementation requires fewer number of transistors. Fig. 7
shows the circuit diagram of the PFD. Conventional static
logic circuitry was replaced by dynamic logic gates. As a
result, the number of transistors in the PFD core is reduced
from 44 to 16. The critical path of this PFD is shown also
in Fig. 7. The critical path of this PFD is composed of threegate feedback path. The shortened feedback path delay and
dynamic operation allow high precision in the high-frequency
operation.
Fig. 8. shows the relation between dead zone of PFD and
the phase error of PLL. If the phase difference of EXT clock
and VCO clock is smaller than the dead zone, the PFD cannot
detect the phase difference. So the phase error signal of PFD
will remain zero, resulting in unavoidable phase error between
EXT clock and VCO clock. The minimum peak-to-peak phase
error caused by this dead zone is
Minimum Peak-to-Peak Phase Error

(1)

In order to avoid dead zone, the PFD asserts both UP and


DOWN outputs as shown in Fig 9. For in-phase inputs of
EXT_CLK and VCO_CK, the charge pump will see both
UP and DOWN pulse for the same short period of time. If
there is a phase difference between EXT_CLK and VCO_CK,
the width of UP and DOWN pulse will be proportional to
the phase differences of the inputs. Fig. 10 shows the SPICE
simulation result of the UP/DOWN pulse width differences
as a function of the input phase differences. The deadzone of
the PFD is significantly smaller than the measured maximum
PLL jitter.
Several critical parameters of the PLL, such as speed, timing
jitter, spectral purity, and power dissipation, strongly depend

Fig. 12. VCO operation for step supply noise.

on the performance of the VCO. So the noise insensitivity of


the VCO is very important. The VCO implemented in this
design has a simple bias circuit to reject supply step noise.
The processor or bus can have intervals when there is heavy
circuit activity in switching large amounts of capacitance and
intervals when there is very little circuit activity. This will
show up as steps or impulses on the power supply of PLL [8].
The actual peak-to-peak jitter in this case becomes dominated
by the peaks in the impulse transient noise response. The VCO
used in the design is a six-stage differential-type ring oscillator
with limited voltage swing and is shown in Fig. 11. Each stage
is made up of a differential NMOS pair with variable resistance
loads made of PMOS devices operating in the triode region.
The bias voltage for the PMOS is generated by a replica bias
circuit. The operation of this bias circuit is shown in Fig. 12.
voltage dynamically tracks the supply variations. The
The
replica bias circuit which consists of replica delay cell and an
op-amp sets the minimum voltage level of the internal VCO
The
signal is generated by two resistors and
swing to
one capacitor. When the supply rail is quiet, the voltage swing
Let us assume that there is a
of the internal VCO is
at some point. After the
supply voltage step variation of

696

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Fig. 13.

Phase and byte sync block diagram.

Fig. 14.

Sampler circuit diagram.

step change at the supply, the

level settles to
(2)

to the increased supply voltage for a short period of time.


And the voltage swing at the VCO increases with a time
constant determined by
and OPAMP bandwidth
and approaches to

with a time constant of


(4)
(3)
At the instant of supply step change, the voltage difference
and
remains the same due to the capacitor
between
at the
generator. If
is fixed, the delay cells run
a little bit faster due to the supply voltage increase instead
remains the
of keeping exact constant delay. Since
same temporarily, the delay cells run a little bit faster due

which result in the increase of one stage delay. This gives


an averaging effect on the VCO delay after the supply step
change, making the delay change minimized with supply step
change. If we select
and
values for a minimum
average delay change, the effect of supply step change can be
nullified. The values we chose for this particular process are
k
k and
pF.

KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS

697

PLL circuits can be sensitive to noise pickup from the


supplies and substrate. So the PLL circuit has a dedicated
power and ground pads. Bypass capacitors are included in the
layout to stabilize VDD and GND of PLL. Guard rings are
used to isolate PLL and other digital parts. The placement of
multiphase clocks were carefully chosen to remove possible
coupling between clocks.
B. Phase and Byte Sync
Phase and byte sync block at Fig. 1 is shown in Fig 13.
It consists of 3-to-1 mux array, metastability resolver, start
bit finder, phase memory, word memory, shifter, and D-flipflops (DFFs). This circuit finds the start bit and decimates
the oversampled 12 b and aligns the byte boundary. The
oversampled 12 b are sent from the sampler to the metastability
resolver. Since the oversampled 12 b are not sampled at
the center of the eye, there is a possibility that some of
the bits are still at the metastable state. The metastability
is practically removed by one more stage of synchronizers
in the metastability resolver. The start bit finder receives
information from the metastability resolver and selects one
of the three phases as a correct phase and also extracts byte
align information. The phase and byte align information are
stored at the phase and word memory. The 3-to-1 mux array
decimates 12 b into 4 b. The shifter at the final stage aligns
the byte boundary according to the value of the word memory.

Fig. 15. Microphotograph of master chip.

C. Oversampler
The oversampler used in the data receiver is shown in
Fig. 14. Each oversampler is a cascaded sense amplifier and
uses four clocks for correct, timely sampling. It is very
important to reduce the probability of metastability by careful
design and layout. The same size is used for both PMOS and
NMOS in the core synchronizing amplifier to maximize the
loop bandwidth.
IV. EXPERIMENTAL RESULTS
Two prototype chips, master and slave, have been fabricated
in a 0.6- m double-metal CMOS process. Fig. 15 shows the
microphotograph of the fabricated master chip. This chips
occupies 4100 m 4300 m including pad area. The master
chip incorporates a common skew-insensitive I/O macro block,
a bus protocol handler, and a self-test circuit for chip and
system diagnostics. The common skew-insensitive I/O macro
block includes a charge-pump PLL for multiphase generation,
oversamplers, I/O buffers, parallel-to-serial converters, and a
bias generator for internal use. The core area for the skewinsensitive I/O macro block is 3600 m 700 m for 4-pin
interface. The microphotograph of the fabricated slave chip
is shown in Fig 16. It has the same die size as the master
chip. Many blocks are shared with the master chip. The skewinsensitive I/O macro block and the charge pump PLL are the
same as those of the masters. The slave chip includes a small
internal fast SRAM to verify correct read/write operations.
The measured charge pump PLL jitter histogram of the
master and the slave chips is shown in Fig. 17. Since the two
chips use the same PLL, it showed similar jitter performance.

Fig. 16. Microphotograph of slave chip.

The rms jitter is 15.7 ps when the tested chip is active. The
peak-to-peak jitter was measured to be less than 150 ps. This
PLL jitter characteristic is especially important for multiphase
operation.
Fig. 18 shows an output data waveform at 960 Mb/s. The
master chip is sending data to the bus according to the
predetermined bus protocol. The jitter at the output data is
larger than the jitter at the charge pump PLL clock due to the
extra modulation effect of supply voltage fluctuation to data

698

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Fig. 17.

PLL jitter histogram.

Fig. 18.

Output data waveform.

output. The speed limit came from several reasons. CMOS


driving capability limitation and the signal degradation through
chip packaging and printed circuit board (PCB) were among
the main factors.
The skew-insensitive receiving operation was also observed.
There are four high-speed pins in the prototype chip. We
made a PCB with four high-speed impedance controlled bus
lines. The length of normal lines is 12 cm. One of the highspeed signal paths was made intentionally longer than the other
signals by 10 cm. The 960 Mb/s high-speed serial data was
sent into the receiver. The receiver recovers the serial data
into 8-b 120 - MHz parallel data. Fig. 19 shows 120 MHz
recovered parallel data. The upper waveform is from the

TABLE I
MAIN FEATURES OF THE CHIP

2 0.7 mm

Core Area

3.6 mm

Technology

0.6-m double-metal CMOS

Supply Voltage
Data Rate

960 Mb/s

PLL jitter
Power

15.8 ps rms @ 960 Mb/s


0.7 W fully active

3.3 V

pin with a longer trace. The lower waveform is from the


normal length pin. Although the two pins have different trace
lengths, the chips could receive data without errors. The power
dissipation at 960 Mb/s was 0.7 W for the master chip. The
chip characteristics is summarized in Table I.

KIM et al.: 960-Mb/s/pin INTERFACE FOR SKEW-TOLERANT BUS

Fig. 19.

699

Skew-insensitive I/O operation.

V. CONCLUSION
A new high-speed skew-insensitive I/O scheme has been
described in this paper. Two chips that incorporated the new
I/O scheme using the low jitter PLL technique have been
fabricated in a 0.6- m double-metal CMOS process. Three
times oversampling technique relaxed the strict requirement of
setup and hold margins of high-speed chip-to-chip interfaces.
Newly designed fast phase frequency detector and a high
noise immunity VCO circuit improved jitter performance of
PLL. The measured PLL rms jitter was 15.7 ps. Accurate
multiphase clock generation for oversampling the bus signal
was made possible by utilizing the low jitter PLL. By using
such techniques, skew-insensitive data transfer was tested.
This skew-insensitive I/O scheme is useful for high-speed
ASIC-to-memory and ASIC-to-ASIC interfaces. This scheme
will become more important as the chip-to-chip data transfer
speed goes up.

REFERENCES
[1] M. Horiguchi et al., An experimental 220 MHz 1 Gb DRAM, in
ISSCC 1995 Dig. Tech. Papers, pp. 252253.
[2] M. Horowitz et al., PLL design for a 500 MB/s interface, in ISSCC
1993 Dig. Tech. Papers, pp. 160161.
[3] T. H. Lee et al., A 2.5 V CMOS delay-locked loop for an 18 Mbit,
500 Megabytes/s DRAM, IEEE J. Solid-State Circuits, vol. 29, pp.
14911496, Dec. 1994.
[4] E. Reese et al., A phase tolerant 3.8 GB/s data-communication router
for a multiprocessor supercomputer backplane, in ISSCC 1994 Dig.
Tech. Papers, Feb. 1994, pp. 296297.
[5] S. Kim et al., A pseudo-synchronous skew-insensitive I/O scheme for
high bandwidth memories, in Proc. Symp. VLSI Circuits, June 1994,
pp. 4142.
[6] M. Bazes and R. Ashuri, A novel CMOS digital clock and data
decoder, IEEE J. Solid-State Circuits, vol. 27, pp. 19341940, Dec.
1992.
[7] S. Kim et al., An 800 Mbps multi-channel CMOS serial link with 3
oversampling, in Proc. IEEE Custom Integrated Circuit Conf., 1995,
pp. 451454.
[8] I. Young et al., A PLL clock generator with 5 to 110 MHz lock range for
microprocessors, IEEE J. Solid-State Circuits, vol. 27, pp. 15991607,
Nov. 1992.

[9] H. Notani et al., A 622-MHz CMOS phase-locked loop with prechargetype phase frequency detector, in Proc. Symp. VLSI Circuits, June 1994,
pp. 129130.

Sungjoon Kim (S91) was born in Pusan, Korea, on


June 2, 1970. He received the B.S. and M.S. degrees
in electronics engineering from Seoul National University in 1992 and 1994, respectively. Since 1994
he has been working toward the Ph.D. degree in the
same university.
He spent the summer of 1995 working on the
limiting factors of CMOS Gb/s transmission at SUN
Microsystems, CA. His research interests include
clock and data recovery for high-speed communication and high-speed I/O interface circuits.

Kyeongho Lee (S92) was born in Seoul, Korea,


on August 5, 1969. He received the B.S. and M.S.
degrees in electronics engineering from Seoul National University in 1993 and 1995, respectively.
He is currently working toward the Ph.D. degree in
electronics engineering of the same university.
He is working on various CMOS high-speed circuits for data communication. His research interests
include high-speed CMOS interface circuits, highspeed video display system, and PLL systems for
Gigabit communication.

Yongsam Moon (S97) was born in Incheon, Korea,


on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National
University in 1994 and 1996, respectively, where he
is currently working toward the Ph.D. degree.
He has been working on architectures and CMOS
circuits for microprocessors. His current research
interests are in clock and data recovery circuits for
high-speed data communication.

700

Deog-Kyoon Jeong (S87M89) received the B.S.


and M.S. degrees in electronics engineering from
Seoul National University, Seoul, Korea, in 1981
and 1984, respectively, and the Ph.D. degree in
electrical engineering and computer sciences from
the University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Texas Instruments, Dallas, TX, where he was a member of the
technical staff working on the single chip implementation of the SPARC architecture. Since 1991,
he has been on the faculty of the School of Electrical
Engineering and the Inter-University Semiconductor Research Center, Seoul
National University. His main research interests include high-speed circuits,
VLSI systems design, microprocessor architectures, and memory systems.

Yunho Choi was born in Incheon, Korea, on March


29, 1960. He received the B.S. degree in electrical
engineering from Seoul National University, Seoul,
Korea, in 1983.
He joined Samsung Semiconductor Inc., Santa
Clara, CA, in 1983, where he was engaged in
the design of the 256K DRAM. Since 1986, he
has been working on the design of high-density
dynamic memory including synchronous DRAM at
the Semiconductor Research Center, Samsung Electronic Company, Ltd., Kiheung, Korea. Currently he
is in charge of specialty memory design such as graphics memory and merged
DRAM and logic product development.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 5, MAY 1997

Hyung Kyu Lim (S82M84) was born February


4, 1953, in Kyung-Nam, Korea. He received the B.S.
degree from the Seoul National University, Seoul,
Korea, the M.S. degree from the Korea Advanced
Institute Science and Technology, and the Ph.D.
degree from the University of Florida, Gainesville,
all in electrical engineering, in 1976, 1978, and
1984, respectively.
Since 1976, he has been with the Semiconductor
Research and Development Center, Samsung Electronics Co., Kiheung, Korea. From 1978 to 1981,
he was engaged in the development of bipolar linear integrated circuits
and CMOS watch chips. After finishing his Ph.D. study, he worked mainly
in the area of high-density MOS memory development. Starting from a
64 Kb EEPROM design in 1984, he led various memory device research
and development projects that include 256 Kb EEPROM, 16 Mb mask
ROM, 1 Mb high-speed static Ram, and 1/3 inch CCD image sensor. He
is currently responsible for design engineering of all MOS memory research
and development projects in which dynamic RAM and specialty memories are
added. He has authored or coauthored over 20 technical journal and conference
papers and holds 23 patents.
Dr. Lim is a member of the IEEE Electron Device Society.

784

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

A Dual-Loop Delay-Locked Loop Using Multiple


Voltage-Controlled Delay Lines
Yeon-Jae Jung, Seung-Wook Lee, Daeyun Shim, Wonchan Kim, Changhyun Kim, Member, IEEE, and Soo-In Cho

AbstractThis paper describes a dual-loop delay-locked loop


(DLL) which overcomes the problem of a limited delay range by
using multiple voltage-controlled delay lines (VCDLs). A reference
loop generates quadrature clocks, which are then delayed with
controllable amounts by four VCDLs and multiplexed to generate
the output clock in a main loop. This architecture enables the DLL
to emulate the infinite-length VCDL with multiple finite-length
VCDLs. The DLL incorporates a replica biasing circuit for
low-jitter characteristics and a duty cycle corrector immune to
prevalent process mismatches. A test chip has been fabricated
using a 0.25- m CMOS process. At 400 MHz, the peak-to-peak
jitter with a quiet 2.5-V supply is 54 ps, and the supply-noise
sensitivity is 0.32 ps/mV.

(a)

Index TermsClock synchronization, delay-locked loop, duty


cycle corrector, replica biasing, voltage-controlled delay lines.
(b)

I. INTRODUCTION

Fig. 1. (a) Block diagram of a conventional DLL. (b) Lock-failure cases.

OR high-performance microprocessors and memory ICs,


the use of phase-locked loops (PLLs) or delay-locked loops
(DLLs) is essential to minimize the negative effects caused by
skews and jitters of clock signals. In applications where the frequency multiplication is not required, a DLL is a natural choice
since it is free from the jitter accumulation problem of an oscillator-based PLL. Conventional DLLs, however, suffer from the
problem of their limited delay range since DLLs adjust only the
phase, not the frequency.
We propose a new dual-loop DLL architecture that allows unlimited delay range by using multiple voltage-controlled delay
lines (VCDLs). In our architecture, the reference loop generates four evenly spaced clocks, which are then delayed with
controllable amounts by four VCDLs and multiplexed to generate the output clock in the main loop. The selection and delay
control in the main loop permit the DLL to emulate the infinite delay range with a multiple of finite-length VCDLs. Moreover, a fully analog control technique can be applied to exploit
the established benefits of conventional DLLs such as low skew
and low jitter. To reduce supply-noise sensitivity further, a new
low-jitter scheme is employed in a replica biasing circuit, which
compensates the delay variation of a delay line against the injected supply noise. Finally, a duty cycle corrector immune to
process mismatches is also used.
This paper is arranged as follows. In Section II, following a
brief overview of conventional DLLs, the proposed architecture
Manuscript received June 27, 2000; revised October 15, 2000.
Y.-J. Jung, S.-W. Lee, D. Shim, and W. Kim are with the School of Electrical
Engineering, Seoul National University, Seoul 151-742, Korea.
C. Kim and S.-I. Cho are with Samsung Electronics Company, Kyungki-do,
Korea.
Publisher Item Identifier S 0018-9200(01)03028-1.

is described with design concepts and various building blocks.


Section III describes circuits for low-jitter scheme and duty
cycle correction. Section IV discusses the prototype chip
implementation and shows experimental results. Section V
concludes this paper with a summary.
II. ARCHITECTURE
A. Limited Range Problem of Conventional DLLs
A simplified block diagram of a conventional DLL [1] is outlined with its lock-failure cases in Fig. 1. In the normal condi) to be aligned
tion, the DLL forces the output clock (
) through the negative
with the input reference clock (
feedback loop, which comprises a voltage-controlled delay line,
a phase detector, a charge pump, and a loop filter. The clock
buffer (CLK-BUF) is inserted to provide the chip-wide clock.
Although this simple architecture offers many design flexibilities, the main problem in the conventional DLL of Fig. 1(a)
) has a minimum
is that the delay time of the VCDL (
and a maximum boundary. Therefore, the DLL has states in
which it does not work, as shown in Fig. 1(b). When
has a maximum delay and the
leads the
, DN
pulses are generated but the VCDL can not produce any more
has a minimum delay
delay. On the other hand, when
lags the
, UP pulses are generated but
and the
the VCDL cannot reduce any more delay. These lock-failure
is limited
cases arise from the facts that the range of
is not known at loop startup. An
and the initial value of
additional loop startup control circuitry may solve this problem
and the DLL acquire lock. Unfortunately, the delay time of the
)
clock buffer and following clock distribution tree (

00189200/01$10.00 2001 IEEE

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP

Fig. 2.

Block diagram of the proposed dual-loop DLL.

deviates from the value at the simulation stage according to


temperature and voltage variations [2]. When the variation of
is excessive, the DLL loses the lock and falls into
the lock-failure cases in Fig. 1(b).
A DLL relying on quadrature phase mixing [3] has been
proposed to overcome the limited range problem of the conventional DLL. The phase mixing technique using quadrature
clocks provides unlimited phase shift capability. However,
phase mixing uses two small slew-rate clocks to obtain linear
results. Therefore, this approach has the disadvantage of the
increased dynamic noise sensitivity and jitter. In the semidigital DLL [4], a digitally controlled phase interpolator uses
internally generated 30 -spaced clocks through the dual DLL
architecture. Although noise sensitivity issues on the phase
interpolation could be alleviated by smaller interpolation
intervals, inherent digital nature causes dithering around zero
phase error due to continuous control-bit updates. A digital
DLL architecture with infinite phase capture ranges [5] is also
not free from the same dithering problem and requires a large
chip area for fine delay control.
B. Proposed Dual-Loop DLL
Fig. 2 shows a block diagram of the proposed dual-loop DLL
architecture [6]. This architecture is based on two loops: the reference loop and the main loop. The reference loop is locked
at 180 phase shift through the conventional DLL architecture.
Since the reference loop VCDL is composed of four main delay
cells, each delay cell generates a 45 phase shift at locked condition. All delay cells including delay buffers are differential elements commonly controlled by the output of the charge pump.
The delay cell named 3 means three parallel-connected delay
cells, so that the load balance between 0 and 180 clock is
preserved. The reference loop provides two differential clocks
spaced by 90 to the main loop. To cover the entire 360 phase
range, clocks from the reference loop are partially inverted and
inputted to four sets of VCDL in the main loop. Each main loop
VCDL is composed of three delay cells and generates low swing
,
, and
. These clocks expeinternal clocks- ,
rience the analog delay time control by two kinds of four con-

785

trol voltages generated from two main loop charge pumps. The
multiplexer selects one of four clocks as
and this clock
feeds the clock buffer whose function is to convert low swing to
full CMOS-level as well as provide the chip-wide output clock,
. The
drives the phase detector which compares
it to the reference clock. The output of the phase detector is used
by two charge pumps and four loop filters to control the delay
time of each main loop VCDL. Four-to-one clock switching
is implemented by the window finder and the state decoder
block. The window finder monitors the boundary where the seis switched and forces the state decoder to update
lected
the two-bit selection code at the switching event. The selection
code not only controls the clock selection at the multiplexer but
changes the configuration of two charge pumps and four loop
filters to accommodate the clock switching. Duty cycle correction (DCC) is employed to remove the duty cycle imperfections
and the output clock
. Fiof the input clock
and
, can
nally, although two input clocks,
be merged into one clock input, lower jitter clock source is pre, if possible, since it determines the jitter
ferred as the
characteristics of the whole DLL.
In this architecture, the clock selection scheme enables the
output clock to cover the entire phase range (modulo ). Furthermore, seamless clock switching is possible by optimizing
the main loop VCDL delay control scheme. Moreover, the phase
locking is achieved by fully analog control in all loops, so that
we can apply low-skew and low-jitter techniques, established in
conventional DLLs.
C. Reference Loop Design
The objectiveness of the reference loop is to provide quadrature clocks to the main loop. Since the main loop uses these
multiphase clocks as references, the phase distribution in the
output clocks should be preserved against a possible harmonic
lock. The reference loop phase detector depicted in Fig. 3(a) has
the capability to detect and escape up to the second harmonic
lock. This design is made of two level-sensitive AND/NAND logic
which requires 45 and 90 clocks as well as 0 and 180 clocks.
At one period lock, clocks and UP/DN output waveforms are
shown in Fig. 3(b). The phase detector asserts their UP and DN
outputs for equal duration due to 45 clock in order to avoid a
dead-zone problem, although the phase offset of the reference
loop gives negligible effects on the offset of the main loop output
clock. At the second harmonic lock as shown in Fig. 3(c), the
phase detector detects that the loop is in the harmonic lock due
to 90 clock and asserts only UP output to escape the harmonic
lock. By limiting the delay range of a delay line, there is no possibility of harmonic lock over third since the reference loop is
composed only of delay cells with no additional delay elements
such as the clock buffer.
D. Main Loop Design
The main loop design is focused on the selection control and
delay control of the main loop VCDL to achieve the infinite
delay range by using four finite-length VCDLs. Fig. 4(a) shows
the conceptual timing diagram of the main loop VCDL selection
clock is selected as
, the
control. Assuming

786

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 4. Selection control of the main loop VCDL. (a) Conceptual timing
diagram. (b) Block diagram of the control logic.

Fig. 3. Reference loop phase detector. (a) Block diagram . (b) Operation at 1
period lock. (c) Operation at 2 period lock.

moves in the movable range according to the output of the main


loop phase detector. Other clocks remain fixed at the initial
phase relationship spaced by 90 . When the rising edge of the
coincides with that of
(or
) clock, select up
is changed to
(or select down) is generated and then
(or
) clock. Now
(or
) clock acts as a new
selected clock in a right-shifted (or left-shifted) movable range.
Thus, clock switching at the quadrant boundaries can be repeated in this manner, to cover the entire phase range. Fig. 4(b)
shows a block diagram of the selection control logic. Since the
passes through the MUX stage, a MUX replica is reand all internal
quired for delay matching between the
clocks. Therefore, clock waveforms in Fig. 4(a) are validated.
In the window finder, one inverterone NAND pair makes the
window which is bounded by rising edges of two input clocks.
Thus, four windows are generated. Sampled values of these winenable the window finder to find which
dows by the
belongs to. If the found window is the sewindow the
lect up or select down region, UP or DN signal is generated,
respectively. Then, the state decoder updates two-bit selection
in one clock cycle. Although clock
code to change the
switching occurs immediately after the switching event, there is
since
the possibility of the small delay difference in the
may have a different time position
the rising edge of old
after clock switching. This delay differwith that of new
ence can be represented as a switching jitter at the lock state.
The delay control of the main loop VCDL should be optimized between two conflicting conditions, delay range and
power consumption. More delay cells mean larger delay range
but their power consumption is proportional to the number of

required delay cells. Furthermore, a larger delay causes a larger


jitter. Intuitively, we apply a single control scheme as shown in
rotates and other clocks remain
Fig. 5(a), where only the
fixed in phase space. Thus, clock switching occurs at the quadrant boundaries. Unfortunately, since the required delay range
is from 90 to 90 , this control scheme consumes the same
number of delay cells per VCDL as those in the reference loop.
In order to reduce the number of required delay cells, a differential delay control scheme is employed. The differential conrotates counterclockwise, all
trol means that when the
other clocks rotate clockwise with their phase relationship fixed.
If all clocks move with same speed, the required delay range
is from 45 to 45 , as shown in Fig. 5(b). However, if the
must rotate in the opposite direction after switching due
to the delay fluctuation of the reference clock or the clock buffer,
there is the problem of losing the lock since the delay range of
a VCDL was already exhausted. In Fig. 5(c), we adopt a differential delay control with 3 speed difference, where the
moves three times faster than other clocks, so that 3/4 of delay
cells in the single delay control case satisfy the required delay
range, 67.5 to 67.5 . Since 3 speed difference provides a
shared region in the available delay range of two neighboring
clocks, seamless clock switching is possible in any direction
without losing the lock with three delay cells per VCDL.
Fig. 6 shows the configuration of the main loop phase detector, charge pumps, and loop filters. Outputs of the phase detector are connected to the charge pump1 (CP1) directly and to
the charge pump2 (CP2) with inversion. Thus, if the CP1 generates an increasing control voltage for a VCDL which gener, the CP2 generates a decreasing control voltage
ates the
for all other VCDLs. As a result, two substantially identical
charge pumps are used for the differential delay control scheme.
Three times speed difference is implemented by the fact that the
CP1 has one loop filter and the CP2 has three loop filters. In

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP

787

Fig. 5. Delay control of the main loop VCDL. (a) Single control with other clocks fixed. (b) Differential control with same speed. (c) Differential control with
3 speed difference.

Fig. 6. Configuration of the main loop phase detector, charge pumps, and loop
filters.

case of clock switching, the selection code alters the connection


between charge pumps and loop filters. Consequently, charge
redistribution occurs between three loop filters except a loop
. This charge redistribution proceeds
filter for the new
rapidly since two different voltages converge into one value. The
fast VCDL control voltage change prevents possible dithering
around the clock switching phase.
Fig. 7 shows one example of the main loop VCDL control
procedure starting at the unlock state. Let us assume the
should be near 180 in phase space to acquire the lock. Initially,
clock is selected as the
assuming the selection code is 00,
. The
rotates counterclockwise in phase space according to outputs of the phase detector. All clocks excluding
rotates clockwise with one-third speed compared to
the
. Before the delay range of the VCDL genthat of the
is reached at a limit, the
is changed
erating the
clock. Thus, the selection code is 01. All clocks exto
settle near their original phase positions
cept the new
-phase space by the charge redistribution of loop filwith
still moves counterclockters. After clock switching, the
clock. Since this 10 state is near
wise to be switched to
the lock state, the DLL can acquire the lock by a minor delay
control. However, let us assume the delay time of the
must decrease due to the delay fluctuation of the reference clock
or the clock buffer. Similarly in the delay increase case, beis returned
fore a VCDL delay range is exhausted, the

Fig. 7.

Example of the main loop VCDL control procedure.

to
clock, 01 state. In result, the proposed DLL covers
the entire phase range and remains at the lock state in any direction switching by optimizing the control schemes of multiple VCDLs. Therefore, since this architecture makes it possible to emulate the infinite-length VCDL by using multiple finite-length VCDLs, the DLL overcomes the problem of conventional DLLs, described by the limited delay range and the initial
phase relationship constraint.
III. LOW JITTER SCHEME AND DCC
A. Low-Jitter Scheme
The jitter performance of the DLL is degraded by various
noise sources, typically in the form of supply and substrate noise
in high speed and highly integrated circuits. To reduce the jitter,
the loop bandwidth should be set as high as possible but must
have an upper limit for stability issues. Thus, low-jitter DLL
designs strongly depend on the delay characteristics of a delay
line with supply-noise injection. In order to design the delay
line with low supply-noise sensitivity, the replica biasing for
the delay control must be considered in noisy environment. The
replica biasing circuit, which consists of a half-replica of a differential delay cell and an operational amplifier (op-amp), sets
the low swing level of the delay cell to the reference voltage,
. In the conventional replica biasing, the
tracks the
supply variation with the same amount. Unfortunately, this is
not the optimal solution. The variation of the op-amp gain and

788

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 8. (a) Circuit diagram of a replica biasing. (b) Operation of a reference


voltage generator under supply-noise injection.

the tail-current source distorts the delay characteristics of the


delay cell. The delay equation of this case is described by
(1)
where
delay time of a delay cell;
load capacitance;
swing voltage of the delay cell;
current of the tail-current source.
, since
is positive
For a positive supply variation of
negative,
greatly decreases.
and
In the design depicted in Fig. 8(a), an additional reference
voltage generator is attached to the replica biasing circuit. The
reference voltage generator is composed of one transistor and
, in the
two resistors and generates the reference voltage,
nominal supply condition. When there is a supply variation of
, the reference voltage generator produces a predetermined
, which is a reduced swing compared to
,
variation of
as shown in Fig. 8(b). The reduced swing compensates the
delay variation due to the aforementioned variations induced by
supply noise. Thus, supply-noise sensitivity can be minimized.
supply noise, the desired
is a function
For a given
across
transistor as follows:
of
(2)

Fig. 9. (a) Block diagram of the duty cycle corrector [3]. (b) Circuit diagram
of the proposed duty detection stage.

The sensitivity of
over
to process variations should
be analyzed to guarantee a reliable operation. For example, the
of the transensitivity to the threshold voltage variation
can be obtained by (3), shown at the bottom of the
sistor
means
of transistor
. The senpage. In (3),
with a
of 100 mV.
sitivity value is in the order of
Similar analyses with other process parameters also show that
is kept nearly constant under moderate
the predetermined
process variations. This replica biasing circuit is commonly applied to all VCDLs of the reference loop and the main loop to
achieve the low-jitter characteristics through the whole DLL.
B. Duty Cycle Corrector
The duty cycle of clock signals within the DLL deviates from
its ideal value of 50% due to various asymmetries in signal paths
and voltage offsets in an off-chip generated reference clock.
For applications in which the timing of both edges of the clock
is critical, a duty cycle corrector (DCC) is required to maximize timing margins. A DCC [3] in Fig. 9(a) is configured as
the error-voltage feedback with a corrector stage and a duty
detection stage. The duty detection stage outputs the differen), which is proportional to the
tial control voltage (
. This differduty cycle error of inputted clocks
ential control voltage then effectively introduces offset voltage
at the corrector stage to correct the
to clock inputs
duty cycle of output clocks.

(3)

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP

789

Fig. 12.

Selection code waveforms with the refCLK input grounded.

Fig. 10. Simulated mismatch sensitivity characteristics of the DCC with the
proposed duty detection stage.

Fig. 11.

Prototype chip layout.

As the clock frequency is increased, tighter bound is placed


on the performance of the DCC. Even worse, process mismatches between transistors work as a serious error factor in the
DCC especially under deep-submicron technology. Although
process mismatches plague all devices, special care must be
paid to the duty detection stage since near-ideal performance
of this stage can remove the duty cycle distortion caused by
the mismatches of all other nonideal blocks. The proposed
duty detection block is based on two stacked source-coupled
pairs configuration, as shown in Fig. 9(b). The source-coupled pair is immune to device mismatches due to its current
steering capability, i.e., since for fairly large input signals, the
source-coupled pair conducts the current set by the tail-current
source through only one branch, various mismatch effects
in transistors can be hidden. The common-mode problem of
this approach is solved by the transistors in boxed area, comprising the self-biasing technique [7], which enables the output
common mode to be dynamically adjusted by input clocks. Two
transistors with source and drain tied are added to eliminate the
load imbalance caused by the self-biasing circuit. Fig. 10 shows
the simulated mismatch sensitivity characteristics of the DCC
with the proposed duty detection stage over typical process
mV m
mismatch parameters,
m [8]. Under 50% duty cycle
and
of the input clock, the duty cycle error is less than 2 ps, which
guarantees a robust operation against process mismatches.

(a)

(b)
Fig. 13. Jitter histograms at 400 MHz. (a) Quiet supply. (b) Added 2.5-MHz
300-mV square-wave supply noise.

IV. EXPERIMENTAL RESULTS


The test chip has been fabricated using a 0.25- m five-metal
CMOS process. The threshold voltages in this process are

790

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROTOTYPE DLL

achieves 54-ps peak-to-peak jitter and 0.32-ps/mV jitter


supply-noise sensitivity.
REFERENCES

0.57 V (nMOS) and 0.55 V (pMOS). The gate-oxide thickness is 5.8 nm. Fig. 11 shows the layout of the prototype chip.
The active area of the DLL occupies 0.13 mm .
Waveforms depicted in Fig. 12 shows two-bit selection code
with the reference clock input grounded, while running the input
clock at its nominal frequency of 400 MHz. In this configuration, the main loop phase detector always asserts DN signals.
Therefore, the selection code is continuously updated in accordance with sequences of 00, 01, 10, and 11. This means
the infinite times rotation of the output clock throughout the full
0 360 range.
Fig. 13(a) and (b) shows the jitter histograms of the DLL
clock output at 400 MHz. Fig. 13(a) shows 6.7 ps RMS and
54 ps peak-to-peak jitter characteristics with a quiet power
supply. With a 300-mV 2.5-MHz square-wave supply noise, the
peak-to-peak jitter increases to 150 ps, as shown in Fig. 13(b).
The ratio of the peak-to-peak jitter to the RMS jitter is well
maintained in spite of supply-noise injection. Supply-noise
sensitivity is measured to be 0.32 ps/mV.
Table I summarizes the DLL performance characteristics.
The DLL operates from 150- to 600- MHz frequency range
with a 2.5-V supply. Static phase error between the reference
clock and the output clock of the DLL is less than 20 ps.
Operating at 400 MHz, the DLL dissipates 60 mW.
V. CONCLUSION
We have described a dual-loop DLL architecture that allows
the unlimited delay range by using multiple VCDLs. The
reference loop generates four evenly spaced clocks without
a possible harmonic lock. Clock selection in the main loop
enables the DLL to cover the entire phase range and seamless
clock switching is achieved by optimizing the main loop
VCDL delay range control. Thus, this architecture can emulate
the infinite-length VCDL with multiple finite-length VCDLs.
To obtain low supply-noise sensitivity, the low-jitter scheme
generates a reduced swing voltage compared to supply noise
for the delay compensation of a delay line. Finally, a duty cycle
corrector presents a high immunity to process mismatches with
the help of two stacked source-coupled pairs configuration.
A prototype fabricated using 0.25- m CMOS technology

[1] M. Johnson and E. Hudson, A variable delay line PLL for CPU-coprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, pp.
12181223, Oct. 1988.
[2] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M.
Kumanoya, and H. Hamano, A delay-locked loop and 90 phase shifter
for 800-Mb/s double data rate memories, in Symp. VLSI Circuits Dig.
Tech. Papers, June 1998, pp. 6667.
[3] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and
T. Ishikawa, A 2.5-V CMOS delay-locked loop for an 18-Mb 500Mbyte/s DRAM, IEEE J. Solid-State Circuits, vol. 29, pp. 14911496,
Dec. 1994.
[4] S. Sidiropoulos and M. A. Horowitz, A semidigital dual delay-locked
loop, IEEE J. Solid-State Circuits, vol. 32, pp. 16831692, Nov. 1997.
[5] K. Minami et al., A 1-GHz portable digital delay-locked loop with infinite phase capture ranges, in ISSCC Dig. Tech. Papers, Feb. 2000, pp.
350351.
[6] Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, C.-H. Kim, and S.-I. Cho, A
low-jitter dual-loop DLL using multiple VCDLs with a duty cycle corrector, in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 5051.
[7] M. Bazes, Two novel fully complementary self-biased CMOS differential amplifiers, IEEE J. Solid-State Circuits, vol. 26, pp. 165168, Feb.
1991.
[8] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, Transistor matching
in analog CMOS applications, in IEDM Dig. Tech. Papers, Dec. 1998,
pp. 915918.

Yeon-Jae Jung was born in Korea in 1974. He received the B.S. and M.S. degrees from the School
of Electrical Engineering, Seoul National University,
Seoul, Korea, in 1997 and 1999, respectively, where
he is currently working toward the Ph.D. degree.
He has worked on architectures and CMOS circuits
for high-speed I/O interfaces. His current research interests include high-speed CMOS circuits and communication ICs.

Seung-Wook Lee was born in Seoul, Korea, in 1971.


He received the B.S. and M.S. degrees in electronics
engineering from Seoul National University, Seoul,
Korea, in 1995 and 1997, respectively, where he is
currently working toward the Ph.D. degree in the
School of Electrical Engineering.
His research interests include CMOS RF circuit
design and high-speed communication interfaces.
Mr. Lee is the winner of the Bronze Prize of the
IC design contest held by the Federation of Korean
Industries in 1995.

Daeyun Shim was born in Seoul, Korea, in 1962. He


received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul National University,
Seoul, Korea, in 1985, 1987, and 2000, respectively.
His Ph.D. dissertation was related to the design of
high-speed locking clock generators.
Since 1987, he has been working on digital video
signal processing and ASIC design at Samsung Electronics Corporation. His research interests are video
signal processing and compression, high-speed digital circuit design, and high-speed locking systems.
He is currently working on DVD-PRML system design.

JUNG et al.: DUAL-LOOP DELAY-LOCKED LOOP

Wonchan Kim was born in Seoul, Korea, in


1945. He received the B.S. degree in electronics
engineering from Seoul National University, Seoul,
Korea, in 1972. He received the Dip.-Ing. and
Dr.-Ing. degrees in electrical engineering from the
Technische Hochschule Aachen, Aachen, Germany,
in 1976 and 1981, respectively.
In 1972, he was with Fairchild Semiconductor
Korea as a Process Engineer. From 1976 to 1982, he
was with the Institut fr Theoretische Electrotecnik
RWTH, Aachen. Since 1982, he has been with
the School of Electrical Engineering, Seoul National University, where
he is currently a Professor. His research interests include development of
semiconductor devices and design of analog/digital circuits.

Changhyun Kim (M85S90M95) received the


B.S. and M.S. degrees in electronics engineering
from Seoul National University, Seoul, Korea, in
1982 and 1984, respectively and the Ph.D. degree
in electrical engineering from the University of
Michigan, Ann Arbor, in 1994.
In 1984 he joined Samsung Electronics Company,
Ltd. (SEC), Kyungki-do, Korea, where he was involved with the circuit design for high speed dynamic
RAM, ranging from 64 kb to 16 Mb densities. From
1989 until 1994, he was a Research Assistant in the
Center for Integrated Sensors and Circuits, University of Michigan. His present
research interest is in the area of circuit design for low-voltage high-performance gigascale DRAMs and future DRAM architecture. He has served as the
Committee Member of the Symposium on VLSI circuits since 1995.
Dr. Kim received the Grand Prize of the Samsung group for the successful
development of 1-Mb and 1-Gb DRAMs in 1986 and 1996, respectively. His
work on the characterization of submicron devices and reliability issues in highdensity DRAM, including reducing soft-error rate and reducing sensitivity to
electrostatic discharge problems, earned a technical achievement award from
the Samsung R&D Center in 1988. At the Center for Integrated Sensors and
Circuits in 1991 and 1993, he won first prizes for design excellence in student
VLSI design contests sponsored by several U.S. companies.

791

Soo-In Cho was born in Seoul, Korea, in 1957. He received the B.S. degree in
electronics engineering from Seoul National University, Seoul, Korea, in 1979.
He joined the Semiconductor Research and Development Center, Samsung
Electronics Company, Ltd., Kyungki-Do, Korea, in 1979, where he was engaged
in the design of CMOS logic LSI. Since 1983, he has been working on MOS
dynamic memory design.

582

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

A Low Jitter 0.3165 MHz CMOS PLL


Frequency Synthesizer for 3 V/5 V Operation
Howard C. Yang, Lance K. Lee, and Ramon S. Co

Abstract This paper describes a phase-locked loop (PLL)based frequency synthesizer. The voltage-controlled oscillator
(VCO) utilizing a ring of single-ended current-steering amplifiers
(CSA) provides low noise, wide operating frequencies, and operation over a wide range of power supply voltage. A programmable
charge pump circuit automatically configures the loop gain and
optimizes it over the whole frequency range. The measured PLL
frequency ranges are 0.3165 MHz and 0.3100 MHz at 5 V and 3
V supplies, respectively (the VCO frequency is twice PLL output).
The peak-to-peak jitter is 81 ps (13 ps rms) at 100 MHz. The chip
is fabricated with a standard 0.8-m n-well CMOS process.
Index TermsCMOS phase-locked loop, current-steering amplifier, current-steering logic, frequency synthesizer, low noise,
low voltage VCO.

I. INTRODUCTION

ITH the ever-increasing performance and decreasing


price of microprocessors and PC/workstation systems,
much more stringent requirements have been placed on the design of system clock synthesizers. Todays high-performance
clock synthesizers are often required: 1) to operate over a wide
frequency range (high frequency for increased performance
and low frequency for power saving) using a crystal oscillator
input with constant frequency; 2) to have small phase jitter
and frequency variation; 3) to operate from 3 V/5 V
supplies for both portable and desktop systems; 4) to have
smooth transition between high and low frequencies; and
5) to have integrated loop filter on the chip. To satisfy all
these requirements simultaneously, a stable loop over the
entire operating frequency range is needed; and a low jitter,
wide frequency range, and variable supply voltage-controlled
oscillator (VCO) circuit is essential.
Several design techniques that improve the performance of a
phase-locked loop (PLL) are presented in this paper. A current
D/A converter is implemented to control the PLL bandwidth
so that the loop performance is optimized over the operating
frequency range of 0.3165 MHz. The VCO is formed by
a ring of single-ended current-steering amplifier (CSA) cells,
which were first introduced as a current-steering logic (CSL)
family for low noise and low power supply applications [1].
This VCO circuit can operate over a wide frequency range with
low phase jitter at variable power supply. To achieve smooth
frequency transitions, a pulse width limiting circuit is used to
control the pulse width of the phase/frequency detector output.
Manuscript received June 25, 1996; revised October 4, 1996.
H. C. Yang and L. K. Lee are with the Shanghai Belling Microelectronics
Manufacturing Co., Ltd., Shanghai 200233, P.R. China.
R. S. Co is with the Kingston Technology Corp., Fountain Valley, CA 92708
USA.
Publisher Item Identifier S 0018-9200(97)02477-3.

In Section II of this paper, the PLL architecture is described


with an analysis of the loop stability and loop optimization.
The circuit design techniques for the PLL are considered in
Section III. The measured results are discussed in Section IV.
Finally, conclusions are made in Section V regarding this
work.
II. PLL ARCHITECTURE
It is often difficult to design a PLL that can operate over a
wide frequency range due to the practical limit of the capacitor
size that can be integrated for the loop filter. One method to
widen the frequency range is to vary the PLL bandwidth as
a function of the desired output frequency. This principle is
applied in our design by utilizing a current D/A converter
which controls the charge pump current. With this technique,
we can also optimize the loop performance, including damping
factor and loop gain, over the entire operating frequency range.
A block diagram of a conventional frequency synthesizer is
enclosed in the dashed-line box in Fig. 1. The output frequency
is synthesized as
(1)
Using linear approximations, the loop equations of the PLL
[2] for stability analysis are
(2)
(3)
is the loop gain, is the damping factor,
is the
where
VCO gain,
is the charge pump current, is the loop filter
is the integration capacitor of the loop filter. In
resistor, and
order to have an adequate margin of stability, (2) and (3) must
satisfy the constraints
and
, where
is the operating frequency of the phase/frequency detector. The
former constraint is required to prevent aliasing effects as a
result of the Nyquist Criterion, while the latter constraint is
required to have a satisfactory transient response. In practice,
a loop gain which is ten times less than the phase/frequency
detector frequency [2] is more than adequate.
For a clock synthesizer with a frequency range of
0.3165 MHz, the feedback divider
in the loop equations
can vary by more than 20 times. The tolerances of the
loop parameters (
, and
) would also have to
be accounted for. These conditions make the constraints for
stability margin extremely difficult to satisfy if
is kept
within a reasonable size (a few hundred picofarads) for onchip integration. As shown in Figs. 1 and 2, the proposed

00189200/97$10.00 1997 IEEE

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

583

Fig. 1. Block diagram of frequency synthesizer.

Fig. 2. Programmable charge pump with D/A converter control.

PLL architecture uses a current D/A converter to control the


charge pump current, . The product
is optimized
for given values of
and
such that the stability margin
constraints are satisfied by (2) and (3). Using a decoding
table, the optimization is performed by the logic block in
Fig. 1 which sets the charge pump current upon examination
of
and . Typically, a loop bandwidth which is ten times
less than
for the entire frequency range can be easily
achieved using this architecture.
III. CIRCUIT DESIGN TECHNIQUES
A. VCO Circuit
Two types of VCO based on the ring oscillator topology
are commonly used in CMOS PLL design: the current-starved
inverter based VCO [3][5] and the differential-pair based
VCO [6][8]. In spite of a wide frequency range, the first
type of VCO is sensitive to power supply noise. Although
an on-chip voltage regulator can be used to reduce the effect
of power supply noise [5], it is not effective for operation at
high frequency since a voltage regulator inherently has poor
ac rejection. Another drawback of using an on-chip regulator
is that it reduces the useful power supply range, making
it undesirable for low-supply applications. A local feedback

loop on the VCO can also be used for reducing the PLL
jitter; however, it is likely to cause glitches or overshoots
whenever the frequency transition mode is activated due to
complicated feedback loops [8]. Hence, it is not a suitable
approach for microprocessor applications, wherein a smooth
transition between frequencies is usually required. While the
second type of VCO rejects power supply noise well, the
frequency range of operation may not be sufficient for some
applications. To widen the frequency range of differential-pairbased VCOs, complex MOS resistors [7] can be used at the
cost of higher
supply and more complex design.
The VCO design in this work utilizes a simple CSA circuit
[1]. Fig. 3(a) shows a CSA cell which consists of a current
source, , and a pair of NMOS devices.
is the input device
and
is the load. When
is high,
turns on, sinking
the bias current , while
shuts off. Under this condition,
the on resistance of
defines the output low voltage,
.
When
is low,
turns off and
is steered to
.
Under this condition, the resistance of the diode-connected
defines the output high voltage,
. By varying the bias
current, , a current-controlled CSA-based ring oscillator is
formed with an output voltage swing of
(4)
(typically between 1 and 2 V)
Equation (4) indicates that
varies with
. Thus, the voltage swing of the CSA cell
increases correspondingly with frequency. This is a desirable
feature because the signal level improves at high frequency
when the power supply switching noise becomes worse. Since
the voltage swing is limited by the diode-connected
,
the current source always operates in the saturation region;
consequently, very small switching noise is generated. For an
n-well process, the PMOS current source can be guarded by its
own well and isolated from the noisy p-substrate. The current
source also buffers the output from
, thereby reducing
the noise injected from
to the output. Any ground noise
(coupled from other circuitry within the chip) is rejected by
the CSA as a common mode noise because both its output
and input are referred to the same ground. By referring the
charge pump, loop filter, V/I converter, VCO, and other analog

584

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

(a)

Fig. 5. Histogram of PLL period jitter at 100 MHz.

(b)
Fig. 3. (a) Current steering amplifier (CSA) cell. (b) VCO using a three-stage
CSA ring oscillator.

supply for proper operation. At


V, the frequency
of the CSA ring oscillator is practically independent of power
supply. The maximum useful frequency of the VCO is limited
by the saturation voltage ( 0.5 V) of the cascoded PMOS
current source in the charge-pump circuit.
The measured VCO performance is illustrated in Fig. 4. The
frequency range of the VCO was observed from 174 kHz to
378.8 MHz at 5 V supply (the VCO frequency is twice the
PLL output frequency). At
V, the VCO frequency
achieved 200 MHz indicating that the PLL can operate at
100 MHz for 3 V power supply.
B. Phase/Frequency Detector

Fig. 4. Measured VCO performance.

circuits in the PLL to the same ground, i.e., p-substrate, the


ground noise can be substantially rejected [9].
Fig. 3(b) shows the VCO circuit using a three-stage CSA
ring oscillator. The current sources in the CSA ring oscillator
are cascoded with high-swing bias. In the V/I converter,
stage provides a first-order linear
a single degenerated
relationship between the oscillation frequency and the control
is forced in the linear region to provide the highvoltage.
swing cascoded bias for the VCO. This VCO is suitable for
low supply voltage operation since it only needs about 2.5 V

A modified dead-zone free, phase/frequency detector (PFD)


is used in this design [2]. In the frequency transition mode,
a pulse width limiting circuit in the PFD limits the pulse
width of the UP and DN signals. Such limited UP/DN pulses
provide a finite amount of charge to the integration capacitor
of the loop filter in order to slow down the frequency ramp
of the VCO. The UP/DN pulses, however, cannot be made
arbitrarily narrow. The noise level in the chip may dominate
over the minute UP/DN correction pulses causing the PLL
not to properly acquire in frequency. The maximum frequency
transition rate of less than 0.1%, i.e., the difference in period
between two consecutive clock cycles, is achieved by using
this technique.
C. Charge Pump and Loop Filter
The charge pump shown in Fig. 2 is designed using cascoded current sources with CMOS switches. The amount

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

585

Fig. 6. Measured PLL output at 155 MHz after 5 ms delay, i.e., after 775 000 clock cycles. The rms jitter (standard deviation) is 28 ps as shown.

Fig. 7. Measured frequency transition from 33.3 to 100 MHz.

of the current, typically 10150 A, is controlled by the


current digital-to-analog converter (DAC). A bandgap current
reference circuit is used to compensate the
variation over
temperature. The opamp in the charge pump circuit reduces the
transients caused by the charge transfer [4] as
is switched.
The capacitors in the loop filter are formed by NMOS devices
with the sources and drains connected to ground and the gate
connected to the filter output node. The capacitor C2 is about
400 pF.
IV. MEASURED RESULTS
The measured results show that the PLL operates from 0.3
to 165 MHz and 0.3 to 100 MHz at 5 V and 3 V power
supplies, respectively. Since the output frequency is the VCO

frequency divided by two, a PLL output with 50% duty cycle


is guaranteed. Fig. 5 shows the histogram of PLL period
jitter (also referred to as short-term or clock-to-clock jitter)
at 100 MHz after 42 321 hits using a standard 14.318 MHz
crystal as the input reference frequency. The peak-to-peak jitter
is 81 ps with an rms jitter of 13 ps. The long-term jitter of the
PLL was also measured. An rms jitter of 28 ps is observed
at 155 MHz with 5 ms delay, i.e., measured after a delay of
775 000 clock cycles from the triggered clock cycle, as shown
in Fig. 6. In order to appreciate the noise rejection capability
of this PLL, the chip was used to generate the clock for an HP
laser printer. During the printing process, which is very noisy
electrically and thermally, both the period jitter and the longterm jitter of the clock must be low for good printing quality. A

586

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 4, APRIL 1997

SUMMARY

OF

TABLE I
MEASURED PLL PERFORMANCE

Frequency Range
Period (Short-Term) Jitter at 100 MHz
Long-Term Jitter at 155 MHz
Supply Voltage Range
Output Duty Cycle
VCO Linearity
VCO Current Consumption
Crosstalk between 2 PLLs

0.3165 MHz
13 ps rms, 81 ps peak-to-peak
28 ps rms after 5 ms delay
2.5 V to 7 V
50%, VCO frequency is twice
PLL output
2% from 10200 MHz
500 A at 200 MHz
50 dB down

comparison of test results between the use of this PLL and the
original crystal clock showed no perceptible difference even
under high magnification. Fig. 7 shows a smooth frequency
transition of the PLL from 33 to 100 MHz. No frequency
glitches and overshoots were observed during the transition
time. Since there are two independent PLLs on the same
chip, crosstalk between the two PLLs was also measured.
The signal coupling between the two PLLs is at least 50 dB
down. The measured performance of the PLL is summarized
in Table I.
V. CONCLUSION
In this paper, we demonstrated the design of a fully integrated CMOS PLL circuit that achieves wide operating
frequency range and low jitters (both short-term and longterm) over a wide range of power supply voltage. The key

element in the design that provides all these features is the


CSA-based VCO circuit. A programmable current DAC is also
used to optimize the loop gain of the PLL. Smooth frequency
transition is realized by using a modified PFD with a pulse
width limiting circuit. The chip is implemented in a standard
0.8- m CMOS process.
REFERENCES
[1] D. J. Allstot, G. Liang, and H. C. Yang, Current-mode logic techniques
for CMOS mixed-mode ASICs, in Proc. IEEE Custom Integrated
Circuits Conf., 1991, pp. 25.2.125.2.4.
[2] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans. Commun.,
vol. 28, pp. 18491858, Nov. 1980.
[3] R. Shariatdoust, K. Nagaraj, M. Saniski, and J. Plany, A low jitter 5
MHz to 180 MHz clock synthesizer for video graphics, in Proc. IEEE
Custom Integrated Circuits Conf., 1992, pp. 24.2.125.2.5.
[4] M. G. Johnson and E. L. Hudson, A variable delay line PLL for CPUcoprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, pp.
12181223, Oct. 1988.
[5] K. M. Ware, H.-S. Lee, and C. G. Sodini, A 200-MHz CMOS phaselocked loop with dual phase detectors, IEEE J. Solid-State Circuits,
vol. 24, pp. 15601568, Dec. 1989.
[6] B. Kim, D. N. Helman, and P. R. Gray, A 30-MHz hybrid analog/digital
clock recovery circuit in 2-m CMOS, IEEE J. Solid-State Circuits,
vol. 25, pp. 13851394, Oct. 1990.
[7] I. A. Young, J. K. Greason, and K. L. Wong, A PLL clock generator
with 5 to 110 MHz of lock range for microprocessors, IEEE J. SolidState Circuits, vol. 27, pp. 15991606, Nov. 1992.
[8] D. Mijuskovic et al., Cell-based fully integrated CMOS frequency
synthesizers, IEEE J. Solid-State Circuits, vol. 29, pp. 271279, Mar.
1994.
[9] D. J. Allstot and W. C. Black Jr., A substrate-referenced dataconversion architecture, IEEE Trans. Circuits Syst., vol. 38, pp.
12121217, Oct. 1991.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

513

A Low-Jitter PLL Clock Generator for


Microprocessors with Lock Range of 340612 MHz
David W. Boerstler

Abstract A fully integrated, phase-locked loop (PLL) clock


generator/phase aligner for the POWER3 microprocessor has
been designed using a 2.5-V, 0.40-m digital CMOS6S process.
The PLL design supports multiple integer and noninteger frequency multiplication factors for both the processor clock and
an L2 cache clock. The fully differential delay-interpolating
voltage-controlled oscillator (VCO) is tunable over a frequency
range determined by programmable frequency limit settings,
enhancing yield and application flexibility. PLL lock range for
the maximum VCO frequency range settings is 340612 MHz.
The charge-pump current is programmable for additional control
of the PLL loop dynamics. A differential on-chip loop filter with
common-mode correction improves noise rejection. Cyclecycle
jitter measurements with the microprocessor actively executing
instructions were 10.0 ps rms, 80 ps peak to peak (P-P) measured
from the clock tree. Cycle-cycle jitter measured for the processor
in a reset state with the clock tree active was 8.4 ps rms, 62 ps P-P.
PLL area is 1040 2 640 m2 . Power dissipation is <100 mW.
Index Terms Clock generator, clocking, microprocessors,
phase-locked loop (PLL).

I. BACKGROUND

HE use of phase-locked loops (PLL) for generating


phase-synchronous, frequency-multiplied clocks in microprocessors has been prevalent in industry [1][4]. In recent
years, the trend toward ever increasing clock frequency has
made PLLs even more attractive due to the difficulties in
distributing high-frequency clocks through several levels of
packaging [5], [6], but the jitter penalty for using a PLL has
not kept pace with the rate of reduction in processor cycle
time. Until this year,1 the best reported microprocessor PLL
jitter penalties ranged from 82 to 83 ps peak to peak (P-P)
for inactive processors [1], [5], and a PLL on a small (600-K
transistor) graphics display chip has been reported with 80 ps
P-P jitter for a quiet supply at 320 MHz [7]. Many examples
of higher jitter PLL designs exist in the literature. Powersupply noise created from the digital switching activity on a
microprocessor is recognized as a major source of PLL jitter,
and the primary focus of designers has been directed toward
reducing this sensitivity.

Manuscript received December 10, 1997; revised August 10, 1998.


The author is with the IBM Research Division, IBM Austin Research
Laboratory, Austin, TX 78758 USA.
Publisher Item Identifier S 0018-9200(99)02429-4.
1 Recent announcements of a 1-GHz microprocessor PLL [12], [13] and
a PLL with an on-chip regulator [14] reported jitter of
9 ps (quiet
conditions)/
36 ps (processor active) and
10 ps (sinusoidal external
noise)/
20 ps (square wave external noise), respectively.

<6

<6

<6

<6

II. INTRODUCTION
This paper describes a fully integrated PLL-based clock
generator/phase aligner used for the POWER3 microprocessor.
The microprocessor is fabricated in IBM CMOS6S technology
and contains approximately 12 million transistors. With the microprocessor actively executing instructions, this PLL achieved
cyclecycle jitter of 10.0 ps rms, 80 ps P-P in its application
environment and 8.4 ps rms, 62 ps P-P with the microprocessor
in a reset state with a portion of the clock tree active.
A simplified block diagram of the PLL clock generator is
shown in Fig. 1. The external reference or BUSCLK enters
before
a receiver and is divided by two by divider stage
The inentering the phase/frequency detector (PFD) as
from divider
is compared to
ternal feedback signal
by the PFD, which generates an error signal
, which
is used by the charge-pump and filter network to control
the voltage-controlled oscillator (VCO). The output frequency
and is used as the main
of the VCO is divided by
processor clock (PCLK) after passing through four levels of
clock buffering in an H-tree clock distribution network. The
processor clock is passed through a delay-matching receiver
completing the feedback path.
before entering divider
Since at equilibrium the inputs of the PFD will be matched
in frequency (and phase), the processor-to-bus frequency ratio
which is equal to the ratio
is equal to the ratio
allowing integer or noninteger frequency synthesis by
changing divider ratios. Since the technique does not require
clock choppers [2], the duty cycle and phase alignment are
relatively insensitive to environment and process tolerances.
The output of the VCO is also connected to frequency divider
which is used for the L2 cache clock (L2CLK). Since
the processor-to-L2
clock-frequency ratio is also adjustable to integer or noninteger
ratios. Other phase-synchronous clocks may be designed in
similar fashion, and quadrature or interstitial clocks may be
created by a polarity change at the divider input. Using the
is equal to
structure of Fig. 1, the VCO frequency
times the processor clock frequency
For cases when
is even, the processor clock edges are generated from only
one VCO clock edge; hence a nearly ideal 50% processor
clock duty cycle may be achieved through its independence
from the VCO duty cycle.
III. PROCESS TECHNOLOGY
The microprocessor and integral clock generator PLL are
fabricated in a five-layer CMOS process with 0.4- m feature

00189200/99$10.00 1999 IEEE

514

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 1. PLL block diagram.

TABLE I
CMOS6S PROCESS SUMMARY

sizes. Table I lists some of the relevant attributes of this


process technology.
The PLL clock generator is shown in the microprocessor
die photograph of Fig. 2(a). The dimensions of the entire PLL
640 m . It is shown with the major features
are 1040
identified in Fig. 2(b).
IV. PLL CLOCK GENERATOR COMPONENTS
A. Phase/Frequency Detector
The digital PFD generates a signal that conveys relative
phase and frequency error information about its inputs to the
charge pump and filter. The PFD design is based on a threestate machine structure [8], as depicted in Fig. 3(a). From the
input will assert
initial reset state, a rising edge on the
appears, which
the UP output until the rising edge of
deasserts UP and forces a reset of both flip-flops [Fig. 3(b)].

A rising edge first appearing on


similarly asserts DOWN
followed by a subsequent
until a rising edge arrives at
reset. Complementary outputs are generated by the PFD for
use in the differential charge-pump stage that follows the
PFD. The pulse width of the output varies proportionally with
the phase error between the two inputs, except for the deadzone region as the difference approaches zero. This dead zone
exists when the phase error becomes small relative to the
combined response time of the PFD, charge pump, and filter
circuits. Circuit simulation results show a nominal dead zone
of 25 ps. Concerns of current mismatch in the charge-pump
and filter networks are reduced at the expense of increased
dead zone by preventing simultaneous assertions of UP and
DOWN
B. Power-Supply Isolation
A separate analog power connection (AVDD) is used for
the analog circuits [current reference, charge pump, commonmode rejection (CMR), filter initialization, and VCO circuits]
to increase the isolation of the sensitive circuits from the logicinduced switching noise present on the main power supply.
To allow the detection of potential defects using conventional
testing, the AVDD pin is held low, disabling the analog devices
that normally draw dc current. Both on-chip and on-module
decoupling is used on AVDD.
C. Reference Circuit
A thermal voltage-referenced current source is used to
provide temperature- and supply-independent biasing for the
analog circuits in the PLL. The circuit contains an array of P
diffusions in the N-well connected to form two forward-biased
diodes with areas that differ by a factor of ten. When connected

BOERSTLER: LOW-JITTER PLL CLOCK GENERATOR

515

(a)

(b)
(a)

Fig. 3. (a) PFD state diagram. (b) Phase detector implementation.

(b)

Fig. 4. Reference circuit.

Fig. 2. (a) Die photograph of POWER3 microprocessor. (b) PLL layout.

D. Process and Temperature Compensation


as shown in Fig. 4, the current
through each leg has two
A
stable operating points,
The startup circuit prevents the zero current state
or
from occurring by injecting current into one leg during initial
power-on. The resistor is implemented using the precision
resistor available in the process, which has a temperature
coefficient (TC) of 2000 ppm/ C. The positive TCs of the
thermal voltage term and the resistor tend to cancel, providing
a reference current TC of 785 ppm/ C at 85 C. The reference
is used for subsequent generation of reference
current
through mirroring.
currents and the PMOS bias voltage
Sensitivity to power-supply change is 1.7%/V for 20%
change on VDD.

due to process are monitored using the


Variations in
circuit shown in Fig. 5(a). All of the current sources are generA constant
ated directly from the reference circuit current
current is passed through a branch containing short-channel
, which
NMOS devices, creating a monitoring voltage
is sensitive to NMOS device length variations. This voltage
generated by a
is compared to a reference voltage
constant current through a long-channel NMOS device that is
relatively insensitive to length variations. The devices and bias
and
currents used for length sensing are sized so that
are equivalent for a nominal
process. To minimize
temperature sensitivity, the bias currents correspond to the
zero-temperature coefficient (0-TC) region of the devices.

516

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 6. Charge-pump circuit.


(a)

(b)
Fig. 5. (a) Process compensation circuit. (b) Temperature compensation
circuit.

The two voltages are compared using a differential amplifier,


which generates a current proportional to the NMOS
offset from nominal. This current is mirrored to produce a
that is injected into a precision resistor
current
used for combining various process monitors to generate a
compensating reference voltage. The compensating reference
voltage is connected to the active load elements of the VCO,
which control the VCOs voltage swing. A current
generated from a similar PMOS circuit also is injected into
the resistor.
Weighted combinations of standard bias circuits with differing voltage and temperature coefficients have been used
previously to compensate reference circuits for VCOs [9].
In this case, however, temperature was monitored directly
by comparing the voltage of two series-connected devices
below their 0-TC operating point to
biased by current
the voltage of two parallel devices biased by current
significantly above their effective 0-TC point [Fig. 5(b)]. The
devices and bias currents are sized so that both branches of
for nominal
the differential amplifier are balanced at
temperature conditions. The inset shows the IV characteristics as a function of temperature for the series (subscript
2) and parallel (subscript 1) connected devices; the 0-TC
points correspond to the crossing point where the current is

invariant with temperature. The current in one leg of the


differential amplifier varies proportionally with temperature
and is mirrored and added to the summing junction of the
A constant bias current is also added to the
resistor
summing junction to establish the correct weighting of the
various compensating currents and to correct for the TC of
the summing junction resistor.
Using a statistical process model, the process compensation
was designed to favor the stabilization of the best case side
of the distribution over the worst case side in anticipation
of future process trends. Given the limited range over which
a circuit may be practically compensated, the performance for
the best case devices was not sacrificed at the expense of
extensive compensation of the poorest performing devices. For
the unsorted population, this approach allowed a reduction in
the sensitivity of the VCO to process variability by a factor of
3.6 (55.415.2%) over the uncompensated VCO; temperature
sensitivity was reduced by a factor of 4.7 (38.68.2%).
E. Charge Pump
The reference circuit is used to generate the currents
and
for use within the charge pump. The peak chargepump current may be adjusted in 30- A increments from 30
to 240 A by scaling the mirror currents as shown in Fig. 6.
and
generated by the PFD are used
The error signals
to switch the peak current selected. Adjusting the charge pump
allows for optimization of the loop characteristics for different
divider and VCO settings. Differential outputs P and P are
included for high CMR in the subsequent analog circuits.
F. Loop Filter
The differential loop filter and initialization circuits are
shown in Fig. 7. Currents to and from the charge-pump circuit
enter the filter at nodes P, P. The input to the filter
contains NMOS transmission-gate clamping devices to limit
where
is
the maximum filter voltage to
the NMOS threshold voltage for a large source-bulk voltage.
For the CMOS6S process, the clamps prevent the filter voltage
from exceeding approximately 1.8 V, eliminating concern for
the VCO input stages shutting off. The filter capacitors are
accumulation-mode gate-oxide devices, and are interleaved
to improve the matching. Both loop-filter capacitors together
280 m and are
occupy an area of approximately 865
approximately 450 pF each. Precision resistors (1.2 K each)
are used to produce a zero in the filter transfer function.

BOERSTLER: LOW-JITTER PLL CLOCK GENERATOR

517

Fig. 9. Voltage-controlled oscillator.

Fig. 7. Loop filter and filter initialization.

(a)

Fig. 8. Common-mode control.

The filter output is connected to the VCO control input at


An initialization circuit activated during the
nodes
initial system power-on-reset is used to precharge the filter
capacitors to the nominal common-mode voltages at nodes
(b)

G. Common-Mode Control
It is possible for common-mode voltages to develop in
the filter from leakage, drift, or device mismatch. Since the
common-mode voltage can introduce frequency offsets in the
VCO or even inhibit operation for extreme cases, the circuit
shown in Fig. 8 was used in conjunction with the filter clamps
described earlier. The common-mode voltage of the filter
and
is sensed by generating currents proportional to
and summing them across a load device to produce
A differential amplifier compares
to a reference
, which is proportional to
voltage and generates a current
is mirrored by two
the common-mode voltage. The current
identical current sources, which bleed current from both filter
capacitors simultaneously without affecting the differential
voltage between them. The maximum drain currents for this
structure, which corresponds to the case when both clamps
have activated, are approximately 16 A. For typical cases
where the common-mode voltage is below 600 mV, the bleed
currents are 1 A. Stability of the network is assured by
heavy dominant-pole compensation.

Fig. 10. (a) Delay element. (b) Mixer circuit.

H. Voltage-Controlled Oscillator
The VCO design is based upon a delay-interpolating ring
oscillator structure [9][11], as shown in Fig. 9. In contrast
to the current-starved and current-modulated VCOs, which
are very commonly used for microprocessor clock generators,
delay-interpolating VCOs have relatively low-to-moderate
VCO gains and are well suited to fully differential control
and signal path circuit implementations. The lower VCO
gain of the delay-interpolating VCOs produces significantly
less jitter due to coupled noise than higher gain structures.
The limited operating frequency range for delay-interpolating
VCOs, which must be less than 2 : 1 to ensure monotonicity,
may be effectively augmented by selecting suitable divider
ratios or by adding programmability to the VCO signal paths.
The frequency limits of the VCO are determined by the
longest and shortest path delays through the structure. Fig. 9
composed
shows an example high-frequency limit of period
of three delay units and one mixer unit, and a low-frequency

518

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

(a)

(b)
Fig. 11.

Cyclecycle processor clock jitter: (a) quiet processor and (b) active processor.

limit of period
composed of six delay units and one mixer
unit. These frequency limits also affect the VCO gain (for
a given mixer design) as well as the center frequency. The
frequency limits may be independently controlled using the
multiplexers shown in Fig. 9, allowing flexible control of the
VCO operating range and greater than ten-to-one adjustment
range for VCO gain.
The delay elements and mixer designs are based upon
PMOS source-coupled pair differential amplifiers with NMOS
load networks [Fig. 10(a) and (b)] which allow voltagecontrolled swing adjustment through effective load-line
The high impedance
translation by adjusting the voltage
provided by the current source improves the supply noise
rejection for the source-coupled pair, and the N-well improves
the isolation to the p bulk substrate noise. The variation of
the threshold voltage due to bulk effect is eliminated using
bulk-to-source biasing throughout the structure. Sensitivity of
the VCO to low repetition rate, 100-mV steps on VDD and
AVDD is 0.418 ps/mV. Center-frequency common-mode voltage sensitivity is 3.5% over the full input range dictated by

the common-mode control circuit. Nominal VCO gain for the


settings that produce the maximum VCO range is 185 MHz/V.
The worst case VCO power dissipation is 30 mW.
I. Dividers and Receivers
Dividers
and
(Fig. 1) may be individually
programmed and support division by 2, 3, 4, 5, 6, 8, or 10.
The dividers are placed in pairs within the layout to improve
and
and between
and
device matching between
The receivers shown in Fig. 1 are also placed together
and are located near the I/O pad for BUSCLK.
V. PLL MEASUREMENTS
The damping factor, loop gain, and natural frequency of
the PLL may be adjusted over a wide range to match the
application by changing the charge-pump and VCO gain as described above. System testing was conducted with 90-A peak
charge-pump current using the maximum frequency and range
on the VCO with a variety of divider settings and BUSCLK

BOERSTLER: LOW-JITTER PLL CLOCK GENERATOR

519

frequencies. The processor clock was accessed from the clock


tree through a series of inverters. A time-interval measurement
(TIM) system was used to measure cyclecycle period jitter
statistics for a number of packaged die representing various
process skews. The processor was operated using an array
initialization program loop with the fixed-point and floatingpoint processors active for the active processor tests, and
was also operated in a quiet mode reset state. All tests
were performed at room temperature with ambient forcedair cooling. Conventional first-cycle oscilloscope-based jitter
measurements were performed periodically and provided PP jitter results that were consistent with those measured on
the TIM system. The external clock was provided by a highfrequency pulse generator, with 7.3 ps rms, 36 ps P-P jitter.
Fig. 11(a) shows a histogram of cyclecycle period measurements taken with the processor in an inactive reset state
but with the clock tree active. The frequencies of the reference
clock, processor clock, and VCO are 85, 170, and 340 MHz,
respectively, which corresponds to a 3-dB loop bandwidth of
2 MHz. The distribution of samples in the histogram follows
a Gaussian distribution with period jitter of 8.4 ps rms, 62 ps
P-P. The minimum period measured for this sample size
was 26.2 ps less than the mean (3.1 sigma away).
Assuming that cycle-time failures only occur on the minimum
period side, the worst case clock jitter penalty for this system
(i.e., a quiet processor) is 26.2 ps at 3.1 sigma confidence
(or 25.2 ps penalty at 3.0 sigma). Since a peak-to-peak jitter
approximately equal to the PFD dead zone can exist for the
PLL, the 25 ps simulated value for the dead zone may be a
significant component of the measured jitter.
Fig. 11(b) shows a clock-jitter histogram for the processor
executing the array initialization routine for a large population
A Gaussian curve has been superimposed on
the histogram for comparison purposes. The frequencies of
the reference clock, processor clock, and VCO are 90, 180,
and 360 MHz, respectively. For this system (i.e., an active
processor), the period jitter has increased to 10.0 ps rms, 80
ps P-P, and the worst case clock-jitter penalty is 37.1 ps at 3.7
sigma confidence (or 30.1 ps at 3.0 sigma). The effective noise
penalty for running the array initialization routine is 4.9 ps at
3.0 sigma.

divider implementation, R. Kodali for circuit simulation and


specification, D. Woeste and J. Strom for the divider and lock
detector circuits, and S. Dhong and M. Papermaster for their
continuous support of this work.

VI. CONCLUSION

David W. Boerstler received the B.S. degree


in electrical engineering from the University of
Cincinnati, Cincinnati, OH, in 1978 and the M.S.
degree in computer engineering and in electrical
engineering from Syracuse University, Syracuse,
NY, in 1981 and 1985, respectively.
Since joining IBM in 1978, he has held a
variety of assignments, including the design of
high-frequency PLLs for clock generation and
recovery, fiber-optic transceiver and system design,
and other analog, digital, and mixed-signal bipolar
and CMOS circuit development projects. He currently is a Research Staff
Member with the High-Performance VLSI group at the IBM Austin
Research Laboratory, Austin, TX. His current research interests include
high-frequency synchronization techniques and signaling approaches for
high-speed interconnect.
Mr. Boerstler has received IBM Outstanding Technical Achievement
Awards for his work on the design of the serializer/deserializer for the
ESCON fiber-optic channel products and for the clock-generator design of
IBMs 1-GHz PowerPC microprocessor prototype. He has received seven
IBM Invention Achievement Awards.

This work demonstrates the viability of a low-jitter PLL


design approach amenable to high-speed microprocessors.
Measured jitter for the design was 8.4 ps rms, 62 ps P-P
for quiet conditions and 10.0 ps rms, 80 ps P-P for the
processor active. A tunable, moderate-gain VCO with active
process and temperature compensation provides high powersupply rejection and low sensitivity to temperature and process
variability. A differential design approach maintains noise
immunity in both control and signal paths within the analog
portions of the PLL.
ACKNOWLEDGMENT
The author wishes to thank J. Peter for layout of the PLL,
N. James and H. Casal for the hardware characterization and

REFERENCES
[1] I. Young, M. Mar, and B. Bhushan, A 0.35 m CMOS 3-880 MHz
PLL N /2 clock multiplier and distribution network with low jitter for
microprocessors, in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330331.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, A widebandwidth low-voltage PLL for powerPC microprocessors, IEEE J.
Solid-State Circuits, vol. 30, pp. 383391, Apr. 1995.
[3] J. Cho, Digitally-controlled PLL with pulse width detection mechanism
for error correction, in ISSCC Dig. Tech. Papers, Feb. 1997, pp.
334335.
[4] I. Young, J. Greason, and K. Wong, A PLL clock generator with 5110
MHz of lock range for microprocessors, IEEE J. Solid-State Circuits,
vol. 27, pp. 15991607, Nov. 1992.
[5] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, A 320 MHz,
1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,
in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 132133.
[6] P. E. Gronowski, P. Bannon, M. Bertone, R. Blake-Campos, G.
Bouchard, W. Bowhill, D. Carlson, R. Castelino, D. Donchin, R.
Fromm, M. Gowan, A. Jain, B. Loughlin, S. Mehta, J. Meyer, R.
Mueller, A. Olesin, T. Pham, R. Preston, and P. Rubinfeld, A 433
MHz 64b quad-issue RISC microprocessor, in ISSCC Dig. Tech.
Papers and Slide Supplement, Feb. 1996, pp. 222223.
[7] Z. Zhang, H. Du, and M. Lee, A 360 MHz 3V CMOS PLL with 1
V peak-to-peak power supply noise tolerance, in ISSCC Dig. Tech.
Papers, Feb. 1996, pp. 134135.
[8] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs,
NJ: Prentice-Hall, 1991, pp. 5961.
[9] J. F. Ewen, A. Widmer, M. Soyuer, K. Wrenner, B. Parker, and H.
Ainspan, Single-chip 1062 Mbaud CMOS transceiver for serial data
communication, in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 3233.
[10] B. Lai and R. Walker, A monolithic 622 Mb/s clock extraction and data
retiming circuit, in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 144145.
[11] S. K. Enam and A. Abidi, NMOS ICs for clock and data regeneration in gigabit-per-second optical fiber receivers, IEEE J. Solid-State
Circuits, vol. 27, pp. 17631774, Dec. 1992.
[12] D. W. Boerstler and K. Jenkins, A phase-locked loop clock generator
for a 1 GHz microprocessor, in Symp. VLSI Circuits Dig. Tech. Papers,
June 1998, pp. 212213.
[13] J. Silberman, N. Aoki, D. Boerstler, J. Burns, S. Dhong, A. Essbaum, U.
Ghoshal, D. Heidel, P. Hofstee, K. Lee, D. Meltzer, H. Ngo, K. Nowka,
S. Posluszny, O. Takahashi, I. Vo, and B. Zoric, A 1.0 GHz singleissue 64b PowerPC integer processor, in ISSCC Dig. Tech. Papers,
Feb. 1998, pp. 230231.
[14] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, A 600
MHz CMOS PLL microprocessor clock generator with a 1.2 GHz
VCO, in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 396397.

726

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

A Low-Jitter Wide-Range Skew-Calibrated


Dual-Loop DLL Using Antifuse Circuitry
for High-Speed DRAM
Se Jun Kim, Sang Hoon Hong, Jae-Kyung Wee, Joo Hwan Cho, Pil Soo Lee, Jin Hong Ahn, and Jin Yong Chung

AbstractThis paper describes a delay-locked loop (DLL) circuit having two advancements, a dual-loop operation for a wide
lock range and programmable replica delays using antifuse circuitry and internal voltage generator for a post-package skew calibration. The dual-loop operation uses information from the initial
time difference between reference clock and internal clock to select
one of the differential internal loops. This increases the lock range
of the DLL to the lower frequency. In addition, incorporation of
the programmable replica delay using antifuse circuitry and the
internal voltage generator allows for the elimination of skews between external clock and internal clock that occur from on-chip
and off-chip variations after the package process. The proposed
DLL, fabricated on 0.16- m DRAM process, operates over the
wide range of 42400 MHz with 2.3-V power supply. The measured
results show 43-ps peak-to-peak jitter and 4.71-ps rms jitter consuming 52 mW at 400 MHz.
Index TermsDelay-locked loop, dual-loop operation,
high-speed DRAM, programmable replica delay, skew calibration.

I. INTRODUCTION

HE DELAY-LOCKED loop (DLL) has become an indispensable component in high-speed synchronous DRAMs
such as DDR SDRAM. Since the DLL determines the operation range of the DRAM and has a large effect on the data valid
window, a high-performance DLL that has a wider range and
lower jitter is essential for increasing the speed of DRAM. A
DLL can be categorized into either of two types, the digital
and the analog type. Although the digital DLL has robustness,
process portability, and design simplicity, it is difficult to use on
a very high-bandwidth DRAM (over 600 Mb/s) due to poor jitter
performance [1], [2]. Therefore, in spite of sensitivity on process
variation, the analog DLL, which ensures lower jitter by the continuous characteristics of analog operation, is more suitable in
the higher speed DRAM. In addition to the jitter performance,
another important issue of the DLL is the lock range. Process
variation makes the lock range of the analog DLL more limited
and results in a narrower operation range of the DRAM. The
limited range of the DLL limits the flexibility of implementation
on memory applications and increases test costs in mass production. For solving the limited lock-range problems, various types
Manuscript received October 2, 2001; revised January 29, 2002.
S. J. Kim, S. H. Hong, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung
are with the Advanced Design Team, Memory Research and Development,
Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do 467-701, Korea (e-mail:
sejun.kim@hynix.com).
J.-K. Wee is with the Department of Electronics Engineering, Hallym University, Chunchun-si, Kangwon-Do 200-702, Korea.
Publisher Item Identifier S 0018-9200(02)04934-X.

of DLLs have been developed [3][6]. However, such DLLs


resulted in complex architectures that faced such problems as
increased area, added power consumption, and degradation of
jitter performance.
For these issues, a novel dual-loop architecture, which increases the lock range having no degradation of jitter performance with a relatively small overhead in area and power, is
proposed in this paper. Another enhancement in the proposed
DLL is the post-package skew calibration. Process variations in
on-chip and trivial mismatches in off-chip parameters can result
in a large static skew in addition to the phase offset of the phase
detector. In the proposed DLL, an improved scheme using antifuse circuitry is applied for reducing the skew. It enables a practical calibration of inevitable skews after the package process.
This paper is arranged as follows. The limited range
problem of the conventional DLL is described in Section II.
In Section III, the concept of the proposed dual loop for wide
locking range is briefly explained, followed by presentation
of the architecture and physical implementation based on the
concept. The skew calibration method using antifuse circuitry
is described in Section IV. Section V discusses the fabricated
chip and shows the experimental results. Finally, the paper is
concluded in Section VI.
II. LIMITED RANGE PROBLEM OF CONVENTIONAL DLL
Fig. 1(a) shows the architecture of the conventional analog
DLL and the delay characteristic of the voltage-controlled
(minimum delay
delay line (VCDL). When
(maximum delay of VCDL), the
of VCDL)
(operation frequency of DLL) is determined by
range of
(control voltage of loop filter) at the initial state. When
(minimum control voltage of loop filter) at the
(the cycle time
initial state and
of reference clock), the lock failure occurs because the phase
detector produces a DN pulse which discharges the capacitor in
the loop filter, as shown in Fig. 1(b). Therefore, in this case, it
at the initial state for satmust be
isfying the condition without lock failure. Therefore, the range
is
. In the
of
(maximum control voltage of
other case, when
,
loop filter) at the initial state and
the lock failure occurs because of the UP pulse of the phase
detector shown in Fig. 1(c). In this case, the range of
is
when
is
. For utilizing the full range of
the initial

0018-9200/02$17.00 2002 IEEE

KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL

727

(a)

(b)

(c)
Fig. 1.

(a) Block diagram and delay characteristic of conventional DLL. Cases of lock failure at initial control voltage: (b) initial V
.

=V

without the lock failures as in Fig. 1(b) and (c), the


must be set at a level such that the initial
initial
is approximately
.
is determined as
In this condition, the range of
. But this method
can cause stuck/harmonic lock and makes the jitter perforis desired at
mance worse. Therefore, if the range of
should be set to
the higher frequency range, the initial

=V

and (c) initial

since it is stuck/harmonic lock free and the delay cell


has a fast slew-rate that produces less phase noise [7]. But in
is very sensitive to process, voltage, and
reality,
temperature (PVT) variation. As a result, designing
to be in the target range becomes more careful and difficult
work as the operation frequency becomes higher. Therefore,
becomes
considering the PVT variation, the range of
more limited with the higher operating range.

728

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

(a)
Fig. 3.

Architecture of proposed DLL.

switched from ICLK to ICLKB (the differential clock of ICLK),


becomes
, and FCLK is also switched
from ICLK to ICLKB. This means that ICLKB is synchronized
to REFCLK. Therefore, in our proposed DLL, the locking fre.
quency range is
As a consequence, although the same delay source was used,
the operation range of the proposed DLL can be extended to a
lower frequency than that of conventional DLLs. As an analogous concept, the phase inversion technique was developed
for the wide range [8]. It uses instantaneous phase inversion
in VCDL input at the final moment, when it realizes current
lock-in process cannot meet the range by monitoring its control
voltage. The proposed concept achieves faster lock-in time
since it utilizes the dual-loop operation at the beginning by
selecting an optimized path.

(b)

B. Architecture and Implementation of the Proposed DLL

(c)
Fig. 2. (a) Concept of proposed DLL. Two cases of loop selection according to
the initial time difference: (b) (1=2)
and (c) 0
(1 2)
.

< = T

<T

<T

<

III. PROPOSED DLL IN HIGH-SPEED DRAM


A. Range of the Proposed DLL
The concept of the proposed dual-loop DLL is shown in
,
Fig. 2(a). After the DLL starts up at the initial
the initial time difference between REFCLK and ICLK is
monitored at the first REFCLK cycle. The first REFCLK
cycle refers to the cycle of the REFCLK when ICLK
is produced first in the loop after the DLL starts up. If
as shown in Fig. 2(b),
is adjusted to
by the phase detector and charge
pump like the conventional DLL. As a result, ICLK becomes
FCLK of the synchronized output clock of DLL. In the other
as shown in Fig. 2(c),
case, if
the input clock for phase comparison in the phase detector is

Fig. 3 shows the proposed dual-loop architecture of the


DLL for the wide lock range. Unlike conventional DLLs, the
proposed DLL is composed of dual negative feedback loops
(Loop1, Loop2). Loop1 and Loop2 are the feedback loops of
the differential internal clocks. For the correct operation of
dual loops, a loop selector, an initial circuit, a reset controller,
and a 2 : 1 MUX are implemented. In the proposed dual-loop
operation, the DLL determines one of the two differential
internal clocks in Loop1 and Loop2 according to the initial
time difference between the internal clock and the reference
clock (shown in Fig. 3). Before the DLL starts up, the initial
circuit sets VBP (control voltage of loop filter in Fig. 3) to
the minimum value, which minimizes the delay of the VCDL
to ensure harmonic lock-free operation. After RESET (DLL
enable signal) transits from low state to high state, CLK and
CLKB (the external differential clocks) are provided to the reset
controller. The reset controller outputs IRESET to the clock
buffer in the time between the falling edge of the next CLK and
the next rising edge, as shown in Fig. 4(c). Since RESET can
be asserted at any time, the direct application of this signal to
the clock buffer is not feasible because it can make the clock
buffer produce internal clocks with a distorted cycle and cause
incorrect initial time difference in the loop selection cycle, as

KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL

729

(a)
(a)

(b)
Fig. 5. (a) Delay cell of the replica bias circuit. (b) Cross-sectional view of
bias line in the proposed DLL.
(b)

(c)
Fig. 4. (a) Case where the clock buffer produces an incorrect initial time
difference without reset controller. (b) Schematic and (c) timing diagram of the
reset controller.

shown in Fig. 4(a). Fig. 4(b) shows the schematic of the reset
controller. When IRESET is asserted, the clock buffer produces
the three clocks (ICLK, ICLKB, and REFCLK). ICLK and
ICLKB, which are the differential clocks, are input to VCDL
and REFCLK is input to the phase detectors as the reference
clock. Fig. 5(a) shows the delay cell and the biasing scheme
used in the proposed DLL. To reduce the supply voltage sensitivity, the VCDL is implemented as a series of the differential
delay cell with symmetric loads, as shown in Fig. 5(a) [9]. VBP
is a control voltage from the loop filter and VBN is generated
by the replica bias circuit [boxed area in Fig. 5(a)]. The replica
bias circuit makes the constant swing independent of VBP,
which provides better jitter performance and a wider operation
range [10]. For shielding the analog biases from the external
noise, VBP and VBN are physically enclosed with inter- and
intra-layers as shown in Fig. 5(b). This shielding technique
improves the jitter characteristic. DCLK and DCLKB in Fig. 3,
which are converted from a small swing output of VCDL to a
full swing output by amplifiers, each forms negative feedback
loops and also are changed to LCLK and LCLKB by replica
delays. LCLK and LCLKB are input to the phase detectors

and LCLK is also input to the loop selector as shown in


Fig. 6(a). If the time difference between the first LCLK (the
first produced LCLK after DLL is enabled) and REFCLK is in
as shown in Fig. 6(b), Lsel, the
output of the loop selector, preserves the low state initialized by
IRESET. Lsel at the low state enables PD1 (the phase detector
of Loop1) and disables PD2 (the phase detector of Loop2).
Furthermore, the state makes the MUX select DCLK as FCLK
(the output clock of DLL). Since only PD1 is enabled, the phase
of LCLK is compared with that of REFCLK. The selected
PD1, as shown in Fig. 7(a), produces the UP/DN pulses having
a pulsewidth matching the phase difference between REFCLK
and LCLK, as shown in Fig. 7(b). This PD has small phase
offset due to the fast operation and precision of dynamic logic.
Also, it does not have phase dithering problems because no
pulses are produced at the locked state. The simulation results
show about 40-ps phase offset at the worst case. The UP/DN
pulse of PD1 is transferred to the charge pump and generates
VBP on the loop filter. The linear capacitor of the loop filter is
designed to achieve a large capacitance value in a small area
while minimizing the substrate noise, as shown in Fig. 8 [11].
Although compromising the linearity, if a MOS capacitor is
used, larger capacitance can be achieved in smaller area. The
replica bias generator produces VBN to control the current
source transistor of delay cell according to VBP. Finally, the
phase of LCLK is synchronized with that of REFCLK. In the
other case, where the time difference between the first LCLK
as shown in
and REFCLK is
Fig. 6(c), Lsel is changed from the low state initialized by
IRESET to the high state. It enables PD2 and disables PD1,
and makes the MUX select DCLKB as FCLK. In contrast to
the prior case, the phase of LCLKB is compared with that
of REFCLK. Through the same locking process, LCLKB is

730

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

(a)

(a)

(b)

(c)

2T

Fig. 6. (a) Schematic and timing diagrams at (b) (1=2)


, (c) 0
(1 2)
of the loop selector.

<T

< = 2T

<T

<

(b)
Fig. 7.

synchronized with REFCLK. Once one of the two loops is


selected, with the exception that the DLL is disabled or the
power is down, the selected loop is never changed by the time
difference between LCLK and REFCLK after the loop selection
cycle. Therefore, malfunction is avoided on the lock-in process.
Consequently, the proposed dual-loop operation eliminates the
, between the reference
initial condition,
clock and the internal clock and enables the delay of VCDL
to be utilized fully. The lock range is also extended to lower
frequency without compromising the jitter characteristic.
IV. SKEW CALIBRATION BY PROGRAMMABLE REPLICA DELAY
AND ANTIFUSE CIRCUITRY
Although the replica delay is well matched with the sum of
on-chip and off-chip delay at design process, process variation
of on-chip and unexpected change in the circumstance of

(a) Schematic and (b) timing diagram of the phase detector.

off-chip such as output load, clock slew-rate, and so on, may result in unavoidable skew. There are two methods for skew elimination, wafer trimmed by laser [12] and post-package tuning by
antifuse [13]. The wafer-level tuning is not effective because the
wafer tester is not precise, and the off-chip condition cannot be
considered. Although post-package tuning by antifuse is more
practical, the previous post-package method has some problems.
The previous post-package method uses external high voltage
through pins for rupturing the antifuse. Providing sufficient high
voltage for rupturing the antifuse can cause physical damage to
other circuits connected to the pin and can negatively affect the
reliability of the device. To remove the high-voltage problem,
the antifuse programming scheme by internal negative voltage
is used [14]. Fig. 9 shows the programmable replica delay circuit. The circuitry has three functional parts, the replica delay
of clock, the replica delay of the output buffer, and the tun-

KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL

Fig. 8. Linear capacitor in the loop filter.

Fig. 9.

731

Fig. 10. Antifuse circuit for skew calibration and SEM photograph of the
antifuse.

Replica delay including the programmable delay.

able delay circuit. The tunable delay circuit is connected to the


antifuse circuitry and the antifuse is made of ONO (oxidenitrideoxide) dielectrics, as shown in Fig. 10. The sequence of
skew calibration is explained as follows. When DLL is enabled
by RESET for the test, nodes fd [1][8] and bd [1][8] in Fig. 9
are all fixed at the high state, because the initial program voltage
is at ground level, and RESET initializes node A and B as
level. In this state, no address code can have an effect on the
fixed levels of fd [1][8] and bd [1][8]. First, the skew between
the external clock and the data strobe signal is measured. The
measured skew is estimated by selection of optimal number of
delay loads. After the program mode (PGM) is activated, the
program code signifying the estimated number of delay loads
is applied to the address pins and the skew is remeasured. This
process is iterated to increase or decrease replica delay times by
left-shiftright-shift (LSRS) for minimizing the skew. When the
skew is almost eliminated, the inserted program address code is
fixed and the on-chip negative voltage generator is enabled to
V) for rupturing the
produce a program voltage (
antifuses. The replica delay is tuned through the flow shown in
Fig. 11. According to the simulation results, the programmable
tuning range using the eight antifuses is from 350 to 350 ps
and the minimum tuning resolution is approximately 10 ps.
V. EXPERIMENTAL RESULTS
The proposed DLL has been fabricated using 0.16- m
DRAM process. Fig. 12 shows a microphotograph of the

Fig. 11.

Flow of skew calibration after package process.

Fig. 12.

Microphotograph of the proposed DLL.

fabricated chip. The active area of DLL occupies 0.27 mm .


50 of total area. For high-freThe loop filter consumes
quency measurements of the proposed DLL, a chip-on-board
(COB) has been fabricated both to reduce parasitics and to
match 50- impedance of the measurement instrument. The
proposed DLL operates from 42 to 400 MHz with a 2.3-V
power supply. Fig. 13 shows the synchronized waveforms at

732

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

(a)

(b)

Fig. 13. Synchronized waveforms at (a) 42 MHz and (b) 400 MHz.

(a)
Fig. 14.

(b)

Measured jitter characteristics at 400 MHz in (a) a quiet supply and (b) with injected 1-MHz

42 and 400 MHz. At 400 MHz, the peak-to-peak jitter is 43 ps


and the rms jitter is 4.71 ps, as shown in Fig. 14(a). When
a 300-mV 1-MHz square wave is injected externally on
the power supply, the peak-to-peak jitter and the rms jitter is
measured to be 80 and 7.46 ps, respectively, at 400 MHz, as
shown in Fig. 14(b). Fig. 15(a) shows a skew that is composed
of the phase offset of the phase detector and replica mismatch
by process variation before skew calibration and the tuned skew
after skew calibration. Before the calibration, the measured
skew is 55 ps. After the calibration, the remeasured skew is

6300-mV square wave noise.

reduced to 9 ps with measured peak-to-peak jitter of 46 ps. In


theory, error reduction resulting in negative phase shift will
have increased jitter due to increased load in replica delay,
but error reduction resulting in positive phase shift will have
decreased jitter by decreased load in replica delay. However,
from analyzing the measured results, the amount of increased
jitter by negative phase reduction is insignificant compared
to the reduced phase error. Fig. 15(b) shows the resolution
and partial range of the skew calibration through antifuse
programming. Minimum resolution is about 10 ps and total

KIM et al.: SKEW-CALIBRATED DUAL-LOOP DLL

733

(a)

(b)

Fig. 15. (a) Measured skew at 400 MHz before skew calibration and after skew calibration. (b) Range (full range not displayed for limitation of tester) and
resolution of skew calibration.
TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROPOSED DLL

clock and the internal clock of the DLL. Also, an improved skew
calibration method demonstrated a practical post-package skew
calibration using the antifuse circuitry and the internal negative
voltage generator. The proposed DLL, fabricated on 0.16- m
DRAM process, achieves a wide range from 42 to 400 MHz,
and 43 ps peak-to peak jitter and 4.71 ps rms jitter at 400 MHz
that is applicable to high-speed DRAMs.
ACKNOWLEDGMENT
The authors are grateful to H. Ryu and Dr. Y. Kim for helpful
discussion about COB-type PCB design.
REFERENCES

calibration range is from 350 to 350 ps, as expected from


simulation. These results show that the skew by variation in
on-chip or off-chip can be eliminated through programmable
replica delays using the antifuse circuitry, and also verifies
that the improved skew calibration technique can effectively
eliminate the skews after packaging without degradation of
the jitter characteristic. The power dissipation of the proposed
DLL is 52 mW at 400 MHz. Table I summarizes the measured
characteristics of the proposed DLL.
VI. CONCLUSION
In this paper, the dual-loop architecture with the improved
skew calibration method was presented. The dual-loop architecture enabled the wide range of the DLL by using the loop selection decided by an initial time difference between the reference

[1] A. Hatakeyama et al., A 256-Mb SDRAM using a register-controlled


digital DLL, IEEE J. Solid-State Circuits, vol. 32, pp. 17281734, Nov.
1997.
[2] Y. Okajima et al., Digital delay-locked loop and design technique for
high-speed synchronous interface, IEICE Trans. Electron., vol. E79-C,
pp. 798807, June 1996.
[3] T. H. Lee et al., A 2.5-V CMOS delay-locked loop for an 18-Mbit 500Mbyte/s DRAM, IEEE J. Solid-State Circuits, vol. 29, pp. 14911496,
Dec. 1994.
[4] S. Tanoi et al., A 250622-MHz deskew and jitter-suppressed clock
buffer using two-loop architecture, IEEE J. Solid-State Circuits, vol.
31, pp. 487493, Apr. 1996.
[5] S. Sidiropoulos et al., A semi-digital dual delay-locked loop, IEEE J.
Solid-State Circuits, vol. 32, pp. 16831692, Nov. 1997.
[6] Y. Okuda et al., A 66400-MHz adaptive-lock-mode DLL circuit with
duty-cycle error correction, in Symp. VLSI Circuits Dig. Tech. Papers,
June 2001, pp. 3738.
[7] C. H. Park et al., A low-noise 900-MHz VCO in 0.6-m CMOS, IEEE
J. Solid-State Circuits, vol. 34, pp. 586591, May 1999.
[8] T. Yoshimura et al., A delay-locked loop and 90-degree phase shifter
for 800-Mb/s double data rate memories, in Symp. VLSI Circuits Dig.
Tech. Papers, June 1998, pp. 6667.
[9] J. G. Maneatis, Low-jitter and process-independent DLL and PLL
based on self-biased techniques, IEEE J. Solid-State Circuits, vol. 31,
pp. 17281732, Nov. 1998.

734

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 6, JUNE 2002

[10] I. A. Young et al., A PLL clock generator with 5 to 110 MHz of lock
range for microprocessors, IEEE J. Solid-State Circuits, vol. 27, pp.
15991607, Nov. 1992.
[11] F. Herzel et al., A study of oscillator jitter due to supply and substrate
noise, IEEE Trans. Circuits Syst. II, vol. 46, pp. 5662, Jan. 1999.
[12] T. Hamamoto et al., A skew and jitter suppress DLL architecture for
high-frequency DDR SDRAMs, in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 7677.
[13] S. Kuge et al., A 0.18-m 256-Mb DDR-SDRAM with low-cost postmold-tuning method for DLL replica, in IEEE Int. Solid-State Circuits
Conf. (ISSCC) Dig. Tech. Papers, Feb. 2000, pp. 402403.
[14] K. S. Min et al., A post-package bit-repair scheme using static latches
with bipolar-voltage programmable antifuse circuit for high-density
DRAMs, in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp.
6768.

Se Jun Kim was born in Seoul, Korea, in 1974. He


received the B.S. and M.S. degrees in electronics engineering from Hanyang University, Seoul, in 1998
and 2000, respectively.
In 2000, he joined the Memory Research and
Development Division, Hynix Semiconductor
Inc., Kyungki-Do, Korea, as a Research Engineer,
where he has been working on CMOS circuit and
architecture for high-speed digital/analog interface.
His current interests include clock recovery circuits,
data converters, clock distribution, and I/O circuits
for high-speed digital/analog interface.

Joo Hwan Cho was born in Seoul, Korea, in 1968.


He received the B.S. degree in electronic materials
engineering from Kwang-Woon University, Seoul, in
1992.
He joined the Semiconductor Research and Development Center, Hynix Semiconductor Inc, Ichon-si,
Kyungki-Do, Korea, in 1992. Since then, he has been
working on DRAM design and failure analysis.

Pil Soo Lee was born in Seoul, Korea, in 1963. He


received the B.S. and M.S. degrees from Inchon University, Korea, in 1990 and 1992, respectively.
In 1993, he joined KEC, Kumi, Korea, where
he worked on power device design and analysis. In
1997, he joined Hynix Semiconductor Inc., Ichon-si,
Kyungki-Do, Korea, where he has been working on
signal integrity analysis of high-frequency devices,
circuits, and boards.

architectures.

Jin Hong Ahn was born in Busan, Korea, in 1958.


He received the B.S. and M.S. degrees in electronic
engineering from Seoul National University, Seoul,
Korea, in 1982 and 1984, respectively.
He joined Gold-Star Semiconductor Company,
Gumi, Korea, in 1984. From 1986 to 1990, he was
involved in designing SRAMs and mask ROMs.
In 1991, he moved to the DRAM design group,
Gold-Star Electron Company, Seoul. From 1991 to
1998, he managed several generations of advanced
DRAM design projects, including 64-Mb, 256-Mb,
MML, and intelligent RAM. His interests in DRAM design include new
DRAM architectures, next-generation DRAM circuit technologies, and
low-cost DRAM design techniques. In 1999, he joined the Memory Research
and Development Group, Hynix Semiconductor Inc., Ichon-si, Korea, where
he was engaged in the development of 0.15-m 256-M DRAM. He is currently
a Technical Director in DRAM Design technology.

Jae-Kyung Wee was born in Seoul, Korea, in 1966.


He received the B.S. degree in physics from Yonsei
University, Seoul, in 1988 and the M.S. degree
from Seoul National University in 1990. In August
1998, he received the Ph.D. degree in electronics
engineering on modeling and characterization
of interconnects for high-speed and high-density
circuits from Seoul National University.
In 1990, he joined Hyundai Electronic Company
working on the process integration of 16 MDRAM
and LOGIC devices. In 1996, he was engaged in the
development of the manufacturable 0.35-m CMOS logic technology for highperformance logic products at Hyundai Electronics. In August 1998, he became
a Project Leader of the Antifuse Repair Circuit Development Team. From August 1999 to June 2000, he was a Project Leader of 1-G DDR SDRAM using
0.13-m technology. Beginning in July 2000, he also worked on next-generation
DRAM and its related systems. He is currently with the faculty of Hallym University, Chunchun-si, Kangwon-Do, Korea. His research interest is in the area of
future DRAM architecture including high-speed DRAM with 200 400 MHz
clock, interconnect modeling, charge pump, DLL, I/O, and module designs for
high-speed chips. He holds several patents and is an author or co-author of several papers.

Jin Yong Chung received the B.S.E.E. degree from


Seoul National University, Seoul, Korea, in 1974 and
the M.S.E.E. degree from Korea Advanced Institute
of Science and Technology, Taejon, Korea, in 1976.
From 1976 to 1978, he worked for Korea Semiconductor Inc., which later became Semiconductor Business Unit of Samsung Electronics, where he was involved in the design of timepieces and custom CMOS
chip designs. Since 1979, he was involved in memory
design area and worked for various companies including National Semiconductor, Synertek, Vitelic,
developing CMOS SRAMs, 4 K to 64 K and mask ROMs and CMOS DRAMs.
In 1987, he joined LG Semiconductor, Korea, where he developed 256 K to 16 M
DRAMs and other standard logic products. In 1992, he joined Mosel-Vitelic,
where he developed high-speed DRAMs and the 256 K 8 high-speed DRAM
became the first semi-standard DRAM, which helped the company to go public.
Since 1996, he has worked for Hynix Semiconductor Inc., Ichon-si,
Kyoungki-Do, Korea, as a Senior Vice President and Chief Architect in the
Memory Research and Development Division. His current research interest is
in development of ultrahigh-speed, super low-voltage and low-power memory
products, novel device research in ferroelectric and magnetic memories, and
new-generation 3-D devices.

Sang Hoon Hong received the B.S. degree in


electronic engineering from Yonsei University,
Seoul, Korea, in 1993. He received the M.S. and
Ph.D. degrees in engineering sciences from Harvard
University, Cambridge, MA, in 1998 and 2001,
respectively.
He is currently with the Memory Research and
Development Division of Hynix Semiconductor
Inc., Ichon-si, Kyongki-Do, Korea, working on
high-speed dynamic memories with a particular interest in low-voltage/power circuits and

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

1137

A Low-Noise Fast-Lock Phase-Locked Loop with


Adaptive Bandwidth Control
Joonsuk Lee, Student Member, IEEE, and Beomsup Kim, Senior Member, IEEE

AbstractThis paper presents a salient analog phase-locked


loop (PLL) that adaptively controls the loop bandwidth according
to the locking status and the phase error amount. When the phase
error is large, such as in the locking mode, the PLL increases the
loop bandwidth and achieves fast locking. On the other hand,
when the phase error is small, this PLL decreases the loop bandwidth and minimizes output jitters. Based on an analog recursive
bandwidth control algorithm, the PLL achieves the phase and
frequency lock in less than 30 clock cycles without pre-training,
and maintains the cycle-to-cycle jitter within 20 ps (peak-to-peak)
in the tracking mode. A feed forward-type duty-cycle corrector
is designed to keep the 50% duty cycle ratio over all operating
frequency range.
Index TermsAdaptive bandwidth PLL, analog implementation, clock recovery, fast locking time, frequency hopping, gear-shifting algorithm, low jitter, phase-locked loops,
time-varying channel.

I. INTRODUCTION

HASE-LOCKED loops (PLLs) have been widely used


in high-speed data communication systems such as Ethernet receivers, disk drive read/write channels, digital mobile
receivers, high-speed memory interfaces, and so forth, because
PLLs efficiently perform clock recovery or clock generation
with relatively low cost. Those PLLs used in the systems are
required to generate low-noise or low-jitter clock signals and at
the same time need to achieve fast locking.
Conventional analog PLLs in clock recovery applications
use a narrow-band loop filter to reduce output jitters at the
expense of elongated locking time. In order to improve the
locking-time characteristics, digital or hybrid analog/digital
PLLs with a loop bandwidth stepping capability have been
studied [1], [2]. Since the stepping hardware is implemented
with complex digital building blocks, these PLLs usually
suffer from high power dissipation, low operating speed and
large die size. In order to reduce consuming power and die
size, simpler algorithms such as a gear-shifting or a lock-detection algorithm were attempted [3], [4]. The PLLs with such
algorithms control the loop bandwidth according to a prestored
charge-pump current control sequence in memory during the
start-up mode. However, in clock recovery applications such
as HDD and DVD, where the channel characteristics vary in
time, the prestored control sequence cannot make the PLLs
Manuscript received October 29, 1999; revised February 23, 2000.
J. Lee was with the Boston Design Center, IBM Microelectronics, Lowell,
MA 01851 USA. He is now with the Korea Advanced Institute of Science and
Technology, Taejon 305-701, Korea.
B. Kim is with the Korea Advanced Institute of Science and Technology,
Taejon 305-701, Korea (e-mail: bkim@ee.kaist.ac.kr).
Publisher Item Identifier S 0018-9200(00)06435-0.

respond properly to unpredictable phase fluctuation, instant


frequency shift, and time-varying jitter because the sequence
was calculated with preknown fixed noise statistics.
Discrete-time PLLs, which are programmed on DSP processors, based on a recursive least squared (RLS) algorithm [5]
or the Kalman filter algorithm [6] can respond to such unpredictable jitter variations, but require enormous amount of hardware. The outputs generated from the discrete-time PLLs are
in a digital domain, and therefore the discrete-time PLLs require digital-to-analog converters (DAC) and an analog-to-digital converter (ADC) to sample input signals for detection. Slow
signal-processing speed of the digital-to-analog conversion in
the discrete-time PLLs limits the operating frequency and confines the use of the PLLs to the applications dealing with lowfrequency signals like digital wireless base stations.
This paper presents a new analog adaptive PLL (AAPLL) architecture capable of varying the loop bandwidth according to
an adaptively updated control sequence under a time-varying
noise environment. Since the control sequence is generated from
analog signal processing, the PLL operates at several hundred
megahertz and can be easily modified to run at gigahertz frequency ranges.
This paper consists of five sections including the present section. Section II describes the AAPLL architecture and the analog
adaptive bandwidth-control algorithm. Stability and jitter analysis for the AAPLL are given in Section III. AAPLL locking behaviors are also discussed in this section. Section IV shows the
AAPLL IC implementation and measurement results. Finally, a
brief summary of this paper is given in Section V.
II. RECURSIVE EQUATION AND ANALOG LOOP BANDWIDTH
CONTROLLER
In this section, a recursive bandwidth update algorithm for the
analog adaptive controller and its implementation are described.
A. Adaptive Bandwidth Control
As mentioned in the introduction, a common approach to
improve the locking speed of a PLL is to use a gear-shifting
method for loop bandwidth control. In such a PLL, when fast
locking is required, as in the initial frequency/phase acquisition
mode, the loop bandwidth of the PLL is expanded by the increased charge-pump current or the phase detector gain [2][4].
Zero phase start (ZPS) is also helpful to reduce the phase acquisition time [4], but limited to the case when the initial-frequency locking has been already established. For the case where
rapid initial-frequency locking is required, various techniques
with a prestored gear-shifting sequence have been studied [4],

00189200/00$10.00 2000 IEEE

1138

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

Fig. 1. Linearized model of a CP-PLL.

equation is required and used in the proposed AAPLL. The update equation is given by (2), in terms of the loop gain
that is proportional to the loop bandwidth.1
(2)

Fig. 2. Conceptual diagram of an analog adaptive controller.

[5]. However, in the case where the channel characteristics vary


in time, such as in a disk drive, the prestored gear-shifting sequence is not helpful. Unpredictable phase fluctuation, instant
frequency shift, and varying input jitter force such a PLL to
use an indefinite wide loop bandwidth in order not to lose the
locking. Although a discrete-time adaptive PLL can adjust the
bandwidth according to the input noise statistics, it still requires
complex hardware and its applications are limited to the low frequency operating systems [5].
A linearized model of a charge-pump PLL (CP-PLL) is
shown in Fig. 1. The transfer function for domain is represented by
(1)
. Here
and
where
are the phase detector gain given by
and the voltage, respectively,
controlled oscillator (VCO) gain given by
is the -transform of the sampled version of
,
and
is the PLL loop filter in domain given by
where
if a simple passive low-pass filter (
) is assumed
is called the
to be used as a loop filter. The quantity
.
PLL loop gain
Discrete-time PLLs that have an adaptive stepping capability
can control the loop bandwidth by a loop-gain update equation
minimizing the RLS error [7]. However, it is difficult to fully
implement the update equation used in the discrete-time PLL
because it requires a significant amount of die size and power
consumption. A simpler but still an effective loop gain update

Here, is a forgetting factor that has a positive value close to


but less than unity. is a coefficient that normalizes and converts the absolute value of inputoutput phase errors from radians to dimensionless numbers. The loop gain
is calculated by a recursive manner according to (2). When the
inputoutput phase error becomes zero, the forgetting factor
makes the loop gain converge to zero as the discrete time
increases. Equation (2) reflects the most recent inputoutput phase
most significantly. This recursive
error
relation is similar to the RLS algorithms commonly used for an
estimator [8].
at time
is calculated as
The loop gain
and the abthe weighted sum of the present loop gain
solute value of the present inputoutput phase error,
at time
. The equation indicates that the loop
gain, thus the loop bandwidth, rapidly grows when the recent
absolute phase errors become large, and greatly improves the
PLL loop tracking capability. When the recent absolute phase
errors become small, the loop gain shrinks, as does the loop
bandwidth, because the first part of (2) dominates. The reduced
loop bandwidth improves the PLLs input jitter rejection capability. Therefore, (2) satisfies the necessary loop bandwidth control under the presence of unpredictable jitter variation.
B. Analog Adaptive Bandwidth Control
Equation (2) is achieved by a CP-PLL with a small amount of
extra hardware. The second term of (2), an absolute phase error,
is obtained from outputs of a phase frequency detector (PFD).
Since the PFD and the following charge-pump circuit generate
up/down current signals proportional to the inputoutput phase
difference, simply combining these up/down signals through
an OR gate gives the absolute phase error signal
at time
. Fig. 2 shows a conceptual
diagram of how the recursive loop gain of the AAPLL is calcubecomes the
lated in the controller. The bandwidth voltage
bias voltage of the following current source in the charge-pump
circuit and controls the amount of charge-pump currents. The
current switch steers the current proportional to the phase
by the
error and increases the voltage across the capacitor
corresponding amount at a constant rate while the resistor
exponentially discharges the capacitor. The resistor and the capacitor realize the first part of (2) with the forgetting factor . As
K

1Here,

the loop bandwidth and the loop gain have the following relationship:
=f .

=W

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP

Fig. 3.

1139

AAPLL total block diagram.

derived in the Appendix, in the steady state the voltage across


at time
is given by
the capacitor
(3)
, is
Here, the forgetting factor equals
, and
is the amount of the charging
current in the controller.
The loop gain is asymptotically proportional to the bandwidth
governed by (3) because the charge-pump current
voltage
is directly controlled by this voltage. It means that the bandwidth
of the AAPLL follows (2).
III. CIRCUIT IMPLEMENTATION
This section describes the circuit implementation of the
adaptive bandwidth controller, the charge-pump circuit, the
VCO, and the duty cycle correction circuit. Fig. 3 shows
the overall block diagram of the AAPLL, which modifies
a conventional PLL by attaching an analog adaptive bandwidth-controlling block. Due to the minor change, the AAPLL
is easily applicable to various PLL applications and still takes
advantage of the full adaptability.

Fig. 4.

PFD schematic with simplified TSPC D-flipflops.

Fig. 5.

Up/Down and phase-error signal diagram.

A. Adaptive Bandwidth Controller and Charge Pump


The well-designed PFD is used as a phase-detecting block
instead of a mixer, though the input signal frequency is high,
in order to achieve a wideband capturing capability. The PFD
shown in Fig. 4 consists of two simplified true single-phase
clock (TSPC) D-flipflops and one NOR gate. Since the input
frequency of the AAPLL is selected to recover the clock signal
in DVD systems, whose clock frequency is about 250 MHz, the
PFD should generate up and down signals at such a high speed.
In order to minimize the abnormal operation of the PFD, TSPC
D-flipflops are used as leaf cells since these intrinsic delays
are smaller than those of conventional ones.
The adaptive bandwidth controller shown in Figs. 3 and 6
consists of an OR gate and a differential switch, which takes the
differential signals from the OR gate. The OR gate sums the up
and down signals generated from the PFD and gives the absolute phase error. The differential switch controls current paths
to the bandwidth capacitor
according to the
from
for one clock period.
phase difference
When the phase difference signal is mostly on over a period,
such as in the initial-phase acquisition state, the charging rate

of the capacitor
exceeds the discharging rate. Hence the
of the capacitor
and the pumping
capacitor voltage
current in the charge pump increase. As a result, the phase detector gain increases and so does the loop bandwidth. On the
other hand, when the phase error signal is off for the most part

1140

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

Fig. 6. Adaptive bandwidth controller and CP schematics.

of one period, such as in the tracking state, the discharging rate


exceeds the charging rate and the capacitor voltage decreases.
Therefore both the phase-detector gain and the loop bandwidth
decrease. In the steady-state tracking mode, the AAPLL loop
bandwidth can be very narrow because the phase error becomes
zero. However, the AAPLL still maintains the minimum loop
bandwidth even in such a case because of the up/down signals
generated from the set/reset type PFD, as shown in Fig. 4. In the
zero-phase-error and perfect locking case in Fig. 5, the OR-gated
effective phase-error signal can still supply currents to the bandwidth capacitor. The statistical variation of the input signal also
contributes to maintain this minimum bandwidth. Fig. 5 shows
the relation between the phase error and the bandwidth control
of the AAPLL.
Fig. 6 shows the circuit diagram of the analog adaptive bandwidth controller with a charge-pump circuit. As mentioned beacross the capacitor
in parallel
fore, the voltage
controls the phase-detection gain
. In
with a resistor
,a
order to control the discharging rate of the capacitor
, as shown in Fig. 7, is
voltage-controlled resistor (VCR)
used. The VCR is designed to have fully linear IV characteristics for a given power supply range. By adjusting the magnitude of the bias current in the VCR branches, the resistance of
is changed and so does the discharging rate of the capac.
itor
is connected to the gates of nMOS
The output node of
transistors, and controls the charge-pump current by adjusting
and
in Fig. 6. The charge pump consists
the bias point of
of two differential input stages, a mirror stage, an output stage,
and two small extra current sources. These two small current
and
sources help the rapid turn-on/off operation for MOS ,
. The differential PFD signals drive the charge-pump inputs.
When the down signal goes high, the current controlled by the
is drawn from the loop
voltage of the bandwidth capacitor
filter. When the up signal goes high, the same amount of current
is supplied to the loop filter.
B. Voltage-Controlled Oscillator (VCO)
A four-stage VCO as shown in Fig. 8 is used for the AAPLL.
The basic delay cell consists of six transistors. The cross-coupled
and
, guarantee the differential operapMOS transistors,
tion of the delay cell without a tail-current bias. Auxiliary pMOS
and
, control the oscillation frequency. Unlike
transistors,

Fig. 7. VCR schematic.

conventional differential VCOs with a current bias, this VCO allows the AAPLL to operate under a single 1.5-V power supply,
consuming 1.5 mA. Because the output signal of the VCO swings
rail-to-rail, no additional level shifter with a carefully designed
replica bias circuit is required to generate CMOS level outputs.
and
, sharpens the
The latch, configured with pMOS ,
edge of the output signal so that the added noise has little chance
to be converted as jitters. Eventually this latch helps the reduction
process of the VCO jitter [9], [10].
C. Duty-Cycle Corrector
Maintaining a 50% duty-cycle ratio for a clock signal is
extremely important in most high-speed clock recovery and
clock generation applications because several systems, such
as double-data rate (DDR) SDRAMs and pipelined microprocessors, use negative transition edges of a clock signal
in order to increase total system throughputs. This is often
achieved by a VCO running at twice as high as the desired clock
frequency, and then dividing the VCO frequency by 2. Other
approaches use a feedback-type duty-cycle corrector. Since
precise placement of the falling edge between two successive
rising edges of the VCO output signal is generally controlled by
an additional feedback loop, the duty-cycle correctors require
an extra training period to stabilize the feedback loop.
The AAPLL uses a feed forward-type duty-cycle corrector
instead of the feedback type in order to eliminate the extra feedback hardware and the training period, as shown in Fig. 9(a).
The duty cycle corrector utilizes multiphase signals generated
from a multistage differential VCO. The signal in Fig. 9(b)
and
selected from the multiphase signals turns on MOS ,

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP

1141

Fig. 8. Four-stage VCO.

(a)

Fig. 10. Stability diagram for loop gain

A. Stability
(a)
Fig. 9. Feed forward-type duty-cycle corrector. (a) Duty-cycle corrector
schematic. (b) Conceptual diagram of the correcting operation.

, and charges the output node of the duty-cycle corrector


almost instantaneously, because the discharge path of the node
is already off due to the signal . The signal , which is also
selected from the multiphase signals, is the one whose rising
edge is shifted by 180 in phase from that of . Similarly, the
signal rapidly discharges the node and delivers the desired
50% duty-cycle signal. Since this duty-cycle correction circuit
consists of only two transmission gates and two inverters, the silicon area is minimal and the power consumption is negligible.
In HSPICE simulation, the proposed duty-cycle corrector keeps
the output duty cycle almost perfectly at 50% with the input duty
cycles varying from 10 to 90%.
IV. ANALYSIS AND SIMULATION
In this section, the stability of the AAPLL is analyzed for the
adaptively generated loop sequence, and behavioral simulation
results for fast lock and large jitter reduction are described.

Since the AAPLL automatically changes the loop bandwidth,


a careful loop stability analysis is required. As mentioned in
the previous section, an analog adaptive controller adjusts
the phase-detector gain of the CP-PLL. Therefore, stability
checking for the PLL for each different phase gain should
be accomplished first. A complete stability analysis for the
CP-PLL is cumbersome because a PLL operates in both a
linear and a nonlinear region. A simplified stability analysis for
a second-order CP-PLL [8] is used in this section. When the
criterion is extended to include the logic delay effect, it can be
expressed as
(4)
Here, , , and are the clock period, the logic delay, and the
RC time constant of a loop filter respectively. The stability limit
of the AAPLL is derived and simulated
for the loop gain
using this criterion as shown in Fig. 10. The adaptively generated loop gain sequence by the recursive equation is also shown
in the same figure to verify the AAPLL stability. The sequence
converges to the minimum bandwidth and the amplitude of this
bandwidth is almost similar to that derived from the MMSE criterion [4]. Equation (4) can be written to obtain the stability criby solving a MOS I-V
terion for the bandwidth voltage

1142

Fig. 11.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

Stability limit graph for bandwidth voltage V

characteristic equation for


ration condition.

Fig. 12.

Simulation setup for the locking behavior measurement.

in Fig. 6 assumiing a satu-

(5)
,
,
, and
are
, the
Here,
VCO gain, the nMOS threshold voltage, and the size of a resistor
is the
of the loop filter in Fig. 3, respectively. Here,
,
. The stability limit for the bandwidth voltage,
size of
which is one of the observable values in the measurement setup,
and resistor
is visualized in Fig. 11 for various capacitor
values. The figure shows that all the sequences
and
are within the stable region for the various resistor and
capacitor values.

(a)

B. Output Jitter
Recently, it was reported that a CP-PLL has an optimum loop
bandwidth that generates minimum jitter in the steady state [11].
A clean tone, that is assumed to have only noise floor and no
random walking phase noise, is used as a reference signal for the
jitter derivation. Because the AAPLL eventually achieves the
steady state locking with a clean reference signal like other conventional PLLs, the output cycle-to-cycle jitter of the AAPLL
can be calculated by

(b)
Fig. 13. Simulation results for the locking behavior of the AAPLL and
conventional ones. (a) Fixed narrow-bandwidth PLL. (b) Fixed wide-bandwidth
PLL.

(6)
,
,
, and
are the internal jitter
Here,
from the VCO, the jitter of the input signal, the rms value of
the charge-pump current variation, and the rms value of VCO
control voltage noise in the steady state, respectively.
C. Behavioral Simulation of a Locking Feature
Closed-form analysis of locking behaviors for the AAPLL is
difficult because of its nonlinear operation. In this paper, a simulation-based approach like the Monte Carlo Method is used instead. The AAPLL is modeled in a SPICE circuit simulator and

extensively tested by the circuit simulator. Fig. 12 shows the simulation setup for the AAPLL. Fig. 13 compares the simulated
locking behavior of the AAPLL with that of a conventional PLL.
The bandwidths of the conventional PLL are selected to have two
typical values. One is optimized for the initial locking, and the
other for the steady-state tracking. The gray line in Fig. 13(a) indicates an incoming signal in the phase domain. The solid line
in the figures shows the phase of the AAPLL output signal from
initial locking to steady-state tracking. The phase variation of
the conventional PLL optimized for steady-state tracking with a
narrow bandwidth is shown in the same figure as a dashed line.

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP

1143

Fig. 17.

Experimental results of the locking for a 150200-MHz input signal.

Fig. 14. Micrograph of the fabricated AAPLL chip.

Fig. 18. Experimental results of the locking for a 180220-MHz input signal
by four steps.

Fig. 15.

Control voltage change for a 0250 MHz frequency input.

Fig. 16.

Loop bandwidth voltage change for a 0250-MHz frequency input.

In Fig. 13(b), the phase change of the conventional PLL optimized for initial locking is also shown as a dashed line. This simulation result gives several characteristics of the AAPLL. The
AAPLL controlled by the recursive algorithm achieves fast lock
in the initial locking period, comparable to the speed obtained
from a wide-bandwidth PLL because the consecutive error signals rapidly increase the loop bandwidth of the AAPLL. In the
steady-state tracking mode, the AAPLL substantially rejects the
input jitter due to the narrower loop bandwidth.
V. EXPERIMENTAL RESULTS
The AAPLL is fabricated in a 0.6- m single-poly triple-metal
n-well CMOS process [12]. The die size for the AAPLL is
0.11 mm . The total power consumption is less than 15 mW
with a single 3-V supply. A microphotograph of the AAPLL

Fig. 19. 2.544-ps (rms) and 20-ps (peak-to-peak) cycle-to-cycle jitter at


250-MHz input.

is shown in Fig. 14. To get the forgetting factor


,
k ,
20 pF are used. And
100 A
. The locking-speed measureis selected to get
ment is carried out using an abrupt change of the input signal
frequency from 0 to 250 MHz. Fig. 15 shows the corresponding

1144

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 8, AUGUST 2000

Fig. 20. 50% duty-cycle correction operation over the entire frequency range.

Fig. 22.
work.

Comparison between recently reported PLLs and DLLs and this

TABLE I
AAPLL CHARACTERISTICS SUMMARY

Fig. 21.

VCO linearity.

VCO control-voltage variations. In order to measure the locking


speed precisely, the running cycles of the output waveform are
counted from the frequency triggering point in the initial locking
state. The AAPLL requires less than 30 clock cycles for both fre, reprequency and phase lock in this case. The voltage of
senting the AAPLL bandwidth, is also measured. Fig. 16 shows
the measured voltage and describes the adaptation of the loop
bandwidth in the AAPLL. Figs. 17 and 18 show the measured
control voltages and the corresponding output signals when the
input frequencies vary from 150 to 200 MHz and from 180 to
220 MHz by four steps respectively. In this case, the frequency
and the phase locking require less than 10 symbol periods because the frequency steps are much smaller compared to the previous case. The measured cycle-to-cycle jitters of the AAPLL
output signal at a 250-MHz input signal are 2.54 ps (rms) and 20
ps (peak-to-peak) as depicted in Fig. 19. This jitter value contains the inherent measurement setup jitter [13].
In order to test the performance of the duty cycle correction
circuit, the duty cycle ratio of the output signal is measured with
the input signals from 90 to 260 MHz. Fig. 20 shows the measured result of the corresponding duty cycle ratio. This result
indicates the feed forward-type duty-cycle corrector maintains

50% duty-cycle ratio within 2% error for the region. Fig. 21


shows a VCO linearity diagram. The VCO gain is about 100
MHz/V at a 250-MHz input frequency. The AAPLL operates
from 80 to 290 MHz with a 3-V supply voltage. Fig. 22 compares the normalized peak-to-peak jitter and the lock time of the
AAPLL with those of recently reported PLLs and DLLs. Measured characteristics are summarized in Table I.
VI. CONCLUSION
This paper presents the design of a 250-MHz low-jitter
fast-lock analog adaptive bandwidth-controlled PLL on a single
chip. The chip is implemented in a 0.6- m standard CMOS
process. Simple recursive control logic is proposed to control
the bandwidth effectively. The measured locking time is less
than 10 cycles in a 10-MHz frequency step and less than 30
cycles from an unknown frequency signal to the 250MHz
signal respectively. The measured output jitters are 2.6 ps (rms)
and 20 ps (peak-to-peak). All the components are designed

LEE & KIM: LOW-NOISE FAST-LOCK PHASE-LOCKED LOOP

1145

using analog technique and hence the required die size and the
power consumption are minimal.
APPENDIX
As shown in Fig. 2, the OR gate gives the control signal for
the switch according to the phase error signal in the bandwidth controller. When the phase error of the signal is high, the
controller signal from the OR gate feeds current to the bandand the voltage across the capacitor inwidth capacitor
. As a result, the bandwidth
creases at a constant rate
increases proportional to the normalized phase
voltage
. After the charging process, the controller signal
error
from the OR gate disconnects the path from the current source
discharges
and connects to the resistor. So the capacitor
. The switching action occurs every
through the resistor
clock cycle period.
of the bandwidth capacitor at time
The voltage
can be written as

(7)
where
at time

is the voltage of the previous capacitor voltage


. The voltage equation can be simplified to (8).

[8] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice


Hall, 1995.
[9] T. C. Weigandt, B. Kim, and P. R. Gray, Analysis of timing jitter in
CMOS ring oscillators, in Proc. Int. Symp. Circuit and Systems, vol. 4,
London, U.K., June 1994, pp. 2730.
[10] C. H. Park and B. Kim, A low-noise 900-MHz VCO in 0.6-m CMOS,
IEEE J. Solid-State Circuits, vol. 34, pp. 586591, May 1999.
[11] K. Lim, C. H. Park, and B. Kim, Low noise clock synthesizer design
using optimal bandwidth, in Proc. Int. Symp. Circuit and Systems, Monterey, CA, June 1998, pp. 163166.
[12] J. Lee and B. Kim, A 250 MHz low jitter adaptive bandwidth PLL,
ISSCC Dig. Tech. Papers, pp. 346347, Feb. 1999.
[13] J. McNeil, Jitter in ring oscillators, IEEE J. Solid-State Circuits, vol.
32, pp. 870879, June 1997.

Joonsuk Lee (S99) received the B.S. and M.S. degrees in electrical engineering and computer sciences
from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1995 and 1997,
respectively. Since 1997 he has been working toward
the Ph.D. degree at the same university.
From 1999 to 2000, he was with IBM Microelectronics, Boston, MA, as an Analog and Mixed Signal
Designer involved in a high performance sigmadelta
ADC/DAC project with Motorola, Lowell, MA. His
research interests include PLL/DLL, timing recovery
algorithms, high-speed SDRAM interface, and LAN and mixed-mode signal
processing technique for telecommunication ICs.
Mr. Lee is the Gold Medal winner of the Human-Tech Thesis Prize from Samsung Electronics Co. Ltd. in 1997, the Gold Medal winner of the Chip Design
Contest from LG Semicon Co. Ltd. in 1998, and the Gold Medal winner of the
Integrated Design Center (IDEC) Award in 1998.

(8)
,
. In the initial locking mode,
the AAPLL does the locking operation based on (8). Once
the AAPLL finished the phase and frequency locking, the
. In this case, the forgetting
phase error is far less than
factor and the proportional coefficient can be be replaced by
and
.

Here

REFERENCES
[1] J. Dunning et al., An all-digital phase-locked loop with 50-cycle lock
time suitable for high-performance microprocessors, IEEE J. SolidState Circuits, vol. 30, pp. 412422, Apr. 1995.
[2] B. Kim, D. N. Helman, and P. R. Gray, A 30-MHz hybrid analog/digital
clock recovery circuit in 2-m CMOS, IEEE J. Sold-State Circuits, vol.
25, pp. 13851394, Dec. 1990.
[3] M. Mizuno et al., A 0.18 m CMOS hot-standby phase-locked loop
using a noise immune adaptive-gain voltage-controlled oscillator,
ISSCC Dig. Tech. Papers, pp. 268269, Feb. 1995.
[4] G. Roh, Y. Lee, and B. Kim, An optimum phase-acquisition technique
for charge-pump phase-locked loops, IEEE Trans. Circuit Syst. II, vol.
44, pp. 729740, Sept. 1997.
[5] B. Chun, Y. Lee, and B. Kim, Design of variable loop gain of dual-loop
DPLL, IEEE Trans. Commun., vol. 45, pp. 15201522, Dec. 1997.
[6] P. F. Driessen, DPLL bit synchronizer with rapid acquisition using
adaptive Kalman filtering techniques, IEEE Trans. Commun., vol. 452,
pp. 26732675, Sept. 1994.
[7] B. Kim, Dual-loop DPLL gear-shifting algorithm for fast synchronization, IEEE Trans. Circuits Syst. II, vol. 44, pp. 577586, July 1997.

Beomsup Kim (S87M90SM95) received the


B.S. and M.S. degrees in electronic engineering
from Seoul National University, Seoul, Korea, in
1983 and 1985, respectively, and the Ph.D. degree in
electrical engineering and computer sciences from
the University of California, Berkeley, in 1990.
From 1986 to 1990, he worked as a Graduate Researcher and Graduate Instructor at Department of
Electrical Engineering and Computer Sciences, University of California, Berkeley. From 1990 to 1991,
he was with Chips and Technologies, Inc., San Jose,
CA, where he was involved in designing high speed-signal processing ICs for
disk drive read/write channels. From 1991 to 1993, he was with Philips Research, Palo Alto, CA, where he was conducting research on digital signal processing for video, wireless communication, and disk drive applications. During
1994, he was a Consultant, developing the partial-response maximum likelihood detection scheme of the disk drive read/write channel. In 1994, he became
an Assistant Professor with the Department of Electrical Engineering, Korea
Advanced Institute of Science and Technology (KAIST), Taejon, Korea, and
is currently an Associate Professor. During 1999, he took a sabbatical leave
and stayed at Stanford University, Stanford, CA, and also consulted for Marvell
Semiconductor Inc., San Jose, CA, on the Gigabit Ethernet and wireless LAN
DSP architecture. His research interests include mixed-mode signal processing
IC design for telecommunications, disk drive, local area network, high-speed
analog IC design, and VLSI system design.
Dr. Kim is a corecipient of the Best Paper Award (19901991) for the IEEE
JOURNAL OF SOLID-STATE CIRCUITS, and received the Philips Employee Reward
in 1992. Between June 1993 and June 1995, he served as an Associate Editor for
the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: ANALOG AND DIGITAL
SIGNAL PROCESSING.

632

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

A Portable Digital DLL for


High-Speed CMOS Interface Circuits
Bruno W. Garlepp, Kevin S. Donnelly, Associate Member, IEEE, Jun Kim, Pak S. Chau,
Jared L. Zerbe, Charles Huang, Chanh V. Tran, Clemenz L. Portmann, Member, IEEE,
Donald Stark, Yiu-Fai Chan, Member, IEEE, Thomas H. Lee, Member, IEEE, and Mark A. Horowitz

Abstract A digital delay-locked loop (DLL) that achieves


infinite phase range and 40-ps worst case phase resolution at
400 MHz was developed in a 3.3-V, 0.4-m standard CMOS
process. The DLL uses dual delay lines with an end-of-cycle detector, phase blenders, and duty-cycle correcting multiplexers. This
more easily process-portable DLL achieves jitter performance
comparable to a more complex analog DLL when placed into
identical high-speed interface circuits fabricated on the same
test-chip die. At 400 MHz, the digital DLL provides <250 ps
peak-to-peak long-term jitter at 3.3 V and operates down to 1.7 V,
where it dissipates 60 mW. The DLL occupies 0.96 mm2 :
Index TermsDelay circuits, delay-locked loops (DLLs), digital control, digital DLL, phase blending, phase control, phase
synchronization.

I. INTRODUCTION

N RECENT years, there has been a great deal of interest


in delay-locked loops (DLLs) for clock alignment. Both
analog and digital DLLs have been developed [1][6], with
analog loops generally providing better jitter performance
at the expense of greater complexity. This paper describes
a digital DLL that achieves jitter performance comparable
to an analog DLL. Although the digital DLL uses more
area and power than the analog DLL, its greater simplicity,
easier portability, and lower minimum required supply voltage
makes it very attractive in many clock alignment applications.
Additionally, the digital DLL not only operates at lower supply
voltages than the analog DLL but it also demonstrates that
digital DLLs have the potential for good power-consumption
scaling as supply voltage is decreased.
The motivation for the development of this digital DLL
was the need for a clock alignment circuit for use in the
CMOS interface cells [6] of a high-speed memory system
as in [7].1 The memory system operates at 400 MHz, with
data transferred on both edges of the clock, producing an
effective 800-Mb/s/pin transfer rate. This corresponds to a
1.25-ns bit time. With such tight timing requirements, it
becomes imperative to include clock alignment circuits in
Manuscript received September 15, 1998; revised December 23, 1998.
B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang,
C. V. Tran, C. L. Portmann, D. Stark, and Y.-F. Chan are with Rambus, Inc.,
Mountain View, CA 94040 USA.
T. H. Lee and M. A. Horowitz are with the Center for Integrated Systems,
Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)03668-9.
1 Documentation is available at http://www.rambus.com/html/direct_documentation.html.

the interface cells to provide internal on-chip clocks that


are aligned in phase with an external system clock. The
clock alignment circuits must provide a phase resolution
better than 50 ps and produce a worst case long-term jitter
of less than 250 ps peak-to-peak (pp). To facilitate the
use of many different application-specific integrated-circuit
controllers with the memory system, the clock alignment
circuit should be easily portable across multiple processes
without compromising performance.
The clock alignment function can be provided using either
phase-locked loops (PLLs) or DLLs. Because frequency synthesis is not needed in this application, DLLs are preferred for
their unconditional stability, lower phase-error accumulation,
and faster locking time. In previous designs of the interface
cells for this memory system, we have used an analog DLL
with a two-step coarse/fine architecture. A high-level drawing
of this approach is shown in Fig. 1. This analog DLL includes
a quadrature generator, which produces four reference signals
spaced 90 apart in phase to evenly cover the full 360
of phase space. A phase interpolator circuit in the analog
DLL receives these reference signals and selects a phase
adjacent pair that define a phase quadrant for interpolation to
produce an output signal phase-aligned to a reference signal,
RefClk.
Analog DLLs constructed with this approach provide several significant benefits. Because most of the elements in the
signal path can be made from differential analog blocks with
good power-supply rejection ratio (PSRR), the analog DLL
architecture of Fig. 1 can provide very good jitter performance.
Additionally, it can be carefully designed to occupy relatively
little area and consume relatively little current. Furthermore,
the analog DLL can provide very small phase steps when
locked ( 50 ps). Finally, the architecture of Fig. 1 provides
infinite phase range, and one set of quadrature reference
signals can be fed to multiple phase interpolators, allowing
phase alignment to multiple reference signals simultaneously.
However, because of the relatively high analog complexity of
this DLL and its individual elements, the analog DLL of Fig. 1
requires a detailed, process-specific implementation, making it
relatively labor intensive to port across multiple processes.
Although we have traditionally used analog DLLs to provide the clock alignment function in the CMOS interface
cells of the memory system described above, we decided to
consider using a digital DLL. Digital DLLs are characterized
by their use of a digital delay line and are typically made from

00189200/99$10.00 1999 IEEE

GARLEPP et al.: PORTABLE DIGITAL DLL

633

Fig. 1. Block diagram of a two-step, coarse/fine analog DLL architecture.

simple, digital circuit elements. This facilitates their design and


portability across multiple processes. Additionally, because
phase information in a digital DLL is stored as a digital
state, digital DLLs can provide very fast timing recovery after
being placed into a low power mode. However, conventional
digital DLLs provide only moderate phase resolution and jitter
performance [8], [9].
Another benefit of digital DLLs is their ability to readily
operate at lower voltages than analog DLLs. Because analog
DLLs require the use of saturated current sources, they
experience voltage headroom problems as supply voltages
decrease. Digital DLLs, on the other hand, need only enough
voltage to ensure the proper operation of their digital gate
elements. For the same reason, digital DLLs better utilize
the power-saving benefits of digital CMOS voltage scaling
than analog DLLs. The power of an analog DLL is typically
distributed between IV power (where I is power and V is
voltage) from the constant current (differential) stages and
CV f power (where C is capacitance and f is frequency) from
the CMOS (single-ended) stages (if any). The power of digital
DLLs, on the other hand, is determined primarily by CV f
power, which decreases quadratically with supply voltage.
This paper describes a digital DLL [10] used as the clock
alignment circuit in the CMOS interface cells of a high-speed
memory system. This work improves upon the performance of
previous digital DLLs by paralleling the two-step coarse/fine
analog DLL architectures presented in [4], [5], [7], and [11],
allowing the digital DLL to achieve jitter performance comparable to the analog DLLs.
This paper is arranged as follows. Section II describes
delay-generation techniques used in conventional digital
DLLs and describes the improved techniques implemented
in the new DLL. This section also describes infinite phase
generation with the new delay-line scheme. Section III
describes several new circuit techniques used for enhancing
the phase resolution and signal quality in the new digital DLL.
Section IV describes the overall DLL architecture. Section V
discusses our test chip and measured results, with special
attention given to making a direct, side-by-side comparison of
the new digital DLL with an analog DLL placed into identical

CMOS interface cells on the same test-chip die. Section VI


concludes this paper.
The terms phase and delay are used throughout this paper
to describe the DLLs operation. It is helpful to recall that at a
given system frequency, the two quantities are related by the
simple equation
(1)
is phase in degrees,
where
is frequency in hertz.

is delay in seconds, and

II. DIGITAL DELAY CIRCUIT TECHNIQUES


A. Conventional Digital Delay Lines
As mentioned above, the purpose of a DLL in a clock
alignment application is to provide an output clock signal that
is aligned in phase with a reference clock signal of the same
frequency. To do this, the DLL must include a mechanism for
providing a variable delay to an input signal. The DLL then
adjusts this variable delay such that the input signal passes
through the delay mechanism and emerges at the output of the
DLL aligned in phase with the reference signal.
Digital DLLs generally incorporate a tapped digital delay
line as the variable-delay mechanism. The delay line receives
an input clock signal (e.g., a buffered version of the reference
signal) and passes it through a series of delay elements. The
outputs of the delay elements are tapped and buffered to
provide a series of phase-adjacent signals. The DLL then
selects the delay-line tap that provides the signal that produces
an output with a phase that most closely matches the desired
phase.
A conventional delay line suitable for a CMOS digital DLL
is shown in Fig. 2. The delay elements could be implemented
with almost any circuit block, but because the phase resolution
of the delay line is determined by the delay through the delay
elements, delay elements that provide minimal delay are generally preferred. Thus, the delay line of Fig. 2 uses inverters,
since they provide the shortest delay of any CMOS digital gate.
Because of the inverting characteristic of all standard CMOS
gates, the delay line is tapped only at every other inverter

634

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

Fig. 2. Conventional digital delay line with inverter delay elements.

Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.

output to ensure that each successive tap provides a signal


that is adjacent in phase to the signals at its adjacent taps.
Although conventional delay lines are attractive for their
simplicity, DLLs designed around such conventional delay
lines suffer from several significant limitations. First, the delay
line provides fairly coarse phase resolution. For example, the
delay line in Fig. 2 provides a minimum phase step corresponding to two inverter delays. Such coarse phase resolution
is not fine enough for our clock alignment application. Second,
conventional delay lines deliver only a finite phase range.
Typically, in order to cover at least one full cycle of phase, the
delay-line length and element delays are adjusted to provide
at least 360 of phase under the fastest process, voltage,
and temperature (PVT) conditions and minimum operating
More often, however, the delay
frequency
line is designed with as much as 720 (i.e., two cycles)
of phase under these conditions. This requires the use of a
long delay line, occupying a large silicon area and dissipating
additional power as the input signal propagates through the
many delay elements. Additionally, because inverters offer
poor PSRR, voltage supply noise-induced jitter can accumulate
as the signal propagates down the delay line. This causes
the signals available from the later taps in the delay line
to be more jitter prone than the signals from the earlier
taps. Last, even with an extended delay line, the DLL can
nonetheless run out of phase range and lose lock in a system
with slowing drifting phase (e.g., spread-spectrum clocking).
These limitations prohibited the use of a conventional delay
line in our DLL design.

B. Delay-Line Improvements
To overcome some of these limitations, we developed a
complementary delay line as shown in Fig. 3 for our DLL.
In this architecture, two parallel delay lines with weak cross
coupling are driven by complementary input signals ClkIn and

ClkInb. Because of the use of complementary inputs, the two


delay lines are tapped after every inverter to provide phaseadjacent signals separated by only one inverter delay, thereby
improving the phase resolution by a factor of two. An example
of how this delay-line scheme provides single inverter delay
resolution is shown by the shaded paths in Fig. 3. The signal
that emerges from Tap 2 has passed through three inverter
delays, while the signal that emerges from Tap 3 has passed
through four inverter delays. However, ClkInb is exactly 180
out of phase with ClkIn, providing the additional inversion
required to ensure that the signals emerging from Taps 2 and
3 are indeed separated in phase by exactly one inverter delay.
This complementary delay-line architecture also allows the
delay lines to be made shorter. The true taps from the delay
line can provide the first 180 of phase, while the complement
taps can provide the second 180 of phase. Thus, each of
the two delay lines can be tuned for only 180 of phase
Shorter delay
under the fastest PVT conditions and
lines provide the additional benefits of reduced maximum
jitter accumulation, smaller silicon area, and lower power
consumption. The problem that this design creates is a need to
determine when to switch from the true taps to the complement
taps and vice versa to ensure full and even coverage of the
entire 360 phase plane. This is particularly important because
the number of delay elements (and output taps) needed to cover
180 changes with PVT conditions and operating frequency.
C. Infinite Phase Generation
To solve the problem of determining when to switch between the true and complement taps of the complementary
delay line, we developed an end-of-cycle (EOC) detector, as
shown in Fig. 4, for use with the complementary delay line. An
EOC detector is essentially a bank of data flip-flops arranged
as a time-to-digital converter for measuring the delay through
the delay line. The EOC detector produces a thermometer code

GARLEPP et al.: PORTABLE DIGITAL DLL

635

Fig. 4. EOC detector circuit (180 ).

In other words, to travel counterclockwise around the phase


plane, the DLL would successively select Taps 14, then Taps
1b4b, then Taps 14, etc., to provide infinite phase range.
In this manner, all phase steps are equivalent to at most one
inverter delay (i.e., 50 ), except for the Tap 4 to Tap 1b and
the Tap 4b to Tap 1 transitions, which are less (30 ).
III. RESOLUTION-ENHANCING CIRCUIT TECHNIQUES
A. Phase Blending

Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay
50 :

indicating the first 180 of delay in the delay lines. The first
state transition in the EOC code indicates the first true tap
from the delay line that provides a signal with phase that
lags the phase of the signal from Tap 1 by more than 180
With this information, the DLL logic knows when to switch
between the true and complement taps of the delay line to
ensure full coverage of all 360 of phase space, with phase
steps of at most one inverter delay. Use of the EOC code also
prevents negative phase steps in the phase-transfer function as
taps are successively selected from the delay line. This allows
the complementary delay lines to provide infinite, monotonic
phase range for the DLL. The clocking signal for the EOC
detector, SampClk, is synchronized to the signal from Tap 1
by a replica timing network (not shown).
To illustrate the principle of infinite phase generation using
the EOC code with this delay-line scheme, refer to Fig. 5,
which shows a phasor diagram of the signals from the first
five true and complement taps of a complementary delay line
like the one shown in Fig. 3. The figure assumes that the
PVT conditions and operating frequency are such that the
propagation delay of each inverter stage is equal to 50 of
phase. In the figure, the solid lines correspond to signals from
the true taps, while dashed lines correspond to signals from
the complement taps. Because Tap 5 delivers a signal that is
delayed by 200 from the signal at Tap 1, the EOC detectors
thermometer code would indicate that Tap 5 is the first true
tap to provide a signal with phase beyond 180 relative to the
signal from Tap 1. With this information, the DLL knows to
switch between the true and complement taps after four stages.

Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved
its jitter accumulation performance, enabled infinite phase
range, and improved the available phase resolution by a factor
of two, this phase resolution was still not good enough to
meet the requirements of our memory system. In the 0.4- m
process we used, the propagation delay of one inverter over all
anticipated PVT conditions varied from 100 to 300 ps. This
is much larger than the worst case phase step specification of
50 ps. Therefore, to ensure compliance with this specification,
the DLLs phase resolution needed to be improved by at least
six times over what the delay line provided.
To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown
in Fig. 6(a). This circuit receives two phase-adjacent input
and
, which are separated in phase by one
signals,
inverter delay. The phase blender directly passes these two
and
signals with a simple delay to produce output signals
However, it also uses a pair of phase-blending inverters to
interpolate between these two input signals to produce a third
, having a phase between that of
and
output signal,
This effectively doubles the available phase resolution.
However, it is not sufficient to use equal-sized inverters
for the phase blending. Fig. 6(b) illustrates a simple model
[12] used for determining the ideal relative sizes of the two
lies
phase-blending inverters to ensure that the phase of
and
The model approximates
directly between that of
the two inverters with two simple switched current sources
sharing a common resistancecapacitance (RC) load. For two
the model
rising edge input signals separated in time by
yields the equation

(2)

636

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

(a)

(b)

(c)

(d)

(e)

Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of
WA =(WA + WB ) = 0:50, (d) phase-blender output signal edges for w = 0:50, and (e) phase-blender
signal voltages in the simple model for w
output signal edges for w = 0:60:

where is the total resistive load, is the output capacitance,


is the total pulldown current of the two phase-blending
is the unit step function, and
is the phaseinverters,
blending inverter relative size ratio [refer to Fig. 6(a), where
is the ratio of the device widths in
inverter to the total device widths in both inverters and
]. Equation (2) is the sum of two decaying exponential terms,
and Fig. 6(c) shows a plot of the resulting waveform according
Because the
to this equation for the case where
relative to
second exponential term is delayed in time by
the first, it only begins to affect the slope of the decay after
this delay has elapsed. Therefore, without explicitly solving
and
it is not
the equation for each case of
will cross
obvious when
For input signals separated in phase by one inverter delay
), the model specifies that in order to ensure
(i.e.,
lies directly in between that of
that the phase of
and
the phase-blending inverters must be sized in a
ratio, such that the leading phase is
coupled to an inverter that is bigger than the one that receives
the lagging phase. This ratio was also confirmed empirically
with simulations. The effect of the relative sizing of the phaseblending inverters is illustrated in Fig. 6(d) and (e), which
and
shows the resulting output signal edges for
, respectively. Clearly, the phase of output signal
is closer to that of
than to that of
when the
Although
phase-blending inverter size ratio is
inverter sizing ensures good, evenly
asymmetrical
spaced edge placement of the three output signals, it requires
lead
Reversing the phase of these two input
that
since the
signals would result in a severely misplaced
effective sizing ratio would then be

Another design constraint of the phase-blender circuit is that


all paths through the circuit must provide precisely the same
loading and delay to ensure that the phase relationship between
and
is maintained by
and
The phase-blender idea can be extended to multiple cascaded stages for further phase-resolution improvement, with
each additional stage improving the resolution by a factor of
two. Fig. 7 shows a two-stage cascaded phase-blender circuit
that provides a 4x improvement in phase resolution from input
to output. Although it is theoretically possible to increase phase
resolution indefinitely by adding more and more phase-blender
stages, there is a practical limit. The number of inverters in
each signal path increases by two with each additional phaseblending stage, making the circuit increasingly susceptible
to voltage supply noise-induced jitter due to the additional
delay in the signal path. Therefore, it is prudent to increase
the number of blending stages to improve phase resolution
only until the output phase step size from the phase blender
is approximately equivalent to the anticipated voltage supply
noise-induced jitter.
There are several design limitations that must be considered
when designing a cascaded phase blender. First, the importance of proper (asymmetrical) sizing of the phase-blending
inverters grows with the number of cascaded blending stages
because edge misplacement has a compounding effect as the
signals travel through the multiple stages. Additionally, close
attention must be paid to ensuring equal loading for equal
delay through all paths, requiring the use of dummy devices
on otherwise unbalanced paths. Finally, like a single-stage
phase blender, a cascaded phase blender also requires the
to lead that of
to ensure even output phase
phase of
spacing.

GARLEPP et al.: PORTABLE DIGITAL DLL

637

Fig. 7. Two-stage, cascaded phase-blender circuit for 4x phase-resolution improvement.

Fig. 8. Three-stage, symmetrical phase-blender circuit.

To overcome these design limitations of the cascaded phase


blender, we developed a symmetrical phase blender. A block
diagram of a three-stage symmetrical phase blender is shown
in Fig. 8. This circuit is essentially two parallel cascaded
phase-blender circuits, sharing some common paths. When
leads
the outputs
provide
leads
the outequal output phase spacing. When
provide equal output phase
puts
spacing. Therefore, the circuit provides phase blending with an
8x improvement in phase resolution and equally spaced output
signals regardless of which input signal leads in phase.
Additionally, the symmetrical blender allows for seamless
input switching for continuous phase blending over multiple
leads
in
input delays. For example, assume that

phase. Beginning with output


outputs
can be successively selected to evenly span
and
Once
is selected,
the phase range between
can be changed to another signal that lags
This
beswitching is possible without affecting the signal
has no dependence on or coupling from
Then
cause
can be successively seoutputs
and
lected to evenly span the phase range between
Once
is selected,
can be changed to yet another
Again, this is possible without any change
signal that lags
because
has no dependence on or
in the signal
This process can continue indefinitely.
coupling from
Also, because all paths through the symmetrical phase blender
are inherently balanced, no dummy devices are needed.

638

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

(a)

(b)
Fig. 9. (a) A 16 : 1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.

B. Signal Selection and Duty-Cycle Correction


Since the digital DLL was to be placed into a memory
system that exchanges data on both edges of the clock, good
duty cycle (i.e., close to 50%) is required to ensure that the
data exchanged on either edge of the clock have equal bit
times. Duty-cycle distortion is usually addressed in PLLs by
simply running the PLLs voltage-controlled oscillator (VCO)
at twice the system frequency and using a postdivider triggered
on one edge of the VCO output to produce the output clock
from the PLL [13][15]. This ensures good, 50% duty cycle. In
a DLL, however, no frequency multiplication is possible. The
duty cycle of the output signal must be directly corrected to
50%, for example, by using a duty-cycle correcting amplifier
in the signal path as in Fig. 1 and in [4].
Although duty-cycle correction can be addressed by placing
a duty-cycle corrector at the output of the DLL, this approach
has several limitations. First, since duty cycle is corrected only
at the output of the DLL, internal DLL signals may have
poor duty cycle. It is good practice, however, to maintain
50% duty cycle throughout the signal path to maximize signal
propagation as frequency is increased. Second, performing all
the duty-cycle correction in one stage at the output of the
DLL places a great deal of strain on the duty-cycle correcting
circuit; it must have a large duty-cycle correction range to
compensate for all the duty-cycle distortion that can accumulate in the signal path. Finally, adding a duty-cycle corrector
directly into the signal path increases signal path delay, and
thus susceptibility to voltage supply noise-induced jitter.
To address the issue of duty cycle, we developed the
idea of duty-cycle correcting multiplexers. Since multiplexers
would be needed in our DLL regardless, by adding duty-

cycle correcting functionality to the multiplexing circuitry, we


implemented duty-cycle correction while requiring minimal
additional power, area, and delay.
A 16 : 1 duty-cycle correcting multiplexer is shown in
Fig. 9(a) with a corresponding control circuit in Fig. 9(b). To
facilitate understanding of this circuits operation, consider an
is selected and has dutyexample. Assume that signal
has a high
cycle distortion such that output signal
is sensed by a duty-cycle
duty cycle. Assume also that
error detector, which produces a differential output error signal
proportional to the difference in duty cycle beand the ideal 50%. Thus, in our example,
tween
will be greater than
causing more current to be steered
through the right branch of the control signal in Fig. 9(b) than
through the left side. This in turn increases the strength of
and
compared to
and
in the duty-cycle
correcting multiplexer of Fig. 9(a). These transistors alter the
to
driving
duty cycle of the signal as it passes from
to the ideal 50% duty cycle. The use of both PMOS and
NMOS devices to perform the duty-cycle correction ensures
a symmetrical duty-cycle correction range. Furthermore, because duty-cycle correction has been distributed through two
stages, the requirements on each individual duty-cycle correcting stage are reduced. By combining both necessary functions
of signal selection and duty-cycle correction, this circuit
minimizes signal path delay, jitter accumulation, circuit area,
and power compared to performing both functions separately.
IV. DLL ARCHITECTURE
Fig. 10 is a block diagram of the entire digital DLL, with
shading indicating the circuit blocks that were described in

GARLEPP et al.: PORTABLE DIGITAL DLL

Fig. 10.

639

Complete block diagram of the new digital DLL.

greater detail above. The DLL receives an input clock ExtClk


and passes it through a clock amplifier and splitter to provide
the two complementary input signals (ClkIn and ClkInb) to a
16-stage, 32-tap complementary delay line with EOC detector.
The delay line provides 32 signals at its output taps, which then
feed into two 32 : 1 duty-cycle correcting multiplexers. Each
multiplexer selects one of a pair of phase-adjacent signals
from the delay line. The two selected signals then pass to
a three-stage, 2 : 16 symmetrical phase-blender circuit, which
improves the phase resolution by a factor of eight. A final 16 : 1
duty-cycle correcting multiplexer selects one of the phaseblender output signals and passes it through a clock tree to
provide the DLLs output signal ClkOut. The digital DLL also
includes two independent duty-cycle correction loops as shown
in the figure. By using two separated duty-cycle correcting
loops, duty-cycle correction is distributed throughout the signal
path. This ensures a good duty cycle throughout the signal path
and reduces the duty-cycle correcting requirements of any one
stage.
The DLL uses bang-bang-type, all-digital feedback to lock
the phase of its output signal ClkOut to that of a reference
signal RefClk. A phase detector compares the phase of ClkOut
to RefClk and produces a binary error signal, which passes
through an optional digital filter to a control logic circuit. The
digital filter is a simple majority detector, which has no effect
when the loop is acquiring lock but reduces dithering once
lock is acquired. The control logic is composed of simple
combinational logic and counters that drive the multiplexers
to select the two phase-adjacent coarse phase signals from the
delay line and the fine phase signal from the phase blender
that minimize the phase error between ClkOut and RefClk.
Because the phase information is stored in this DLL as a
digital state, the DLL can quickly recover from low-power
modes, requiring only enough time for the signals to propagate

(a)

(b)

Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL
of [6] and on the right side (b) the new digital DLL integrated into identical
interface cells.

through the signal path of the circuit from ExtClk to ClkOut


to provide a phase-locked output signal.
It is important to recognize the role of the EOC detector and
code in this architecture. Because the delay line and blender
are uncontrolled, open-loop circuits, the architecture relies on
the control circuits use of the EOC code to ensure proper
coarse phase selection, small maximum phase step size, and
phase transfer function monotonicity. The EOC code enables
the control logic to determine when to switch between the true
and complement taps of the delay line to ensure that phaseadjacent taps are always selected by the coarse multiplexers
for the phase blender. The EOC code also enables the control
logic to determine which set of blender taps provides evenly
spaced output signals.

640

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

(a)
Fig. 12.

(b)

Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.

V. MEASURED PERFORMANCE
A. Test Chip
Both the digital DLL presented here and an implementation
of the analog DLL of Donnelly et al. [6] were integrated into
identical high-speed CMOS interface cells on opposite sides
of a single test chip. A micrograph of this test chip is shown in
Fig. 11. The test chip I/O was laid out symmetrically so that
either interface cell could be tested on the same hardware by
simply removing the test chip from the test socket, rotating
it 180 and reinserting it into the socket. This allowed a
true side-by-side comparison of the two DLLs operating in a
system. The test-chip circuits were fabricated using a standard
0.4- m, 3.3-V CMOS process with 0.65-V threshold voltages.
B. Test Results
Unless indicated otherwise, all test results described in this
section were measured with the analog and digital DLLs
operating in their respective high-speed interface cells at 3.3 V
and 400 MHz (800 Mb/s/pin) using the same test vectors.
Additionally, the test chip included noise-generator circuits,
which produced digital switching noise during the testing of
both interfaces.
Fig. 12(a) and (b) shows eye diagrams of the two interfaces
with the analog and digital DLLs, respectively. The diagrams
indicate the output timing performance of the interface cells
in the test system. Although the interface with the analog
DLL provided slightly better timing performance, 320 ps pp
versus 380 ps pp for the interface with the digital DLL, the
performances of both interfaces (and therefore, both DLLs)
were comparable. This is surprisingly good considering the
extensive use of poor PSRR elements, such as inverters, in

the signal path of the digital DLL. (Note: I/O circuit dutycycle distortion produced the unequal eyes in both diagrams.
This is unrelated to the DLLs.)
Fig. 13(a) and (b) shows receive shmoo diagrams for the
two interfaces with the analog and digital DLLs, respectively.
The diagrams indicate the CMOS interfaces valid timing windows for receiving data. On the diagrams, the -axis is supply
4.0 V) while the -axis indicates input
voltage (2.5 V
Mb/s
ns).
data positioning along a bit period (
The normal data position is in the center of the bit period. A
black dot in the diagram indicates incorrectly received data for
Ideally, the window
that combination of bit position and
should be entirely white, but realistically, it is limited by jitter
from the DLL and other sources. Therefore, this test measures
the amount of tolerable skew on the input timing over a range
of supply voltages. Although the interface with the analog DLL
delivers better timing performance than the interface with the
digital DLL (1.02 versus 0.92 ns), both meet the component
specification of 0.85 ns.
Fig. 14 is a circle plot of the measured phase of the DLLs
output signal ClkOut, illustrating the DLLs ability to provide
infinite phase range. The -axis indicates delay [or phase, as in
(1)] of the ClkOut signal relative to a fixed 400-MHz signal.
The -axis indicates cycle count. These data were measured by
probing the on-chip DLL output signal (ClkOut) and forcing
the DLLs phase-detector output low. This caused the DLLs
output phase to continually advance over time. The term circle
plot is used because this diagram is equivalent to sweeping a
phasor that represents the phase of ClkOut around the phase
plane, thereby drawing a circle in the phase plane. Because
the phase of ClkOut is measured relative to a fixed 400-MHz
ns
signal, the plotted delay appears modulo 2.5 ns, where

GARLEPP et al.: PORTABLE DIGITAL DLL

641

(a)

(b)

Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6]
and (b) the new digital DLL.

Fig. 14.

Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.

at 400 MHz. The absolute value of delay (i.e., from 3.4


to 5.9 ns) is irrelevant since it includes some test-system setup
time. The data were measured and plotted using a time-interval
analyzer.
The circle plot illustrates the DLLs phase transfer function,
showing its reasonably good linearity, monotonicity, and lack
of discontinuities. The small bumps in the transfer function
indicate a change in coarse reference phase selected from

the delay line. The slope of the transfer function depends on


PVT conditions and system frequency, since these conditions
determine how many delay-line taps are required to provide
180 of phase. In this case, nine taps were required, resulting
in an average phase step size of 20 ps or 2.9
Table I presents a summary of many of the measured and
simulated results of the analog and digital DLLs operating in
their respective CMOS interfaces. Although the analog DLL

642

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

(a)
Fig. 15.

(b)

Measured DLL power consumption (a) as a function frequency for

TABLE I
ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V

AND

400 MHz

uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller
maximum phase step), both DLLs enable the interface cells to
meet the component requirements when operating in the test
system. Additionally, the digital DLL has a higher maximum
operating frequency, works at lower supply voltages, and
requires much less effort to port to other processes (one versus
four man-months).
Fig. 15(a) and (b) shows plots of measured DLL power
V and measured DLL power
versus frequency at
MHz, respectively. Although
versus voltage supply at
both plots show that the digital DLL dissipated more power
than the analog DLL for all measured conditions, the plots illustrate the different characteristics of the power consumed by
the two DLLs. As mentioned earlier, the power of both DLLs
is distributed between IV power in the constant-current stages
and CV f power in the CMOS stages. The curves in Fig. 15(a)
show that the digital DLLs power dissipation has a greater
dependence on frequency than does the analog DLLs power.
The curves in Fig. 15(b) show that the digital DLLs power
dissipation has a predominantly square-law dependence on
supply voltage, whereas the analog DLLs power dissipation
has a mixed square-law and linear dependence. These trends
confirm that the power of the analog DLL has a relatively
higher IV term, whereas the power of the digital DLL has a

VDD = 3:3 V and (b) as a function supply voltage for f = 400 MHz.
relatively higher CV f term. This indicates that digital DLLs
have the potential for providing better power scaling than
analog DLLs as supply voltages decrease in the future.
Finally, we have shown in Table I and in Fig. 15(b) that
the digital DLL operates at lower supply voltages than the
analog DLL. Although the operation of the digital DLL was
limited to 1.7 V, this limitation was due to our use of several
analog elements in the digital DLL (i.e., it was a mostly digital
DLL). The digital DLL used an analog clock amplifier, two
analog duty-cycle error detectors (see Fig. 10), and an analog
quadrature phase detector (in a second loop, not shown). Using
an analog design for these circuit blocks in the digital DLL
was faster to implement without preventing evaluation of the
key digital blocks in the DLL, but their use determined the
minimum supply voltage of the digital DLL.
VI. CONCLUSION
We have described the architecture of a portable digital
DLL and demonstrated that it provides jitter performance
comparable to an analog DLL when fabricated in the same
3.3-V, 0.4- m standard CMOS process. Several circuits were
developed to enable the DLL to provide very fine phase
resolution, infinite phase range, and good duty-cycle performance throughout the signal path. Despite its relatively simple
architecture, the digital DLL meets all system specifications,
and it operates down to lower supply voltages than its analog
counterpart. Utilizing essentially only simple digital CMOS
gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an
alternative to analog DLLs for clock alignment applications.
ACKNOWLEDGMENT
The authors thank J. McBride and P. Gordon for layout
support and S. Sidiropoulos for helpful insights.
REFERENCES
[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, Multifrequency
zero-jitter delay-locked loop, IEEE J. Solid-State Circuits, vol. 29, pp.
6770, Jan. 1994.

GARLEPP et al.: PORTABLE DIGITAL DLL

[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo,
Skew minimization techniques for 256 Mb synchronous DRAM and
beyond, in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192193.
[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H.
Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura,
K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T.
Anezaki, M. Hasegawa, and M. Taguchi, A 256 Mb SDRAM using
register-controlled digital DLL, in ISSCC 1997 Dig. Tech. Papers, Feb.
1997, pp. 7273.
[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, A
2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,
IEEE J. Solid-State Circuits, vol. 29, pp. 14911496, Dec. 1994.
[5] S. Sidiropoulos and M. Horowitz, A semidigital dual delay-locked
loop, IEEE J. Solid-State Circuits, vol. 32, pp. 16831692, Nov. 1997.
[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P.
Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M.
Johnson, A 660MB/s interface megacell portable circuit in 0.3 m0.7
m CMOS ASIC, IEEE J. Solid-State Circuits, vol. 31, pp. 19952003,
Dec. 1996.
[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase,
T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin,
M. Horowitz, T. Lee, and V. Lee, A 500-Megabyte/s data-rate 4.5M
DRAM, IEEE J. Solid-State Circuits, vol. 28, pp. 490508, Apr. 1993.
[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H.
Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T.
Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M.
Horiguchi, and Y. Nakagome, A 256 Mb SDRAM with subthreshold
leakage current suppression, in ISSCC 1998 Dig. Tech. Papers, Feb.
1998, pp. 8081.
[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara,
T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan,
M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi,
T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose,
T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, A 2.5
ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror
delay, ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374375.
[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran,
C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, A portable
digital DLL architecture for CMOS interface circuits, in VLSI Circuits
Dig. Tech. Papers, June 1998, pp. 214215.
[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G.
Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee,
V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera,
T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, A process
independent 800 MB/s DRAM bytewide interface featuring command
interleaving and concurrent memory operation, in ISSCC 1998 Dig.
Tech. Papers, Feb. 1998, pp. 156157.
[12] S. Sidiropoulos, High-performance interchip signalling, Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from
http://elib.stanford.edu/.
[13] I. Young, M. Mar, and B. Bhushan, A 0.35 m CMOS 3-880 MHz
PLL N/2 multiplier and distribution network with low jitter for microprocessors, in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330331.
[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, A 320 MHz,
1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,
in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132133.
[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, A 600
MHz CMOS PLL microprocessor clock generator with a 1.2 GHz
VCO, in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396397.

643

Kevin S. Donnelly (A93) was born in Los Angeles,


CA, in 1961. He received the B.S. degree in electrical engineering and computer science from the
University of California, Berkeley, in 1985 and the
M.S. degree in electrical engineering from San Jose
State University, San Jose, CA, in 1992.
He was with Memorex, Sipex, and National Semiconductor, specializing in bipolar and BiCMOS
analog circuits for disk-drive read/write and servo
channels. In 1992, he joined Rambus, Inc., Mountain View, CA, where he has designed high-speed
CMOS PLL circuits for clock recovery and data synchronization, and highspeed I/O circuits. He currently manages a group developing I/O circuits
and PLLs. His interests include PLLs and DLLs, I/O circuits, and data
converters. He is a Member of the ISSCC Digital Subcommittee. He has
received several circuit design patents.
Mr. Donnelly is a coauthor of the paper that won the Best Paper Award
at the 1994 ISSCC.

Jun Kim was born in Tokyo, Japan, on November


14, 1966. He received the B.S.E.E. degree from the
University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Vitelic, Inc.,
where he worked on SRAM and DRAM development. Between 1991 and 1994, he was with Sun
Microsystems, where he was involved in microprocessor and digital circuit design. Since 1994, he
has been with Rambus, Inc., Mountain View, CA,
as a Designer of high-speed CMOS I/O and DLL
circuits.

Pak S. Chau was born in Hong Kong in 1966.


He received the B.S. degree in computer system
engineering from the University of Massachusetts,
Amherst, in 1989 and the M.S. degree in electrical engineering from the University of California,
Davis, in 1991.
He was with National Semiconductor and Chrontel, Inc., where he worked as an Analog Circuit
Designer. In 1994, he joined Rambus, Inc., Mountain View, CA, where he has engaged in designing
high-speed I/O and DLL circuits.

Jared L. Zerbe was born in New York, NY, in


1965. He received the B.S. degree in electrical engineering from Stanford University, Stanford, CA,
in 1987.
He joined VLSI Technology, Inc., in 1987, where
he worked on semicustom ASIC design. In 1989, he
joined MIPS Computer Systems, where he designed
high-performance floating-point blocks. Since 1992,
he has been with Rambus Inc., Mountain View, CA,
where he has specialized in the design of highspeed I/O and PLL/DLL clock recovery and data
synchronization circuits.

Bruno W. Garlepp was born in Bahia, Brazil, on


October 29, 1970. He received the B.S.E.E. degree
from the University of California, Los Angeles,
in 1993 and the M.S.E.E. degree from Stanford
University, Stanford, CA, in 1995.
In 1993, he joined the Hughes Aircraft Advanced
Circuits Technology Center, Torrance, CA. There,
he designed high-precision analog integrated circuits
for A/D applications, as well as CMOS, bipolar,
and SiGe RF circuits for wide-band communications applications. In 1996, he joined Rambus, Inc.,
Mountain View, CA, where he designs and develops high-speed CMOS
clocking and I/O circuits for synchronous chip-to-chip communication.

Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou,
China, in 1982 and the M.S. degree in electrical
engineering from the University of Arkansas, Fayetteville, in 1990.
He was with ULSI and SGI, working in the area
of PLL and cache circuit design. He joined Rambus,
Inc., Mountain View, CA, in 1994, where he has
being engaged in high-speed CMOS DLL and I/O
circuit design.

644

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 5, MAY 1999

Chanh V. Tran was born in Vietnam in 1964. He


received the B.S. degree in electrical engineering
and computer science form the University of California, Berkeley, in 1989.
From 1989 to 1992, he was with National Semiconductor Corp., Santa Clara, CA, where he worked
on CMOS mixed-signal IC design in the Data
Acquisition Group. In 1992, he joined Rambus Inc.,
Mountain View, CA, where he has been involved in
DLL and high-speed I/O design.

Clemenz L. Portmann (S92M95) received the


B.S.E.E. degree from the University of Washington,
Seattle, in 1986, the M.S.E.E. degree from the
University of Hawaii at Manoa, Honolulu, in 1988,
and the Ph.D. degree in electrical engineering from
Stanford University, Stanford, CA, in 1995.
From 1988 to 1989, he was a Visiting Researcher
at Nagoya University, Nagoya, Japan, and the Toyohashi University of Technology, Toyohashi, Japan,
under the Monbusho (Ministry of Education) scholarship program. From 1989 to 1990, he was a
Design Engineer for VLSI Technology, Inc., San Jose, CA, where he designed
standard cell libraries and SRAMs for ASIC designs. In 1995, he joined
Rambus, Inc., Mountain View, CA, where he is engaged in the design of
high-speed I/O circuits and DLLs for DRAM interfaces.

Yiu-Fai Chan (S76M78) received the B.S. and


M.S. degrees in electrical engineering and computer
science (with highest honors) from the University
of California (UC), Berkeley, in 1972 and 1973,
respectively.
He joined Rambus, Inc., Mountain View, CA, in
1992, where he is Director of Engineering, responsible for the development, application engineering,
and customer support of high-speed mixed-signal
circuits, device packaging, signal integrity, and system engineering. Prior to that, he was with Tera
Microsystems in charge of developing chips for workstations based on the
Sparc architecture. He was with Altera Corp. from 1983 to 1990, where he
led a team of engineers to develop the industrys first CMOS programmable
logic devices. From 1976 to 1983, he held various technical and management
positions at Intersil, Inc. (later a division of General Electric), where he was
engaged in the development of various CMOS memories, microprocessors,
and peripheral devices. It was there that he developed the first EPROM devices
in CMOS technology. From 1974 to 1976, he designed calculator and TV
game integrated circuits at National Semiconductor. He has received several
patents in circuits and systems technologies.
Mr. Chan is a member of Tau Beta Pi, Phi Beta Kappa, and Eta Kappa
Nu. He received the University Science Fellowship from UC Berkeley and
conducted research on solid-state devices and microwave acoustics. He has
published in various IEEE technical publications and presented papers at IEEE
technical conferences.

Thomas H. Lee (S87M87), for a photograph and biography, see this issue,
p. 585.
Donald Stark received the B.S. degree from the
Massachusetts Institute of Technology, Cambridge,
in 1985 and the M.S. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1987 and
1991, respectively, all in electrical engineering.
His research interests at Stanford included circuit
design and CAD tools for analysis of voltage and
current distributions in VLSI circuits. From 1987
to 1991, he was also a Member of the Western
Research Laboratory, Digital Equipment Corp., Palo
Alto, CA, working on CAD development and ECL
circuit design. From 1991 to 1993, he was with the Semiconductor Device
Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM
design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he
currently works on DRAM, high-speed I/O design, and CAD.

Mark A. Horowitz, for a photograph and biography, see p. 528 of the April
1999 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

565

A Register-Controlled Symmetrical DLL for Double-Data-Rate DRAM


Feng Lin, Jason Miller, Aaron Schoenfeld, Manny Ma, and R. Jacob Baker

Abstract This paper describes a register-controlled symmetrical delay-locked loop (RSDLL) for use in a high-frequency
double-data-rate DRAM. The RSDLL inserts an optimum delay
between the clock input buffer and the clock output buffer,
making the DRAM output data change simultaneously with the
rising or falling edges of the input clock. This RSDLL is shown to
be insensitive to variations in temperature, power-supply voltage,
and process after being fabricated in 0.21-m CMOS technology.
The measured rms jitter is below 50 ps when the operating
frequency is in the range of 125250 MHz.
Index TermsDelay-locked loops, double-data rate, DRAM.

Fig. 1. Data timing chart for DDR DRAM.

I. INTRODUCTION

N synchronous DRAM, the output data strobe (DQS)


should be locked to the data outputs (DQ outputs) for
high-speed performance. The clock-access and output-hold
times of conventional DRAM designs are determined by the
delay time of the internal circuits such as the clock input
and output buffers. Variations in temperature and process
shifts will change the access time and make the valid data
window small. To optimize and stabilize the clock-access
and output-hold times, an internal register-controlled delaylocked loop (RDLL) [1], [2] has been used to adjust the
time difference between the output and input clock signals in
SDRAM. Since the RDLL is an all-digital design, it provides
robust operation over all process corners. Another solution to
the timing constraints found in SDRAM was given in [3] with
the synchronous mirror delay (SMD). Compared to RDLL,
SMD does not provide as tight of locking but has the advantage
that the time to acquire lock between the input and output
clocks is only two clock cycles. As the clock speeds used in
DRAM continue to increase, the skew becomes the dominating
concern, outweighing the disadvantage of the added time to
acquire lock needed in an RDLL.
This paper describes a modified register-controlled symmetrical delay-locked loop (RSDLL) used to meet the requirements of double-data-rate (DDR) SDRAM (read/write
accesses occur on both rising and falling edges of the clock).
Here, symmetrical means that the delay line used in the
DLL has the same delay whether a high-to-low or a low-tohigh logic signal is propagating along the line. The data output
timing diagram of a DDR SDRAM is shown in Fig. 1. The
RSDLL is used to increase the valid output data window and
Manuscript received September 3, 1998; revised November 2, 1998. This
work was supported by Micron Technology, Inc.
F. Lin and R. J. Baker are with the Microelectronics Research Center,
University of Idaho, Boise, ID 83712 USA (e-mail: danlin@uidaho.edu).
J. Miller, A. Schoenfeld, and M. Ma are with Micron Technology, Inc.,
Boise, ID 83707-0006 USA.
Publisher Item Identifier S 0018-9200(99)02438-5.

diminish the undefined


by synchronizing both rising
and falling edges of the DQS signal with the output data DQ.
The target specifications for the DLL described in this paper
are:
1) robust operation eliminating the need for postproduction
tuning (something required in an analog implementation);
2) operating frequency ranging from 143 (286 Mb/s/pin) to
250 MHz (500 Mb/s/pin);
3) tight synchronization (skew less than 5% of the cycle
time) between the output clock and data on both rising
and falling edges of the output clock;
4) low skew between the input and output clocks (with low,
5% duty cycle distortion);
5) power-supply-voltage operating range from 2.5 to 3.5 V;
6) portability for ease of use in other processes.
II. RSDLL ARCHITECTURE
Fig. 2 shows the block diagram of the RSDLL. The replica
input buffer dummy delay in the feedback path is used to
match the delay of the input clock buffer. The phase detector
(PD) is used to compare the relative timing of the edges of
the input clock signal and the feedback clock signal, which
comes through the delay line, controlled by the shift register.
The outputs of the PD, shift-right and shift-left, are used to
control the shift register. In the simplest case, one bit of the
shift register is high. This single bit is used to select a point
of entry for CLKIn in the symmetrical delay line (more on
this later). When the rising edge of the input clock is within
the rising edges of the output clock and one unit delay of the
output clock, both outputs of the PD, shift-right and shift-left,
go to logic LOW and the loop is locked. The basic operation
of the PD is shown in Fig. 3. The resolution of this RSDLL
is determined by the size of a unit delay used in the delay

00189200/99$10.00 1999 IEEE

566

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 4. Symmetrical delay element used in RSDLL.


Fig. 2. Block diagram of RSDLL.

Fig. 5. Delay line and shift register for RSDLL.

switches from LOW to HIGH. An added benefit of the twoNAND delay element is that two point-of-entry control signals
are now available. Both are used by the shift register to solve
the possible problem caused by the power-up ambiguity in the
shift register.
Fig. 3. Phase detector used in RSDLL.

B. Control Mechanism of the Shift Register


line. The locking range is determined by the number of delay
stages used in the symmetrical delay line. Since the DLL
circuit inserts an optimum delay time between CLKIn and
CLKOut, making the output clock change simultaneously with
the next rising edge of the input clock, the minimum operating
frequency to which the RSDLL can lock is the reciprocal
of the product of the number of stages in the symmetrical
delay line with the delay per stage. Adding more delay stages
will increase the locking range of the RSDLL at the cost of
increased layout area.
III. CIRCUIT IMPLEMENTATION
A. Basic Delay Element
Instead of using an AND gate as the unit-delay stage (NAND
inverter), as was done in [1], we used a NAND-gate-based
delay element. The implementation of a three-stage delay line
is shown in Fig. 4. The problem when using a NAND inverter
as the basic delay element is that the propagation delay through
the unit delay resulting from a HIGH-to-LOW transition is
not equal to the delay of a LOW-to-HIGH transition (
). Further, this delay varies from one run to another. If the
and
is 50 ps, for example, the total
skew between
skew of the falling edges through ten stages will be 0.5 ns.
inverter delay element
Because of this skew, the NAND
cannot be used in a DDR DRAM. In our modified symmetrical
delay element, another NAND gate is used instead of an inverter
(two NAND gates per delay stage). This scheme guarantees
independent of process variations, since
that
while one NAND switches from a HIGH to LOW, the other

As shown in Figs. 4 and 5, the input clock is a common


input to every delay stage. The shift register is used to
select a different tap of the delay line (the point of entry
for the input clock signal into the symmetrical delay line).
The complementary outputs of each register cell are used to
is connected directly to the input
select the different tap:
of a delay element, and
is connected to the previous
stage of input . From right to left, the first LOW-to-HIGH
transition in the shift register sets the point of entry into the
delay line. The input clock will pass through the tap with
a high logic state in the corresponding position of the shift
of this tap is equal to a LOW, it will
register. Since the
disable the previous stages; therefore, it does not matter what
the previous states of the shift register are (shown as dont
cares, , in Fig. 5). This control mechanism guarantees that
only one path is selected. This scheme also eliminates powerup concerns since the selected tap is simply the first, from the
right, LOWHIGH transition in the shift register.
C. Phase Detector
To stabilize the movement in the shift register, after making
a decision, the phase detector will wait at least two clock cycles
before making another decision (Fig. 3). A divide by two was
included in the phase detector so that every other decision,
resulting from comparing the rising edges of the external clock
and the feedback clock, was used. This will provide enough
time for the shift register to operate and the output waveform
to stabilize before another decision by the PD is implemented.
The unwanted side effect of this delay is an increase in the
lock time. The shift register is clocked by combining the

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

567

Fig. 6. Measured rms jitter versus input frequency.

Fig. 7. Measured delay per stage versus VCC and temperature.

shift-left and shift-right signals. The power consumption will


decrease when there are no shift-left or -right signals and the
loop is locked. Another concern with the phase-detector design
is the design of the flip-flops (FFs). To minimize the static
phase error, very fast FFs should be used, ideally with zero
setup time. Also, the metastability of the flip-flops becomes
a concern as the loop becomes locked. This together with
possible noise contributions and the need to wait, as discussed
above, before implementing a shift-right or -left may increase
the desirability of adding additional filtering in the phase
detector. Some possibilities include increasing the divider ratio
used in the phase detector or using a shift register in the phase
detector to determine when a numbersay, fourshift-rights
or -lefts have occurred. For the present design, we were forced
to use a divide by two in the phase detector because of lock
time requirements.

IV. EXPERIMENTAL RESULTS


The RSDLL was fabricated in a 0.21- m, four-poly, doublemetal CMOS technology (a DRAM process). We used a 48stage delay line with an operation frequency of 125250 MHz.
The maximum operating frequency was limited by delays
external to the DLL such as the input buffer and interconnect.
There was no noticeable static phase error on either rising
or falling edges. Fig. 6 shows the resulting rms jitter versus
input frequency. One sigma of jitter over the 125250-MHz
frequency range was below 50 ps. The peak-to-peak jitter over
this frequency range was below 100 ps. The measured delay
per stage versus VCC and temperature is shown in Fig. 7. Note
that the 150-ps typical delay of a unit-delay element was very
close to the rise and fall times on-chip of the clock signals and
represents a practical minimum resolution of a DLL for use in
a DDR DRAM fabricated in a 0.21- m process. The power

568

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 4, APRIL 1999

Fig. 8. Measured ICC (DLL current consumption) versus input frequency.

consumption (current draw of the DLL when VCC


V) of
the prototype RSDLL is illustrated in Fig. 8. We found that the
power consumption was mainly determined by the dynamic
power dissipation of the symmetrical delay line. Our NAND
delays in this test chip were implemented with 10/0.21- m
NMOS and 20/0.21- m PMOS. By reducing the widths of
both the NMOS and PMOS transistors, the power dissipation
can be greatly reduced without a speed or resolution penalty
(with the added benefit of reduced layout size).
V. CONCLUSIONS
The concept of a register-controlled symmetrical delaylocked loop has been presented. The modified symmetrical
delay element makes the RSDLL useful in DDR DRAMs.
Experimental results verify that this RSDLL is stable against
temperature, process, and power-supply variations.
Further development of the RSDLL will include investigations into reducing power consumption, implementing phaselocked loops where the symmetrical delay is used as part of a
purely digital registered-controlled oscillator, and developing

two-loop architectures where coarse loops (resolutions on the


order of 100 ps) are used with fine loops (resolutions on the
order of 10 ps [2]) for wide tuning range and small static
phase errors.
REFERENCES
[1] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H.
Tsuboi, S.-Y. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K.
Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K.
Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, A 256-Mb
SDRAM using a register-controlled digital DLL, IEEE J. Solid-State
Circuits, vol. 32, pp. 17281732, Nov. 1997.
[2] S. Eto, M. Matsumiya, M. Takita, Y. Ishii, T. Nakamura, K. Kawabata,
H. Kano, A. Kitamoto, T. Ikeda, T. Koga, M. Higashiro, Y. Serizawa,
K. Itabashi, O. Tsuboi, Y. Yokoyama, and M. Taguchi, A 1Gb SDRAM
with ground level precharged bitline and non-boosted 2.1V word line,
in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 8283.
[3] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara,
T. Matano, Y. Hoshino, K. Miyano, S. Isa, S. Nakazawa, E. Kakehashi,
J. M. Drynan, M. Komuro, T. Fukase, H. Iwasaki, M. Takenaka, J.
Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S.
Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, S. Kikuchi, K.
Koyama, Y. Fukuzo, and T. Okuda, A 2.5-ns clock access 250-MHz,
256-Mb SDRAM with synchronous mirror delay, IEEE J. Solid-State
Circuits, vol. 31, pp. 16561665, Nov. 1996.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

1683

A Semidigital Dual Delay-Locked Loop


Stefanos Sidiropoulos, Student Member, IEEE, and Mark A. Horowitz, Senior Member, IEEE

AbstractThis paper describes a dual delay-locked loop architecture which achieves low jitter, unlimited (modulo 2 ) phase
shift, and large operating range. The architecture employs a core
loop to generate coarsely spaced clocks, which are then used by
a peripheral loop to generate the main system clock through
phase interpolation. The design of an experimental prototype in
a 0.8-m CMOS technology is described. The prototype achieves
an operating range of 80 kHz400 MHz. At 250 MHz, its peakto-peak jitter with quiescent supply is 68 ps, and its jitter supply
sensitivity is 0.4 ps/mV.
Index TermsClock synchronization, delay-locked loops, phase
interpolation, phase-locked loops.

I. INTRODUCTION

HASE-LOCKED loops (PLLs) and delay-locked loops


(DLLs) are routinely employed in microprocessor and
memory ICs in order to cancel the on-chip clock amplification
and buffering delays and improve the I/O timing margins.
However, the increasing clock speeds and integration levels
of digital circuits create a hostile operating environment for
these phase alignment circuits. The supply and substrate noise
resulting from the switching of digital circuits affects the PLL
or DLL operation and results in output clock jitter which
subtracts from the I/O timing margins.
In applications where no clock synthesis is required, DLLs
offer an attractive alternative to PLLs due to their better
jitter performance, inherent stability, and simpler design. The
main disadvantage of conventional DLLs, however, is their
limited phase capture range. This paper presents a dual DLL
architecture which combines several techniques to achieve
unlimited phase capture range, low jitter and static-phase error,
and four orders of magnitude operating frequency range. This
architecture is based on a cascade of two loops. The core
loop generates six clocks evenly spaced by 30 which are
then used by the peripheral loop to generate the output clock,
under the control of a digital finite state machine (FSM). By
using phase interpolation, the dual loop can provide unlimited
phase shift without the use of a voltage controlled oscillator
(VCO). Using an FSM for phase control offers the advantage
of enabling the flexible implementation of complicated phase
capture algorithms in the digital domain. Finally, by utilizing
self-biased techniques, the loop achieves large operating range
and low jitter.
This paper begins with a brief overview of conventional
DLL design. After outlining some of the disadvantages of
Manuscript received April 10, 1997; revised June 5, 1997. This work was
supported by ARPA under contract DABT63-94-C-0054.
The authors are with the Computer Systems Laboratory, Stanford University, Stanford, CA 94305 USA and with Rambus Inc., Mountain View, CA
94040 USA.
Publisher Item Identifier S 0018-9200(97)08033-5.

Fig. 1. Block diagram of a conventional DLL.

conventional approaches, Section II presents the dual interpolating DLL architecture. Section III discusses circuit design
issues that arose in the prototype implementation of the architecture in a 0.8- m CMOS technology. Section IV discusses
the experimental results, and concluding remarks follow in
Section V.
II. ARCHITECTURE
A. Conventional DLLs
A simplified block diagram of a conventional DLL [1] is
outlined in Fig. 1. The components are a voltage controlled
delay line (VCDL), a phase detector, a charge pump, and a
first-order loop filter. The input reference clock drives the
delay line which comprises a number of cascaded variable
delay buffers. The output clock clk drives the loop phase
detector (depicted in this example as a conventional flip-flop).
The output of the phase detector is integrated by the charge
pump and the loop filter capacitor to generate the loop control
voltage
. The loop negative feedback drives the control
voltage to a value that forces a zero phase error between the
output clock and the reference clock.
This simple design offers many advantages compared to
VCO-based PLLs. Due to frequency acquisition constraints,
PLLs usually resort to a specific type of phase detector, the
state-machine-based phase frequency detector (PFD). In contrast, DLLs can be easily implemented by using bangbang
controli.e., the control signal of the loop, rather than being
proportional to the phase error magnitude, can simply be a
binary up or down indication. Thus, in a bangbang
DLL the phase detector can be a replica of the input data
receiver resulting in an optimal placement of the sampling
clock in the center of the input receivers sampling uncertainty
window. Additionally, since DLLs do not use a VCO, phase
errors induced by supply or substrate noise do not accumulate
over many clock cycles. This improved noise immunity is

00189200/97$10.00 1997 IEEE

1684

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Fig. 2. Dual interpolating DLL architecture.

the main reason for the increased adoption of DLLs in


applications that do not require clock synthesis.
The conventional DLL architecture of Fig. 1 suffers from
two important disadvantages: clock jitter propagation and
limited phase capture range. Since the VCDL simply delays
the reference clock by a single clock cycle, the reference
clock jitter directly propagates to the output clock. This allpass filter behavior with respect to the frequency of the
jitter of the reference clock results in reduced I/O timing
margins, especially in source-synchronous interfaces where
the reference clock emanates from another noisy digital chip.
To overcome this problem, a separate low-jitter differential
clock can be used as the input to the delay line. This way the
on-chip common-mode noise and the reference clock jitter do
not affect the I/O timing margins.
A more important problem is that a VCDL does not have
the cycle slipping capability of a VCO. Therefore, at a given
operating clock frequency, the DLL can delay its input clock
by an amount bounded by a minimum and a maximum
delay. As a consequence, extra care must be taken by the
designer so that the loop will not enter in a state in which
it tries to lock toward a delay which is outside these two
limits. A compromising solution is to extend the VCDL range
and use an FSM that controls the loop start-up. However,
DLLs relying on quadrature phase mixing [2], [3] completely
eliminate this problem. This approach is based on the fact that
quadrature clocks can be easily generated, given a clock of
the correct frequency. The quadrature clocks are then fed to a
phase mixer which can produce a clock whose phase can span
the complete 0360 phase interval. This approach eliminates
the limited phase range problem of conventional DLLs since
it can essentially rotate the output clock phase infinite times
providing seamless switching at the quadrant boundaries. The
main disadvantage of quadrature mixing is that the output of

the phase mixer is a clock with a slew rate inherently limited


by
, where
is the output swing of the phase
mixer and the period of the clock. This slow clock exhibits
increased dynamic noise sensitivity, thus degrading the jitter
performance of quadrature mixing DLLs.
The approach presented here overcomes this limitation of
quadrature mixing DLLs since it generates the output clock
by interpolating between smaller 30 phase intervals [5].
Simultaneously, by avoiding the use of a VCO it eliminates the
phase error accumulation problem of similar approaches [4].
B. Dual Interpolating DLL
Fig, 2 shows a high-level block diagram of the proposed
architecture. This architecture is based on cascading two loops.
A conventional first-order core DLL is locked at 180 phase
shift. Assuming that the delay line of the core DLL comprises
six buffers, their outputs are six clocks which are evenly
spaced by 30 . The peripheral digital loop selects a pair of
clocks,
and , to interpolate between. Clocks
and
can be potentially inverted in order to cover the full 0360
phase range. The resulting clocks,
and , drive a digitally
controlled interpolator which generates the main clock . The
phase of this clock can be any of the
quantized phase steps
and , where
is the
between the phases of clocks
interpolation controlling word range.
The output clock
of the interpolator drives the phase
detector which compares it to the reference clock. The output
of the phase detector is used by the FSM to control the phase
selection, the selective phase inversion, and the interpolator
phase mixing weight. The FSM moves the phase of the clock
according to the phase detector output. In the more common
case this means just changing the interpolation mixing weight
by one. If, however, the interpolator controlling word has
reached its minimum or maximum limit, the FSM must change

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP

the phase of clock or


to the next appropriate selection.
This phase selection change might also involve an inversion
of the corresponding clock if the current interpolation interval
is adjacent to the 0 or 180 boundary. Since these phase
selection changes happen only when the corresponding phase
mixing weight is zero, no glitches occur on the output clock.
The digital bangbang nature of the control results in dithering around the zero phase error point in the lock condition. The
dither amplitude is determined by the interpolator phase step
and the delay through the peripheral loop.
In this architecture the output clock phase can be rotated, so
no hard limits exist in the loop phase capture range: the loop
provides unlimited (modulo 2 ) phase shift capability. This
property eliminates boundary conditions and phase relationship constraints, common in conventional DLLs. The only
requirement is that the DLL input clock and the reference
clock are plesiochronous (i.e., their frequency difference is
bounded), making this architecture suitable for clock recovery
applications. Since the system does not use a VCO, it does
not suffer from the phase error accumulation problem of
conventional PLLs. Moreover, the input clocks of the phase
interpolator are spaced by just 30 , so the output of the
phase interpolator does not exhibit the noise sensitivity of
the quadrature mixing approach. Finally, the fact that the
capture algorithm can be completely implemented in the
digital domain gives great flexibility in its implementation
as will be discussed in Section III. Although the prototype
described in this paper is implemented with an analog core
loop, possible implementations of the architecture can use
digital control in both loops, further enhancing the system
versatility. Moreover, the architecture can be easily extended
to use a clock recirculating scheme in the core loop, so that
the output clock frequency is a multiple of the input clock [7].
C. Dual-Loop Dynamics
Cascading two loops can compromise the overall system
stability and lead to undesired jitter peaking effects. However,
as the analysis in this section will show, this dual-loop
architecture does not exhibit any jitter peaking irrespective
of the dynamics of the two loops. The behavior of the DLL
can be analyzed with respect to two types of perturbations:
i) input or reference clock delay variations and ii) delay
variations resulting from supply and substrate noise. The
frequency response of the dual loop can be analyzed by making
a continuous time approximation, in which the sampling
operation of the phase detectors and the digital nature of the
peripheral loop are ignored. This approximation is valid for
core and peripheral loop bandwidths at least a decade below
the operating frequency. This constraint needs to be satisfied
anyway in a DLL in order to eliminate the effects of higher
order poles resulting from the delays around loop.
Fig. 3 shows the dual loop linearized model including
both the loop clocks
and
, and delay errors
introduced by supply or substrate noise
. Each of the two
loops is modeled as a single pole system, in which the input,
output, and error variables are delays, similar to the single-loop
analysis published in [7]. For example, the output delay of the

1685

Fig. 3. Linearized dual DLL model.

core loop
(in seconds) is the delay established by the
core loop delay line, while the input delay
is the delay
for which the core loop phase detector and charge pump do not
generate an error signal. Since the core loop VCDL spans half
a clock cycle,
is equal to half an input clock period. By
using these loop variables, the input-to-output transfer function
of the core loop can be easily derived
(1)
where
(in rads/s) is the pole of the core loop as determined
by the charge pump current, the phase detector and delay line
gain, and the loop filter capacitor. Similarly, the noise-to-delay
error transfer function of the core loop can be shown to be
(2)
where
is the additional delay introduced in the core
loop from supply or substrate noise, and
is the delay
error seen by the core loop phase detector. This transfer
function indicates that noise induced delay errors can be
tracked up to the loop bandwidth and that the response of
the loop to a supply step consists of an initial step followed
by a decaying exponential with a time constant equal to
.
Before proceeding to analyze the response of the dual loop,
it should be noted that the linearized model of Fig. 3 uses
a simplifying assumption. The assumption is that the delay
error
introduced by supply or substrate variations is
identical in both loops and does not depend on the state
of the phase selection multiplexers. Since the supply and
substrate sensitivity of the peripheral loop depends on the
phase selection and will be typically higher due to the presence
of the final CMOS system clock buffer, this assumption is
not necessarily accurate. However, it does not affect the
conclusions drawn below about the stability of the loop, since
it only removes a modifying constant, which is equal to the
ratio in the delay sensitivities of the two loops. This constant
only affects the relative location of the poles and zeros of the
resulting transfer function, and, as it will be shown below,
the loop is unconditionally stable irrespective of the relation
between the individual poles and zeros. Using the model of
Fig. 3, it is straightforward to show that the transfer function
of the peripheral loop is identical in form to
that of the core loop. This result agrees with intuition since
in the dual loop system reference clock perturbations do not

1686

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

(a)

(b)

Fig. 5. Dual DLL detailed block diagram.


Fig. 4. Dual loop response to: (a) step change in input clock and (b) supply
noise step.

affect the core loop. More interesting is the transfer function of


the input clock
to dual-loop error
since changes
in the period of the input clock will cause both the core and
peripheral loop to react. Based on (1) and (2), this transfer
function can be shown to be
(3)
This bandpass transfer function exhibits no peaking at any
frequency regardless of the relative magnitudes of
and .
The step response of the system, shown in Fig. 4(a), reveals
that unit-step changes in
(i.e., step changes in the
input clock period) will initially peak at a less than unity
value determined by the ratio of the two poles.1 Moreover, as
the magnitude of
increases, the disturbance on the output
is reduced since the peripheral loop compensates quickly for
disturbances at the output caused by changes of the input clock.
Finally, the transfer function from supply or substrate noiseinduced delay errors
to the delay error of the dual loop
can be derived
(4)
Equation (4) also exhibits no peaking at any frequency since
the location of the last zero can never be above that of
the poles. The step response of the system is plotted in
Fig. 4(b) for various ratios of the core to peripheral pole
frequencies. Under all conditions, the initial delay error is
equal to twice the injected unity error
since this error is
added on both loops. When the peripheral loop bandwidth is
less than half that of the main loop, there is no overshoot
in the dual-loop step response. This result occurs because
the core loop compensates for its delay error quickly, while
the slower peripheral loop compensates for the output delay
1 It should be noted that in case in-CLK and ref-CLK are identical or
correlated, the resulting transfer function exhibits a low-pass peaked behavior.
Nevertheless the resulting peaking is small, exhibiting a maximum of 15%
when pp pc , while it is less than 5% as long as pp and pc are an order of
magnitude apart in frequency.

error later. When the pole frequencies of the two loops are
very close, the system overshoots since the peripheral loop
compensates for the output delay error at approximately the
same rate as the peripheral loop. The worst case overshoot of
approximately 4.5% of the initial disturbance occurs when the
peripheral loop bandwidth is twice that of the core loop. As the
peripheral loop bandwidth increases, the overshoot becomes
progressively smaller since the peripheral loop corrects for
both the peripheral and core delay errors. Subsequently, the
influence of the slower core loop correction on the output
delay error is compensated by the peripheral loop. Therefore,
even in the worst case, the dual loop cascade exhibits only
minor overshoot.
III. CIRCUIT DESIGN
A. Overview
A more detailed block diagram of the dual loop is shown in
Fig. 5. This design uses a separate local differential clock as
the input to the delay line. Although the use of this clock
is not inherent in the loop architecture, it minimizes the
supply sensitivity in applications such as source synchronous
interfaces. To minimize the effects of input clock duty cycle
imperfections and common-mode mismatches, a duty cycle
adjuster (DCA) [2] is employed after the first clock receiving
buffer. The 50% duty cycle clock drives the core DLL. The
core delay line consists of six differential buffers. An extra pair
of buffers B B generate two clocks which drive the core
loop 180 phase detector. The output of the phase detector
controls the charge pump which forces clocks C and C to
be 180 out of phase. Since all the buffers in the core delay line
(including B and B ) have the same size, all the core VCDL
stages have the same fan-out and delay. Therefore, forcing
C and C to be 180 out of phase will generate six evenly
spaced by 30 clocks at the outputs of the core delay line.
The phase selection and phase inversion multiplexers are
differential elements controlled by the core loop control voltage. In order to eliminate jitter-sensitive slow clocks, all
buffers in the clock path need to have approximately the same

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP

1687

Fig. 7. Core loop phase detector.


(a)

(b)
Fig. 6. (a) Core loop delay buffer and (b) charge pump.

bandwidth. For this reason, the phase selection in this design


is implemented as a combination of a 3-to-1 and a 2-to-1
multiplexers, instead of a single 6-to-1 differential multiplexer
with lower total power. Since the phase selection multiplexer
can affect the phase shift of the core delay line through datadependent loading, the six output clocks are buffered before
driving the phase selection multiplexers. This way, changing
the multiplexer select does not affect the core delay line phase
shift.
The outputs
of the phase inversion multiplexer drive
the phase interpolator which generates the low swing differential clock . This clock is then amplified and buffered through
a conventional CMOS inverter chain generating the main clock
(CLK). The peripheral loop phase detector [1] compares that
clock to the reference clock, generating a binary phase error
indication that is then fed to the FSM. The FSM based on the
phase detector (PD) output selects phases
and controls
the phase interpolation.
B. Core Loop
To minimize the jitter supply sensitivity, all the delay buffers
in the design, from the input clock (in-CLK) to the output
of the phase interpolator ( ), use differential elements with
replica feedback biasing [6]. In order to linearize the loop
gain and obtain large operating range, the core loop charge
pump current is scaled along with the VCDL buffer current as
illustrated in Fig. 6 [7]. Voltage
is generated through the
replica-feedback biasing circuit while
is a buffered version
of the charge pump control voltage
. In addition to the core
VCDL buffers, voltages
and
control the differential
buffer elements of the peripheral loop. This ensures that all
the buffers in the design have approximately equal delays and
that the edge rates of the interpolator input clocks (
) scale
with the operating frequency of the loop.

The sensitivity of the dual-loop architecture to the core loop


phase offset depends on the particular application. For the
case that the dual DLL is used to just generate a clock whose
phase is directly controlled by the phase detector output, the
phase offset of the core loop does not affect the system phase
offset. In this case, the loop operation will not be affected as
long as the core loop phase offset is bounded. An absolute
core loop offset less than 30 ensures monotonic switching at
the 0 and 180 interpolation boundaries, so the interpolating
loop functions correctly, albeit with a larger than nominal
interpolation phase step. Core loop phase offsets larger than
this amount will result in a hysteretic locking behavior at the
quadrant boundaries, which will increase the dither jitter if the
reference clock phase forces the dual loop to lock at this point.
The dual-loop operation becomes more sensitive to core
loop phase offsets in case the designer wants to use this
architecture to generate an additional clock that is offset by
90 relative to the reference clock. In such an application, the
quadrature clock would be generated by using an extra pair
of phase selection and inversion multiplexers whose selects
would be offset by three relative to those generating the main
clock. This would create a 90 interpolation interval offset,
resulting in the required quadrature phase shift. In this case
the core loop phase offset would impact the quadrature phase
if the selects of the extra multiplexers happen to wrap around
the 0 or 180 interpolation interval boundaries.
Even though the prototype does not implement quadrature
phase generation, a low offset phase detector and careful
matching of the layout were used to ensure uniform spacing
of the six clocks. A self-biased DLL requires a linear phase
detector. To avoid start-up problems that would result from the
use of a conventional state machine PFD [7], the core loop uses
the phase detector depicted in Fig. 7. This design comprises
an SR latch augmented with two input pulse generators. The
absence of extra state storage in this design eliminates any
start-up false locking conditions. Additionally, its symmetric
structure and the use of pulse triggering minimize the core
loop phase offset.
The core of the phase detector is an SR latch-based phase
detector. The SR latch ensures a 180 phase shift between
the falling edges of its inputs only when the duty cycle of
the two input clocks is identical. However, when the duty
cycle of the two input clocks is different, this mismatch will
propagate as a core loop phase locking offset. This happens
because an unbalanced overlap of the two input clocks causes
the output of the SR latch to have a duty cycle deviating
from 50%. To compensate for this effect, the SR latch is

1688

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Fig. 9. Phase interpolator (type-I) schematic.


Fig. 8. Phase detector and charge pump simulated transfer function.

augmented with two pulse generators which propagate a low


pulse on the positive edges of the input clocks. Since potential
overlaps are minimized, the design can tolerate large duty
cycle imperfections and still provide an accurate 180 lock
in the core loop.
Fig. 8 shows the simulated transfer characteristics of the
phase detector and charge pump over three extreme process
and environment conditions. The cycle time of the two input
clocks is set at 4 ns, while their duty cycles are mismatched
by 0.5 ns such that the duty cycle of
is 37.5% while
the duty cycle of clock
is 62.5%. It can be seen that
the transfer function is linear and has no offset or dead-band
around the 2-ns point where the loop actually locks. However,
the combination of input pulsing and duty cycle imperfections
results in nonlinear transfer function characteristics at the
vicinity of the boundaries of the locking range (i.e., 0 and
4 ns). The only effect of this nonlinearity is that the core
loop can exhibit an initial slew-rate limited reduction of its
phase error, since the output of the phase detector and charge
pump is constant. After the phase error has been reduced,
such that the phase detector operates within its linear region,
the core loop will exhibit a conventional single-pole response.
Harmonic locking problems, common in PLLs using SR
phase detectors, are eliminated in this design since the core
loop is reset to its minimum delay at system start-up.
C. Phase Interpolator Design
The most critical circuit in the design of the peripheral
digital loop is the phase interpolator. The phase interpolator
receives two clocks
and generates the main clock
whose phase is the weighted sum of the two input phases.
Essentially, the phase interpolator converts a digital weight
code generated from the FSM to the phase of clock .
Linearity is not important in the design of this digital-to-phase
converter since it is enclosed in the peripheral loop feedback.
The important requirement is that the interpolation process
is monotonic to ensure that no hysteresis exists in the loop
locking characteristics. Additionally, the phase step must be
minimized since it determines the loop dither amplitude. In this

case, the interpolation step is 1/16 of the 30 interval resulting


in approximately 2 peripheral loop nominal dither. Another
important requirement is that the design should provide for
seamless interpolation-boundary switching. This means that
when the input code is such that the weight on one of the
input clocks is zero, this clock should have no influence on
the output.
Fig. 9 shows a schematic diagram of the interpolator used
in the prototype chip. This design is a dual input differential
buffer which uses the same symmetric loads as all the core
VCDL buffers and peripheral loop multiplexers. The bias
and
are identical with those biasing the rest
voltages
of the loop, ensuring that its total delay is approximately 30
of the clock period which is the same as the rest of the loop
buffers. Therefore, the transition time of the interpolator input
clocks is larger than the minimum delay through the interpolator, and the two input transitions overlap. This condition
ensures that the interpolator outputs never settle at half of
the swing range. The current sources of the two differential
pairs are thermometer controlled elements. The thermometer
codes are generated by a 16-b long up/down shift register
which is controlled by the peripheral loop FSM. By changing
the thermometer code, the FSM adjusts in a complementary
fashion the currents of the two input differential pairs resulting
in a mixing of the two input clock phases. This design
(type-I) does not completely satisfy the seamless boundaryswitching requirement. Even when the current through one of
the differential pairs is zero, the input still influences the output
of the interpolator. This influence is due to the capacitive
coupling of the gate-to-drain capacitance of the differential
pair input transistors.
Fig. 10 shows an alternative design which does not suffer
from this problem. In this design (type-II), the interpolator differential pairs consist of unit cell differential pairs. Therefore,
when one of the interpolation weight thermometer codes is
zero, the corresponding input is completely cut off from the
output, eliminating the gate-to-drain coupling capacitance.
Fig. 11 shows the simulated transfer function of the interpolator alternative designs. This simulation includes random
( 20 mV) threshold voltage offsets in the thermometer code

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP

1689

(a)

(b)

Fig. 12. (a) Simplified FSM algorithm and (b) resulting loop behavior.

Fig. 10.

Phase interpolator (type-II) schematic.

Fig. 11.

Simulated phase interpolator transfer function.

current sources. The type-I design exhibits a nominal step of


approximately 2 . However, due to the gate-to-drain capacitive
coupling effect, the maximum step of 3.8 occurs at the
interpolation boundary when the input clock is switched to
the next selection. In the lower power implementation where
no buffering is used at the core delay line outputs (typeI-unbuf), the data-dependent loading on the previous stage
results on a double phase step at the interpolation interval
boundaries. Although the alternative design (type-II) does not
exhibit a boundary phase step, it was not used since it occupies
more layout area and exhibits more nonlinear characteristics
due to data-dependent loading of the previous stage. So in
the present implementation, worst-case dithering occurs at
the interpolation interval boundaries and has an approximate
magnitude of 3.8 .
D. Finite State Machine
A simplified version of the peripheral loop FSM algorithm
is outlined in Fig. 12(a). The single state Early of the FSM
indicates the relationship of the two interpolator input clocks.

Fig. 13. Prototype chip microphotograph.

On every cycle of its operation, the FSM might undertake two


actions.
In the more frequent case of in-range interpolation (i.e.,
0), the FSM simply increments or decrements
weight
the interpolation weight by shifting up or down the
interpolator controlling shift register. The direction of the
shift is decided based on the phase detector output and
the current value of the state Early.
If the peripheral loop has run out of range in the current
interpolation interval, the FSM seamlessly slides the
current interpolation interval by switching phase or to
the next selection. The fact that the interpolation has run
out of range in the current interval is simply indicated by
a combination of the current value of the state Early, the
most or least significant bit of the thermometer register,
and the output of the phase detector. In case the current
is adjacent to the 0 or 180
selection of phase or
interpolation interval boundary, switching to the next
selection involves toggling the select of the second-stage
phase inversion multiplexer.

1690

Fig. 14.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Noise generation and monitoring circuits.

The loop phase capture behavior resulting from this simple


algorithm is illustrated in Fig. 12(b). The phase error decreases
at a linear rate until the system achieves lock. Subsequently,
the loop dithers around the zero phase error point with a
dither magnitude of one phase interpolation interval. This
occurs because in this type of bangbang system, the output
of the phase detector is just a binary phase error without
any indication of the magnitude of the phase error. The
complementary interpolation weights slew linearly, changing
direction at the interpolation interval boundaries. Once the
system finds lock, they either dither by one or they stay
constant if the dither point happens to lie on an interval
boundary.
The magnitude of the peripheral loop phase dither is
determined by the minimum interpolation step and the
delay through the feedback loop. In conventional analog
bangbang DLLs, the loop delay is largely determined by
the delay through the delay line and the clock distribution
network. However, this digital implementation has a larger
minimum loop delay. The underlying reason is that driving
the FSM directly from the phase detector output might lead
into metastability problems, especially since the whole loop
operation is driving the phase detector to its metastable point
of operation. For this reason, in this implementation the
output of the phase detector is delayed by three metastability
hardened flip-flops. This increases the mean time between
failures (MTBF) of the system to a calculated worst case of
approximately 100 years, but at the same time increases the
peripheral loop delay by three cycles. To compensate for that
delay and decrease the loop dither, the FSM logic implements
a front-end filter which counts eight continuous phase detector
up or down results before propagating this signal to the
core FSM. This causes the FSM to delay its next decision
until the results of its previous action have been propagated to
the phase detector output and reduces the inherent peripheral
loop dither to one phase interpolation interval.
The digital nature of the peripheral loop control enabled the
implementation of the FSM to be done through synthesis of a
behavioral verilog model followed by a simple standard cell
place and route. The FSM behavioral model was verified by
simulation in conjunction with a behavioral core loop model.
The significance of this automated methodology is that other

more complicated algorithms can be implemented requiring


minimal effort from the designer. Faster phase acquisition
can be obtained by disabling the front end counter/filter and
changing the interpolation step by a larger amount while the
loop is not in lock. The loop can also implement a periodic
phase calibration algorithm. In this case, the FSM is activated
initially to drive the loop to zero phase error. Then it is
shut down to save power and it is periodically turned on to
compensate for slow phase drifts. Since the FSM can run
at a frequency slower than that of the system clock, the
implementation of different algorithms is not in the system
critical path.
IV. EXPERIMENTAL RESULTS
To verify the dual DLL architecture, a chip has been
fabricated through MOSIS in the HP CMOS26B process. This
is a 1.0- m drawn process with the channel lengths scaled to

0.8 m. Although the gate oxide in this process is 170 A


allowing 5-V operation, the loop design and testing was done
with a 3.3-V power supply voltage.
Fig. 13 is a micrograph of the chip. The chip integrates the
dual DLL, along with noise injection and monitoring circuits
and current-mode differential output buffers. The dual DLL
occupies 0.8 mm of silicon area, the majority ( 60%) of
which is devoted to the peripheral loop logic. This is mainly
due to the relatively large standard cell size of the library used
in this implementation.
The block labeled NOISE-GEN in Fig. 13 is used to inject
and measure on-chip supply noise. Fig. 14 shows a schematic
diagram of these circuits. The 1000- m wide transistor
shorts the on-chip supply rails creating a voltage drop across
the off-chip 4- resistor
. In order to monitor the droop
and the external 5- load
on the on-chip supply, device
resistor
form a broadband attenuating buffer which drives
the 50- scope. The gain of the buffer is computed during an
initial calibration step. The use of these circuits enables the
injection and monitoring of fast ( 1-ns rise time) steps on the
on-chip supply.
The dither jitter of the loop with quiescent on-chip supply
varies with the input phase. This occurs because the offset of
the interpolator and the phase selection multiplexers change
according to the point of lock. Fig. 15 shows the worst-

SIDIROPOULOS AND HOROWITZ: SEMIDIGITAL DUAL DELAY-LOCKED LOOP

Fig. 15.

Jitter histogram with quiet supply.

Fig. 16.

Jitter histogram with 1-MHz 750-mV square wave supply noise.

case jitter (68 ps) with quiescent supply. The jitter histogram
consists of the superposition of two Gaussian distributions
resulting from the switching of the peripheral loop between
two adjacent interpolation intervals. The distance between the
peaks of the two superimposed distributions is about 40 ps,
which is in fair agreement with the simulation results. With
the noise generation circuits injecting a 750-mV 1-MHz square
wave on the chip supply, the peak-to-peak jitter increases to
400 ps (Fig. 16). It should be noted that simulation results
indicate that approximately 50% of this jitter is not inherent to
the loop, but is due to the supply sensitivity of the succeeding
static CMOS clock buffer and off-chip driver.
Fig. 17 illustrates the linearity of the interpolation process
in the peripheral loop. The figure shows the histogram of
the output clock with the peripheral loop FSM continuously
rotating that clock. The histogram was generated by keeping

1691

the reference clock to a constant voltage while the input clock


ran at its nominal frequency of 250 MHz. The histogram
valleys correspond to the interpolation interval boundaries.
The spacing of the valleys is within 10% of their nominal
333-ps distance, indicating good matching of the delays of
the core loop buffers. The absence of one valley at the 180
interpolation boundary indicates a slight offset in the core
loop. The fact that the magnitude of the highest peak of the
histogram is smaller than the magnitude of the deepest valley
indicates that the interpolator achieves the 4-b target linearity
(the 4-b linearity of the interpolator was also confirmed by
a similar histogram of a single interpolation interval). Thus
the overall linearity of the DLL is limited by the steps at the
interpolation interval boundaries.
Table I summarizes the loop performance characteristics.
With a 3.3-V supply, the loop operates from 80 kHz to

1692

Fig. 17.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997

Interpolation process linearity.

TABLE I
PROTOTYPE PERFORMANCE SUMMARY

complicated phase alignment algorithms in a straightforward


manner.
A prototype using a linear self-biased core loop has been
implemented in a 0.8- m technology. The prototype achieves
68-ps peak-to-peak jitter, 0.4-ps/mV supply sensitivity, and
0.08400 MHz operating range.
ACKNOWLEDGMENT
The authors are grateful to M. Johnson, T. Lee, J. Maneatis,
and K. Yang for helpful discussions.
REFERENCES

400 MHz. The phase offset between the reference clock and
the output clock of the loop is less than 40 ps. Operating at
250 MHz, the dual DLL draws 31 mA dc from a 3.3-V power
supply.
V. SUMMARY
Although DLLs are easier to design than PLLs and offer
better jitter performance, their main disadvantage is their
limited phase capture range. This disadvantage limits their
application to completely synchronous environments and complicates start-up circuitry. This paper presented a dual DLL
architecture which removes this limitation by using a core DLL
to generate coarsely spaced clocks which are then used by a
peripheral DLL to generate the output clock by using phase
interpolation. This architecture has unlimited (modulo 2 )
phase shift capability, therefore removing boundary conditions
and phase relationship constraints between the system clocks.
The only requirement is that the DLL input and reference
clocks are plesiochronous, making the dual DLL suitable for
clock recovery applications. In addition, the digital nature
of the peripheral loop control enables implementation of

[1] M. Johnson and E. Hudson, A variable delay line PLL for CPUcoprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, Oct.
1988.
[2] T. Lee et al., A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500
MB/s DRAM, IEEE J. Solid-State Circuits, vol. 29, pp. 14911496,
Dec. 1994.
[3] M. Izzard et al., Analog versus digital control of a clock synchronizer
for a 3 Gb/s data with 3.0 V differential ECL, in Dig. Tech. Papers
1994 Symp. VLSI Circuits, June 1994, pp. 3940.
[4] M. Horowitz et al., PLL design for a 500 MB/s interface, in Dig.
Tech. Papers Int. Solid State Circuits Conf., Feb. 1993, pp. 160161.
[5] S. Sidiropoulos and M. Horowitz, A semi-digital delay locked loop with
unlimited phase shift capability and 0.08400 MHz operating range, in
Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1997, pp. 332333.
[6] J. Maneatis and M. Horowitz, Precise delay generation using coupled
oscillators, IEEE J. Solid-State Circuits, vol. 28, pp. 12731282, Dec.
1993.
[7] J. Maneatis, Low-jitter process-independent DLL and PLL based
on self-biased techniques, IEEE J. Solid-State Circuits, vol. 31, pp.
17231732, Nov. 1996.

Stefanos Sidiropoulos (S93), for a photograph and biography, see p. 690 of


the May 1997 issue of this JOURNAL.

Mark A. Horowitz (S77M78SM95), for a photograph and biography,


see p. 690 of the May 1997 issue of this JOURNAL.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

1021

A Wide-Range Delay-Locked Loop With a Fixed


Latency of One Clock Cycle
Hsiang-Hui Chang, Student Member, IEEE, Jyh-Woei Lin, Ching-Yuan Yang, Member, IEEE, and
Shen-Iuan Liu, Member, IEEE

AbstractA delay-locked loop (DLL) with wide-range operation and fixed latency of one clock cycle is proposed. This DLL
uses a phase selection circuit and a start-controlled circuit to
enlarge the operating frequency range and eliminate harmonic
locking problems. Theoretically, the operating frequency range of
the DLL can be from 1 (
max ) to 1 (3
min ), where
min and
max are the minimum and maximum delay of a
is the number of delay cells used
delay cell, respectively, and
in the delay line. Fabricated in a 0.35- m single-poly triple-metal
CMOS process, the measurement results show that the proposed
DLL can operate from 6 to 130 MHz, and the total delay time
between input and output of this DLL is just one clock cycle.
From the entire operating frequency range, the maximum rms
jitter does not exceed 25 ps. The DLL occupies an active area of
515 m and consumes a maximum power of 132 mW
880 m
at 130 MHz.
Index TermsDelay-locked loops, latency, phase-locked loops,
wide range.

I. INTRODUCTION

ITH THE evolution and continuing scaling of CMOS


technologies, the demand for high-speed and high integration density VLSI systems has recently grown exponentially. However, the important synchronization problem among
IC modules is becoming one of the bottlenecks for high-performance systems.
Phase-locked loops (PLLs) [1][3] and delay-locked loops
(DLLs) [4][7] have been typically employed for the purpose of
synchronization. Due to the difference of their configuration, the
DLLs are preferred for their unconditional stability and faster
locking time than the PLLs. Additionally, a DLL offers better
jitter performance than a PLL because noise in the voltage-controlled delay line (VCDL) does not accumulate over many clock
cycles.
Conventional DLLs may suffer from harmonic locking over
wide operating range. If the DLLs are to operate at lower frequency without harmonic locking, the number of delay stages
must be increased to let the maximum delay of the delay line
be equal to the period of the lowest frequency. However, the

Manuscript received November 5, 2001; revised March 27, 2002.


H.-H. Chang and S.-I. Liu are with the Department of Electrical Engineering
and Graduate Institute of Electronics Engineering, National Taiwan University,
Taipei, Taiwan 10617, R. O. C. (e-mail: lsi@cc.ee.ntu.edu.tw).
J.-W. Lin was with the Department of Electrical Engineering and Graduate
Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
10617, R. O. C. He is now with Sunplus Corporation, Hsinchu 300, Taiwan.
C.-Y. Yang is with the Department of Electrical Engineering, Huafan University, Taipei, Taiwan 223, R. O. C.
Publisher Item Identifier 10.1109/JSSC.2002.800922.

Fig. 1. Block diagram of the conventional analog DLL.

maximum operating frequency of a DLL will be limited by the


minimum delay of the delay line.
In this paper, a DLL with wide-range operation and fixed
latency of one clock cycle is proposed by using the phase selection circuit and the start-controlled circuit. The proposed DLL
not only locks the delay equal to one clock cycle but also operates without the restrictions stated above. The operating frequency range of the proposed DLL can also be increased.
The range problem of conventional DLLs will be discussed
in Section II. The architecture of the proposed DLL will be introduced in Section III and the building blocks in this DLL will
be described in Section IV. Measurement results are given in
Section V. Conclusions are given in Section VI.
II. RANGE PROBLEM OF CONVENTIONAL DLLS
A conventional DLL, as shown in Fig. 1, consists of four
major blocks: the phase detector (PD), the charge-pump circuit, the loop filter, and the VCDL. In the DLL, the reference
clock, ref_clk, is propagated through VCDL. The output signal,
vcdl_clk, at the end of the delay line is compared with the reference input. If delay different from integer multiples of clock
period is detected, the closed loop will automatically correct it
by changing the delay time of the VCDL. However, the conventional DLL will fail to lock or falsely lock to two or more pe, of the input signal if the initial delay of the VCDL is
riods,
or longer than 1.5
, as shown in Fig. 2.
shorter than 0.5
Therefore, if the DLL is required to lock the delay to one clock
cycle of the input reference signal, the initial delay of the VCDL
and 1.5
[7], regardless
needs to be located between 0.5
of the initial voltage of the loop filter. Assume that the maximum and the minimum delay of the VCDL are

0018-9200/02$17.00 2002 IEEE

1022

Fig. 2.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

DLL in normal lock and false lock conditions.


Fig. 3.

System architecture of the proposed DLL.

Fig. 4.

Small-signal model of the conventional analog DLL.

and
, respectively. As a result, the period of the input
signal should satisfy the following inequality [7]:
Max
Min

(1)

Equation (1) shows that the DLL is prone to the false locking
problem when process variations are taken into account [7].
Therefore, some solutions [6][10] are proposed to overcome
this problem. They are described as follows.
First, the basic idea is to use a phase-frequency detector
(PFD) [5], because it has a capture range of 2 , 2 wider
than other phase detectors. So, the PFD is a better choice for
wide range operation. However, the PFD cannot be used in the
DLL alone without any control circuit because the DLL will
try to lock a zero delay. A PFD combined with a control circuit
is presented in [6]. Nevertheless, in some cases, especially for
high-frequency operations, the initial delay between ref_clk
and vcdl_clk, as shown in Fig. 1, may be larger than two clock
cycles and harmonic locking will occur.
Second, a solution called an all-analog DLL using a replica
delay line [7] has been developed to solve the narrow frequency
range problem of a conventional DLL. If the delay range of the
,
VCDL satisfies the relation
the DLL will have a maximum operation range of 7:1.
Third, a digital-controlled DLL called the self-correcting
DLL is proposed in [8]. The problem of false locking is
solved by the addition of a lock-detect circuit and the modified
phase detector. Although this self-correcting DLL avoids false
locking, the outputs of the VCDL are required to have an exact
50% duty cycle.
The DLL developed in [9] uses a stage selector for fast-locked
and wide-range operations, but the DLL requires an additional
VCDL, which increases the area. A similar DLL can automatically change its lock mode to extend the operation range, but
the latency of the DLL will be larger than one clock cycle [10].
The approach presented in this work uses a phase selection
circuit to automatically decide what number of delay cells
should be used. This can enable the DLL to operate in the
wide-frequency range. A new start-controlled circuit is also

Fig. 5. Block diagram of the phase selection circuit.

presented for the DLL to solve false locking problems and keep
the latency of one clock cycle. The exact 50% duty cycle is not
necessary.
III. ARCHITECTURE OF THE PROPOSED DLL
The architecture of the proposed DLL is shown in Fig. 3. It
is composed of a conventional analog DLL, a phase selection
circuit, and a start-controlled circuit. Before the DLL begins to
lock, the phase selection circuit will choose an appropriate delay
cell to be a feedback signal (vcdl_clk) according to different frequencies of input signal. In other words, the number of the delay
cells may change at different input frequencies. The minimum
of the delay line is determined by one unit-delay
delay
where
cell. The maximum delay can be decided as
is the number of unit-delay cells. Thus, the operating freto
quency range of the DLL can be from
.
The linear model of the DLL is shown in Fig. 4, where the
is the charge-pump cursummer stands for a phase detector,

CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE

Fig. 6.

Schematic of edge detection circuit. (a) Edge detection circuits. (b) Clock edge generation. (c) Latch

(a)

1023

N.

(b)

Fig. 7. Timing diagram of edge detection circuit.

rent,
is the period of the input reference clock, is the
is the gain of the
capacitor value in the loop filter, and
VCDL which is proportional to the number of delay cells. In
the steady-state locked condition, the -domain transfer function can be expressed as [11]
(2)

is the input delay time and


is the output delay
where
can be expressed as [11]
time. The loop bandwidth
(3)
Since the transfer function is inherently stable, a wider loop
bandwidth can be used to achieve fast acquisition time, but

the jitter performance will be degraded. Hence, the following


tradeoff design guideline was suggested in [12]:
(4)
.
where
When the input frequency is higher, the phase selection circuit
will
will select the smaller number of delay cells and
become smaller. In order to have an adequate loop bandwidth for
the DLL, the capacitances used in the loop filter must become
smaller. In this work, the 3-bit control signals generated from
the phase selection circuit will switch the number of capacitors
in the loop filter depending on the selected phase.
After the vcdl_clk is decided, the DLL will start the locking
process, which is controlled by the start-controlled circuit. First,
the delay between input and output of the VCDL is initially set
to the minimum value and then allows the down signal of the
PFD output activate, supposing that the VCDLs delay increases
with control voltage decreasing. Therefore, the delay between

1024

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

Fig. 8. Schematic of start-controlled circuit associated with PFD.

Fig. 9. Timing diagram of start-controlled circuit.

input and output of the VCDL will increase until it reaches one
clock period of the input signal. Thus, the DLL will not fall into
false locking and the latency is fixed to one clock cycle no matter
how long a delay the VCDL provides.
IV. CIRCUIT DESCRIPTION
A. Phase Selection Circuit
The phase selection circuit consists of two blocks: an edge detector and a multiplexer with a decoder, as shown in Fig. 5. The
schematic and timing diagram of the edge detector are shown
in Figs. 6 and 7, respectively. To guarantee that the latency of
the DLL is just one clock cycle, the first two clock phases in
Fig. 6 are reserved for measurement. In practice, the first two
clock phases could be included in the phase selection circuit to
improve the operating frequency range of the DLL. At the initial
state, the signal startb is set to low to reset the edge detector outputs (i.e., d3 d10) and the delay of the VCDL is set to its minimum value. When the signal startb goes high, the edge detector
will detect the rising edge of input signals in sequence during
the next two rising edges of ref_clk. Referring to Fig. 7(a), suppose that the signals all have rising edges in sequence during
one clock cycle, therefore, the outputs (d3 d10) are all high
and the multiplexer will select phase 10 as the output signal,
vcdl_clk. However, if the input frequency is higher, suppose that
the timing diagram is similar to Fig. 7(b). All the inputs have
rising edges during one clock cycle, but only the rising edges of
phases 1 4 in sequence lead the selected phase to be 4. The
vcdl_clk will be low until the selected phase is chosen. After

Fig. 10.

Schematic of the PFD circuit [12].

the vcdl_clk is decided, the DLL will start the locking process,
which will be explained later. By the decoder, signals (d3 d10)
are decoded to generate 3-bit control signals, which switch the
number of capacitors used in the loop filter for tuning the loop
bandwidth.
B. Start-Controlled Circuit
The schematic of the start-controlled circuit and the associated PFD are shown in Fig. 8. It is composed of only two
rising-edge trigger D-flip-flops (DFFs), two NAND gates, and

CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE

Fig. 11.

Schematic of the charge-pump circuit [11].

Fig. 12.

Schematic of the delay cell with replica bias [12].

1025

Fig. 14. Microphotograph of the chip.

Fig. 13.

Simulated transfer curve of the VCDL.

two inverters. The timing diagram of this start-controlled circuit


is shown in Fig. 9. Initially, startb is set to low in order to clear
the two DFFs outputs. Therefore, setupb is low and pulls the
, as shown in Fig. 3 (i.e., set the VCDL
control voltage to
delay to its minimum value). In this way, the two inputs of the

PFD are in low level. When startb goes to high, setupb will
also go to high. After two consecutive falling edges of vcdl_clk
trigger the DFFs, the down signal of the PFD will be activated
and let the delay of the VCDL increase. The delay of the VCDL
will increase until it is equal to one clock period of the input
signal due to the nature of negative feedback architecture. Since
the start-controlled circuit forces the delay of the VCDL to its
minimum value and controls the delay of the VCDL to increase
until its delay is equal to one clock period, the DLL will not fall
. In order to get
into false locking even when
equal delays for path1 and path2, dummy loads should be added
in point A. In comparison with [6], this start-controlled circuit

1026

Fig. 15.

Fig. 16.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 8, AUGUST 2002

DLL at initial state when operating frequency is 6 MHz.

DLL at initial state when operating frequency is 130 MHz.

has two advantages: the proposed circuit is simple, and the duty
cycle of ref_clk and vcdl_clk is not required to be exactly 50%.

Fig. 17.

Jitter histogram when DLL operates at 130 MHz.

Fig. 18.

Measurement results of rms jitter over different frequencies.


TABLE I
PERFORMANCE SUMMARY

C. Other Circuits
In this work, the dynamic logic style PFD [13] is adopted to
avoid the dead-zone problem and improve the operating speed.
To mitigate charge injection errors induced by the parasitic
capacitors of the switches and current source transistors, the
charge-pump circuit developed in [11] is used here. The delay
cell circuit is similar to [11]. The schematics of these circuits
are shown in Figs. 1012. The control voltage of the loop filter
is directly connected to nMOS rather than pMOS. Therefore,
the transfer curve of delay versus control voltage is monotonic
decreasing, as shown in Fig. 13.
V. EXPERIMENTAL RESULTS
The prototype chip is fabricated in a 0.35- m single-poly
triple-metal standard CMOS process. The microphotograph of
the chip is shown in Fig. 14. The capacitors used in the loop
filter are integrated in the chip and formed by metal-to-metal
capacitors. The experimental results show that the DLL can operate in the frequency range of 6130 MHz. Figs. 15 and 16

show the first four cycles of the DLL in the locking process
when the operating frequency is 6 and 130 MHz, respectively.
After the signal startb is high, the phase selection circuit will select one of the outputs of the VCDL as close as possible to the
next rising edge of the input clock, ref_clk. Figs. 15 and 16 also
show that after the signal startb is high, the first rising edge of

CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE

the output clock of the VCDL, vcdl_clk, leads that of the input
clock, ref_clk. Since the signal startb will set the control voltage
in Fig. 3 to
, the proposed phase detector and the current-pump circuit will discharge the loop filter to increase the
delay of the VCDL. It will align the phases between the input
clock and output clock of the VCDL. Fig. 17 shows the jitter
histogram when the DLL operates at 130 MHz. Fig. 18 shows
the measurement results of rms jitter over different frequencies.
Table I gives the performance summary. The proposed DLL can
be seen to have a wide-operational range and a fixed latency of
one clock cycle.
VI. CONCLUSION
A DLL with wide-range operation and fixed latency of one
clock cycle is proposed. First, the multiphase outputs of the
VCDL are all sent to the phase selection circuit. Then the
phase selection circuit will automatically select one of the
delayed outputs to feedback. As a result, this DLL can operate
over a wide range without suffering from harmonic locking
problems. Ideally, this DLL can operate from
to
. The experimental results also demonstrate
the functionality of the proposed DLL. Moreover, at different
operating frequencies, the jitter performances are all in an
acceptable range and the latency is just one clock cycle. Since
the speed of the proposed circuits can be increased if the
more advanced process is used, the performance of the DLL
such as the operating frequency range can be improved with a
little hardware and design effort. The power consumption of
the digital part in the DLL and the total die area will also be
reduced.
REFERENCES
[1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. Piscataway, NJ: IEEE Press, 1996.
[2] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans.
Commun., vol. COM-28, pp. 18491858, Nov. 1980.
[3] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York: McGraw-Hill, 1998.
[4] R. L. Aguitar and D. M. Santos, Multiple target clock distribution
with arbitrary delay interconnects, Electron. Lett., vol. 34, no. 22, pp.
21192120, Oct. 1998.
[5] R. B. Watson Jr. and R. B. Iknaian, Clock buffer chip with multiple
target automatic skew compensation, IEEE J. Solid-State Circuits, vol.
30, pp. 12671276, Nov. 1995.
[6] C. H. Kim et al., A 64-Mbit 640-Mbyte/s bidirectional data strobed,
double-data-rate SDRAM with a 40-mW DLL for a 256-Mbyte memory
system, IEEE J. Solid-State Circuits, vol. 33, pp. 17031710, Nov.
1998.
[7] Y. Moon, J. Choi, K. Lee, D. K. Jeong, and M. K. Kim, An all-analog
multiphase delay-locked loop using a replica delay line for wide-range
operation and low-jitter performance, IEEE J. Solid-State Circuits, vol.
35, pp. 377384, Mar. 2000.
[8] D. J. Foley and M. P. Flynn, CMOS DLL-based 2-V 3.2-ps jitter 1-GHz
clock synthesizer and temperature-compensated tunable oscillator,
IEEE J. Solid-State Circuits, vol. 36, pp. 417423, Mar. 2001.
[9] H. Yahata, T. Okuda, H. Miyashita, H. Chigasaki, B. Taruishi, T. Akiba,
Y. Kawase, T. Tachibana, S. Ueda, S. Aoyama, A. Tsukinori, K. Shibata,
M. Horiguchi, Y. Saiki, and Y. Nakagome, A 256-Mb double-data-rate
SDRAM with a 10-mW analog DLL circuit, in Symp. VLSI Circuits
Dig. Tech. Papers, June 2000, pp. 7475.

1027

[10] Y. Okuda, M. Horiguchi, and Y. Nakagome, A 66400 MHz adaptive-lock-mode DLL circuit duty-cycle error correction, in Symp. VLSI
Circuits Dig. Tech. Papers, June 2001, pp. 3738.
[11] J. G. Maneatis, Low-jitter process-independent DLL and PLL based
on self-biased techniques, IEEE J. Solid-State Circuits, vol. 31, pp.
17231732, Nov. 1996.
[12] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuit. New York: IEEE Press, 2001, p. 240.
[13] S. Kim et al., A 960-Mb/s/pin interface for skew-tolerant bus using
low jitter PLL, IEEE J. Solid-State Circuits, vol. 32, pp. 691700, May
1997.

Hsiang-Hui Chang (S01) was born in Taipei,


Taiwan, R.O.C., on February 4, 1975. He received
the B.S. and M.S. degrees in electrical engineering
from National Taiwan University, Taipei, in 1999
and 2001, respectively. He is currently working
toward the Ph.D. degree in electrical engineering at
National Taiwan University.
His research interests are PLL, DLL, and
high-speed interfaces for gigabit transceivers.

Jyh-Woei Lin was born in Kaoshiung, Taiwan,


R.O.C., in 1974. He received the B.S. degree in
electrical engineering from National Taipei University of Technology in 1996, and the M.S. degree
in electrical engineering from National Taiwan
University in 2001.
He joined Sunplus Corporation, Hsinchu, Taiwan,
in 2001 as an Analog Circuit Designer. His research
interests include PLL, DLL, and interface circuits for
high-speed data links.

Ching-Yuan Yang (S97M01) was born in Miaoli,


Taiwan, R.O.C., in 1967. He received the B.S. degree in electrical engineering from the Tatung Institute of Technology, Taipei, Taiwan, R.O.C., in 1990,
and the M.S. and Ph.D. degrees in electrical engineering from National Taiwan University, Taipei, in
1996 and 2000, respectively.
He has been on the faculty of Huafan University,
Taiwan, since 2000, where he is currently an Assistant Professor with the Department of Electronics Engineering. His research interests are in the area of
mixed-signal integrated circuits and systems for high-speed interfaces and wireless communication.

Shen-Iuan Liu (S88M93) was born in Keelung,


Taiwan, R.O.C., on April 4, 1965. He received both
the B.S. and Ph.D. degrees in electrical engineering
from National Taiwan University, Taipei, in 1987 and
1991, respectively.
During 19911993, he served as a Second Lieutenant in the Chinese Air Force. During 19911994,
he was an Associate Professor in the Department
of Electronic Engineering of National Taiwan
Institute of Technology. He joined the Department of
Electrical Engineering, National Taiwan University,
Taipei, in 1994, where he has been a Professor since 1998. His research
interests are in analog and digital integrated circuits and systems.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

1553

Active GHz Clock Network Using Distributed PLLs


Vadim Gutnik, Member, IEEE, and Anantha P. Chandrakasan, Member, IEEE

AbstractA novel clock network composed of multiple synchronized phase-locked loops is analyzed, implemented, and
tested. Undesirable large-signal stable (mode-locked) states
dictate the transfer characteristic of the phase detectors; a matrix
formulation of the linearized system allows direct calculation of
system poles for any desired oscillator configuration. A 16-oscillator 1.3-GHz distributed clock network in 0.35- m CMOS is
presented here.
Index TermsClock network, multiple oscillator system, phaselocked loop.

I. INTRODUCTION

HE CLOCK distribution network of a modern microprocessor uses a significant fraction of the total chip power
and has substantial impact on the overall performance of the
system. For example, the 72-W 600-MHz Alpha processor [1]
dissipates 16 W in the global clock distribution, and another
23 W in the local clocks: more than half the power goes to
driving the clock net. The clock uncertainty budget for a global
clock is 10% of a clock period, which translates to a 10% reduction in maximum operating speed; as argued below, this penalty
is likely to increase for currently popular clock architectures.
Most conventional microprocessors use a balanced tree to distribute the clock [1][3]. Because the delays to all nodes are
nominally equal, trees may be expected to have low skew. However, at gigahertz clock speeds a large fraction of skew and jitter
comes from random variations in gate and interconnect delay.
The majority of jitter in a clock tree is introduced by buffers and
inter-line coupling to the clock wires; a relatively small amount
comes from noise in the source oscillator [4]. Therefore, a primary consideration in clock design is matching delay along the
clock path.
As clock speed increases, signal delay across a chip becomes
comparable to a clock cycle. For example, a 2-cm-long wire in
a 0.25- m process has a delay of 0.86 ns, while the clock might
be as high as 1 GHz; scaling to 4 GHz, the same wire (with
optimal buffering) will have a delay of approximately 0.43 ns,
compared to a clock period of 0.25 ns. In all practical cases a
signal that takes longer than a clock cycle to propagate would
be pipelined, and hence re-clocked. The fundamental weakness
of tree distribution (and networks that depend on tree matching)
Manuscript received March 24, 2000; revised June 24, 2000. This paper was
supported by the MARCO Focused Research Center on Interconnects, which
was funded at the Massachusetts Institute of Technology through a subcontract
from the Georgia Institute of Technology, and supported in part by a Graduate
Fellowship from the Intel Corporation.
V. Gutnik was with M.I.T. Microsystems Technology Lab, Cambridge, MA
02139 USA. He is now with Silicon Laboratories, Austin, TX 78749 USA
(e-mail: gutnik@mit.edu).
A. P. Chandrakasan is with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA (e-mail: anantha@mtl.mit.edu).
Publisher Item Identifier S 0018-9200(00)09441-5.

is that skew is only relevant between communicating latches,


but the clock path is always the length of the chip. Clock speeds
increase with gate delay, and processor architectures can exploit
both locality of blocks and pipelining to avoid penalty due to
long signal paths, but the error in a global clock scales with the
total path delay, and is thus a growing fraction of a clock cycle.
In this paper, we consider the effects of static and dynamic
mismatch on a few representative clock networks in Section II
and propose a distributed generation scheme that needs only
local synchronization to generate a global clock. Large and
small-signal stability of the proposed network is analyzed in
Section III. This clock was implemented on a test chip; circuit
details and results are presented in Sections IV and V.
II. MODELING RANDOM SKEW
A. Assumptions
Given sufficiently accurate models, systematic skew can
be corrected at design time. Therefore, the primary interest
is random zero-mean variations. For the sake of comparing
architectures, we make several simplifying assumptions.
1) Delay mismatch, both static and dynamic, is proportional
to total delay.
2) Wire RC delay is independent of gate delay ( ).
3) The clock period proportional to gate delay.
4) Chip size is independent of gate delay.
5) In 0.25- m technology, signal delay across a die equals
one clock period.
Assumption 1 is inaccurate, but convenient. Mismatch due
to gradients scales as delay squared; purely random short-distance mismatch scales as the square root of delay. For the sake
of analysis, however, we will assume that uncertainty scales linearly. Assumptions 2, 3, and 4 are approximately true, given historical data: as the geometries scale the resistance increase in
clock wires is offset by lower capacitance; processor cycle time
is generally on the order of 816 gate delays; and chip sizes
mm .
hover around
Assumption 5 serves to normalize signal delay, chip size, and
clock speed. It is not coincidental that random variation has become a noticeable issue at about the time when cross-die signal
delay is comparable to one clock cycle: as a heuristic, 10% of a
clock cycle is allocated for unmodeled skew and jitter margin,
and delay uncertainty is about 5%10% of delay. Hence, when
delay across a chip is comparable to clock cycle time, random
delay is a considerable fraction of the total clock error budget.
B. Tree
To keep internal clock skew low, a tree is generally made deep
enough that a tile driven by a single leaf is small compared to the
size of the chip [5], [6]. In turn, this means that the path from the

00189200/00$10.00 2000 IEEE

1554

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

D. Active Feedback

Fig. 1. Simulated edge in a grid with skew to the drivers.

Fig. 2.

Short circuit power in a grid vs. input tree skew.

clock source to the load is comparable to the size of the entire


die. Because the worst-case skew occurs between two adjacent
leaves for which the clock path was completely different, worst
case mismatch depends on the entire source-to-leaf delay. And
worse, the problem becomes worse with process scaling. Because RC delay does not scale, delay along an optimally buffered
; hence the skew as a fraction of the clock
line scales only as
with falling .
period grows as
C. Grid
Modern grids are H-tree-grid hybrids: a short H-tree distributes clock to a few (4 or 16, for example) buffers around a
chip, and those buffers drive a clock grid in parallel. Shorting
the buffers together helps drive down some of the uncertainty
at the cost of increased short-circuit power during switching
and somewhat slower edge rates. However, rise time scales
linearly with , so by the same reasoning as applied to the tree
scaling arguments, skew as a fraction of rise time will increase
as gate delay falls. When the tree skew exceeds rise
with
time, short circuit power dissipation increases rapidly, and the
clock edges begin to show an unacceptable kink. Fig. 1 shows
simulated edge shapes with increasing input skew for a grid
driven from a 4-level tree with skews from 0 to 200 ps, and
Fig. 2 shows the corresponding short-circuit power dissipation,
-power for the clock grid.
plotted as a fraction of

As is evident from the given examples, most of the skew


comes from the initial long-distance distribution of a clock to
relatively small loads. A delay-locked loop (DLL) could be
adapted to measure and cancel out wire variations, as shown
in Fig. 3. If the round-trip delay is tuned to an even number of
clock cycles, the wire has nominally 0 delay.
Unfortunately, despite the apparent symmetry, the forward
and reverse paths do not match well for two reasons. First,
matched buffers are physically separated. In Fig. 3, should
match , although it would be physically near . is not as
far away from its matched pair as it might be in a tree, but it will
still typically be millimeters away. Second, there is no temporal
at a different time than
correlation. The clock signal passes
it passes , so any time-dependent variations, including those
due to power supply and signal coupling, do not match.
Another approach, proposed by Intel, is shown in Fig. 4 [7].
Here, a DLL matches delays to two half-trees; an obvious generalization, with four DLLs matching quarter-trees is shown in
Fig. 5. Static delay variations of some nearest neighbors are canceled out by the DLL to within the precision of the matching of
the comparators. The drawback is that some neighboring nodes,
as and in Fig. 5, are only related through multiple DLLs.
A much better result can be obtained by using DLLs that take
multiple reference inputs, and adjust output phase to be aligned
exactly between the two inputs. The network can then be redrawn somewhat more symmetrically, as Fig. 6. (For clarity, the
local tree was not drawn, and the connections to the comparators are abstracted.)
Optimization of the number of tiles is straightforward. Internal skew scales with tile area, so as the number of tiles increases, internal skew falls. However, every boundary between
tiles introduces some skew because of mismatch in the phase detector (PD). Hence, as the number of tiles increases, the number
of boundaries increases. Fig. 7 shows the optimization curves
calculated for this clock metric. As in other clock networks,
faster clocks require a more finely grained architecture. Jitter in
a DLL network will rise in exactly the same way as it increases
in clock trees, and for the same reasons. Skew scales linearly
with because it is comprised of comparator mismatches and
delays across each leaf-patch. Note, however, that in a phaselocked loop (PLL) the noise can be expected to scale with ; a
PLL network like the one in Fig. 6 would have total clock uncertainty that is a constant fraction of the clock period.
III. STABILITY
We propose a distributed clock network comprised of an
array of synchronized PLLs. Independent oscillators generate
the clock signal at multiple points (nodes) across a chip;
each oscillator distributes the clock to only to a small section
of the chip (tile) (Fig. 8). PDs at the boundaries between tiles
produce error signals that are summed by an amplifier in each
tile and used to adjust the frequency of the node oscillator. In
general, the network need not be square or regular.
With locally generated clocks, there are no chip-length clock
lines to couple in jitter; skew is introduced only by asymmetries
in PDs instead of mismatches in physically separated buffers,

GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs

Fig. 3.

1555

Low-skew wire with DLL.

Fig. 7.

Tile number optimization.

Fig. 4. Matching tree leaves with a DLL.

Fig. 8. Distributed clocking network.

Fig. 5.

DLL architecture.

the use of multiple independent clocks [8], this approach produces a single fully synchronized clock. The rest of this section
examines small and large signal stability of a distributed PLL.
A. Small Signal
In a multiple-oscillator PLL large-signal and small-signal behavior are interrelated. In normal operation, the oscillators are
phase-locked, and jitter depends on the network response to
noise. Because startup is expected to take a negligibly small
fraction of time, the connection of the oscillators is optimized
for small-signal behavior rather than to make initial acquisition
more efficient. The linearized small-signal behavior, valid when
the oscillators are nearly in phase, is analyzed first.
B. General Derivation

Fig. 6. Multi-input delay cell DLL architecture.

and the clock is regenerated at each node, so high-frequency


jitter does not accumulate with distance from the clock source.
Unlike earlier work on multiple clock domains which suggested

The block diagram (Fig. 9) of a multiple-oscillator PLL is


essentially identical to the one for a conventional PLL, except
that the connections between blocks are vectors instead of individual signals, and the gains and transfer functions are matrices
instead of scalars. This means that the PD becomes matrix ,

1556

Fig. 9.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

Linear system model of a multi-oscillator PLL.

system ), the output of PLL is the input to PLL


is described by
shown in Fig. 10.

, as

(3)
Fig. 10. One-dimensional PLL array; symmetrical with the dotted-line
connections.

of size
, and the loop filter becomes
,a
matrix.
is an incorresponding
matrix. The network of oscillators
tuitively meaningful
is similar to a lumped circuit with a node for each oscillator
and a branch for each connection between pairs of oscillators.
Node voltages in represent oscillator phase, and branch currents represent the error signals on the output of the PD. is the
conductance matrix for with unity conductance branches.
for a four-oscillator network is shown in (1). Each off-diagonal
is 1 if there is a PD between node and node ;
entry
is the number of detectors attached to node .

(1)

DC gain in the loop can be lumped into .


Writing the transfer function in matrix form gives
(2)
where is the phase error input to each phase comparator.
is the reference phase, and
are the noise contributions from interconnect and PD mismatch.
C. Examples
is determined by the geometry of the tiles, and
Matrix
hence will constrained by the placement of clock loads, which
for this problem is fixed. Assuming the simplest possible PLL,
. This leaves , , and as design variables.
There are still far too many choices to find the general optimum, but a few examples may help guide the search.
1) One-Dimensional Array: A one-dimensional array of oscillators with PDs between neighbors is the simplest generalization of a single PLL. In a perfectly asymmetrical array (call this

This system has multiple poles at the same place where a singleoscillator PLL has single poles.
On the other hand, in a perfectly symmetrical array (call it
), the input to each oscillator is the phase of oscillators
and
(Fig. 10, with the dotted-line connections). The
matrix is the same because the physical arrangement of nodes
changes:
is identical, but

(4)

as in , it is necessary
To achieve the same phase margin in
to lower the gain . This can be shown with a geometrical ar,
gument: in , when the phase of oscillator changes by
the change is measured at two PDs, so oscillator feels twice
the feedback that it would have felt in , and at the same time,
and
both adjust in the opposite direction,
oscillators
giving four times the effective gain. Hence, the gain must be decreased by a factor of approximately four. Mathematically, the
is 1, but the largest eigenvalue of
largest eigenvalues of
is 3.5. Poles of the symmetrical system, solved via (2),
and
are plotted in Fig. 12(a). The key difference between
is the systems response to noise. In both cases, noise at frequenare attenuated. For
cies higher than the unity gain frequency
frequencies much lower than , the response can be calculated
via (2). Fig. 11 shows a Bode plot of noise at node in response
is much
to a noise source at node . Noise performance of
worse for intermediate frequencies because there is no feedback
so errors propagate forever. In , the feedback limits the influence of preceding stages, and this in turn attenuates noise. For
this reason, networks with feedback are preferred, despite the
more complicated stability calculation.
2) Two-Dimensional Array: A two-dimensional array is analyzed exactly the same as is a one-dimensional array, except
that the gain has to decrease by another factor of two because the
center oscillators see four neighbors rather than two. A 16-elegrid is implemented in this thesis. Its poles
ment array in a
are shown in Fig. 12(b).

GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs

Fig. 13.
Fig. 11. Comparison of noise responses for symmetrical and asymmetrical
networks.

(a)

(b)
Fig. 12.

Root locus for 1-D and 2-D PLL arrays. (a) 1-D array. (b) 2-D array.

D. Large Signal: Mode Locking


The analysis of the previous section indicates that fully connected networks should have a better noise response than asymmetrical networks. However, the feedback allows the possibility
of undesirable large-signal modes. Consider the matrices for a
PLL network:

1557

Mode-locking example.

Because phase is periodic with period , the phase measured


. For small ,
at the PDs
, so the nonlinearity is irrelevant. However, with

(6)
is a stationary point. This is intuitively easy to see, in
so
reference to Fig. 13: each oscillator leads one neighbor, and
lags behind another neighbor by exactly the same amount. The
net phase error is zero, so clearly there is no restoring force to
drive the phases to 0. Because the nonlinearity does not change
are the same
for small deviations from , dynamics about
as those about 0 and hence this state is stable. The locking
of a distributed oscillator to nonzero relative phases has been
called mode-locking [9]. At startup, each oscillator in a distributed PLL starts at a random phase, so there is a nonzero
chance of converging to a mode-locked state. Simulations show
that for a network like the one shown here, the system ends
of random initial states. The probamode-locked from
array
bility goes up rapidly with the size of the system; a
ends up mode-locked well over 99% of the time.
Pratt and Nguyen proved several useful properties about systems in mode-lock [9]. The key result, generalized for nonCartesian networks, is that for a system in mode-lock, there
must be a phase difference between two oscillators such that
where is the number of nodes in the largest minimal
loop in the network and a minimal loop is a loop in the graph
that cannot be decomposed into multiple loops
This result suggests a way to distinguish between
mode-locked states and the desired 0-phase state: in mode-lock,
there must be at least one branch with a large phase error. If the
gain of the PD is designed to be negative for a phase difference
larger than , then all mode-locked states are made unstable
without affecting the in-phase equilibrium. Pratt and Nguyen
suggest that XOR PDs preclude mode-lock in a rectangular
network of oscillators because the response decreases for phase
, [9]. This result follows directly from the
errors larger than
result derived above: in a rectangular array, the largest minimal
. A PD described in the
loop has four nodes, so
, would be useful in nonrectangular
next section, with
networks, and where more gain near 0 phase is desirable.
IV. IMPLEMENTATION

(5)

The distributed clock network generates the clock signal with


PLLs at multiple points (nodes) across a chip, and distributes

1558

Fig. 14.

Fig. 15.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

Ring oscillator schematic

Fig. 16.

Simulated phase transfer curve

Fig. 17.

Locking behavior of the PLL array

Phase detector (PD).

each only to a small section of the chip (tile) (Fig. 8). PDs
at the boundaries between tiles produce error signals that are
summed by an amplifier in each tile and used to adjust the frequency of the node oscillator
Because the proposed network has many nodes, the power
and size constraints on each node are even more stringent than
the constraints on a single global PLL. The oscillator, PD, and
loop filter of a working demonstration chip, fabricated in a standard 0.35- m single-poly triple metal process, are considered in
turn below.
A. Oscillator
The demonstration chip used an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO)
comprise the differential
(Fig. 14). Transistors
, the tail current is driven
inverter. The differential pair is
, and
act as the nMOS load. The nMOS loads allow
by
noise.
fast oscillation and shield the output signal from
is a low-pass version of
generated by subthreshold
; supply noise coupling in through
leakage through PFET
of
is bypassed by
. The oscillation frequency
is only dependent on the supply voltage through capacitor
, and feedback
nonlinearity and the output conductance of
and
.
of the PLL compensates drift of
B. Phase Detector (PD)
The PD, shown in Fig. 15, has a sufficient nonlinearity, higher
gain at small input phase difference and less high-frequency
) is an nMOScontent than an XOR PD. The core (

loaded arbiter which acts as a nonlinear PD. For no input phase


difference, the output is balanced. As the phase difference increases from zero, one output will be asserted for the full duration of an input pulse, while the other output will be asserted
for only the remainder of the input pulse duration after the first
input pulse ends, which is equal to the input phase difference.
Thus the detector has very high gain near zero phase error that
drops off to zero as the input phase difference approaches the
input pulse width (Fig. 16).
and
enable this arbiter to give
The pulse generators
frequency-error feedback. If one input is at a higher frequency
than the other, its output will be asserted for more input pulses
than the other. Because the width of the pulses is independent
of input frequency, the average output voltage corresponds to
frequency. Unlike a typical phase-frequency detector, however,
the strength of the error signal falls to zero as frequency difference goes to 0, so there can be no mode-lock problems, yet large
signal frequency (and hence, phase) locking is enhanced. Fig. 17
shows the large signal correction and small signal behavior of
the entire array of PLLs as the already internally locked array

GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs

1559

Fig. 18. Loop filter schematic.

Fig. 19.

Frequency-locked divider outputs.


Fig. 20.

approaches and locks to the reference clock. The detector fits in


m
m.
C. Loop Filter
make up ampliThe loop filter is shown in Fig. 18.
make up . The differential output
fier , while
currents from the PDs at the edges of each tile are summed at
and
, and drive both amplifiers.
is a single
nodes
stage differential pair so it has relatively low gain but a band.
has a high-gain cascoded stage
width limited by
.
is a large gate cadriving a common source PFET
such that
pacitor which serves to set the dominant pole of
is biased at very low current to
the PLL network is stable.
boost gain and enable a low time constant (as low as 12 kHz)
m
m gate capacitor. The simple design and
with a
feed-forward compensation allow the loop filter to fit in only
m
m. Each clock node, consisting of an oscillator
m
m.
and a loop filter, takes just
V. RESULTS
A chip was fabricated with a
array of nodes and PD
between nearest neighbors. Counting one node and two PDs the
area overhead is approximately 0.0038 mm per tile. Another

Micrograph of the 16-oscillator 1.3-GHz chip.

PD was placed between one of the nodes and the chip clock
input to lock the network to an external reference. The output
of the 16 oscillators was divided by 64 and driven off chip. At
V, the divided outputs were seen to be frequency
locked at 17 to 21 MHz, corresponding to oscillator phase lock
at 1.1 to 1.3 GHz. An oscilloscope plot of four locked output
signals is shown in Fig. 19. Long-term jitter between neighbors
is less than 30 ps. Cycle-to-cycle jitter is less than 10ps. The
oscillators, amplifiers and all the biasing draws 130 mA at 3 V.
A chip plot is shown in Fig. 20. (The rest of the area on the
mm
mm chip is taken up by test circuits.)
VI. CONCLUSION
Design and measurements on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network
can take advantage of improved devices by shrinking the size of
the cells, lowering the overall skew and jitter, so performance
will scale with the speed of devices, rather than with the much
slower improvement of on-chip interconnect speed.

1560

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000

REFERENCES
[1] D. W. Bailey and B. J. Benschneider, Clocking design and analysis for
a 600-MHz Alpha microprocessor, J. Solid State Circuits, vol. 33, no.
11, pp. 16271633, Nov. 1998.
[2] C. F. Webb, A 400-MHz S/390 microprocessor, in ISSCC Dig. Tech.
Papers, Feb. 1997, pp. 168169.
[3] T. Yoshida, A 2-V 250-MHz multimedia processor, in ISSCC Dig.
Tech. Papers, Feb. 1997, pp. 266267.
[4] I. A. Young, M. F. Mar, and B. Bhushan, A 0.35-m CMOS
3-880-MHz PLL N/2 clock multiplier and distribution network with
low jitter for microprocessors, in ISSCC Dig. Tech. Papers, Feb. 1997,
pp. 330331.
[5] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, A symmetric clockdistribution tree and optimized high-speed interconnections for reduced
clock skew in ULSI and WSI circuits, in IEEE Int. Conf. Computer
Design, NY, Oct. 1986, pp. 118122.
[6] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, Characterization and modeling of clock skew with process variations, in Proc. IEEE 1999 Custom
Integrated Circuits Conf., pp. 441444.
[7] G. Geannopoulos and X. Dai, An adaptive digital deskewing circuit for
clock distribution networks, in ISSCC Dig. Tech. Papers, Feb. 1998, pp.
400401.
[8] F. Aneau, A synchronous approach for clocking VLSI systems, J.
Solid State Circuits, vol. SC-17, no. 1, pp. 5156, Feb. 1982.
[9] G. A. Pratt and J. Nguyen, Distributed synchronous clocking, IEEE
Trans. Parallel and Distributed Systems, Mar. 1995.

Vadim Gutnik (M00) received the B.S. degree in electrical engineering and
materials science from the University of California, Berkeley, in 1994, and the
S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1996 and 2000, respectively.
Previous research interests have included micromechanical resonators, and
variable-voltage power supplies. He is currently working as a Design Engineer
at Silicon Laboratories, Austin, TX.
Dr. Gutnik received an NDSEG fellowship in 1994, and the Intel Foundation
Fellowship in 1997.

Anantha P. Chandrakasan (M95) received the


B.S., M.S., and Ph.D. degrees in electrical engineering and computer sciences from the University
of California, Berkeley, in 1989, 1990, and 1994,
respectively.
Since September, 1994, he has been the Analog
Devices Career Development Assistant Professor of
electrical engineering at the Massachusetts Institute
of Technology, Cambridge. His research interests include the ultra-low-power implementation of custom
and programmable digital signal processors, wireless
sensors and multimedia devices, emerging technologies, and CAD tools for
VLSI. He is a co-author of the book titled Low Power Digital CMOS Design
(Norwood, MA: Kluwer, 1995). He has served on the technical program committee of various conferences including ISSCC, VLSI Circuits Symposium,
DAC, ISLPED, and ICCD. He is the Technical Program Co-Chair for the 1997
International Symposium on Low-Power Electronics and Design and for VLSI
Design98.
He received the National Science Foundation Career Development Award in
1995, the IBM Faculty Development Award in 1995, and the National Semiconductor Faculty Development Award in 1996. He received the IEEE Communications Society 1993 Best Tutorial Paper Award for the IEEE Communications
Magazine paper titled, A Portable Multimedia Terminal.

ISSCC 2000 I SESSION 10 I CLOCK GENERATION AND DlSTRl6UTlOH I PAPER TA 10.5

TA 10.5 Active GHz Clack Network using Distributed PLLs


\ladim Gutnik, Ananfha Chandrakasan
MIT Microsyslcms Technology I.ah, Cambridge, MA

difference iiicrcasos rrorti eeru, one output is asserted for the fiill
durntiun of nn input pulse, while thc nthcr output is asscrtcd for
only tlie remniiidcr of the input pu1.s~rlurnt.ion after tlic first input
pulse ends, which is cqual to the input phase differcnco. Thus the
dotectnr has high gain near zero phnsc error that drops nff to zero
A S the input phnsc rliffcrence apprnnclics the iiiput pulw width
(Figurc 10.5.4).

Mout modern microprocessors usc a balanccd tree t u distributr? t h o


cluck [I].Hnwcvcr, n I gigahertz cloclr spccdfi nn increasiiifi-iirnction
of slrew and jitter conies from random vwiatims in gate and The pulse generators P, and P, cnnble this orbitcr to give fwquency
Irone input is at n higher frequcncy than Lhe oLher,
intercunnccf dclny.The majority nf jit.tcriii a clock tree is inti>odiiccd error ficcdl)ncl~,
its output will bo asscrtad Tor tilore input pulscs Ihnu the othcr.
by bufycri; and inler-linc coupling to Ihc clock wircs. A rclntivcly
Bocnusc the width of tho pulscs is independcnt o f inpul frequency,
siiiall amount comes from noisc in the s o i i i ~ eoscillator 121.
thr! average output voltngc corresponds t.n frequency. Unlikc n
typicnl phasc-frcqucncy dcleclor, howcvcr, the fihangth uf thc error
Thin distributed cloelr notwo~lcgcrierateu tho clnclc sigrrnl with
plinsr! locked loops (ILLs) at multiple points (iiorlcs) ncrosfi n chip, signal falls to zcro as frequency differenct! gncs to 0, Y U thcrc can br!
no inodelock prublcms, yet large eigiial frequency- (and hcncc,
anddistdxdeil each oiily to a ~ i n n lscction
l
ofthe chip(tilc) (Figure
phase-) looking is cnhnnced. Kgurc 10.5.5shows the large-signnl
10.5.11. Phave detectors (PD) til Ihe bouiidarics lictwesn tiles
currectioii and smnll-signal bchavior oEths entire array of PLLs na
product error signals thnl are summed by an amplifier iii ench tile
the already intcrnnlly-locked array nppronches and lnclts tn the
nnd used t u acljnst tho rvequency uf the node oscillnlor.
refereiicc clnck. Thc PD fits in 3Ox30[nn2,
With locnlly-@nerated claclcs, thew are no chip-lctigth clock lincs
One loop filler is sssociakd with cnch VCO. Tu avoid the wries
to couple in jittcr; slrcw is intrwluccd only by asynimclrics in phase
resistor of a charge puiiip with passive RC compensation, a f e d
detectors itisteacl of mismatches in physically scpayntcd hflers;
rorwmd compensation inelhod is uscd. Thc loop filter of Figure
and the clocli i8 regeneratcd at cnch node, xu high frequency jitter
1O.S.G consists of t w o differential ainplifiers. M, - M, mdKC up
docs not nccuniulatc with distance from the clock mime. Uillikc
nuiplifier A,, whilc M9- M,?
nialc np A,. The differential output
cnrlicr work on multiple clock dornains which suggmls usc uf
currents fiom tlic PDs at the edges cifcach tile nre s u n n at nodcs
ii>iiltiplcindependent clocks, this approach prnrluccs n single fullyIn+ and In-, nnrl drive buth amplifiers. A, is A singlc-stagcdiffcrcrisynchronized clock. This nrbitrary iictworlc of tiles, riicli with its
lial pair so it hafi relatively low gain bul a bandwidth limitcd by
o w n PT,L, is more gcnornl than piwiuus activc skcw Inmagemelit
g,:,dC,*. A.. hnr: n high gain cascndcd stnge driving a cninmoli source
approaches (31.
pPE? M,7, M,, is n large gate capacitor which SCIVCS to set Ihe
IIowcvcr, because thcm nrc many nudes, tlic powcr nnrl size cuii- doniiiiaril polo n f ?,I2 such that t h o PLI, network is stablo. Mi, i s
biased a t l o w current t o boost gain nnd ennble tiinc constnnt as low
stthnints o n each clcmcnt d n distributed clock gcnci*ntiontiwhitcct w c nre evenmorc stringcnl lhnn the constraints on n single, global 88 l21rHz with A 1.5x15!iin2 gate capncitor. The simple dcsigti arid
feed-fonuartlcoinpciisationalluwthc loopfiltcrm fit iiionly 1 5 ~ 4 5 p ~ i .
PLT,. Furtliannore, thcrc niiisl be t i way to cnsurc thnl the mnltiplc
Each clock nndc, corisiuting of nil oacillnlor s n d a loop filter., Inkes
nodesgetniid stay sjnchronized.lhsuscillator,pliascdetectur,and
loopfiltcr oTs wurking d c m m l i o n chip, fabricated i r i afilaildnrd j u s t 4 ~ X 4 ~ ~ l i i i 2 ,
0.35pn1, singlo-poly triplc-mctnl process, arc considered in turn
A chip was fnhricnlcd with a 4x4 away o f rrodes and PD hctwccri
bclo\v.
ncnrest neighbors. Counting one norlo and two IDs, the a r m
overhead is approxirnnlely 0.003Briini2 pcr tile. Another PD is
This chip iiaes an nMOS-londcd differential ring oscillalar as a
betwecn nnc ofthr? nodes and thccliipcloclc input tu lock the nctworlc
voltage-contrullcdoscillator (VCO)t o minimize povvor supply noisc
to nn external rofcrcncc. Tlie output of the 1.G oscillators is divided
- M, conipriso thc differential
(Fig-orc 1.0.5.2). Iransistom
by 64 end drivcli off chip. At ,V, 5 W ,thc divided o1itput.s arc!
invartor. The differential pair i s ML,+,
Lhc tail currant is drivcIlhy M,
nndM,,: act as thcnMOS lonrl. ThennlOS loads allow fnst oscillation frequency-lockod n t 17 1~ZlMHz, cnrrcspotidingt u oseillntor phase
and shield the output signnl kom VlIn noisc. V,,im,i s a low-pnss lock at 1.1 to 1.3GIIx. An oscillascopc plot of four locked output
signals is showti i t ] Figure 10,5.7.
vcraioni ofV,, gcncmtcd I~ysuhlhresholdlcak~~c
llirongh pFlN M,;
supply noivc coupling in through C,c,iuf M4,7
is bypassed by M, . Ihc
J,ong-tcrm jit,tcr bctweeii neighbors is less Lhan 30ps r m . Cycle-Looscillntiun frcqucncy ifi dependent on the supplyvnltngconly through
cyclc jittcr is loss than 101)s.Ihcoscillntora, nriiplificrs and nll the
capacitoi. nonlinearity and Llie output conductnncc of M4:7,
and
hiaaing draws 1301n.4 nt 3V. A chip plot is shown in Figure 10,6.8.
fecdback o l the ILL compcnsntcs drin of Vnn aiid Vb-k,4.
Thc res1 ofthe area on tlic 3x3min2chip is tnkcn tip by test c i r c u i t s .
Bocausophnseis periodic, asotnfescillatorviieednotall bo in phase
t u each liavc xcro net phase error. Tliifi phcnoinenon, niodclnclr, is Dcsigri nod measureincnt.on t.his chip confirm tlint generating and
sy~~chroriizing
multiple clocks on chip is feasihlc. Neither the powcr
describcd in Rererenee [41, which n o l c s Lhat inodelock can be
nor
tlie
area
ovcrhond
of rnultiple ILLs i s solistantial compared to
avoidcd by iisiiig IUs whosc response decreascs nionotonicnlly
thc cost of distributing the clock by convcntionnl incans. h h t
bcymid a phnse difference of d2, ns from an XOH phnsc! detector.
Notc that this solution precludcs thc use of B phase-ficqireoq importnntly, a distrilrrutcd clock network can lake advantage o f
improved devices by shrinking thc s i x of llie cells, lowcriiig the
dctcctor (PFD). Lack nf a PFD is Iwoblematic bccausc the cepture
overall slrcw m r l j i l l a r , 80 performancc will scale with device speed,
bandwidth of a nicnio~ylcfifiPld, is liiiiitcd t o a fcw percent of thc
rather than with the much sluwcr improvement of on-chig interconcenter keqwncy, wliilo thc cenler frcqucncics o f widely-spced
nect spccd.
oficillntors un a chip can cnsily vary by 10-20%,
The 1D prnposcd here, shown in V i w w 10.5.3, E ~ R R sufficicnt.
nonliIicm,ily, higher gain at small input phase diffcrcncc and less
high-frequency content thnn nn XOE 1U. Ilie cord (RI, - MJ iu an
nMOS-loaded arbiter which ncts ns U noidincar plinfic delector. For
iio input phnse difference, thc output i s balanced. An tlic pl~nse

174

Auknowlcdgii I C RIS:
The nutliors ~clmowlrxlge~ ~ p p from
~ r tlie
t
MRR.CO Focused Resenrch Contcr mi Interconnects fuiidod nlrM12 through a suhconLrnct from Georgia Tccli. Vodim Gutnilr was partly nuppo~.tedby a
gmdutite fellowship from 1nt.el Corp.

2000 IEEE International Solid-State Circuits Conference

0-7803-5853~8/00/$10.00
02000 IEEE

ISSCC 2000 / February 8,2000 I Salon 9 / 10:45 AM

Keferences :

Chip Boundary

[I] Bnilcy,D. W.nnd B. 6.Bcnschncidcr, Clucltinrdcsign and analysis fora


GOO M l h Alplin rilictopr.ocfsanr.,
,Joiwrialof Solid Stnte Circuits, vol. 33,110.
11, pp. 1G27-1639,Novcnihcr 1998.
121Y U U I I.~ A.,
, M.I.Mar, and 13. Ithushan, A0.3:im CMOS :1-H8OMHz 1LL
NL2 cloclc nlulliplicr il rrtl dinidmiion netwnrk with low jitter formicruproccsAOIS, in ISYCI: 13igwt nfTechnicn1 lflpers, Fcbrtwry 1007, pp. 330-331,

i l e Boundary

[YI Genuunpouios, G. wid X. Diii, An utlnptivc digital rlnskcwirlg.circuit fnr


cloclt didtimihillion
riotworks, in tHHCC Uigent nflcclinicnl Papers, hhruury
19!M,pp, 400-4(1 1.
1.11 Pmtt, G . A. nnd J. Nguycn, Distributed synclirorrou:, clocking, 1l4l1Cl3:
Truns;ictionson Pnrallcl a n d Ilish~ibutetlSystems, Pehriinry 1996.

............................................
M7

h14

Vt>iaa

Vout

X t

Loop Filter
6t

1
~6~

vco

Figtire 10.6.1: Dicltribiited docking network.

4 g..............................................

501

I ~I
~

.-

- --

Figure 10.6.2:Ring oscillator schcmntic.

,..........

.................

fi

5. :;

M4

AI

Y2

M2

...............i

................

-0.2

I .

-0.1

0.2

0.1

TIme dlfference (nanoseconds)

Figure 10.6.4 Simulutcd ID tmnsCcr curve.

Figure 10.6.3 Phase detector.

Figtire 10.6.6: Loup filter schematic.

,.031 .

., I

0.5

1.5
2
2.5
3
Slmulatlon time (mlcmsoconds)

Piguro iU.S.6: Locking behavior of the ILL array.

-dPigurc 10.6.7: SCCpngc! 454.


3.5

Fignrc 10.5.8: See pnge 4Gt.

DIGEST OF TECHNICAL PAPERS

175

ISSCC 2000 PAPER CONTINUATlONS

Figure 10.1.5: Clock and data rccovery (CUR.)with MPM.

- -

Sitnilkition B 3.1 GHz


I3-3simplihed Foimiili~Cl 3.1 GH7
Complelc Formtila 0 3 , i l i t l z
01

Vibpro 10.4.8: Miorngraph.

50
100
Spacing b c l w o n Signal Linu and G r o w l Line (11

a
111)

Figure 10.4.R Wire inductancc formulae.

Figurc 10.4.7: Wire inductance with substrate effect.

454

2000 IEEE International Solid-state Circuits Conference

0-7803-5853-8/00/$10.0002000 IEEE

ISSCC 2000 PAPER CONTINUATIONS

G:S

nimaf moulh

Figure 105.8:nistrilmtcd dock chip.


fixcl vcrtcx dclcctnr
Nuinber nf cbnnncls
I'nwer I cllamcl
Arcalclinnncl
Track pnsilion resolulioii
Tntd m a
tarliation
. . dose (10 yrs)
Tracker
Numlicr of ctiaiinels
Power I cliilnncl
Trnck posilioii rewliitioii
Bndinliun dose (10 yrs)

Caloriiiiclor
--

Niimbcr o f clinnnds
Snmpljnfi rate

Rrtlialion d u e (10 ym)


Munti dctccior
Nuinbcr nf chinncla
T i m i y rcaoliition
Kadialion ilme ( I O yrsi
Dala rnte i i h level I
lrigger

100M

< 1UOJI\V

1 5 0 hu 400 luii'
I 15 [I111
,8 Id

li)-30Mndr~10"1icutnms/rii~

Figure 11.1.7: Ruiigc finder ASIC in 0 , 8 ) ~ m


CMOS and
p:rckagcd transmitter-receiver.
Core aren of chip is 1xl.67min2.

12M
< 3n1w
Sfl-10011nr
I O Mnd $. I O " ILICI~I'

LOOK
I 2 liil al 40 MI lz
500 KmdtlO"n/ctn' (lmrrtil)
Zfl Mmrlt Ill"n/cin? h l c a l x )

BW K
.7 11s
IflKrnd t

IO" n l c d

I Tbit/scc

Figutw 11.3.4: 130 chtinnel protolype micrograph.

Chip is 2x8mm2.

-4

I25 by 50 iiiicrotis
hiialog 2.2 V Di si131 I .Ir V
13" 6.6 and 5.2 pA
11.9

Tlircsliulil of Iiil channcli


incrmwd b y 1000 eIecImis.
arid Ihe m i s e iruill 220 Iu 450
cleclrniis rim.
~

Pigurc 11.3.8. Measured prutotypc charnctcristics.

Pimrc 11.2.7: Dic micrograph.

DIGEST OF TECHNICAL PAPERS

455

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

377

An All-Analog Multiphase Delay-Locked Loop Using


a Replica Delay Line for Wide-Range Operation and
Low-Jitter Performance
Yongsam Moon, Student Member, IEEE, Jongsang Choi, Kyeongho Lee, Member, IEEE,
Deog-Kyoon Jeong, Member, IEEE, and Min-Kyu Kim

AbstractThis paper describes an all-analog multiphase


delay-locked loop (DLL) architecture that achieves both
wide-range operation and low-jitter performance. A replica
delay line is attached to a conventional DLL to fully utilize the
frequency range of the voltage-controlled delay line. The proposed
DLL keeps the same benefits of conventional DLL's such as good
jitter performance and multiphase clock generation. The DLL
incorporates dynamic phase detectors and triply controlled delay
cells with cell-level duty-cycle correction capability to generate
equally spaced eight-phase clocks. The chip has been fabricated
using a 0.35-m CMOS process. The peak-to-peak jitter is less
than 30 ps over the operating frequency range of 62.5250 MHz.
At 250 MHz, its jitter supply sensitivity is 0.11 ps/mV. It occupies
smaller area (0.2 mm2) and dissipates less power (42 mW) than
other wide-range DLL's [2][7].
Index TermsDelay-locked loop, duty-cycle correction, dynamic phase detector, multiphase clock generation, replica delay
line, triply controlled delay cell.

I. INTRODUCTION

S THE SPEED performance of VLSI systems increases


rapidly, more emphasis is placed on suppressing skew and
jitter in the clocks. Phase-locked loops (PLL's) and delay-locked
loops (DLL's) have been typically employed in microprocessors, memory interfaces, and communication IC's for the generation of on-chip clocks. However, it becomes increasingly difficult to reduce the clock skew and jitter, whether they are inherent
or result from substrate and supply noise, as the clock speed and
circuit integration levels are increased.
While the phase error of PLL's is accumulated and persists
for a long time in a noisy environment, that of DLL's is not accumulated, and thus, the clock generated from DLL's has lower
jitter. Therefore, DLL's offer a good alternative to PLL's in cases
where the reference clock comes from a low-jitter source, although their usage is excluded in applications where frequency
tracking is required, such as frequency synthesis and clock recovery from an input signal. However, the main problem of conventional DLL's [1] is that they are very difficult to design to
Manuscript received July 20, 1999; revised October 6, 1999.
Y. Moon, J. Choi, and D.-K. Jeong are with the School of Electrical
Engineering, Seoul National University, Seoul 151-742 Korea (e-mail:
ysmoon@griffin.snu.ac.kr).
K. Lee was with the School of Electrical Engineering, Seoul National University, Seoul 151-742 Korea. He is now with Global Communication Technology
Inc., Los Altos, CA 94024 USA.
M.-K. Kim is with Silicon Image, Inc., Sunnyvale, CA 94086 USA.
Publisher Item Identifier S 0018-9200(00)00538-2.

work over process, voltage, and temperature (PVT) variations.


Since DLL's adjust only phase, not frequency, the operating frequency range is severely limited. We propose a new DLL architecture that operates in a wide frequency range while keeping
the low-jitter performance.
Various wide-range DLL architectures [2][7], with similar
motivations, have been developed, which can be classified
into three categories: analog type [2], digital type [3], [4], and
dual-loop type [5][7]. While a conventional analog DLL [1]
uses a voltage-controlled delay line (VCDL), the wide-range
analog DLL [2] uses phase mixers for wide-range operation.
However, because of its relatively high analog complexity,
the analog DLL requires a process-specific implementation,
making it relatively difficult to port across multiple processes
[4]. Thus, digital DLL's [3], [4] have been proposed for
better process portability. However, skew error and jitter are
increased due to continuous change of phase selections among
quantized delay times with supply and temperature variations.
To overcome these problems, dual-loop architectures have been
proposed [5][7]. In [5], a PLL is added to make the core DLL
lock to a reference frequency, and a phase mixer interpolates
two intermediate clocks in the core DLL and produces the
final output clock. Or, almost continuous phase is obtained
with addition of a fine delay line [6] or a phase interpolator
[7] to a digital DLL. However, additional chip area and power
consumption of these wide-range DLL's are excessive, and
furthermore, their jitter performance gets worse compared with
conventional DLL's since the number of delay cells or gates in
the clock propagation paths becomes larger.
We propose a new DLL architecture that achieves a large
operating range by attaching a replica delay line in parallel
with a conventional analog DLL. Since the replica delay line
occupies one-fourth the area of the core DLL, it incurs only
a small increase in chip area and power consumption. Since
the replica delay line is out of the clock propagation path, it
does not do any harm on low-jitter performance. While other
wide-range DLL's [2][7] use phase mixers or phase selections to generate a single output, the proposed DLL uses a
similar multistage analog VCDL to what conventional analog
DLL's use. Therefore, the proposed DLL can generate multiphase clocks without using excessive amount of hardware.
Furthermore, by incorporating a dynamic phase detection circuit and cell-level duty cycle correction method, the multiphase clocks are equally spaced even in high-frequency operations. A prototype DLL designed for eight-phase clocks can

00189200/00$10.00 2000 IEEE

378

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

Fig. 2.

Block diagrams of (a) a digital DLL and (b) a dual-loop DLL.

Fig. 1. Block diagram of (a) a conventional DLL and (b) a DLL locking
operation and operating frequency range limitation.

be used in applications such as gigabit serial interfaces [8],


[9].
This paper is arranged as follows. Section II describes a conventional analog DLL and includes an analysis of its operational
frequency range. This section also overviews other wide-range
DLL architectures. In Section III, the proposed architecture is
presented with design ideas, issues, and various analyses. Section IV describes various circuits used in the design. Section V
discusses the prototype chip implementation and shows experimental results. Section VI concludes this paper with a summary.

Fig. 3. Block diagram of the proposed analog DLL.

or, equivalently, in terms of

II. CONVENTIONAL ARCHITECTURES


A. Range Problem of Conventional DLL's
A simplified block diagram of a conventional DLL [1] is outlined with its operation mechanism in Fig. 1. When the delay
of the VCDL is initially smaller (or larger) than
time
of the reference clock (Ref-CLK), the DLL
the period
until phase difference disappears in a negative
adjusts
feedback loop, as shown in Fig. 1(b). The phase difference is
detected by sampling the reference clock with the rising edge of
the output clock (DLL-CLK). Depending on the sampled value,
a DOWN or UP pulse is generated. These pulses discharge (or
charge) a capacitor in the loop filter, thereby decreasing (or inand reducing the phase differcreasing) the control voltage
ence gradually.
However, if the sampling edge of DLL-CLK deviates from the
lock range indicated in Fig. 1(b), the DLL falls prey to a stuck
or a harmonic lock problem. In order to avoid this problem, the
of
should be located between
minimum
and
, and the maximum
between
0.5
and 1.5
. These stuck-free conditions can be expressed as the following inequality:

(1)

(2)
The range of stuck-free clock period is determined by inequality
(2). If the target clock period satisfies inequality (2), the DLL
works without the stuck problem. However, it should be noted
that inequality (2) has the maximum range
, when
. In
, there is no range of
addition, if
that satisfies inequality (2), and the DLL is prone to the
can be as
stuck problem. Since the PVT variations of
much as 2:1 in a typical CMOS process, the stuck-free condi, and
tion can be satisfied over only a very narrow range of
thus a time-consuming and tedious circuit trimming job is required when process migrations are performed across different
processes.
B. Digital DLL's and Dual-Loop DLL's
Digital DLL's [3], [4] have been developed to overcome the
narrow frequency range problem of conventional analog DLL's.
A simplified block diagram of a typical digital DLL [3] is outlined in Fig. 2(a). Multistage delay cells in the VCDL provide

MOON et al.: ALL-ANALOG MULTIPHASE DLL

fixed and quantized delay times. The finite-state machine selects one clock output with closest phase to the reference clock's
by using digital control bits instead of using an analog control
voltage.
Therefore, major drawbacks in the digital DLL's are large
skew due to quantized delay time and large jitter due to control-bit updates during operation. To increase the resolution and
cover a wide delay range, a large delay cell array must be used,
and that inevitably increases chip area and power consumption.
In order to cope with these problems, Garlepp et al. [4] proposed a phase blending technique in a hierarchical structure for
improved phase resolution. However, the inherent problems of
digital DLL's are not solved entirely.
Dual-loop DLL's [6], [7] have been proposed to minimize the
problems of digital DLL's. A simplified block diagram of architecture proposed in [6] is shown in Fig. 2(b). A fine delay line,
which is analog controlled, is attached to a digital DLL in the
subsequent stage. In [7], a phase interpolator is cascaded to a
digital DLL for unlimited phase capture range. These dual-loop
architectures achieve both a wide frequency range and relatively
low jitter performance. However, due to digital DLL's inherent
nature, jitter histogram of the generated clock shows the superposition of two Gaussian distributions [7] resulting from the
control-bit updates. In addition, the overhead of chip area and
power consumption is significant.

379

Fig. 4.

Configuration and operation of a replica delay line.

Fig. 5. (a) Delay capture range of the replica delay line and (b) gain curve of
CSPD.

B. Delay Capture Range of the Replica Delay Line


III. PROPOSED ARCHITECTURE
A. All-Analog DLL Using a Replica Delay Line
Fig. 3 shows a high-level block diagram of the proposed arof the main analog DLL
chitecture. The delay time
(core DLL) is primarily controlled by a control voltage Vcr,
which is generated from a replica delay line. Another control
. The replica delay line consists
voltage Vcp fine-tunes
of only one replica delay cell, a current steering phase detector
(CSPD), and a low-pass filter (LPF). The replica delay cell is
identical to the delay cells in the core DLL. Due to sharing of
of the replica delay cell is almost
Vcr, the delay time
of each delay cell in the core DLL.
equal to the delay time
They are not exactly the same unless Vcp equals bias. Due to the
is forced to
characteristics of the proposed CSPD [10],
. Therefore,
of the core DLL bebe one-eighth of
when the number of delay cells in the core
comes equal to
DLL is eight. With the replica delay line with a wide frequency
range, the core DLL's operating frequency bounds will be established, and thus the core DLL will not fall into such a harmonic
lock problem as conventional analog DLL's do. With only a negligible increase in chip area and power consumption, the proposed architecture offers many advantages compared with other
wide-range DLL's. Since the DLL is analog controlled and the
clock path is not extended, the DLL can keep the low-jitter performance of the conventional DLL. In addition, because it uses a
multistage analog VCDL, the proposed DLL can generate multiphase clocks.

Fig. 4 shows the circuit diagram and operation waveforms of


the replica delay line. The replica delay line generates a control voltage Vcr to pass to the core DLL. Vcr is used as a reference voltage in the core DLL to lock to the input frequency.
The CSPD takes two inputs ICLK and QCLK. ICLK is directly
connected to Ref-CLK, and QCLK is delayed from Ref-CLK
is equal to the delay difference beby one delay cell.
tween ICLK's and QCLK's rising edges. In the charge pump, the
is tuned to three times the pulldown current
pullup current
. When
is high, the charge on the
, and Vcr will
filter capacitors will be decreased by
is low, the charge will be
go down. On the other hand, when
, and Vcr will go up three times faster.
increased by
When the feedback loop is locked, a stable value of Vcr will be
. Thereobtained with the relation of
has the low-to-high
fore, in the locked state, XNOR output
, and the rising edge
duration ratio of 1:3
.
of ICLK leads that of QCLK by one-eighth of
Fig. 5(a) shows the capture range of the replica delay line
. This can be derived from the gain
when
is smaller than
curve of the CSPD shown in Fig. 5(b). If
, the change of Vcr, denoted as
, will become
1/8
will increase. That action is indicated in the
negative and
gain curve as the corresponding arrow pointing to the right. If
is between 1/8
and 7/8
will be positive and the corresponding arrow points to the left. Eventually,
will be settled at 1/4 , which represents 1/8
.
is larger than 7/8
, the settling point
However, if
will run away and a harmonic lock problem will occur.

380

Fig. 6.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

Replica delay line for high-frequency operation.

The operating conditions explained above, in which the delay


line locks correctly, can be summarized in the inequalities as
follows:

(3)

Fig. 7.

Block diagram of the core DLL.

or equivalently in terms of
Max
(4)
If the delay range of the controlled delay cell satisfies the re, the DLL will have a frelation
quency range determined by the entire delay range of the delay
cell. However, even if we make the delay range wider and satin an effort to increase the
isfy
frequency range, the lock range is limited to only 7:1.
In some applications where the frequency range must be
larger than 7:1, changing the pump-current ratio of the CSPD
can make the frequency range wider. For example, with
, the frequency range of 9:1 can be obtained.
, the frequency range of 11:1 can be
With
obtained.
, especially
, may be
In high-frequency operations,
too short to drive the XNOR gate. So, a divide-by-two circuit and
a pair of delay cells are used to slow down the frequency of
Ref-CLK [11]. The new configuration shown in Fig. 6 is effectively the same as the one in Fig. 4 but offers a more robust
operation in the high-frequency operations.
C. Core DLL
Fig. 7 shows a simplified block diagram of the core DLL. It
consists of a VCDL, a dynamic phase detector, a charge pump,
and a loop filter. The core DLL generates eight-phase clock outputs through eight delay cells (DC's) in the VCDL. The core
DLL is the same as a conventional analog DLL except that it
has another control voltage Vcr. Vcr from the replica delay line
of the VCDL so
coarsely determines the delay time
is equal to
in the locked state. In the locked
that
state, the eighth clock output, CLK7 in Fig. 7, is aligned with
Ref-CLK.
In high-frequency operations, there may be some static phase
mismatch between CLK7 and Ref-CLK due to the long rise/fall
times of signal transition edges compared with the period of
the clock. So the fine-tuning is required. The dynamic phase
detector (PD) in the core DLL generates control signal Vcp, fine, and removes residual phase mismatch so that the
tunes
rising edge of Ref-CLK is exactly aligned with that of CLK7.

Fig. 8. (a) Core DLL with cell-level duty-cycle correction and (b) rising and
falling edge alignment.

D. Cell-Level Duty-Cycle Correction


Fig. 8 shows the core DLL with a cell-level duty-cycle correction mechanism. In high-frequency operations, clock outputs with a short cycle time can be severely distorted as the
clock passes through many delay cells. Even if the duty cycle
of Ref-CLK is 50% at the entrance, that of CLK7 may deviate
significantly from 50%. It causes multiphase clock outputs to
have phase error, which could be fatal, especially in high-speed
communication applications. A conventional solution is to attach duty-cycle correction circuits to all clock output drivers
with the price of added area, increased jitter, and further phase
mismatch due to elongated path. So a cell-level duty cycle correction is proposed.
The second phase detector shown in Fig. 8 takes inverted
Ref-CLK and inverted CLK7 as its inputs, generating a control
signal Vduty as the output. It fine-tunes the cell current ratio,
and thus aligns the falling edges of Ref-CLK and CLK7. In the
steady state, therefore, both rising and falling edges of CLK7
and Ref-CLK are synchronized in phase, and both clocks have
the same duty cycle. It should be noted that the duty-cycle correction circuit (DCC) used right at the input of Ref-CLK corrects

MOON et al.: ALL-ANALOG MULTIPHASE DLL

381

Fig. 10.

(a) Dynamic phase detector and (b) its operations.

Fig. 11.

Prototype chip microphotograph.

Fig. 9. Triply controlled DC. (a) Circuit diagram of a DCE and (b)
configuration of a triply controlled DC.

the duty cycle of Ref-CLK only. With cell-level duty cycle correction, not only CLK7 but also the other intermediate clock outputs maintain a 50% duty cycle without any additional circuits.
Although two control voltages Vcp and Vduty are simultaneously adjusted in the coupled negative feedback loops, the stability is guaranteed by making one of its loops have a sufficiently
low bandwidth.
IV. CIRCUIT DESIGN
A. Triply Controlled Delay Cell
According to the noise analyses of [12] and [13], a
fast-slewing (short rise/fall time) delay cell with a fully
switching capability offers less phase noise. Although offering
a full swing output, a shunt-capacitor delay cell [14], with its
capacitor, would increase the chip area and power. Therefore,
we decided to use the current-starved inverter [15] as a basic
controlled delay cell. Since the current-starved inverter does
not require a level conversion circuit, which is required for a
differential delay cell, it has less chip area and power, although
substrate and supply noise might cause detrimental influence.
A triply controlled delay cell is used as the basic delay cell
element (DCE). The circuit diagram of the DCE and the configuration of one unit of DC are shown in Fig. 9. Four DCE's and
two inverters compose a DC and make its rising/falling delay
of the triply controlled
times symmetric. The delay time
delay cell is determined by six control signals: Vcr, Vcr_b, Vcp,
Vcp_b, Vduty, and Vduty_b. Of those signals, Vcr and Vcr_b
come from the replica delay line. In the DCE, the sizes of
MP1 and MN1 are made larger than the others' so that Vcr and
and
primarily. The other control
Vcr_b can control
signals, which are generated by the core DLL, make only small
and
for the fine-tuning of
.
adjustments to
Vcp and Vcp_b are used to align the rising edges of Ref-CLK
and CLK7. Vduty and Vduty_b are responsible for maintaining
the correct duty cycle and, thus, aligning the falling edges.

Since the high and low levels alternate in an inverter chain,


duty-cycle control signals must alternate between Vduty and
Vduty_b as well. Therefore, Vduty_b controls DCE0 and DCE3
and Vduty controls DCE1 and DCE2, as shown in Fig. 9(b).
In the delay circuit, either Vduty or Vduty_b changes the duty
cycle of the clock outputs by adjusting the current ratio of
to
. With this mechanism, the multiphase clock
outputs, CLK0 CLK7, will be duty-cycle corrected and
equally spaced. There is no need to attach a DCC circuit in
each clock output.
B. Dynamic Phase Detector
Since the tuning precision of the core DLL depends on the
characteristics of the phase detector, we propose a new high-precision dynamic phase detector. Fig. 10(a) shows the circuit diagram of the proposed dynamic phase detector, which is improved from the published phase-frequency detector [8] by removing a feedback path and replacing the feedback input with
an REF and DCLK signal. The phase detector can operate with
less phase offset at high frequencies due to symmetry of circuit,
shallow logic depth of only two gates, and fast operation with
a dynamic logic circuit. While the widths of UP and DOWN
pulses are proportional to the phase difference of the inputs as
shown in Fig. 10(b), there remains a chain of short pulses in the
locked state. These pulses in the locked state serve to reduce
the dead zone of the phase detector [8]. However, the accuracy
of the phase detector is improved when the pulse duration is
shorter. Furthermore, smaller capacitor in the loop filter can be
used since the amount of pumped charge is smaller compared

382

Fig. 12.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

Clock waveforms at 62.5 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.

with a conventional bang-bang type of phase detector or a proportional phase detector with wider pulse width.
V. EXPERIMENTAL RESULTS
The test chip has been fabricated using a 0.35-m, N-well,
triple-metal CMOS process. The threshold voltages in this
process are 0.42 V (NMOS) and 0.22 V (PMOS). The
gate-oxide thickness is 75 nm. Fig. 11 shows a microphotograph of the fabricated chip. The chip integrates the DLL with
an on-chip decoupling capacitance of 270 pF. The active area
of the DLL occupies 0.08 mm2 and the decoupling capacitor
0.12 mm2. Since the pulse currents of the multiphase clock
outputs are interspersed, the ac component of the supply
current is present at the eighth harmonic frequencies of the
clock. Therefore, the 270-pF on-chip capacitor is adequate to
reduce the on-chip supply noise induced by switching of digital
circuits.
The prototype chip operates from 62.5 to 250 MHz with a
3.3-V power supply. Fig. 12(a) shows the waveforms of CLK0
and CLK2 at 62.5 MHz. These clock outputs are the first and
the third clocks, respectively, and have a 90 phase difference.
Fig. 12(b) shows the waveforms of CLK0 and CLK4, which
are an inversion of each other with a 180 phase difference.
Fig. 13(a) and (b) shows the same waveforms at 250 MHz. In
spite of some ringing due to capacitance and inductance of the
board and measurement instrument, the measurement results

show that the clock outputs are aligned with precise phase relationships of less than 1% error over an operating frequency
range from 62.5 to 250 MHz. The delay range of the VCDL is
estimated to be between 4 and 16 ns. With minor change of device sizes of the VCDL, the operating frequency range could be
extended toward a higher frequency range.
Fig. 14(a) and (b) shows the jitter histograms in the clock
output CLK7. The frequency of Ref-CLK is 250 MHz. Fig. 14(a)
shows 4-ps rms and 29-ps peak-to-peak jitter characteristics
in a quiet power supply, where only the DLL is activated in
the chip. When other digital circuits are turned on, rms and
peak-to-peak jitter are increased to 6.4 and 44 ps, respectively,
and internal supply noise of about 200 mV is measured. If a
500-mV, 1.1-MHz square wave is injected externally on the
power supply, the peak-to-peak jitter increases to 83 ps, as
shown in Fig. 14(b). At 250 MHz, jitter supply sensitivity is
measured to be only 0.11 ps/mV. Furthermore, from 62.5 to 250
MHz, the clock outputs show almost flat jitter performance.
Since the delay range of the VCDL in the core DLL is primarily
set by Vcr and Vcr_b, the gain of the VCDL is nearly flat over
a wide range of operating frequency. The jitter performance of
the proposed DLL is better than or at least comparable to other
wide-range DLL's [2][7].
Table I summarizes the DLL performance characteristics.
The power dissipation is proportional to the operating frequency. Operating at 250 MHz, the DLL draws 12.6-mA dc
from a 3.3-V power supply.

MOON et al.: ALL-ANALOG MULTIPHASE DLL

Fig. 13.

Clock waveforms at 250 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.

Fig. 14.

Jitter histograms at 250 MHz in (a) a quiet supply and (b) with added 1.1-MHz, 500-mV square wave noise.

383

384

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 3, MARCH 2000

TABLE I
PERFORMANCE CHARACTERISTICS
PROTOTYPE CHIP

OF

VI. CONCLUSION
By including a replica delay line with a CSPD, the core DLL
operates in a wide frequency range from 62.5 to 250 MHz. Since
the replica delay line occupies a quarter of the area of the core
DLL, the area cost and power consumption of the prototype
chip are much smaller than those of other wide-range DLL's
[2][7]. Both the analog-control scheme and the flat gain of
the VCDL offer a low-jitter performance of 4-ps rms and 29-ps
peak-to-peak, and a low supply sensitivity of 0.11 ps/mV. The
DLL incorporates dynamic phase detectors and triply controlled
delay cells with cell-level duty-cycle correction capability in
order to generate equally spaced eight-phase clocks.
The DLL can be used not only as an internal clock buffer of
microprocessors and memory IC's but also as a multiphase clock
generator for gigabit serial interfaces. With a faster VCDL with
minor change of device sizes, the DLL will operate at a higher
and wider frequency range.
REFERENCES
[1] M. Johnson and E. Hudson, A variable delay line PLL for CPU-coprocessor synchronization, IEEE J. Solid-State Circuits, vol. 23, pp.
12181223, Oct. 1988.
[2] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson,
and T. Ishikawa, A 2.5 V CMOS delay-locked loop for an 18 Mbit,
500 Megabyte/s DRAM,, IEEE J. Solid-State Circuits, vol. 29, pp.
14911496, Dec. 1994.
[3] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, Multifrequency
zero-jitter delay-locked loop, IEEE J. Solid-State Circuits, vol. 29, pp.
6770, Jan. 1994.
[4] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang,
C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Leen, and M. A.
Horowitz, A Portable Digital DLL for High-Speed CMOS Interface
Circuits, IEEE J. Solid-State Circuits, vol. 34, pp. 632644, May 1999.
[5] S. Tanoi, T. Tanabe, K. Takahashi, S. Miyamoto, and M. Uesugi, A
250622 MHz deskew and jitter-suppressed clock buffer using two-loop
architecture, IEEE J. Solid-State Circuits, vol. 31, pp. 487493, Apr.
1996.
[6] K. Lee, Y. Moon, and D.-K. Jeong, Dual loop delay-locked loop,, U.S.
patent pending.
[7] S. Sidiropoulos and M. A. Horowitz, A semi-digital dual delay-locked
loop, IEEE J. Solid-State Circuits, vol. 32, pp. 16831692, Nov. 1997.
[8] S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim, A 960Mb/s/pin interface for skew-tolerant bus using low jitter PLL, IEEE J.
Solid-State Circuits, vol. 32, pp. 691700, May 1997.
[9] D.-L. Chen and M. O. Baker, A 1.25 Gb/s, 460 mW CMOS transceiver
for serial data communication, in IEEE ISSCC Dig. Tech. Papers, Feb.
1997, pp. 242243.

[10] Y. Moon, D.-K. Jeong, and G. Kim, Clock dithering for electromagnetic
compliance using spread spectrum phase modulation, in IEEE ISSCC
Dig. Tech. Papers, Feb. 1999, pp. 186187.
[11] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, A 62.5250
MHz multi-phase delay-locked loop using a replica delay line with triply
controlled delay cells, in Proc. IEEE Custom Integrated Circuits Conf.,
May 1999, pp. 299302.
[12] B. Kim, High speed clock recovery in VLSI using hybrid analog/digital
techniques, Ph.D. dissertation, Univ. of California, Berkeley, Memo.
UCB/ERL M90/50, June 1990.
[13] A. Hajimiri, S. Limotyrakis, and T. H. Lee, Jitter and phase noise in
ring oscillators, IEEE J. Solid-State Circuits, vol. 34, pp. 790804, June
1999.
[14] M. Bazes, A novel precision MOS synchronous delay line, IEEE J.
Solid-State Circuits, vol. SC-20, pp. 12651271, Dec. 1985.
[15] D.-K. Jeong, G. Borriello, D. A. Hodges, and R. H. Katz, Design of
PLL-based clock generation circuits, IEEE J. Solid-State Circuits, vol.
SC-22, pp. 255261, Apr. 1987.

Yongsam Moon (S'96) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1994 and 1996, respectively, where he is currently
pursuing the Ph.D. degree.
He has been working on architectures and CMOS circuits for microprocessors. His current research interests include clock and data recovery for highspeed communication and high-speed I/O interface circuits.

Jongsang Choi was born in Korea on September 11, 1974. He received the B.S.
and M.S. degrees in electronics engineering from Seoul National University,
Seoul, Korea, in 1997 and 1999, respectively, where he is currently pursuing
the Ph.D. degree.
He has been working on architectures and CMOS circuits for high-speed communication. His current research interests include high-speed CMOS circuits
and gigabit network systems.

Kyeongho Lee (S'92M00) was born in Seoul, Korea, on August 5, 1969. He


received the B.S., M.S., and Ph.D. degrees in electronics engineering from Seoul
National University, Seoul, Korea, in 1993, 1995, and 2000, respectively.
Since 2000 he has been with Global Communication Technology, Inc., Los
Altos, CA. He is working on various CMOS high-speed circuits for RF communication. His research interests include high-speed CMOS circuits and PLL
systems.

Deog-Kyoon Jeong (S'87M'89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul Korea, in 1981 and
1984, respectively., and the Ph.D. degree in electrical engineering and computer
sciences from the University of California at Berkeley, Berkeley, CA, in 1989.
From 1989 to 1991, he was with Texas Instruments Incorporated, Dallas, TX,,
where he was a Member oif the Technical Staff. He worked on modeling and
design of BiCMOS circuits and single-chip implementation of the SPARC architecture. Since 1991, he has been on the Faculty of the School of Electrical Engineering, Seoul National University, Seoul, Korea, as an Associate Professor.
His research interests include high-speed circuits, microrocessor architectures,
and memory systems.

Min-Kyu Kim was born in Seoul, Korea, in 1965. He received the B.S., M.S.,
and Ph.D. degrees in electronics engineering from Seoul National University,
Seoul, Korea, in 1988, 1990, and 1998, respectively.
From 1995 to 1996, he was with the Electronics and Telecommunications
Research Institute, Taejon, Korea, working on the development of high-speed
communication IC's for ATM switches. Since 1998, he has been working on
high-speed serial link technologies at Silicon Image, Inc., Cupertino, CA. His
current interests include circuit design for high-speed communication systems
and digital-interface display systems.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

417

CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock


Synthesizer and Temperature-Compensated
Tunable Oscillator
David J. Foley, Student Member, IEEE, and Michael P. Flynn, Senior Member, IEEE

AbstractThis paper describes a low-voltage low-jitter clock


synthesizer and a temperature-compensated tunable oscillator.
Both of these circuits employ a self-correcting delay-locked loop
(DLL) which solves the problem of false locking associated with
conventional DLLs. This DLL does not require the delay control
voltage to be set on power-up; it can recover from missing reference clock pulses and, because the delay range is not restricted,
it can accommodate a variable reference clock frequency. The
DLL provides multiple clock phases that are combined to produce
the desired output frequency for the synthesizer, and provides
temperature-compensated biasing for the tunable oscillator. With
a 2-V supply the measured rms jitter for the 1-GHz synthesizer
output was 3.2 ps. With a 3.3-V supply, rms jitter of 3.1 ps was
measured for a 1.6-GHz output. The tunable oscillator has a 1.8%
frequency variation over an ambient temperature range from
0 C to 85 C. The circuits were fabricated on a generic 0.5- m
digital CMOS process.
Index TermsCMOS analog integrated circuits, delay-locked
loops, frequency synthesizers, tunable oscillators, voltage controlled oscillators.

I. INTRODUCTION

RADITIONALLY, phase-locked loops (PLLs) have been


used for clock synthesis. The synthesizer and tunable
oscillator outlined in this paper employ a delay-locked loop
(DLL). A DLL is more stable than higher order PLLs and
requires only one capacitor in its first-order loop filter. On
the other hand, a PLL generally requires a more complex
second-order filter. This filter usually employs larger components which may need to be off chip. Additionally, a DLL
offers better jitter performance than a PLL because phase errors
induced by supply or substrate noise do not accumulate over
many clock cycles [1].
The self-correcting DLL overcomes problems of false
locking associated with conventional DLLs. A self-correcting
circuit detects when the DLL is locked, or is attempting to lock,
to an incorrect delay and then brings the DLL into a correct
locked state. This DLL does not require the delay control
voltage to be set on power-up; it can recover from missing
reference clock pulses and, because the delay range is not
restricted, it can accommodate a variable reference clock frequency. This paper describes how a small number of additional
Manuscript received July 19, 2000; revised October 24, 2000. This work was
supported by Parthus Technologies.
D. J. Foley is with the Department of Microelectronics, National University
of Ireland, Cork, Ireland.
M. P. Flynn is with Parthus Technologies, Cork, Ireland.
Publisher Item Identifier S 0018-9200(01)01483-4.

digital logic gates are required to convert a conventional DLL


into a wider range self-correcting DLL. For comparison, in [2]
a second DLL is added to achieve wider range operation.
The synthesizer outlined in this paper operates over a wide
range of input reference clock frequencies and generates a lowjitter output clock running at nine times the reference frequency.
Jitter measurements of 3.2 ps rms and 20 ps peak-to-peak, for
a 2-V supply and 1-GHz output frequency, show that the core
DLL compares well with recently reported DLLs [2], [3]. Multiple clock phases from the DLL are combined using digital
logic to produce the synthesizer output [4]. An alternative approach requiring a pair of on-chip tuned LC-tanks is described
in [5].
The tunable voltage-controlled oscillator (VCO) is intended
for use in a transceiver where the receive and transmit clocks
are plesiochronous. It is possible to tune the VCO around a
center frequency while still maintaining good temperature independence. In some applications it may also act as a replacement
for a fractional-N-type synthesizer. This circuit is similar to the
oscillator described in [6] but it uses a lower jitter DLL in place
of the PLL and can operate over a wider frequency range.
In Section II the DLL architecture is discussed, starting with
a review of a conventional DLL and progressing to the new
self-correcting architecture. Section III outlines the clock synthesizer architecture. This is followed in Section IV by an outline of the temperature-compensated tunable oscillator architecture. Section V discusses the circuit layout and Section VI
introduces measured performance results for the two circuits.
This paper then concludes in Section VII with a summary of the
achievements of this work.
II. DLL ARCHITECTURE
A. Conventional DLL
A simplified block diagram of a conventional DLL is illustrated in Fig. 1. This circuit contains a voltage-controlled delay
line (VCDL), a phase detector, a charge pump, and a first-order
loop filter. The delay line, consisting of cascaded variable delay
stages, is driven by the input reference clock, ckref. The output
of the delay lines final stage and the ckref falling edges are
compared by the phase detector to determine the phase alignment error. The phase detector output is integrated by the charge
pump and loop filter capacitor to generate the control voltage,
vcntl, of the delay stages.
When correctly locked, the total delay of the delay line
should equal one period of the reference clock. A conventional

00189200/01$10.00 2001 IEEE

418

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

Fig. 1. Conventional DLL architecture.


Fig. 3.

Fig. 2. (a) Three-stage VCDL. (b) Waveforms with correct lock. (c)
Waveforms with false lock.

DLL may lock or attempt to lock to an incorrect delay. In Fig. 2


we show correct and false locking for a three-stage delay line
[Fig. 2(a)]. Fig. 2(b) shows the output phases at each stage
, and
with the delay line in correct lock. The DLL
and ckref. The total delay is one
control loop has aligned
and ckref are
period of the reference clock. In Fig. 2(c)
again aligned but the total delay is two clock periods. The DLL
can also falsely lock to three or more periods of delay or can
attempt to lock to zero delay.
B. Self-Correcting DLL Architecture
Fig. 3 shows a block diagram of the new self-correcting DLL.
The problem of false locking is solved by the addition of a lockdetect circuit and by some slight modifications to the conventional phase detector. The DLL incorporated in the two designs
reported in this paper employs a nine-stage VCDL as shown in
Fig. 3.
In a conventional DLL, only the state of the output of the
last delay element is used. From the example in Fig. 2, we can
see that the state at the outputs of the other delay elements can

Self-correcting DLL architecture.

provide additional information about the nature of the locked


delay. In the prototype the delayed phases, (1:9), are decoded
to indicate the VCDL delay. If the delay is outside an acceptable delay range then the lock-detect circuit takes control of the
loop from the phase detector. The lock-detect circuit signals the
charge pump to charge or discharge the filter capacitor until it
is safe for the phase detector to regain control of the loop.
Three control signals are produced by the lock-detect circuit:
over to indicate that the VCDL delay is greater than 1.5 reference clock periods, under to indicate that the delay is less
than 0.75 clock periods, and release is activated when the delay
reaches 1.25 clock periods. The release signal clears the over
and under control signals and removes the phase detector from
reset. The phase detector then regains control of the loop. If neither under nor over is active then the phase detector has control
of the loop and the DLL is either in correct lock or approaching
correct lock.
If the DLL is in lock and it is brought out of lock because
of missing reference clock pulses or a step in the input reference frequency, then the DLL may inadvertently try to lock to
an incorrect delay. The DLL is allowed to attempt to reach the
undesired lock delay until it triggers either an over or an under
signal at which time the lock-detect circuit takes control of the
DLL loop.
C. Lock-Detect Circuit
The VCDL output phases are first level shifted to CMOS
levels. The level shift circuitry is designed to have high gain
and fast rise and fall times. This helps to minimize any jitter
contribution from this circuitry. The level-shifted output phases,
(1:9), are latched on the rising edge of the reference clock. The
outputs from these latches are processed by the decode circuitry
as shown in the schematic of Fig. 4. The inputs, (1:8), correspond to the (1:8) output phases of the VCDL. Fig. 5 shows
example output waveforms for a nine-stage VCDL. In Fig. 5(a)
when the state of the VCDL output phases is decoded none of
the control signals are activated as the VCDL is correctly locked
to one period of the reference clock. In Fig. 5(b) the VCDL is
incorrectly locked to two periods of the reference clock and the
state of the output phases is decoded to activate the over control
signal.
The phase detector outputs, up and dn, signal the charge pump
to charge or discharge the filter capacitor. An active over output

FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR

419

Fig. 6. VCDL delay stage schematic.

Fig. 4.

Lock-detect decode circuitry.

Fig. 7.

(a)

(b)

Fig. 5. Nine-stage VCDL waveforms with (a) correct lock and (b) false lock.

from the lock-detect circuit disables the phase detector and activates the up control signal. Similarly, the lock detect under
output activates dn. Following power-on reset the lock-detect
circuit is initialized by setting over active. This ensures a faster
acquisition time for the DLL because the filter capacitor is continuously charged to a voltage level corresponding to 1.25 reference clock periods. At this VCDL delay, the release signal is
activated and the phase detector gains control of the loop and
brings the DLL to lock. The state of the output phases corresponding to a delay of nine reference clock periods is the same
as that corresponding to a single reference clock period delay.
This circuitry is therefore only capable of detecting incorrect
delays up to eight periods of the reference clock. This is not a
limitation of the design as any delays above this would be outside the delay range of the VCDL. In general, the error detection
periods of
logic can detect an incorrect lock delay up to
the reference clock, where is equal to the number of VCDL
output phases.
D. Voltage-Controlled Delay Line (VCDL)
Fig. 6 shows one of the VCDL delay stages. The stage is
designed to operate from a supply as low as 1.8 V and is similar
to that used in [7]. The stage propagation delay is proportional
to the tail current for the output charging and to the voltagecontrolled resistor (VCR) resistance for the output discharging.

Phase detector schematic.

A three-transistor VCR structure is adopted for better control


linearity. The DLL negative feedback control loop compensates
for variations in the stage delay due to process and temperature.
The differential delay stage structure and coupling capacitors
between bias lines and supply help to minimize supply-induced
jitter noise.
E. Charge Pump
The charge pump charges or discharges the filter capacitor.
The voltage on this capacitor, vcntl, sets the VCDL stage
propagation delay. To minimize the temperature variation of
the VCDL delay, the charging and discharging currents are
proportional to absolute temperature. This helps to maintain a
constant loop gain and phase margin over temperature.
F. Phase Detector
The phase detector, shown in Fig. 7, employs the conventional
sequential-phase-frequency detection scheme [7] but extra gates
have been included. This extra logic enables the lock-detect circuit to over-ride the phase detector control of the loop. The lockdetect output signals, over and under, now have direct control
of the charge pump. The lock-detect circuit can therefore charge
or discharge the VCDL control voltage, vcntl, to a voltage from
which it is safe for the phase detector to regain control of the
loop.
III. CLOCK SYNTHESIZER ARCHITECTURE
The clock synthesizer generates a differential output clock
running at nine times the input reference frequency. The clock

420

Fig. 8.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

Clock synthesis waveforms.

Fig. 10. 1.62-GHz clock generation schematic.

Fig. 9.

Optimized AND-OR block diagram.

synthesizer employs the DLL structure shown in Fig. 3 to generate the multiple clock phases that are then combined to produce the output clock. There are two steps in the generation of
the output clock. The first step combines the nine DLL output
phases, (1:9), to generate three clocks ck1, ck2, and ck3. Fig. 8
shows the clock waveforms. These three clocks are phase separated by one-ninth of a reference clock period and have a frequency three times that of the reference clock. Fig. 9 shows how
the 1, 4, and 7 output phases are combined in an optimized
AND-OR structure with symmetrical delays to generate the ck1
clock. Using identical logic the 2, 5, and 8 phases produce
the ck2 clock and the 3, 6, and 9 phases produce the ck3
clock.
The second step in generating the synthesizer output clock
is to combine these three clocks in another AND-OR structure
and
, running at
to produce a differential output clock,
nine times the reference clock frequency; see Fig. 8. This design
produces a 1.62-GHz output clock frequency for a 180-MHz
reference clock frequency. For a 0.5- m 3.3-V CMOS process
there is a bandwidth limitation of approximately 500 MHz for
reliable on-chip clock transmission [8]. The high bandwidth
available at the chip outputs is utilized (determined by the external pull-up resistor and load capacitance) [8] to produce the
1.62-GHz clock as shown in Fig. 10. The AND function of the
clock generation is performed in the chip core and the analog OR
function is performed in the I/O ring. External pull-up resistors
set the output swing and match the output impedance to that of
the test equipment. Damping resistors are included to avoid any
oscillations resulting from the combination of the lead and pin
inductance and load capacitance. This removes the necessity to
double bond these high-frequency outputs.

Fig. 11.

Tunable VCO architecture.

Fig. 12.

Tunable VCO stage block diagram.

IV. TEMPERATURE-COMPENSATED TUNABLE VCO


ARCHITECTURE
The temperature-compensated oscillator utilizes the control
loop voltage, vcntl, of the DLL (Fig. 3) to compensate for any
temperature and supply voltage induced frequency fluctuation
in a VCO. Fig. 11 shows how the VCO and VCDL stages are
both connected to vcntl. (For ease of illustration a conventional
DLL is shown in Fig. 11 but in practice the new DLL architecture of Fig. 3 is employed). The VCDL in the DLL tracks temperature and process variations in the VCO circuit. The VCO is

FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR

Fig. 13.

Fig. 14.

421

Fig. 15.

Variation of measured jitter over output frequency.

Fig. 16.

720-MHz synthesizer output for V

Fig. 17.

VCO frequency variation with temperature.

Fig. 18.

VCO frequency variation with tune voltage.

Die photo of the synthesizer and tunable oscillator.

= 1:8 V.

1.62-GHz synthesizer edge jitter histogram.

composed of the same delay stages as the VCDL and its temperature (and process) variations will therefore be the same (apart
from some minor random mismatch effects and thermal gradients across the die). vcntl thus compensates for the VCDL and
VCO temperature fluctuations. The last VCO stage has an additional tuning voltage, tune, which fine tunes the VCO frequency.
By varying the tune voltage it is possible to tune the VCO center
frequency to within 3%. A wider tuning range can be achieved
by varying the frequency of the DLL reference clock, ckref.
The schematic of the last VCO stage is shown in Fig. 12. This
stage is identical to the other VCO and VCDL stages except that
the VCR contains a transistor which is connected to the external
tune voltage. In all other stages this transistor is connected to
ground. The extra charging current required in this VCO stage
.
is provided by the controlled current source bias

ature-compensated tunable oscillator has an active area of


0.7 mm .

V. CIRCUIT LAYOUT
The synthesizer and temperature-compensated tunable
oscillator were fabricated on a standard 0.5- m triple-metal
single-poly digital CMOS process. The die photomicrograph
of the device, containing both the synthesizer and temperature-compensated tunable oscillator, is shown in Fig. 13. The
synthesizer has an active area of 0.6 mm and the temper-

VI. TEST RESULTS


Fig. 14 shows a histogram of the edge jitter on the 1.62-GHz
synthesizer output clock for a supply of 3.3 V. Edge jitter of
3.1 ps rms and 20 ps peak-to-peak were measured. The jitter
measurements of 3.2 ps rms and 20 ps peak-to-peak, for a 2-V
supply and 1-GHz output frequency, show that the DLL core ex-

422

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 3, MARCH 2001

TABLE I
MEASURED SYNTHESIZER CHARACTERISTICS

the tune voltage. As can be seen from the plot, the relationship
is close to linear. It is possible to tune the frequency around a
center frequency in the range from 200 to 500 MHz by selecting
an appropriate input reference frequency. This ensures that this
scheme can be used for a wide variety of applications. The measured jitter on the 400-MHz output was 29 ps rms and 180 ps
peak-to-peak. Table I shows the measured synthesizer characteristics. Table II summarizes the measured characteristics of
the temperature-compensated tunable oscillator.
VII. CONCLUSION
In this paper, a robust self-correcting low-jitter DLL was used
as the basis for a low-voltage high-frequency synthesizer and a
temperature-compensated tunable oscillator. The DLL does not
require the VCDL control voltage to be set on power-up. The
DLL can recover from missing reference clock pulses and it
can track step changes in a variable reference clock frequency.
The synthesizer has significantly lower edge jitter than the traditional PLL-type synthesizer [9] and other reported DLL circuits
[10], [11]. The temperature-compensated tunable oscillator provides a temperature-stable tunable frequency that varies by just
1.8% over the 0 C to 85 C temperature range.

TABLE II
MEASURED TUNABLE OSCILLATOR CHARACTERISTICS

ACKNOWLEDGMENT
The authors wish to acknowledge contributions from the
following Parthus Technologies employees: J. Ryan, J. Horan,
C. Cahill, F. Fuster, J. Collins, B. Kinsella, M. Erett, and
S. Murphy. The authors also wish to thank R. Fitzgerald from
the NMRC for the die photo micrographs. The device was fabricated on the ESM (Newport) Wafer Fab through Europractice.
REFERENCES

hibits better jitter performance than that reported for the higher
voltage DLLs (3.3-V supply, 0.35- m CMOS, 4-ps rms jitter) in
[2] and (5-V supply, 0.7- m CMOS, 10-ps rms jitter) in [3]. The
measured jitter (rms) variation versus synthesizer output frequency for a 3.3-V supply is shown in Fig. 15. With the supply
reduced to 1.8 V, the rms jitter was measured at 4.9 ps for an
output frequency of 720 MHz. Fig. 16 shows this 720-MHz synthesizer output. Mismatched propagation delays and interblock
routing in the frequency multiplication block (Fig. 9) resulted
in 100-ps interperiod jitter.
Fig. 17 shows the temperature-compensated tunable oscillator frequency variation with temperature. Varying the ambient
temperature from 0 C to 85 C resulted in a total frequency
variation of 1.8%. Fig. 18 shows the variation of frequency with

[1] B. Kim, T. C. Weingandt, and P. R. Gray, PLL/DLL system noise analysis for low-jitter clock synthesizer design, in Proc. ISCAS, June 1994,
pp. 151154.
[2] Y. Moon, J. Choi, K. Lee, D. Jeong, and M. Kim, An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low jitter, IEEE J. Solid-State Circuits, vol. 35, pp. 377384,
Mar. 2000.
[3] M. Mota and J. Christiansen, A high-resolution time interpolator based
on a delay-locked loop and an RC delay line, IEEE J. Solid-State Circuits, vol. 34, pp. 13601366, Oct. 1999.
[4] D. Foley and M. Flynn, CMOS DLL-based 2-V 3.2-ps jitter 1-GHz
clock synthesizer and temperature compensated tunable oscillator, in
Proc. IEEE Custom Integrated Circuits Conf., May 2000, pp. 371374.
[5] G. Chien and P. R. Gray, A 900-MHz local oscillator using a DLL-based
frequency multiplier technique for PCS applications, in ISSCC Dig.
Tech. Papers, Feb. 2000, pp. 202203.
[6] H. Chen, E. Lee, and R. Geiger, A 2-GHz VCO with process and temperature compensation, in Proc. ISCAS, June 1999, pp. 11 56911 572.
[7] A. Young, J. K. Greason, and K. L. Wong, A PLL clock generator with
5 to 110 MHz of lock range for microprocessors, IEEE J. Solid-State
Circuits, vol. SC-27, pp. 15991607, Nov. 1992.
[8] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, High-speed electrical
signaling: Overview and limitations, IEEE Micro., vol. 18, pp. 1224,
Jan./Feb. 1998.
[9] H. C. Yang, L. K. Lee, and R. S. Co, A low-jitter 0.3-165 MHz CMOS
PLL synthesizer for 3-V/5-V operation, IEEE J. Solid-State Circuits,
vol. 32, pp. 582586, Apr. 1997.
[10] J. G. Maneatis, Low-jitter process-independent DLL and PLL based
on self-biased techniques, IEEE J. Solid-State Circuits, vol. 31, pp.
17231732, Nov. 1996.
[11] S. Sidiropoulos and M. A. Horowitz, A semidigital dual delay-locked
loop, IEEE J. Solid-State Circuits, vol. 32, pp. 16831692, Nov. 1997.

FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR

David J. Foley (S00) received the B.Eng. degree


from the National University of Ireland, Limerick, in
June 1988. In 1994 he received the M.Eng.Sc. degree
from the National University of Ireland, Cork, where
he is currently working toward the Ph.D. degree.
He has worked in IC design with NEC Corporation, Tamagawa, Japan, from 1988 to 1990, AT&T
Bell Labs, Tokyo, Japan, from 1990 to 1992, and
Parthus Technologies, Dublin, Ireland, from 1994 to
1998.

423

Michael P. Flynn (S92M95SM98) was born in


Cork, Ireland. He received the B.E. and M.Eng.Sc.
degrees from the National University of Ireland,
Cork, in 1988 and 1990, respectively. He received
the Ph.D. degree in electrical engineering from
Carnegie Mellon University, Pittsburg, PA, in 1995.
From 1998 to 1991, he was with the National
Microelectronics Research Center, Cork. He was
a Co-op Engineer with National Semiconductor in
Santa Clara, CA, from 1993 to 1995. From 1995
to 1997, he was a Member of Technical Staff with
Texas Instruments DSPS R&D Lab, Dallas, TX. He is now a Technical
Director with Parthus Technologies, Cork. He is also a part-time Lecturer in the
Department of Microelectronics at the National University of Ireland, Cork.
Dr. Flynn received the 19921993 IEEE Solid-State Circuit Predoctoral Fellowship. He is a member of Sigma Xi.

ISSCC 2002 / SESSION 4 / BACKPLANE INTERCONNECTED ICs / 4.1

4.1

A 1.5V 86mW/ch 8-Channel 6223125Mb/s/ch


CMOS SerDes Macrocell with Selectable
Mux/Demux Ratio

Fuji Yang, Jay ONeill, Patrik Larsson, Dave Inglis, Joe Othmer
Agere Systems, Holmdel, NJ
2.5-3.125Gb/s serial links are commonly used for chip-to-chip
interconnects in high-speed network systems. In SONET OC-768
application, at least 16 on-chip SerDes transceivers are required
to guarantee total full duplex I/O throughput of 40Gb/s.
Published 2.5Gb/s SerDes transceivers consume between 150
and 200mW, not suitable for applications requiring hundreds of
on-chip SerDes transceivers [1]. Developing a low-power SerDes
transceiver is important for high throughput network ICs [2].
Another challenge is reduction of inter-channel noise coupling
when integrating many transceivers on the same chip. This lowpower 8-channel SerDes macrocell employs a shared-PLL architecture. As shown in Figure 4.1.1, on the transmitter side, the
on-chip TxPLL provides a half-rate clock to all transmitters. On
the receiver side, the RxPLL distributes I- and Q-phase clocks to
8 receivers. Each receiver has a phase interpolator to generate
an output phase-aligned with the in-coming data for clock and
data recovery. Sharing a single PLL between a group of transmitters or receivers reduces the power and avoids the potential
multi-VCO coupling problem found in a conventional one-PLLper-channel configuration. The macrocell realized in a 0.16m
CMOS process consumes an average power of 86mW per channel
at 1.5V power supply.
The transmitter 16:1 or 20:1 serialization starts with 4 shift-register based selectable 4:1 or 5:1 multiplexers. Their 4 outputs are
sent to a tree-based 4:1 multiplexer (Figure 4.1.1). A pMOS CML
output driver with on-chip 50 terminations is employed. The
output signal referenced to the ground makes the interface independent of the power supply. The output amplitude is set to
1Vpp, diff.
The receiver employs an interleaved integrate-and-dump frontend (Figure 4.1.1) [3, 4]. The integrate-and-dump operation
improves the SNR and eliminates the quadrature clock required
in a conventional half-rate front-end [5]. The integrator outputs
are de-multiplexed by the decision-latches controlled respectively by ck2i and ck2q, which are divide-by-2 clocks of the recovered
clock. The decision-latch outputs d1-d4 are fed into 4 shift-registers to realize the 4:16 or 4:20 de-serialization. The integrator is
implemented in a way similar way to that proposed in Reference
[4], but with a pMOS input stage. It has a gain of 2 allowing
relaxed offset and noise requirements of the latches. The receiver achieves 30mVpp,diff sensitivity with BER <10-12 at 2.5Gb/s.
The clock recovery is by a DLL based on an analog phase interpolator [6]. In contrast to the implementation in Reference [6], a
four-quadrant phase mixer is used here. Referring to Figure
4.1.2, the DLL consists of a bang-bang phase detector (PD), a PD
polarity control circuit, an amplitude control circuit, I- and
Q-charge-pumps and the four-quadrant mixer-based phase
interpolator. The analog phase interpolation is by mixing the Iand Q-phase clocks from the RxPLL with respective weights
(=Va-Vref) and (=Vb-Vref): CLK=*(I-IB)+*(Q-QB). Va and Vb are
independently generated by I- and Q-charge-pumps. The weights
and , ranging from negative to positive, directly control the
quadrant changes. This eliminates the potential phase discontinuity at quadrant crossings found in the circuit of Reference [6].
Figure 4.1.3 shows the schematic of one 4-quadrant mixer, where

2002 IEEE International Solid-State Circuits Conference

the nMOS differential pair converts Va to differential currents Ip


and In, which are mirrored into the pMOS current sources to be
steered by the high-speed differential clock (I-IB). A self-biased
nMOS load is used with MP1 and MP2 to control the output common-mode voltage.
The phase interpolator exhibits an infinite phase shift range
allowing the DLL to easily track the frequency offset between the
local clock and the incoming data and enables shared-PLL architecture for multi-channel serial links with plesiochronous clocking.
Figure 4.1.4 illustrates the non-monotonic relation between the
phase shift introduced by the interpolator and the two weights
and . To have a 2 interpolation range, the bang-bang phase
detector polarity must be updated to provide the correct up/down
signals for different quadrants. This is by a PD-polarity-control
circuit in association with a Q-detect circuit. The Q-detect circuit
detects the output vector quadrant by determining the sign of
and . The Q-detect circuit uses the replica of the V-I converter
in the phase mixer.
Although the phase mixer has control weights and , the phase
interpolation is only a function of /, and is independent of the
amplitude of and . The loop, sensitive only to the phase variation, thus controls /. As a consequence, and can grow or
shrink arbitrarily. To prevent and from being too small, an
offset current is intentionally introduced in the charge-pumps. It
is controlled as follows: If >0, Iup = I0 + Ioffset and if <0, Idown = I0
+ Ioffset (the same algorithm is applied for Q-charge-pump). As the
result, and are always pulled away from zero to eliminate
any shrinking possibility. To prevent overflow on and , the
amplitude control circuit clips and by blocking UP or DOWN
signal. As shown in Figure 4.1.4, Va or Vb will be kept within
[Vmin, Vmax].
The test chip in a 0.16m 5-level metal CMOS technology uses a
217-pin PBGA package. The chip micrograph is shown in Figure
4.1.5. Active area is about 2mm2. Figure 4.1.6a shows the measured jitter tolerance of the receiver. The CDR works with VDD
as low as 1V for 1Gb/s maximal input data rate. With 1.5V power
supply, the receiver covers an input data rate range of 622 to
3125Mb/s. Measured recovered clock jitter is 87.1ps pp at
2.5Gb/s. Figure 4.1.6b shows the Tx output eye diagram measured at 3.2Gb/s with a 231-1 PRWS. The measured jitter is
57.8ps pp and static VDD sensitivity is 0.06ps/mV. Measured
results are summarized in Figure 4.1.7.
References:
[1] R. Gu et al A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS
Transceiver, ISSCC Digest of Technical Papers, pp. 352-353, Feb. 1999.
[2] M-J. Lee et al., An 84mW 4Gb/s clock and data recovery circuit for serial link applications VLSI Symposium 2001, pp. 149-152, 2001.
[3] S. Sidiropoulos et al A 700Mb/s/pin CMOS signaling interface using a
current integrating receivers IEEE JSSC, vol. 32, no. 5, pp. 681-690, May
1997.
[4] J. Savoj et al., A CMOS Interface Circuit for Detection of 1.2Gb/s RZ
Data ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.
[5] P. Larsson, An Offset-Cancelled CMOS Clock Recovery/Demux with
Half-Rate Linear Phase Detector for 2.5Gb/s Optical Communication
ISSCC Digest of Technical Papers, pp. 74-75, Feb. 2001.
[6] T. Lee et al., A 2.5V CMOS delay-locked loop for an 18Mb, 500Mb/s
DRAM IEEE JSSC, vol. 9, no. 2, Dec. 1994

0-7803-7335-9

2002 IEEE

ISSCC 2002 / February 4, 2002 / Salon 8 / 1:30 PM


9E

9D
5[3//

7[ 3//
5;

FK 

LQWHUSFNJHQ

3'
FN

FNL

LQW


GULYHU

FNL
'
'

LQW







'

G
G
G
G

FNT

FNT

,QSXW
GDWD
&/.



'

FORFNJHQ

FK 
FK 
FK 

GG

XS
GQ

$PSOLWXGHFRQWURO

FK 
FK 
FK 

XSL
3'SRODULW\FRQWURO

FK 

9UHI

4GHWHFW

3KDVHGHWHFWRU

7;

&3L

GQL

,%

9D

9UHI

XST

9E

&3T

GQT
9PD[

'

$GHWHFW

4% 4

9PLQ
9D

Figure 4.1.1: Overall architecture.

9E

Figure 4.1.2: DLL block diagram.

,S

,Q

9D

2XWSXWYHFWRU

,%

9UHI
9RS
03

03

9D9E
,

9UHI

9RQ

,,

,,,

,9

9PD[








(GHJU.)

9PLQ
FOLSLQJ

&RPPRQORDGFLUFXLW
Figure 4.1.3: Four-quadrant mixer schematic.

Figure 4.1.4: Relation between the phase shift and the weights Va and Vb.

5[3//

5;

7;

7[ 3//

Figure 4.1.5: Die micrograph.

2002 IEEE International Solid-State Circuits Conference

Technology:

0.16 CMOS with 5 metal levels

Supply Voltage

1.5V

Power dissipation

75mW per transceiver with Tx output set to 1Vpp,diff


85mW for Tx and Rx PLLs + clock buffers

Active area

2mm2 (PLL : 0.1mm2, single transceiver : 0.25mm2)

BER

< 10 12 (all measurements were done with BER < 10 12 )

Receiver Sensitivity

30mVpp,diff

Recovered clock jitter

87.1psPkPk

Max. offset frequency

400ppm at 2.5Gb/s

Input data rate range

622-3125Mb/s

Transmitters output jitter

57.8ps Pk-Pk at 3.2Gb/s

TxoutputVDD sensitivity

0.06ps/mV

Output Amplitude

1Vpp,diff

Figure 4.1.7: Summary of measured results.

0-7803-7335-9

2002 IEEE

SVGLY

Figure 4.1.6: Measured Rx jitter tolerance and Tx eye diagram at 3.2Gb/s.

2002 IEEE International Solid-State Circuits Conference

0-7803-7335-9

2002 IEEE

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

1987

A Fully Integrated VCO at 2 GHz


Markus Zannoth, Bernd Kolb, Joseph Fenk, and Robert Weigel, Senior Member, IEEE

AbstractA fully integrated voltage-controlled oscillator at a


frequency of 2 GHz with low phase noise has been implemented
in a standard bipolar process with a f t of 25 GHz. The design
is based on an LC-resonator with vertical-coupled inductors.
Only two metal layers have been used. The supply voltage of
the oscillator is 2.7 V. The phase noise is only 0136 dB/Hz at 4.7MHz frequency offset. A tuning range of 150 MHz is achieved
with integrated tuning diodes.
Index Terms Bipolar, fully integrated VCO, noise requirement for cordless phones.

I. INTRODUCTION

ITH the fast growth of the wireless application market, there is a growing need for smaller designs and
higher levels of integration for the reduction of costs and
size. Because of the very poor performance of integrated
resonators on silicon ICs, local RF oscillators are difficult
to integrate with regard to the phase noise requirements.
At the moment, external resonators are used with external
hyperabrupt tuning diodes. The integrated tuning diodes have
low performance because it is not possible to produce a
hyperabrupt pn-junction in our standard bipolar process. The
limiting factor is the inductor of the resonator, which can only
archive quality factors of about four. While the performance
of mobile telecommunication standards like global system
for mobile communication (GSM) or digital communication
system (DSC-1800) requires such high noise performance,
which cannot be reached in our technology with the use
of full integration of the local oscillator, the requirements
for cordless phones like for the Digital European Cordless
Telecommunications (DECT) standard seem to be achievable.
In a DECT system, the critical point concerning oscillator
phase noise is the emissions due to modulation. There the
emitted power at the output in the third adjacent channel is
specified to be smaller than 20 nW, which equals 47 dBm
[1]. With a maximum output power of 25 dBm, there is
a difference of 72 dB. The specified measurement filter to
get this power has a bandwidth of 1 MHz. So the noise
requirement at this point becomes 132 dB/Hz, which results
dBm
from:
dBm
dB Hz
dB Hz . This is the difference
between the noise level and the output-power normalized to
1 Hz. The frequency offset for this specification is the start
frequency of the measurement filter. As this is in the third
Manuscript received April 10, 1998; revised July, 17, 1998.
M. Zannoth, B. Kolb, and J. Fenk are with Siemens AG, Muenchen D81541, Germany.
R. Weigel is with the University of Linz, Institute for Communication and
Information Engineering, Linz A-4040 Austria.
Publisher Item Identifier S 0018-9200(98)08854-4.

Fig. 1. DECT specification.

adjacent channel, three times the channel spacing of 1.728


MHz has to be taken (see Fig. 1). The filter is centered in
the channel, half of the filter-bandwidth of 1 MHz has to be
subtracted to get the offset frequency
MHz
MHz

MHz
MHz

Normalized to a 100-kHz frequency offset, the requirement is


98.6 dB/Hz, if the phase noise has a constant slope of 20
dB/decade, as assumed by [3]. These are the requirements for
the main transmitter-voltage-controlled oscillator (TX-VCO),
when the following blocks are not dominating in noise. This
is indeed fulfilled.
With fully integrated oscillators these requirements seem to
be possible to realize. This paper presents a fully integrated
oscillator, which achieves the required specification. It uses
an LC oscillator consisting of integrated tuning diodes and
integrated vertically coupled inductors. In this design only two
metal layers in a standard bipolar process are used.
II. OSCILLATOR

WITH

COUPLED INDUCTORS

The phase noise in oscillators depends on the quality factor


of the resonator, the noise figure of the amplifier creating a
negative resistance, and the energy in the resonator [3], [4].
For low phase noise, passive resonators are chosen. With active
inductors, noise is added by the active devices that cannot
be compensated by increasing the quality factor. For best
integration and reproducibility spiral inductors are used, as
described in [5]. The quality factor of the resonator is mainly
limited by the inductors. Here it is limited to four, as only two
metal layers are available. One metal layer has a thickness of
0.8 m. With limited chip area for the inductor and the use of

00189200/98$10.00 1998 IEEE

1988

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

Fig. 2. Possibilities of feedback.

vertically coupled inductors, a quality factor of only four was


achievable without changing technology parameters.
The simplest way to reduce phase noise is increasing the
resonator energy by applying higher voltages to the resonator.
In this design an emitter-coupled pair with cross feedback
is used as a negative resistance, which is responsible for
undamping the resonator. The limit of the maximum oscillation
amplitude depends on the feedback. There are three ways of
feeding the output-signal back to the input (see Fig. 2). The
easiest way is direct coupling, where no biasing network is
needed and very low power consumption can be achieved.
Using direct coupling the voltage across the resonator is
limited by the base-collector diode of the transistors. When
forward biased, this diode inserts additional damping and
current noise to the resonator causing increased phase noise.
With capacitive coupling [6] this can be avoided. Here no
resistive element is inserted into the feedback. With capacitive
feedback a phase noise of 100 dB/Hz at 100 kHz could
be achieved by [6], having quality factors of eight. The
disadvantage is the need of a high-impedance biasing network
at the transistors base. This biasing network can be realized
by noisy resistors or by large inductors that cost a lot of chipspace. If resistors are used, uncorrelated noise is introduced
to both halfwaves of the signal, when the oscillator acts in
its linear region. This noise is nearly negligible, when a low
resonator is used. In our case the impedance at the input
of the feedback amplifier is about 500 . This impedance
consists of the feedback capacitor and the tank impedance at
resonance. The Bias resistor in parallel is about 4 k , and so
the effect of adding noise is not very dominant. With inductive
coupling the bias current can be fed through the inductor. This
allows connecting a low-impedance biasing network which can
be made of a voltage source. The advantage of connecting a
voltage source directly to the circuit is the absence of resistive
elements that cause white noise, which would be converted to
phase noise by the nonlinear elements. Every DC path can be
blocked carefully against emissions from the supplies without
any resistive element. The maximum voltage at the resonator
can be adjusted by the biasing voltage so that the base-collector
diode is not the limiting element. Now the amplitude of the
swing is limited by the base emitter diode of the transistors
and the limitation of the current source. The energy in the
resonator can be increased and so the phase noise is reduced.
Now the maximum voltage is not one diode-voltage, as in the

Fig. 3. Equivalent circuit of the tuning diode.

case of direct coupling. In the presented design the resonator


.
voltage reaches a value of 3
The limiting elements for the maximum voltage of the
resonator are now the two serial-connected tuning diodes. To
decrease their voltage without reducing the resonator-energy,
a capacitor is added in series at the cost of tuning range.
This capacitor is also responsible for getting a linear tuning
characteristic. To provide the DC-path for the tuning diodes,
resistors are connected in parallel to the coupling capacitances.
These resistors are negligible in sight of reducing the quality
factor because they have a large value of 1 k , which is much
larger than the capacitances impedance of 40 (see Fig. 6).
The quality factor of the capacitance is about 24, which is
at the same range as the varactor. These quality factors are
negligible high relative to that of the inductor.
The inductors are produced as symmetrical quadratic spirals.
At our standard bipolar process only two metal layers could
be used to create vertically coupled inductors. The crosses
are made in the gap between two metal lines (see Fig. 7). The
cost of this technique is the wide gap between the lines, which
causes an increment of the size and parasitic effects like series
resistance and substrate capacitance. The quality factor is as
low as four. This is caused by the technology, where the metal
layers have a poor conductivity and high capacitances to the
medium-doped substrate. In this design an inductor of 2.7 nH
was used. Its series resistance is 4.2 . The coupling factor
was estimated to 0.85. The values of the equivalent circuit (see
Fig. 8) where first calculated by algorithms from [7] and then
fitted to measurement. The coupling capacitor was estimated
from the plate capacitance of the two metal layers.
For tuning, the base-emitter diode of a transistor is used (see
Fig. 3). This has the disadvantage of a high series resistance
and a relatively low capacitance
(base resistance) of 2.6
variation by a factor of 1.75 applying a voltage difference of
2.7 V (see Fig. 4). However, this represents the only way to
create a tuning diode without changing the standard bipolar
process, where no hyperabrupt pn-junctions are available. The
of this varactor was simulated to be about 25 (see Fig. 5)
when it is calculated from 1/(jwRC). The base-collector diode
could not be used for tuning, because it does not have such a
large capacitance variation.

ZANNOTH et al.: FULLY INTEGRATED VCO

1989

Fig. 7. Simplified layout of the inductor.

Fig. 4. Capacitance of the tuning diode.

Fig. 8. Equivalent circuit of the inductor.

Fig. 5. Quality factor of the tuning diode.

Fig. 9. Output buffer.

resonator, a high impedance is required for minimizing the


effects of the load. The signal is fed through small coupling
capacitances (400 fF) to emitter-followers that provide this
high input impedance. This first stage drives a differential amplifier with open collector outputs. A balun can be connected
that transforms the differential signal to a single ended one
that can be connected to 50 . The current-consumption of
the amplifier is about 6 mA. Its output power is about 8
dBm at 50 , which is enough for noise measurements.
Fig. 6. VCO with coupled inductors.

III. RESULTS
To get the signal into a 50- measurement system a buffer
is added (see Fig. 9). As the signal is taken directly from the

The measured phase noise of the oscillator can be calculated


from expressions by Leeson [3] and has a slope of 20 dB/Dec

1990

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998

SUMMARY
Supply voltage

Measured tuning characteristic.

12 mA

Current of amplifier

6 mA

Output power

08 dBm

Center frequency

1.96 GHz

Tuning constant (KVCO)

055 MHz/V

Tuning range (0 . . . 2.7 V)

150 MHz

Phase noise @ 4.7 MHz

0102 dBc/Hz
0136 dBc/Hz

Suppression of harmonics

23 dB

Size of one inductor


Chip size without Pads

Fig. 11.

2.7 V

Current of oscillator core

Phase noise @ 100 kHz

Fig. 10.

TABLE I
VCO CHARACTERISTICS

OF THE

2 215 m
600 m 2 600 m
215 m

Measured and simulated phase noise.

measured at offset-frequencies between 100 kHz and 50 MHz,


which represent the measurement limits. If the phase noise is
calculated from

[5]
we get a value of 135 dB/Hz at an offset frequency of
4.7 MHz. The effective series resistance is about 16 .
It is taken from the series resistances of the inductor, the
tuning diodes, and the coupling capacitances. The amplitude
(3
)
of the oscillation was simulated to be 1.5
at a resonance frequency of 2 GHz. The noise factor of the
amplifier was supposed to be about two. The expressions from
[3] give a approximation for the expected noise. For better
calculation, nonlinear effects have to be considered. In [12],
such calculations as the correlation between waveform and
phase noise are shown.
The simulation with spectre RF shows nearly the same
values as the measured ones. In Fig. 11 the simulated and
measured noise are shown up to an offset frequency of 6 MHz.
In the simulation the equivalent circuits are taken from above.
The simulator uses nonlinear methods [13], [14] to calculate
noise. It gives nearly the same results as the measurement and
the small signal approximation from above.
The measurement shows that the free running oscillator has
a resonant frequency of 1.96 GHz, a phase noise of 136

Fig. 12. Chip photograph.

dB/Hz at 4.7 MHz offset frequency, and a VCO constant


of 55 MHz/V (see Figs. 10 and 11). The phase noise is
measured in the middle of the tuning range. With a thicker
metal layer (1.2- m aluminum instead of 0.8 m) a phase
noise of even 137 dB/Hz can be achieved. This improvement
is achieved by reducing the series resistance of the inductor
and increasing its quality factor. The tuning characteristic is
nearly linear (see Fig. 10) and the noise performance varies
only less than 1 dB at the whole tuning range from 0
V2.7 V. The linearity of this characteristic is improved by
the series capacitance (Fig. 6) added to the varactor diode.
The current through the oscillator core is about 12 mA;
the supply voltage is 2.7 V. At tuning voltages above the
supply voltage the varactor diodes get forward biased and
so the frequency stays nearly constant and the noise rises.

ZANNOTH et al.: FULLY INTEGRATED VCO

This occurs because of the reduction of the quality factor


and the introduction of additional current noise due to the
forward-biased diodes.

1991

Markus Zannoth was born in Munich, Germany, in


1971. He received the Dipl. Ing. degree in electrical
engineering in 1996 from the Technical University
of Munich, Munich, Germany. Since 1996, he has
been working towards the Dr.Ing. degree at Siemens
AG and the Technical University of Munich.
His doctoral research is on integrated oscillators.

IV. CONCLUSION
A fully integrated bipolar VCO is realized (see Fig. 12) that
achieves a measured phase noise of 136 dB/Hz at 4.7 MHz.
The oscillator has a linear tuning characteristic with a tuning
range of 150 MHz at a center frequency of 1.96 GHz. Further
characteristics are given in Table I.
In this design two metal layers are used to build vertically
coupled integrated inductors. These have quality factors of
about four. Integrated varactor diodes are implemented by
using base-emitter diodes of transistors. With this design the
noise requirements of the DECT-specification of 132 dB/Hz
at 4.7 MHz frequency offset are achieved with a margin of
4 dB. The output power is 8 dBm at 50 , with a center
frequency of 1.95 GHz. For the use of this oscillator in a DECT
product, the varactor-capacitance will be increased until the
required center frequency of 1.88 GHz is reached. The design
has been realized in standard high-volume bipolar process with
of 25 GHz.
an
REFERENCES
[1] ETSI, Digital European Cordless Telecommunications (DECT) Common
Interface, Part 2: Physical Layer, Oct. 1992.
[2] L. L. Larson, RF and Microwave Circuit Design for Wireless Communications. Boston: Artech House, 1996.
[3] B. D. Leeson, A simple model of feedback oscillator noise spectrum,
Proc. Lett. IEEE, pp. 329330, Feb. 1966.
[4] G. Sauvage, Phase noise in oscillators: A mathematical analysis of
Leesons model, IEEE Trans. Instrum. Meas., vol. IM-26, pp. 408410,
Dec. 1977.
[5] J. Craninckx and M. S. J. Steyaert, A 1.8-GHz low-phase-noise CMOS
VCO using optimized hollow spiral inductors, IEEE J. Solid-State
Circuits, vol. 32, pp. 736744, May 1997.
[6] G. Palmisano, M. Paparo, F. Torrisi, and P. Vita, Noise in fully
integrated PLLs, in Proc. 6th Workshop Advances in Analog Circuit
Design AACD97, Como, Italy, pp. 119.
[7] J. Crols, P. Kinget, J. Craninckx, and M. Steyaert, An analytical model
of planar inductors on lowly doped silicon substrates for high frequency
analog design up to 3 GHz, in IEEE Symp. VLSI Circuit Dig. Tech.
Papers, 1996, pp. 2829.
[8] J. N. Burghartz, M. Soyuer, and K. A. Jenkins, Microwave inductors
and capacitors in standard multilevel interconnect silicon technology,
IEEE Trans. Microwave Theory Tech., vol. 44, pp. 100104, Jan. 1996.
[9] L. Dauphinee, M. Copeland, and P. Schvan, A balanced 1.5 GHz
voltage controlled oscillator with an integrated LC resonator, in Proc.
ISSCC97, Session 23, Analog Techniques, pp. 390391.
[10] I. B. Jansen, K. Negus, and D. Lee, Silicon bipolar VCO family for
1.1 to 2.2 GHz with fully-integrated tank and tuning circuits, in Proc.
ISSCC97, Session 23, Analog Techniques, p. 392.
[11] B. Razavi, A 1.8 GHz CMOS voltageControlled oscillator, in Proc.
ISSCC97, Session 23, Analog Techniques, pp. 388389.
[12] K. A. Hajimiriand and T. H. Lee, A general theory of phase noise in
electrical oscillators, IEEE J. Solid-State Circuits, vol. 33, pp. 179194,
Feb. 1998.
[13] CADENCE, Oscillator Noise Analysis in SpectreRF, application note to
SpectreRF, 1998.
[14] F. X Kartner, Untersuchung des Rauschverhaltens von Oszillatoren,
Ph. D. dissertation, Tech. Univ. Munich, Munich, Germany, 1988.

Bernd Kolb was born in 1972. He studied electrical engineering with


an emphasis on telecommunication techniques at the Georg-Simon-OhmPolytechnic Nuremberg. There, he received the Dipl.Ing. (FH) degree in 1995.
He joined the Siemens High Frequency IC Department in 1995. Since
then, he has worked in the field of oscillators, frequency dividers, and vector
modulators. He has focused on designing highly integrated transmitter ICs
for mobile communication. He is now with Lucent Network Systems GmbH
Nuremberg, Germany, where he designs high-frequency parts of base station
for mobile communication.

Joseph Fenk received the diploma in electronics


from the Technical University of Munich, Munich,
Germany, in 1968.
He is responsible for product definition and
project management of communications RFintegrated circuits at Siemens Components, Inc.,
Integrated Circuit Division. After joining Siemens
in 1968, he worked as a Development Engineer
on high-frequency components in the Discrete
Components Group, developing transmitters, aerial
and tuner transistors, FETs, and Varactor and PIN
diodes. In 1976, he joined the Integrated Circuits Group as a Design Engineer
for consumer products. He has been engaged in the development of integrated
circuits for infrared preamplifiers, prescalers, IF-amplifiers/demodulators for
FM-radio and satellite-TV, mixer/oscillators FM radio, TV-and SAT-TV, and
TV UHF/VHF modulator ICs, as well as circuits for narrowband FM mobile
radio. He holds more than 50 patents relating to IC and system design and
has presented technical papers at numerous industry conferences and forums.

Robert Weigel (S88M89SM95) was born in Ebermannstadt, Germany,


in 1956. In 1989, he received the Dr.Ing. degree, and in 1992 the Dr.Ing.habil
degree, both in electrical engineering from the Technical University of
Munich, Munich, Germany.
From 1982 to 1988, he was a Research Assistant, from 1988 to 1994,
he was a Senior Research Engineer, and from 1988 to 1996, he was a
Professor at the Technical University of Munich. In the winter of 19941995,
he was a Guest Professor at the Technical University of Vienna, Vienna,
Austria. Since 1996, he has been Head of the Institute for Communication
and Information Engineering at the University of Linz, Austria. He has been
engaged in research and development on microwave theory and techniques,
integrated optics, high-temperature superconductivity, surface acoustic wave
(SAW) technology, and digital and microwave communication systems. In
these fields, he has published more than 120 papers and has given more than
90 international presentations. His work includes European research projects
and international journals.
Dr. Weigel is a senior member of the IEEE Microwave Theory and Techniques and the Ultrasonics, Ferroelectrics, and Frequency Control Societies.
He is also a member of the Institute for Systems and Components of the
Electromagnetics Academy, the Informationstechnishe Gesellschaft (ITG) in
the Verband Deutscher Elekrotechniker (VDE), and the Society of PhotoOpticals Instrumentation Engineers (SPIE). In 1993 he was a co-recipient of
the MIOP-award.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

295

A Simple Precharged CMOS Phase Frequency Detector


Henrik O. Johansson
Abstract We propose a simple precharged CMOS phase
frequency detector (PFD). The circuit uses 18 transistors and has
a simple topology. Therefore, the detector, in a 0.8-m CMOS
process, works up to clock frequencies of 800 MHz according to
SPICE simulations on extracted layout. Further, the detector has
no dead-zone in the phase characteristic which is important in
low jitter applications. The phase and frequency characteristics
are presented and comparisons are made to other PFDs. The
phase offset of the detector is sensitive to differences of the dutycycle between the inputs. Mixed-mode simulations are presented
of the lock-in procedure for a phase-locked loop (PLL) where
the detector is used. Measurements on the detector are presented
for a test-chip with a delay-locked loop (DLL) where the phase
detection ability of the detector has been verified.

Fig. 1. Conventional phase frequency detector (conPFD) from [2].

Index Terms CMOS integrated circuits, delay lock loops,


phase detectors, phase lock loops.

I. INTRODUCTION

part of a phase-locked loop (PLL) is the phase detector


(PD) [1]. The PD detects the phase difference between
the reference frequency and the controlled slave frequency.
Some PDs also detect frequency errors, they are then called
phase frequency detectors (PFDs). A PFD is usually built
with a state machine with memory elements such as flip-flops
[2], [3], Figs. 1 and 2, respectively. We propose a new simple
PFD, ncPFD, which uses two nc-stages [4] and six inverters,
Fig. 3(a).
A drawback with some phase detectors is a dead zone in
the phase characteristic at the equilibrium point. The dead zone
generates phase jitter since the control system does not change
the control voltage when the phase error is within the dead
zone.
In Section II the ncPFD circuit is described. The phase
and frequency characteristics are discussed in Sections III and
IV, respectively, and comparisons are made to other PFDs.
Behavioral mixed-mode simulations are made to check the
lock-in properties of the ncPFD detector and these simulations
are shown in Section V. Experiments on the phase detection
abilities of the ncPFD are presented in Section VI.

Fig. 2. Precharge type phase frequency detector (ptPFD) from [3].

II. CIRCUIT
The transistor schematic of the ncPFD is shown in Fig. 3(a).
The detector has a 0-rad phase offset. The main part of the
circuit is the nc stage [4]. Delays (two inverters) are inserted
at the reference and slave inputs in order to remove the dead
zone in the phase characteristics around rad phase error. In
Fig. 4, waveforms for the circuit in Fig. 3(a) are shown when
the slave input lags the reference input.
Manuscript received March 11, 1997; revised August 21, 1997.
The author is with Electronic Devices, Department of Physics and Measurement Technology, Linkoping University, S-58183 Linkoping, Sweden.
Publisher Item Identifier S 0018-9200(98)00732-X.

(a)

(b)

Fig. 3. (a) The ncPFD in zero degree phase offset version. (b) Modified
version with  rad phase offset.

The detector can easily be modified to one with -rad phase


offset, as shown in Fig. 3(b), where one, or in general an odd
number, of inverter(s) are used for the delays.
If the phase detector is used only as a phase detector, i.e., not
as a frequency detector, the circuit in Fig. 3(a) can be used as

00189200/98$10.00 1998 IEEE

296

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 4. Waveforms for the case when slave lags after the reference signal.
The pulse width of the up signal is larger than for the down signal.

Fig. 6. Magnified phase characteristics at zero phase error of the ncPFD


(solid line), conPFD (dashed line), and the ptPFD (dash-dot line) from SPICE
level-2 simulations of extracted layout, VDD = 3:0 V and f = 50 MHz.

Fig. 5. Phase characteristics of the ncPFD (solid line), conPFD (dashed line),
and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted
layout, VDD = 3:0 V and f = 50 MHz.

a -rad phase detector by switching the up and down signals.


The equilibrium point will then be on the negative slope of the
phase characteristics at rad instead of at the positive slope
at zero, Fig. 5. Similarly, the -rad phase detector, Fig. 3(b),
can be modified to a 0-rad phase detector.
III. PHASE CHARACTERISTIC
The phase characteristic of the proposed ncPFD is shown
in Fig. 5 together with the characteristics of the conventional
PFD (conPFD) of Fig. 1 [2] and the precharge type PFD
(ptPFD) shown in Fig. 2 [3]. Unlike the conPFD and ptPFD,
there is no dead-zone in the characteristics of the ncPFD. A
magnification of the characteristics at zero phase is shown
in Fig. 6. The dead zone of the conPFD can be reduced by
inserting delay at the output of the four-input-NAND-gate. But
if delays are inserted in the feedback signals from the up
and down outputs of the ptPFD, the dead zone unfortunately
increases.
In an ncPFD, when the PLL is locked, both up and down
signals are active. Therefore the phase offset of the PLL
depends on the matching between the up and down currents
of the charge pump.
All data in this section are based on simulations of extracted
layout with SPICE (level-2) when
V and
MHz unless otherwise stated. The layout was made in a
0.8- m standard CMOS process and the N and P-transistors
are 2.0 and 4.0 m wide, respectively. The outputs were
connected to 4.0 fF capacitors, and the inputs were driven
with inverters with a tapering factor of one.
A. Duty-Cycle and Transition-Time Dependence
The output of the ncPFD depends on the pulse-width of
the input signals. Hence, the duty cycle will affect the phase

Fig. 7. Phase characteristics for three cases with different duty cycles. The
reference input duty cycle is 50% for all cases and the slave input has the
duty-cycles 45%, 50%, and 55% for the dashed, solid, and dasheddotted
lines, respectively.

characteristics. The phase characteristics are checked for three


different duty cycles, 45, 50, and 55%.
When both the reference and slave have the same duty cycle,
the phase offset is not affected. There is a dead zone at -rad
when the duty cycle is less than 50%. A duty cycle of 45%
gives a dead zone width of 0.50 rad, 1.6 ns, at rad. This
dead zone may result in a metastable state of the control loop.
When the duty cycle is different for the two inputs, the
phase offset will be nonzero, Fig. 7. A duty cycle difference
of 5% at 50 MHz, i.e., 1 ns, gives a phase offset of
rad, i.e., 630 ps.
The phase characteristic of the ncPFD is not affected by
variations of the rise and fall times when they are in the range
of 300 ps up to 600 ps.
B. Maximum Operation Frequency
A maximum operation frequency definition can be found in
[3]. The definition is that the maximum operation frequency
is one over the shortest period with correct up and down
signals when the inputs have the same frequency and 90
phase difference. This definition is easily applicable on flipflop-based PFDs where this frequency is easily identified.
Unfortunately, the degradation of the performance of the

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

297

Fig. 10. Waveforms for the case when the slave has a higher frequency than
the reference signal. The down signal has higher duty cycle than the up signal.

Fig. 8. The width of the dead zones of the ncPFD (solid), ptPFD (dashed),
and conventional PFD (dash-dot) as function of frequency. The frequency
resolution is 100 MHz and the supply voltage is 5.0 V. The plot is based on
SPICE simulations of extracted layout.

Fig. 11. Frequency sensitivity for the ncPFD (solid), ptPFD (dash-dot),
and conPFD (dashed). The plot is based on behavioral simulations with
20 different initial phases for each frequency and the mean-value for each
frequency is plotted. The reference frequency is 50 MHz.

Fig. 9. Maximum frequency as function of supply voltage for the ncPFD


(solid line), the ptPFD (dash-dot line), and the conPFD (dashed line). The
frequency resolution is 25 MHz. The plot is based on simulations of extracted
layout. The layouts are made in a standard 0.8-m CMOS process.

ncPFD is gradual for increasing frequency and this makes it


hard to find a specific frequency where the circuit starts to
malfunction.
Therefore, we define the maximum operation frequency
to be the frequency where the size of the dead zone starts
to deviate significantly from the low-frequency value. This
definition gives similar results for the flip-flop-based phase
detectors as for the definition in [3], and it is applicable on the
ncPFD. An example of how the dead-zone-width varies with
the frequency is shown in Fig. 8.
The maximum speeds for different supply voltages are
plotted in Fig. 9 for the three PFDs of Figs. 1, 2, and 3(a).
As seen, the maximum speed of the ncPFD and the ptPFD are
similar and the conPFD is approximately three times slower.
IV. FREQUENCY CHARACTERISTICS
A frequency dependent phase detector always has some
kind of memory. For the ncPFD, the memory consists of the
two dynamic nodes at the output of the nc-stages. In Fig. 10,
the frequency of the slave input is approximately three times
higher than the reference input frequency, as a result, the down
signal has a higher duty cycle than the up signal. Thus the
slave frequency should decrease.

The average frequency sensitivities of the ncPFD, ptPFD,


and conPFD are shown in Fig. 11. The frequency sensitivity
is represented by the rate of change in the control voltage
of the loop filter of a PLL when the slave input is driven
by a pulse generator with a fixed frequency instead of the
voltage-controlled oscillator (VCO) output. Each frequency
is simulated 20 times with different initial phases, i.e., skew
between the inputs.
The ptPFD has the largest sensitivity, followed by the
conPFD, and the ncPFD has the lowest. The sensitivity goes to
zero as the slave frequency approaches the reference frequency
for both the ncPFD and ptPFD. But for the conPFD, the
sensitivity is relatively high even for frequencies close to the
reference.
In Fig. 12 the sensitivity for the ncPFD is shown with the
mean, minimum, and maximum values from the 20 simulations
for each frequency. Note that the behavior of the minimum and
maximum values are almost random.
For the ncPFD, the minimum absolute value of the sensitivity is close to zero for certain frequencies, Fig. 12. Actually,
the sensitivity is zero for some frequency ratios and phase
combinations. This is the case also for the ptPFD but not for
the conPFD. The condition for this seems to be that when the
frequency ratio of the reference and slave inputs is a rational
number and the ratio is in the interval 1/2 to 2, including the
limits, the sensitivity is zero for certain initial phases. We have
no general proof of the previous statement but, for example,
the sensitivity of the ncPFD for
as
function of initial phase is shown in Fig. 13. The sensitivity is
zero for the phases 0.0, 2.5, and 5.0 ns. This lack of sensitivity
may lead to false locking for a PLL in operation. However,

298

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

Fig. 12. Frequency sensitivity for the ncPFD for a number of frequencies.
The plot is based on behavioral simulations with 20 different initial phases
for each frequency. The solid line is the mean value and the symbols are
the minimum and maximum values. The reference frequency is 50 MHz.

Fig. 14. Lock-in process of a third-order PLL with the ncPFD as phase
frequency detector. The loop filter and PLL data are shown in the upper right
corner.

language M [6]. The loop filter used ideal R and C models


in circuit mode with analog voltages. The loop filter and PLL
data are shown as an inset in Fig. 14. A lock-in simulation is
shown in Fig. 14. The simulation is done with the presence of
300 ps peak-to-peak phase noise.
Because of the sawtooth-shaped frequency sensitivity of the
ncPFD (for a fixed frequency offset and varied initial phase),
Fig. 13, and the presence of noise, the lock-in time is not
deterministic but random. The lock-in times for 60 simulations
have been analyzed. Most simulations show a lock-in time of
7 s and the largest time is 16 s. There is no upper limit on
the lock-in time. One simulation took approximately 3 cpu-min
on a SPARC 10 workstation.

Fig. 13. Frequency sensitivity for the ncPFD when the slave frequency is
4/5 of the reference frequency. For the initial phases of 0.0, 2.5, and 5.0 ns
the sensitivity is zero.

this false locking will not be stable, since a small phase change
results in a nonzero sensitivity and drives the loop back to lock.
One way to add small phase changes to the simulation is to
include phase noise which is always present in an oscillator.
When we add phase noise of approximately 300 ps peak-topeak to the simulations, the normalized minimum sensitivity
which was zero will increase to approximately 0.01. The
improvement is not significant but the sensitivity will be
nonzero and positive for all phases. Hence, false locking is
avoided. To further enhance the phase noise during the lock in
process, one could use dithering techniques, i.e., add the signal
from a noise/signal source to the control voltage of the VCO.
V. BEHAVIORAL MIXED-MODE SIMULATIONS
In order to understand the sensitivity to frequency errors
and lock-in properties of the proposed detector, a complete
third-order charge pump PLL system was simulated using a
multilevel mixed-mode simulator, Lsim [5]. The PFD was
represented by a schematic simulated in switch mode. The
VCO, phase-noise generator, and charge pump are represented
by behavioral models written in the hardware description

VI. EXPERIMENTS
The phase detection properties of the ncPFD have been
verified experimentally with a test chip. The test chip is a line
receiver for serial data that utilizes several parallel samplers
to receive bit rates of 2.0 Gb/s [7]. The phase detector was
used in a delay-locked loop (DLL) which generates control
signals for the sampling switches used in the line receiver.
The ncPFD, Fig. 3(a), was used as a -rad phase detector and
the delay line was half a wavelength long.
The skew between the reference and slave signals is not
possible to measure directly. This quantity has been measured
indirectly through measurement error compensation circuits to
be about 125 ps at
MHz. Unfortunately, there is no
control of how large the measurement error is.
The circuit blocks used to measure the offset are shown in
Fig. 15. The two clocks that we want to compare come from
the beginning and the end of the delay line. They are fed into
two matched inverter chains where the propagation delay for
rising and falling edges are matched against process variations
[8]. The delay from the multiplexer inputs to the oscilloscope
screen for the two signal paths are not matched. Two measurements are done to compensate this. One where the delay
line input signal goes uninverted through Output buffer 1 and
one where the same signal goes inverted through the Output
buffer 2. The measured skew including the measurement error

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 2, FEBRUARY 1998

299

Fig. 15. DLL, phase offset measurement circuitry, and NMOS transistor to
access the control voltage.
Fig. 16. Oscilloscope screen dump of the drain voltage of an NMOS
transistor with external pull-up resistor where the gate is connected to the
control voltage. Four different lock-in procedures are shown. The initial
control voltages are 0.0, 1.0, 2.0, and 3.0 V for the curves from top to
bottom, respectively.

for the measurements will be as follows:


skew
skew

inv
inv
inv
inv

mux
mux
mux
mux

Buf
Buf
Buf
Buf

(1)
VII. CONCLUSIONS
(2)

where
is the real skew and inv
and inv
are the
delays through the four inverters long chains for falling and
rising edges through the left and right chain, respectively.
Similarly, inv
and inv
are for the five inverters long
chains. And mux
and mux
are the delays through the
multiplexers. The Buf
and Buf
are the delays through
the output-buffers and through the oscilloscope input-channels.
The sum of the skews (1) and (2) is
skew

skew

inv
inv

inv
inv

(3)

Note that the expression is independent of the mux and Buf


delays. Hence, theoretically, if the rise and fall delays of the
inverter chains are matched properly, there will not be any
measurement error.
In Fig. 16 an oscilloscope screen dump with four lock-in
procedures is shown. The signal is the drain voltage of an
NMOS transistor with an external pull-up resistor and with the
gate connected to the control voltage as shown in Fig. 15. The
lock-in time is less than 200 s. Ideally, the control voltage
should go monotonically to the equilibrium voltage. Therefore,
the beating in the lock-in procedure when the initial control
voltage is 3.0 V is unexpected. The reason for this is unknown.

A new PFD without a dead zone has been proposed.


The circuit topology is simple and has no feedback loops.
Simulation results indicate that the circuit can operate up
to 800 MHz in 0.8- m CMOS with a 5-V supply. The
detectors phase offset depends on the duty cycle of the inputs.
Measurements have been performed on the detector when it
was used in a DLL as a phase detector and the functionality
was verified.
REFERENCES
[1] R. E. Best, Phase-Locked Loops, 2nd ed. New York, NY: McGrawHill, 1993.
[2] N. H. E. Weste and K. Eshragrian, Principles of CMOS VLSI Design,
2nd ed. Reading, MA: Addison Wesley, 1993.
[3] H. Kondoh, H. Notani, T. Yoshimura, H. Shibata, and Y. Matsuda,
A 1.5-V 250-MHz to 3.0-V 622-MHz operation CMOS phase-locked
loop with precharge type phase-detector, IEICE Trans. Electron., vol.
E78-C, no. 4, pp. 381388, Apr. 1995.
[4] P. Larsson and C. Svensson, Skew safety and logic flexibility in a true
single phase clocked system, in Proc. IEEE Int. Symp. Circuits Syst.,
1995, pp. II:941944.
[5] Mentor Graphics, Explorer Lsim Users Manual. Mentor Graphics
Corp., 1992.
[6] Mentor Graphics, M Language Users Guide. Mentor Graphics Corp.,
1991.
[7] H. O. Johansson, J. Yuan, and C. Svensson, A 4 Gsamples/s LineReceiver in 0.8 m CMOS, in Proc. Symp. VLSI Circuits, 1996, pp.
116117.
[8] M. Shoji, Elimination of process-dependent clock skew in CMOS
VLSI, IEEE J. Solid-State Circuits, vol. SC-21, pp. 875880, Oct. 1986.

1654

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Rotary Traveling-Wave Oscillator Arrays:


A New Clock Technology
John Wood, Terence C. Edwards, Member, IEEE, and Steve Lipa, Student Member, IEEE

AbstractRotary traveling-wave oscillators (RTWOs) represent a new transmission-line approach to gigahertz-rate clock
generation. Using the inherently stable LC characteristics of
on-chip VLSI interconnect, the clock distribution network becomes a low-impedance distributed oscillator. The RTWO operates
by creating a rotating traveling wave within a closed-loop differential transmission line. Distributed CMOS inverters serve as both
transmission-line amplifiers and latches to power the oscillation
and ensure rotational lock. Load capacitance is absorbed into the
transmission-line constants whereby energy is recirculated giving
an adiabatic quality. Unusually for an LC oscillator, multiphase
(360 ) square waves are produced directly. RTWO structures
are compact and can be wired together to form rotary oscillator
arrays (ROAs) to distribute a phase-locked clock over a large chip.
The principle is scalable to very high clock frequencies. Issues
related to interconnect and field coupling dominate the design
process for RTWOs. Taking precautions to avoid unwanted signal
couplings, the rise and fall times of 20 ps, suggested by simulation,
may be realized at low power consumption. Experimental results
of the 0.25- m CMOS test chip with 950-MHz and 3.4-GHz rings
are presented, indicating 5.5-ps jitter and 34-dB power supply
rejection ratio (PSRR). Design errors in the test chip precluded
meaningful rise and fall time measurements.
Index TermsClocks, MOSFET oscillators, phase-locked oscillators, phased arrays, synchronization, timing circuits, transmission line resonators, traveling-wave amplifiers.

Researchers have therefore looked to alternative oscillator


mechanisms for better phase stability and lower power consumption. Previous transmission-line systems such as salphasic
distribution [6], distributed amplifiers [7], and adiabatic LC resonant clocks [8] provide only a sinusoidal or semisinusoidal
clock, making fast edge rates difficult to achieve.
This paper introduces the rotary traveling-wave oscillator
(RTWO); a differential LC transmission-line oscillator which
produces gigahertz-rate multiphase (360 ) square waves with
low jitter. Extension of the RTWO to rotary oscillator arrays
(ROAs) offers a scalable architecture with the potential for
low-power low-skew clock generation over an arbitrary chip
area without resorting to clock domains. Simulations predict
rise and fall times of 20 ps on a 0.25- m process and a
of the integrated
maximum frequency limited only by the
circuit technology used.
Experiments show that although the RTWO operates differentially, careful attention is required to guard against magnetic
field couplings between the clock conductors and other structures if the potential performance of these oscillators is to be
realized.
II. CONCEPT OF THE ROTARY CLOCK OSCILLATOR

I. INTRODUCTION

A. Fundamentals and Structures

LOCKING at gigahertz rates requires generators with low


skew and low jitter to avoid synchronous timing failures.
The notion of a clocking surface becomes untenable at gigahertz rates [1], frequently mandating that large VLSI chips are
subdivided into multiple clock domains and/or utilize skew-tolerant multiphase circuit design techniques [2].
Techniques such as distributed phase-locked loops (PLLs)
[3] and delay-locked loops (DLLs) [4] can control systematic
skew to within 20 ps, but are complex, introduce random skew
(i.e., jitter), and have area penalties. H-tree distribution systems,
while simple, are difficult to balance and can use upwards of
30% of a chips total power budget [5]. All these systems are
inherently single-phase, induce large amounts of simultaneous
switching noise, and can be highly susceptible to this noise.
Manuscript received March 20, 2001; revised June 28, 2001. This work was
supported by Multigig Ltd., and also supported in part by the National Science
Foundation under Award EIA-31332.
J. Wood is with MultiGig, Ltd., Northampton NN8 1RF, U.K. (e-mail:
john.wood@multigig.com).
T. C. Edwards is with Engalco, Huntington, YO32 9NY, U.K. (e-mail: enquiries@engalco.com).
S. Lipa is with the Microelectronics Systems Laboratory, North Carolina State
University, Raleigh, NC 27695 USA.
Publisher Item Identifier S 0018-9200(01)08220-8.

The basic ROA architecture is shown in Fig. 1. A representative multigigahertz rotary clock layout has 25 interconnected
RTWO rings placed onto a 7 7 array grid. Each ring consists
of a differential line driven by shunt-connected antiparallel inverters distributed around the ring. This arrangement produces
a single clock edge in each ring which sweeps around the ring
at a frequency dependent on the electrical length of the ring.
Pulses are synchronized between rings by hard wiring which
forces phase lock.
Fig. 2 illustrates the theory behind the individual RTWO.
Fig. 2(a) depicts an open loop of differential transmission line
(exhibiting LC characteristics) connected to a battery through
an ideal switch. When the switch is closed, a voltage wave begins to travel counterclockwise around the loop. Fig. 2(b) shows
a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors to cause a signal inversion. If there were no losses, a wave could travel on this ring
indefinitely, providing a full clock cycle every other rotation of
the ring (the Mbius effect).
In real applications, multiple antiparallel inverter pairs are
added to the line to overcome losses and give rotation lock.
Rings are simple closed loops and oscillation occurs spontaneously upon any noise event. Unbiased, startup can occur in

00189200/01$10.00 2001 IEEE

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

1655

Fig. 3. Waveforms of line voltage and line current for the 3.4-GHz clock
simulation example.

B. Waveforms
Fig. 3 shows simulated waveforms of a 3.4-GHz RTWO taken
at an arbitrary position on the ring. The design has the following
characteristics for reference:
Fig. 1.
phase.

Basic rotary clock architecture. The

= signs denote points with same

Fig. 2. Idealized theory underlying the RTWO. (a) Open loop of differential
conductors to a battery via a switch. (b) Similar loop but with the voltage source
replaced by the inner and outer conductors cross-connected.

either rotational senseusually in the direction of lowest loss.


Deterministic rotation biasing mechanisms are possible, e.g., directional coupler technology or gate displacement [9]. Once a
wave becomes established, it takes little power to sustain it, because unlike a ring oscillator, the energy that goes into charging
and discharging MOS gate capacitance becomes transmission
line energy, which is recirculated in the closed electromagnetic
path. This offers potential power savings as losses are not related
but rather to
dissipation in the conductors where
to
can be reduced, e.g., by adoption of copper metallization.

m
Conductors: Width
m
Pitch
m
Ring Length
Metallization: 1.75 m copper
nH
Loop inductance total
Process: 0.25- m CMOS
Nch total width: 2000 m
Pch total width: 5000 m
Number of inverters: 24 pairs.
Very large distributed transistor widths give substantial capacitive loading to the lines, thus lowering velocity to give a
reasonably low clock rate from a compact oscillator structure.
In application, up to 75% of this capacitance can come from
load capacitance, reducing the size of the drive transistors accordingly.
The upper traces of Fig. 3 show the simulated voltage waveforms on the differential line at points labeled A0, B0. The lower
traces show the current in the conductors to be 200 mA, while
the supply current is simulated at 84 mA with 4.5 mA of
ripple. This clearly illustrates that energy is recycled by the basic
operation of the RTWO. Just driving the 34 pF of capacitance
).
present would require 275 mA at this frequency (from
C. Phase Locking
Interconnected rings, as in Fig. 1(a), will run in lockstep, ensuring that the relative phase at all points of an ROA are known.
It is possible to use a large array of interconnected rings to distribute a clock signal over a large die area with low clock skew.
For example, referring to Fig. 1(a), all the points marked with
have the same relative phase as that arthe equals sign
bitrarily marked as 0 . At any point along the loop, the two
signal conductors have waveforms 180 out of phase (two-phase

1656

Fig. 4. Voltage, current, and phase relationships versus rotation direction


(Poyntings vector).

nonoverlapping clock). A full 360 is measured along the complete closed path of the loop. In principle, an arbitrary number of
clock phases can be extracted. Phase advances or retards depend
on the direction of rotation, and Fig. 4 shows the currentvoltage
relationships for clockwise and counterclockwise rotation.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Fig. 5. Three-dimensional view of the structure. The two differential lines


are shown, with current flow arrows (main and charge/boost) and encircling
H-fields. CMOS transistors are also shown complete with supply voltages (V
and V ) and both p- and n-channels.

D. Network Rules
Although the square-ring shape is convenient to show diagrammatically, it is only one example of a more general network solution which requires ROAs to conform closely to the
following rules.
1) Signal inversion must occur on all (or most) closed paths.
2) Impedance should match at all junctions.
3) Signals should arrive simultaneously at junctions.
From 1) above, any odd number of crossovers are allowed on
the differential path and regular crossovers forming a braided
or twisted pair effect can dramatically reduce the unwanted
coupling to wires running alongside the differential line.
The differential lines would typically be fabricated on the top
metal layer of a CMOS chip where the reverse-scaling trend of
VLSI interconnect offers increasingly high performance [10].
E. Fields and Currents
Fig. 5 illustrates a three-dimensional section of the ring structure connected to a pair of CMOS inverters expanded to show
the four individual transistors. The main current flow in the differential conductors is shown by solid arrows, the magnetic field
surrounding these conductors by dashed loops, and the capacitance charge/signal-boost current flowing through the transistors by dashed lines.
An important feature of differential lines is the existence of a
well-defined go and return path which gives predictable inductance characteristics in contrast to the uncertain return-current path for single-ended clock distribution [11].
Capacitance arises mainly from the transistor gate and depletion capacitance and interconnect capacitance does not dominate.
indicates intrinsic gate resistance, i.e., the ohmic path
implies a
through which the gate charge flows. The term
parasitic gate term, but in reality, most of this resistance is in
the series circuit of the channel under the gate electrode. This is
shared by the D-S channel, as illustrated by the triangular region
(shown with transistors operating in the pinchoff region).

Fig. 6. Expanded view of short sections of the transmission line, including


three sets of back-to-back inverters as a wavefront passes.

F. Coherent Amplification, Rotation Locking


Fig. 6 is an expanded view of a short section of transmission
line with three sets of back-to-back inverters shown. It is assumed that startup is complete and the rotating wave is sweeping
left to right. For this analysis, we view the inverter pairs as discrete latch elements.
Each latch switches in turn as the incident signal, traveling on
the low impedance transmission line, overrides the ON resistance
of the latch and its previous state. This clash of states occurs
only at the rotating wavefront and therefore only one region is in
this cross-conduction condition at any one time. The transmission-line impedance is of the order of 10 and the differential
on-resistance of the inverters is in the 100- 1-k range, depending on how finely they are distributed throughout the structure.
Once switched, each latch contributes for the remainder of the
half cycle, adding to the forward-going signal. Coherent buildup
of switching events occurs in this forward direction only. An
equal amount of energy is launched in the reverse direction, but
the latches in that direction cannot be switched further into the
state to which they have already switched. The reverse-traveling
components simply reduce the amount of drive required from
those latches.
Importantly, it is the nonlinear latching action which is responsible for the self-locking of direction (a highly linear amplifier has no such directionality).
To clarify the above statements, Fig. 7 demonstrates how a
large CMOS latch responds to an imposed differential signal.
The curve trace shows a central differential-amplification region bounded by two absorptive ohmic regions (shaded) corre-

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

1657

Fig. 7. DC transfer characteristic of two back-to-back inverters to an imposed


differential signal.
(a)

sponding to the two latched states. Except at the wavefront location where amplification takes place, the ring structures will
be terminated ohmically to the supplies.
The four-transistor full-bridge circuit minimizes supply
current ripple to the cross-conduction period.
G. Frequency and Impedance Relations
In simulation models (and indeed as fabricated), the RTWO
transmission line is built up from multiple RLC segments, and
therefore, these primary line constants must be identified.
Fig. 8(a) is the basic RF macromodel of a short length
(SegLen) of RTWO line with all significant RF components
and parasitics annotated (as per Fig. 5). Suffixes identify
per-unit-length perlen, lumped lump and total (or loop) values.
segments connected together, plus a crossover,
There are
to produce a closed ring of length RingLen.
Fig. 8(b) is a capacitive equivalent circuit for the transistor
and load capacitances. AC0 indicates an ac ground point (
and
).
of one such segThe differential lumped capacitance
ment is given approximately by

(b)
Fig. 8. Development of the rotary clock model. (a) Complete RF circuit.
(b) Capacitance circuit.

where
conductor separation;
conductor width;
conductor thickness.
The phase velocity is given by
where

(3)
For heavily loaded RTWO structures, can be as low as 0.03
m/s).
of (where is the free space velocity, i.e.,
The clock frequency is given approximately by

RingLen
(1)
where
interconnect capacitance for the line AB;
gate overlap and Miller-effect feedback capacitance;
total channel capacitance;
drain depletion capacitance to bulk (substrate);
load capacitance added to a line.
is used to convert the in-parallel to ground
(Note that the
values into in-series differential values of capacitance.)
is usually a small part of total capacitance and accurate formulas are available [12] if needed.
To calculate the per-unit-length differential inductance, i.e.,
accounting for mutual coupling, we use [13], expressed below.
(2)

SegLen

(4)

(The 2 factor arises from the pulse requiring two complete


laps for a single cycle.)
Differential characteristic impedance is given by
(5)
Transmission line characteristics dominate over RC characteristics when [14]
(6)
H. Bandwidth and Power Consumption
Seen from an RF perspective, Fig. 8(a) shows the RTWO to
be two pushpull distributed amplifiers folded on top of each
other. Distributed amplifiers exhibit very wide bandwidth because parasitic capacitances are neutralized by becoming part

1658

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

TABLE I
CHANGES OF CHARACTERISTICS WITH

(a)

(b)

(c)

Fig. 9. A four-port junction of two RTWO rings carrying anticlockwise signals, with a noncoincident signal arrival time.

of the transmission-line impedance [15]. Performance is limited


by the carrier transit time of the MOSFETs [16], not by the tra, which is not apditional digital inverter propagation time
plicable where gates and drains are driven cooperatively by an
imposed low-impedance signal, and where the load capacitance
is hidden in the transmission line.
Operation of the RTWO is largely adiabatic when the voltage
drop required to charge the capacitances is developed mainly
across the inductance:

Most of the remaining losses in Table I are attributed to crossconduction and parasitic
losses.
is a real loss mechanism
for gigahertz signals, and RTWO rise/fall times can be doubled
improves
by this phenomenon. In newer CMOS processes,
with shorter channel length.

III. MORE DETAILED CONSIDERATIONS


A. Skew Control

(7)
and when the intrinsic gate resistance is low relative to the reactance of the gate capacitance.
(8)
RTWO rise and fall times are controllable by setting the cutoff
frequency of the transmission lines.
(9)
Edges become faster and cross-conduction losses are reduced
when the structure is more distributed.
, where
Table I lists characteristic changes with
with
, and
held
constant.
The most significant power loss mechanism for the RTWO is
power dissipated in the interconnect, given by
(10)

Interconnected RTWO loops offer the potential to control


skew in spite of relatively large open-loop time-of-flight
mismatches. Functionally, phase averaging occurs by pulse
combination at the junction of multiple transmission lines.
For a four-port junction, the normal operating mode will see
two pulses arriving at the junction simultaneously. These
two sources will feed two output ports and signal flow will
be unimpeded by reflections if impedance is matched. This
amounts to a situation similar to that described in [17], [18],
although for ROAs, the mechanism is LC transmission-line
energy combination, not ohmic combination of CMOS inverter
outputs.
Where there exists a time-of-flight mismatch, one pulse arrives at the junction before the other. Fig. 9(a) depicts the operation of a four-port junction between of two interwired but velocity-mismatched RTWO loops. Each of these rings has been
(each as Fig. 8). Four
divided into segments numbered
rings are wired together (similar to Fig. 16, shown later). Only
and
are considered here;
the junction of the rings
the latter having a higher open-loop operating frequency.

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

1659

Fig. 11. Segment of chip layout showing 90 routing beneath clock lines and
a tap to clock (CLK: CLK) loads.

Fig. 10.

Waveforms corresponding with Fig. 9.

From simulation, two pulse-combination effects appear to be


present, the simplest of which is the impedance match effect
where the first signal to arrive at a junction must try to drive
three transmission lines. If all ports have equal impedance, the
junction can only reach a quarter of the full signal value and a
reflection occurs driving an inverted signal back down the incident port [Fig. 9(b)]. Initially, detrimental effects on signal fidelity arising from this reflection are overcome when the other
pulse arrives, whereupon the pulses combine and branch into
the output ports, as shown in Fig. 9(c).
The second pulse combination effect is believed to be due to
nonlinear MOSFET drain capacitance, which can modulate the
velocity of the line. Reflections can drive the MOSFETS from
the ohmic state into the low-capacitance pinchoff region, locally
increasing velocity.
Quantitative Results From Simulation: Fig. 10 presents the
results of a SPICE simulation of the above situation with an
extreme condition of velocity mismatch. A 50% variation of
oxide thickness is modeled across a small 2.4 2.4 mm chip
having four interconnected rings. Thick oxide (lower ) devices
are on the right side of the chip, giving a 22.5% phase velocity
increase relative to the left side.
Looking at these results with reference to Fig. 9 reveals
and passes point
that the first pulse arrives from ring
at time
ps and begins its rise time. Within
A
this rise time, the leading edge reaches the nearby junction,
where negative reflections bounce back to momentarily prevent
passing through the 1.5-V level.
A
The second pulse arrives from the slower left-hand ring
, reaching point B
at approximately
ps. It
then combines with the first pulse at the junction to branch into
the two output ports without further reflections.
ps, the signals have reached points A
and
By
and are essentially coincidentforward progress of the
B
and
are now synchronized.
waves in rings

The phase-locking phenomenon occurs at every junction of


the array (not just the junction considered here) and twice per
oscillation cycle which accounts for the smaller than expected
initial skew seen between the rings.
Simulations of typical arrays show that lockup is achieved
within a few nanoseconds from powerup after signals settle into
the lowest-energy state of coherent mesh.
B. Coupling Issues Related to Layout
The induced magnetic fields from the rotary clock structures
is relatively high (square
can be strong. This is because
waves). The magnetic coupling coefficient, however, depends
on the angle between source and victim and falls to zero when
the angle becomes 90 .
Fig. 11 illustrates a 90 layout technique to minimize inductive coupling problems. The top metal M5 (running left to right)
is used to create the differential RTWO, while orthogonal M4
is used as a routing resource for busses into and out of areas
bounded by the clock transmission line.
For capacitive coupling, fast rise and fall times imply high
displacement currents and a potentially aggressive noise source.
Differential transmission lines tend to mitigate such effects [19],
and in Fig. 11, the total capacitive coupling area between each of
the transmission-line conductors and any M4 conductor is balanced. If the clock source were ideally differential, no net charge
would be coupled to the M4 wires. For the RTWO, distributed
inverters force the waveforms to be substantially differential and
nonoverlapping, keeping glitches below the sensitivity of a typical gate.
For the five-metal test chip (Section V), a 45% utilization
of M4 was used for the 90 routing pattern immediately underneath the RTWO rings. This coverage allows the M4 to act
as both a routing resource and as an electrostatic shield similar
to [20], preventing electrostatic coupling to signal lines further
below. Magnetic fields are not attenuated much by this configuration, because the spaces between the thin perpendicular M4
lines break up the circulating currents which could repel a magnetic field. Substrate magnetic fields [21] are, therefore, to be
expected.

1660

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

Coupling to co-parallel (0 ) victim conductors is potentially


much more problematic (discussed later in Section IV-C).
C. Tapoff Issues and Stub Loadings
It is possible to tap into the ROA structure (Fig. 11) anywhere along its length and extract a locally two-phase signal
with known phase relationship to the rest of the network. This
signal can then be routed via a fast differential transmission line
to other circuits and will generally represent a capacitive stub
on the RTWO ring.
For minimum signal distortion, the round-trip time-of-flight
(forward and backward along the stub) must be much less
than the rise time and fall time of the clock waveform:

Fig. 12.

Signal at either end of a 2-pF total tap loading line.

(11)
D. Frequency/Impedance Adjustment
When the above condition is met, the capacitance can be taken
as being effectively lumped on the main RTWO ring at the tap
point for the purposes of predicting oscillator frequency and ring
impedance.
Although not immediately apparent, this condition is achievable in practice due to three factors. The first factor is that the
tap line velocity is relatively fast for SiO dielectric. It is ap, while the main RTWO oscillator ring might
proximately
. The second factor is that the
be operating at perhaps
tap length only has to be long enough to reach within a single
RTWO ring. The third factor is that it requires two signal rotations on the RTWO to complete a clock cycle. These three factors work together to make the RTWO rings physically small
compared to the expected speed-of-light dimensions. The distances to be spanned by the fast tap wires are therefore short
enough that transmission-line effects on these lines are unimportantcertainly at the clock fundamental frequency and even
at higher harmonics.
This can be illustrated by reference to a specific 3.4-GHz
RTWO, 3200 m long with 20-ps rise/fall times. Within one of
these rise or fall periods, a stub transmission line with velocity
is able to communicate a signal over a distance of 3 mm.
For a stub length of 400 m (to reach the center of the ring), this
equates to 3.75 round-trip times along the stub.
Fig. 12 shows simulated waveforms with 2 pF of total
to-ground capacitance at the end of one such stub. Reflected
energy gives rise to the ringing which is evident with this level
of capacitance. The line resistance of the stubs must be low to
maintain reflective energy conservation.
The ratiometric factors outlined above between ring length,
frequency, rise/fall time, and stub lengths are expected to hold as
ROAs are scaled to higher frequencies and smaller ring lengths
without requiring special stub tuning measures.
Capacitive Loading Limits: Substantial total-chip capacitive
loading can be tolerated by the RTWO relative to conventionally
resonant systems [8], [22], [23]. However, the loading effects of
interconnect, active, and stub capacitances cannot be increased
without limit. The consequential lowering of line impedance inlosses become a concern.
creases circulating currents until
Eventually, the impedance becomes so low relative to the loop
resistance that the relation (6) cannot be maintained, whereupon
oscillation ceases altogether.

Rewriting (4) in the form below shows that frequency is set


only by the total inductance and capacitance of the RTWO loop.
(12)
is proportional to RingLen and
Total loop inductance
varies strongly as a function of the width and pitch of the top
metal differential conductors. This allows a coarse frequency
selection through the top-metal mask definition. Unit-to-unit inductance variation is expected to be small because of the good
lithographic reproduction of the relatively large clock conductors and the weak sensitivity of inductance to metal thickness
variations.
for the RTWO is the sum of all
Total capacitance
tends to
lumped capacitances connected to the loop (1).
from the drive
be dominated by gate-oxide capacitance
is inversely proportional
FETs and the clock load FETs.
, which on a modern CMOS SiO
to gate-oxide thickness
is controlled to approximately 5% variation over extended
wafer lots [24]. Drain depletion capacitances exist on bulk
CMOS where the active transistors connect to the ring.
During the VLSI layout phase, a CAD tool (expected release: Q1 2002) can target a fixed operating frequency. The
tool will be able to correct impedance discontinuities caused by
lumped load capacitance by the addition of dummy padding
capacitance elsewhere around the loop, and postcompensate an
overly capacitive-loaded clock network by reducing the differential inductances through pitch reductionhence restoring velocity and thus frequency. Alternatively, at the expense of using
more metallization, a new layout with more numerous, shorter
length rings could be used. The tool will need to simultaneously
solve impedance matching issues [refer to Section II-A, (5)]. By
manipulation of both and simultaneously, it is possible to
control and independently, as shown diagrammatically in
Fig. 13. For example, velocity can be reduced by increasing
by the same factor to cancel the effect on .
both and
These adjustments can support arbitrary branch-and-combine
networks (at least in theory).
Post fabrication, adding together the sources of variation and
and
, a 5% inigiven that frequency is related to
tial tolerance of operating frequency between parts is expected.

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

1661

IV. SIMULATED PERFORMANCE


A. Approach

Fig. 13. Differential line with varing trace separations and capacitive inverter
loadings indicating the effects of altering several parameters.

Matching within a die should be better, but temperature gradients and transistor size variations as they affect capacitance will
lead to phase velocity changes requiring correction by the Skew
Control mechanism (described in Section III-A).
Temperature can alter frequency through variation of
and
. Inductance variation is assumed to be negligible
compared to capacitance variation and is not considered. Gate, but for
oxide thickness variation could potentially affect
SiO dielectric, with properties similar to quartz, this can be ignored. More significant are temperature variations of drain de.
pletion capacitance and of transistor
To tune an ROA clock to an exact reference frequency, allowing limited speed-binning and reduced internal phase mismatches, closed-loop control of distributed switched capacitors
[9] or varactors [25] is envisaged.
E. Active Compensation for Interconnect Losses
Resistive interconnect losses make it difficult to communicate high-frequency clock signals over a large chip without
waveshape distortion and attenuation, which impacts on the
practicality of reflective energy conservation schemes [6], [22],
[23]. The skin effect loss mechanism has been evident in clock
tree conductors for some time [26] and is frequency dependent.
High-speed H-trees tend to use hierarchical buffers within the
trees to maintain amplitude and edge rates.
Active compensation of VLSI differential transmission
lines to overcome clock attenuation was shown by Bumann
and Langmann [27] to be applicable to sine-wave signals.
Shunt-connected negative impedance convertors (NICs) were
used with linear compensation to prevent oscillations.
The distributed inverters used within RTWOs afford active
compensation for transmission-line losses, raising the apparent
of the resonant rings and helping to maintain a uniformly high
clock amplitude around the structure.
F. Logic Styles
Two-phase latched logic [28] is the style most compatible
with RTWO. It is highly skew tolerant and through dataflowaware placement [27] offers the potential to exploit the full 360
of clock phase to reduce clock-related surging [29], which in
future systems could exceed 500 A [30]. Conventional singlephase D-latch designs can be driven where timing improvements through skew scheduling [31] might be possible. A locally four-phase system to support domino logic [2] could be
implemented by wrapping two loops of RTWO line around the
region to clock. Unfortunately, all of these techniques are beyond the capability of current logic synthesis tools.

To enable rapid what-if evaluation of potential RTWO


structures, a simulation/visualization program known as Rotary
Explorer [32] has been developed. Rotary Explorer is GUI
driven and parametrically creates a SPICE deck of macromodels linking to FASTHENRY subcircuits [33] for multipole
magnetic analysis of skin, proximity, and LR coupling effects
in the time domain. MOSFETs are modeled using BSIM3v3
nonquasi-static model with an external resistor added to model
(Fig. 8). The BSIM4 model [34], which properly accounts
as a D-S channel component, was not available.
for
With the Rotary Explorer program, it is possible to simulate
arrays. The
RTWO rings independently or as interlocked
effects of tap loads, oxide thickness variations, and magnetically
induced victim noise can be evaluated.
As a visualization aid, Rotary Explorer gives a live display
of color-coded SPICE voltages projected onto a scaled image
of the ROA structure being simulated. This aids in the intuitive
understanding of reflections and how the structure achieves a
steady-state phase-locked operation.
B. Results
Two very important performance metrics for any oscillator
are its sensitivity to changes in temperature and supply voltage.
Simulations of these effects on a nominally 3.34-GHz rotary
clock resulted in the data given in Tables II and III.
Supply Induced Jitter: Following on from the above and in
light of the RTWOs time-of-flight oscillation mechanism, it is
inferred that such voltage sensitivity will also apply to phase
modulation versus voltage, i.e., jitterat least at low supplynoise frequencies. For a single RTWO ring, the power-supply
and the power-supply
induced jitter will be related to
rejection ratio (PSRR) by
(13)
, because of the distributed nature of the oscillator,
where
is the mean supply voltage deviation as experienced along the
path of an edge as it travels two complete rotations. To improve
PSRR, plans are in place to add voltage-dependent capacitance
to the structure to give first-order compensation.
From simulations, we see that jitter reduces for multiple ring
structures due to averaging effects.
C. Coupling IISimulated Coupling
The Rotary Explorer program makes it easy to simulate coupled noise between an RTWO ring and user defined victim trace
(drawn with the aid of a mouse). Simulated results are shown in
Table IV for a 3.4-GHz RTWO configured to have 20 ps rise
and fall times, and with geometry as shown in Fig. 14.
Peak coupling magnitude occurs at 60- m victim length. A
trace longer than this will see a coupling cancellation effect that
approaches zero for each pitch of the braiding it traverses.
Fig. 15 illustrates a notably strong coupled signal waveform
m, with no loading on the victim
at victim distance

1662

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

TABLE II
VARIATIONS WITH TEMPERATURE

TABLE III
VARIATIONS WITH DC SUPPLY VOLTAGE V

TABLE IV
INDUCED NOISE AS A FUNCTION OF VICTIM DISTANCE AND LENGTH

Fig. 14.

Crossover traces, a visualization output from the Rotary Explorer tool.

trace and one end connected to ground. Note the more sensitive
noise scale.
The absolute maximum coupling occurs if victim distance is
allowed to go to zero. In this case, mutual coupling between aggressor and victim is 100% with no cancellation effects from the
other differential trace. As a numerical example, it follows that
a 2.5-V signal with a rise time of 20 ps on a transmission line
has the 2.5-V gradient over 430 m of
with a velocity of
length (Fig. 4 illustrates the concept). Over the 60- m length
discussed above, this equates to 348 mV. Slower edge rates,
faster transmission lines, and lower supply voltages reduce this
figure proportionally.
Long-range inductive noise coupling from the differential
transmission line is expected to be small, since (from a distance)
the go and return currents are equal and opposite.
Potential problems exist in short-range magnetic coupling to
wiring in the vicinity of the clock lines. Inductance is lowered

Fig. 15.

Example of notably strong coupled signal waveform.

by coupling to any highly conductive structure in which eddy


currents can flow to decrease and distort the inducing field. Couplings to less conductive circuits such as the substrate give a loss
mechanism which can be modeled as a shunt term in the transmission-line equations. LC resonance in the small-scale coupled
structures is unlikely because of the high resonant frequencies.
All of the coupling mechanisms mentioned are edge-rate dependent, and this can limit the achievable rise and fall times of the
RTWO by attenuating the high-frequency signal components.
Full RLC layout extraction is essential in the neighborhood of
the clock lines if routing is allowed in these areas. An alternative
proposal under investigation is to predefine a VLSI structure
combining clock and power distribution into the same grid to
give consistent characteristics and shielding.

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

1663

Fig. 18. Clock frequency versus V


for the entire chip with all five rings.

Fig. 16.

versus V

Die photograph of a prototype chip.

Fig. 19.

Fig. 17.
ring.

for the large ring and I

Measurement versus simulation waveforms for the large 965-MHz

V. SOME EXPERIMENTAL RESULTS


Fig. 16 shows a die photograph of a prototype built using a
0.25- m 2.5-V CMOS process with 1- m Al/Cu top metal M5.
The conductors are relatively wide in order to minimize resistive
losses of the rather thin M5. The available top-metal area consumed by the transmission lines was 15%. A general feature of
the RTWO and ROA is that power can be reduced by increasing
the metal area devoted to clock generation. The simple substitution of copper metallization could halve the width of the lines
for the same power consumption.
The prototype features a large ring independent of four interconnected smaller rings. The 12 000- m outer ring uses 60- m
conductors on a 120- m pitch, with 128 62.5- m/25- m inverter pairs distributed along its length.
For the large ring, simulations predicted a clock frequency
of approximately 925 MHz. Measurements of the actual perforV are shown in Fig. 17.
mance versus simulated with
The oscillation frequency was 965 MHz. Jitter was measured
at 5.5 ps rms using a Tektronix 11 801A oscilloscope with an
SD-26 sampling head.
The slower than simulated rise-time discrepancy is believed
to be due to the large extrinsic gate electrode resistance on the
Pch FETs. At design time, the importance of this parameter was

Measured output on one of the 3.42-GHz rings.

overlooked. Transistors are now laid out according to RF design


rules with the gate driven from both sides of the device.
versus
Fig. 18 shows that the oscillation frequency
is quite flat over a large
. We calculate from the measured
slope that PSRR is approximately 34 dB for oscillators fabricated on this process. The oscillator was seen to be functional
down to 0.8-V supply voltage, although 1.1 V was required to
initiate startup.
The test chip incorporates 15 pF of on-chip decoupling
capacitance per ring. No off-chip decoupling was required.
Effectively, the equivalent of ten single-ended lines each having
10
impedance were active, but simultaneous switching
surges are low because of the distributed switching times of the
inverters.
The quad of inner rings each have the following characteristics:
m
Conductors: Width
m
Pitch
m.
Ring Length
Total channel widths are 2000 m for the Nch FET and 5000 m
for the Pch FET spread over 40 pairs of inverters.
Fig. 19 shows the measured waveform from one of the
3.4-GHz rings. The oscillation frequency is 3.38 GHz versus a
simulated frequency of 3.42 GHz. However, the waveshape is
disappointingly distorted, the amplitude is low, and even-mode
artifacts are visible.
Investigation of the fault identified a co-parallel (0 )
inductive coupling problem between the clock signal lines and
and
supply traces running directly beneath on M3 for
the complete loop length. Only when a complete FASTHENRY

1664

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001

analysis was performed including these power traces was it


apparent that induced current loops (circulating through the
decoupling capacitors) were strongly attenuating the rotary
signal. In this condition, the latching action (Fig. 7) does not
fully develop and the rings support linear amplification of noise
signalshence the problematic multimode action. (This effect
was much less severe on the large 965-MHz ring because the
lines were much closer to the magnetically neutral
center line of the transmission line). The problem can be
mitigated by use of braided transmission lines. (as detailed in
Section IV-C).
Analysis of the test chip showed that 90 coupling between
M5 and the orthogonal thin M4 lines is not a significant
problem, making it possible to route power and signals between
regions bounded by the rotary clock structures.
VI. CONCLUSION AND FURTHER WORK PLANNED
This paper has described the rotary traveling-wave oscillator
(RTWO) and its potential application to gigahertz-rate VLSI
clocking. The oscillator is unique for a resonant-style LC-based
oscillator in that it produces square waves directly and can
be hardwired to form rotary oscillator arrays (ROAs). Being
LC-based, the oscillator is stable and jitter is low.
The formulas presented here give practical adiabatic oscillator designs suitable for VLSI fabrication. The structure and
operation of the RTWO is fundamentally simple and amenable
to analysis. We find that agreement between simulation and
measurement is good.
We need to demonstrate skew control (believed to be inherent)
to fully establish that the simulated performance of multiring
ROAs is realizable, and to measure susceptibility to induced
high-frequency noise. Further work is planned to establish firm
mathematical/analytical foundations for the prediction of both
jitter and skew and to determine exact stability criteria for arrayed oscillators. Currently, a test chip using braided transmission line design to minimize coupling and incorporating varactors to control frequency is awaiting packaging and test.
Looking to the future, our simulations predict that the oscillator scales well. On a more modern 0.18- m copper process,
10.5-GHz square-wave oscillator/distributors should be realizable consuming less than 32 mA per ring using slimmer 10- m
conductors. From simulation, the RTWO also appears to be viable on SOI processes.
ACKNOWLEDGMENT
The authors would like to thank P. Franzon and M. Steer, both
of North Carolina State University, for their assistance, and the
Raunds and British public library service.
REFERENCES
[1] E. G. Friedman, High Performance Clock Distribution Networks. Boston, MA: Kluwer, 1997.
[2] D. Harris, Skew Tolerant Circuit Design. San Mateo, CA: Morgan
Kaufmann, 2000.
[3] G. A. Pratt and J. Nguyen, Distributed synchronous clocking, IEEE
Trans. Parallel Distributed Syst., vol. 6, pp. 314328, Mar. 1995.

[4] S. Tam, S. Rusu, U. N. Desai, R. Kim, J. Zhang, and I. Young, Clock


generation and distribution for the first IA-64 microprocessor, IEEE J.
Solid-State Circuits, vol. 35, pp. 15451552, Nov. 2000.
[5] C. J. Anderson et al., Physical design of a forth-generation power
GHz microprocessor, in ISSCC 2001 Dig. Tech. Papers, Feb. 2001,
pp. 232233.
[6] V. L. Chi, Salphasic distribution of clock signals for synchronous systems, IEEE Trans. Comput., vol. 43, pp. 597602, May 1994.
[7] B. Kleveland et al., Monolithic CMOS distributed amplifier and oscillator, in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 7071.
[8] W. Athas, N. Tzartzanis, L. J. Svensson, L. Peterson, H. Li, X. Jiang,
P. Wang, and W.-C. Liu, AC-1: A clock-powered microprocessor,
in Proc. Int. Symp. Low-Power Electronics and Design, Aug. 1997,
[Online] Available: http://www.isi.edu/acmos/people/nestoras/papers/97-08.MontereyAC1.ps.
[9] J. Wood. PCT/GB00/00175. MultiGig Ltd.. [Online]. Available:
http://www.delphion.com/cgi-bin/viewpat.cmd/WO00044093A1
[10] B. Kleveland, T. H. Lee, and S. S. Wong, 50-GHz interconnect design
in standard silicon technology, presented at the IEEE MTT-S Int.
Microwave Symp., Baltimore, MD, June 1998, [Online] Available:
http://smirc.stanford.edu/papers/mtts98p-bendik.pdf.
[11] B. Kleveland, X. Qi, L. Madden 1, R. W. Dutton, and S. S. Wong, Line
inductance extraction and modeling in a real chip with power grid, presented at the IEEE IEDM Conf., Washington, D. C., Dec. 1999, [Online]
Available: http://gloworm.stanford.edu/tcad/pubs/device/iedm.pdf.
[12] N. Delorme et al., Inductance and capacitance analytic formulas for
VLSI interconnect, Electron. Lett., vol. 32, no. 11, May 23, 1996.
[13] C. S. Walker, Capacitance, Inductance and Crosstalk Analysis. Norwood, MA: Artech, 1990, p. 95.
[14] A. Deutsch et al., Modeling and characterization of long on-chip interconnections for high-performance microprocessors, IBM J. Res. Develop., vol. 39, no. 5, pp. 547567, Sept. 1995. p. 549.
[15] J. B. Beyer et al., MESFET distributed amplifier design guidelines,
IEEE Trans. Microwave Theory Tech., vol. MTT-32, pp. 268275, Mar.
1984.
[16] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd
ed. New York: McGraw-Hill, 1999, pp. 339340.
[17] H. Larsson, Distributed synchronous clocking using connected ring
oscillators, Masters thesis, Computer Systems Engineering Centre for
Computer System Architecture, Halmstad Univ., Halmstad, Sweden,
Jan. 1997. [Online] Available: http://www.hh.se/ide/ccaweb/publications/97/distclock/9705.ps.
[18] L. Hall, M. Clements, W. Liu, and G. Bilbro, Clock distribution
using cooperative ring oscillators, in Proc. IEEE 17th Conf. Advanced Research in VLSI (ARVLSI97), 1997, [Online] Available:
http://www.computer.org/proceedings/arvlsi/7913/79130062abs.htm.
[19] T. C. Edwards and M. B. Steer, Foundations of Interconnect and Microstrip Design, Chichester, U.K.: Wiley, 2000, ch. 6. sec. 6.11.
[20] C. P. Yue and S. S. Wong, On-chip spiral inductors with patterned
ground shields for Si-based RF ICs, IEEE J. Solid-State Circuits, vol.
33, pp. 743752, May 1998.
[21] C. P. Yue and S. S. Wong, A study on substrate effects of silicon-based
RF passive components, in MTT-S Int. Microwave Symp. Dig., June
1999, pp. 16251628.
[22] M. E. Becker and T. F. Knight Jr. Transmission line clock driver. presented at 1999 IEEE Int. Conf. Computer Design. [Online]. Available:
http://www.computer.org/proceedings/iccd/0406/04060489abs.htm
[23] P. Zarkesh-Ha and J. D. Meindl, Asymptotically zero power dissipation
Gigahertz clock distribution networks, IEEE Electrical Performance
and Electronic Packaging, pp. 5760, Oct. 1999.
[24] K. Bernstein, K. Carrig, C. M. Durham, and P. A. Hansen, High Speed
CMOS Design Styles. Norwood, MA: Kluwer, 1998, p. 22.
[25] T. Soorapanth, C. P. Yue, D. Shaeffer, T. H. Lee, and S. S. Wong, Analysis and optimization of accumulation-mode varactor for RF ICs, presented at the Symp. VLSI Circuits, Honolulu, HI, June 1113, 1998,
[Online] Available: http://smirc.stanford.edu/papers/VLSI98p-chet.pdf.
[26] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, A symmetric clockdistribution tree and optimized high speed interconnections for reduced
clock skew in ULSI and WSI circuits, in IEEE Int. Conf. Computer
Design, Oct. 1986, pp. 118122.
[27] M. Bumann and U. Langmann, Active compensation of interconnect
losses for multi-GHz clock distribution networks, IEEE Trans. Circuits
and Syst. II, vol. 39, pp. 790798, Nov. 1992.
[28] M. C. Papaefthymiou and K. H. Randall, Edge-triggering vs.
two-phase level-clocking, presented at the 1993 Symp. Research on Integrated Systems, Mar. 1993, [Online] Available:
http://www.eecs.umich.edu/~marios/papers/sis93.ps.

WOOD et al.: ROTARY TRAVELING-WAVE OSCILLATOR ARRAYS

[29] L. Benni et al., Clock skew optimization for peak current reduction,
J. VLSI Signal Processing, vol. 16, pp. 117130, 1997.
[30] International Semiconductor Roadmap for Semiconductors (1999).
[Online]. Available: http://public.itrs.net/files/1999_SIA_Roadmap/Design.pdf
[31] I. S. Kourtev and E. G. Friedman, Timing Optimization Through Clock
Skew Scheduling. Boston, MA: Kluwer, 2000.
[32] MultiGig, Ltd. Rotary Explorer. [Online]. Available: http://www.
multigig.com/software.htm
[33] M. Kamon, M. J. Tsuk, and J. K. White, FASTHENRY: A multipole-accelerated 3-D inductance extraction program, IEEE Trans. Microwave
Theory Tech., vol. 429, pp. 17501758, Sept. 1994.
[34] BSIM Research Group. (20002001) The BSIM4 Short-Channel
Transistor Model. Univ. of California at Berkeley. [Online]. Available:
http://www-device.eecs.berkeley.edu/~bsim3/bsim4.html

John Wood is the Engineering Director of MultiGig,


Ltd., a U.K. technology startup specializing in multigigahertz circuit design I.P.
Previously, he has worked as a consultant design
engineer on multidomain design projects in mechanical, power electronics, infrared optics, and software
development roles. He holds a number of patents
which have been licensed for manufacture in the
fields of infrared plastic welding and high-speed
digital signaling. His technical interests include
all areas of engineering design, but particularly
electromagnetics, VLSI circuit design, and high-speed analog techniques.

1665

Terence C. Edwards (M89) received the M.Phil.


degree in microwaves.
He is the Executive Director of Engalco, a consultancy firm based in the U.K., mainly specializing
in signal transmission technologies and the global
RF and microwave industry. He researches and takes
responsibility for regular releases of Microwaves
North America, published 1995, 1998, and 2001.
He has authored several publications (including
papers published in the IEEE TRANSACTIONS ON
MICROWAVE THEORY AND TECHNIQUES), has led
management seminars on fiber optics, presented a paper on mobile technologies
at the IMAPS Microelectronics Symposium, Philadelphia, PA, October 1997,
and has written several articles and books. These include (jointly with Prof.
Michael Steer) one recently on MICs (New York: Wiley) and on gigahertz and
terahertz technologies (Norwood, MA: Artech, 2000). He is on the editorial
advisory board for the International Journal of Communication Systems. He
regularly consults for both national and overseas companies and is on the
prestigious IEE (London) Presidents List of Consultants.
Mr. Edwards is a Fellow of the Institution of Electrical Engineers (IEE), U.K.

Steve Lipa (S00) received the B.S. degree in electrical engineering from the University of Virginia,
Charlottesville, in 1980, and the M.S. degree in
electrical engineering from North Carolina State
University, Raleigh, in 1993. He is currently working
toward the Ph.D. degree in electrical engineering at
North Carolina State University.
He is currently a Research Assistant and Laboratory Manager with the Microelectronics Systems
Laboratory at North Carolina State University. He
has ten years of experience as an Integrated Circuit
Design Engineer, primarily in the design of high-speed digital logic circuits.
His current research is in the area of high-speed clock distribution.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

97

A 1.6-GHz Dual Modulus Prescaler Using the Extended


True-Single-Phase-Clock CMOS Circuit Technique (E-TSPC)
J. Navarro Soares, Jr., and W. A. M. Van Noije

AbstractThe implementation of a dual-modulus prescaler (divide by 128/129) using an extension of the true-single-phase-clock
(TSPC) technique, the extended TSPC (E-TSPC), is presented.
The E-TSPC [1], [2] consists of a set of composition rules for
single-phase-clock circuits employing static, dynamic, latch, dataprecharged, and NMOS-like CMOS blocks. The composition
rules, as well as the CMOS blocks, are described and discussed.
The experimental results of the complete dual-modulus prescaler,
implemented in a 0.8 m CMOS process, show a maximum 1.59
GHz operation rate at 5 V with 12.8 mW power consumption.
They are compared with the results from other recent implementations showing that the proposed E-TSPC circuit can reach high
speed with both smaller area and lower power consumption.
Index Terms CMOS digital, high-speed circuits, prescalers,
single-phase-clock design.

I. INTRODUCTION

OR MORE than 15 years, CMOS has been the main


technology for very-large-scale integration (VLSI) system
design. From the beginning to nowadays, several CMOS clock
policies have been proposed. The pseudotwo-phase logic was
one of the earliest techniques [3]. Later on, two-phase logic
structures were proposed. The domino technique [4] associated
successfully both two-phase and dynamic CMOS circuits.
With the NORA technique [5], [6], an extensive no-race
approach for two-phase and dynamic circuits was developed.
A single-phase-clock policy was introduced in [7] [the true
single-phase-clock (TSPC)]. This technique was subsequently
advanced by [8][10].
Single-phase-clock policies are superior to the others due
to the simplification of the clock distribution. They reduce the
wiring costs and the number of clock-signal requirements (no
problems with phase overlapping, for instance). Consequently,
higher frequencies and simpler designs can be achieved.
Introduced by [1] and [2], the extended true-single-phaseclock CMOS circuit technique (E-TSPC), an extension of the
TSPC, consists of composition rules for single-phase circuits
using static, dynamic, latch, data-precharged, and NMOS-like
blocks. The composition rules enlarge the block-connection
possibilities and avoid races; additionally, NMOS-like blocks
enhance the technique for high-speed operations.
The design of a dual-modulus prescaler (divide by 128/129)
with the E-TSPC in a standard 0.8 m CMOS process (0.7 m

Manuscript received February 16, 1998; revised May 25, 1998. This work
was supported in part by FAPESP and CNPq, Brazil.
The authors are with the LSI/PEE, Escola Politecnica, University of
Sao Paulo, Sao Paulo, S.P. 05508-900 Brazil (e-mail: navarro@lsi.usp.br;
noije@lsi.usp.br).
Publisher Item Identifier S 0018-9200(99)00410-2.

effective channel length) is presented. The prescaler implementation purpose is the evaluation of the E-TSPC technique
potentialities.
This paper is organized as follow. In Section II, the principal
features of the E-TSPC technique, blocks and design rules,
are presented. In Section III, some different dual-modulus implementations are analyzed. Experimental results and comparisons are reported in Section IV, and the principal conclusions
are drawn in Section V.
II. E-TSPC CIRCUIT BLOCKS AND COMPOSITION RULES
A. Basic CMOS Blocks
An E-TSPC circuit should use any of the blocks: CMOS
static block, n-dynamic block [Fig. 1(a)], p-dynamic block
[Fig. 1(c)], n-latch block [Fig. 1(e)], p-latch block [Fig. 1(g)],
and high (PH) and low (PL) data-precharged blocks (Fig. 2).
In Fig. 1, the clocked transistors of the n- and p-latches are
placed close to the power rail, following the suggestion of [11].
This configuration can attain a higher speed but suffers chargesharing problems. Clocked transistors close to either the power
rail or the block output are admissible latch configurations.
In data-precharged blocks [10], some input signals, called
precharging inputs or pc-inputs, control the output precharge
(see Fig. 2). If all PH block pc-inputs are high, or if all PL
block pc-inputs are low, then the PH or PL block is precharged.
In this case, the PH block output goes to low, and the PL
block output to high. In Fig. 2, the CMOS static block that
is drawn, along
executes the logic function
with all equivalent PH and PL blocks [Fig. 2(b) and (c)]. The
pc-inputs of each block are also indicated. The PH and PL
blocks that have the output precharged when the clock is low
will be called n-Dp blocks; similarly, the PH and PL blocks
that have the output precharged when the clock is high will
be called p-Dp blocks.
B. Composition Rules
First, the definition of data chains, fundamental to the design
rules, is given.
Definition: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, one n-dynamic, or one
n-Dp block;
2) starting in a circuit external input, or in the output of a
p-latch, p-dynamic, or p-Dp block; when this output is
followed by static blocks in the normal data flow, the
data chain starts in the output of the last static block;

00189200/99$10.00 1999 IEEE

98

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Fig. 1. Construction blocks of the E-TSPC circuit technique: (a) n-dynamic and (b) NMOS-like n-dynamic blocks; (c) p-dynamic and (d) NMOS p-dynamic
blocks; (e) n-latch and (f) NMOS-like n-latch blocks; and (g) p-latch and (h) NMOS-like p-latch blocks.

(b)

(a)

(c)
Fig. 2. Transformation from (a) a static block into data-precharged blocks: (b) PH blocks and (c) PL blocks.

3) going through static, n-dynamic, n-Dp, or n-latch blocks;


4) regardless of the number and ordering of the blocks
defined above;
5) finishing in a circuit external output, or in the input of
the first p-latch, p-dynamic, or p-Dp block.

For the p-data chains, an equivalent definition applies,


replacing n with p and vice versa.
When clock is high, n-data chains are in evaluation phase;
otherwise, they are in holding phase. P-data chains evaluate
when clock is low.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

99

Fig. 3. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.

In Fig. 3, part of a circuit schematic is depicted with seven


complete n-data chains. Some examples are the data chain
and going through blocks
,
,
,
starting at input
; the data chain starting at
and going through
,
and
,
,
, and
; and the data chain starting at
and
,
,
, and
.
going through
Five of the six E-TSPC composition rules are now listed.
Their purpose is to ensure the observance of some constraints
during the evaluation and holding phases. To simplify the rule
will be used to denote n or p in
statements, the symbol
nouns like -data chain, -dynamic block, etc.
Composition Rule ( ): The -data chain input should be
an input of a dynamic block, an input of a latch, or a nonpcinput of a Dp block.
Composition Rule ( ): A -latch must not drive, directly
or through static blocks, a -dynamic or a -Dp block.
Composition Rule ( ): The number of inversions between:
) any two adjacent dynamic blocks must be odd1;
) any two adjacent Dp-blocks of the same type (PH and
PH or PL and PL) must be odd;
) any two adjacent Dp-blocks of complementary types
must be even;
) a PH (PL) block and an adjacent n- (p)-dynamic (or
vice versa) in an n- (p)-data chain must be even;
) a PL (PH) block and an adjacent n- (p)-dynamic (or
vice versa) in an n- (p)-data chain must be odd.
(Two blocks are called adjacent if there are only static blocks
between them.)
Composition Rule ( ): Consider the last dynamic block in
the -data chain (when it exists). The number of inversions
(due to any block) from this dynamic block up to at least one
-latch must be even.
1 Through

all the rules, zero inversion will be considered even.

Fig. 4. Two TSPC D-flip-flops connected in series.

Composition Rule ( ):
The -data chain must have one of the following two
configurations:
) at least one dynamic block and one latch;
) at least two latches and an even number of inversions
(latches or static blocks) between them.
It is worth noting that these five composition rules are very
similar to the five rules proposed in the NORA technique [6].
In a circuit where all data chains obey the five rules, it can be
proved that (six theorems presented in [1] and [2]):
a) all data-precharged blocks are precharged during the
holding phase of the data chains to which they belong;
b) the dynamic and the data-precharged blocks are not
incorrectly discharged during the evaluation phase;
c) the output of the data-chain last latch is steady during
the holding phase of the data chain.
C. Exception Rule
Although the above-described rules are necessary to avoid
race problems, typical TSPC systems do not follow some of
them. The most common exception is found in connecting
two D-flip-flops (D-FFs), as shown in Fig. 4. In such a

100

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

CONDITIONS

FOR

TABLE I
CORRECT OPERATION OF

configuration, the p-data chains are constituted of only one


or
( violation). In conp-latch block, namely,
sequence, the p-latch output may change during its holding
time. A faulty sequence example is depicted below: consider
an initial state on which the signals clock, input, and output a
and
are evaluating. At
are low, and both blocks
and
are
the end of the evaluation period, the outputs
high. Subsequently, when the clock goes to high, the other
works properly, holding
blocks will evaluate. Suppose that
goes to low,
its former value (high). In this case, the node
goes to low. As a result, the
output a goes to high, and
is cut, and the final value of node will depend
transistor
on the circuit delays.
and
is long
Commonly, the delay between nodes
enough to ensure that is fully discharged through transistors
and
; in this case, the second D-FF works properly.
A simple exception rule is added to cover the utilization of
the well-established TSPC D-FFs (Fig. 4).
Exception Rule ( ): Configurations similar to that of
and
are not obeyed, are accepted if
Fig. 4, where rules
enough delay exists.
is applied, to the detriment of
The data chains where
and , do not have a latch with steady output at the holding
phase. Since the correct operation of the circuit will depend
on the block delays, the exception rule should be used with
caution.
Considering the connection rules presented in former works
[7][10], our six proposed rules differ in the following aspects.
a) The nonlatched domino logic, a timing strategy considered in [10], is not accepted in our proposal.
b) The proposed rules permit a more flexible usage of both
data-precharge blocks, due to the distinction between pc
and nonpc-inputs, and static logic blocks (static logic is
allowed between dynamic and latch blocks). In Fig. 2,
where no rule violations occur, several connections not
allowed by former work rules are provided, for instance,
and
, between
the connection between blocks
and
, between
and
, etc.
D. NMOS-Like Logic Extension
When high speed is also a requirement, restrictions on the
use of p-dynamic and p-latch blocks should be imposed. These
blocks have at least two p-transistors in series, which may

THE

NMOS-LIKE BLOCKS

Fig. 5. Schematic of the dual-modulus prescaler (divide by 128/129).

reduce considerably the maximum speed. In such applications,


the p-data chains are limited to one block, and most logic
operations are handled with n-data chains with limited logic
dept. Thus, deep pipelines will be necessary to implement
complex and fast logic designs.
NMOS-like dynamic and latch blocks can be used to minimize this difficulty and also to increase the n-data chain speed.
They are ratioed logic blocks, where the n-transistor section
and the p-transistor section may conduct simultaneously. A
similar technique was used in [12], but restricted to D-FFs.
In Fig. 1, the NMOS-like versions of the dynamic and latch
blocks are drawn. To assure a correct operation, these blocks
should satisfy the constraints summarized in Table I. The
transistor section that must impose the output value, when both
sections are conducting, is drawn with bold lines in Fig. 1.
The NMOS-like blocks are faster due to the reduced number
of transistors in series, but, unfortunately, they consume more
power. In consequence, they should be used only in critical
data chains, where the desirable speed has not been reached.
Since the connection characteristics do not depend on whether
it is a conventional or an NMOS-like block, the composition
rules ( and ) are valid and necessary for both; as a
result, NMOS-like blocks and conventional blocks can replace
one another, and the judicious selection of NMOS-like blocks
is made easy.
Summarizing, the static blocks, the n/p-dynamic, the n/platch, the PH/PL data-precharged, the NMOS-like blocks, and
compose the E-TSPC
the composition rules and
technique.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

101

Fig. 6. Transistor schematic of the divide-by-4/5 counter DG4 . The transistor width or, when the length is different from 0.8 m, the transistor width/length,
in m, is also indicated in the figure.

TABLE II
MAXIMUM SPEED AND POWER-CONSUMPTION RESULTS FOR
THE FOUR DESIGNED DIVIDE-BY-4/5 COUNTERS (SPICE
SIMULATIONS, SLOW PARAMETERS, AND VDD = 5 V)
Design

DG1
DG2
DG3
DG4

Speed
(GHz)

Power
(w/MHz)

0.98

3.27

1.28

4.45

1.39

4.85

1.67

5.62

III. DUAL-MODULUS DESIGN


Dual-modulus prescalers, a circuit with applications in frequency synthesis systems, have been frequently used to compare different high-speed implementations [12] and [13], our
current goal. A high-speed dual-modulus prescaler (divide
by 128/129) was designed using a standard 0.8 m CMOS
bulk process.
The schematic of the dual-modulus prescaler is depicted
in Fig. 5. The circuit inside the cross-hatched box, composed
of three D-FFs and two logic gates, forms a divide-by-4/5
counter. The div32 signal selects if it counts up to four (div32
high) or up to five (div32
low). The five D-FFs at
the bottom of the figure form a divide-by-32 counter. The
fractional division ratio of the prescaler, 128 or 129, is selected
signal.
according to the
Four different approaches were applied to draw a layout of
the divide-by-4/5 counter, which is the critical high-speed part
of the prescaler. The approaches are:
) design with conventional rise edge-triggered TSPC
D-FF (Fig. 4);
) design with rise edge-triggered D-FF, and further
optimization applying the E-TSPC technique;
) design with a modified fall edge-triggered D-FF [12];
) design with fall edge-triggered D-FF, and further
optimization applying the E-TSPC technique.
approach, with
In Fig. 6, the transistor schematic of the
transistor dimensions, is depicted. The three cross-hatched
boxes mark the D-FFs; the first D-FF (left) has a buffered
output.

Fig. 7. Photograph of the prescaler test chip.

The maximum speed and the power consumption for each


design are shown in Table II. These results were obtained with
SPICE simulations from the extracted netlists of the layouts
for slow parameters, room temperature, and power supply at
5 V. The comparison of the results exhibits some advantages
to
approach,
of the E-TSPC technique. From the
the speed improvement is higher than 70%, and from
to
is 20%. On the other hand, the power consumption
to
. As
uses only NMOSincreases 72% from
like blocks, the latter result is not surprising, and confirms that
these blocks should be restricted to critical circuit parts. Since
the composition rules favor the replacement of conventional
blocks with NMOS-like ones and vice versa, E-TSPC circuits
can reach high speed and keep the power consumption low.
To better evaluate the above results, the following notes
should be taken into account:
all approaches use small transistor sizes, usually minimum
sizes (as indicated in Fig. 6);
the Fig. 5 divide-by-4/5 counter schema was slightly
) to conform
modified for each design (
with its structure characteristics;
the NOR configuration of Fig. 6 is similar to an NMOS
logic, but the load is now a PMOS transistor. It is faster
,
,
than the CMOS static NOR and is used in the
approaches;
and

102

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 1, JANUARY 1999

TABLE III
AREA, SPEED, AND POWER-CONSUMPTION
RESULTS FOR FOUR DIFFERENT PRESCALERS

Fig. 8. Measured results for the prescaler maximum frequency (fmax ), left
axis (3 ), and current consumption at fmax , right axis (o), as a function of
the power supply.

and
blocks, Fig. 6, drive the clock signal to
the divide-by-32 counter. All four designs have similar
configuration.
IV. EXPERIMENTAL RESULTS

The full prescaler circuit, occupying a 0.0126 mm area,


. The D-FFs of the 32
was formed with the counter
asynchronous counter were built with conventional rise edgetriggered TSPC D-FF (Fig. 4). The clock signal from the
divide-by-4/5 counter, Fig. 6, is inverted before being sent to
the 32 counter. This expedient allows a longer time interval
for preparation of the signal div32.
The prescaler test chip, whose photograph is shown in
Fig. 7, was mounted on an alumina substrate with the chip-onboard technique. A coplanar radio-frequency probe was used
to feed the unique prescaler high-speed signal, the clock input.
In Fig. 8, the measured maximum frequency and current
consumption as a function of the power supply are shown.
Since the used pulse generator has a maximum excursion of
3 V, the circuit real maximum frequencies are expected to
be slightly higher than the measured results for power supply
above 3 V.
Performance results of this work, of two recently published
prescalers using TSPC D-FFs, and of a new prescaler architecture are summarized in Table III. In [13], the prescaler
is implemented with rise edge-triggered TSPC D-FFs, which
were size optimized to reach maximum speed; in consequence,
not only the circuit speed but also the area and power consumption are high. Fall edge-triggered TSPC D-FFs with
small-sized transistors and with some NMOS-like blocks are
used in [12]. The resulting circuit has a small area and a low
power consumption but a reduced maximum operation rate.
Our implementation, with the E-TSPC technique and smallsized transistors, provides the smallest area and the lowest
power consumption; the speed, in addition, is comparable to
[13] and [14].
V. CONCLUSIONS
A complete high-speed dual-modulus prescaler (divide by
128/129) was developed in a 0.8 m CMOS process. The

measured circuit attained 1.59 GHz and 8.0 mW/MHz power


consumption with 5 V power supply. It can be advantageously
compared with other implementations in terms of area and
power consumption; in terms of speed, it matches the fastest
TSPC prescaler. The studies done during the design reveal that,
to take full advantage of the TSPC technique, every possible
configuration should be considered. The E-TSPC, being an
extension of TSPC, permits exploring a larger number of
solutions and, in consequence, finding the best configuration.
The dual-modulus prescaler results exhibit some significant
improvements produced by the E-TSPC.
REFERENCES
[1] J. Navarro and W. Van Noije, E-TSPC: Extended true single-phase
clock CMOS circuit technique, in VLSI: Integrated Systems on Silicon,
IFIP International Conference on VLSI, R. Reis and L. Claesen, Eds.
London, U.K.: Chapman & Hall, 1997, pp. 165176.
[2]
, E-TSPC: Extended true single-phase-clock CMOS circuit technique for high speed applications, SBMICRO, J. Solid-State Devices
Circuits, vol. 5, pp. 2126, July 1997.
[3] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design,
1st ed. Reading, MA: Addison-Wesley, 1985.
[4] R. H. Krambeck, C. M. Lee, and H.-F.S. Law, High-speed compact
circuits with CMOS, IEEE J. Solid-State Circuits, vol. SC-17, pp.
614619, June 1982.
[5] N. F. Goncalves and H. J. De Man, NORA: A racefree dynamic CMOS
technique for pipelined logic structures, IEEE J. Solid-State Circuits,
vol. SC-18, pp. 261266, June 1983.
[6] N. F. Goncalves, NORA: A racefree CMOS technique for register
transfer systems, Ph.D. dissertation, Katholieke Universiteit Leuven,
Leuven, Belgium, 1984.
[7] Y. Ji-ren, I. Karlsson, and C. Svensson, A true single-phase-clock
dynamic CMOS circuit technique, IEEE J. Solid-State Circuits, vol.
SC-22, pp. 899901, Oct. 1987.
[8] J. Yuan and C. Svensson, High speed CMOS circuit technique, IEEE
J. Solid-State Circuits, vol. 24, pp. 6270, Feb. 1989.
[9] M. Afghahi and C. Svensson, A unified single-phase clocking schema
for VLSI systems, IEEE J. Solid-State Circuits, vol. 25, pp. 225235,
Feb. 1990.
[10] P. Larsson, Skew safety and logic flexibility in a true single phase
clocked system, in Proc. IEEE ISCAS, Seattle, WA, May 1995, pp.
941944.
[11] Q. Huang, Speed optimization of edge-triggered nine-transistor D-flipflop for gigahertz single-phase clocks, in Proc. IEEE ISCAS, Chicago,
IL, May 1993, pp. 21182121.
[12] B. Chang, J. Park, and W. Kin, A 1.2 GHz CMOS dual-modulus
prescaler using new dynamic D-type flip-flops, IEEE J. Solid-State
Circuits, vol. 31, pp. 749752, May 1996.
[13] Q. Huang and R. Rogenmoser, Speed optimization of edge-triggered
CMOS circuits for gigahertz single-phase clocks, IEEE J. Solid-State
Circuits, vol. 31, pp. 456465, Mar. 1996.
[14] J. Craninckx and M. S. J. Steyaert, A 1.75-GHz/3-V dual-modulus
divide-by-128/129 prescaler in 0.7-m CMOS, IEEE J. Solid-State
Circuits, vol. 31, pp. 890897, July 1996.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

835

16

A CMOS Monolithic
-Controlled Fractional-N
Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE

16

AbstractA monolithic 1.8-GHz


-controlled fractionalphase-locked loop (PLL) frequency synthesizer is implemented in a
standard 0.25- m CMOS technology. The monolithic fourth-order
type-II PLL integrates the digital synthesizer part together with
a fully integrated LC VCO, a high-speed prescaler, and a 35-kHz
2 mm2 . To investigate
dual-path loop filter on a die of only 2
the influence of the
modulator on the synthesizers spectral
purity, a fast nonlinear analysis method is developed and experimentally verified. Nonlinear mixing in the phase-frequency detector (PFD) is identified as the main source of spectral pollution in
fractional- synthesizers. The design of the zero-dead zone
PFD and the dual charge pump is optimized toward linearity and
spurious suppression. The frequency synthesizer consumes 35 mA
from a single 2-V power supply. The measured phase noise is as
low as 120 dBc/Hz at 600 kHz and 139 dBc/Hz at 3 MHz.
The measured fractional spur level is less than 100 dBc, even
for fractional frequencies close to integer multiples of the reference frequency, thereby satisfying the DCS-1800 spectral purity
constraints.

16

16

16

Index TermsCMOS RF integrated circuits,


modulator,
fractional- frequency synthesis, phase-locked loop, phase noise.

I. INTRODUCTION

HE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main
cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the
implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic
CMOS integration of RF circuits both by academics and industry [1][3].
The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the
full integration of a transceiver front end in CMOS, including
a low-IF receiver and a direct upconversion transmitter [1]. To
achieve a high degree of integratability and fast settling under
fractional- synthesizer topology
low-noise constraints, a
fractional- synthesis circumhas been chosen [4] (Fig. 1).
vents the severe speedspectral purityresolution tradeoff of the
classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened
action and ultimately filtered by
and noise shaped by the
the loop filter. To prevent degradation of the spectral purity by
Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department
Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.

Fig. 1. Principle of

16 fractional-N synthesis.

digital noise coupling, the


modulator is scheduled for integration on the digital baseband signal processing IC of the full
transceiver system.
The paper describes the design of a monolithic 1.8-GHz
-controlled fractional- PLL frequency synthesizer. In
noise on PLL bandwidth
Section II, the influence of
requirements is theoretically analyzed for multistage noise
modulators.
shaping (MASH) and multibit single-loop
Next, a fast nonlinear analysis method is presented, which
predicts possible degradation of the PLL spectral purity by
in-band noise leakage and re-emerging of spurious tones.
The nonlinearities in the phase-frequency detector (PFD)
charge pumps are identified as the main trouble spots. The
fourth-order type-II PLL building-block design is discussed in
Section IV, focusing on integrated filter and voltage-controlled
oscillator (VCO) design and on the realization of a linear phase
error-to-charge-pump current conversion. In Section V, the
experimental results of the fractional- synthesizer prototype
are presented and compared to the simulations, showing good
correspondence.
II. THE FRACTIONAL-

SYNTHESIZER

A. Introduction
fractional- synthesizer is shown
A block diagram of a
modulator output controls the instantaneous
in Fig. 1. The
division modulus of the prescaler, such that the mean division
, with the number of bits of the
modulus is
modulator and the input word. The corresponding phase
changes at the prescaler output are quantized, leading to possible
spurious tones and quantization noise. By selecting higher order
modulators, the spurious energy is whitened and shaped to
high-frequency noise, which can be removed by the low-pass
loop filter. As a result, for a given frequency resolution, an arcan be chosen, by assigning the proper number
bitrary high

0018-9200/02$17.00 2002 IEEE

836

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 2. Third-order multibit single-loop


reasons.

16 modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability

of bits to the modulator. The loop bandwidth is not restricted


by the reference spur suppression, resulting in faster settling and
higher integratability. Additionally, the division modulus is de(with
the minimum number of
creased by a factor
bits for the frequency resolution, i.e., 7.02 in this case), so that
noise of the PLL blocks, except for the VCO, is less amplified.
B. The

Modulators

The influence of both third-order MASH and multibit


modulators on the spectral purity of the
single-loop
fractional- synthesizer is investigated. Since the order of
the integrated PLL loop filter is three, the order of the
modulators must also be three or higher to ensure that
noise has at least a 20-dB/dec rolloff at intermediate offset
frequencies, causing no degradation of the output phase noise.
Both modulators have an internal accuracy of 16 bit and 1 LSB
dithering is applied to further randomize any spurious energy.
The dithering sequence is third-order noise shaped to avoid an
increased noise floor.
modulator is chosen beThe MASH or cascade 1-1-1
cause it is easy to integrate in CMOS and is unconditionally
stable. The noise transfer function (NTF) of the MASH moduand contains three poles at the
lator is
origin of the plane. The result is harsh LF noise shaping and
and substantial HF noise. In the time domain, this is reflected
in the intensive prescaler modulus switching. To synthesize a
, all moduli between 64 and 71 are
frequency of
employed.
modulator is shown in Fig. 2.
The multibit single-loop
For ease of integration, the feedforward and feedback coefficients are a power of 2. Only four output bits are needed to control the prescaler moduli, but five output bits are used, to avoid
overlap of the intended input operating range and the unstable
input regions. The NTF of the presented modulator is given in
(1) and contains only one pole at the origin of the plane and
, with a passband
two low- Butterworth poles at
gain of 3.2.
(1)
modulator is more complex than
Although the single-loop
the MASH modulator, it offers a higher flexibility in terms of
noise shaping. The HF quantization noise of the modulator
can be spread out by proper pole positioning. As a result, the
prescaler modulus switching is less intense. Only the moduli
.
between 66 and 69 are needed to synthesize
The reduced HF switching has advantageous effects on noise

Fig. 3. Maximum PLL bandwidth f


versus the reference frequency and
different
modulator orders, for the type-II fourth-order PLL. The dashed
curve is for the third-order single-loop modulator. The targeted phase-noise
specification is 136 dBc/Hz at 3 MHz for DCS-1800.

16

coupling and sensitivity to PLL nonlinearities, as will be


discussed in Section III.
C. Theoretical Analysis
control on the specTo theoretically model the impact of
tral purity of the synthesizer, a linear-time-invariant (LTI) PLL
quantization noise as an admodel is employed, with the
at the prescaler output. The prescaler
ditive noise source
control can be looked upon as a digital-to-phase (D/P)
with
converter. Every reference cycle, the prescaler subtracts
rad from its input signal, with
determined
modulator output. The resulting quantization noise
by the
on the division modulus, and thus output phase, is approximated
by uniformly distributed white noise [5]. The quantization noise
with
for both modupower is
the modulus range and the number of signiflators with
output bits. The phase noise contribution of the
icant
modulator at the output of the synthesizer is found in (2) [6],
the closed-loop transfer function of the fourth-order
with
type-II PLL.
(2)
fractional- synthesizers
Since the main advantage of
and the PLL
is the decoupling of the reference frequency
noise on the bandwidth
bandwidth , the influence of the

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

837

III. FAST NONLINEAR ANALYSIS METHOD

Fig. 4. Maximum PLL bandwidth f


different
modulator orders for
third-order single-loop modulator.

16

18

versus the reference frequency and


:
. The dashed curve is for the

<

15

requirement is examined. To comply with the most stringent


133 dBc/Hz
DSC-1800 phase noise specification, i.e.,
phase noise is
at 3 MHz offset [7], the target
(3 MHz)
dBc/Hz. In Fig. 3, the maximum PLL
is plotted versus the reference frequency
bandwidth
for different MASH modulator orders. The dashed line is the
modumaximum bandwidth for the single-loop multibit
lator of Section II-B. For a reference frequency of 26 MHz, not
much is gained from increasing the modulator order. For a high
bandwidth and thus a fast PLL, the reference frequency and/or
the modulator order should be increased leading to an increased
power consumption and circuit complexity. The maximum
bandwidth is 87 kHz for the third-order MASH modulator and
62 kHz for the single-loop multibit modulator.
Apart from the out-of-band phase-noise constraint, the integrated in-band phase noise, determining the rms phase error
of the PLL is of importance. To be sure that the
does not corrupt the rms phase error, the dynamic range of the
modulator must be higher than the dynamic range of the PLL
is given by
[8]. The integrated in-band frequency noise
with
the noise bandwidth of the PLL
the in-band phase noise in dBc/Hz. The noise
and 10
. The maxbandwidth of the presented PLL is
imum bandwidth of the PLL is calculated in (3) [8].
(3)
is plotted versus the refThe maximum PLL bandwidth
erence frequency of the PLL for different MASH modulator
modulators
orders in Fig. 4. For the single-loop multibit
(dashed curve), the actual maximum bandwidth can be calculated to be 25% smaller than in (3), due to the Butterworth poles.
In the case of a third-order modulator, a 1.5 rms phase error (to
of
ensure at least an overall rms phase error of 2 ) and a
26 MHz, the maximum bandwidth is 810 and 614 kHz, respecmodulator
tively. Obviously, the constraint posed on the
noise due to in-band noise contributions is much less severe than
the constraint due to the out-of-band phase noise at 3 MHz.

The theoretical analysis suggested that applying


control
to the prescaler would not cause any problems for the spectral
purity of the PLL. Practice, however, proves this wrong. A fast
nonlinear analysis method is developed which can take into account the nonlinearity of the PLL building blocks. The analysis
method is at the same time sufficiently fast to sweep simulations
over different degrees of nonlinearities and operating points, and
is capable of performing sufficiently long transient simulations
to get accurate fast Fourier transforms (FFTs) of the phase variable. The fractional operation of the PLL is simulated in discrete
time and in open loop under locked conditions to avoid drift of
the phase error. To further speed up the simulation, the building
blocks are represented by high-level models with parameters to
model any nonlinear behavior or mismatch in critical transistors. The simulations are performed in Matlab [9].
modulation of
To find the phase error, generated by the
the division modulus, the variation of the number of RF pulses,
, at the output of the divider is monitored. Every reference
cycle, the number of RF pulses at the divider output is detercontrol,
mined by the number of pulses swallowed by the
:
(4)
The resulting quantized phase changes are compared with the
phase that would be expected when the loop would be in lock,
i.e., the phase corresponding to the fractional part of the divi. The result is the instantaneous accumusion modulus
:
lated phase error
(5)
, in the
The phase error is converted to current pulses,
(phase-error charge-pump curcharge pump. The
rent) conversion is modeled to contain any PFD nonlinearity.
Mismatch in the up and down current sources, resulting in gain
mismatch for positive and negative phase errors is modeled by
. The occurrence of a dead zone is modeled by
(6)
By taking an FFT of the current pulses, the current noise
spectrum is obtained. The current noise spectrum is modeled
as a phase-noise source which is subjected to its corresponding
closed-loop transfer function, obtained from the LTI PLL
model. This means that the filter is modeled by its linear
transfer function, which includes parasitic gain and pole
position changes. The nonlinear conversion from voltage to
frequency/phase in the VCO is modeled by the variation of the
VCO gain, when changing the operating point of the PLL.
The analysis tool enables the evaluation and comparison of
noise on the PLL.
the effect of MASH and single-loop
This analysis is performed with the following nonlinearities: a
0.1% dead zone and a gain mismatch of 2%. The internal accuracy of both modulators is 16 bit. The reference frequency

838

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 5. Simulation results. The phase error  for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses CP
(c) the MASH modulator and (d) the single-loop multibit modulator.

is 26 MHz and the fractional division number is 67.92. The


output frequency is 1.76592 GHz, i.e., 2.08 MHz offset from
. In Fig. 5(a) and (b), the time-domain
an integer multiple of
is plotted for both modulators. Note that the
phase error
fractional- PLL frequency synthesizer can hardly be called a
phase-locked loop, since the loop is never in lock! Due to the
modulator, the inshaping of the HF noise in the single-loop
stantaneous phase error is smaller than for a MASH modulator.
This has two important consequences. First, the on-time of the
charge pumps is smaller for the single-loop modulator, making it
less sensitive to noise coupling from the substrate and the power
consupply. Second, the sensitivity to the nonlinear
version in terms of noise leakage is reduced.
To be able to examine the effect of nonlinearities in the frequency domain, the FFTs of the charge-pump current pulses
are plotted in Fig. 5(c) and (d). A noise floor appears in
the output spectrum as well as spurious tones, although the
output is perfectly randomized and dithered. Due to the nonfolds
linear mixing in the PFD charge pump, noise at
back to lower offset frequencies, similar to the effect of a nonADC. Since the noise at
is
linear DAC in a multibit
modulator, its noise leakage
much lower for the single-loop
due to the nonlinear mixing in the PFD is also lower. In the time

[i] for

domain, this effect corresponds to the smaller phase excursions.


The difference in phase error between MASH and single-loop
modulators is reflected in a lower noise floor, i.e., a 10-dB difference. In addition, previously unnoticed spurious tones appear
in the output spectrum at
with
.
noise of both modulators as it appears at
Fig. 6 shows the
the PLL output for an ideal (dotted) and a nonlinear
conversion (solid). The results of the ideal case closely match
the theoretical results of Section II-C (solid light gray). Due
to nonlinearity, the simulated output spectrum of the integerPLL (the dash-dotted line) is seriously deteriorated by
noise
. Especially,
in the PLL noise bandwidth, increasing the
the MASH converter is critical in terms of in-band noise due
to the higher phase error [see Fig. 5(a)], despite the inherently
noise of the MASH modulator. Note that the simlower LF
ulations are performed without taking into account noise coupling through the substrate or power-supply lines. As a consefractionalquence, the actual spurious performance of the
PLL could be worse than simulated. The presented simulation
results are for a division modulus 67.92, close to an integer mul. When analyzing division moduli in between integer
tiple of
, noise leakage is still observed, but the spurious
multiples of
tones are well below the phase noise.

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

839

Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a)
the MASH modulator and (b) the single-loop multibit modulator.

16

Fig. 6. Simulation results. The


noise at the output of the PLL for (a) the
MASH modulator and (b) the single-loop multibit modulator. The results are
plotted for an ideal PFD (dotted), which closely corresponds to the theoretical
results (solid light gray) and for a nonlinear PFD (solid). They are compared to
the simulated integer PLL phase noise (the dash-dotted line).

PFD. This effect can be worsened by substrate and power.


supply coupling with signals at
IV. PLL BUILDING-BLOCK CIRCUIT DESIGN
A. The Fourth-Order Type-II PLL

The explanation for the re-emerging of spurious tones is that


the modulator is unable to sufficiently decorrelate the successive
modulator
output samples. To quantify the correlation in the
output, the discrete time autocorrelation estimate is calculated
and plotted for both modulators for inputs close to an integer
value (see Fig. 7). The autocorrelation calculations show correlation, although 1LSB noise-shaped dithering is applied. The
modulator shows large
autocorrelation of the single-loop
correlation peaks, explaining the higher spurious tones in the
output phase-noise spectrum of the PLL. With the autocorrelamodtion estimate, the necessary internal accuracy of the
ulators is found to be at least 13 bits for MASH and 16 bit
for single-loop modulators to sufficiently decorrelate the
modulator output for inputs close to integers. A second possible
source of tones is the downconversion of tones which are inher[5], by the nonlinear mixing in the
ently present around

A fourth-order type-II PLL is integrated, including a 4-bit


prescaler, a zero-dead-zone PFD, a dual charge pump, and a
3-step equalizer, together with an on-chip LC-tank VCO and a
third-order dual-path 35-kHz low-pass loop filter (see Fig. 8).
The equalizer performs a 3-step piecewise equalization of the
loop gain, by keeping the product of the VCO gain and the
charge-pump current constant. To prevent switching between
different equalization states, the state transitions exhibit hysteresis.
B. The 4-Bit Prescaler
The first high-speed division of the prescaler is done
with two differential single-transistor-clocked (DSTC) logic
n-latches [10], forming a differential dynamic D-flip-flop. The
flip-flop operates with rail-to-rail internal signals to minimize
the residual prescaler phase noise [11] to levels insignificant to

840

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

Fig. 8. Fully integrated fourth-order type-II phase-locked loop.

the overall phase-noise performance. The 16-modulus division


(64 79) is implemented with the phase-switching topology
[12]. The division moduli are generated by switching between
the 90 -spaced output phases of the second D-flip-flop. When
the 90 spacing is not ideal, spurs appear at 1/4, 2/4, and 3/4 of
the PLL reference frequency. It takes careful layout and circuit
design to equalize the delays of the different quadrature paths,
such that these spurious tones are suppressed to negligible
levels.

SUMMARY

TABLE I
LOOP PROPERTIES AND PERFORMANCE
FOURTH-ORDER TYPE-II PLL

OF THE

OF THE

C. The Voltage-Controlled Oscillator


The LC VCO with on-chip inductor combines a 30% tuning
range at only 2 V and an excellent phase-noise performance
over a large frequency range. To minimize the VCO phase
noise, a simulator-optimizer program has been developed
which searches the optimal inductor geometry for a given
technology. The resulting hollow octagonal balanced inductor
as high as 9 with an inductance of 2.86 nH, for a
has a
standard 0.25- m CMOS technology with only two metal
layers (0.6 and 1.0 m) [13].
The VCO is implemented as a single differential pMOS-only
topology, leading to an enhanced tuning range, without in[13].
creasing the power consumption and the VCO gain,
is between 100 and
For the frequency range of interest,
200 MHz/V, explaining the need for equalization of the loop
gain. The VCO output is buffered from the prescaler input to
prevent kickback noise from entering the tank. The measured
phase noise is as low as 127.5 dBc/Hz at 600 kHz and
142.5 dBc/Hz at 3 MHz for a carrier frequency of 1.82 GHz.
D. The 35-kHz Dual-Path Loop Filter
To achieve full integration, a dual-path filter topology has
been implemented (Fig. 8). Two filter paths, one active integraare added
tion ( ) and one passive low-pass filter

with a multiplication factor in the dual charge pumps. The


addition realizes the low-frequency zero needed for loop stability in a type-II PLL, without adding the actual capacitor [12].
The total number of capacitors is the same as in a classical
fourth-order type-II PLL, but for the same phase noise the integrated capacitance is more than 5 times smaller. Due to the
rather high VCO gain, the integrated capacitance is still 1.4 nF
to be able to comply with the DCS-1800 phase-noise requireis added at 210 kHz to ensure
ments. An extra pole
enough suppression at higher offset frequencies. A filter optimization model is developed, determining all pole and zero
positions and the capacitanceresistance tradeoff to obtain low
noise and high integratability [14]. The results of the optimization at 1765.92 MHz are listed in Table I. The total phase noise
is without the
noise. The MASH and single-loop (SL)
noise contributions result from the nonlinear analysis. As

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

841

Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left)
the dummy current branch, denoted by the suffix d, and the output branch.

seen in Section II-C, the loop bandwidth needs to be smaller than


noise suppression. However, to ensure sufficient
62 kHz for
suppression of the low-frequency fractional spurious tones for
inputs close to the integers, the bandwidth is designed to 35 kHz.
Despite the rather low loop bandwidth for a fractional- synthesizer, a settling time of less than 293 s for a 104-MHz step
is simulated.
E. The

Conversion

The nonlinear analysis of Section III identified nonlinearity


conversion as the main cause of noise leakage
of the
and spurious tones. Therefore, the PFD and charge-pump circuits are carefully optimized toward spurious suppression as
such and toward a highly linear phase-error detection for
spurious suppression.
First, the reference spur generation by the PFD charge-pump
circuit is carefully minimized. The integration in the first path of
the loop filter is done actively to keep the charge-pump output

at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least a magnitude larger than the fixed
parasitic charge injection of the switch transistors. The current
switches are implemented with pMOS and nMOS transistors to
compensate charge injection. Finally, a timing control scheme
[Fig. 9(a)] is developed to control the charge-pump switches.
The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch
and the dummy current branch of the charge pump [Fig. 9(b)].
Fig. 9(a) shows the dummy and output control signals. The
is delayed versus the output control
by
dummy control
modifying the thresholds of the second inverter-string (indicated
always flows, preby high and low) such that the current
venting hard on/off switching of the current sources. To equalize
rise and fall times and force a perfect rad relation between
nMOS and pMOS control signals, latches at the outputs of both
inverter strings are implemented. Capacitors at the control outputs lower the rise and fall times to prevent large charge injections by fast switching.

842

Fig. 10.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

IC microphotograph and the measurement setup in which it is embedded.

To linearize the
conversion, the phase detection
is performed by a zero-dead-zone PFD [15], to prevent a hard
nonlinearity around 0 phase error. Due to the delay added in
the PFD, both the up and down current sources are on, for small
or zero phase errors, enabling the PFD to react to very small
phase errors. The on-time fraction of the charge pump due to
the delay is less than 10%. This value is a tradeoff between
dead-zone prevention and sensitivity to noise coupling, when
the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events
modulator and prescaler are offset in phase. Consein the
quently, the phase-error decision making is done in a relatively
quiet environment. To make sure that the gains for positive and
negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side efnoise, which can seriously affect the
fect, the current source
in-band noise, is decreased. Additionally, the timing control of
Fig. 9(a) provides synchronization between the two filter paths
and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain.
HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal
transistor matching.

V. EXPERIMENTAL RESULTS
Fig. 10 shows the IC microphotograph and the measurement
fractional measuresetup in which it is embedded. The
ments are performed by controlling the PLL divider moduli with
an HP80000 data generator, which generates the 4-bit control
output bit stream is generated using Matlab.
word. The 4-bit
modThis provides a flexible way to test different kinds of
ulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at

16

Fig. 11. Measured output spectrum of the


fractionalGHz. All spurious tones are well below 75 dBc/Hz.

N PLL at 1.76592

1.76592 GHz, i.e., for a fractional division by 67.92 for comparmodulators


ison with the simulated results. The input to the
), resulting in a frequency resolution of
is a 16-bit word (
around 400 Hz. The power-supply voltage is only 2 V. Fig. 11
shows the output spectrum of the fractional- PLL over a span
of 55 MHz. The reference spurs are well below 75 dBc, due
to the careful charge-pump timing control.
To measure the fractional performance of the frequency synthesizer, the Matlab data is stored in the data generator memory.
Unfortunately, the maximum memory capacity is only 128 kbit,
leading to large spurious tones at the output at low offset frequencies. These large tones corrupt the gain calibration, which is
performed by the phase-noise measurement system every offset
frequency decade, such that accurate measurements of the phase
noise at offsets smaller than 10 kHz are not feasible. The measured phase noise of the PLL with the MASH modulator and the

DE MUER AND STEYAERT: CMOS MONOLITHIC FREQUENCY SYNTHESIZER FOR DCS-1800

SUMMARY

843

OF

TABLE II
MEASURED SPECIFICATIONS COMPARED
DCS-1800 SPECIFICATIONS

TO THE

16

Fig. 12. Phase-noise measurement with the


single-loop multibit converter
at 1.76592 GHz compared to the phase noise at integer division (light).

Fig. 13. Phase noise measurement with the MASH converter at 1.76592 GHz
compared to the simulated
noise at the output of the PLL (dashed), and
control (dash-dotted).
with the simulated PLL output without

16

16

single-loop multibit modulator is presented in Figs. 12 and 13.


Small spurs are present at 2.08 MHz as predicted by the simulations in Fig. 6. The spur level is well below 100 dBc, due to
careful PFD charge-pump design. The phase noise at 600 kHz
is lower than 120 dBc/Hz.
In Fig. 12, the measured phase noise of the PLL with a
multibit single-loop modulator (dark) is compared to the phase
noise at integer division (light). Noise at lower offsets origimodulator due to noise folding in the PFD,
nates from the
as predicted by the simulations. As a result, the rms phase error
is increased from 1.7 to 3 . Note that the phase noise
of the PLL at integer divisions is as low as 124 dBc/Hz
at 600 kHz, which is only 0.3 dB higher than predicted by
the PLL simulations (see Table I). The measured results for
fractional division are much noisier than predicted by simulation. The phase noise at offset frequencies close to 10 kHz
is increased due to the limited memory of the data generator.
The noise at higher offset frequencies is corrupted by noise
coupling from the data generator. As can be seen in Fig. 10,
-control bonding wires, which conduct rail-to-rail, very
the

noisy control pulses are close to the LC tank and the bonding
wires of the VCO power supply. Without proper shielding, the
VCO phase noise is seriously degraded by this noise coupling.
noise and the
noise as simIn Fig. 13, the measured
ulated in Section III (dashed) is compared. The dash-dotted line
control. The
is the simulated phase noise of the PLL without
noise leakage closely matches the measured resimulated
sults, except at very low offsets due to the limited memory. The
phase noise at high offsets is increased versus the simulated PLL
results due to noise coupling. Second-order tones are larger in
measurements, since the models in the simulator do not include
second-order effects and noise coupling. Tones at 520 kHz are
believed to come from subharmonic tones present in the
modulator output [5], which are amplified by mixing through
noise coupling. When comparing the results for the MASH and
the single-loop modulator, the measured results are less pronounced than the simulated results (see Fig. 6). The measured
phase noise for the single-loop modulator is however a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples
.
of
The measured settling time of the PLL is 226 s for a
104-MHz frequency step. The power consumption of the PLL
is 70 mW from a 2-V power supply. The fully integrated
low-phase-noise VCO is responsible for almost 66% of the
total power consumption. The IC area is 2 2 mm , including
bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications
[1]. The specifications of the IC prototype comply with the
is degraded due to the limited
DCS-1800, only the
resolution of the measurement setup.
VI. CONCLUSION
A monolithic 1.8-GHz
-controlled fractional- PLL
frequency synthesizer is implemented in a standard 0.25- m
CMOS technology. The monolithic fourth-order type-II PLL

844

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 7, JULY 2002

integrates the digital synthesizer part together with a fully


integrated LC VCO, a high-speed prescaler, and a 35-kHz
dual-path loop filter on a die of only 2 2 mm . To investigate
modulator on the synthesizers spectral
the influence of the
purity, a fast nonlinear analysis method is developed, showing
good correspondence with measurements, in contrast to the
results of the theoretical analysis. Nonlinear mixing in the
phase-frequency detector and the VCO is identified as the main
fractional- synthesizers.
source of spectral pollution in
modulators are compared
MASH and single-loop multibit
for use in fractional- synthesis. Although the MASH is stable
and easy to integrate, the single-loop modulator presents a
better solution, showing less sensitivity to noise leakage and
noise coupling and providing more flexibility. The measured
phase noise is lower than 120 dBc/Hz at 600 kHz and
139 dBc/Hz at 3 MHz. The measured fractional spur level is
lower than 100 dBc, satisfying the DCS-1800 spectral purity
requirements. All measurements are performed for frequencies
close to integer multiples of the reference frequency, where the
synthesizer is most sensitive to spurious tones.
REFERENCES
[1] M. S. J. Steyaert, J. Janssens, B. De Muer, M. Borremans, and N. Itoh, A
2-V CMOS cellular transceiver front-end, IEEE J. Solid-State Circuits,
vol. 35, pp. 18951907, Dec. 2000.
[2] T. Cho, E. Dukatz, M. Mack, D. Macnally, M. Marringa, S. Mehta, C.
Nilson, L. Plouvier, and S. Rabii, A single-chip CMOS direct-conversion transceiver for 900-MHz spread-spectrum digital cordless phones,
in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San
Francisco, CA, Feb. 1999, pp. 228229.
[3] A. Rofougaran, G. Chang, J. J. Rael, J. Y.-C. Chang, M. Rofougaran, P.
J. Chang, M. Djafari, J. Min, E. W. Roth, A. A. Abidi, and H. Samueli,
A single-chip 900-MHz spread-spectrum wireless transceiver in 1-m
CMOSPart II: Receiver design, IEEE J. Solid-State Circuits, vol. 33,
pp. 547555, Apr. 1998.
[4] M. Copeland, T. Riley, and T. Kwasniewski, Deltasigma modulation
in fractional-N frequency synthesis, IEEE J. Solid-State Circuits, vol.
28, pp. 553559, May 1993.
[5] S. R. Norsworthy, R. Schreier, and G. C. Themes, DeltaSigma Data
Converters: Theory, Design and Simulation. New York: IEEE Press,
1997.
[6] B. Miller and R. Conley, A multiple modulator fractional divider,
IEEE Trans. Instrum. Meas., vol. 40, pp. 578583, June 1991.
[7] Digital cellular communication system (Phase 2+); Radio transmission
and reception, Eur. Telecommun. Standards Inst., ETSI 300 190 (GSM
05.05 version 5.4.1), 1997.
[8] W. Rhee, B.-S. Song, and A. Ali, A 1.1-GHz CMOS fractional-N
modulator, IEEE J.
frequency synthesizer with a 3-b third-order
Solid-State Circuits, vol. 35, pp. 14531460, Oct. 2000.
[9] The Mathworks Inc., Matlab Users Guide, Version 5. Englewood
Cliffs, NJ: Prentice Hall, 1997.
[10] J. Yuan and C. Svensson, New single-clock CMOS latches and flipflops with improved speed and power savings, IEEE J. Solid-State Circuits, vol. 32, pp. 6269, Jan. 1997.

16

[11] B. De Muer and M. S. J. Steyaert, A single-ended 1.5-GHz 8/9 dualmodulus prescaler in 0.7-m CMOS with low phase-noise and high
input sensitivity, in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC),
The Hague, Sept. 1998, pp. 256259.
[12] J. Craninckx and M. S. J. Steyaert, Low-phase-noise fully integrated
CMOS frequency synthesizers, Ph.D. dissertation, Katholieke Univ.
Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, A 1.8-GHz
highly tunable low-phase-noise CMOS VCO, in Proc. IEEE Custom
Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585588.
[14] B. De Muer and M. S. J. Steyaert, Fully integrated CMOS frequency
synthesizers for wireless communications, in Analog Circuit Design,
W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell,
MA: Kluwer, 2000, pp. 287323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.

Bram De Muer (S00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc.
degree in electrical engineering in 1996 from the
Katholieke Universiteit Leuven, Belgium, where
he is currently working toward the Ph.D. degree
on high frequency low-noise integrated frequency
synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with
ESAT-MICAS laboratories since 1996. His research
is focused on integrated low-phase-noise VCOs with
on-chip planar inductors and high-speed prescaler
design, leading to fully integrated
fractional-N synthesizers in CMOS
technology.

16

Michel S. J. Steyaert (S85A89SM92) was born


in Aalst, Belgium, in 1959. He received the M.S.
degree in electrical-mechanical engineering and
the Ph.D. degree in electronics from the Katholieke
Universiteit Leuven (K.U. Leuven), Heverlee,
Belgium, in 1983 and 1987, respectively.
From 1983 to 1986, he obtained an IWONL fellowship (Belgian National Foundation for Industrial
Research) which allowed him to work as a Research
Assistant at the Laboratory ESAT at K.U. Leuven.
In 1987, he was responsible for several industrial
projects in the field of analog micropower circuits at the Laboratory ESAT as
an IWONL Project Researcher. In 1988, he was a Visiting Assistant Professor
at the University of California, Los Angeles. In 1989, he was appointed by
the National Fund of Scientific Research (Belgium) as a Research Associate,
in 1992 as a Senior Research Associate, and in 1996 as a Research Director
at the Laboratory ESAT, K.U. Leuven. Between 1989 and 1996, he was also
a part-time Associate Professor and since 1997 an Associate Professor at
the K.U. Leuven. His current research interests are in high-performance and
high-frequency analog integrated circuits for telecommunication systems and
analog signal processing.
Dr. Steyaert received the 1990 European Solid-State Circuits Conference
Best Paper Award, the 1995 and 1997 ISSCC Evening Session Award, the
1999 IEEE Circuit and Systems Society GuilleminCauer Award, and the
1991 NFWO Alcatel-Bell-Telephone award for innovative work in integrated
circuits for telecommunications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

1039

A Family of Low-Power Truly Modular Programmable Dividers


in Standard 0.35-m CMOS Technology
Cicero S. Vaucher, Igor Ferencic, Matthias Locher, Sebastian Sedvallson, Urs Voegeli, and Zhenhua Wang

AbstractA truly modular and power-scalable architecture for


low-power programmable frequency dividers is presented. The architecture was used in the realization of a family of low-power fully
programmable divider circuits, which consists of a 17-bit UHF divider, an 18-bit -band divider, and a 12-bit reference divider. Key
circuits of the architecture are 2/3 divider cells, which share the
same logic and the same circuit implementation. The current consumption of each cell can be determined with a simple power optimization procedure. The implementation of the 2/3 divider cells is
presented, the power optimization procedure is described, and the
input amplifiers are briefly discussed. The circuits were processed
in a standard 0.35 m bulk CMOS technology, and work with a
nominal supply voltage of 2.2 V. The power efficiency of the UHF
divider is 0.77 GHz/mW, and of the -band divider, 0.57 GHz/mW.
The measured input sensitivity is 10 mVrms for the UHF divider,
and 20 mVrms for the -band divider.
Index TermsCMOS integrated circuits, current-mode logic,
frequency synthesizers, phase-locked loop, programmable frequency counter, programmable frequency divider.

I. INTRODUCTION

HE feasibility of RF functions implemented in CMOS


technology has been demonstrated by a.o. the work
presented in [1][3]. They show that the scaling of CMOS
technologies to deep submicron has made CMOS a technological option for the low-gigahertz frequency range. However, for
CMOS to become a commercial option for RF building blocks
requires compliance to all trends of the consumer market:
miniaturization, low cost, high reliability and long battery
lifetime. Bulk CMOS technologies presently available satisfy
the low cost and reliability trends by standard design practice.
Complying to miniaturization and long battery lifetime, on the
other hand, demands CMOS building blocks with low-power
dissipation and good electromagnetic compatibility (EMC)
characteristics. A critical RF function in this context is the
frequency synthesizer, more particularly the programmable
frequency divider. The divider consists of logic gates which
operate at (or close to) the highest RF frequency. Due to the
dividers complexity, high operation frequency normally leads
to high power dissipation.
Other crucial aspects of the present-day consumer electronics
industry are the short time available for the introduction of new
products in the market, and the short product lifetime. On top
of that, the lifespan of a given CMOS technology is also short,
due to the aggressive scaling of minimum feature sizes. Short
Manuscript received November 16, 1999; revised January 24, 2000.
C. S. Vaucher is with the Philips Research Laboratories, 5656AA Eindhoven,
The Netherlands.
I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang are with
Philips Semiconductors Zurich, 8045 Zurich, Switzerland.
Publisher Item Identifier S 0018-9200(00)03878-6.

Fig. 1.

Fully programmable divider based on a dual-modulus prescaler.

time-to-market demands architectures providing easy optimization of power dissipation, fast design time and simple layout
work. High reusability, in turn, requires an architecture which
provides easy adaptation of the input frequency range and of
the maximum and minimum division ratios of existing designs.
The choice of the divider architecture is therefore essential
for achieving low-power dissipation, high design flexibility and
high reusability of existing building blocks. A modular architecture complies with these requirements, as shall be demonstrated
in this paper. The focus of the paper is first on the truly modular
architecture and on the implementation of the circuits. Then the
power optimization procedure and the design of the input amplifier are briefly discussed. Finally, a collection of measured
data and the conclusions are presented.
II. PROGRAMMABLE DIVIDER ARCHITECTURES
A. Architecture Based on a Dual-Modulus Prescaler
Fig. 1 depicts the divider architecture based on a dual-modulus prescaler [4], [5]. The design of the dual-modulus prescaler
itself has been extensively treated in the literature [4], [6][10].
On the other hand, the architecture of Fig. 1 has some undesirable characteristics. One readily notices the lack of modularity
of the concept: besides the dual-modulus prescaler, the architecture requires two additional counters for the generation of a
given division ratio. The programmable counterswhich are, in
fact, fully programmable dividers, albeit not operating at the full
RF frequencyrepresent a substantial load at the output of the
dual-modulus prescaler, so that power dissipation is increased.
Besides, the additional design and layout effort required for
the programmable counters increase the time-to-market of new
products. These properties led us to conclude that the dual-modulus-based architecture is not an interesting option for the realization of building blocks with high reusability, high flexibility,
and short design time.

00189200/00$10.00 2000 IEEE

1040

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

(a)

(b)
Fig. 2.

Programmable prescaler. (a) Basic architecture. (b) With extended division range.

B. Programmable Prescaler Architectures


The basic programmable prescaler architecture is depicted
in Fig. 2(a). The modular structure consists of a chain of 2/3 divider cells connected like a ripple counter [11]. The structure
of Fig. 2(a) is characterized by the absence of long delay loops,
as feedback lines are only present between adjacent cells. This
local feedback enables simple optimization of power dissipation. Another advantage is that the topology of the different cells
in the prescaler is the same, therefore facilitating layout work.
The architecture of Fig. 2(a) resembles the one presented in [12],
which is also based on 2/3 divider cells. Yet there are two fundamental differences. First, in [12] all cells operate at the same
(high) current level. Second, the architecture of [12] relies on
a common strobe signal shared by all cells. This leads to high
power dissipation, because of high requirements on the slope of
the strobe signal, in combination with the high load presented
by all cells in parallel.
The programmable prescaler operates as follows. Once in a
division period, the last cell on the chain generates the signal
This signal then propagates up the chain, being reclocked by each cell along the way. An active mod signal enables a cell to divide by 3 (once in a division cycle), provided
that its programming input is set to 1. Division by 3 adds one
extra period of each cells input signal to the period of the output
signal. Hence, a chain of 2/3 cells provides an output signal
with a period of

1) can be realized. The division range is thus rather limited,


amounting to roughly a factor two between maximum and
minimum division ratios.1 The division range can be extended
by combining the prescaler with a set-reset counter [13]. In that
case, however, the resulting architecture is no longer modular.
The divider implementation presented in Fig. 2(b) extends
the division range of the basic prescaler, whilst maintaining the
modularity of the basic architecture [14]. The operation of the
new architecture is based on the direct relation between the performed division ratio and the bus programmed division word
Let us introduce the concept of effective
length of the chain. It is the number of divider cells that are
effectively influencing the division cycle. Deliberately setting
the mod input of a certain 2/3 cell to the active level overrules
the influence of all cells to the right of that cell. The divider
chain behaves as if it has been shortened. The required effective
length corresponds to the index of the most significative (and
active) bit of the programmed division word. Only a few extra
to the programmed division
OR gates are required to adapt
word, as depicted on the right side of Fig. 2.
With the additional logic the division range becomes:
;
minimum division ratio:
.
maximum division ratio:
We see that the minimum and maximum division ratios can be
and respectively. Subseset independently, by choice of
quent changes in an optimized design can be realized with low
risk. A somewhat similar technique, applied to an asynchronous
programmable counter, is described in [9].
III. TRULY MODULAR PROGRAMMABLE DIVIDERS FAMILY

(1)
is the period of the input signal
, and
are the binary programming values of the cells 1
to respectively. The equation shows that all integer division
(if all
= 0) to
(if all
=
ratios ranging from
In (1),

A. Realized Circuits
The modular architecture of Fig. 2(b) was applied in the realization of a family of fully programmable frequency dividers.
1In principle, it is also possible to divide by 3 , but the gap between this
value and the continuous division range makes it useless in standard synthesizer
applications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

1041

Fig. 3. Family of truly modular programmable dividers, and corresponding division range of the different implementations.

Fig. 4.

Functional blocks and logical implementation of a 2/3 divider cell.

Three circuits were implemented: an 18-bit -band divider, a


17-bit UHF divider, and a 12-bit reference divider. The architecture and the division range of the dividers is presented in Fig. 3.
The -band divider was used as the basis for the UHF and for
the reference divider. The UHF divider consists of the same circuitry as the -band divider, except for the first 2/3 cell, which
was removed. The reference divider is simply the -band divider stripped off its six high frequency cells.
B. Logic Implementation of the 2/3 Divider Cells
A 2/3 divider cell comprises two functional blocks, as depicted in Fig. 4. The prescaler logic block divides, upon control
input signal
by the end-of-cycle logic, the frequency of the
either by 2 or by 3, and outputs the divided clock signal to the
next cell in the chain. The end-of-cycle logic determines the momentaneous division ratio of the cell, based on the state of the
and signals. The
signal becomes active once
in a division cycle. At that moment, the state of the input is
checked, and if = 1, the end-of-cycle logic forces the prescaler

Fig. 5. SCL implementation of an AND gate combined with a latch function.

to swallow one extra period of the input signal. In other words,


the cell divides by 3. If = 0, the cell stays in division by 2
mode. Regardless of the state of the input, the end-of-cycle
signal, and outputs it to the preceding
logic reclocks the
signal).
cell in the chain
C. Circuit Implementation of the 2/3 Divider Cells
The use of standard rail-to-rail CMOS logic techniques
makes the integration of digital functions with sensitive RF
signal processing blocks difficult, due to the generation of large
supply and substrate disturbances during logic transitions.
Source coupled logic (SCL), often referred to as MOS current
mode logic (MCML), has better EMC properties, because of
the constant supply current and differential voltage switching
operation [8]. Besides, SCL has lower power dissipation than
rail-to-rail logic, for (very) high input frequencies [15].
The logic functions of the 2/3 cells are implemented with the
SCL structure presented in Fig. 5. The logic tree combines an
AND gate with a latch function. Three AND_latch circuits are

1042

Fig. 6.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

Transient simulation of optimized L-band divider.

used to implement Dlatch1, Dlatch3, Dlatch4 and the AND gates


of the 2/3 cells (see Fig. 4). Therefore, six logic functions are
achieved, at the expense of three tail currents only. Dlatch2 is
implemented as a normal D latch (without the differential pair
connected to the bbn inputs).
The nominal voltage swing is set to 500 mV in the high frequency (and high current) cells, and to 300 mV in the low curA). The voltage is generated by the tail
rent cells
and by the load resistances
current, set by the current source

TABLE I
SCALING OF CURRENTS IN THE 2/3 DIVIDER CELLS

D. Power Dissipation Optimization


The absence of long delay loops in the architecture of Fig. 2
enables fast and reliable optimization of power dissipation,
since simulation runs may be done for clusters of two cells
each time.
The critical point in the operation of the programmable
prescaler are the divide by 3 actions [11]. There is a maximum
delay between the mod and the clock signals in a given cell that
still allows properly timed division by 3. The maximum delay
where
is the period of the cells input
is
signal. The input frequency for each cell is scaled down by the
previous one. As a consequence, the maximum allowed delay
increases as one moves down the chain. As the delay in a cell
is inverse proportional to the cells current consumption (which
is a property of current mode logic circuits), the currents in the
cells may be scaled down as well.
The results of a transient simulation with the optimized high
frequency cells of the -band divider are presented in Fig. 6.
The influence of current consumption on the slope of the digital signals (and hence on the time delay) is clearly observed.
Table I presents the tail current and the resistance values of the
optimized divider cells. Layout optimization took about three
iteration cycles. Transient simulations, including extracted parasitics, showed that layout parasitics caused a decrease of about
30% in the highest operation frequency, when compared to the
original simulations.

circuitry. High input sensitivity enables the divider to be directly


coupled to a wide range of VCOs, without the need for external
(discrete) buffers. In addition, the input amplifier performs other
important functions, which are listed below.
It provides reverse isolation, to prevent the divider activity
from kicking-back and disturbing the VCO.
It provides single-ended to differential conversion of the
(very often) single-ended VCO signal.
It enables the VCO to be ac coupled to the divider function, and provides a signal to the first divider cell with the
proper dc level.
The required amplification of the UHF amplifier, set by sensitivity requirements ( 20 dBm), has been split into two differential stages. Each differential pair operates with 50- A nominal
current, and has load resistances of 14 k The -band input
amplifier is a scaled version of the UHF input amplifier. The tail
currents were doubled, and the drain resistances were halved.
The nominal low frequency small signal gain of the UHF amplifier is 26 dB; the gain of the -band amplifier is 23 dB. Negative
feedback from the output node to the input was implemented
with 50 k resistances. The feedback provides dc biasing to the
first stage, and allows AC coupling of the VCO signal to the first
differential pair.

E. Input Amplifiers
The input amplifier provides the required amplification of the
voltage-controlled oscillator (VCO) signal to digital levels,
determined by the sensitivity specifications and by the divider

IV. MEASUREMENTS
The control currents for the UHF and -band dividers can be
set externally, through input pins. The input amplifier current is

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

Fig. 7.

1043

Sensitivity of the UHF divider, for different divider current settings. Division ratio = 511, nominal current is I

= 10 A.

Fig. 8. Sensitivity curves of the L-band divider, for a few divider and amplifier current settings. Division ratio = 1023.

Fig. 9. Maximum operation frequency of the UHF and L-band dividers, as


function of divider current consumption (excluding input amplifiers).

controlled by the input current


and the 2/3 divider cell curThe curves presented in this section
rents by input current
were obtained with the nominal supply voltage of 2.2 V, except
where otherwise noted.
A. Input Sensitivity and Maximum Operation Frequency
Fig. 7 presents sensitivity curves of UHF divider, for different
The nominal value of
current settings of the control current
is 10 A. Fig. 8 shows sensitivity curves of the -band divider, for different current settings in the divider and input amplifier. Such as the UHF divider, the -band divider is highly
sensitive over a large frequency range. The circuits were driven

Fig. 10. Minimum input level for correct division of the frequency dividers,
as function of the input amplifiers consumption.

by a differential input signal, carried over printed circuit board


The
(PCB) strip-lines with a characteristic impedance of 50
strip-lines were terminated with discrete resistances of 50 to
ground, which were set close to the input leads of the input amplifiers. The maximum operation frequencies of the UHF and
-band dividers, as function of the current consumption, are
plotted in Fig. 9. The effect of the input amplifiers current consumption on the input sensitivities is displayed Fig. 10. Setting
the UHF amplifier current to 230 A yields a sensitivity value
in excess of 10 mVrms.
The influence of the supply voltage on the maximum operdecreased
ating frequency was found to be small ( 5% for
from 2.2 V down to 1.8 V). It is interesting to mention that

1044

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

Fig. 11. Phase noise of the reference and UHF divider, measured at 10 MHz. - - - Reference divider (I
Reference divider (I = 10 , F = 20 MHz, 20 dBm).

Fig. 12.

= 10 , F

= 20 MHz,

010 dBm).

Comparison of power efficiency (GHz/mW).

MCML circuits have been demonstrated to operate with supply


voltages as low as 1.2 V [15], without significant loss of speed.
B. Phase Noise
The phase noise of the UHF and reference dividers was
measured with a dedicated phase noise measurement system.
We used coherent demodulation techniques (phase-locked
loop configuration), and employed a low-noise 10 MHz signal
source during the evaluation of the circuits. To facilitate the
output
measurements, we implemented signal taps on the
of certain cells on the divider chain. The UHF divider was
output of its sixth cell; the
provided with a tap on the
reference divider had the output of the first cell tapped.
Fig. 11 presents the phase noise of the UHF divider, with
nominal settings for the supply current. The straight lines represent measured phase noise of the reference divider. We see a
dependency of the noise floor of the reference divider on the

level of the 20-MHz input signal. For the UHF divider, however, no dependency of the noise floor on the level of the input
signal at 640 MHz was observed. An increase of 25% in current
led to a change in noise floor from 122 dBc/Hz (nominal bias,
= 10 A) to 124 dBc/Hz (with
= 12.5 A). The noise
floor of the reference divider went from 127.5 dBc/Hz down
to 130 dBc/Hz, with increased bias. Fig. 11 shows that the
high frequency cells of the UHF divider (see Fig. 3) contribute
region.
significantly to the phase noise, specially in the
noise of about 15 dB is observed, compared
An increase of
to the noise of the single reference dividers cell.
C. Power Efficiency
Fig. 12 presents the power efficiency of the UHF and -band
dividers, in comparison to recently published data on low-power
dividers and tuning systems. Power efficiency is defined here as
the ratio of the dividers maximum operation frequency to its

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 7, JULY 2000

power dissipation, with dimensions of GHz/mW. The authors


have found that (most of) the dividers presented in the literature
do not include an input amplifier. Therefore, only the current
consumption of the core divider circuits is taken in the calculations. This leads to a fair comparison of the available data.
Refs. [2] and [8] describe prescalers implemented in bulk
CMOS technology. Reference [16] proposes a new synthesizer architecture, where the divider is powered-down after
lock has been achieved. Ref. [14] describes a fully programmable divider implemented in an ultrathin-film 0.25- m
CMOS/SIMOX process. The CMOS/SIMOX divider power
efficiency is about 30% higher than the -band dividers. Our
divider, however, is implemented in a standard 0.35- m bulk
technology. The power efficiency of a bipolar dual-modulus
prescaler [6] included as well, for technology benchmarking.
Its power efficiency is similar to the power efficiency of
the CMOS/SIMOX divider. The fully programmable dividers
described here demonstrate that architectural choices and optimization procedures can take standard 0.35 m CMOS to
performance levels comparable to more expensive technologies, such as bipolar and CMOS/SIMOX processes.
V. CONCLUSION
This paper presented a truly modular and power-scalable
architecture for low-power fully programmable frequency
dividers. The flexibility and reusability properties of the
architecture were demonstrated with the realization of a
family of programmable divider circuits, consisting of the
UHF divider (17 bits), the -band divider (18 bits), and the
reference divider (12 bits). The UHF and reference divider
were implemented by simple removal of divider cells from
the -band circuitry. The implementation of the 2/3 divider
cells was presented, and the power dissipation optimization
procedure was described. To cope with EMC considerations,
the dividers were implemented in CMOS SCL (current mode
logic). The circuits were processed in a standard 0.35- m
bulk CMOS technology, and operate with a nominal supply
voltage of 2.2 V. The power efficiency of the UHF divider
is 0.77 GHz/mW, and of the -band divider, 0.57 GHz/mW.
The measured input sensitivity, including the input amplifiers,
is 10 mVrms for the UHF divider, and 20 mVrms for the
-band divider.

1045

ACKNOWLEDGMENT
The authors wish to thank G. van Veenendaal, of PS-Systems Laboratory, Eindhoven, The Netherlands, for evaluation
work done on the programmable dividers. Many thanks go to
D. Kasperkovitz and J. de Haas for the support provided during
the project.
REFERENCES
[1] S. Wu and B. Razavi, A 900 MHz/1.8 GHz CMOS receiver for
dual-band applications, IEEE J. Solid-State Circuits, vol. 33, pp.
21782185, Dec. 1998.
[2] J. Craninckx and M. Steyaert, A fully integrated CMOS DCS-1800
frequency synthesizer, IEEE J. Solid-State Circuits, vol. 33, pp.
20542065, Dec. 1998.
[3] Q. Huang et al., GSM transceiver front-end circuits in 0.25-m
CMOS, IEEE J. Solid-State Circuits, vol. 34, pp. 292303, Mar. 1999.
[4] Y. Kado et al., An ultralow power CMOS/SIMOX programmable
counter LSI, IEEE J. Solid-State Circuits, vol. 32, pp. 15821587,
Oct. 1997.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New
York, NY: Wiley, 1997.
[6] T. Seneff et al., A sub-1 mA 1.6 GHz silicon bipolar dual modulus
prescaler, IEEE J. Solid-State Circuits, vol. 29, pp. 12061211, Oct.
1994.
[7] J. Craninckx and M. Steyaert, A 1.75 GHz/3 V dual-modulus divide-by-128/129 prescalar in 0.7 m CMOS, IEEE J. Solid-State
Circuits, vol. 31, pp. 890897, July 1996.
[8] F. Piazza and Q. Huang, A low power CMOS dual modulus prescaler
for frequency synthesizers, IEICE Trans. Electron., vol. E80-C, pp.
314319, Feb. 1997.
[9] P. Larsson, High-speed architecture for a programmable frequency divider and a dual-modulus prescaler, IEEE J. Solid-State Circuits, vol.
31, pp. 744748, May 1996.
[10] J. Navarro Soares, Jr. and W. A. M. Van Noije, A 1.6 GHz dual-modulus prescaler using the extended true-single-phase-clock CMOS circuit
technique (E-TSPC), IEEE J. Solid-State Circuits, vol. 34, pp. 97102,
Jan. 1999.
[11] C. S. Vaucher and D. Kasperkovitz, A wide-band tuning system for
fully integrated satellite receivers, IEEE J. Solid-State Circuits, vol. 33,
pp. 987998, July 1998.
[12] N. H. Sheng et al., A high-speed multimodulus HBT prescaler for frequency synthesizer applications, IEEE J. Solid-State Circuits, vol. 26,
pp. 13621367, Oct. 1991.
[13] C. S. Vaucher, An adaptive PLL tuning system architecture combining
high spectral purity and fast settling time, IEEE J. Solid-State Circuits,
vol. 35, pp. 490502, Apr. 2000.
[14] C. S. Vaucher and Z. Wang, A low-power truly modular 1.8 GHz programmable divider in standard CMOS technology, in Proc. 25th Eur.
Solid-State Circuits Conf., Sept. 1999, pp. 406409.
[15] M. Mizuno et al., A GHz MOS adaptive pipeline technique using MOS
current-mode logic, IEEE J. Solid-State Circuits, pp. 784791, June
1996.
[16] A. R. Shahani et al., Low-power dividerless frequency synthesis using
aperture phase detection, IEEE J. Solid-State Circuits, vol. 33, pp.
22322239, Dec. 1998.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

761

A 10-Gb/s CMOS Clock and Data Recovery Circuit


with a Half-Rate Linear Phase Detector
Jafar Savoj, Student Member, IEEE, and Behzad Razavi, Member, IEEE

AbstractA 10-Gb/s phase-locked clock and data recovery


circuit incorporates an interpolating voltage-controlled oscillator
and a half-rate phase detector. The phase detector provides a
linear characteristic while retiming and demultiplexing the data
with no systematic phase offset. Fabricated in a 0.18- m CMOS
technology in an area of 1 1
0 9 mm2 , the circuit exhibits an
RMS jitter of 1 ps, a peak-to-peak jitter of 14.5 ps in the recovered
clock, and a bit-error rate of 1 28
10 6 , with random data
input of length 223
1. The power dissipation is 72 mW from a
2.5-V supply.
Index TermsClock recovery, half-rate CDR, optical communication, oscillators, phase detectors, PLLs.

I. INTRODUCTION

ITH THE exponential growth of the number of Internet


nodes, the volume of the data transported by its backbone continues to rise rapidly. The load of the global Internet
backbone is expected to be as high as 11 Tb/s by the year 2005,
indicating that the required bandwidth must increase by a factor
of 50 to 100 every seven years.
Among the available transmission media, optical fibers have
the highest bandwidth with the lowest loss, serving as an attractive solution for the Internet backbone. However, the electronic
interface proves to be the bottleneck in designing high-speed
optical systems. In order to push the speed of operation beyond the capabilities of the fabrication processes, a number of
transceivers can be fabricated on the same chip. The input and
output signals can be carried either over a bundle of fibers,
or on a single fiber that uses wave-division multiplexing. In
this scenario, both the power dissipation and the complexity
of each transceiver become critical. While stand-alone building
blocks of optical transceivers have been built in GaAs and silicon bipolar technologies [1], [2], full integration of many transceivers makes it desirable to use CMOS technology.
This paper describes the design of the first 10-Gb/s CMOS
clock and data recovery (CDR) circuit. A linear phase detector
(PD) is introduced that compares the phase of the incoming data
with that of a half-rate clock. The CDR circuit also incorporates a three-stage interpolating ring oscillator to achieve a wide
tuning range. Fabricated in a 0.18- m CMOS technology, the
circuit achieves an RMS jitter of 1 ps with a pseudorandom sewhile dissipating 72 mW from a 2.5-V supply.
quence of
Manuscript received August 21, 2000; revised December 1, 2000. This work
was supported in part by the Semiconductor Research Corporation and in part
by Cypress Semiconductor.
The authors are with the Electrical Engineering Department, University of
California, Los Angeles, CA 90095-1594 USA (e-mail: razavi@icsl.ucla.edu).
Publisher Item Identifier S 0018-9200(01)03020-7.

The next section of the paper presents the CDR architecture and design issues. Section III deals with the design of the
building blocks. Section IV describes the experimental results.

II. ARCHITECTURE
The choice of the CDR architecture is primarily determined
by the speed and supply voltage limitations of the technology
as well as the power dissipation and jitter requirements of the
system.
In a generic CDR circuit, shown in Fig. 1, the phase detector compares the phase of the incoming data to the phase of
the clock generated by the voltage-controlled oscillator (VCO),
producing an error that is proportional to the phase difference
between its two inputs. The error is then applied to a charge
pump and a low-pass filter so as to generate the oscillator control
voltage. The clock signal also drives a decision circuit, thereby
retiming the data and reducing its jitter.
If attempted in a 0.18- m CMOS technology, the architecture of Fig. 1 poses severe difficulties for 10-Gb/s operation.
Although exploiting aggressive device scaling, the CMOS
process used in this work provides marginal performance
for such speeds. For example, even simple digital latches or
three-stage ring oscillators fail to operate reliably at these
rates. These issues make it desirable to employ a half-rate
CDR architecture, where the VCO runs at a frequency equal to
half of the input data rate. The concept of the half-rate clock
has been used in [2][5]. However, [2] and [3] incorporate
a bangbang phase detector, possibly creating a large ripple
on the control line of the oscillator and hence high jitter. The
circuit reported in [4] inherently has a smaller output jitter as a
result of using a linear phase detector, but it fails to operate at
speeds above 6 Gb/s in 0.18- m CMOS technology. The circuit
of [5] benefits from a new linear phase detection scheme, but it
may not operate properly with certain data patterns.
Another critical issue in the architecture of Fig. 1 relates to the
inherently unequal propagation delays for the two inputs of the
phase detector: most phase detectors that operate properly with
random data (e.g., a D flip-flop) are asymmetric with respect to
the data and clock inputs, thereby introducing a systematic skew
between the two in phase-lock condition. Since it is difficult
to replicate this skew in the decision circuit, the generic CDR
architecture suffers from a limited phase margin, unless the raw
speed of the technology is much higher than the data rate.
The problem of the skew demands that phase detection and
data regeneration occur in the same circuit such that the clock
still samples the data at the midpoint of each bit even in the

00189200/01$10.00 2001 IEEE

762

Fig. 1.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Generic CDR architecture.

Fig. 3. Effect of clock duty cycle distortion.

Another important aspect of CDR design is the leakage of


data transitions to the oscillator. In Fig. 2, such leakage arises
to
in the PD; 2)
from: 1) capacitive feedthrough from
and
to
through
capacitive feedthrough from
to the oscillator
the multiplexer; and 3) coupling of
through the substrate. To minimize these effects, the VCO is
followed by an isolation buffer and all of the building blocks
incorporate fully differential topologies.

Fig. 2.

III. BUILDING BLOCKS

Half-rate CDR architecture.

A. VCO
presence of a finite skew. For example, the Hogge PD [6] automatically sets the clock phase to the optimum point in the data
eye (but it fails to operate properly with a half-rate clock).
The above considerations lead to the CDR architecture shown
in Fig. 2. Here, a half-rate PD produces an error proportional to
the phase difference between the 10-Gb/s data stream and the
5-GHz output of the VCO. Furthermore, the PD automatically
retimes and demultiplexes the data, generating two 5-Gb/s
and
. Although the focus of this work
sequences
is point-to-point communications, a full-rate retimed output,
, is also generated to produce flexibility in testing and
exercise the ultimate speed of the technology. The VCO has
both fine and coarse control lines, the latter allowing inclusion
of a frequency-locked loop in future implementations.
In this work, a new approach to performing linear phase detection using a half-rate clock is described. Owing to its simplicity, this technique achieves both high speed and low power
dissipation while minimizing the ripple on the oscillator control
voltage.
It is interesting to note that half-rate architectures do suffer
from one drawback: the deviation of the clock duty cycle from
50% translates to bimodal jitter. As depicted in Fig. 3, since both
clock edges sample the data waveform, the clock duty cycle distortion pushes both edges away from the midpoint of the bits.
Typical duty cycle correction techniques used at lower speeds
are difficult to apply here as they suffer from significant dynamic mismatches themselves. Thus, special attention is paid
to symmetry in the layout to minimize bimodal jitter.

The design of the VCO directly impacts the jitter performance


and the reproducibility of the CDR circuit. While LC topologies achieve a potentially lower jitter, their limited tuning range
makes it difficult to obtain a target frequency without design
and fabrication iterations. Since the circuit reported here was
our first design in 0.18- m technology, a ring oscillator was
chosen so as to provide a tuning range wide enough to encompass process and temperature variations.
A three-stage differential ring oscillator [Fig. 4(a)] driving a
buffer operates no faster than 7 GHz in 0.18- m CMOS technology. The half-rate CDR architecture overcomes this limitation, requiring a frequency of only 5 GHz.
As shown in Fig. 4(b), each stage consists of a fast and a slow
path whose outputs are summed together. By steering the current between the fast and the slow paths, the amount of delay
achieved through each stage and hence the VCO frequency can
be adjusted. All three stages in the ring are loaded by identical
buffers to achieve equal rise and fall times and thus improve
the jitter performance. Fig. 4(c) shows the transistor implementation of each delay stage. The fast and slow paths are formed
as differential circuits sharing their output nodes. The tuning
is achieved by reducing the tail current of one and increasing
that of the other differentially. Since the low supply voltage

and
makes it difficult to stack differential pairs under
, the current variation is performed through mirror arrangements driven by pMOS differential pairs. Fig. 5 depicts
the small-signal gain and phase response of each delay stage.
While providing a phase shift of 60 , each stage achieves a gain

SAVOJ AND RAZAVI: CLOCK AND DATA RECOVERY CIRCUIT

763

Fig. 5. Small-signal gain and phase response of each delay stage.

Fig. 4. (a) Three-stage ring oscillator. (b) Implementation of each stage. (c)
Transistor-level schematic.

of 5.5 dB at 5 GHz, yielding robust oscillation at the target frequency.


A critical drawback of supply scaling in deep-submicron
technologies is the inevitable increase in the VCO gain for
a given tuning range. To alleviate this difficulty, the control
of the VCO is split between a coarse input and a fine input.
The partitioning of the control allows more than one order of
magnitude reduction in the VCO sensitivity. The idea is that the
fine control is established by the phase detector and the coarse
control is a provision for adding a frequency detection loop.
The coarse control is provided externally in this prototype.
The fine control exhibits a gain of 150 MHz/V and the coarse
control, 2.5 GHz/V (Fig. 6). The tuning range is 2.7 GHz
( 54%).
B. Phase Detector
Phase detectors generally appear in two different forms. Nonlinear PDs coarsely quantize the phase error, producing only a
positive or negative value at their output. Linear PDs, on the
other hand, generate a linearly proportional output that drops to
zero when the loop is locked.
Compared to nonlinear PDs, linear PDs result in less charge
pump activity, smaller ripple on the oscillator control line, and
hence lower jitter. In a linear PD, such as that described in [6],

Fig. 6. VCO gain partitioning. (a) Fine control. (b) Coarse control.

the phase error is obtained by taking the difference between the


width of two pulses, both of which are generated whenever a
data transition occurs. The width of one of the pulses is linearly
proportional to the phase difference between the clock and the
data, whereas the width of the other is constant. By using a differential error signal, pattern dependency of phase error is can-

764

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Fig. 8.

Fig. 7.

(a) Phase detector. (b) Operation of the circuit.

celled because both pulses are present only when a data transition occurs.
For linear phase comparison between data and a half-rate
clock, each transition of the data must produce an error pulse
whose width is equal to the phase difference. Furthermore, to
avoid a dead zone in the characteristics, a reference pulse
must be generated whose area is subtracted from that of the error
pulse, thus creating a net value that falls to zero in lock.
The above observations lead to the PD topology shown in
Fig. 7(a). The circuit consists of four latches and two XOR gates.
The data is applied to the inputs of two sets of cascaded latches,
each cascade constituting a flipflop that retimes the data. Since
the flipflops are driven by a half-rate clock, the two output seand
are the demultiplexed waveforms of
quences
the original input sequence if the clock samples the data in the
middle of the bit period.
The operation of the PD can be described using the waveforms depicted in Fig. 7(b). The basic unit employed in the circuit is a latch whose output carries information about the zero
crossings of both the data and the clock. The output of each
latch tracks its input for half a clock period and holds the value
for the other half, yielding the waveforms shown in Fig. 7(b) for
and
. The two waveforms differ because their corpoints
responding latches operate on opposite clock edges. Produced
, the Error signal is equal to ZERO for the portion of
as
and
overlap, and equal to the
time that identical bits of
XOR of two consecutive bits for the rest. In other words, Error
is equal to ONE only if a data transition has occurred.
It may seem that the Error signal uniquely represents the
phase difference, but that would be true only if the data were pe-

Symmetric XOR gate.

riodic. The random nature of the data and the periodic behavior
of the clock in fact make the average value of Error pattern dependent. For this reason, a reference signal must also be generated whose average conveys this dependence. The two waveand
contain the samples of the data at the rising
forms
contains pulses as
and falling edges of the clock. Thus,
wide as half the clock period for every data transition, serving
as the reference signal.
While the two XOR operations provide both the Error and the
Reference pulses for every data transition, the pulses in Error
are only half as wide as those in Reference. This means that
the amplitude of Error must be scaled up by a factor of two
with respect to Reference so that the difference between their
averages drops to zero when clock transitions are in the middle
of the data eye. The phase error with respect to this point is then
linearly proportional to the difference between the two averages.
In order to generate a full-rate output, the demultiplexed sequences are combined by a multiplexer that operates on the
half-rate clock as well. This output can also be used for testing
purposes in order to obtain the overall bit-error rate (BER) of
the receiver.
It is important to note that the XOR gates in Fig. 7 must be
symmetric with respect to their two differential inputs. Otherwise, differences in propagation delays result in systematic
phase offsets. Each of the XOR gates is implemented as shown
in Fig. 8 [7]. The circuit avoids stacking stages while providing
perfect symmetry between the two inputs. The output is singleended but the single-ended Error and Reference signals produced by the two XOR gates in the phase detector are sensed with
respect to each other, thus acting as a differential drive for the
charge pump. The operation of the XOR circuit is as follows. If
the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right
off. If the two inputs are identical,
turn on, thus turning
. Since the average
one of the tail currents flows through
current produced by the Error XOR gate is half of that generated
is scaled differently,
by the Reference XOR gate, transistor
making the average output voltages equal for zero phase differreduces the
ence. Channel length modulation of transistor
precision of current scaling between the two XOR gates. This effect can be avoided by increasing the length of the device.
The gain of the PD is determined by the value of the resistor
and the tail current sources ( ). The voltage
is generated on chip in order to track the variations over temperature and

SAVOJ AND RAZAVI: CLOCK AND DATA RECOVERY CIRCUIT

765

Fig. 10.

Fig. 9.

Determination of PD gain.

process. This voltage equals the output common-mode level of


the latches preceding the XOR gate. It is generated using a differential pair that is a replica of the preamplifier section of the
raises the common-mode level of the
latch. Current source
differential signal formed by the Error and Reference signals,
compatible with the input of the charge pump.
making
It is instructive to plot the input/output characteristic of the
PD to ensure linearity and absence of dead zone. This is accomplished by obtaining the average values of Error and Reference
while the circuit operates at maximum speed. Fig. 9 shows the
simulated behavior as the phase difference varies from zero to
one bit period. The Reference average exhibits a notch where
the clock samples the metastable points of the data waveform.
The Error and Reference signals cross at a phase difference approximately 55 ps from the metastable point, indicating that the
systematic offset between the data and the clock is very small.
The linear characteristic of the phase detector results in minimal
charge pump activity and small ripple on the control line in the
locked condition.
The choice of the logic family used for the XOR gates and
the latches is determined by the speed and switching noise considerations. While rail-to-rail CMOS logic achieves relatively
high speeds, it requires amplifying the data swings generated by
the stage preceding the CDR circuit (typically a limiting amplifier). Furthermore, CMOS logic produces enormous switching
noise in the substrate and on the supplies, disturbing the oscillator considerably. For these reasons, the building blocks employ current-steering logic. The phase detector incorporates an
input buffer with on-chip resistive matching.
C. Charge Pump and Loop Filter
Fig. 10 shows the implementation of the differential charge
pump. The common-mode feedback (CMFB) circuit senses the
and
, providing correction through
output CM level by
and
. Both the matching and channel-length modulation

in Fig. 10 impact the residual phase error in locked


of

Charge pump and loop filter.

condition. Thus, their lengths and widths are relatively large to


minimize these effects.
The design of the loop filter is based on a linear time-invariant
model of the loop and is performed in continuous time domain.
The loop is in general a nonlinear time-variant system and can
only be assumed linear if the phase error is small. The timeinvariant analysis is valid if the averaging behavior of the loop
rather than its single-cycle performance is of interest, i.e., the
loop can be analyzed by continuous-time approximation if the
loop bandwidth is small. Under this condition, the state of the
CDR changes by only a small amount on each cycle of the input
signal.
A low-pass jitter transfer function with a given bandwidth
and a maximum gain in the passband is specified for a SONET
system. The closed-loop transfer function of the CDR has a zero
at a frequency lower than the first closed-loop pole. This results
in jitter peaking that can never be eliminated. But the peaking
can be reduced to negligible levels by overdamping the loop.
As derived in [8], the closed-loop unity-gain bandwidth is
approximated as
(1)
and
are the gains of the VCO and PD, rewhere
denotes the conversion gain of the charge
spectively, and
.
pump. Equation (1) can be used to determine the value of
The amount of the jitter peaking in the closed-loop transfer function can be approximated as
(2)
Equation (2) yields the required value of . In order to obtain
greater suppression of high-frequency jitter, a second capacitor
and .
is added in parallel with the series combination of
These components are added externally to achieve flexibility in
defining the closed-loop characteristics of the circuit.
Another advantage of linear PDs over their bangbang counterparts is that their jitter transfer characteristics is independent
of the jitter amplitude. It should also be mentioned that if the
CDR is followed by a demultiplexer, the tight specifications for
jitter peaking need not to be satisfied because such specifications are defined for cascaded regenerators handling full-rate
data.
Fig. 11 depicts the simulated behavior of the CDR circuit at
the transistor level. The voltage across the filter is initialized to

766

Fig. 11.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 5, MAY 2001

Lock acquisition.

Fig. 13. (a) Spectrum of the recovered clock. (b) Recovered clock in the time
domain.
Fig. 12.

Chip photograph.

a value relatively close to its value in phase lock. The loop goes
through a transition of 350 ns before it locks. The ripple on the
control line in phase lock is approximately 1 mV.
IV. EXPERIMENTAL RESULTS
The CDR circuit has been fabricated in a 0.18- m CMOS
process. Fig. 12 shows a photograph of the chip, which occumm . Electrostatic discharge (ESD)
pies an area of
protection diodes are included for all pads except the high-speed
lines. Nonetheless, since all of these lines have a 50- termina, they exhibit some tolerance to ESD. The circuit is
tion to
tested in a chip-on-board assembly. In this prototype, the width
of the poly resistors was not sufficient to guarantee the nominal
sheet resistance. As a result, the fabricated resistor values deviated from their nominal value by 30%, and the VCO center
frequency was proportionally lower than the simulated value at
the nominal supply voltage (1.8 V). The supply was increased to
2.5 V, to achieve reliable operation at 10 Gb/s. While such a high
supply voltage creates hot-carrier effects in rail-to-rail CMOS
circuits, it is less detrimental in this design because no transistor
in the circuit experiences a gatesource or drainsource voltage
of more than 1 V. This issue is nonetheless resolved in a second
design [9] by proper choice of resistor dimensions. The circuit
is brought close to lock with the aid of the VCO coarse control
before phase locking takes over.
Fig. 13(a) shows the spectrum of the clock in response to a
. The effect of the noise
10-Gb/s data sequence of length
shaping of the loop can be observed in this spectrum. The phase

Fig. 14. Measured jitter transfer characteristic.

noise at 1-MHz offset is approximately equal to 106 dBc/Hz.


Fig. 13(b) depicts the recovered clock in the time domain. The
time-domain measurements using an oscilloscope overestimate
the jitter, requiring specialized equipment, e.g., the Anritsu
MP1777 jitter analyzer. The jitter performance of the CDR
circuit is characterized by this analyzer. A random sequence
produces 14.5 ps of peak-to-peak and 1 ps
of length
of RMS jitter on the clock signal. These values are reduced to
4.4 and 0.6 ps, respectively, for a random sequence of length
.
The measured jitter transfer characteristics of the CDR is
shown in Fig. 14. The jitter peaking is 1.48 dB and the 3-dB
bandwidth is 15 MHz. The loop bandwidth can be reduced to

SAVOJ AND RAZAVI: CLOCK AND DATA RECOVERY CIRCUIT

767

supply. The VCO, the PD, and the clock and data buffers consume 20.7, 33.2, and 18.1 mW, respectively.
V. CONCLUSION
CMOS technology holds great promise for optical communication circuits. The raw speed resulting from aggressive scaling
along with high levels of integration provide a high performance
at low cost. A 10-Gb/s clock and data recovery circuit designed
in 0.18- m CMOS technology performs phase locking, data regeneration, and demultiplexing with 1 ps of RMS jitter.
REFERENCES

Fig. 15.

(a) Recovered demultiplexed data. (b) Recovered full-rate data.

the SONET specifications, but the jitter analyzer must then generate large jitter and drives the loop out of lock. The loop bandwidth can be reduced to the SONET specifications if a means of
frequency detection is added to the loop [9]. The circuit is then
much less susceptible to loss of lock due to the jitter generated
by the analyzer.
Fig. 15 depicts the retimed data. The demultiplexed data
outputs are shown in Fig. 15(a). The difference between the
waveforms results from systematic differences between the
bond wires and traces on the test board. Fig. 15(b) depicts the
full-rate output. Using this output, the BER of the system can
, the BER is
be measured. With a random sequence of
. However, a random sequence of
smaller that
results in a BER of
. This BER can be reduced if
the bandwidth of the output buffer driving the 10-Gb/s data is
increased. Furthermore, if the value of the linear resistors is
adjusted to their nominal value, the increased operating speed
of the back-end multiplexer results in an improved BER [9].
The CDR circuit exhibits a capture range of 6 MHz and a
tracking range of 177 MHz. The total power consumed by the
circuit excluding the output buffers is 72 mW from a 2.5-V

[1] Y. M. Greshishchev and P. Schvan, SiGe clock and data recovery IC


with linear type PLL for 10-Gb/s SONET application, in Proc. 1999
Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp.
169172.
[2] M. Wurzer et al., 40-Gb/s integrated clock and data recovery circuit in
a silicon bipolar technology, in Proc. 1998 Bipolar/BiCMOS Circuits
and Technology Meeting, Sept. 1998, pp. 136139.
[3] M. Rau et al., Clock/data recovery PLL using half-frequency clock,
IEEE J. Solid-State Circuits, vol. 32, pp. 11561159, July 1997.
[4] K. Nakamura et al., A 6 Gb/s CMOS phase detecting DEMUX module
using half-frequency clock, in Dig. Symp. VLSI Circuits, June 1998, pp.
196197.
[5] E. Mullner, A 20-Gb/s parallel phase detector and demultiplexer circuit
in a production silicon bipolar technology with f = 25 GHz, in Proc.
1996 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1996,
pp. 4345.
[6] C. Hogge, A self-correcting clock recovery circuit, J. Lightwave
Technol., vol. LT-3, pp. 13121314, Dec. 1985.
[7] B. Razavi, Y. Ota, and R. G. Swarz, Design techniques for low-voltage
high-speed digital bipolar circuits, IEEE J. Solid-State Circuits, vol. 29,
pp. 332339, Mar. 1994.
[8] L. M. De Vito, A versatile clock recovery architecture and monolithic
implementation, in Monolithic Phase-Locked Loops and Clock
Recovery Circuits, Theory and Design, B. Razavi, Ed. New York:
IEEE Press, 1996.
[9] J. Savoj and B. Razavi, A 10-Gb/s CMOS clock and data recovery circuit with frequency detection, in Int. Solid-State Circuits Conf. Dig.
Tech. Papers, Feb. 2001, pp. 7879.

Jafar Savoj (S98) was born in Tehran, Iran, in 1974.


He received the B.S.E.E. degree from Sharif University of Technology, Tehran, in 1996 and the M.S.E.E.
degree from the University of California, Los Angeles (UCLA), in 1998. He is currently working toward the Ph.D. degree at UCLA.
He spent the summer of 1998 with Integrated
Sensor Solutions, San Jose, CA, working on the
design of high-precision interfaces for sensor applications. During the summer of 1999, he was with
NewPort Communications, Irvine, CA, developing
CMOS clock and data recovery circuits for the SONET OC-192 standard.
Mr. Savoj received the IEEE Solid-State Circuits Society Predoctoral Fellowship for 20002001, and the Beatrice Winner Award for Editorial Excellence at
the 2001 ISSCC. He is also a recipient of the Design Contest Award of the 2001
Design Automation Conference.

ISSCC 2001 / SESSION 5 / GIGABIT OPTICAL COMMUNICATIONS I / 5.3

5.3 A 10Gb/s CMOS Clock and Data Recovery Circuit


with Frequency Detection
Jafar Savoj, Behzad Razavi
Electrical Engineering Department, University of California, Los Angeles, CA
Clock and data recovery (CDR) circuits operating in the 10Gb/s
range have become attractive for the optical fiber backbone of
the Internet. While CDR circuits operating at 10Gb/s have been
designed in bipolar technologies, cost and integration issues
make it desirable to implement these circuits in standard CMOS
processes. This 10Gb/s CDR circuit is realized in 0.18m CMOS
technology. Architecture and circuit techniques circumvent the
speed limitations of the devices. In contrast to previous work [1],
this design incorporates an LC oscillator to reduce the jitter as
well as a phase/frequency detector to achieve a wide capture
range.
Shown in Figure 5.3.1, the CDR consists of a phase/frequency
detector (PFD), a voltage-controlled oscillator (VCO), a charge
pump, and a low-pass filter (LPF). The PFD compares the phase
and frequency of the input data to that of a half-rate clock, providing two binary error signals for phase and frequency. The
PFD is designed so that, in addition to providing information
about the phase error, it retimes the data as well. Consequently,
the CDR exhibits no systematic offset, i.e., inherent skews
between clock and data edges due to their unidentical paths
through the loop do not degrade the quality of detection. The
VCO provides four differential half-quadrature phases over the
full tuning range. All building blocks are fully differential.
Since the half-rate frequency detector requires clock phases that
are integer multiples of 45, the 5GHz VCO is designed as a ring
structure consisting of four LC-tuned stages [Figure 5.3.2a]. If the
dc feedback around the ring is positive, all stages operate in-phase
at the resonance frequency defined by the LC tanks. On the other
hand, if the dc feedback is negative, the frequency shifts by a small
amount so as to allow each stage to contribute 45 of phase.
The oscillator topology has two advantages over resistive-load ring
oscillators. First, owing to the phase slope (Q) provided by the resonant loads, it exhibits less phase noise. Second, its frequency of
oscillation is only a weak function of the number of stages, generating multiple phases with no speed penalty. By comparison, a
four-stage resistive-load ring operates at a lower frequency.
Figure 5.3.2b shows the implementation of each stage. The loads
are formed using on-chip spiral inductors and MOS varactors.
Resistor R1 provides a shift in the output common-mode level,
allowing both positive and negative voltages across the varactors
and thus maximizing the tuning range. Modeling each tank by a
parallel network, the required 45 phase shift slightly detunes
the circuit. The oscillation frequency is given by 0=(LC)-0.5(11/Q0)0.5, where Q0 denotes the Q of each stage at resonance.
The phase detector (PD) is derived from the data transition
tracking loop described in Reference 2. In this PD, in-phase and
quadrature phases of a half-rate clock signal sample the data in
two double-edge-triggered flipflops (DETFFs). Figure 5.3.3
shows the implementation of the PD. Two latches operating on
opposite clock phases and a multiplexer form a DETFF that samples the data using both the positive and negative transitions of
a half-rate clock. The two signals V1 and V2 are therefore the inphase and quadrature samples of data, respectively, and one is
used to route the other or its complement.

The phase detector operates at high speeds because it uses a


half-rate clock. Since in the locked condition, the rising and
falling edges of the quadrature clock coincide with data transitions, the in-phase clock transitions sample the data at its optimum point with no systematic offset, generating a full-rate output stream. Also, since the phase-error signal is reevaluated only
at data transitions, it incurs little ripple. Note that the output is
independent of the data transition density, resulting in reduction
of pattern-dependent jitter.
With the small CDR loop bandwidths specified by optical standards, circuits employing only phase detection suffer from an
extremely narrow capture range, e.g., about 1% of the center frequency. For this reason, a means of frequency detection is necessary to guarantee lock to random data. As with other phase
detectors, the half-rate PD of Figure 5.3.3 generates a beat frequency equal to the difference between the data rate and twice
the VCO frequency. However, it does not provide knowledge of
the polarity of this difference. Figure 5.3.4 depicts the half-rate
phase and frequency detector introduced in this work. A second
PD is added and driven by phases that are 45 away from those
in the first PD. The circuit operates as follows. (1) If the clock is
slow, VPD1 leads VPD2; therefore, if VPD2 is sampled by the rising
and falling edges of VPD1, the results are negative and positive,
respectively. (2) If the clock is fast, VPD1 lags VPD2. Therefore, if
VPD2 is sampled by the rising and falling edges of VPD1, the
results are the reverse of the previous case.
The output buffer delivering the 10Gb/s retimed data with high
current levels requires a bandwidth of more than 7 GHz. As
shown in Figure 5.3.5, the buffer stage employs inductive peaking [3]. The value of the spiral inductors is chosen so as to avoid
ripple in the passband. Since the quality factor of the inductors
is not critical here, the spiral structures have a linewidth of only
4m to achieve a high self-resonance frequency.
The CDR circuit is fabricated in a 0.18m CMOS technology. The
circuit is tested in a chip-on-board assembly while operating
with a 1.8V supply. The phase noise of the clock in response to a
9.95328Gb/s data sequence of length 223-1 at 1MHz offset is
approximately equal to -107dBc/Hz. Figure 5.3.6a depicts the
recovered clock and data. A pseudo-random sequence of length
223-1 produces 9.9ps of peak-to-peak and 0.8ps rms jitter on the
clock signal. The jitter characteristics are measured by the
Anritsu MP1777 jitter analyzer. The measured jitter transfer
characteristic of the CDR is shown in Figure 5.3.6b. The jitter
peaking is 0.04dB and the 3dB bandwidth is 5.2MHz. Despite
the small loop bandwidth, the frequency detector provides a capture range of 1.43GHz, obviating the need for external references. The total power consumed by the circuit excluding the
output buffers is 91mW from a 1.8V supply. Figure 5.3.7 shows a
micrograph of the chip, which occupies 1.75x1.55mm2.
Acknowledgments:
The authors thank NewPort Communications for fabrication and test support. This work was supported by SRC and Cypress Semiconductor.
References:
[1] J. Savoj and B. Razavi, A 10-Gb/s CMOS Clock and Data Recovery
Circuit, Dig. of Symposium on VLSI Circuits, pp. 136-139, June 2000.
[2] A. W. Buchwald, Design of Integrated Fiber-Optic Receivers Using
Heterojunction Bipolar Transistors, Ph.D. Thesis, University of California,
Los Angeles, Jan. 1993.
[3] J. Savoj and B. Razavi, A CMOS Interface Circuit for Detection of 1.2Gb/s RZ Data, ISSCC Digest of Technical Papers, pp. 278-279, Feb. 1999.

2001 IEEE International Solid-State Circuits Conference

0-7803-6608-5

2001 IEEE

ISSCC 2001 / February 5, 2001 / Salon 9 / 2:30 PM

Figure 5.3.1: CDR architecture.

Figure 5.3.2: (a) Four-stage LC-tuned ring oscillator, (b) implementation of


one stage.

Figure 5.3.3: Phase detector.

Figure 5.3.4: Phase and frequency detector.

Figure 5.3.5: Output buffer.

Figure 5.3.7: Die micrograph.

2001 IEEE International Solid-State Circuits Conference

0-7803-6608-5

2001 IEEE

Figure 5.3.6: (a) Recovered data and clock, (b) measured jitter transfer
characteristics.

2001 IEEE International Solid-State Circuits Conference

0-7803-6608-5

2001 IEEE

1320

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

A 40-Gb/s Integrated Clock and Data Recovery


Silicon Bipolar Technology
Circuit in a 50-GHz
Martin Wurzer, Josef Bock, Herbert Knapp, Wolfgang Zirwas, Fritz Schumann, and Alfred Felder

AbstractClock and data recovery (CDR) circuits are key electronic components in future optical broadband communication
systems. In this paper, we present a 40-Gb/s integrated CDR
circuit applying a phase-locked loop technique. The IC has been
fabricated in a 50-GHz f T self-aligned double-polysilicon bipolar
technology using only production-like process steps. The achieved
data rate is a record value for silicon and comparable with the
best results for this type of circuit realized in SiGe and IIIV
technologies.
Index Terms Bipolar digital integrated circuits, clocks, data
communication, high-speed integrated circuits, phase-locked
loops, synchronization.
Fig. 1. Block diagram of the fiber-optic link.

I. INTRODUCTION

HE demands for new services and increased flexibility


have accelerated the development of telecommunication
transport networks, which has resulted in the synchronous optical networks (SONET)/synchronous digital hierarchy (SDH)
standards. Key elements of such high-capacity networks are
fiber-optic communication links. Time-division multiplexing
(TDM) systems operating at 10 Gb/s are now under development using advanced silicon bipolar production technologies to fabricate all high-speed ICs. Next generations with
SONET/SDH are expected to operate at data rates of 40 Gb/s
[1]. To enable such large-capacity optical transmission systems
to be put into practical use, very high-speed monolithic ICs
are required as key components.
It has been shown that basic digital functions like MUX
and DMUX for 40-Gb/s optical-fiber TDM systems can be
realized in silicon bipolar technology [2]. But clock and data
recovery circuits in a silicon technology have so far only been
demonstrated for 20 Gb/s [3]. With more sophisticated SiGe
or IIIV technologies, 40-Gb/s operation has been achieved
[4][7]. Some of these solutions are hybrid.
All these realizations are based on either high-Q filters
or phase-locked loops (PLLs). The advantage of the first
concept is the easy implementation. The disadvantages are
that temperature and frequency variation of filter group delay
makes sampling time difficult to control, the high-Q filter is

difficult to integrate, and narrow pulses require a high . The


major advantages of the second approach are that the phase
between the extracted clock and the received data is locked and
that it can be implemented as a monolithic integrated circuit.
The goal of our work was to implement a cost-effective
and reliable clock and data recovery circuit for 40 Gb/s in
a production-near silicon bipolar technology. Therefore, an
approach based on a PLL technique has been selected and
will be described in this paper in more detail.

Manuscript received January 11, 1999; revised March 23, 1999.


M. Wurzer is with Corporate Technology, Microelectronics, Siemens
AG, Munich 81730 Germany and the Institut fur Nachrichtentechnik
und Hochfrequenztechnik, Technische Universitat Wien, Austria (e-mail:
Martin.Wurzer@mchp.siemens.de).
J. Bock, H. Knapp, F. Schumann, and A. Felder are with Corporate
Technology, Microelectronics, Siemens AG, Munich 81730 Germany
W. Zirwas is with Information and Communication Networks, Siemens
AG, Munich 81379 Germany.
Publisher Item Identifier S 0018-9200(99)06493-8.

III. ARCHITECTURE OF THE CLOCK


AND DATA RECOVERY CIRCUIT

II. FIBER-OPTIC LINK


The described circuit has been developed for use in 40Gb/s TDM fiber-optic links. A block diagram of such a link is
shown in Fig. 1. The time-division multiplexer collects several
data channels into a single high-speed data stream. An external
modulator converts the data from electrical to optical signals
by modulating the light of a semiconductor laser diode (E/O
block). The O/E conversion on the receiving side is performed
by a photodiode followed by a transimpedance amplifier. This
bitstream is fed into the clock and data recovery unit. Its task is
to synchronize the local oscillator to the phase of the incoming
data and to retime the data. In contrast to 10-Gb/s systems, the
decision function is now performed by a demultiplexer. This
requires a DMUX with excellent retiming capability combined
with a high input sensitivity [8].

Fig. 2(a) shows the used concept of the CDR for the fiberoptic link (Fig. 1) in more detail. The main processing blocks
are the demultiplexer consisting of two masterslave D-flipflops (DFF1, DFF2) in parallel and an additional masterslave
D-flip-flop (DFF3), which forms the phase detector together

00189200/99$10.00 1999 IEEE

WURZER et al.: 40-Gb/s INTEGRATED CLOCK AND DATA RECOVERY CIRCUIT

(a)

1321

(b)

Fig. 2. CDR circuit: (a) block diagram and (b) timing diagram.

Fig. 3. Circuit diagram of the masterslave D-flip-flop.

with DFF2 and the XOR gate. All these functions are integrated
in a single chip. The fixed 90 phase shifter, voltage-controlled
oscillator (VCO), and loop filter have been realized externally
with commercially available components.
Fig. 2(b) shows the timing diagram. The incoming 40-Gb/s
data signal is applied to flip-flops DFF1, DFF2, and DFF3.
DFF1 is toggled by CLK, DFF2 by CLK, and DFF3 by the
90 delayed clock signal. This results in the sampling of the
input in the vicinity of midbit and each following potential
transition. If a transition is present, the phase relationship of
the data and the clock can be deduced to be early or late. If
the midbit clock CLK is too early, DFF3 samples the same
bit; if it is too late, DFF3 samples the following bit. Under
locked conditions, DFF3 samples at the edge of the data eye.
The XOR compares the output samples of DFF2 and DFF3.
The result is fed to the loop filter. The output signal of the
loop filter serves as the control signal of the VCO.
The advantages of this concept are that all components
operate at half the data rate and that the input is demultiplexed
at the same time. The disadvantage is that the input signal has
to drive three DFFs in parallel.

IV. CIRCUIT

AND

DESIGN PRINCIPLES

The circuit is designed for the single supply voltage of 5


V. The circuit principles used are seen in the circuit blocks
of a masterslave D-flip-flop (MS-DFF), shown in Fig. 3.
For details, see [9]. The well-proven E CL (emitteremitter
coupled logic) is used with emitter followers at the inputs
and current switches at the outputs. The series gating between
clock and data signals enables differential operation with low
mV - ) resulting in an increase in
voltage swings (
speed and a reduction of power consumption. Furthermore,
differential operation reduces time jitter and crosstalk and
offers good common-mode suppression compared to singlemode operation [10]. Cascaded emitter followers are used
for level shifting and impedance transformation between the
various current switches. Multiple emitter followers improve
the decoupling capability and increase the collector-base voltage of the current-switch transistors allowing for smaller
transistors, resulting in lower collector-base capacitances [10].
On-chip matching resistors (50 ) at all data inputs are used
in order to reduce jitter introduced by reflections [2], [11].

1322

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

Fig. 5. Schematic cross section of a transistor.

Fig. 4. Chip micrograph (chip size: 0.9

2 0.9 mm2 ).

TABLE I
DEVICE PARAMETERS (JUNCTION CAPACITANCES ARE ZERO-BIAS VALUES)

Meeting the required speed rather than low power consumption was the main aim of this design. All transistor
sizes are individually optimized with respect to the function
of the transistor in the circuit. Extra attention was given to
the on-chip wiring. The lines on the chips were classified as
critical or uncritical. For example, the lines driven by
emitter followers are critical because they support ringing,
while the lines driven by current switches are uncritical [10].
The critical lines are then shortened at the cost of the uncritical
ones. The longer signal lines are realized as microstrip lines
(with the lowest metallization layer as a ground plane), mainly
to improve simulation accuracy. This leads to the layout shown
in Fig. 4.
V. CHIP TECHNOLOGY
The circuit has been fabricated in a self-aligned doublepolysilicon bipolar technology [12]. The fabrication starts
with buried layer formation. A 1- m epitaxial layer is grown
and low
to compromise between high-transit-frequency
. The isolation consists
external collector-base capacitance
of a channel stopper implantation combined with LOCOS field
oxide. The active base is formed by 5-keV BF implantation.
This low implantation energy in combination with optimized
annealing conditions allows for very steep base profiles. This
results in a narrow base width of about 50 nm and thus enables
a high transit frequency. A selectively implanted collector
improves the current-carrying capability of the transistors to
of about 2
the high optimum collector current density
mA/ m . To minimize narrow emitter effects, an in situ doped
emitter-polysilicon layer is used [13]. This prevents a reduction
of cutoff frequency even for 0.5- m design rules. A three-level
metallization completes the process.
Fig. 5 shows a schematic cross section of a transistor.
Except for epitaxy, only process steps of a 0.5- m CMOS
production environment are necessary. The maximum transit
GHz at
V
frequency of the transistors is
mA m . Table I summarizes typical parameters
and
for transistors with effective emitter size of
m . The minimum gate delay for an ECL differentially
operating ring oscillator with output voltage swing of 800
mV - is measured to be 15.4 ps. This value is achieved for
a current per gate of 1.6 mA (see Fig. 6).

Fig. 6. Measured ECL gate delay versus current per gate with an 800-mV
differential voltage swing.

VI. MOUNTING

AND

MEASURMENT SETUP

For measurements, the clock and data recovery IC has been


using
mounted on a 15-mil ceramic substrate
conventional bonding techniques. Special care has been taken
to minimize the length of the bond wires by positioning the
surface of the chip on the same level as the signal, ground,
and supply lines of the mounting substrate. Due to differential
operation, a pair of lines for each clock and data signal is
needed to connect the chip with the environment. Therefore,
a corresponding number of connectors are necessary. The
minimum distance between them determines the minimum
size of the test fixture. To avoid additional delay lines, the
,
and
,
,
,
length of the lines for the signals

WURZER et al.: 40-Gb/s INTEGRATED CLOCK AND DATA RECOVERY CIRCUIT

1323

Fig. 9. Eye diagram of the 20-Gb/s data signal at the output


demultiplexer.

Fig. 7. Photograph of the package (package size: 70

2 70 mm2 ).

D2 of the 1 : 2

(a)

(b)

Fig. 10. (a) Transmitter clock and (b) recovered clock.

Fig. 8. Eye diagram of the 40-Gb/s input data signal

Din .

respectively, have to be the same. To achieve a compact layout


of these lines, coupled microstrip lines are used. At the input
, grounded coplanar lines are applied, which show lower
dispersion than microstrip lines. The realized test fixture is
70 mm .
shown in Fig. 7. It measures 70
Random pulse pattern generators for driving the circuit at
the required data rate of 40 Gb/s are not commercially available. A pulse generator has been built from basic high-speed
ICs [2], [14]. Four 10-Gb/s pseudorandom bit sequences
1) have been multiplexed to a 40-Gb/s
(sequence length 2
nonreturn-to-zero signal.

Fig. 11. Jitter histogram of the recovered clock.

VII. EXPERIMENTAL RESULTS


The clock and data recovery IC operates at the single supply
voltage of 5 V and consumes 1.6 W. It should be mentioned
that no additional cooling was applied. Fig. 8 shows the 40Gb/s input signal to the CDR circuit. To demonstrate the
input sensitivity of the circuit, the eye opening is artificially
reduced. In Fig. 9, an eye diagram of the well regenerated and
demultiplexed data signal is shown. Fig. 10 shows in (a) the
20-GHz transmitter clock and in (b) the recovered clock. The
jitter histogram of the extracted clock in the time domain is
displayed in Fig. 11. The measured rms time jitter as observed
on the sampling oscilloscope is about 0.8 ps. The measured
signal spectra of the VCO are plotted in Fig. 12. The dashed
line represents the free-running VCO and the solid line the
VCO phase-locked to the 40-Gb/s data signal shown in Fig. 8.
The peak is about 35 dB above the floor caused by the statistics
of the data.

Fig. 12. VCO spectra.

VIII. CONCLUSION
An integrated clock and data recovery circuit operating up
silicon
to 40 Gb/s has been realized in a 0.5- m/50-GHz
bipolar technology using only production-like process steps.
This data rate is the highest reported value for this type of
circuit in a silicon technology. This demonstrates that all

1324

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 9, SEPTEMBER 1999

digital functions necessary for a 40-Gb/s transmission system


are feasible with silicon bipolar production technologies.
REFERENCES
[1] K. Hagimoto, Y. Miyamoto, T. Kataoka, H. Ichino, and O. Nakajima,
Twenty-Gbit/s signal transmission using simple high-sensitivity optical
receiver, in OFC92 Tech. Dig., Feb. 1992, p. 48.
[2] A. Felder, M. Moller, J. Popp, J. Bock, and H.-M. Rein, 46 Gb/s
DEMUX, 50 Gb/s MUX, and 30 GHz static frequency divider in silicon
bipolar technology, IEEE J. Solid-State Circuits, vol. 31, pp. 481486,
Apr. 1996.
[3] W. Bogner, U. Fischer, E. Gottwald, and E. Mullner, 20 Gbit/s TDM
nonrepeatered transmission over 198 km DSF using Si-bipolar IC for
demultiplexing and clock recovery, in Proc. ECOC, Sept. 1996, paper
TuD.3.4.
[4] W. Bogner, E. Gottwald, A. Schopflin, and C.-J. Weiske, 40 Gbit/s unrepeatered optical transmission over 148 km by electrical time division
multiplexing and demultiplexing, Electron. Lett., vol. 33, no. 25, pp.
21362137, Dec. 1997.
[5] R. Yu, R. Pierson, P. Zampardi, K. Runge, A. Campana, D. Meeker,
K. C. Wang, A. Petersen, and J. Bowers, Packaged clock recovery
integrated circuits for 40 GBit/s optical communication links, in GaAs
IC Symp. Tech. Dig., Nov. 1996, pp. 129132.
[6] M. Mokhtari, T. Swahn, R. H. Walden, W. E. Stanchina, M. Kardos, T.
Juhola, G. Schuppener, H. Tenhunen, and T. Lewin, InP-HBT chip-set
for 40-Gb/s fiber optical communication systems operational at 3 V,
IEEE J. Solid-State Circuits, vol. 32, pp. 13711383, Sept. 1997.
[7] M. Lang, Z.-G. Wang, Z. Lao, M. Schlechtweg, A. Thiede, M. RiegerMotzer, M. Sedler, W. Bronner, G. Kaufel, K. Kohler, A. Hulsmann,
and B. Raynor, 2040 Gb/s 0.2-m GaAs HEMT chip set for optical
data receiver, IEEE J. Solid-State Circuits, vol. 32, pp. 13841393,
Sept. 1997.
[8] A. Felder, M. Moller, M. Wurzer, M. Rest, T. F. Meister, and H.-M.
Rein, 60 Gbit/s regenerating demultiplexer in SiGe bipolar technology,
Electron. Lett., vol. 33, no. 23, pp. 19841986, Nov. 1997.
[9] J. Hauenschild, A. Felder, M. Kerber, H.-M. Rein, and L. Schmidt,
A 22 Gb/s decision circuit and a 32 Gb/s regenerating demultiplexer
IC fabricated in silicon bipolar technology, in Proc. IEEE BCTM92,
Sept. 1992, pp. 151154.
[10] H.-M. Rein and M. Moller, Design considerations for very-high-speed
Si-bipolar ICs operating up to 50 Gb/s, IEEE J. Solid-State Circuits,
vol. 31, pp. 10761090, Aug. 1996.
[11] J. Hauenschild and H.-M. Rein, Influence of transmission-line interconnections between Gbit/s ICs on time jitter and instabilities, IEEE
J. Solid-State Circuits, vol. 25, pp. 763766, June 1990.
[12] J. Bock, A. Felder, T. F. Meister, M. Franosch, K. Aufinger, M. Wurzer,
R. Schreiter, S. Boguth, and L. Treitinger A 50 GHz implanted base
silicon bipolar technology with 35 GHz static frequency divider, in
Symp. VLSI Technology Tech. Dig., June 1996, pp. 108109.
[13] J. Bock, M. Franosch, H. Schafer, H. v. Philipsborn, and J. Popp, Insitu doped emitter-polysilicon for 0.5 m silicon bipolar technology, in
Proc. ESSDERC95, The Hague, the Netherlands, Sept. 1995, pp. 421
424.
[14] M. Moller, H.-M. Rein, A. Felder, and T. F. Meister, 60 Gbit/s timedivision multiplexer in SiGe-bipolar technology with special regard to
mounting and measuring technique, Electron. Lett., vol. 33, no. 8, pp.
679680, Apr. 1997.

Martin Wurzer was born in Innsbruck, Austria, in


1966. He received the Diplomingenieur degree in
electrical engineering from the Technical University
Vienna, Austria, in 1994, where he is currently
pursuing the Ph.D. degree.
He joined Corporate Research and Development,
Siemens AG, Munich, Germany, in 1994, where
he has been engaged in the development of digital
high-speed silicon bipolar ICs for future optical
communication systems in the gigabit-per-second
range.

Josef Bock was born in Straubing, Germany, in


1968. He received the diploma degree in physics and
the Ph.D. degree from University of Regensburg,
Germany, in 1994 and 1997, respectively.
He joined Corporate Research and Development,
Siemens AG, Munich, Germany, in 1993, where
he first investigated narrow emitter effects in deep
submicrometer silicon bipolar devices. His work on
technology development and process integration for
high-speed silicon bipolar transistors resulted in the
SIEGET 45 microwave-transistor family. Currently,
he is working on process development for Si and SiGe bipolar technologies.

Herbert Knapp was born in Salzburg, Austria,


in 1964. He received the Diplomingenieur degree
in electrical engineering from Technical University
Vienna, Austria, in 1997.
He joined Corporate Research and Development,
Siemens AG, Munich, Germany, in 1993, where
he has been involved in the design of integrated
circuits for wireless communications. His current
research interests include the design of high-speed
and low-power microwave circuits.

Wolfgang Zirwas received the Diplomingenieur


degree in electrical engineering from Technical University Munich, Germany.
He joined Siemens AG, Munich, in 1987. First,
he worked in the field of high-bit-rate fiber-optic
communication systems. Later, he focused his work
on broad-band access technologies (xDSL, HFC)
for both residential and business users. He is now
working in the field of broad-band wireless systems.

Fritz Schumann received the Diplomingenieur degree in electrical engineering from Technical University Berlin, Germany, in 1981.
Subsequently, he worked in the field of RF and
microwave hybrid circuit and system design for
telecommunication and radar applications. In 1992,
he joined the silicon bipolar IC design group, Corporate Research and Development, Siemens AG,
Munich, Germany. Since then, he has realized ICs
for wireless and fiber-optic communication systems
up to 60 Gb/s.

Alfred Felder was born in Bruneck, South Tyrol, Italy, in 1963. He received
the Diplomingenieur and Ph.D. degrees in electrical engineering from the
Technical University Vienna, Austria, in 1989 and 1993, respectively.
He joined Corporate Research and Development, Siemens AG, Munich,
Germany, in 1989, where he has been engaged in the development of analog
and digital high-speed silicon bipolar ICs for future optical communication
systems in the gigabit-per-second range. From 1996 to 1998, he was Manager
of the Technology Department of Siemens K.K. The department is the liaison
office of the Corporate Technology of Siemens AG in Japan, responsible
for the cooperation with Japanese companies in research. Since 1998, he
has been heading the business operation Signal Processing & Control within
the Siemens Semiconductor Group in Japan and has been responsible for
marketing of microcontrollers and digital signal processors.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

1937

A Fully Integrated 40-Gb/s Clock and Data Recovery


IC With 1:4 DEMUX in SiGe Technology
Mario Reinhold, Claus Dorschky, Eduard Rose, Rajasekhar Pullela, Peter Mayer, Frank Kunz,
Yves Baeyens, Member, IEEE, Thomas Link, and John-Paul Mattia

AbstractIn this paper, a fully integrated 40-Gb/s clock


and data recovery (CDR) IC with additional 1:4 demultiplexer (DEMUX) functionality is presented. The IC is implemented in a state-of-the-art production SiGe process. Its
phase-locked-loop-based architecture with bang-bang-type phase
detector (PD) provides maximum robustness. To the authors best
knowledge, it is the first 40-Gb/s CDR IC fabricated in a SiGe
heterojunction bipolar technology (HBT). The measurement results demonstrate an input sensitivity of 42-mV single-ended data
input swing at a bit-error rate (BER) of 10 10 . As demonstrated
in optical transmission experiments with the IC embedded in a
40-Gb/s link, the CDR/DEMUX shows complete functionality as
a single-chip-receiver IC. A BER of 10 10 requires an optical
signal-to-noise ratio of 23.3 dB.
Index TermsBang-bang, BER, CDR, clock and data recovery,
demultiplexer, DEMUX, dynamic frequency divider, jitter generation, jitter tolerance, limiting amplifier, OSNR, phase detector,
phase-locked loop, PLL, SiGe, VCO.

I. INTRODUCTION

ODAYS commercially available highest capacity optical


transmission systems are based on multiple 10-Gb/s timedivision multiplexing (TDM) channels. These systems are expected to be insufficient to meet the rapidly increasing demands
for higher bandwidth in the foreseeable future.
The economically achievable transmission capacity of these
wavelength-division multiplexing (WDM) systems is currently
limited to 1.6 Tb/s, assuming 160 parallel 10-Gb/s TDM channels in the C- and L-band at a channel spacing of 50 GHz.
This corresponds to a spectral efficiency of 0.2 (b/s)/Hz. By
increasing the channel bit rate to 40-Gb/s per TDM channel,
the fiber capacity can be better utilized. With the spectral efficiency increased to 0.4 (b/s)/Hz, the total transmission capability is 3.2 Tb/s, assuming 80 parallel 40-Gb/s channels and
100-GHz channel spacing.

Manuscript received March 26, 2001; revised July 15, 2001.


M. Reinhold, C. Dorschky, E. Rose, and F. Kunz were with Lucent Technologies, Optical Networking Group, D-90411 Nrnberg, Germany. They are now
with CoreOptics GmbH, D-90411 Nrnberg, Germany (e-mail: mario@coreoptics.com or mario_reinhold@gmx.de).
R. Pullela was with Lucent Technologies, Bell Labs, Murray Hill, NJ. He is
now with Gtran Inc., Westlake Village, CA 91362 USA.
P. Mayer and T. Link are with Lucent Technologies, Optical Networking
Group, D-90411 Nrnberg, Germany.
Y. Baeyens is with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974
USA.
J.-P. Mattia was with Lucent Technologies, Bell Labs, Murray Hill, NJ 07974
USA. He is now with Big Bear Networks, Sunnyvale, CA 94086 USA.
Publisher Item Identifier S 0018-9200(01)09325-8.

Additionally, 40-Gb/s TDM will become more cost effective,


as the number of optical ports is reduced by a factor of 4 compared to 10-Gb/s TDM, resulting in fewer price-determining optical components, smaller system footprint, and reduced maintenance costs.
Regarding next-generation 40-Gb/s TDM links, the clock
and data recovery (CDR) IC is a key electronic component,
which strongly determines the overall transmission performance. 40-Gb/s TDM designs must be architecturally robust
and manufacturable to compete with 10-Gb/s TDM systems.
Accordingly, a fully integrated phase-locked loop (PLL)-based
approach with self-aligning bang-bang phase detector (PD) is
employed in this work. The IC is fabricated in a production
state-of-the-art SiGe heterojunction bipolar technology (HBT)
which provides advantages with respect to the achievable level
of integration, yield, cost-effectiveness, and process stability
compared to III-V process technologies.
II. CLOCK

AND

DATA RECOVERY EMBEDDED


OPTICAL LINK

IN

THE

The 40-Gb/s TDM optical link employs a 4:1 multiplexing


scheme, as shown in the block diagram in Fig. 1. At the receiver, the incoming optical signal is first amplified by an optical preamplifier (OA), converted into electrical pulses by the
photo diode, and then directly feeds the CDR/DEMUX. Data
recovery is accomplished by the first 1:2 DEMUX. In the PLLbased clock recovery approach presented here, the PD output
forces the receive-side voltage-controlled oscillator (VCO) to
track the phase of the incoming data signal. The combination of
the PD and 1:2 DEMUX function allows the use of a 20-GHz
half-bit-rate clock. This half-bit-rate architecture is explained in
more detail in Section IV.
This paper focuses on the CDR/DEMUX IC. However, the
remaining basic functions such as the 4:1 multiplexer (MUX)
and the driver IC have also been realized in this work program in
SiGe HBT and GaAs high-electron-mobility transistor (HEMT)
technology, respectively.
III. PROCESS TECHNOLOGY
The CDR/DEMUX IC presented in this work was designed
and 74-GHz
in a state-of-the-art SiGe HBT with 72-GHz
[1]. SiGe HBT provides superiority for a high level of integration compared to III-V technologies. The process features
four metal layers in total including a thick metal layer on top.

00189200/01$10.00 2001 IEEE

1938

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Fig. 1. Clock and data recovery IC embedded in the optical link.

Fig. 2.

CDR/DEMUX architecture.

Small-scale integrated analog and digital building blocks implemented in this process have been demonstrated for 40-Gb/s
operation [2], [3].
IV. CDR/DEMUX ARCHITECTURE
The half-bit-rate architecture of the CDR/DEMUX IC (Fig. 2)
is based on the concept reported in [4] and has already demonstrated its functionality for 10-Gb/s applications.
The nonlinear bang-bang PD described in [5] was modified
for interlaced operation. The combination of the PD and the

1:2 DEMUX functions allows the use of a half-bit-rate clock


running at 20 GHz.
The three upper eye diagrams in Fig. 3 illustrate the basic
principle of a common bang-bang PD. For the basic realization,
of the incoming data signal are necessary.
three samples
sample two consecutive bits
In locked condition, and
while samples the data transition, as indicated in the first eye
diagram.
In this implementation, two modifications to the common
bang-bang PD are made. First, the 1:2 DEMUX and phase-

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC

1939

each other, resulting in a minimum lead time of the high-speed


phase-control loop.
Two variants of the CDR/DEMUX exist. The higher integration variant of the CDR/DEMUX (CDR/DEMUX with VCO)
also contains an on-chip 40-GHz VCO, the proportional filter
of the PLL, and a frequency detector (FD) working as a lowspeed frequency acquisition aid, whereas in the lower integration variant (CDR/DEMUX without VCO), these parts are external to the IC.
V. BUILDING BLOCKS

Fig. 3. PD principle.

The most challenging building blocks of the CDR/DEMUX


are the 40-GHz 2:1 frequency divider, the 40-GHz VCO, the
limiting data input amplifier and the transition sampling latches,
including the four-phase 20-GHz clock tree.
A. 2:1 Dynamic Frequency Divider

detection function are combined so that a half-bit-rate clock


is used. However, for sampling in the bit transition, a fourphase clock is essential. Second, unlike in previous approaches
and
are generated in
[6], six samples
order to process every single data transition as indicated in the
fourth eye diagram in Fig. 3. This increases the maximum PD
gain, reducing the jitter generation compared to a single-edge
PD.
and
are derived
The PD output signals
from the following logical operations on the six samples after
their synchronization with the clock phase C0.

Since the PD requires a 20-GHz four-phase clock, those


clock signals can be most accurately generated by dividing a
40-GHz signal with a 2:1 frequency divider resulting in differential 0 and 90 phase-shifted clock signals. Studies published
earlier [7] using a similar technology show a maximum operating frequency of 42 GHz with a standard static frequency
divider. To achieve a higher performance margin, a dynamic
frequency divider similar to [8] was employed. As it is indicated in the circuit diagram (Fig. 4), the divided clock signal
is stored by parasitic capacitances, which results in a higher
operating frequency compared to a static frequency divider. As
a drawback of the dynamic approach, the operating frequency
exhibits a lower limit.
B. On-Chip 40-GHz VCO

In this design, an accurate four-phase clock at 20 GHz is


generated from the 40-GHz VCO output using a symmetrically
loaded 2:1 frequency divider.
Additionally, a limiting amplifier is implemented to improve
the input sensitivity. It feeds the 40-Gb/s input data to the four
parallel latch chains generating the six samples.
To process the two 1:2 demultiplexed 20-Gb/s signals
and
with commercially available 10-Gb/s DEMUX ICs, an
additional 2:4 demultiplexer is included, which produces the
10-Gb/s output signals D00, D01, D10, D11, and the 5-GHz
output clock C5G.
The PLL filter has a parallel proportional ( ) integrating
filter). The high-speed
filter aligns the
( ) structure (
phase relation between clock and data. Thus, the digital PD
and
each generate a dynamic
output pulses
in the VCO frequency. Integration
frequency step
by the filter with low bandwidth
of
controls the static VCO frequency. Since the filter and the
filter work at different speeds, both paths are decoupled from

The internal 40-GHz VCO is based on a differential Colpitts topology using microstrip transmission lines instead of a
spiral inductor (Fig. 5). At an oscillation frequency of 40 GHz, a
grounded microstrip line can be modeled quite accurately compared to other forms of inductors.
The microstrip transmission lines exhibit an inductive input
impedance, since for the odd mode they see a virtual ground
termination. It is physically implemented with the signal line on
the upper thick metallization layer and a shielding ground plane
on the lowest metal layer. This yields maximum inductive input
impedance per line length combined with minimum resistive
losses, which optimizes the quality factor .
As the PLL employs a high-speed and a low-speed filter
in parallel, the VCO has two separate frequency tuning inputs.
and
feed two ac-couThese tuning inputs
pled reverse-biased varactor diodes controlling the VCO frequency. The varactors have minimum size, resulting in a maximum VCO frequency modulation bandwidth as required by the
high-bandwidth bang-bang PD architecture.
It should be noted that the optimization of the free-running
VCO phase noise is not the major design goal. Since the
bang-bang PLL architecture provides very high bandwidth with

1940

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Fig. 4. Circuit diagram of the 2:1 dynamic frequency divider.

D. Clock Distribution and Latch

Fig. 5.

Circuit diagram of the 40-GHz on-chip VCO.

respect to loop gain, the VCO phase noise is suppressed by its


open loop gain when the PLL is in lock.
C. Limiting Data Amplifier
The limiting data input amplifier (Fig. 6) employs cascaded
chains of emitter followers (EF), transimpedance stages (TIS),
and transadmittance stages (TAS) in accordance with the concept of impedance mismatch [9].
Layout aspects strongly influence the circuit performance; especially, it should not be degraded by signal interconnects. Since
signal lines can be distinguished into critical and uncritical lines
[9], long transmission lines are arranged between current interfaces consisting of a TAS and its load, which is either represented by an active TIS or by passive resistors. For this reason,
the limiting data amplifier is implementedboth in schematic
and layoutin the form of three separate amplifier blocks with
a TISTAS interface.
As the data signal has to be split into four latch chains (Fig. 2)
and longer lines cannot be avoided, four TIS2 stages are driven
in parallel by one TAS2.

Fig. 7 shows the circuit diagram of the latch. The latch structure can be subdivided into the latch core and the local clock
input stage.
The steepness of the PD curve is strongly determined by the
metastability and the clock phase margin (CPM) of the latch as
sample the bit transition when the PLL is in lock. Two high
current-biased emitter followers in the data path provide a high
CPM and a small metastability region.
For optimal clock distribution, a local clock input stage conand two emitter followers
sisting of termination resistors
is included in each latch cell, since a total clock line length
of several millimeters cannot be avoided. For this reason, current interfaces between each clock buffer and the latches are
employed. The concept of the clock distribution is illustrated
in Fig. 8. The two clock buffers are based on open-collector
transadmittance stages. As stated before, a local clock input
is included in
stage consisting of termination resistors
every latch cell. Each clock buffer is loaded with ten latches.
Impedance matching can be easily achieved by designing
according to the number of loading
the line impedance
can be locally adapted with respect
latches. The value of
to signal splits, as it is shown in Fig. 8. If the line impedances
(with being the number of loading
are chosen
latches), the clock amplitude can be increased. This is due
to the inductive peaking effect of the transmission line, as
indicated in Fig. 9.
The layout of the latch, which is shown as part of Fig. 10,
employs orthogonal data and clock inputs. This implementation
minimizes line length on the high-speed data path by running a
data channel directly through the cascaded latch cells (Fig. 10).
In addition, clock channels run beside the cells to simplify the
clock-tree routing.

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC

1941

Fig. 6. Circuit diagram of the 40-Gb/s limiting data amplifier.

Since the self-oscillating frequency of the divider is roughly 41


GHz divided by 2, the circuit is very sensitive in the frequency
range centered around the nominal operating frequency of 40
GHz. The dynamic principle of the frequency divider results in
a minimum operating frequency of roughly 33 GHz at an input
power of 1 dBm.
B. On-Chip 40-GHz VCO
A similarly centered behavior can be measured for the
on-chip VCO represented by the tuning characteristic, as
shown in Fig. 14. The VCO tuning range expands from 37.7 to
41.2 GHz.
Fig. 7.

Circuit diagram of the latch.

VI. PHYSICAL REALIZATION AND MEASUREMENT RESULTS


The CDR/DEMUX with VCO dissipates 5.4 W in total. All
high-speed blocks operate at 5.5-V supply voltage and the 2:4
DEMUX at 4.2-V supply voltage. The whole die (Fig. 11)
occupies an area of 3005 m .
Closed-loop PLL measurements are performed with mounted
ICs using single-ended 4-b interleaved OC192 SONET signals
pseudorandom bit sequence (PRBS) payload for
with
PRBS payload for the
electrical back to back and
optical transmission experiments. For these measurements, the
CDR/DEMUX and the external components are mounted on
) using standard wire bonding,
a duroid substrate (
as shown in Fig. 12. The IC is placed into a cavity, reducing
the bondwire length. The substrate is attached onto a grounded
brass box. The 40-Gb/s input data is fed single-ended into the
high-speed box via a V-connector.
A. 2:1 Dynamic Frequency Divider
Fig. 13 illustrates the input sensitivity of the dynamic frequency divider in single-ended operation. A maximum operating frequency of more than 44.5 GHz is observed giving sufficient margin with respect to temperature and process variations.

C. Bang-Bang PD
is
The measured PD transfer function
illustrated in Fig. 15. The curve shows a fairly steep slope in the
lock-in point, which implies good sampling capability at data
transition and a small metastability region, corresponding to the
observed high CPM of 240 .
D. Jitter Generation and Jitter Tolerance
is measured using
The rms jitter of the recovered clock
a sampling oscilloscope. By excluding the trigger jitter of the
ps) from the original measurement
test equipment (
result of less than 1.2 ps, the overall CDR rms jitter generation
can be calculated to be approximately 0.7 ps.
For SONET and SDH systems, jitter tolerance masks are defined. As standardization is not finalized for 40-Gb/s systems
yet, the tolerance mask is extrapolated from the 10-Gb/s specification. The measured jitter tolerance curve, given in Fig. 16, exhibits sufficient margin relative to the extrapolated BELLCORE
mask [10]. For jitter frequencies higher than the corner freMHz of the PLL, the jitter tolerance is dequency
termined by the CPM of the first latches and the PLL has no inMHz, the
fluence. For jitter frequencies lower than
jitter tolerance decreases with 20 dB per decade. Large amounts
of jitter can be tolerated at low frequencies, since the filter provides high gain in this frequency range.

1942

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Fig. 8. Block diagram of the 20-GHz quadrature clock distribution.

Fig. 9.

Rough odd-mode simplification of the clock distribution.

E. Electrical Sensitivity (BER)


The bit-error rate (BER) curve as measure of the overall
electrical CDR performance is given in Fig. 17. For the
CDR/DEMUX with external VCO, a very high sensitivity
is
of 28-mV single-ended voltage swing at
measured. Due to minor modifications of the limiting amplifier
resulting in lower bandwidth, the CDR/DEMUX with internal
VCO provides slightly less sensitivity. For the same BER, a
42-mV single-ended voltage swing is necessary. A contribution
of the internal VCO to the performance degradation can be
ruled out, since a variant with external VCO and modified
limiting amplifier showed similar performance degradation.
Such high input sensitivity allows a direct connection of the

Fig. 10.

Layout of the entire clock tree and enlargement of the latch.

photodiode to the CDR/DEMUX as the optical measurement


results demonstrate.
F. Performance in System Application
The performance of the CDR/DEMUX embedded in a optical fiber link (refer to Fig. 1) can be characterized by the optical signal-to-noise ratio (OSNR) measurement, as shown in
Fig. 18. This is due to the fact that in the given configuration the
sensitivity is limited by the noise of the OA.
A 50- terminated photodiode is directly connected to the
CDR/DEMUX without any electrical amplifier in between, so

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC

Fig. 11.

Fig. 12.

Fig. 13.

1943

Fig. 14.

Tuning curve of the 40-GHz on-chip VCO.

Fig. 15.

Measured PD transfer function.

Fig. 16.

Jitter tolerance measurement result.

Fig. 17.

BER measurement.

CDR/DEMUX micrograph (CDR/DEMUX with VCO).

CDR/DEMUX test fixture with mounted CDR/DEMUX.

Input sensitivity of the 2:1 dynamic frequency divider.

that the CDR/DEMUX works as a single-chip-receiver IC. An


externally modulated signal is transmitted over an 80-km optical-fiber link. To achieve a BER of 10 , a minimum OSNR
of 23.3 dB is required.

1944

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001

Claus Dorschky received the Dipl.-Ing. degree in


electrical engineering from Friedrich Alexander
University, Erlangen, Germany, in 1986.
He has been working in the development department for high-speed optical transmission systems
at Philips Kommunikation Industries (later Lucent
Technologies), Nrnberg, Germany, for 14 years. His
research interests include design and integration of
analog and mixed-signal full custom ICs for 10- and
40-Gb/s as well as integration of optical receivers
and transmitters into single wavelength and DWDM
transmission systems at those bitrates. In early 2001, he cofounded CoreOptics
Inc., Nrnberg, Germany.
Fig. 18.

OSNR measurement.

VII. CONCLUSION
In this paper, the implementation of a fully integrated
CDR/DEMUX for 40-Gb/s TDM application in a state-of-theart SiGe HBT process has been demonstrated. Key success factors in the design are, first, the robust half-bit-rate architecture
with bang-bang PD, and second, its implementation by partitioning the whole chip into key building blocks interconnected
via robust current interfaces. Therefore, schematic design and
layout have to be done concurrently.
This concept has been applied throughout the IC, but mainly
in the 40-Gb/s data and the 20-GHz clock distribution. Optical
system measurements show the feasibility of the CDR/DEMUX
as a single-chip-receiver IC.
REFERENCES
[1] T. F. Meister et al., SiGe base bipolar technology with 74-GHz f
and 11-ps gate delay, Proc. IEEE Int. Electron Devices Meeting
(IEDM), pp. 739742, Dec. 1995.
[2] J. Mllrich, T. F. Meister, M. Rest, W. Bogner, A. Schpflin, and H.-M.
Rein, 40-Gbit/s transimpedance amplifier in SiGe bipolar technology
for the receiver in optical-fiber links, Electron. Lett., vol. 34, pp.
452453, 1998.
[3] A. Felder, M. Mller, M. Wurzer, M. Rest, T. F. Meister, and H.-M.
Rein, 60-Gbit/s regenerating demultiplexer in SiGe bipolar technology, Electron. Lett., vol. 33, pp. 19841985, 1997.
[4] J. Hauenschild et al., A plastic packaged 10-Gb/s BiCMOS clock and
data recovery 1:4-demultiplexer with external VCO, IEEE J. SolidState Circuits, vol. 31, pp. 20562059, Dec. 1996.
[5] J. J. D. H. Alexander, Clock recovery from random binary signals,
Electron. Lett., vol. 11, pp. 541542, Oct. 1975.
[6] M. Wurzer et al., A 40-Gb/s integrated clock and data recovery circuit
in a 50-GHz f silicon bipolar technology, IEEE J. Solid-State Circuits, vol. 34, pp. 13201324, Sept. 1999.
[7] M. Wurzer et al., 42-GHz static frequency divider in a Si/SiGe bipolar
technology, in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 123123.
[8] Z. Lao et al., 55-GHz dynamic frequency divider IC, Electron. Lett.,
vol. 34, no. 20, pp. 19731974, 1998.
[9] H.-M. Rein et al., Design considerations for very-high-speed
Si-Bipolar ICs operating up to 50-Gb/s, IEEE J. Solid-State Circuits,
vol. 31, pp. 10761090, Aug. 1996.
[10] SONET OC-192 Transport System Generic Criteria, Bellcore,
GR-1377-CORE, Dec. 1998.
Mario Reinhold was born in Mlheim/Ruhr, Germany, in 1972. He received the Diplom-Ingenieur degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998.
He joined Lucent Technologies, Optical Networking Group, Nrnberg, Germany, in 1998,
where his activities focused on the development
of various analog and digital high-speed bipolar
ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic
communication systems. Since 2001, he has been
with CoreOptics Inc., Nrnberg, Germany, working
on a next-generation 40-Gb/s chipset.

Eduard Rose was born in Kischinjow, Moldova,


in 1973. He received the Dipl.-Ing. degree in electrical engineering from Ruhr-University Bochum,
Germany, in 1998.
He joined Lucent Technologies, Optical Networking Group, Nrnberg, Germany, in 1999, where
he started developing different analog and digital
high-speed bipolar ICs for SDH/Sonet systems. He
is currently with CoreOptics Inc., Nrnberg, working
on a second-generation chipset for a 40-Gb/s optical
link system.

Rajasekhar Pullela received the B.Tech. degree


in electrical and communications engineering from
the Indian Institute of Technology, Madras, India, in
1993. From 1993 to 1998, he worked as a graduate
student researcher at the University of California,
Santa Barbara. During this period, he received M.S.
and Ph.D. degrees in electrical engineering, studying
device physics and high-speed circuit design.
During 19982000, he worked as a Member of
Technical Staff at Bell Laboratories, Lucent Technologies, Murray Hill, NJ, designing high-speed ICs
for fiber-optic communication systems. Since 2000, he has been with Gtran,
Inc., Newbury Park, CA.

Peter Mayer was born in Germany on July 11,


1964. He received the Dipl.-Ing. degree in electrical
engineering from Friedrich Alexander University,
Erlangen, Germany, in 1989.
In 1989, he joined Philips Kommunikation Industries, Nrnberg, Germany, and has been working
on 622-Mb/s optical interface circuits. In 1998,
he started developing clock and data recovery ICs
for 10- and 40-Gb/s applications. He is currently a
Technical Manager at Lucent Technologies GmbH,
Nrnberg, where he is responsible for high-speed
optical/electrical module design and integration for optical transmission
systems.

Frank Kunz was born in Bad Sobernheim, Germany,


in 1970. He received the Dipl.Ing. degree in electrical
engineering from the Ruhr-University Bochum, Germany, in 1998.
Until February 2001, he was with Lucent Technologies GmbH, Nrnberg, Germany, developing
high-speed bipolar ICs for advanced 10-Gb/s
optical communication links. He is currently with
CoreOptics Inc., Nrnberg, and is working on a
second-generation chipset for a 40-Gb/s optical link
system.

REINHOLD et al.: FULLY INTEGRATED 40-Gb/s CLOCK AND DATA RECOVERY IC

Yves Baeyens (S89M96) received the M.S. and


Ph.D. degrees in electrical engineering from the
Catholic University, Leuven, Belgium, in 1991
and 1997, respectively. His Ph.D. research was
performed in cooperation with IMEC, Leuven, and
treated the design and optimization of coplanar
InP-based dual-gate HEMT amplifiers, operating up
to W-band.
After a year and a half stay as a Visiting Scientist at the Fraunhofer Institute for Applied Physics,
Freiburg, Germany, he is currently a Technical Manager in the High-Speed Electronics Research Department of Lucent Technologies, Bell Laboratories, Murray Hill, NJ. His research interests include the design of mixed analogdigital circuits for ultrahigh-speed lightwave and millimeter-wave applications.

Thomas Link was born in 1968 in Nrnberg,


Germany. He received the Dipl.-Ing. (FH) degree
in electrical engineering from Georg Simon Ohm
Fachhochschule, Nrnberg, Germany, in 1991.
In 1991, he joined Philips Kommunikation Industrie AG (now Lucent Technologies), Nrnberg. He is
Member of Technical Staff in the high-speed ASIC
group and designed various high-speed ASICs, circuit packs, and firmware for SDH/Sonet systems.

1945

John-Paul Mattia received the B.S., M.S., E.E.,


and Ph.D. degrees in electrical engineering and
computer science from the Massachusetts Institute
of Technology, Cambridge.
He began working in high-speed electronics
at MIT Lincoln Laboratory in 1989. In 1996, he
joined Texas Instruments Inc. in the DSP R&D
organization. From 1997 to 2000, he worked in
the High-Speed Electronics Group of Lucent Bell
Labs, designing and testing circuits for lightwave
communication systems. Since July 2000, he has
been at Big Bear Networks, Sunnyvale, CA, where he is Chief Technical
Officer of Electronics.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

1949

A Fully Integrated SiGe Receiver IC


for 10-Gb/s Data Rate
Yuriy M. Greshishchev, Member, IEEE, Peter Schvan, Member, IEEE, Jonathan L. Showell, Member, IEEE,
Mu-Liang Xu, Member, IEEE, Jugnu J. Ojha, Member, IEEE, and Jonathan E. Rogers, Student Member, IEEE

AbstractA silicon germanium (SiGe) receiver IC is presented


here which integrates most of the 10-Gb/s SONET receiver functions. The receiver combines an automatic gain control and clock
and data recovery circuit (CDR) with a binary-type phase-locked
loop, 1 : 8 demultiplexer, and a 27 1 pseudorandom bit sequence
generator for self-testing. This work demonstrates a higher level
of integration compared to other silicon designs as well as a CDR
with SONET-compliant jitter characteristics. The receiver has a
die size of 4 5 4 5 mm2 and consumes 4.5 W from 5 V.
Index TermsClock and data recovery (CDR), jitter generation,
jitter tolerance, jitter transfer, phase detector, phase-locked loop
(PLL), SONET, VCO.

Fig. 1. 10-Gb/s receiver architecture. Dotted box shows components


integrated in the SiGe receiver IC presented in the paper.

I. INTRODUCTION

TYPICAL fiber-optic SONET receiver contains pin-diode


with transimpedance (TZ) amplifier, wide dynamic range
automatic gain control amplifier (AGC), and a clock and data
recovery circuit (CDR) with a demultiplexer. Introduction of
dense wave-division-multiplexed (DWDM) systems has put a
high demand on the receiver production. A high level of 10-Gb/s
component integration, as opposed to using a filter-based CDR
architecture [1], is required along with self-testing capabilities
to reduce receiver cost, module size, and power dissipation. One
of the major difficulties in the integration of 10-Gb/s receiver
is to achieve jitter characteristics compliant to the SONET requirements, such as Bellcore recommendations for the OC192
system [2]. To the authors knowledge, none of the previously
reported [3][5] 10-Gb/s receiver ICs with the integrated clock
and data recovery circuit (CDR) demonstrated all of the SONET
compliant jitter characteristics. While sub-picosecond jitter generation was previously confirmed in the SONET CDR [5], another important question is if all of the receiver components can
be integrated on a die without sensitivity and jitter performance
degradation.
A fully integrated SiGe receiver IC, presented in the
paper, combines CDR, AGC, 1 : 8 demultiplexer and

pseudorandom bit sequence (PRBS) generator for self-testing,


as shown in the dotted-line box in Fig. 1 [6]. Receiver performance mounted into test fixture was verified in a data-recovery
mode up to 12.5 Gb/s and in a CDR mode at 9.1 Gb/s (only
limited by the VCO maximum oscillation frequency after
packaging). The OC192 10-Gb/s SONET-compliant jitter
characteristics of the CDR were verified on-wafer with a
membrane probe card and with a jitter analyzer from Anritsu.
Phase-noise characteristics have also been measured to confirm
the CDRs sub-picosecond rms jitter performance. Measured
10-Gb/s maximum receiver sensitivity de-embedded after
losses in the test fixture is 4.5 mV at a bit-error rate (BER) of
at the demultiplexer (DEMUX) output.
In Section II, the binary CDR architecture used in the receiver
is briefly analyzed as compared to a linear-type CDR and design
method to meet SONET jitter requirements is presented. Then in
Section III, the full receiver architecture and the building blocks
implementation details are discussed. In Section IV, the IC die
fabrication features are described. Finally, in Section V, measured results are presented.
II. BINARY CDR IN SONET RECEIVER
A. Binary CDR Versus Linear CDR

Manuscript received April 17, 2000; revised June 29, 2000.


Y. M. Greshishchev and P. Schvan are with Nortel Networks, Ottawa, ON
K1Y 4H7, Canada (e-mail: greshy@nortelnetworks.com).
J. L. Showell was with Nortel Network and is currently with Quake Technologies Inc., Ottawa, ON, Canada.
M.-L. Xu was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He is
now with Conextant Systems, San Diego, CA.
J. J. Ojha was with Nortel Networks, Ottawa, ON K1Y 4H7, Canada. He
is now with Caspian Networks, Palo Alto, CA (e-mail: jojha@caspiannetworks.com).
J. E. Rogers is with The University of Toronto, Toronto, ON M5S 3G4,
Canada.
Publisher Item Identifier S 0018-9200(00)09475-0.

The CDR published in [5] uses a linear-type PLL approach


[Fig. 2(a)], while the CDR presented here is based on a binary
PLL [Fig. 2(b)]. In the binary PLL, a binary Alexander-type [7]
phase detector (PD) is used as compared to the Hogge-type PD
[8] in a linear-type PLL. Examples of using binary architecture
in optical receiver ICs can be found in [9], [10]. Binary PD produces two digital outputs, UP and DOWN, to signal if the data is
early or late with respect to the VCO clock. To control the VCO,
the binary information is split into two loops as suggested in
[11]. The phase-control loop is formed with the UP and DOWN

0018-9200/00$10.00 2000 IEEE

1950

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

(a)
Fig. 3.

(b)
Fig. 2. Two basic self-aligned CDR architectures. (a) With a linear PLL. (b)
With a binary PLL.

outputs directly modulating the VCO frequency with frequency


( denotes bang-bangother name of the binary
step
PLL architecture) via bang-bang frequency tune input. The frequency loop of the binary PLL uses binary outputs integrated
to control the second tune
with charge pump and capacitor
input of the VCO. To reduce jitter generation in absence of data
transitions (during long zeros or ones), a tri-state charge pump
was employed. By the same reason, a tri-state is introduced to
the VCO bang-bang control input which is no longer binary, but
ternary, where in a tri-state no frequency step is applied.
Table I shows the comparison for two PLL receiver architectures. Binary CDR is less demanding on the analog features of
an IC technology, and, in principal, has only one critical componentthe binary phase detector (BPD) where sub-picosecond
time resolution is required. The ring-oscillator-type VCO is recommended to reduce delay in the PLL loop and, therefore, jitter
generation in the CDR, as explained later in the paper. The VCO
phase noise is less critical in a binary CDR because of the relatively wide PLL bandwidth. In a linear-type PLL, jitter transfer
characteristics can be analyzed using linear PLL theory (see,
for example, [5]). Binary-type PLL has nonlinear jitter transfer
characteristics and its analyses have not been presented in the
technical literature. The following subsection describes the binary PLL design method used in the receiver design.
B. Binary CDR Jitter Characteristics
In SONET applications three main jitter characteristics are
important:
1) Jitter Tolerance;
2) Jitter Transfer;
3) Jitter Generation.
The Jitter Generation and loop stability was first analyzed by
R. Walker et al. [11]. This work also suggested loop decomposi-

CDR analytical jitter tolerance as compared to SONET mask.

Fig. 4. Trade-off for the frequency step in binary CDR.

tion into a frequency-control (low-frequency) part and a phasecontrol (high-frequency) part. In general, binary PLL is similar in system behavior to a double integration delta modulator
with prediction [12] acting in the data-phase domain. Based on
analogy with the signal frequency response in delta modulator,
the phase-jitter transfer function has a nonlinear (slew-rate limited) mechanism for the phase-jitter frequency response. It is a
single-pole-like response with the bandwidth inversely proportional to the input jitter amplitude:
(1)
is 3-dB bandwidth of the binary PLL jitter transfer
where
is the jitter amplitude,
is the
function;
is the average data transition
bangbang frequency step,
for
pattern).
density factor (maximum
The shape of the Jitter Tolerance function can be characterized by means of jitter-tolerance scale function,
[5]. Modeled binary PLL jitter tolerance response,
,
. For
is shown in Fig. 3 for two frequency steps
comparison, Bellcore SONET mask is also shown on the same
plot with the corresponding unit interval (UI) values on the right
side of the graph1 . To satisfy the mask, jitter transfer bandwidth
defined by (1) should be set above 4 MHz at a jitter amplitude
ps and minimum average data transition
.
density,
The Jitter Generation in a binary CDR is proportional to the
, and delay in the PLL loop, measured as a
frequency step,
number of 100-ps clock periods required to propagate signal
from the phase detector output to its input:
(2)
Equations (1) and (2) were used to find a frequency step
and a delay
acceptable for the SONET applications as
1In

a 10-Gb/s system, UI

= 100 ps.

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC

1951

TABLE I
COMPARISON OF LINEAR AND BINARY TYPE CDR ARCHITECTURES

shown in Fig. 4. The minimum value for the frequency step


is determined by Jitter Tolerance minimum bandwidth
is limited
requirements (4 MHz); the maximum value
by jitter generation (10 ps is recommended [2]). In the deps was assumed. To reduce
sign presented here,
jitter generation delay, should be minimized. This makes the
ring-type VCO preferable in the binary CDR as compared to
LC-tank based VCO where tuning delay is larger due to the usually higher -factor of the LC-tank.
III. RECEIVER ARCHITECTURE
A. Architecture
The receiver architecture is shown in Fig. 5. It combines an
AGC and a binary CDR with a 1 : 8 demultiplexer and a
PRBS generator for self-testing. The receiver recovers 10-Gb/s
data and a 10-GHz clock, and produces eight demultiplexed
1.25-Gb/s CML data outputs with a 1.25-GHz clock. The PRBS
generator allows functional testing of the CDR and subsequent
circuits. A PRBS clock (CLK) is required for testing. In the test
mode, the PRBS output is enabled to drive the CDR data bus.
In the receiver mode, the AGC output is enabled. The recovered
10-Gb/s data and 10-GHz clock appear at the recovered data
(DATA REC BUS) and clock (CLK REC BUS) buses, driving a
1 : 8 DEMUX circuit. A clock signal can also be supplied externally (CLKx) for data recovery operation only.
B. AGC
The block diagram of the AGC is shown in Fig. 6. The
AGC has total linear gain range from 3 to 20 dB with a
maximum input of 1.7 V . The AGC has two variable gain

stages
implemented with output current steering in
has a fixed gain and also
a differential pair [13]. Stage
provides open collector transmission line interface to drive the
CDR data bus. To alleviate conflicting requirements for large
data swing and low noise figure at low input amplitudes, the
AGC has two gain ranges: 77 dB (low gain range) and 720
dB (high gain range). Two differential pairs with a large and
a small emitter degeneration resistors are used to switched
the gain ranges. AGC-measured S11 is better than 15 dB in
a frequency range up to 10 GHz, noise figure 13.5 dB. The ac
bandwidth is adjustable in a range of 810 GHz.
C. CDR
As compared to original version of binary CDR [7], [11], in
the CDR presented here, the data decision and clock recovery
processes are split. This allows for independent optimization
of data decision threshold (slicing) without affecting clock recovery process. There are four decision channels in the CDR, all
driven by the CDR data bus. Channels 1 and 2 are identical decision circuits, as shown in the block diagram of Fig. 7(a). The
additional decision channel allows operation with two different
input slicing levels. The data decision threshold is set by a differential slicer circuit based on an emitter follower [Fig. 7(b)]. In a
long-haul receiver application, a high-performance limiting amplifier [2 in Fig. 7(a)] is required. Note that the AGC stabilizes
only a long-time averaged amplitude measured at AGC output
with a peak detector (not shown in Figs. 5 and 6). The limiting
amplifier stage was designed for 40-dB gain with bandwidth of
more than 16 GHz and input AM to output PM conversion less
than 1 ps in 20-dB input dynamic range. Two 20-dB gain-limiting amplifier stages similar to [14] were employed. The digital sampler is based on a masterslavemaster (MSM) flip-flop

1952

Fig. 5.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

SiGe receiver architecture.

Fig. 6. AGC block diagram.

configuration [Fig. 7(c)]. It helps to reduce the latching metastability region and to increase the clock phase margin in the decision circuit, as opposed to masterslave D-type flip-flop (DFF).
Schematic and layout of the latch was optimized for a minimum
latching time constant.
The BPD is formed with decision channels 3 and 4, and two
DFF circuits. The BPD takes three data samples according to the
timing diagram in Fig. 5. It generates a binary output with respect to the lead/lag phase of the VCO clock. The BPD uses only
one edge of the data transition, and is set to a tri-state condition
at the other edge or in the absence of data transitions according
to a truth table (Table II). As a result, recovered clock jitter is
not effected by asymmetry between the rising and the falling
edges or by the incoming data pattern. The BPD phase resolution is critical to the CDR jitter performance. Latch metastability at time sample T0 (see Fig. 5) limits the resolution. This
problem was circumvented by using a MSM-based decision circuit preceded by a limiting amplifier, as described above. Phase
resolution better than 1 ps was measured in the BPD circuit, as
shown in Fig. 8. A phase-delay sweep was arranged with two
clock generators: 5.0125 GHz for the data and 10.0000 GHz

Fig. 7. Decision channel. (a) Block diagram: 1slicer; 2limiting amplifier;


3digital sampler. (b) Slicer schematic. (c) Digital sampler schematic.

for the clock. Frequency shift of 12.5 MHz for the data provided binary pulses at the output of the phase detector with
40-ns period corresponding to a 100-ps delay sweep. To analyze the output pulses, a digital scope was synchronized from
the phase-detector output using an HP 54118A trigger amplifier. This method allowed measurement of the output transition
region with the accuracy of 0.5 ps per one data cycle (200 ps).
Measured transition region at the phase detector output contains
two data cycles, confirming phase detector time resolution to be
less than 1 ps.
The CDR frequency-control loop and the phase-control loop
are separated as described in Section II. The bang-bang part of

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC

1953

TABLE II
TRUTH TABLE OF BINARY PHASE DETECTOR

Fig. 9.

Ring-type VCO block diagram.

Fig. 8. BPD measured time resolution.

the PLL controls the recovered clock phase via the


input of
the VCO. The frequency loop includes the charge pump and an
external integration capacitor (pins C1 and C2).
The VCO is a ring oscillator type with an architecture shown
in Fig. 9. A mixer-type delay cell is used to control the oscillation frequency. The mixer cell is split into a fine-tune (for
the internal frequency loop) and a coarse-tune (to compensate
for process variation). Care was taken to provide symmetrical
bang-bang frequency steps with respect to the tri-state. All of the
VCO control inputs were implemented with the high-impedance
pMOS buffers. A pMOS-based charge pump was employed [5].

Fig. 10.

1 : 8 demultiplexer block diagram.

Fig. 11.

D. 1 : 8 Demultiplexer
The 1 : 8 DEMUX (Fig. 10) is similar in architecture to
the design presented in [15]. Seven 1 : 2 demultiplexer circuits are cascaded, with each stage optimized for the clock
frequency required. Each 1 : 2 demultiplexer consists of a
masterslavemaster flip-flop to capture the lead bit on the positive edge of the clock and a masterslave flip-flop to capture
the second bit using the negative edge of the clock. Utilizing
an extra latch in the MSM flip-flop ensures that the 1 : 2 data
outputs are aligned for further processing. The frequency of the
incoming CDR clock is divided by two at each demultiplexer
stage with a delay equal to the data delay in the 1 : 2 block.
E. Built-in

PRBS Generator

PRBS generator (Fig. 11) was implemented using


The
in a parallel form.
the standard polynomial equation

0 1 PRBS generator block diagram.

The parallel form avoids the necessity of distributing a 10-GHz


clock, as would be required if using a shift-register-type PRBS.

1954

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

Fig. 13. SiGe receiver mounted into microwave-style test fixture.

Fig. 12.

SiGe receiver die micrograph.

In an 8-bit parallel form, the clock frequency is reduced to 1.25


GHz. Similar to the AGC, the output of the multiplexer provides
open collector transmission line interface to drive the CDR data
bus. The disadvantage of a parallel architecture is penalty in
the die area and the power consumption. In normal operation
(the AGC transmits data to the CDR) the PRBS power supply
is turned off, reducing the overall power consumption of the
receiver.
F. CDR Simulation
Because of the nonlinear jitter response of a binary CDR,
hierarchical numerical analysis was an important part of the
receiver IC design. Four levels of PLL analysis were carried
out: analytical, behavioral, schematic level, and post-layout extracted circuit with distributed parasitics. The last three levels
are HSPICE-based. A behavioral library of linear and digital
components was developed. Analytical models of jitter transfer
and jitter tolerance are based on simplified binary-PLL theory,
as described above.
IV. FABRICATION
The receiver IC was implemented in IBMs SiGe technology
GHz,
GHz). The microphotograph of the
(
mm . The major
die is shown in Fig. 12. The die size is
circuit building blocks were not only integrated into the receiver,
but were implemented as individual IC components and tested.
In the receiver IC, the building blocks were physically partitioned with a transmission line circuit and layout isolation interface similar to that presented in [5], [14]. Separate power supply
systems with digital and analog grounds were routed.
V. EXPERIMENTAL RESULTS
The IC worked at first implementation with the VCO oscillation frequency 10% lower than simulated. The receiver IC was
mounted into a microwave test fixture (Fig. 13) and was tested
at 9.1 Gb/s (VCO oscillation frequency limit2 ) in a CDR mode
and up to 12.5 Gb/s in a data-recovery mode or in internal PRBS
test mode with an external clock. The carrier substrate size of
2Maximum oscillation frequency can be easily corrected by removing one
delay stage in the VCO design of Fig. 9.

Fig. 14. SiGe receiver eye diagrams measured in CDR mode at 9.1 GB/s. Input
data: 80 mV p PRBS 2
1.

cm in the test fixture was defined by the perimeter required for mounting I/O connectors in the housing metal box. A
large number of required I/O were used for testing purposes. The
receiver IC does not require external components, except decoupling and integration capacitors mounted beside the die. The
recovered clock and data eye diagrams at 9.1 Gb/s are shown
in Fig. 14. The IC input sensitivity is less than 4.5 mV at
measured at the 1 : 8 demultiplexer output with
AGC gain set to 20 dB (data eye closure in the test fixture was
de-embedded). DATA : 8 transition distortion apparent in Fig. 14
is due to long ribbon cable attached to the test fixture demultiplexed outputs. The receiver consumes in a mission mode 4.5 W
from 5 V.
The CDR performance IC was fully characterized at 10-Gb/s
on-wafer with a probe card. The die micrograph of the CDR
is shown in Fig. 15. It consists of an exact copy of the receiver
CDR layout plus the output buffers located in the DEMUX partition. In all of the measurements, input data were supplied singleended while unused differential input was terminated with 50 .
The CDR typical eye diagrams measured with 20 mV
PRBS data are shown in Fig. 16. The input sensitivity was meaas compared to 13.4 mV
sured to be 14 mV at
simulated considering thermal and shot noise in the decision
channel.
Phase noise of the recovered clock was measured with an
HP 4352B as a power spectrum density (Fig. 17). 10-Gb/s input
data were supplied with amplitude of 100 mV and

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC

Fig. 15.

SiGe CDR IC die micrograph.

Fig. 16.

CDR IC eye diagrams measured in CDR mode at 10 Gb/s.

1955

Fig. 18.

CDR jitter transfer.

Fig. 19. CDR jitter tolerance. Performance measured with the clock reference
level modulaton test marked with symbol .

Fig. 17. Phase-noise comparison of the CDR recovered clock, free running
VCO and data pattern generator (BERT) clock.

PRBS pattern. For comparison, phase noise of the free-running


VCO and the data-pattern generator was also measured and
shown on the same plot. As expected in a high performance
CDR, the recovered clock-phase noise follows, with no error,
the data-reference clock noise down to the CDR jitter noise
floor at 110 dBc/Hz. Similar recovered phase noise was
achieved in the CDR design with a linear PLL and LC-type
VCO [5]. Numerically integrated phase noise of the recovered

clock gives jitter RMS value of 0.78 ps. Phase noise was found
.
to be PRBS pattern independent up to a pattern of
The OC192 jitter compliant performance (at 9.953 28 Gb/s)
was verified with a jitter analyzer MX177 701 from Anritsu.
Jitter generation (in 80-MHz bandwidth) was measured to be
5.4 ps and 0.8 ps RMS as compared to 10 ps or 1 ps RMS
recommended by Bellcore [2]. The RMS jitter is very close to
the 0.78-ps value obtained in the phase-noise measurement.
Jitter transfer measurement (Fig. 18) showed, as predicted
by modeling, single-pole-like characteristics with no jitter
peaking. Jitter tolerance (Fig. 19) has a very wide safety margin
for SONET mask with a minimum of 40 ps (15 ps is
recommended). The shape of measured CDR jitter tolerance
response differs from the modeled in Fig. 3 because of test
setup limitations. This is seen from the measured jitter tolerance
of the test setup (BERT) with no CDR in the data path (shown
in the same plot of Fig. 19). Only in the frequency range of
40 kHz2 MHz measured jitter tolerance is determined by CDR
performance. In this frequency range, measured and modeled
jitter tolerance coincide. The upper frequency range of the
jitter tolerance response was also remeasured with a different
(see Fig. 5)
method, based on the reference voltage

1956

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000

modulation with a sin-wave signal. Minimum jitter tolerance of


40 ps was measured. Both jitter transfer and jitter tolerance
response were found to be PRBS pattern independent. The
IC demonstrated a 60-MHz frequency range of robust PLL
locking and operation even at the input signals well below the
sensitivity level.
VI. CONCLUSION
A fully integrated SiGe receiver IC is presented, which combines self-aligned CDR with integrated binary PLL, AGC, 1 : 8
PRBS generator for self-testing. The
demultiplexer, and
receiver, mounted into a test fixture, operates up to 9.1 Gb/s
(VCO limit) in a CDR mode and up to 12.5 GB/s in a data-recovery mode. Maximum die sensitivity is 4.5 mV at
measured at 1 : 8 DEMUX output. Receiver die size is
mm , and it consumes in a mission mode 4.5 W from
5-V power supply. CDR SONET-compliant jitter characteristics were verified on-wafer. Jitter tolerance well exceeds OC192
Bellcore mask with a minimum of 40 ps . Jitter transfer has a
single-pole-like response with no peaking detected. Jitter generation is less than 1 ps RMS and less than 5.5 ps .

[14] Y. Greshishchev and P. Schvan, 60-dB gain 55-dB dynamic range


10-Gb/s SiGe HBT limiting amplifier, IEEE J. Solid-State Circuits,
vol. 34, pp. 19141920, Dec. 1999.
[15] L. I. Anderson et al., Silicon bipolar chipset for SONET/SDH 10-Gb/s
fiber-optic communication links, IEEE J. Solid-State Circuits, vol. 30,
pp. 210218, Mar. 1995.

Yuriy M. Greshishchev (M95) received the


M.S.E.E. degree from Odessa Electrotechnical
Institute of Communications, Odessa, Ukraine,
in 1974 and the Ph.D. degree in electrical and
computer engineering from V.M. Glushkov Institute
of Cybernetics, Microelectronics Division, Kyiv,
Ukraine, in 1984.
From 1976 to 1994, he worked with research
and development organizations and academia on
high-speed silicon bipolar and GaAs MESFET ADC
and DAC integrated circuits. His Ph.D research
was dedicated to the development of folding-type ADCs embedded into TV
systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center
of University of Toronto, Ontario, Canada. In 1994, he joined the Department
of Electrical and Computer Engineering, University of Toronto, where he
conducted research on GaAs MESFET linear transmitter design for digital
wireless communication. Since 1996, he has been with Nortel Networks,
Ottawa, Ontario, where he is responsible for development of highly integrated
circuit solutions in emerging technologies for optical communications. He has
coauthored two books and numerous technical papers on the area of high-speed
communication circuit design, data converters, and statistical modeling.

ACKNOWLEDGMENT
The authors thank their colleagues S. Szilagyi for the
microwave test fixture design, and D. Marchesan and
Dr. S. Voinigescu for useful discussions and distributed
components modeling. Special thanks to R. Hadaway for his
directions and to IBM Corporation for fabrication.
REFERENCES
[1] B. Beggs, GaAs HBT 10-Gb/s Product, in 1999 IEEE MTT-S Int. Microwave Symp. Workshop, Anaheim, CA, June 1319, 1999.
[2] SONET OC-192, Transport system generic criteria, Bellcore,
GR-1377-CORE, no. 4, Mar. 1998.
[3] R. C. Walker et al., A 10-Gb/s Si-bipolar Tx/Rx chipset for computer
data transmission, in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302303.
[4] T. Morikawa et al., A SiGe single-chip 3.3-V receiver IC for 10-Gb/s
optical communication systems, in ISSCC Dig. Tech. Papers, Feb.
1999, pp. 380381.
[5] Y. Greshishchev and P. Schvan, SiGe clock and data recovery IC
with linear-type PLL for 10-Gb/s SONET application, in Proc. 1999
Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp.
169172.
[6] Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and
J. E. Roger, A fully integrated SiGe receiver IC for 10-Gb/s data rate,
in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 5253.
[7] J. D. H. Alexander, Clock recovery from random binary signals, Electron. Lett., vol. 11, pp. 541542, Oct. 1975.
[8] C. R. Hogge, A self-correcting clock recovery circuit, J. Lightwave
Technology, vol. 3, pp. 13121314, Dec. 1985.
[9] J. Hauenschild et al., A two-chip receiver for short-haul links up to
3.5-Gb/s with PIN-preamp module and CDR-MUX, in ISSCC Dig.
Tech. Papers, Feb. 1998, pp. 308309.
[10] J. Hauenschild et al., A plastic packaged 10-Gb/s biCMOS clock and
data recovering 1 : 4-demultiplexer with external VCO, IEEE J. SolidState Circuits, vol. 31, pp. 20562059, Dec. 1996.
[11] R. C. Walker et al., A two-chip 1.5-GBd serial link interface, IEEE J.
Solid-State Circuits, vol. 27, pp. 18051811, Dec. 1992.
[12] R. Steele, Delta Modulation Systems. New York/Toronto: Wiley, 1975.
[13] M. Soda, T. Suzaki, and T. Morikawa et al., A Si bipolar chip set for
10-Gb/s optical receiver, in ISSCC Dig. Tech. Papers, Feb. 1992, pp.
100101.

Peter Schvan (M89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics
from Eotovos Lorand University, Budapest, in 1975
and the Ph.D. degree in electrical engineering from
Carleton University, Ottawa, Ontario, Canada, in
1985.
In 1985, he joined Nortel Neworks, Ottawa, where
he started working in the area of BiCMOS and
bipolar technology development, yield prediction,
device characterization, and modeling. Recently, his
work has been extended to the design of multi-gigabit
circuits and systems. He is currently Senior Manager of a group responsible
for evaluating various high-performance technologies and demonstrating
advanced circuit concepts required for fiber optic communication systems. He
has authored and co-authored numerous publications.

Jonathan L. Showell (S90M95) received the


B.Eng and M.Eng degrees in engineering physics
from McMaster University, Hamilton, ON, Canada,
in 1990 and 1994, respectively.
He joined Nortel Networks, Ottawa, Canada, in
1994, working on hot carrier injection reliability of
CMOS devices. Later he became a member of the
Technology Access and Applications Group where
his responsibilities included accurate high-frequency analog (up to 110 GHz) and digital (40
Gb/s) measurements and the design of high-speed
10- to 40-Gb/s, multiplexer/demultiplexer circuits in SiGe HBT and InP
HBT technologies, respectively. Recently, he joined Quake Technologies,
Ottawa, Canada, working on the design of chip sets for high-speed datacom
applications. His interests include high-speed technologies, circuit design for
high-speed communications, and accurate high-frequency measurements.

Mu-Liang Xu (M00), biography not available at time of publication.

GRESHISHCHEV et al.: Fully Integrated SiGe Receiver IC

Jugnu J. Ojha (M00) received the B.Eng. degree from Salhousie University
and the Technical University of Nova Scotia in 1987. He received the M.Sc. and
Ph.D. degrees from McMaster University, Hamilton, Ontario, Canada, in 1990
and 1994, respectively. His graduate work involved research in electronic and
optoelectronic devices, as well as optoelectronic properties of semiconductors.
He was with Nortel Networks, Ottawa, Ontario, Canada, from 1994 to 2000,
where he worked on a wide range of technologies, including design of circuits
for 10 and 40 Gb/s optical transmission systems using SiGe and InP HBTs. He
also led a program in MEMS technology, with a focus on optical applications,
including optical crossconnects. His other activities included next-generation
optical network development, as well as research on optical properties of SiGe
materials and devices. He recently joined Caspian Networks in Palo Alto, CA,
as Senior Advisor in Optical Networking.

1957

Jonathan E. Rogers (S00) received the B.A.Sc degree in engineering sciences (electrical option) from
the University of Toronto, Ontario, Canada, in 1999.
He is currently working toward the M.A.Sc in electronics at the University of Toronto. His area of research is clock and data recovery systems in deep
sub-micron CMOS.
In May, 1997, he joined Nortel, Ottawa, Ontario,
for a 16-month internship, where he performed clock
and data recovery system characterization, VCO design, and high-speed measurements on SiGe MMICs
under the guidance of Dr. Y. Greshishchev.

1120

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 9, SEPTEMBER 2002

Clock and Data Recovery IC for 40-Gb/s


Fiber-Optic Receiver
George Georgiou, Member, IEEE, Yves Baeyens, Member, IEEE, Young-Kai Chen, Fellow, IEEE,
Alan H. Gnauck, Senior Member, IEEE, Carsten Grpper, Peter Paschke, Rajasekhar Pullela, Mario Reinhold,
Claus Dorschky, John-Paul Mattia, Timo Winkler von Mohrenfels, and Christoph Schulien

AbstractThe integrated clock and data recovery (CDR)


circuit is a key element for broad-band optical communication
systems at 40 Gb/s. We report a 40-Gb/s CDR fabricated in
indiumphosphide heterojunction bipolar transistor (InP HBT)
technology using a robust architecture of a phase-locked loop
(PLL) with a digital earlylate phase detector. The faster InP HBT
technology allows the digital phase detector to operate at the full
data rate of 40 Gb/s. This, in turn, reduces the circuit complexity
(transistor count) and the voltage-controlled oscillator (VCO)
requirements. The IC includes an on-chip LC VCO, on-chip clock
dividers to drive an external demultiplexer, and low-frequency
PLL control loop and on-chip limiting amplifier buffers for the
data and clock I/O. To our knowledge, this is the first demonstration of a mixed-signal IC operating at the clock rate of 40 GHz.
We also describe the chip architecture and measurement results.
Index TermsClock and data recovery, CDR, fiber-optic communication receiver, InP HBT, limiting amplifier, phase detector,
VCO.

I. INTRODUCTION

LOCK and data recovery (CDR) is an important function


of the transceiver of a high bit-rate lightwave communication system. Since 40-Gb/s systems are nearing commercial
deployment, the chosen CDR architecture must have few external components and be insensitive to temperature and component variations. CDRs reported in the literature use either
external filter-based architecture or a
a high quality-factor
phase-locked loop (PLL) architecture.
While the high- filter architecture is easier to implement, it
is susceptible to temperature and group delay variations in the
filter [1]. Specifically, the temperature drift of the filter bandpass and the temperature drift of the timing within the IC are
not correlated. Also, once the clock signal is recovered, additional precise clock phase adjustment is needed to set the decision sampling time to obtain proper phase margin. This phase
Manuscript received January 30, 2002; revised April 30, 2002.
G. Georgiou, Y. Baeyens, and Y.-K. Chen are with Lucent Technologies, Bell
Laboratories, Murray Hill, NJ 07974 USA (e-mail: gundsn@lucent.com).
A. H. Gnauck is with Lucent Technologies, Bell Laboratories, Holmdel, NJ
07733 USA.
C. Grpper and P. Paschke are with Lucent Technologies, Optical Networking
Group, 90411 Nrnberg, Germany.
R. Pullela was with Lucent Technologies, Bell Laboratories, Murray Hill, NJ
07974 USA. He is now with Gtran Inc., Thousand Oaks, CA 91362 USA.
M. Reinhold, C. Dorschky, T. W. von Mohrenfels, and C. Schulein were with
Lucent Technologies, Optical Networking Group, 90411 Nrnberg, Germany.
They are now with Core Optics, 90411 Nrnberg, Germany.
J. P. Mattia was with Lucent Technologies, Bell Laboratories, Murray Hill,
NJ 07974 USA. He is now with BigBear Networks, Milpitas, CA 95035 USA.
Publisher Item Identifier 10.1109/JSSC.2002.801186.

Fig. 1. Schematic diagram of a lightwave transceiver.

adjustment is further complicated by packaging issues, specifically that of aligning the recovered clock after the off-chip filter
and the decision IC.
In PLL-based CDRs, the clock phase in the decision circuit
is automatically synchronized to sample the center of the time
slot of each bit. Also, the PLL can be integrated onto a single
IC, greatly reducing temperature drift and phase relationship
problems.
Previous implementations of digital PLL-based CDRs at
40 Gb/s have employed half bit-rate clocking of CDR demultiplexer (DEMUX) combinations, to reduce the bandwidth
required in the buffers and digital gates [2][4]. By clocking
at 20 GHz to tolerate lower transistor bandwidth, the 2
parallel phase detector requires higher circuit complexity
with about twice the transistor count and a precise four-phase
voltage-controlled oscillator (VCO).
In this paper, we leverage the InP heterojunction bipolar transistor (HBT) technology operating at the full 40-Gb/s data rate
to simplify the CDR architecture.
II. CDR ARCHITECTURE
The typical transceiver architecture for a 40-Gb/s lightwave
system is shown in Fig. 1. The CDR IC of this work is highlighted in the receiver path.
The core of the PLL-based CDR is the phase detector. The
phase detector used here simultaneously recovers both clock and
data. The digital earlylate phase detector [5], consisting of data
flip-flops and combinatory logic gates, is shown in Fig. 2. In the
locked state, the chain samples the center of the incoming

0018-9200/02$17.00 2002 IEEE

GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER

Fig. 2. Digital phase detector architecture.

1121

Fig. 3.

CDR IC block diagram.

data time slot, while the chain samples the data zero crossings. The combinatory logic block, with inputs from the , ,
and latch chains, determines if the clock is early or late with
respect to the incoming data transition. This logic generates the
UPDOWN control signal for the VCO.
and are generated by decision circuits (respectively, after
two and four latches). The phase difference between and is
180 (one bit). is generated after three latches with an inverted
clock. If the clock phase is correct, is always in the middle
of and . EXOR ( ) and NAND ( ) gate combinatory logic
converts , , and to the UPDOWN pulses for controlling
the VCO. The logic equations are as follows.
UP
DOWN
The clock is early (slows down the VCO) if
and
,
. The clock is late (speeds up the VCO) if
UP DOWN
and
, UP DOWN
. The clock is correct if
. Obviously, data transitions are required for clock
recovery.
The chip also incorporates 15-dB gain-limiting amplifier
buffers at the data I/O, at the divide-by-2 clock output (for the
external DEMUX) and at the divide-by-32 clock output (for the
external coarse adjustment low-frequency lock loop). The static
frequency dividers use similar latches as those in the digital
phase detector. Figs. 3 and 4, respectively, show the final chip
block diagram and photograph.
The block diagram of Fig. 3 is laid over Fig. 4. To maintain
symmetry in the 40-Gb/s data and 40-GHz clock signals, the
phase detector layout contains four symmetric rows (as in the
block diagram of Fig. 2).
III. CDR DESIGN AND FABRICATION
Differential emitter-coupled logic (ECL) logic with 400-mV
differential voltage swings is used. To simplify the layout
process at a high bit rate, standardized digital and analog blocks
are designed and used.
All high-frequency inputs are terminated with on-chip 50resistors. Propagation delay of the 40-GHz clock to the ,
, and
latches and to the divider chain is a critical issue.
Matching propagation delay is achieved by symmetric layout

Fig. 4. CDR IC photograph.

and symmetric loading (for example, the dummy latch in the


chain of Fig. 2).
Coplanar transmission lines with controlled impedance are
) to reduce reflections and to imused for longer lines (
prove timing accuracy. Lines driven by emitter followers are
kept short to avoid reflections due to impedance mismatch.
A series gate approach is used for the clock and data signals.
To improve high-frequency performance, a transadmittance
(TAS) and transimpedance (TIS) combination [6], connected
by coupled coplanar transmission lines is used for clock and
data amplification and distribution. TASTIS buffer amplifiers
have a higher gain-bandwidth product (because of the active
TIS load) than does a simple ECL buffer. The buffer amplifier
signal splitting and transistor level design are shown in Fig. 5.
The VCO transistor level schematic is shown in Fig. 6. The
VCO is based on a differential Colpitts topology using coplanar
transmission lines which can be modeled very accurately at 40
feeds two
GHz, instead of inductors. The tuning input
reverse-biased varactors realized from the basecollector junctions of open-emitter HBTs. Note that minimum phase noise is
not a design parameter since the earlylate (or bangbang) PLL
architecture has very high bandwidth with respect to loop gain.

1122

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 9, SEPTEMBER 2002

Fig. 5. Buffer amplifier cascaded TASTIS architecture and transistor level schematic.

Fig. 6. VCO transistor level schematic.

Thus, the VCO phase noise is reduced by the open-loop gain of


the locked PLL.
The chip is fabricated in an InP HBT technology [7]. Peak
are 160 and 135 GHz, respectively. The
transistor and
transistors are conservatively biased at half the collector current for peak . The nominal transistor has a 1 m 3 m
emitter biased with a collector current of 4 mA. The interconnects use one metal layer for longer wires and one local metal
layer for shorter wires. Passive elements are fabricated using
tantalumnitride thin-film resistors and siliconnitride dielectric capacitors.
)
A power-supply series decoupling resistor (
is used to prevent any spurious low-frequency oscillations that
could arise in the packaged IC. The internal circuit is designed
to operate between 4 and 5 V. The circuit draws nominally
1 A. The nominal total power dissipation is 5.6 W, 1.6 W across

Fig. 7. Retimed 40-Gb/s data (40-GHz clock, 30 mV/div, 10 ps/div) and


phase-detector transfer functions (UP
DOWN with 40-GHz clock offset
10 MHz).

<

the decoupling series resistor and 4 W internally. The chip area


is 1.75 mm 1.75 mm.
IV. MEASUREMENT RESULTS
Two versions of the CDR chip (with and without VCO)
were fabricated. The results of on-wafer measurements on
the CDR with external VCO are discussed here. The input
signal generated by a commeris a single-ended 0.5
cial 40-Gb/s 4 : 1 multiplexer, driven with four independent
pseudorandom bit streams (PRBS) of a
10-Gb/s 2
pattern generator. A synthesizer generates the 40-GHz clock
signal for this measurement. Fig. 7 shows the retimed 40-Gb/s
eye from the CDR IC. (It should be noted that the on-wafer

GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER

1123

Fig. 8. Divide-by-2 frequency spectrum of on-chip VCO. VCO designed for


40 GHz but actual at 42 GHz divided to 21 GHz with an output power of 5
dBm.

measurement is limited by the equipment. The retimed data


eye is sharper than that of the commercial multiplexer. Also,
the jitter is characteristic of the digital oscilloscope used for
the measurement.) Decision circuit phase margin greater than
180 is measured by changing the clock phase with respect to
incoming data and observing the output data eye. The phase
transfer function (UP and DOWN) at the bottom of Fig. 7 is
measured by introducing a 10-MHz offset between the clock
frequency and data bit rate.
Measurements of the CDR with on-chip LC VCO indicate
that the VCO is capable of driving 40-Gb/s digital gates. However, the VCO center frequency is higher than simulated, probably because of capacitance or inductance (transmission-line)
drift during this process run. The VCO operates in the band between 40.5 and 42.5 GHz. The divider chains operate up to a
frequency of 44 GHz. The output of the divide-by-2 is shown in
Fig. 8 for the VCO tuned to 42 GHz.
To evaluate the packaged performance of the CDR chip in an
optical transmission system, we used the CDR chip as a single
channel 1 : 4 DEMUX by applying 40-Gb/s data and 10-GHz
clock. (The packaging used here for 40 Gb/s is relatively simple.
The chip is mounted into a cutout in a composite Rogers 4003
on FR-4 board. This chip recess corresponding approximately
the chip thickness 8 mil, reduces the length of the bond wires
to the 50- coplanar GSSG transmission lines designed on the
R4003 substrate. (See [8, Fig. 6].)
The experimental optical time-division multiplexing
(OTDM) link is shown in Fig. 9. As before, 4 : 1 MUX is used
to multiplex independent 10-Gb/s nonreturn-to-zero (NRZ)
PRBS data streams, from a commercial pulse pattern
2
PRBS NRZ
generator. The resulting electrical 40-Gb/s 2
PRBS RZ
signal is converted to an optical 40-Gb/s 2
signal by a pulse-carving technique with cascaded modulators.
Optical power is converted back to the electrical signal with
a p-i-n photodetector. The 40-Gb/s RZ eye after the p-i-n
is demultiplexed to 10 Gb/s using the CDR chip as a single
channel 1 : 4 DEMUX.
Fig. 10 shows the CDR IC performance as a one-channel
DEMUX. The demultiplexed 10-Gb/s eye is very open and has
very low jitter ( 4 ps limited by the oscilloscope bandwidth).
The error probability is also measured as a function of received

Fig. 9. Experimental optical link to measure the bit-error rate of 40-Gb/s RZ


transmission.

Fig. 10. CDR IC used as single channel 1 : 4 DEMUX (40-Gb/s data, 10-GHz
clock). Electrical output at 10 Gb/s and BER versus optical power measurement
of the optical link of Fig. 8.

optical power. The received optical power is 29.5 dBm for the
typical system required bit-error rate (BER) of 10 .
V. CONCLUSION
A complex ( 1350 transistors and 1.75-mm square)
mixed-signal CDR chip with on-chip VCO, amplifiers, decision circuit, and clock dividers was successfully fabricated
with a state-of-the-art InP HBT technology. Fully functional
chips at speed were obtained from the first iteration. Data is
retimed at 40 Gb/s and a good control signal is made available
to the on-chip VCO. The CDR IC is used as a DEMUX to
PRBS RZ signal to an
convert an optical 40-Gb/s 2

1124

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 9, SEPTEMBER 2002

electrical 10-Gb/s 2
PRBS NRZ signal in an optical link
experiment. An optical sensitivity of 29.5 dBm is measured
at 10 BER.
REFERENCES
[1] R. Yu, R. Pierson, P. Zampardi, K. Runga, A. Campana, D. Meeker, K. C.
Wang, A. Peterson, and J. Bowers, Packaged clock recovery integrated
circuits for 40-Gb/s optical communication links, in GaAs IC Symp.
Tech. Dig., 1996, pp. 129132.
[2] M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder,
A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz f
silicon bipolar technology, IEEE J. Solid-State Circuits, vol. 34, pp.
13201324, Sept. 1999.
[3] J. Hauenschild, C. Dorschky, T. W. von Mohrenfels, and R. Seitz, A
plastic packaged 10-Gb/s BiCMOS clock and data recovering 1 : 4 demultiplexer with external VCO, IEEE J. Solid-State Circuits, vol. 31,
pp. 20562059, Dec. 1996.
[4] M. Reinhold, C. Dorschky, R. Pullela, E. Rose, P. Mayer, P. Paschke, Y.
Baeyens, J. P. Mattia, and F. Kunz, A fully integrated 40-Gb/s clock and
data recovery/1 : 4 DEMUX IC in SiGe technology, IEEE J. Solid-State
Circuits, vol. 36, pp. 19371945, Dec. 2001.
[5] J. D. H. Alexander, Clock recovery from random binary signals, Electron. Lett., vol. 11, pp. 541542, 1975.
[6] H.-M. Rein, Design considerations for very high speed Si-bipolar ICs
operating up to 50 Gb/s, IEEE J. Solid-State Circuits, vol. 31, pp.
10761090, Aug. 1996.
[7] M. Sokolich, D. Doctor, Y. Brown, A. Kramer, J. Jensen, W. Stanchina,
S. Thomas, C. Fields, D. Ahmari, M. Liu, R. Martinez, and J. Duvall,
A low power 52.9-GHz static divider implemented in a manufacturable
180-GHz InAlAs/InGaAs HBT IC technology, in GaAs IC Symp. Tech.
Dig., 1998, pp. 117120.
[8] G. Georgiou, P. Paschke, R. Kopf, R. Hamm, R. Ryan, A. Tate, J. Burm,
C. Schullien, and Y.-K. Chen, High gain limiting amplifier for 10-Gb/s
lightwave receivers, in Proc. 11th Int. Conf. InP and Related Materials,
1999, pp. 7174.

George Georgiou (M92) was born in Greece


in 1954. He received the Ph.D. degree in applied
physics from Columbia University, New York, NY,
in 1980.
He joined AT&T (now Lucent Technologies) Bell
Laboratories in 1980 to develop sub-micron X-ray
lithography systems. He proceeded into process
integration of novel gate and metal structures for
sub-micron silicon CMOS. He is currently a Member
of Technical Staff with the High-Speed Electronics
Research Department of Lucent Technologies, Bell
Laboratories, Murray Hill, NJ. His current interest is mixed-signal IC design
for high-speed lightwave communications systems using InP and SiGe HBT
technologies.

Yves Baeyens (S87M96) received the M.S. and


Ph.D. degrees in electrical engineering from the
Catholic University, Leuven, Belgium, in 1991
and 1997, respectively. His Ph.D. research was
performed in cooperation with IMEC, Leuven, and
treated the design and optimization of coplanar
InP-based dual-gate HEMT amplifiers operating up
to the W-band.
He was a Visiting Scientist with the Fraunhofer Institute for Applied Physics, Freiburg, Germany, for a
year and a half, and is currently a Technical Manager
in the High-Speed Electronics Research Department of Lucent Technologies,
Bell Laboratories, Murray Hill, NJ. His research interests include the design of
mixed analog-digital circuits for ultrahigh-speed lightwave and millimeter-wave
applications.

Young-Kai Chen (S78M86SM94F98) received the B.S.E.E. degree


from National Chiao Tung University, Hsinchu, Taiwan, R.O.C., the M.S.E.E.
degree from Syracuse University, Syracuse, NY, and the Ph.D. degree from
Cornell University, Ithaca, NY, in 1988.
From 1980 to 1985, he was a Member of Technical Staff in the Electronics
Laboratory of the General Electric Company, Syracuse, responsible for the design of silicon and GaAs MMICs for phase array applications. Since 1988,
he has been with Lucent Technologies, Bell Laboratories, Murray Hill, NJ,
as a Member of Technical Staff. Since 1994, he has been the Director of the
High Speed Electronics Research Department. He is also an Adjunct Associate Professor at Columbia University, New York, NY. His research interest
is in high-speed semiconductor devices and circuits for wireless and fiber-optic
communications. He has authored more than 90 technical papers and holds nine
patents in the field of high-frequency electronic and semiconductor lasers.
Dr. Chen is a member of the American Physics Society and the Optical Society of America.

Alan H. Gnauck (M98SM00) received the B.S. degree in physics and the
M.S. degree in electrical engineering from Rutgers University, New Brunswick,
NJ, in 1975 and 1986, respectively.
In 1982, he joined AT&T (now Lucent Technologies) Bell Laboratories. He
has designed and built multigigabit amplifiers, multiplexers, demultiplexers,
and optical receivers, and performed record-breaking optical transmission
experiments at single-channel rates of from 2 to 40 Gb/s. He has investigated
coherent detection, chromatic-dispersion compensation techniques, CATV
hybrid fiber-coax architectures, wavelength-division-multiplexed (WDM)
systems, and system impacts of fiber nonlinearities. His WDM transmission
experiments include the first demonstration of terabit transmission. He is a
Technical Committee Member of the Optical Fiber Communications Conference (OFC) 2003. He holds twelve patents in optical fiber communications.
His current research interests include the study of WDM systems with
single-channel rates of 40 Gb/s.
Dr. Gnauck is an Associate Editor for IEEE PHOTONICS TECHNOLOGY
LETTERS.

Carsten Grpper was born in Mnster, Germany, in


1971. He received the Dipl.-Ing. degree in electrical
engineering from the Ruhr University, Bochum, Germany, in 1998.
He joined Lucent Technologies, Nrnberg,
Germany, in 1998. He is currently with the Optical
Networking Group, Lucent, developing high-speed
bipolar ICs for 10- and 40-Gb/s optical communication systems.

Peter Paschke was born in Dusseldorf, Germany, on


May 21, 1959. He received the M.S. degree in electrical engineering from the Ruhr University, Bochum,
Germany, in 1988.
In 1988, he joined Philips Kommunikation Industries, Nrnberg, Germany, as a Full Custom ASIC
Designer. He is currently a Technical Manager with
Lucent Technologies GmbH, Nrnberg, where he is
responsible for the high-speed ASICs. His main focus
is analog circuits such as laser drivers and limiting
amplifiers for high bit rates up to 40 Gb/s. In the lightwave system area, he has been involved in 2.5-Gb/s receiver design and several
research projects for 40 Gb/s.

Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in
1993. From 1993 to 1998, he was a graduate student researcher at the University
of California, Santa Barbara. During this period, he received M.S. and Ph.D degrees in electrical engineering, studying device physics and high-speed circuit
design.
During 19982000, he was a Member of Technical Staff with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, designing high-speed ICs for fiberoptic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.

GEORGIOU et al.: CDR IC FOR 40-Gb/s FIBER-OPTIC RECEIVER

Mario Reinhold was born in Mlheim/Ruhr,


Germany, in 1972. He received the Dipl.-Ing. degree
in electrical engineering from the Ruhr University,
Bochum, Germany, in 1998.
He joined the Optical Networking Group, Lucent Technologies, Nrnberg, Germany, in 1998,
where his activities focused on the development
of various analog and digital high-speed bipolar
ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic
communication systems. Since 2001, he has been
with CoreOptics Inc., Nrnberg, working on a
next-generation 40-Gb/s chipset.

Claus Dorschky received the Dipl.-Ing degree in electrical engineering from


Friedrich Alexander University, Erlangen, Germany, in 1986.
He was with Phillips Kommunikation Industries (later Lucent Technologies),
Nrnberg, Germany, for 14 years, working in the development department for
high-speed optical transmission systems. His research interests include design
and integration of analog and mixed-signal full custom ICs for 10- and 40-Gb/s
as well as integration of optical receivers and transmitters into single-wavelength and DWDM transmission systems at those bit rates. In early 2001, he
cofounded CoreOptics Inc., Nrnberg.

1125

John-Paul Mattia received the B.S., M.S.E.E.,


and Ph.D. degrees in electrical engineering and
computer science from the Massachusetts Institute
of Technology (MIT), Cambridge.
He began working in high-speed electronics
at MIT Lincoln Laboratory in 1989. In 1996, he
joined Texas Instruments Incorporated in the DSP
R&D organization. From 1997 to 2000, he worked
in the High-Speed Electronics Group, Lucent
Technologies, Bell Labs, designing and testing
circuits for lightwave communication systems. Since
July 2000, he has been with Big Bear Networks, Sunnyvale, CA, where he is
Chief Technical Officer of Electronics.

Timo Winkler von Mohrenfels, photograph and biography not available at


time of publication.

Christoph Schulien, photograph and biography not available at time of publication.

1156

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

Clock/Data Recovery PLL Using Half-Frequency Clock


M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux

Abstract A clock and data recovery PLL is described for


serial nonreturn-to-zero (NRZ) data transmission. The voltage
controlled oscillator (VCO) works at half the data rate, which
means for a 1-Gb/s data rate, the VCO runs at 500 MHz. A
specially designed phase comparator uses a delay-locked loop
(DLL) to generate the required sampling clocks to compare clock
and data. The VCO can typically be tuned from 350 MHz to 890
MHz and the phase-locked loop (PLL) locks between 720 Mb/s
and 1.3 Gb/s. Data recovery is error free up to 1.2 Gb/s with
a 9-b pseudorandom data sequence. The core consumes 85 mW
(3.3 V) at 1 Gb/s.
Index TermsBang-bang control, CMOS digital integrated circuits, data communication, high-speed integrated circuits, phase
locked loops, synchronization.

I. INTRODUCTION

IGITAL signal processing becomes economical in consumer applications. The main requirement there is low
cost in mass production. Digital processing and transmission
has to be carried out with low power and in cheap IC packages.
Data transmission between different digital signal processing ICs influences significantly the power consumption and
the system cost. For video signal transmission in 100 Hz TV
sets, typically 16 data lines in parallel are driven with 27 MHz
rail-to-rail nonreturn-to-zero (NRZ) data signals. Sharp data
transitions are in use to ensure reliable synchronous operation.
A power saving alternative could be found in low-swing highspeed serial data transmission in the range of 500 Mb/s or
more. However, this kind of high-speed data transmission
has to be asynchronous. The most economic solution avoids
separate transmission of the clock. In that case, clock recovery
from the NRZ data stream is required. In this paper we describe
a phase-locked loop (PLL) which is designed to process more
than 1 Gb/s data in a 0.5- m CMOS technology.

II. ARCHITECTURE
The PLL generally consists of three building blocks (Fig. 1):
1) phase comparator, detecting the phase difference between the data and the recovered clock;
Manuscript received December 15, 1996; revised February 6, 1997. This
work was supported in part by the German Ministry for Education and
Research under Contract 01M2880A.
M. Rau was with the University of Ulm, Germany. He is now with
Siemens AG, 81359 Munich, Germany.
T. Oberst was with the University of Ulm, Germany. He is now with
DASA, D-89077 Ulm, Germany.
R. Lares and A. Rothermel are with the Microelectronics Department,
University of Ulm, D-89081 Ulm, Germany.
R. Schweer is with Thomson Multimedia, D-78048 VillingenSchwenningen, Germany.
N. Menoux is with Thomson, 38240 Meylan, France.
Publisher Item Identifier S 0018-9200(97)04386-2.

Fig. 1. Classic PLL.

2) loop filter, filtering the phase detector output and forming the control signal for the oscillator;
3) voltage controlled oscillator (VCO).
The unusual feature in our design is the phase detector, which
uses a delay-locked loop (DLL) to generate multiple sampling
clocks. Thus, the VCO can run at only half the data rate,
which means that we can detect a 1-Gb/s serial data stream
with a 500-MHz VCO. This relieves the timing constraints
in the phase detector logic and results in well correlated and
data independent control signals. Also, at the lower frequency
the VCO tuning range is large enough to compensate all
technology parameter variations. With this architecture we
could achieve higher data rates.
The block diagram of the circuit is shown in Fig. 2. No
external components are required for the PLL. The loop filter
capacitor is integrated on chip together with the VCO, the
phase comparator, and a charge pump. The data stream is
retimed in two flip-flops with the inverted and noninverted
clock. Two flip-flops are required because the clock has only
half the data rate. These two half-speed data streams are
combined in a multiplexer, forming an output stream at the
original data rate. A lock-in circuit is realized on chip, because
the phase comparator is not frequency sensitive.
III. PHASE COMPARATOR
The PLL adjusts the clock to an incoming data stream.
Because of the random nature of data there is not necessarily
a data transition at every clock cycle. The loop has to handle
sequences of consecutive zeroes or ones in the data stream.
The following phase comparator output signal properties are
essential.
First, the phase comparator must not give any output signal
if there is no data edge. Second, the duration of the control
signal pulses at the data transitions is important, especially if
there are few of them. In general, for a good loop performance
the control signal should be proportional to the phase error.
However, for very high operating frequencies, analog signals
depend on the data pattern and become highly nonlinear,
because they do not settle during the bit duration. It was
found by simulation that different phase detectors with analog
outputs [1], [2] limit the PLL operating frequency. On the other
hand, clock recovery schemes based on sampling techniques
[3], [4] result in uniform digital control pulses. They are

00189200/97$10.00 1997 IEEE

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

1157

Fig. 2. Clock recovery block diagram.

(a)

Fig. 3. Phase comparator.


(b)

best suited to support highest possible data rates at a given


technology.
The phase comparator used here is an extension of the
circuit from [3], modified to work with half the normal
clock frequency (Fig. 3). The data stream is sampled at four
equally spaced timepoints. The logic circuitry driven by the
flip-flops generates the up and down control pulses for the
VCO according to Fig. 4. Because these control pulses are
generated by clocked flip-flops, they are of well defined width.
The advantage is that they do not depend on the data pattern.
On the other hand, they do not reflect the amount of the phase
error, either. The pulse width is constant, even for very small
phase errors. This so-called bang-bang operation generates an
increased jitter in the locked state. However, the magnitude
is much smaller compared to the one introduced by datadependent and nonlinear analog pulses at high frequencies.
The phase logic evaluates only rising signal edges, in order
not to depend on duty cycle variations of the input signal.
There is an issue to be taken care of when dimensioning
the flip-flops and the phase logic. The stable operating point
of the loop is reached when the signal is sampled exactly
at its transition (see Fig. 4). Thus the loop forces the flipflop to sample the metastable state, which is not allowed in
normal flip-flop operation. In this application, however, it is
not critical for the operation. If the metastable state is sampled,
it does not matter whether it will be interpreted as up or down,
because any decision is equally wrong, as we are at the stable
operating point, i.e., zero phase error. Only the jitter of the

(c)
Fig. 4. Operation of the phase detector: (a) data at sampling time B equals
data transition is late
the data at the preceding sampling time A
frequency up, (b) data at sampling time B equals the data at the following
data transition is early
frequency down, (c) data at
sampling time A
sampling time A equals the data at the preceding sampling time A
no data
edge, no control signal output.

)
)

bang-bang operation results. Also, there is an increased short


current inside the flip-flops that has to be limited.
For uniform pulses and small jitter, absolutely identical
sampling intervals are required. Therefore, a DLL has been
implemented to generate four 90 shifted clock phases clk1
clk4 from the VCO output signal (Fig. 5). The loop compares
the phase of the original clock to a clock fed through four
adjustable delay elements. The clock signal repeats with a
period . A delay element in Fig. 5 can therefore delay by
, or as well by
. By rearranging the output signals,
are also possible. With a
delay times of
, it is not possible to compensate for
delay element for
all technology and environment variations. Therefore, it is
necessary to select a larger value for the delay, to just be
able to deal with all technology parameter variations.

1158

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

Fig. 5. DLL to generate all 90 phase shifted sampling clocks with high
accuracy.

Fig. 7. VCO schematic.

Fig. 8. VCO frequency versus control voltage.


Fig. 6. Current mirror charge pump.

VI. LOCK-IN CIRCUIT

The stability of the system containing two coupled loops


can be guaranteed for two reasons. First, the DLL is a firstorder loop and inherently stable. Second, the time constants
of the two loops are two orders of magnitude different.

IV. CHARGE PUMP

AND

LOOP FILTER

The control pulses drive a current mirror charge pump [6]


(Fig. 6) which assures that the charge delivered to the loop
filter does not vary with the VCO control voltage. The charge
pump allows the realization of an ideal integrator transfer
function (pole at
) with no additional active amplifier,
resulting in a zero-phase error in steady state. A simple RC
network shown in Figs. 2 and 6 is used for the low-pass loop
filter. The current level of the charge pump and hence the
charge delivered at every rising data transition can be set
to a small value. This allows the implemention of the loop
capacitor on chip.

V. VCO
Both high oscillation frequency and a wide tuning range
are required. We choose a ring oscillator design with variable
load capacitors (Fig. 7) based on [5]. Duty cycle is not an
issue here, because the flip-flops all are triggered with the
same edge; the DLL generates the required phase shifts. This
circuit can safely cope with all parameter variations. Fig. 8
shows the VCO tuning characteristic.

The bang-bang operation and the data dependent phase


detector output signal require a narrow loop bandwidth for
a low jitter. This results in a reduced pull-in range of the
PLL. Instead of adapting the loop bandwidth during operation
we created a lock-in circuit which is active only after power
up. For lock-in, a 1010-sequence has to be fed to the circuit.
The VCO is swept, starting with the highest frequency. When
clock and input frequencies are the same, the sampled data
(before the Mux) do not change. An edge-triggered monoflop
then stops the frequency sweep and closes the PLL.
VII. LAYOUT
Fig. 9 shows the test chip. A large area is used for the onchip loop filter capacitor (upper left). A comparable area is
required for the ring oscillator, including its load capacitors
(lower left). Because the series resistance of those load capacitors is more critical compared to the one in the loop filter,
a finer finger structure was chosen. All capacitors have been
realized as MOS-transistor gates. No special mask is required.
In the top right area are located the lock-in circuit and the
DLL with its loop filter, whereas in the lower middle and to
the right, buffers and control logic can be seen.
VIII. MEASUREMENT RESULTS
We verified locking of the PLL at data rates from 720 to
1300 Mb/s with pseudorandom sequences up to
bit
at the data input. However, data recovery is not guaranteed
under these conditions because of the clock jitter. Fig. 10
shows the maximum available data rates for different lengths

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 7, JULY 1997

1159

Fig. 11. VCO clock output and data output eye pattern at 1 Gb/s with a
(215
1)-bit length pseudorandom input sequence.

Fig. 9. Chip micrograph.

Chip core area is 0.38 mm , power consumption without


pad drivers is 85 mW at 1 Gb/s, 0.5- m CMOS, 3.3 V supply.
Only 1/4 of the power consumption is proportional to the clock
frequency, 3/4 are constant. The circuit consumes 91 mW at
1.3 Gb/s. The power saved by using only half the conventional
clock frequency is partly used to supply the DLL, which needs
21 mW ( 1/4 of total power at 1 Gb/s).
No external components are required, except one reference
current, which is not very critical (a 20% variation is
allowed).
IX. CONCLUSION

Fig. 10. Maximum data rate versus pseudorandom sequence length for
error-free receiving during time of measurement (complies with error rate
smaller than 1 10011 ).

of the pseudo random sequences for correct data recovery.


Measurement period was 10 clocks (corresponding to a bit
).
error rate of
At very high data rates, clock and data phase precision has
to be better at the input of the retiming flip-flops, because
the eyes become smaller. The lower required phase jitter
corresponds to shorter pseudorandom sequences.
-bit
Fig. 11 shows the locked PLL at 1 Gb/s with a
length pseudorandom sequence. The clock jitter is about
350 ps, which is caused mainly by the bang-bang operation
of the phase comparator. We believe that this behavior can
be improved by reducing the uncertain time interval of the
sampling flip-flop, i.e., reducing their setup-and-hold times and
increasing the clock slope.
All measurements have been done with the IC housed
in a standard 16-pin dual in-line ceramic package which
shows rather poor high-frequency performance. It was our
goal to demonstrate the circuit in a critical environment. Better
results could be expected when using packages with shorter
leads.

Complete on-chip clock and data recovery at 1 Gb/s is


feasible with a standard 0.5- m CMOS technology. Onchip clock is only 500 MHz in this case. Data are directly
demultiplexed one to two in the retiming flip-flops. A multiplexer to regenerate the original data stream was included for
measurement purposes only. In applications, serial-to-parallel
conversion will normally follow the PLL. In that case, the
halved clock frequency is an advantage, because the following
blocks can be designed more easily.
ACKNOWLEDGMENT
The authors greatly acknowledge perfect layout support by
Y. A. Savalle and G. Kimmich from TCEC. They thank J.
Borel from SGS-Thomson for providing the design kit and
acknowledge the fast sample production in the factory.
REFERENCES
[1] T. H. Lee, A 155-MHz clock recovery delay- and phase-locked loop,
IEEE J. Solid-State Circuits, vol. 27, pp. 17361746, Dec. 1992.
[2] B. Thompson, A 300-MHz BiCMOS serial data transciever, IEEE J.
Solid-State Circuits, vol. 29, pp. 185192, Mar. 1994.
[3] B. Lai and R. C. Walker, A monolithic 622 Mb/s clock extraction data
retiming circuit, in Int. Solid-State Circuits Conf., San Francisco, CA,
1991, vol. 306, pp. 144145.
[4] A. Pottbaecker, U. Langmann, and H.-U. Schreiber, A si bipolar phase
and frequency detector IC for clock extraction up to 8 Gb/s, IEEE J.
Solid-State Circuits, vol. 27, pp. 17471751, Dec. 1992.
[5] M. Bazes, A novel precision MOS synchronous delay line, IEEE J.
Solid-State Circuits, vol. 20, pp. 12651271, Dec. 1985.
[6] A. Waizman, A delay line loop for frequency synthesis of de-skewed
clock, in Int. Solid-State Circuits Conf., San Francisco, CA, 1994, pp.
298299.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000

1353

SiGe Clock and Data Recovery IC with Linear-Type


PLL for 10-Gb/s SONET Application
Yuriy M. Greshishchev, Member, IEEE, and Peter Schvan, Member, IEEE

AbstractAn integrated 10 Gb/s clock and data recovery


(CDR) circuit is fabricated using SiGe technology. It consists of
a linear-type phase-locked loop (PLL) based on a single-edge
version of the Hogge phase detector, a LC-tank voltage-controlled
oscillator (VCO) and a tri-state charge pump. A PLL equivalent
model and design method to meet SONET jitter requirements are
presented. The CDR was tested at 9.529 GB/s in full operation and
up to 13.25 Gb/s in data recovery mode. Sensitivity is 14 mVpp
at a bit error rate (BER)
10 9 . The measured recovered clock
jitter is less than 1 ps rms. The IC dissipates 1.5 W with a 5-V
power supply.

Index TermsCharge pump, clock and data recovery (CDR),


jitter generation, jitter tolerance, jitter transfer, phase detector,
phase-locked loop (PLL), SONET, VCO.

I. INTRODUCTION
N A CLOCK and data recovery (CDR) circuit with integrated phase-locked loop (PLL), the reference clock is
extracted from the incoming data stream and is automatically
aligned to the center of the data pulse independent of its pattern.
Two CDR ICs have been reported operating at 10 Gb/s: one
with a linear PLL (LPLL) using a modified Hogge-type phase
detector (PD) [1] and the other with a binary PLL using a
bang-bang (Alexander)-type PD [2].1 CDR jitter characteristics
critical to SONET optical receiver design are Jitter Transfer
(Bandwidth and Jitter Peaking), Jitter Tolerance, and Jitter
Generation. In a linear CDR (LCDR), jitter transfer characteristics are independent of the jitter amplitude and can be
analytically predicted according to a LPLL theory. This feature
can be important for SONET applications, particularly if data
is to be retransmitted and jitter transfer must be well controlled.
This paper describes a 10 Gb/s LCDR with less than 1 ps
rms pattern-independent jitter generation required for SONET
application [3]. This jitter generation represents the best published CDR result. A number of techniques have been used to
achieve low jitter. First, a charge pump with tri-state is employed. This is a well known technique frequently used in conjunction with frequency-PD PLLs [4]. This technique is also
called switched-filter PLL [5]. Generally, this approach provides a hold mode in the PLL filter in the absence of data transitions and prevents variation of the voltage-controlled oscillator

Manuscript received December 8, 1999; revised February 2, 2000.


The authors are with Nortel Networks, Ottawa, ON K1Y 4H7, Canada.
Publisher Item Identifier S 0018-9200(00)05928-X.
1The linear property of a LPLL is due to a linear phase response of the
Hogge-type PD. In publication [1], [4] the Hogge-type PD is called a phase
comparator, which is a misleading terminology since the word comparator
implies a binary output. The bang-bang (Alexander)-type PD is a true phase
comparator.

(VCO) frequency (that causes jitter) during a long run of data


0 s or 1 s. Second, charge-pump and VCO control circuits were
designed to provide a high degree of PLL filter isolation, or low
charge-pump offset current, in a tri-state. In addition, the charge
pump has a high output impedance necessary for high loop gain
in a PLL with passive filter. Third, the original Hogge PD was
modified to provide a single-edge operation and to extend linear
phase range. Fourth, circuit and layout cross-talk isolation techniques similar to those presented in [6] are employed to prevent
jitter generation and sensitivity degradation due to a cross-talk.
The CDR IC was implemented in IBMs SiGe bipolar process
which includes pMOS devices.
Jitter characteristics of a LPLL depend to a large degree on
the PLL filter parameters. An LPLL equivalent model and design method to satisfy SONET requirements are presented in
this paper. A theoretical jitter tolerance function is introduced
based on considerations similar to those presented in [7]. It is
shown that in a LPLL, all of the jitter characteristics specified by SONET requirements can be analytically expressed via
and PLL damping factor . The
jitter transfer bandwidth
should be above 4 MHz to satisfy OC
PLL bandwidth
192 SONET jitter tolerance mask requirement, and the damping
to satisfy 0.1-dB jitter peaking.
factor should be above
In Section II the CDR architecture is described with an attention to low jitter operation and the LPLL equivalent model
and its design method are considered. Then, in Section III the
building-blocks circuit diagrams are discussed. The CDR hierarchical simulation flow is given in Section IV. Finally, the IC
implementation details and measured results are presented in
Sections V and VI.
II. CDR ARCHITECTURE
A. The Architecture
The CDR architecture includes a single-edge Hogge-type PD
with a decision circuit, a charge pump, an integrated LC-tank
VCO and a passive second-order PLL filter, as shown in Fig. 1.
These four components constitute the LPLL. To minimize jitter,
a tri-state PD and charge pump are used. The charge pump
produces close to zero differential output current, when no data
transitions occurs. The CDR is fully differential. Maximum
differential input voltage of the CDR is 1 V . Two differential
and
are used to
threshold slicing inputs
optimize the threshold of the decision and the clock recovery
circuits within 80% of the data swing. The recovered clock and
data are transmitted to the corresponding outputs
and
via buffers and cross-talk isolation interface
) [6]. The amplitudes
(transmission lines and transmitters

00189200/00$10.00 2000 IEEE

1354

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000

Fig. 3.

functions of
:
,
. The CDR circuit
. Fig. 3 shows analytically calcuwas designed for
with
4 MHz compared to OC192
lated
SONET mask. The bandwidth does not satisfy SONET requirement (to be below 120 KHz), but the jitter peaking is within the
recommended 0.1-dB jitter gain.
The CDR jitter tolerance is a measure of how much
peak-to-peak sinusoidal jitter can be added to the incoming
data before causing data errors2 due to misalignment of the
data and the recovered clock. In a CDR with LPLL, jitter
tolerance is defined by a jitter transfer function and the PLL
slew-rate capabilities [7]. Considering no slew rate limitations
in the loop (this is usually the case in a well designed CDR),
the frequency response of the jitter tolerance can be described
by the following function:

Fig. 1. CDR IC architecture.

Fig. 2.

CDR analytical jitter transfer compared to a SONET mask.

Equivalent model of the linear PLL.

of
and
are 1 V differential. The IC can
also be used in a data recovery mode only, when the VCO clock
.
is overdriven with an external signal

(4)

(2)

determines the shape of the jitter tolerance response.


It can be used to compare the performance of the design with
is multiplied by the
SONET jitter tolerance mask if
at high frequencies
. The
jitter tolerance
is defined by the CDR circuit design and
value of
decision-circuit clock-phase margin. For the SONET OC192
4 MHz
mask it is specified to be more than 15 ps at
for
[8]. Fig. 4 shows analytically calculated
4 MHz compared to the OC192 SONET mask. The dotted line is
the response with asymptotic single-pole jitter transfer function
[(A.4) in Appendix A]. The 20-dB/decade slope of the mask in
the frequency range of 0.44 MHz coincides with the
response for
4 MHz. SONET compliant LPLL design
4 MHz for minimum data tranmust have a bandwidth
.
sition density
Several factors affect jitter generation of the CDR shown in
Fig. 1. The recovered clock central frequency value is held as a
. An offset current
at the charge
voltage on the capacitor
pump output in tri-state causes a frequency step

(3)

(5)

is 3-dB bandwidth of the PLL jitter transfer funcwhere


is VCO sensitivity in Hz/V,
is the loop
tion in Hz,
is the average data transition density
natural frequency, and
for 0101
pattern,
for
factor (maximum
is doubled,
PRBS pattern). In (3) the charge pump current
compared to the Gardners formula [9], since in a Hogge PD a
corresponds to -radians of the data
current variation of
and the damping factor are
phase. Both the bandwidth

For practical filter parameters and expected maximum time


interval with no data transitions, the voltage variation across
is negligible compared to the voltage step
capacitor
. The phase jitter
is proportional to the number
of
consecutive 0s or 1s in the data, as follows:

B. PLL Equivalent Model for CDR Jitter Characteristics


Analyses
Three types of jitter characteristics are important in SONET
receiver design: jitter transfer function (bandwidth and jitter
peaking), jitter tolerance and jitter generation [8]. The PLL
bandwidth was specified to be between 410 MHz with less
than 0.1-dB jitter peaking and jitter generation below 1 ps
rms. The PLL was analytically designed using a continuous
time approximation for the equivalent model of Fig. 2 [9].
is required to provide low
A damping factor above
jitter peaking. Due to overdamping, the PLL jitter transfer
approaches a single-pole low-pass-type response
with the following parameters (see Appendix A):
(1)

[ps]
2The

[MHz]

(6)

amount of errors is defined with 1-dB receiver input power penalty.

GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION

Fig. 5.

PD and decision circuit.

Fig. 6.

PD simulated response.

1355

Fig. 4. CDR analytical jitter tolerance compared to a SONET mask.

Combining (1)(6) the jitter can be expressed as follows:


[ps]

[MHz]

(7)

where is the relative current offset at the charge pump output


[MHz] is the PLL bandwidth [MHz] at minimum exand
.
pected data transition density factor
across the damping resistor
Instantaneous voltage drop
creates the well known frequency ripple. Peak-to-peak jitter associated with this ripple can be expressed as
[ps]

[MHz]

(8)

is the attenuation due to high order poles in the


where
increase jitter genPLL filter. Lower targeted values of
eration. The single-edge PD, used in this CDR, reduces DF by
a factor of 2 and doubles the jitter amplitude. Because of the
, attention must be paid to the
required high bandwidth
and to the attenuation
to keep
charge-pump offset
jitter generation in the sub-picosecond range.
Jitter can also be generated by the loop static phase error and
its pattern dependence. To minimize static error, a charge pump
with high output impedance and a VCO with high control input
impedance were employed. The VCO phase noise had little impact on the recovered clock jitter because of the high loop gain
achieved.
and wide loop bandwidth
C. PLL Design Method
The LPLL with previously described model is fully defined
, ,
, , and
by five independent parameters:
. Filter components
and
are functions of these
parameters and can be found from (1)(3) (see Appendix B).
and
are mostly constrained by circuit
Parameters
implementation. Their initial values do not impact, in the
first order, LPLL jitter generation as is seen in (7) and (8).
are specified at
(or
Parameters and
accounting for single-edge phase detection):
to satisfy
4 MHz to satisfy jitter tolerance
jitter peaking and
mask.
III. CIRCUIT DESIGN
A. Phase Detector and Decision Circuit
The block diagram in Fig. 5 shows the PD and decision circuit. The data decision circuit is split with the clock recovery

circuit to provide independent data threshold optimization. A divide-by-two circuit


results in a single-edge operation of
the original Hogge-type PD [10]. Therefore the recovered clock
jitter is not affected by possible asymmetry between the rising
and falling edges of the incoming data. A dummy latch circuit
is introduced to compensate for the
delay. An additional advantage of the single-edge operation is an extended
linear phase range, which is explained in Fig. 5. The
output provides a phase-independent reference signal with a
constant pulse width of about 70 ps at each positive data transioutput is the phase difference signal in the form
tion. The
of a variable pulse width of 70 50 ps. Fig. 6 shows SPICE
simulation results of the PD circuit phase response. The linear
phase range is about 80 ps. In the absence of data transitions
and
are at a low level. This is detected
both outputs
as a tri-state by the charge pump.
The front-ends of the decision circuit and the PD contain limwith differential slicing level control at the
iting amplifiers
input. To increase the time resolution of the decision circuit,
a masterslavemaster structure is employed in the retiming
. The sensitivity of the decision circuit, defined priblock
marily by thermal and shot noise in the input slicing circuitry,
is simulated to be 13.4 mV at BER 10 . The simulated
latching metastability region is less than 1 ps . This region is
determined as a time zone in the clock-delay sweep where the
output of the decision circuit is not defined.
B. LC-Tank VCO
The block diagram of the LC-tank-based VCO is shown in
Fig. 7(a). The 10-GHz oscillator core is a cross-coupled differential circuit [Fig. 7(b)] [11]. The VCO includes a differential

1356

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000

(a)

Fig. 8.

PLL charge pump with a tri-state.

(b)
Fig. 7.

VCO. (a) Block diagram. (b) LC-VCO core.


Fig. 9. Jitter generation due to offset current in the charge-pump tri-state.

control buffer and a pulse-edge sharpening limiting amplifier


to reduce jitter sensitivity to the cross-talk noise. The
has an open collector output to drive the transmission line interface with a 50- termination at the far end. The control buffer
is designed with pMOS source followers at the input. VCO frequency is tuned with a varactor which is split into two parts: a
coarse tune (to compensate for process variation) and a fine tune
for frequency control in the loop. VCO phase noise is measured
to be less than 80 dBc/Hz at a 100-kHz offset frequency.
C. Charge Pump
The charge pump (Fig. 8) employs a well known currentswitching technique with the addition of a common-mode feedback amplifier . Care was taken to achieve unconditional stability in the feedback with sufficient gain and with a small value
in Fig. 8. Small
is necessary for low jitter
of capacitance
peaking in the PLL. The charge-pump output differential cur, as accounted for by the model in Fig. 2.
rent
The charge pump is in a tri-state when both differential inputs
and
are switched into a low (or high) state. A mismatch
between charge-pump current sources, , their finite output
resistances, and the VCO control input current cause an offset
current in the tri-state. Fig. 9 shows a plot of the PLL jitter due
in the tri-state as calculated from
to relative offset current
. Single-edge phase detection, employed in the
(7) for
value compared to the double-edge
CDR, requires half the
Hogge-type PD. The top current sources and the feedback
amplifier were designed with pMOS transistors. Appropriate
matching was achieved by sizing the critical components and
using symmetrical layout. To increase the charge-pump output
impedance, cascode current sources were employed. The
measured offset was less than 0.2%.

D. PLL Filter
The PLL filter is split into internal and , and external
and
components (see Fig. 1). Resistors R make jitter performance less sensitive to the external parasitics and coupled noise.
(along with
in Fig. 8) performs smoothing of
Capacitor
the PD output pulses and introduces the required attenuation
used in (8). In the PLL, resistor
is limited by maximum voltage drop required for normal circuit operation. Pulse
smoothing also relaxes this constraint.
E. Cross-Talk Isolation
Two differential output buffers ( in Fig. 1) provide adjustable differential voltage swing up to 1 V . The buffers are
physically separated from the VCO and PD with transmission
line interfaces to prevent jitter generation due to cross-talk via
substrate and common grounds. The VCO is also separated
from the PD with similar transmission line interface. All of the
blocks have separate power-supply systems routed according
to the isolation and analogdigital ground splitting techniques
described in [6]. All of the CDR circuits are fully differential.
The 10 Gb/s inputs and outputs are terminated on-chip with
50- resistors.
IV. SIMULATION
Five levels of hierarchical PLL analysis were carried out:
analytical, behavioral linear, behavioral mixed-mode, circuit
schematic level, and post layout with distributed parasitics. The
last four levels are HSPICE-based. A mixed-mode behavioral
library of linear and digital components was developed. All
levels of simulation give consistent results, with increasing

GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION

Fig. 11.

1357

Microphotograph of the SiGe CDR with linear PLL.

(a)

Fig. 12. CDR 9.529-Gb/s eye diagrams and the recovered clock. Input data 30
1 PRBS pattern.
mV , 2

VI. EXPERIMENTAL RESULTS


(b)
Fig. 10. HSPICE simulated jitter characteristics. (a) Jitter tolerance. (b) Jitter
transfer.

insight into jitter behavior at more detailed levels. Analytical


models of jitter transfer and jitter tolerance are based on the
second-order linear PLL theory as described in Section II.
and , the PLL becomes a third-order
With the addition of
loop. This was simulated along with on-chip, in-package, and
external filter parasitics using HSPICE-based models. AC
ranging from 1/6 to 1 are shown
simulation results for
response was
in Fig. 10(a) and (b). Jitter tolerance
. For
,
designed to fit the mask at
is compliant with SONET requirements. Jitter
the
peaking is within the required 0.1-dB value for the simulated
range. Sub-picosecond jitter generation was predicted in
circuit transient simulation.
V. FABRICATION
The CDR circuit was implemented in IBMs SiGe HBT
bipolar process which includes pMOS devices. Detailed device
characteristics are given in [12]. Die size is 3 3 mm (Fig. 11).
Three external RC-filter components are required to complete
the CDR design.

The IC performed as simulated, except for the VCO oscillation frequency which was 5% lower than simulated. Measurements were done on-wafer using membrane probes from Cascade Microtech. The PLL was locked by an external sweeping of
the VCO frequency using the 10-GHz adjust input (see Fig. 1).
The locking range is 25 MHz and the PLL stays locked within a
200-MHz frequency range. Fig. 12 shows recovered clock and
CDR eye diagrams at 9.529-Gb/s data rate and 30 mV ,
PRBS input signal. Measured sensitivity was 14 mV at BER
10 . This value is close to the simulated 13.4 mV (Section III), which indicates that a sufficient level of cross-talk isolation is achieved. The jitter tolerance was measured for jitter
amplitudes below 40 ps . This jitter was generated by modu(see Fig. 1) with an exlating the clock slicing level
ternal signal from dc to 100-MHz frequency range. No data errors were detected associated with this jitter. This demonstrates
jitter tolerance of more than 40 ps compared to the 15 ps
SONET mask above 4 MHz.
To verify the maximum bit rate of the IC, it was tested at
13.25 Gb/s in a data recovery mode with an external clock
(Fig. 13). Sensitivity at 12.5 Gb/s was measured to
be 15.5 mV . Data-recovery clock-delay margin of 77 ps
at 10 Gb/s was the same as the BER tester delay margin,
confirming picosecond timing resolution of the decision circuit.
The recovered clock jitter was measured, with a digital oscilloscope, to be 1.85 ps rms versus 1.68 ps rms jitter of the refer-

1358

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 9, SEPTEMBER 2000

APPENDIX A
The CDR jitter transfer function is similar by definition to
. For the second-order
the PLL phase transfer function
charge-pump PLL of Fig. 2, the phase transfer function is [9]
(A.1)
This formula can be rewritten as
Fig. 13.

CDR data recovery eye diagrams at 13.25 Gb/s. Input data 14 mV

0 1 PRBS pattern. The RECCLK waveform is the PRBS generator

(A.2)

reference clock translated by CDR.

is a bandwidth of the
where
asymptotic single-pole low-pass transfer function
(A.3)
approaches the low-pass reThe jitter response
at
or at
. The asymptotic
sponse
jitter tolerance shape function can be defined from (4) and (A.3)
as
(A.4)

Fig. 14. Phase-noise comparison of the CDR recovered clock, free-running


VCO, and data pattern generator reference clock.

ence clock. Therefore jitter generated by the CDR is estimated to


be 0.78 ps rms. Phase noise was measured more accurately with
a HP4352B phase-noise meter (Fig. 14). Recovered clock phase
noise follows, with no error, the data reference clock noise down
to the CDR jitter noise floor at 110 dBc/Hz. The noise floor is
reached within the bandwidth of the loop (designed to be above
4 MHz). Numerically integrated phase noise of the recovered
clock in 80 MHz bandwidth gives a jitter value of 0.77 ps rms.
Jitter was found to be independent of the PRBS word length up
. The IC dissipates 1.5 W with a 5-V power supply.
to
VII. CONCLUSION
In this paper, a low-jitter integrated CDR with a linear-type
PLL has been demonstrated. The PLL equivalent model and design method to meet SONET jitter requirements were presented.
The IC was implemented in SiGe technology. Sub-picosecond
rms jitter with no jitter dependence on data PRBS pattern is
achieved. Jitter generation factors in CDR were considered. A
single-edge version of the Hogge-type PD and a tri-state charge
pump were designed to satisfy jitter requirements. PMOS transistor circuits and cross-talk isolation technique were used to
improve CDR jitter performance. In a second-order LPLL a
bandwidth of more than 4 MHz and a damping factor of 46
at minimum expected data transition density are recommended
to satisfy OC192 jitter tolerance and jitter transfer peaking requirements. To satisfy jitter transfer bandwidth ( 120 KHz),
additional low-pass filtering of the recovered clock must be performed, for instance, in the PLL of a transmitter circuit.

APPENDIX B
The following PLL filter components are found by solving
(1)(3):
(B.1)
(B.2)

ACKNOWLEDGMENT
The authors thank C. Kelly and P. Popescu for discussions,
J. E. Rogers for his contributions to layout design and simulations, M.-L. Xu for help with the output buffer layout, J. Showell
for assistance with the measurements, Dr. S. Voinigescu and D.
Marchesan for their expertise in SiGe components modeling,
and Dr. M. Copeland for advice on VCO phase noise analyses.
Special thanks to R. Hadaway for his support and to IBM corporation for fabrication.
REFERENCES
[1] T. Morikawa et al., A SiGe single-chip 3.3 V receiver IC for 10Gb/s optical communication systems, in ISSCC Dig. Tech. Papers, Feb. 1999,
pp. 380381.
[2] R. C. Walker et al., A 10Gb/s Si-bipolar Tx/Rx chipset for computer
data transmission, in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302303.
[3] Y. Greshishchev and P. Schvan, SiGe clock and data recovery IC
with linear-type PLL for 10 Gb/s SONET application, in Proc. 1999
Bipolar/BiCMOS circuits and Technology Meeting, Sept. 1999, pp.
169172.
[4] B. Razavi, Design of monolithic phase-locked loops and clock recovery
circuitsA tutorial, in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY:
IEEE Press, 1996, pp. 405420.

GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION

[5] K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino, A 2.5-Gb/s clock


and data recovery IC with tunable jitter characteristics for use in LANs
and WANs, IEEE J. Solid-State Circuits, vol. 34, pp. 805812, June
1999.
[6] Y. Greshishchev and P. Schvan, 60 dB gain 55 dB dynamic range
10Gb/s SiGe HBT limiting amplifier, IEEE J. Solid-State Circuits,
vol. 34, pp. 19141920, Dec. 1999.
[7] L. De Vito, A versatile clock recovery architecture and monolithic implementation, in Monolithic Phase-Locked Loops and Clock Recovery
Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE
Press, 1996, pp. 40542.
[8] SONET OC-192 Transport System Generic Criteria, Bellcore,
GR-1377-CORE, Mar. 1998.
[9] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans.
Commun., vol. COM-28, pp. 18491858, Nov. 1980.
[10] C. R. Hogge, A self-correcting clock recovery circuit, IEEE J. Lightwave Technol., vol. 3, pp. 13121314, Dec. 1985.
[11] B. Jansen, K. Negus, and D. Lee, Silicon bipolar VCO family for 1.1 to
2.2 GHz with fully-integrated tank and tuning circuits, in ISSCC Dig.
Tech. Papers, Feb. 1997, pp. 392393.
[12] J. D. Cressler, SiGe HBT technology: A new contender for Si-based
RF and microwave circuit applications, IEEE Trans. Microwave Theory
Tech., vol. 46, pp. 572589, May 1998.

Yuriy M. Greshishchev (M95) received the


M.S.E.E. degree from Odessa Electrotechnical
Institute of Communications, Odessa, Ukraine,
in 1974 and the Ph.D. degree in electrical and
computer engineering from V. M. Glushkov Institute
of Cybernetics, Kyiv, Ukraine, in 1984.
From 1976 to 1994, he worked with research
and development organizations and academia on
high-speed ADC and DAC circuit theory and design,
primarily in the area of silicon bipolar and GaAs
MESFET integrated circuits. His Ph.D. research was
dedicated to the development of folding-type video ADCs embedded into TV
systems. In 1993, he was a Visiting Scientist at Micronet, Institution Center of
University of Toronto, Toronto, ON, Canada. In 1994, he joined the Department
of Electrical and Computer Engineering, University of Toronto, where he
conducted research on low-voltage GaAs MESFET circuits for digital wireless
communication. Since 1996, he has been with Nortel Networks, Ottawa, ON,
Canada, where he is responsible for the development of highly integrated
circuit solutions in emerging technologies for optical communications. He is
the coauthor of two books and more than 40 technical papers on the area of
data converters, high-speed circuit design, and statistical modeling.

1359

Peter Schvan (M89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics
from Eotvos Lorand University, Budapest, in 1975
and the Ph.D. degree in electrical engineering from
Carleton University, Ottawa, ON, Canada, in 1985.
In 1985, he joined Nortel Networks, Ottawa,
ON, Canada, where he worked in the area of
BiCMOS and bipolar technology development, yield
prediction, device characterization, and modeling.
Recently, his work has been extended to the design
of multigigabit circuits and systems. He is currently
Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating, advanced circuit concepts required for
fiberoptic communication systems. He is the author or coauthor of numerous
publications.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

1951

A 21600-MHz CMOS Clock Recovery


Capability
PLL with LowPatrik Larsson

Abstract A general-purpose phase-locked loop (PLL) with


programmable bit rates is presented demonstrating that large
frequency tuning range, large power supply range, and low jitter
can be achieved simultaneously. The clock recovery architecture
uses phase selection for automatic initial frequency capture. The
large period jitter of conventional phase selection is eliminated
through feedback phase selection. Digital control sequencing of
the feedback enables accurate phase interpolation without the
traditional need of analog circuitry. Circuit techniques enabling
low-V dd operation of a PLL with differential delay stages are
presented. Measurements show a PLL frequency range of 1200
MHz at V dd = 1:2 V linearly increasing to 21600 MHz at V dd
= 2:5 V, achieved in a standard process technology without low
threshold voltage devices. Correct operation has been verified
down to V dd = 0:9 V, but the lower limit of differential operation
with improved supply-noise rejection is estimated to be 1.1 V.

design is a large digital circuit incorporating a PLL-based


clock generator with low-jitter requirements, which is the most
common mixed-mode design today. Digital style PLLs have
been suggested, e.g., [3], but these cannot compete with the
supply-noise rejection of differential analog circuitry.
A clock recovery PLL architecture suitable for programmable bit rates is developed in Sections II and III with
emphasis on jitter reduction. Sections IVVI present PLL
circuit techniques that use the noise resistant differential pair
but avoid other expensive (in terms of headroom) analog
operation is enabled in a standard
circuitry, such that lowdigital CMOS process without the need of low-threshold
devices.

Index TermsFrequency locked loops, frequency synthesizers,


phase comparators, phase jitter, phase locked loops, phase noise,
synchronization.

II. LOW-JITTER PHASE-SELECTING CLOCK RECOVERY

I. INTRODUCTION

HE continuing scaling of CMOS process technologies


enables a higher degree of integration, reducing cost.
This fact, combined with the ever shrinking time to market,
indicates that designs based on flexible modules and macrocells have great advantages. In clock recovery applications,
flexibility means, for example, programmable bit rates requiring a phase-locked loop (PLL) with robust operation over a
wide frequency range. Increased integration also implies that
the analog portions of the PLL (mainly the voltage-controlled
oscillator [VCO]) should have good power-supply rejection to
achieve low jitter in the presence of large supply noise caused
by digital circuitry.
This
Another trend is low-power design using reduced
reduces the headroom available for analog design, causing
integration problems for mixed-mode circuits [1]. Furthermore, in applications where power consumption is a more
is not scaled as
critical design goal than compute power,
to avoid leakage currents in OFF devices,
aggressively as
which aggravates the headroom problem. For mixed-mode
circuits with significant analog circuitry, dual- and/or dualprocessing combined with a dc/dc converter [2] is a viable
solution. However, for circuits dominated by digital logic, it
is difficult to justify the additional fabrication steps required
for these solutions. A common case of the latter mixed-mode
Manuscript received April 13, 1999; revised June 19, 1999.
The author is with Bell Laboratories, Lucent Technologies, Holmdel, NJ
07733 USA.
Publisher Item Identifier S 0018-9200(99)08963-5.

A basic PLL for clock recovery is shown in Fig. 1(a). In


most CMOS implementations, the VCO must have a tuning
range covering more than 50% of the target frequency
to guarantee high yield over large process variations. This
large frequency range requires special techniques for initial frequency locking since there exists no phase detector
for nonreturn-to-zero (NRZ) data that operates reliably with
large initial frequency offset. Available techniques include
frequency sweeping [4], using a replica VCO matched to the
clock generating VCO [5], or initially locking the PLL to a
reference frequency with a frequency detector before switching
to the input data and locking with a phase detector [6].
One common technique requiring no special initialization
is shown in Fig. 1(b). This dual-loop PLL can be traced
back to [7], which was based on a delay-locked loop (DLL).
The multiple-output VCO in Loop A in Fig. 1(b) generates
a number of equally spaced clock phases at a frequency of
This loop can have a large frequency tuning range
since it is locked with a phase frequency detector (PFD). Clock
recovery is performed by Loop B that generates the recovered
clock by selecting the clock phase from Loop A that is best
aligned with the incoming data. If there is a frequency offset
and the incoming data, an appropriate clock
between
can still be generated by changing the Ctrl signal to select a
different phase over time.
Frequency initialization is automatically achieved by selectand
for the expected data rate. Most
ing appropriate
communication systems have a frequency tolerance of a few
hundred parts per million (ppm), eliminating any need for a
frequency detector in Loop B. The decoupling of the VCO loop
from the data recovery loop enables independent selection of
bandwidth in those two loops. This allows a large bandwidth in

00189200/99$10.00 1999 IEEE

1952

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

Loop A is selected from the multiple VCO phases. When Loop


B detects a misalignment of the incoming data and the VCO
output clock, the Ctrl signal is changed to select a different
phase for the feedback clock. This will cause a phase change in
the divided clock feeding the PFD such that the charge pump
will alter the VCO control voltage stored in the loop filter.
Therefore, sudden phase steps generated by the clock recovery
logic will be smoothed by the filter of Loop A, causing the
VCO clock to slowly drift toward the correct phase with a
rate of change determined by the bandwidth of Loop A. In a
622-MHz application with a division ratio of
and a
it will take
bandwidth of Loop A equal to one-tenth of
clock cycles to complete a phase
approximately
switch. The jitter caused by a phase step in the structure in
Fig. 1(c) is therefore spread out over 80 clock cycles, significantly reducing the jitter compared to Fig. 1(b). Feedback
phase selection has previously been applied to fractionalfrequency synthesizers for other purposes [10], [11].

(a)

(b)

III. AVERAGING PHASE INTERPOLATION

(c)
Fig. 1. Clock recovery PLLs. (a) Standard, (b) phase selection, and (c)
loop filter.
feedback phase selection. ChP F denotes charge pump

Loop A to suppress VCO jitter [4], for example, jitter induced


by power-supply noise. At the same time, a low bandwidth
can be used in Loop B to reduce jitter transfer. This cannot
be achieved by the PLL in Fig. 1(a), which has a single loop
with conflicting design goals regarding loop bandwidth.
A disadvantage of a phase-selecting PLL is that the phase
step that is generated when the Ctrl signal in Fig. 1(b) switches
to a new clock phase. This phase switching leads to large
cycle-to-cycle jitter (greater than or equal to the phase spacing) that can actually dominate the peak-to-peak jitter. By
increasing the number of phases, the phase spacing will be
smaller with less jitter. More phases can be generated by
having more delay stages in the VCO, but this limits the
speed. An alternative is phase interpolation that enables a large
number of phases without degrading the VCO speed [8], [9].
However, interpolators add analog circuitry to the design and
are prone to mismatch, which in the worst case can lead to
nonmonotonic phase spacing.
A proposed remedy for the jitter due to phase steps is shown
in Fig. 1(c). Instead of selecting a clock phase feeding the
sampling flip-flop and the phase detector, the feedback clock in

The smoothing effect of the loop filter can also be used for
phase interpolation. If the Ctrl signal in Fig. 1(c) alternates between two different clock phases every second cycle of the reference clock, the result will be a VCO clock phase corresponding to the average of the two selected phases. In the test chip,
four levels of averaging phase interpolation were implemented
by circulating through four clock cycles and in each clock cyor
as the feedback clock. A quarcle selecting phase
is then achieved by
ter phase interpolation generating
for three consecutive clock cycles, then selecting
selecting
for the fourth cycle and repeating this sequence.
The architecture in Fig. 1(c) lends itself naturally to combining both averaging phase interpolation and standard currentmode interpolation. A test chip was built in a 0.25- m,
2.5-V digital CMOS process to evaluate the jitter performance
of the phase selection architecture. A block diagram of the
implemented VCO and phase control circuitry is shown in
Fig. 2. The phase select control code at the input consists
of seven bits, of which two are directly fed to a finite state
machine (FSM) that generates control signals for realizing
the averaging interpolation. The remaining five bits of the
from which the code for
control code represent
is generated by adding one. The FSM controls Mux1 to
and
in a four
select one of the codes representing
clock period repetitive cycle, as described above. The five
bits at the output of Mux1 are split into three bits coarse
select and two bits fine select. The three coarse bits select
two neighboring phases from a four-stage differential VCO
having eight evenly spaced output phases and send these two
phases to a current-mode interpolator. Mux2/Mux3 in Fig. 2
receive one coarse bit each, and the third coarse bit is used
to conditionally invert the output signals. The interpolator is
similar to the Type-I circuit in [9] and is controlled by a fourbit temperature code derived from the two fine select bits.
Both the current-mode interpolation and the averaging phase
interpolation are programmable in the test chip and can be
disabled. The two complementary multiplexers at the output

LARSSON: 21600-MHz CMOS CLOCK RECOVERY PLL

1953

Fig. 4. Phase shift versus phase code measured at 1 GHz.

Fig. 2. VCO, phase selector, interpolator, and feedback multiplier.

(a)

(b)

(c)

(d)

Fig. 3. 500-MHz period distribution histograms. (a) Clock recovery inactive,


(b) standard phase selection, (c) feedback phase selection, and (d) averaging phase interpolation. Measurement conditions were V dd = 2:5 V,
N = 25; fref = 20 MHz, and fdata = 499:4 Mb/s.

of the VCO/interpolator (Mux4/Mux5) allow the chip to be


configured for the scheme in either Fig 1(b) or (c), enabling
a performance comparison.
Freezing the 7-bit phase select control code to a fixed
phase gives a measured output period jitter1 of 7.6 ps rms
when running the VCO at 500 MHz, as shown in Fig. 3(a).
This is the jitter inherent in the VCO and the output buffers.
Configuring the chip for the standard phase selecting scheme
in Fig. 1(b) with 32 clock phases (4 interpolation) gives the
jitter in Fig. 3(b), revealing a long tail in the histogram caused
by a frequency offset of 1200 ppm between the incoming data
The phase spacing is ns/
ps such
and
that we can expect a second peak in the histogram 62.5 ps
away from the main peak, which is confirmed by the shape
1 Timing

uncertainty between two consecutive edges of the generated clock.

of the histogram. As shown by the inset, this peak is slightly


off its ideal position and is smeared out due to the nonideal
ac behavior of the current-mode interpolator.
Feedback phase selection with 4 current-mode interpolation eliminates the long tail in the histogram, bringing the
period jitter down to 7.9 ps rms, as shown in Fig. 3(c).
This indicates that the jitter is completely dominated by
the VCO jitter, and nearly all of the phase-switching jitter
caused by digital clock recovery can be eliminated. Fig. 3(d)
shows the period jitter histogram obtained when the currentmode interpolators are disabled and 4 averaging phase
interpolation is used instead. Its similarity to the result in
Fig. 3(c) proves that the same low jitter and the same number
of discrete clock phases (32) can be achieved without the
analog interpolation circuitry.
Enabling both the current-mode interpolator and the averaging phase interpolation gives a total of 128 selectable clock
phases. The graph in Fig. 4 shows the phase shift as a function
of the phase select control code when the period of the VCO
ps, whereas
is 1 ns. The expected phase step is ns/
the largest measured step is 21 ps, resulting in a differential
nonlinearity (DNL) of 1.7 bits. The differential VCO makes the
phase curve near-symmetric around the midpoint, suggesting
that the integral nonlinearity (INL) of 94 ps is mainly due to
delay mismatch in the VCO. The main contributing factor to
this mismatch is unbalanced parasitic wiring capacitors that
are difficult to match without incurring speed penalty. If Loop
B is a first-order loop or a well-damped second-order loop,
the feedback in Loop B will automatically select the best fit
phase select code, reducing the impact of INL. The maximum
phase deviation in a clock recovery application is then the
DNL added to the VCO jitter.
In addition to jitter reduction and phase interpolation, feedback phase selection also has other advantages when combined
with other architectures. Using feedback instead of feedforward phase selection reduces circuit complexity, thereby
eliminating the need for good matching in an analog-style
interpolator [12] and a high-speed parallel sampling structure
[13].
IV. VCO
Recently, low-noise VCOs utilizing high-swing complementary signals have been presented (e.g., [14] and [15]).

1954

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

(a)
Fig. 6. Example VCO waveforms to estimate lower limit on

(b)
Fig. 5. Bias generation and one VCO delay stage of (a) replica bias scheme
and (b) diode clamping.

Good 1 noise performance has been shown, but their powersupply noise rejection is inferior to that of the standard analog
differential pair since they lack a high-impedance source,
Therefore, the analog style
making the delay depend on
differential pair is preferable in applications where powersupply noise is the main source of oscillator jitter. When a
differential pair with resistor loads is used as a delay cell in a
VCO, the frequency is regulated by changing the tail current
control voltage in Fig. 5(a). To
as implemented by the
achieve a large frequency tuning range, it is desirable that the
output swing and common mode do not change significantly
with frequency. Often the replica-bias scheme in Fig. 5(a) is
employed, which relies on good matching between a replica of
the delay stage (devices Mr1Mr3) and the VCO delay stages
to
, giving a known
to set the VCO output swing from
common mode and swing independent of the speed-regulating
current. A disadvantage of this technique is that the PMOS
load (Mr3) will operate as a current source at low frequencies,
introducing high gain in the replica feedback loop. To prevent
instability, a large compensation capacitor is required, which
introduces another pole in the PLL, leading to more intricate
design. Furthermore, the amplifier in the replica bias loop
requires additional headroom, thereby prohibiting lowoperation.
Fig. 5(b) shows a structure that achieves the good powersupply noise rejection of the analog differential pair, at the
operation. The PMOS diodes
same time enabling loware used for clamping the output voltage to a minimum level
of
giving a fixed common mode and swing
without the need for a replica bias circuit. This makes the VCO
suitable for a wide range of operating frequencies and supply
voltages. To guarantee clamping action, the NMOS tail current
must be larger than the current through the controlled
Furthermore, a proposed design goal for
PMOS load

V dd:

low oscillator noise suggests that the rise and fall times of the
output nodes should be made equal [16]. This is achieved
to each of the controlled PMOS
by reflecting half of
loads by the current mirror formed by devices Md1Md3.
Assuming that Md4 recently turned on, node a will be
discharged by a current of
At the same time, the complementary output node is pulled
by a current equal to
indicating equal
to
rise and fall times. A disadvantage of this oscillator is the
additional parasitic capacitance of the diodes, which makes the
maximum operating frequency lower than that of the replica
bias structure. The additional gate capacitance of the diode
loads can be eliminated by using NMOS diodes [17].
The minimum supply voltage for the VCO is
which has been verified by measurements
V. However, at this value of
down to
the VCO is no longer differential. An estimate of the minfor differential operation can be derived from
imum
the simulated VCO waveforms in Fig. 6. The VCO output
down to approximately
For
swings from
differential operation, it is required that both NMOS devices
in the differential pair (Md4, Md5) are turned ON at the
drop
crossover point of the waveforms. Assuming a
over the current source device Md6 leads to a minimum input
At the lowest limit of
this
voltage of
generated by the previous
input voltage is
of
stage in the oscillator, indicating a minimum
Measurements determined
and
to
be 0.53 and 0.85 V, respectively, indicating a minimum
of about 1.1 V assuming a
of 0.1 V. Note that this
is a theoretical number, since the differential operation of
Good
the VCO has zero tuning range at this value of
power-supply rejection can also be achieved by the regulatedsupply structure in [18]. However, the requirement of a large
decoupling capacitor generates contradicting design goals on
PLL bandwidth.
V. CHARGE PUMP
A. Bandwidth and Peaking Compensation
To reduce peak-to-peak jitter due to VCO noise, it is
advantageous to keep as high a PLL bandwidth as possible.
Traditional worst case design would keep the PLL bandwidth
and damping factor sufficiently far away from stability limits
under all variations of the input reference frequency, the

LARSSON: 21600-MHz CMOS CLOCK RECOVERY PLL

1955

(a)
Fig. 7. Current multiplier generating charge-pump biasing voltages
and Vqpb :

qbn

manufacturing process, and the division ratio in the feedThe concept of self-biasing introduced in [19]
back path
simplifies the design by eliminating process variations and
the input reference frequency from the stability constraints.
so
However, the PLL bandwidth is still a function of
that maximum noise suppression can only be achieved for a
In programmable applications,
can vary by more
fixed
than an order of magnitude, indicating that the variation in
instead of process
stability constraint can be dominated by
variations, as shown by the stability limit of a charge-pump
PLL [20]

(b)
Fig. 8. Jitter transfer functions for different division ratios. (a) Simulated
standard PLL. (b) Measured characteristics of Loop A with intentionally low
damping.

(1)
is the input reference frequency (or effectively the
where
is the VCO gain,
sampling rate of the phase detector),
is the charge-pump current, and is the loop filter resistance.
Other PLL design parameters, such as bandwidth and damping
Compensating loop parameters
factor, also change with
guarantees that the PLL is always operating
for changes in
with maximum bandwidth and fixed damping factor without
endangering stability. This can be done by setting the charge
pump current to
(2)
is a fixed reference current. This is realized by the
where
current multiplier in Fig. 7, which generates the charge-pump
by letting the individual bits of
control binary
current
weighted current sources.
The simulated jitter transfer function of a standard PLL in
is
Fig. 8(a) demonstrates the change of loop parameters as
altered. The damping factor is intentionally set low to show
The measured jitter transfer function of
its dependence on
The
Loop A in Fig. 8(b) shows the desired independence of
slight deviation of the curves is caused by transistor mismatch
in the current multiplier.
B. Charge Sharing
A common problem of many charge pumps is charge
sharing. For the charge pump in Fig. 9(a) (Type A), charge
sharing is caused by the parasitic capacitance in nodes pcs
is active, node pcs is charged to
and ncs [21]. When
When deactivating
some of the charge stored in node pcs
will leak through the current source device. Since the parasitics

(a)

(b)

Fig. 9. (a) Charge-pump suffering from charge sharing (Type A). (b) Charge
removal transistors eliminate charge sharing (Type B).

of nodes ncs and pcs can never be matched, this will lead to a
static phase offset, as shown in Fig. 10(a). This is the transfer
function of a phase-frequency detector followed by a Type A
charge pump. The two transistors Mp and Mn in the Type
B charge pump in Fig. 9(b) will remove the charge from the
nodes pcs and ncs when Up and Down are deactivated [22].
This leads to a large reduction in the phase offset, as shown
in Fig. 10(a).
For this application, static phase offset in Loop A is not
critical. However, when analyzing the cause of phase offset, a
source of increased jitter is revealed. Fig. 10(b) indicates that
the leakage from node pcs is larger than that from ncs. When
the PLL is locked, the leakage mismatch is compensated for by
earlier than
, giving a phase offset. Since
activating
the compensation charge is applied in the early portion of the
charge-pump activation time, it will cause voltage ripple on

1956

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

(a)

(b)

(c)

(d)

(a)

(e)
(b)
Fig. 10. Characteristics of the Type A and B charge pumps. (a) Transfer
function of PFD followed by charge pump. (b) Simulated IUp and IDown
when net output charge is zero.

the loop filter, leading to phase jitter at the VCO output. The
charge removal transistors Mn and Mp in Fig. 9(b) eliminate
the current tails resulting in a well-balanced Up and Down
and
cancel each other,
activation time such that
reducing the loop filter ripple. A further advantage of the Type
noise in the current source
B charge pump is reduced 1
to below
transistors achieved by periodically resetting their
0 V [23], [24].
A limitation of the Type B charge pump is a reduced
). If
dynamic range of the VCO control voltage (
is less than
there will be a current flowing
through the Mp device to the output when the Dwn control is
inactive. When NMOS devices are used for speed-regulating
will never drop below
, constraining
the VCO,
to be less than
which can easily be fulfilled. However,
the charge pump works only up to an output voltage of
limiting the upper tuning range of the
VCO. However, the charge pump in Fig. 9(a) has the same
and
is a similar
upper voltage limit. Mismatch in
source of jitter as charge sharing described above. For low
jitter, it is essential to have good matching, implying that the
should be saturated. Again,
devices controlled by
this requires
Charge removal can also be done by ac coupling [18], but
this requires careful timing of the control signals in the charge
pump. The solution to charge sharing in [21] is less suitable for
applications due to the common-mode restrictions
lowon the differential amplifier.

Fig. 11. Evolution of loop filter. (a) Ideal model, (b) MOS-only implementation, (c) improved resistor linearity for low-V dd operation, (d) improved
capacitor linearity, and (e) final model where C3 models the well-to-substrate
capacitance.

VI. LOOP FILTER


The most common PLL loop filter is the simple RC circuit
in Fig. 11(a). Common design options for the resistor are
poly or the channel resistance of an MOS transistor. For high
resistance values, an MOS device is most attractive. However,
if implemented with the
it has a disadvantage at low
straightforward configuration of Fig. 11(b). For a nominal
of 2.5 V, the effective resistance of the transmission gate
).
is nearly independent of the VCO control voltage (
However, the resistance becomes strongly dependent on
for low
For
the resistance goes
[25], [26]. Exchanging
to infinity for some values of
the position of the transmission gate resistor and the MOS
and the resistance of
capacitor as in Fig. 11(c) will make
The resistance still
the NMOS device independent of
but the variation is much less than for the
varies with
previous configuration.
Since the capacitor in Fig. 11(c) is a floating capacitor,
it must be implemented with a PMOS device. When the VCO
the MOS device is between
control voltage approaches
inversion and depletion, where its capacitance value is voltage
dependent, as shown in Fig. 12. By altering the gate and
source/drain connections of the PMOS as shown in Fig. 11(d),
it will operate in accumulation where the capacitance value
V in
is less voltage dependent, as shown for
Fig. 12. To avoid strong power-supply noise injection, the well
must be connected to the same node as source and drain,
as shown in Fig. 11(d). The corresponding filter model is

LARSSON: 21600-MHz CMOS CLOCK RECOVERY PLL

Fig. 12.

1957

Voltage-dependent capacitance of filter in Fig. 11(c) and (d).

shown in Fig. 11(e), where


is the parasitic well-to-substrate
capacitance of the MOS capacitor. This filter has an impedance
of
(3)
which is a close approximation to the impedance of the original
filter in Fig. 11(a), given as
(4)
when

as is common design practice [20].


VII. PHASE-FREQUENCY DETECTOR

Phase detectors may exhibit a dead zone, resulting in enlarged jitter. A common design technique to avoid a dead zone
is to make sure that both Up and Down output signals are fully
activated before shutting them both off. This is implemented
by generating a reset signal with an AND operation of Up
and Down output and introducing a delay before feeding back
this signal to reset the phase detector. It is this reset delay
and
in Fig. 10(b).
that causes the simultaneous
If the charge sharing in the charge pump is not perfectly
and
there will
cancelled or if there is a mismatch of
always be some current compensation, leading to phase offset
and loop filter ripple, as discussed in Section V. A longer
reset delay results in a longer period during which the VCO
is running at a different frequency due to the compensation
current. Therefore, the reset delay should be minimized under
the constraint that it has to be longer than the response time
of the PFD with some additional design margin to avoid a
dead zone.
A PFD with low logic depth is shown in Fig. 13, including
details of the Up section. Its operation is easiest to analyze
by assuming an initial state of
This implies that
and that
is
discharges
and sets
precharged high. A rising edge on
without changing the state of the RS flip-flop. The
path will assure that
internal weak feedback in the
is kept active even if
falls. At the next rising edge on V,
is activated, which sets
This triggers the RS
high, which shuts off
; and, at the
flip-flop to precharge
is deactivated in a similar way.
same time,
sets
which is
In summary, a positive edge on
reset by the next positive edge on This behavior is identical

Fig. 13. Phase-frequency detector used in Loop A with details of Up section.

to the two classical PFDs implemented by either four RS flipflops or two resettable Dflip-flops. The precharged gate and
the shorter logic depth of this implementation make the delay
shorter than for the standard PFDs. This allows a smaller
delay in the reset path for eliminating the dead zone, such that
loop filter ripple will be reduced and generate less noise. An
additional benefit of low logic depth is a reduction in phase
detector jitter caused by power-supply-dependent delays and
device noise.
The reset delay of this PFD can be further reduced by
directly reset the precharged gate
letting the signal
simultaneously as the RS flip-flop is reset. This technique
was not adopted in order to keep a conservative design,
guaranteeing operation with no dead zone. Similar precharged
gates have previously been used in PFD designs [27][29].
VIII. FREQUENCY DIVIDER
To enable high flexibility, the frequency divider in Fig. 2 is
) divider. The structure in
a fully programmable (
[30] based on a clock-gated dual-modulus prescaler followed
by a counter was chosen to achieve high speed at low supply
voltages. The divider was realized in standard static CMOS
logic, reaching a maximum operating frequency of 800 MHz in
simulations of worst case slow process variation at
V and
C This exceeded the simulated speed limit of
the VCO. The potential startup deadlock in [30] was eliminated
by logic that prohibits two consecutive clock pulse removals.
IX. PLL OPERATING RANGE

AND JITTER

The maximum operating frequency of the PLL measured


at room temperature is plotted in Fig. 14 as function of
Simulations indicate that the speed is limited by the VCO. A
of 0.9 V agrees well with the measured
minimum
V. At low power-supply voltages, the speed cannot
However,
compare with high-end circuits using standard
the operating frequency range exceeds that of low-voltage
circuit implementations [2], [3], [25], [26]. The maximum
speed also compares favorably with another low-voltage PLL
based on a low-threshold process [18].
With a PLL bandwidth of 2 MHz, the tracking jitter is 5.2 ps
rms at 1200 MHz, as shown in Fig. 15(a). This measurement
represents the standard deviation of the delay between a

1958

Fig. 14.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

Measured maximum PLL operating frequency.

triggering clock edge and a clock edge occurring 320 ns later


according to the setup in Fig. 2(e) in [31]. The delay was
chosen four times larger than the delay at which the jitter
ns) to
knee occurs in Fig. 2(f) in [31] (
get reliable data.
All signals on the chip are periodic with the reference
frequency, so when using the frequency divider output as a
triggering signal, most of the jitter due to power supply and
board noise will cancel in the measurement. Such a setup gave
an rms jitter of 2.5 ps at 1200 MHz, as shown in Fig. 15(b).
The jitter with respect to an ideal reference would in this
ps [31]. This proves that device noise
case be
in a standard ring oscillator is tolerable for communication
standards with very tight jitter tolerances such as SONET OC48 (2.5 Gb/s), which has a jitter specification of 4 ps rms at a
2-MHz PLL bandwidth. The VCO power consumption was 5
mW at 1200 MHz in simulation of extracted layout.
The figure of merit defined in [31] is estimated from

(a)

(5)
is the PLL bandwidth,
is the measured longwhere
is the tracking
term self-referenced tracking jitter, and
jitter with respect to an ideal reference clock. Table I lists
and derived as function of operating frequency.
measured
The jitter reported here is lower than that in [32] due to an
improved measurement setup and more accurate measurement
equipment. The of this oscillator is better than that reported
and a
for bipolar implementations in [31]
of
(as derived from the
CMOS VCO with a
for a complete
Slide Supplement of [33]). A
reported for a stand-alone
PLL is similar to
VCO in [34], suggesting that noise contributions from the
other PLL components (charge pump, PFD, frequency divider)
are much smaller than the VCO noise. The tracking jitter
compares favorably with several other CMOS oscillators [e.g.,
[8], [33], and [35][37]. degrades significantly at 1600-MHz
operation, indicating the speed limit of the PLL. The
measurements are also used for estimating the VCO period
due to device noise, based on the relation [31]
jitter
(6)

(b)
Fig. 15. A 1.2-GHz tracking jitter histogram. (a) Including power supply
and board noise. (b) Supply and board noise cancelled.

where
is the VCO period. The derived period jitter
agrees to within 10% of the phase noise estimation technique
in [38].
The PLL tracking jitter is plotted as a function of frequency
for various power-supply voltages in Fig. 16. These measureand a fixed PLL bandwidth.
ments were taken with
the chargeSince the loop filter resistance changes with
in Fig. 7 was adjusted until a 3-dB
pump reference current
PLL bandwidth of 2 MHz was measured. A bandwidth of 2

LARSSON: 21600-MHz CMOS CLOCK RECOVERY PLL

TABLE I
PLL JITTER FOR V DD = 2:5 V MEASURED WITH A PLL BANDWIDTH OF 2 MHz

Fig. 16. Tracking jitter as function of V dd and frequency with a PLL


bandwidth of 2 MHz. The right-hand scale represents the figure of merit
 [31].

MHz represents a scaling factor of approximately 5000 in (5),


as indicated by the right-hand scale in Fig. 16.
The VCO frequency is set by the tail current in the differential delay stages and is practically independent of
Therefore, the power consumption at a fixed VCO frequency
The graph in Fig. 16 shows that
drops linearly with
suggesting that power
the jitter does not change with
) can be achieved without jitter
reduction (by lowering
penalty. This seems to contradict the common belief that jitter
should increase with lower power consumption. However, the
critical parameter for low jitter is not power consumption
but current consumption, as has previously been theoretically
derived for LC oscillators [39], [40]. As shown by these
measurements, low-jitter design with a fixed power budget
and as large a current
should be based on a minimum
as can be tolerated. The measured power consumption at 250
MHz and 2.5 V is 18 mW and is dominated by buffers and
the current-mode interpolator. Simulations of extracted layout
indicate that the VCO consumes 0.7 mW at 250 MHz and 5
mW at 1200 MHz. The PLL characteristics are summarized
in Table II.
X. CONCLUSION
Clock recovery circuits in CMOS processes require special
techniques for initial frequency locking. This need is due to the
fact that CMOS process variations dictate a larger frequency
tuning range than can be covered by existing frequency
detectors for NRZ data. An attractive technique for initial

1959

TABLE II
PLL CHARACTERISTICS

frequency locking is phase selection clock recovery, where


a multioutput VCO is locked onto a reference clock and a
clock recovery loop selects one of the output phases of the
VCO. The large period jitter in traditional phase selection
clock recovery is eliminated by the feedback phase selection
technique presented here. This scheme filters the phase jumps
through the PLL loop filter and also enables accurate phase
interpolation with digital circuitry only, as opposed to the
conventional analog-style phase interpolation.
In applications where the PLL is programmable, important
loop characteristics such as bandwidth and damping factor
change with the frequency multiplication mode. By making
the charge-pump current depend on the division ratio in the
feedback divider, a fixed bandwidth and damping factor can
be obtained.
Differential analog circuits have superior supply noise rejection compared to digital complementary logic styles and
are therefore preferred in an environment with large powersupply noise. However, previous differential PLL implementations have used circuits requiring large headroom, thereby
operation. Circuit techniques for PLL
prohibiting lowoperation in
components are discussed that enable lowa process technology without low-threshold devices. Correct
V, but
operation has been verified down to
the lower limit for differential operation is estimated to be
V in a process with
V and
V.
measured
conMeasurements show that jitter is independent of
tradicting the common belief that jitter is strongly correlated
to power consumption. At a fixed operating frequency, power
reduction is achieved without any penalty in jitter performance
by lowering
The tracking jitter at 5001500 MHz was measured to be
25 ps rms dominated by device noise. This indicates that a
standard ring oscillator can fulfill the jitter specification for a
SONET OC-48 receiver.
This paper demonstrates that a clock recovery circuit with
programmable bit rates can be realized with a large frequency
tuning range. Robust operation and low jitter are achieved over
a large range of power-supply voltages, making it ideal for
low-power applications and suitable as a reusable macrocell.

1960

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999

REFERENCES
[1] K. Bult, Analog broadband communication circuits in pure digital deep
sub-micron CMOS, in Proc. IEEE Int. Solid-State Circuits Conf., 1999,
pp. 7677.
[2] H. Neuteboom, B. M. J. Kup, and M. Janssens, A DSP-based hearing
instrument IC, IEEE J. Solid-State Circuits, vol. 32, pp. 17901806,
Nov. 1997.
[3] W. Lee, P. E. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno,
S. Muramatsu, K. Tashiro, M. Fusumada, L. Pham, F. Boutaud, E.
Ego, G. Gallo, H. Tran, C. Lemonds, A. Shih, M. Nandakumar, R.
H. Eklund, and I. C. Chen, A 1-V programmable DSP for wireless
communication, IEEE J. Solid-State Circuits, vol. 32, pp. 17661776,
Nov. 1997.
[4] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[5] R. J. Baumert, P. C. Metz, M. E. Pedersen, R. L. Pritchett, and J. A.
Young, A monolithic 50200 MHz CMOS clock recovery and retiming
circuit, in Proc. IEEE Custom Integrated Circuits Conf., 1989, pp.
14.5.14.
[6] K. M Ware and C. G. Sodini, A 200-MHz CMOS phase-locked loop
with dual phase detectors, IEEE J. Solid-State Circuits, vol. 24, pp.
15601568, Dec. 1989.
[7] J. Sonntag and R. Leonowich, A monolithic CMOS 10 MHz DPLL
for burst-mode data retiming, in Proc. IEEE Int. Solid-State Circuits
Conf., 1990, pp. 194195.
[8] M. Horowitz, A. Chan, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung,
W. Richardson, T. Thrush, and Y. Fujii, PLL design for a 500 MB/s
interface, in Proc. IEEE Int. Solid-State Circuits Conf., 1993, pp.
160161.
[9] S. Sidiropoulos and M. A. Horowitz, A semidigital dual delay-locked
loop, IEEE J. Solid-State Circuits, vol. 32, pp. 1683 1692, Nov. 1997.
[10] S. Kasturia, A novel fractional divider for improving the switching
speed of phase-locked frequency synthesizers, Bell Labs Tech. Memo.,
May 1995.
[11] J. G. Maneatis, personal communication, Feb. 1999.
[12] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson,
and T. Ishikawa, A 2.5 V CMOS delay-locked loop for an 18 Mbit,
500 megabytes/s DRAM, IEEE J. Solid-State Circuits, vol. 29, pp.
14911496, Dec. 1994.
[13] T. H. Hu and P. R. Gray, A monolithic 480 Mb/s parallel
AGC/decision/clock-recovery circuit in 1.2 m CMOS, IEEE J.
Solid-State Circuits, vol. 28, pp. 13141320, 1993.
[14] C.-H. Park and B. Kim, A low-noise 900 MHz VCO in 0.6 m CMOS,
IEEE J. Solid-State Circuits, vol. 34, pp. 15861591, May 1999.
[15] J. Lee and B. Kim, A 250 MHz low jitter adaptive bandwidth PLL,
in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 346347.
[16] A. Hajimiri and T. H. Lee, A general theory of phase noise in electrical
oscillators, IEEE J. Solid-State Circuits, vol. 33, pp. 179195, Feb.
1998.
[17] K. Iravani, F. Saleh, D. Lee, P. Fung, P. Ta, and G. Miller, Clock and
data recovery for 1.25 Gb/s Ethernet transceiver in 0.35 m CMOS,
in Proc. Custom Integrated Circuits Conf., 1999, pp. 261264.
[18] V. von Kaenel, D. Aebisher, C. Piguet, and E. Dijkstra, A 320 MHz,
1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation,
in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 132133.
[19] J. G. Maneatis, Low-jitter process-independent DLL and PLL based
on self-biased techniques, IEEE J. Solid-State Circuits, vol. 31, pp.
17231732, Nov. 1996.
[20] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans. Communications, vol. COM-28, pp. 18491858, Nov. 1980.
[21] M. Johnson and E. Hudson, A variable delayline PLL for CPUcoprocessor synchronization, IEEE J. Solid-State Circuits, vol. SC-23,
pp. 12181223, Oct. 1988.
[22] P. Larsson and J.-Y. Lee, A 400 mW 50380 MHz CMOS programmable clock recovery circuit, in Proc. IEEE ASIC Conf. Exhibit,
1995, pp. 271274.
[23] I. Bloom and Y. Nemirovsky, 1=f noise reduction of metal-oxidesemiconductor transistors by cycling from inversion to accumulation,
Appl. Phys. Lett., vol. 58, no. 15, pp. 16641666, Apr. 1991.

[24] S. L. J. Gierkink, E. A. M. Klumperink, T. J. Ikkink, and A. J. M.


van Tuijl, Reduction of intrinsic 1=f device noise in a CMOS ring
oscillator, in Proc. IEEE European Solid-State Circuits Conf., 1998,
pp. 272275.
[25] J. Crols and M. Steyeart, Switched-opamp: An approach to realize full
CMOS switched-capacitor circuits at very low power supply voltages,
IEEE J. Solid-State Circuits, vol. 29, pp. 936942, Aug. 1994.
[26] A. M. Abo and P. R. Gray, A 1.5 V, 10-bit, 14 MS/s CMOS pipeline
analog-to-digital converter, IEEE J. Solid-State Circuits, vol. 34, pp.
599606, May 1999.
[27] S. Kim, K. Lee, Y. Moon, D-K. Jeong, Y. Choi, and H. K. Lim, A 960Mb/s/pin interface for skew-tolerant bus using low jitter PLL, IEEE J.
Solid-State Circuits, vol. 32, pp. 691700, May 1997.
[28] D. W. Boerstler and K. A. Jenkins, A phase-locked loop clock generator
for a 1 GHz microprocessor, in Proc. IEEE Symp. VLSI Circuits, 1998,
pp. 212213.
[29] H. O. Johansson, A simple precharged CMOS phase frequency detector, IEEE J. Solid-State Circuits, vol. 33, pp. 295298, Feb. 1998.
[30] P. Larsson, High-speed architecture for a programmable frequency
divider and a dual-modulus prescaler, IEEE J. Solid-State Circuits, vol.
31, pp. 744748, May 1996.
[31] J. A. McNeill, Jitter in ring oscillators, IEEE J. Solid-State Circuits,
vol. 32, pp. 870879, June 1997.
[32] P. Larsson, A 21600 MHz 1.22.5 V CMOS clock-recovery PLL
with feedback phase-selection and averaging phase-interpolation for
jitter reduction, in Proc. IEEE Int. Solid-State Circuits Conf., 1999,
pp. 356357.
[33] J. F. Ewen et al., Single-chip 1062 Mbaud CMOS transceiver for serial
data communication, in Proc. IEEE Int. Solid-State Circuits Conf.,
1995, pp. 3233.
[34] A. Hajimiri, S. Limotyrakis, and T. H. Lee, Jitter and phase noise
in ring oscillators, IEEE J. Solid-State Circuits, vol. 34, pp. 790804,
June 1999.
[35] I. Novof, J. Austin, R. Chmela, T. Frank, R. Kelkar, K. Short, D. Strayer,
M. Styduhar, and S. Watt, Fully-integrated CMOS phase-locked loop
with 15240 MHz locking range and 50 ps jitter, in Proc. IEEE Int.
Solid-State Circuits Conf., 1995, pp. 112113.
[36] Z.-X. Zhang, H. Du, and M. S. Lee, A 360 MHz 3 V CMOS PLL
with 1 V peak-to-peak power supply noise tolerance, in Proc. IEEE
Int. Solid-State Circuits Conf., 1996, pp. 134135.
[37] I. A. Young, M. F. Mar, and B. Bhushan, A 0.35 m CMOS 3880
MHz PLL N/2 clock multiplier and distribution network with low jitter
for microprocessors, in Proc. IEEE Int. Solid-State Circuits Conf., 1997,
pp. 330331.
[38] A. Demir, A. Mehrotra, and J. Roychowdhur, Phase noise in oscillators:
A unifying theory and numerical methods for characterization, in Proc.
ACM/IEEE Design Automation Conf., June 1998, pp. 2631.
[39] Q. Huang, On the exact design of RF oscillators, in Proc. IEEE
Custom Integrated Circuits Conf., 1998, pp. 4144.
[40] P. Kinget, Integrated GHz voltage controlled oscillators, in Proc.
Advances in Analog Circuit Design, Nice, France, Mar. 1999.

Patrik Larsson received the Ph.D. degree from


Linkoping University, Sweden, in 1995.
During his Ph.D. research, he investigated the
inherent analog properties of digital circuits, such
as di=dt noise, clock skew, and clock slew rate.
After graduation, he joined Bell Laboratories, where
he is currently working on VCOs and PLLs for
gigabit/second communication. He has also been
working on low-power digital filtering for cable
modems, equalization, and clock recovery structures
while maintaining his interest in di=dt noise.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

805

A 2.5-Gb/s Clock and Data Recovery


IC with Tunable Jitter Characteristics
for Use in LANs and WANs
Keiji Kishine, Member, IEEE, Noboru Ishihara, Member, IEEE, Ken-ichi Takiguchi,
and Haruhiko Ichino, Member, IEEE
Abstract A 2.5-Gb/s monolithic clock and data recovery
(CDR) IC using the phase-locked loop (PLL) technique is
fabricated using Si bipolar technology. The output jitter
characteristics of the CDR can be controlled by designing the
loop-gain design and by using the switched-filter PLL technique.
The CDR IC can be used in local-area networks (LANs) and in
long-haul backbone networks or wide-area networks (WANs).
Its power consumption is only 0.4 W. For LANs, the jitter
generation of the CDR when the loop gain is optimized is 1.2 ps
(0.003 UI). The jitter characteristics of the CDR optimized for
WANs meet all three types of STM-16 jitter specifications given
in ITU-T Recommendation G.958. This is the first report on a
CDR that can be used for both LANs and WANs. This paper
also describes the design method of the jitter characteristics of
the CDR for LANs and WANs.
Index TermsClock and data recovery (CDR), IC, jitter suppression, local-area network (LAN), low jitter, phase-locked loop
(PLL), transmission receiver, wide-area network (WAN), 2.5
Gb/s.

I. INTRODUCTION

PTICAL communication systems, which are used in


local-area networks (LANs) and wide-area networks
(WANs), are expected to play an important role in realizing
the future multimedia society. These systems must be compact, economical to produce, and efficient in terms of power
consumption. Given these requirements, researchers have been
developing low-power and small-size optical receiver/sender
(OR/OS) modules. A clock and data recovery (CDR) circuit
is one of the key components of the OR, which must have
retiming, reshape, regeneration (3R) operation. To ensure that
the receivers have low power consumption and are costeffective and compact, it is essential to employ a single-chip,
adjustment-free CDR IC using the phase-locked loop (PLL)
technique without any high- components.
A number of approaches have been proposed for developing
a CDR IC using the PLL technique [1], [2]. Generally, the
jitter specifications for the CDR differ depending on what
it is being used for, and jitter suppression is one especially
serious problem for the CDR-IC design. There are different
jitter specifications for the following two applications.
Manuscript received August 19, 1998; revised February 8, 1999.
K. Kishine and H. Ichino are with NTT Network Innovation Laboratories,
Yokosuka, Kanagawa 239-0847 Japan.
N. Ishihara is with NTT Opto-electronics Laboratories, Atsugi-shi, Kanagawa 243-01 Japan.
K. Takiguchi is with NTT Electronics Corp., Atsugi-shi, Kanagawa Pref.
243-0032 Japan.
Publisher Item Identifier S 0018-9200(99)04198-0.

Case 1) LANs such as gigabit/second ethernets, fiber


channels, and other optical interconnections. They
use a single span of a transmission medium.
Case 2) Backbone networks or WANs such as synchronous digital hierarchy (SDH) or synchronous
optical network. They use line regenerators to
transport information over long distances.
For case 1), the CDR must suppress mainly the jitter generated
due to noise in the CDR, so-called jitter generation. In case 1),
there is no jitter accumulation due to cascaded regenerators.
For case 2), however, the ITU-T G.958 recommendation
for SDH stipulates other specifications [3]: a) jitter transfer
specification, which is the criterion of the suppression of
the noise in input signals to line regenerators, and b) jitter
tolerance specification.
This paper describes a 2.5-Gb/s CDR that can be used in
both cases, which eliminates the need to fabricate two chips
with different characteristics. The key design techniques are
based on the switched-filter (SF) PLL technique and loop gain
adjustment using a gain control amplifier (GCA) circuit. The
CDR IC is fabricated using 0.5- m Si bipolar technology.
The loop gain and loop bandwidth can be adjusted using a
control signal from outside the chip. For case 1), the rms
jitter generation of the CDR can be reduced to only 1.2 ps,
and the capture range is 150 MHz. For case 2), the jitter of
the CDR meets the jitter specifications of the ITU-T G.958
recommendation. The rms jitter generation is 3.6 ps, and the
capture range is 50 MHz. The power consumption of the CDR
for both cases is only 0.4 W.
In Section II, the concept of the suppression of jitter in each
case is discussed. It is explained that the SF PLL technique
can be used in the CDR for both cases. Design details
and the configurations of the circuits of the CDR are given
in Section III. Section IV discusses the experimental results,
which show that the CDR has very good jitter characteristics,
and discusses the feasibility of using the CDR for various
transmission systems.
II. CONCEPT FOR JITTER-SUPPRESSION DESIGN
A. Jitter Characteristics of CDR Using the PLL Technique
Generally, output jitter of a CDR based on the PLL technique can be caused by two kinds of sources: 1) additive
noise that accompanies the input signal [Fig. 1(a)] and 2)

00189200/99$10.00 1999 IEEE

806

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

(a)

(b)

Fig. 2. Loop-gain dependence of jitter.

Fig. 1. Noise source (a) in input signal and (b) in PLL.

noise generated in the CDR [Fig. 1(b)]. In Fig. 1(b), two cases
[noise forward and present for voltage-controlled oscillation
(VCO)] have been shown to be equivalent in terms of VCOphase fluctuation [4]. In addition, because the phase drift of the
VCO output due to the random input data is random, Fig. 1(b)
gives a rough approximation of the noise due to the input data
pattern, with no jitter applied.
To suppress the jitter caused by additive noise, the CDR
should be designed so that the noise bandwidth of the PLL is
minimized. The output jitter (the phase deviation [in rad]) of
the CDR using the PLL technique is expressed as [4]
(1)
is the power spectra density of noise, is the input
where
is the natural angular frequency, is the
signal amplitude,
damping factor, and is the loop gain. When the loop gain
becomes larger, the jitter
becomes larger, as shown in the
Appendix. This means that smaller loop gain causes narrow
noise bandwidth, thereby suppressing jitter. It should be noted
that smaller loop gain leads to a smaller cutoff frequency of
the jitter transfer function of a PLL. On the other hand, in
order to suppress the jitter caused by noise generated in the
CDR circuit, the operation of the CDR circuit needs to be
made stable. This stability can be obtained by reducing the
signal fluctuation in the CDR circuit caused by the input of
consecutive data bits, device noise, and so on. This is the socalled suppression of jitter generation, which is specified for
SDH in ITU-T Recommendation G.958. The output jitter (in
rad) in this case (jitter generation) is expressed as [4]
(2)
where is the power spectra density of noise. This equation is
derived assuming that the instantaneous frequency deviation of
the VCO output is caused by disturbance due to random phase
becomes larger,
noise. In this equation, when the loop gain
becomes smaller, as shown in the Appendix. In
the jitter
other words, the jitter increases as the loop gain decreases.
Larger loop gain can reduce the jitter caused by the noise in

the CDR. However, larger loop gain results in a larger cutoff


frequency of the jitter transfer function.
Consequently, there is a tradeoff, as shown in Fig. 2,
between reducing the jitter shown in Fig. 1(a) (equivalent to
reducing the cutoff frequency of the jitter transfer function)
and reducing the jitter shown in Fig. 1(b) (jitter generation).
When the jitter of Fig. 1(a) is dominant, the loop gain must
be controlled to be lower (area 1 in Fig. 2). When the jitter of
Fig. 1(b) is dominant, it must be controlled to be higher (area
2 in Fig. 2). The optimum loop gain depends on which type
of jitter has a greater effect on the output jitter of the CDR
and causes degradation of transmission quality in the system.
B. Design of the CDR
Given the previous discussion, it is clear that there should
be two types of CDR design, one for LANs and another for
WANs.
1) CDR for LANs: In the case for LANs, the jitter from
input signals is small because there is no jitter accumulation
through the short and single laser-fiber-receiver span. We can
therefore concentrate on reducing the jitter generation, which
is caused by the input-signal-pattern dependence of the circuit,
the fluctuation of the supply voltage, and device noise in the
CDR. As described in Section II-A, this design should not
utilize smaller loop gain to lower the cutoff frequency, but
instead should utilize larger loop gain to achieve smaller output
jitter.
2) CDR for WANs: In the case for backbone networks or
WANs, the regenerator may be cascaded in order to transport
information over long distances, causing the jitter to accumulate. Therefore, not only the jitter generation of the CDR
has to be taken into consideration but also the jitter transfer
characteristics, which is the criterion of suppression of noise in
input data signals as given in ITU-T Recommendation G.958.
There is a tradeoff between reducing the jitter generation and
reducing the cutoff frequency of the jitter transfer function.
a) Jitter transfer: The loop gain of the CDR IC using the
PLL technique, on an IC whose jitter transfer specifications
meet those of ITU-T Recommendation G.958, must be designed to be lower. The jitter transfer function of the 2.5-Gbit/s
PLL using a lag-lead filter can be expressed by substituting the

KISHINE et al.: CDR IC FOR LANS AND WANS

807

Fig. 3. Jitter transfer function.

phase transfer function for the jitter transfer function as

(3)
This function is plotted in Fig. 3. It is a curve when the time
and
of the lag-lead filter described in the
constants
Appendix are set to values that provide that the natural angular
becomes nearly 2 MHz, and the damping factor
frequency
is larger than the value where the jitter gain peaking is less
than 0.1 dB. Fig. 3, in which it is stipulated that the loop gain
is lower than 3.2 10 (1/s), indicates that the curve of the
smaller loop gain meets the ITU-T (STM-16) specification.
b) Jitter generation: As described in Section II-A, as the
loop gain becomes small, it becomes more difficult to suppress
the jitter generation because of the tradeoff between reducing
jitter generation and reducing the cutoff frequency of the jitter
transfer function. To solve this problem, we introduce the SF
PLL technique. Fig. 4 shows the SF CDR configuration, which
we originally proposed as a way to maintain a precise clock
signal, thereby achieving tolerance to the input of consecutive
data bits. (Our previous work, 156-Mb/s SF CDR [5], has no
GCA in Fig. 4.) The main features of SF circuit operation
are that the PC output can be transferred to the low-pass
filter (LPF) only when data transitions occur (sample mode)
and the LPF output can be constant during consecutive data
inputs (hold mode). These features prevent the phase drift of
VCO output during the input of consecutive data bits. We
thought that the equivalent high- operation of the SF circuit
could be utilized to reduce the jitter generation. Fig. 5 shows
simulation results of the change in differential output voltage
of the loop filter when the input signal changes from a 1/0repeated bitstream to consecutive data bits (in this case, 0) at
805 ns. Fig. 5 shows the filter output of the SF CDR levels out,
while that of the CDR without the SF circuit begins to degrade
at 805 ns. This means that the operation of the SF circuit is
equal to that of the larger RC time constant of the loop filter,
and the jitter generation due to the input signal pattern would
be more suppressed than that of the CDR without an SF circuit.

In other words, the SF circuit would provide equivalent highoperation and achieve low jitter operation.
We also thought this advantage of the SF circuit could
be used to solve the tradeoff problem. Fig. 6 shows the
SPICE simulation results of the cutoff-frequency (of the jitter
transfer curve) dependence of the jitter generation of the 2.5Gb/s CDRs both with the SF circuit and without it. Both
curves indicate that the jitter generation decreases as the
cutoff frequency increases. Furthermore, the jitter generation
of the SF CDR is 70% lower than that of the CDR without
the SF circuit. It is noteworthy that the suppression of jitter
generation by the SF circuit is marked. In addition, the jitter
characteristics of the SF CDR meet the STM-16 specifications
(rms jitter generation that is lower than 4 ps, and equivalent
jitter transfer specification in which the cutoff frequency at 3dB jitter gain is lower than about 2.8 MHz), while the CDR
circuit without the SF circuit fails to meet the specifications.
In the SPICE simulation, the source of jitter is the instability
of the circuit operation of the CDR, with no jitter applied at
random input. In the experimental results, the device noise,
the fluctuations of supply voltage, and so on are also noise
sources. The jitter generation in experiments is therefore larger
than that in the simulation results. The simulation results do,
however, show the characteristics of jitter generation versus
jitter transfer functions.
Given our findings, we conclude that in the design of the
CDR IC using the PLL technique, an IC used in backbone
networks or WANs, the loop gain must be large enough so that
the jitter generation meets the ITU-T specs, yet small enough
so that suitable jitter transfer characteristics are obtained.
III. CIRCUIT DESIGN
A block diagram of the CDR, including a GCA circuit
between the SF and VCO, is shown in Fig. 4 [6]. This CDR
can be used in both short- and long- distance transmission
systems by adjusting the loop gain through the GCA from
outside the chip. The main features of the CDR are 1) an SF
circuit for equivalent high- operation [5], 2) optimum timing
adjustment between extracted clock and input data, and 3) loop
gain control on optimization.
A. VCO
In Fig. 7, the circuit configuration of VCO is shown. The
oscillation frequency is controlled by the voltage swing of
VC1, VC2, which is determined by the feedback signal
from GCA. The free-running frequency is determined by the current IF1, IF2, which can be controlled from
outside the chip. The free-running frequency can be adjusted
from 2.2 to 2.8 GHz. It covers the free-running frequency
deviation caused by fluctuations in device performance. Fig. 8
shows the tuning-voltage (feedback-voltage) dependence of
the oscillation frequency. The simulation results are in good
agreement with the experimental results. The VCO modulation
frequency sensitivity is designed to be about 1 GHz/V. The
oscillation frequency range by a feedback signal is 200 MHz.
The tuning-range diagram is shown in Fig. 9. The total tunable
range is sufficiently wide, from 2.0 to 3.0 GHz.

808

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

Fig. 4. SF CDR with GCA.

Fig. 5. Output voltage of low-pass filter (differential output).


Fig. 6. Cutoff-frequency dependence of jitter generation.

B. GCA
A current-bypass GCA circuit is used in the CDR (see
Fig. 10). The gain of the GCA, which can be controlled from
outside the chip, can be varied from 40 to 0 dB. To lower
the jitter generation of the CDR, the gain should be higher. On
the other hand, to achieve the lower cutoff frequency of the
jitter transfer curve, it must be lower. Therefore, gain should
be adjusted according to the jitter specification in each case.
C. Delay Circuit
To reduce jitter generation, the edge-inclined circuit in the
90 -delay block shown in Fig. 4, which includes a capacitor
for delay control [5], is replaced by a chain of emitter-coupled
logic (ECL) buffer circuits without capacitors, the delay of
which can be adjusted from about 100 to 300 ps from outside
the chip. The edge-inclined circuit was employed to make
the 156-Mbit/s CDR smaller. But the delay needed for the
2.5-Gbit/s CDR is only 200 ps, which is much smaller than
that needed for the 156-Mbit/s CDR. Therefore, only a small
number of ECL circuits are needed for the 200-ps delay, and

the delay circuit itself is relatively small. In addition, in the


ECL circuit, there is no input-data-pattern dependence of the
response in the edge-inclined circuit capacitor. When the ECL
delay circuit is used, the simulated jitter due to the input
data pattern is about 80% of that when an edge-inclined-delay
circuit is used. As a result of using the ECL circuit, the jitter
due to the input pattern effect is more suppressed than in the
circuit reported in our previous work [5].
D. Loop Filter
The lag-lead filter is used as the loop filter, and the RC time
constant is adjusted for each use. An additive capacitor outside
the chip is not needed when the CDR is used for short-distance
transmission systems. It is, however, needed for long-distance
use.
E. Other Considerations
Furthermore, in order to guarantee jitter tolerance, it is
important to maintain an optimum timing adjustment between

KISHINE et al.: CDR IC FOR LANS AND WANS

809

Fig. 7. VCO configuration.

Fig. 10. GCA configuration.


Fig. 8. VCO tuning curve.

Fig. 11. GCA gain dependence of jitter generation.

Fig. 9. Range of controllable oscillation frequency.

the extracted clock and the input data. This timing adjustment
is attained by allowing the clock to trigger the center of the
data period by means of the phase-comparator (PC) output

signal generated from the comparison between the phase of the


delay flip-flop output and the 90 -delayed phase for the input
data. In addition, in order to lower the power consumption,
the new 2.5-Gb/s CDR IC uses stacked differential pairs on
two levels, which enables its supply voltage to be decreased
to 3.0 V (as opposed to 5.2 V in our previous work [5]).
Furthermore, current dissipation is optimized in each block to
reduce power consumption.
IV. EXPERIMENTAL RESULTS
A new chip was fabricated using the 0.5- m super selfaligned process technology Si bipolar process [7]. It was

810

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

(a)

(b)

Fig. 12.

Output waveforms of the CDR for LAN. (a) Data output. (b) Clock output.

(a)

(b)

Fig. 13.

Output waveforms of the CDR for WAN. (a) Data output. (b) Clock output.

mounted in a 7 7 mm -square ceramic package. The CDR


IC of both high and low loop gain is evaluated in each case
when the gain is adjusted to both short- and long-distance
transmission systems. Jitter was measured with a commercial
jitter analyzer. An rms jitter-generation value from which the
jitter value of input data is subtracted can be obtained with
the analyzer.
A. SF CDR for LAN
The internal capacitor in the loop filter is 10 pF, and an
external capacitor is not needed in this case. The GCA gain
dependence of the jitter generation is shown in Fig. 11. The
jitter generation decreases as the gain increases, and the lowest
point is when the gain is larger than about 8 dB. The loop
10 (1/s). The output waveforms
gain at this point is 1.2
when the loop gain is set to that point and input data is 2.488 32

Gb/s are shown in Fig. 12. The eye opening of the output data
was sufficiently wide, and clock extraction was very precise.
The rms jitter generation is 1.2 ps and the capture range is
over 150 MHz.
B. SF CDR for WAN
To lower the cutoff frequency of the jitter transfer curve, the
external capacitor for the loop filter of 0.1 F is added. The
loop gain is set to about 2 10 (1/s), which is the loop gain
when the jitter transfer curve meets the ITU-T jitter transfer
specification in Fig. 3.
The output waveforms when the loop gain is set to the
point above are shown in Fig. 13. Again, the eye opening of
the output data was sufficiently wide, and clock extraction
was very precise. The rms jitter generation is 3.6 ps, which is
larger than that of the CDR when its loop gain is adjusted for

KISHINE et al.: CDR IC FOR LANS AND WANS

Fig. 14.

811

Jitter transfer function.

Fig. 16. Loop-gain dependence of jitter generation and cutoff frequency of


jitter transfer function.
Fig. 15.

Jitter tolerance curve.

short-distance transmission systems, but is smaller than the


specification of the jitter generation of 4.0 ps (for STM-16;
ITU-T Recommendation G.958). The capture range is over
50 MHz. Fig. 14 shows the measured jitter transfer function
of the CDR in this case. The curve meets the ITU-T G.958
specification. Fig. 15 shows the jitter tolerance curve when the
input jitter magnification is 120% of the ITU-T specification.
The squares indicate error-free operation (where the error rate
is lower than 10 ). The rms jitter generation, jitter tolerance,
and jitter transfer function all meet the jitter specifications in
ITU-T G.958.
The relationship between the measured jitter generation (or
cutoff frequency) and the loop gain in this experiment is
shown in Fig. 16. The darker shaded area is for the CDR,
whose jitter characteristics meet the specifications of ITU-T
Recommendation G.958. In the area of larger loop gain, the
jitter generation becomes small. Fig. 16 shows clearly that,
when its loop gain is optimized, the CDR IC is suitable for both
LANs and WANs. The capture range of both types of CDRs
is wide enough to cover the deviation in the free-running
frequency due to changes in temperature (ranging from 5 to
90 C In addition, the power consumption (including that of

the I/O circuit) in both cases is less than 35% of that in the
2.5-Gb/s PLLs reported previously [1], [2].
V. CONCLUSION
The design method of the CDR for both LANs and WANs
is presented. A new 2.5-Gb/s SF monolithic CDR IC using
the 0.5- m Si bipolar process has been developed. The CDR
IC can be used in the transmission receivers for both LANs
and WANs. The rms jitter generation of the CDR adjusted
for LANs is 1.2 ps. Furthermore, the jitter characteristics
of the CDR for backbone networks or WANs meet the
specifications for STM-16 given in ITU-T Recommendation
G.958. In addition, the power consumption of the CDR is
only 0.4 W.
APPENDIX
A. Loop-Gain Dependence of the Jitter Shown in Fig. 1
The jitter due to the noise in the input signal to the PLL is
expressed as (1). When the loop filter is a lag-lead type (the
and
and the
series and shunt register are respectively
and
shunt capacitance is ), the natural angular frequency

812

the damping factor

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 6, JUNE 1999

are expressed as

where
and
in (1) therefore becomes

[7] C. Yamaguchi, Y. Kobayashi, M. Miyake, K. Ishii, and H. Ichino, A


0.5  bipolar technology using a new base formation method, in
Proc. BCTM, 1993, pp. 6366.

The term

(A1.1)

This shows that


Furthermore,

increases as the loop gain increases.


in (1) can be expressed as

(A1.2)
increases. Therefore, the
This also shows that
jitter in Fig. 1 increases as the loop gain increases.
B. Loop-Gain Dependence of the Jitter Shown in Fig. 2
The jitter due to the noise in the PLL can be expressed as
is
(2). In (2), the term
(A2.1)

Keiji Kishine (M98) was born in Kyoto, Japan,


on October 26, 1964. He received the B.S. and
M.S. degrees in engineering science from Kyoto
University, Kyoto, in 1990 and 1992, respectively.
In 1992, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone
Corp. (NTT), Tokyo, Japan. At the NTT System
Electronics Laboratories, Kanagawa, Japan, he was
engaged in research and design of high-speed, lowpower circuits for Gbit/s LSIs using Si-bipolar
transistors with application to optical communication systems. Since 1997, he has worked on research and development of
Gbit/s clock and data recovery IC at the Photonic Network Laboratory, NTT
Network Innovation Laboratories, Kanagawa, Japan.
Mr. Kishine is a member of the Institute of Electronics, Information and
Communication Engineers (IEICE) of Japan.

Noboru Ishihara (M89) was born in Gunma,


Japan, on April 27, 1958. He received the B.S.
degree in electrical engineering from Gunma
University, Gunma, in 1981 and the Dr.Eng. degree
from the Tokyo Institute of Technology, Tokyo,
Japan, in 1997.
In 1981, he joined the Electrical Communication
Laboratory, NTT, Tokyo, where he has been
engaged in research and development of analog
ICs for communication use. His recent work is in
the area of low-power and high-speed analog ICs
for optical communications.
Mr. Ishihara is a member of the Institute of Electronics, Information
and Communication Engineers (IEICE) of Japan and the IEEE Microwave
Theory and Techniques Society.

decreases as the loop gain increases. In


The term
is expressed as
addition, the term
(A2.2)
also decreases as the loop gain increases. Therefore,
the jitter in Fig. 2, expressed in (2), decreases as the loop gain
increases.
ACKNOWLEDGMENT
The authors wish to thank H. Yoshimura and K. Sato for
their helpful discussions and suggestions.
REFERENCES
[1] H. Ransjin and P. OConner, A PLL-based 2.5b/s GaAs clock and data
Regenerator IC, IEEE J. Solid-State Circuits, vol. 26, pp. 13451353,
Oct. 1991.
[2] R. Walker, C. Stout, and C.-S. Yen, A 2.488Gb/s Si-bipolar clock and
data recovery IC with robust loss of signal detection, in ISSCC Dig.
Tech. Papers, 1997, pp. 246247.
[3] Digital Line Systems Based on the Synchronous Digital Hierarchy for
Use on Optical Fiber Cables, CCITT Rec. G.958.
[4] A. Blanchard, Phase-Locked Loops. New York: Wiley, 1976, ch. 8.
[5] N. Ishihara and Y. Akazawa, A monolithic 156Mb/s clock and data
recovery PLL circuit using the sample-and hold technique, IEEE J.
Solid-State Circuits, vol. 29, pp. 15661571, Dec. 1994.
[6] K. Kishine, N. Ishihara, and H. Ichino, Jitter-suppressed low-power
2.5-Gbit/s clock and data recovery IC without high-Q components,
Electron. Lett., vol. 33, no. 18, pp. 15451546, Aug. 1997.

Ken-ichi Takiguchi was born in Kanagawa, Japan, on July 21, 1969. He


graduated from Tokyo Computer School, Tokyo, Japan, in 1992.
In 1992, he joined NTT Electronics Corp., Kanagawa. He has been engaged
in development of high speed ICs, especially gigabit/second PLL ICs.

Haruhiko Ichino (M89) was born in Yamaguchi, Japan, on January 26, 1957.
He received the B.S., M.S., and Ph.D. degrees in applied physics from Osaka
University, Osaka, Japan, in 1979, 1981, and 1994, respectively.
In 1981, he joined the Electrical Communication Laboratories, Nippon
Telegraph and Telephone Corp. (NTT), Tokyo, Japan. He has been engaged
in research and development of Gbit/s SSI-MSIs using bipolar transistors
(Si bipolar transistor and AlGaAs/GaAs HBT), with application to Gbit/s
optical communication systems and high-frequency satellite communication
systems. His work also includes low-power Gbit/s LSIs for SDH networks
and future ATM switching systems; and O-E, E-O converter modules. During
this research and development, he also worked on the modeling of a highspeed bipolar transistor, analyzing ECL gate delay and maximum operating
speed of GHz flip-flop, and high-speed design methodology based on gatearray and standard-cell approaches. His interests include high-speed packaging
and measurement systems. Since 1997, he has worked on research and
development of Gbit/s-interface hardware design of photonic transport network systems. Currently, he is a Senior Research Engineer, Supervisor, and
Photonic Network Systems Research Group Leader of the Photonic Network
Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan. He was
a Visiting Lecturer at Osaka University during 19951996.
Dr. Ichino is a member of the Institute of Electronics, Information and
Communication Engineers (IEICE) of Japan. He was a Secretary of IEICEs
Technical Group on Integrated Circuits and Devices.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

1781

A 10-Gb/s CDR/DEMUX With LC Delay Line VCO


in 0.18-m CMOS
Jonathan E. Rogers, Member, IEEE, and John R. Long, Member, IEEE

AbstractA monolithic 10-Gb/s clock/data recovery and 1 : 2


demultiplexer are implemented in 0.18- m CMOS. The quadrature LC delay line oscillator has a tuning range of 125 MHz and a
60-MHz/V sensitivity to power supply pulling. The circuit meets
SONET OC-192 jitter specifications with a measured jitter of 8
ps p-p when performing error-free recovery of PRBS 231 1 data.
Clock and data recovery (CDR) is achieved at 10 Gb/s, demonstrating the feasibility of a half-rate early/late PD (with tri-state)
based CDR on 0.18- m CMOS. The 1.9
1.5 mm2 IC (not including output buffers) consumes 285 mW from a 1.8-V supply.
Index TermsBangbang phase-locked loop, clock and data recovery, LC delay line, phase detector, SONET OC-192, voltagecontrolled oscillator.
Fig. 1. Optical receiver architecture.

I. INTRODUCTION

HE VOLUME of data transported over the telecommunications network increased at a compounded annual rate of
100% from 1995 to 2001 in the U.S. (and since 1997 in Europe), mainly due to increased internet traffic [1]. Contrast this
with the historic demand for bandwidth, which grew at an annual rate of between 6% and 10% before the mid-1990s. The
call for technologies, such as interface electronics, which expand the capacity of fiber-based transport links to 10 Gb/s (and
beyond) has risen in response to this explosion in data traffic.
Systems at 10 Gb/s per channel are currently implemented in
either OC-192 or STM-64 formats using the synchronous optical network (SONET OC-192) or European synchronous data
hierarchy (SDH STM-64), respectively.
A typical Gb/s receiver is shown in Fig. 1. It uses a photodiode to convert the incoming nonreturn to zero (NRZ) optical pulses from a single 10-Gb/s fiber channel to a current.
These current pulses are then converted by the transimpedance
(TZ) amplifier into a voltage. The pulses are low-pass filtered
to remove out-of-band noise, thereby improving the received
signal-to-noise ratio (SNR). An automatic gain control amplifier
(AGC) provides additional amplification while compensating
for variations in the received signal power. The clock required to
retime the incoming synchronous data stream is recovered from
Manuscript received April 8, 2002; revised June 25, 2002. This work was
supported by Micronet, NSERC, and the Nortel Institute at the University of
Toronto.
J. E. Rogers was with the RF/MMIC Group, Department of Electrical and
Computer Engineering, University of Toronto, Toronto, ON, Canada. He
is now with Inphi Corporation, Westlake Village, CA 91361 USA (e-mail:
jrogers@inphi-corp.com).
J. R. Long was with the RF/MMIC Group, Department of Electrical and Computer Engineering University of Toronto, Toronto, ON, Canada. He is now with
the Electronics Research Laboratory, Delft University of Technology, 2628CD
Delft, The Netherlands (e-mail: j.r.long@its.tudelft.nl).
Digital Object Identifier 10.1109/JSSC.2002.804337

the received data by the clock/data recovery block (CDR), and


the received data are regenerated. It is then applied to a 1 : N
time-division demultiplexer (DEMUX) to separate the 10-Gb/s
stream into multiple, lower speed channels. Typical ratios for
demultiplexing are 1 : 4, 1 : 8 and 1 : 16. These lower speed channels are then further processed by VLSI CMOS circuitry designed to comply with the SONET or SDH standards.
II. BANGBANG CDR/DEMUX
This paper describes the implementation of an early/late
(bangbang) phase-locked loop (PLL) based CDR in 0.18 m
CMOS. For monolithic CDR implementations, a PLL implementation is preferred [2], [3] since it eliminates narrowband
(i.e., high-Q) resonators, used in direct extraction of a timing
tone by filtering that are difficult to integrate. Also, the use
of CMOS technology is potentially advantageous from both
manufacturability and cost perspectives [4], [5].
Linear and early/late (bangbang) phase detectors (PD) have
been used for Gb/s CDR implementations. A linear PD has an
average output voltage that is directly proportional to the error
between the data and clock phases. This linear relationship
allows the loop to be designed using classical control theory. In
contrast to a linear loop, the early/late phase detector outputs
only signify whether the clock is early or late with respect to the
ideal data sampling instant. A loop incorporating an early/late
phase detector is nonlinear and has a jitter transfer bandwidth
which varies with jitter amplitude. Therefore, characterizing a
bangbang loop requires time-domain numerical simulation,
which appears to be a disadvantage. However, guaranteeing
the stability of a linear phase-locked loop (PLL) often requires
multiple design iterations, lengthening the time to manufacture. Early/late PD-based PLLs are quasi-digital systems and
consequently they are more resistant to component and process

0018-9200/02$17.00 2002 IEEE

1782

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

Fig. 2. Early/late (bangbang) phase detector-based CDR.

variations as well as noise. In addition, an early/late PD possesses intrinsic matching between the sampling and retiming
phases, allowing operation at speeds where it is difficult to
match the delay of a conventional analog phase detector to that
of the retiming latch. It is this combination of robustness and
speed that makes the bangbang PD an excellent choice for an
IC-based implementation.
Fig. 2 shows a block diagram of the CDR-PLL implemented
in this work. Data is compared to the voltage-controlled oscillator (VCO) clock at the loop input. If the falling clock edge
occurs before the data transition (early), the early/late phase de. If the clock edge falls after the
tector outputs a voltage,
data transition (late), then the phase detector outputs a voltage
. In the case where there is no data transition during the clock
period, the phase detector outputs a 0 voltage. The PD output
pulses are then filtered by integral and differential loop filter
branches. An important parameter of the early/late phase detector based CDR is the bangbang frequency step of the VCO,
,
or
(1)
is the output voltage of the phase detector,
where
gain of the proportional branch of the loop, and
tuning gain of the VCO in Hz/V.

is the
is the

A. System-Level Design
Characterization of early/late (bangbang) PLL behavior
is described by Walker [2] and analysis of its application to
SONET compatible CDR systems is found in the publication
by Greshishchev [3].
At the system level, design of the bangbang CDR is reduced
to the selection of two main system parameters. The first is
, which (somewhat inconthe bangbang frequency step
veniently) simultaneously governs jitter tolerance performance,
the related jitter transfer bandwidth, and jitter generation. Optimal jitter generation is realized by setting the bangbang frequency step to the lowest value which still meets the jitter tolof 3 MHz was selected, which inerance specification. A
cludes some margin so that the CDR exceeds the jitter tolerance
specification for OC-192.
The second system parameter is the loop stability factor ,
defined by Walker [2] as

(2)

where is the stability factor,


the gain of the integral branch
of the loop, is the number of bit periods of latency around the
is the unit interval or length of the bit period in
loop, and
seconds.
This stability factor is the ratio of proportional and integral
path gains with a correction for the amount of latency around the
loop. A stability factor greater than unity ensures stability while
the loop is not slew-rate limited in the phase domain. It should
be noted that the bandwidth of the jitter transfer function (JTF)
for a bangbang loop is inversely proportional to the input jitter
amplitude. A JTF for jitter amplitudes in excess of 1 UI cannot
be defined, as the loop will slip cycles when such large jitter inputs are not tracked. To ensure that the loop JTF does not exhibit
peaking, the stability factor is made large enough so that the proportional loop dominates during slew limiting (slope overload)
at jitter inputs less than 1 UI p-p. Therefore, this design implements a large stability factor ( 1 10 ) using interchangeable
(off-chip) loop filter capacitors.
B. CDR Architecture
Fig. 3 shows the CDR/Demux test chip at the block level.
Differential 10-Gb/s data are brought on-chip through DATA
and DATA pins which are broad-band matched to 50 . The
half-rate early/late phase detector compares the data falling edge
with the phase of the quadrature clock. The phase detector produces no output if a falling edge of the data is not present, otherwise an early or late pulse is produced. The phase information generated by the phase detector travels down two separate
paths. The direct path to the oscillator provides high frequency
proportional (bangbang) control for the system. Through this
tuning path, the early/late phase detector outputs translate into
and
from the oscillator
frequency steps of
center frequency. If early or late pulses are not present, the oscillator center frequency remains unchanged.
The second tuning path leads to the input of the charge pump.
The charge pump adds or removes charge from capacitor
for early and late inputs, respectively. When the phase detector
result is neither early nor late, the charge pump does not alter
the charge on the capacitor (charge pump tri-state). The voltage
is used for low-frequency tuning of the VCO, thus
across
completing the integral control loop. An on-chip capacitor
serves to reduce glitches caused by the probe or bond inductances between the chip and external circuitry.
When the PLL has settled into lock, the in-phase clock of the
oscillator is aligned with the ideal data sampling instant. This
allows the 1 : 2 DEMUX to divide the data into two streams,
and these signals are buffered off-chip so that the data recovery
properties of the chip can be analyzed. Although this is significantly less demultiplexing than in an actual 10-Gb/s application,
it is sufficient to demonstrate feasibility. The clock signal is also
brought off-chip for evaluation of the system performance.
Fig. 4 shows the half-rate early/late phase detector. The representation of this block with single-ended signal paths is purely
to simplify the diagram; the actual phase detector is fully differential. The architecture is similar to Hauenschilds phase detector/demux [6], which was implemented in a BiCMOS technology. The main difference here results from modifications
necessary to accommodate the relatively low transconductance

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS

Fig. 3.

1783

CDR/DEMUX block diagram.

available from 0.18- m CMOS transistors. In the quadrature


sampling paths, where metastability and hysteresis are a concern, extra buffering is used. Also, the input latch of the first
flip-flop is increased in size in order to improve performance.
All samples enter the logic on the same clock edge simplifying
the early/late logic. The phase detector pulses are retimed at the
output of the phase detector in order to remove asymmetries in
both amplitude and duration from the output pulses. Note that
a drawback to these modifications is unequal loading of the inphase and quadrature clock lines. Thus, care must be taken to
avoid a static offset in the phase detector as this could cause a
reduction in the residual jitter tolerance.
Note that 1 : 2 demultiplexed data could be tapped off directly
from early/late logic inputs A and B of Fig. 4. However, a separate 1 : 2 demux was used for the testchip to minimize loading
of the phase detector latches, at the expense of possible phase
alignment errors at the demux and a slight increase (10 mW) in
power consumption.
III. CIRCUIT DESCRIPTION
The transistor and block level design of the 10 Gb/s CDR
circuits are described in the following sections. The implementation of the phase detector is examined first, followed by an
in-depth description of the LC delay line VCO.
A. Phase Detector
The phase detector logic is implemented in resistively-loaded
MOS current mode logic (MCML). This offers several advan-

tages over conventional CMOS and other all-NMOS implementations. First, it has a relatively low output impedance making
it suitable for high-speed operation. MCML logic also benefits
from reduced logic voltage swing as well as from the elimination of lower mobility PMOS transistors compared to CMOS
logic. The MOS equivalent of a bipolar ECL gate is not practical, especially from a 1.8-V supply due to attenuation of the
signal by source followers and lack of headroom.
Another benefit of using MCML is reduced switching-related supply noise, due to the relatively constant current drawn
from the power supply. For improved supply rejection, the gain
stages and output buffers of the ring oscillator are implemented
as MCML inverter/buffers. An additional benefit of this simple
design is that the clocked phase detector elements can interface
with the ring oscillator without level shifting or swing adjustment.
The first goal of this design is to create a buffer which has the
widest possible bandwidth, while still having enough gain. A
minimum value of approximately 2 for the small-signal gain was
chosen, otherwise the gate noise margin becomes unacceptable.
Biasing of the circuit so that the large-signal switching speed
approaches maximum performance is now considered.
) is seFirst, an appropriate voltage swing (
lected. The voltage swing is made as large as possible, without
forcing the switching transistors into the triode region at any
time during the cycle. Larger drain-to-gate capacitance of the
MOSFET in the triode region limits the switching speed. Note
that gain is (first-order) dependent on the voltage swing, but

1784

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

Fig. 4. Early/late (bangbang) phase detector.

the propagation delay time is not. When the differential pair is


switched so that transistor M1 [see Fig. 5(a)] is carrying all the
bias current, M1 will be in the active region (saturation) as long
(i.e.,
) obeys the following conas voltage swing
dition:
(3)
is the threshold voltage of the switching transistor,
where
is the difference between
and
and
is the
minimum drain source voltage for which M1 is still in the active
force the
region. Note that signal swings in excess of
transistors into the triode region, which puts an upper limit on
the signal swing that is practical.
Once an upper limit for the voltage swing is defined, then
and
it is simply a matter of deciding what proportion of
are used to make up the signal swing (
).
The maximum bandwidth for the MCML circuit is achieved
to
(from circuit simby finding the largest ratio of
ulations) for which the small-signal gain condition is still met,
. Additionally, it must
as this minimizes the load resistance
be ensured that this minimized swing is still sufficient to fully
switch the differential pair. Swings of approximately 700 mV at
current densities of 50100 A/ m of FET width (depending on
the application) are typical of the CML cells used in this work.
The combinatorial logic required by the phase detector is performed using MCML logic gates consisting of cascoded differential pairs [Fig. 5(b)]. These are preferred over MCML gates
consisting of parallel differential pairs since they maintain two
distinct logic levels for all possible input combinations despite
the low transconductance typical in MOS circuits.

(a)

(b)
Fig. 5.

MCML logic gates. (a) Inverter/buffer. (b) Cascode logic gate (AND).

The latch design (refer to Fig. 6) is critical to the jitter generation performance of the CDR system. Two important nonide-

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS

Fig. 7.

(a)

1785

LC delay line oscillator.

edges. The physical layout of the MCML latch is also shown


in Fig. 6.
B. LC Delay Line Oscillator

(b)
Fig. 6.

MCML latch. (a) Schematic. (b) Physical layout.

alities, metastability and hysteresis, directly translate into degraded jitter generation performance. This degradation occurs
above and beyond the jitter generation, which is inherent in the
system properties of the early/late phase detector-based PLL.
Metastability occurs when positive feedback during the latch
phase cannot ramp the output from the initial tracked voltage
to the voltage required for reliable recognition of the dig. An approximate value for the minimum time
ital state
is [7]
required to latch an initial voltage

(4)
term is not much greater than
It should be noted that the
one for a typical MCML latch with resistive loading in a typical
0.18- m CMOS technology.
Maximizing the tracking bandwidth reduces the pattern-dependent jitter (hysteresis) caused by the phase detector latches.
Increasing bandwidth by reducing the gain creates a tradeoff between resistance to metastability and the tracking bandwidth.
Tracking bandwidth is improved without increasing the likelihood of metastability by making device sizes in the input latches
(e.g., the master latch) significantly larger than the slave. This
reduces the relative loading of the master by the slave latch. This
technique was used in the latches which sample the incoming
full-rate data in the 1 : 2 demux and on the quadrature clock

Fig. 7 shows a block diagram of the two-stage LC delay line


VCO. The symmetry of this architecture ensures that precise
in-phase (CKI) and quadrature (CKQ) clocks are generated.
The 5-GHz center frequency is tunable through external (EX),
internal (IN), and high-frequency bangbang (BB) inputs.
The external tuning input is used to control the oscillators
center frequency in laboratory testing. This input could also be
incorporated into a frequency-locked calibration loop to boost
the frequency acquisition range of the PLL. The internal tuning
port is part of the integral tuning path, while the bangbang
tuning input completes the higher speed proportional tuning
path. Circuitry in the oscillator core is fully differential (with
the exception of the varactors) in order to reject supply noise.
The LC delay line promotes frequency stability with supply,
process, and temperature variations, without compromising
tuning speed.
The oscillation frequency is determined by the total propagation time through the gain blocks and differential transmission line stages. Load resistors of each MCML buffer match the
output to the 75- characteristic impedance of the delay line. A
delay line impedance of 75 was chosen as a reasonable compromise between power dissipation in the gain stages and attenuation of the delay lines. The delay through each gain stage
is only 7 ps (simulated) due to the low impedance seen at the
drains of the switching transistors.
The delay line is a fully symmetric square microstrip spiral
[see Fig. 8(a)]. Each delay line (two are required) has an outside
dimension of 150 m, a conductor width of 4 m, conductor
spacing of 2 m, and consists of top metal (aluminum) directly
over the substrate. A relatively wide gap between lines reduces
the interwinding capacitance [Co in Fig. 8(b)], allowing a larger
tuning capacitance. The effective inductance seen between each
pair of input and output ports is approximately 2.5 nH with a
parasitic capacitance of 120 fF. The inductance per unit area increases due to mutual coupling between interwound delay lines.
This also reduces chip area compared to the alternative implementation requiring two separate delay lines between gain
stages. In addition to saving chip area, a differential delay line
also improves common-mode rejection of the oscillator. This is

1786

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

(a)

(a)

(b)
Fig. 9. NMOS varactor. (a) Varactor structure. (b) Varactor C V curve.

(b)
Fig. 8.

Balanced LC delay line. (a) Physical layout. (b) GEMCAP2 model.

TABLE I
SIMULATED VCO SENSITIVITIES

because the inductance seen in common mode is less than when


differentially driven, and this places common-mode oscillations
outside the bandwidth of the gain stages.
The delay lines account for 43 ps of delay time each at 5 GHz
or 86% of the total delay around the oscillator loop. Concentration of the loop delay in these lines makes the VCO resistant
to variations in power supply, temperature and process. Table I
compares simulated sensitivities of the LC delay line VCO (including tuning circuitry) with an ring oscillator composed of
identical MCML gain stages (without tuning capability) and
comparable center frequency. Supply pulling is over an order of
magnitude lower for the delay line oscillator, at 45 MHz/V. In
addition, sensitivity of the delay line oscillator to process (based
on transistor variation only) and temperature variations is substantially less than for a ring oscillator, mainly due to the dominance of inductance over capacitive parasitics in the loop delay.

The simulated temperature sensitivity includes back-end metal


and substrate resistivity effects, which dominate temperature dependence of the on-chip delay line. In addition, it is important to
note that the tuning response time of the delay line oscillator is
on the order of the clock period (comparable to a conventional
ring oscillator), which is important in the bangbang PLL application.
GEMCAP2 [8] is used to derive a SPICE-compatible lumpedelement model for the delay line [see Fig. 8(b)] and refine the
oscillator design. Each pi-section of this model corresponds to
an individual conductor segment of the delay line, with circuit
elements representing the self-inductance, frequency-dependent
resistances (e.g., skin effect), capacitance to the substrate as well
as the capacitance and loss of the substrate itself. Also modeled
is the capacitance and mutual magnetic coupling between windings. These elements are then combined to form the multi-segment delay line model which is employed in transient simulations of the oscillator.
The remaining capacitance required for oscillation at 5 GHz
is added by inversion-mode NMOS varactors at each gain stage
input (a high impedance node). Integral loop tuning range is
designed at 32.5 MHz/V, and the (simulated) voltage swing at
the clock buffer inputs is 2.5 V differential, which consumes
extra power but improves switching speed of the MCML logic.
The varactors were selected based on practical issues related
to the fabrication technology [9]. Fig. 9 shows a cross section
and the tuning characteristics of the inversion mode NMOS varactor used in the VCO. This varactor was chosen because the
four-terminal NMOS transistor model in the IC design kit could
be used for circuit simulations without modification.
IV. EXPERIMENTAL RESULTS
Table II summarizes the measurements for the LC delay line
oscillator. Phase noise spectral density of the VCO running

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS

1787

TABLE II
MEASURED VCO PERFORMANCE

Fig. 12.

CDR clock and data waveforms (10-Gb/s operation).

Fig. 13.

Measured CDR jitter tolerance.


TABLE III
CDR SUMMARY

Fig. 10.

Oscillator and CDR phase noise performance.

open-loop and phase-locked and the phase noise of the reference source are all plotted in Fig. 10. At a 1-MHz offset from
the carrier, the free-running VCO phase noise is 103 dBc/Hz,
which falls to 127 dBc/Hz when the CDR is locked to a
sinusoidal data input at 2.5 GHz. The reference source phase
noise is also shown ( 135 dBc/Hz). The difference is primarily
due to frequency multiplication between the reference source
(2.5 GHz) and locked VCO (5 GHz), which adds a minimum
of 6 dB to the phase noise.
The fabricated oscillator has a measured center frequency of
4.45 GHz (10% slower than predicted by simulations). Subsequent measurements of individual component test structures revealed that the frequency shift is caused by unanticipated loss
and delay between the oscillator gain stages. Excessive resistive
losses in the top metal, inaccuracy in the modeling of parasitic
capacitances, and stray inductance between the delay line and
gain stages (which is not extracted from the physical layout for
simulation) all contribute to this error. The measured capacitance per unit length and resistance per unit length of the delay
line are 45% and 28% higher, respectively, than those derived
from the same simulation. This result exposes a sensitivity of
the circuit to the absolute loss and capacitance in the delay line
and, more importantly, sensitivity to variations in these parameters over process. Work is ongoing to analyze the architecture
for variations in the properties of the backend metal/dielectric

stack and the actual magnitude of these variations. This characterization work is aimed at improving the accuracy of the CAD
models thereby allowing better correlation between the simulation and measurement. Nevertheless, it is important to note that
oscillators from two separate fabrication runs showed only 0.2%
variation in VCO center frequency.
The measured external tuning range and bangbang frequency step (varied using the BB input) are 125 and 2.55 MHz,
respectively. Characterization of this varactor using a separate
test structure showed 40% less capacitance variation than
expected, thus explaining the smaller tuning ranges observed.
A VCO center frequency close to 4.98 GHz is needed in order
to conduct full-speed testing of the CDR including bit error rate
testing (BERT) at the SONET OC-192 rate (9.953 Gb/s). A noninvasive technique for adjusting the oscillation frequency was

1788

Fig. 14.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002

A 10-Gb/s CDR testchip micrograph.

developed so that a design iteration is avoided, thereby allowing


the prototype to be fully characterized. The method is shown
schematically in Fig. 11, where a metal plate is placed in close
proximity to the delay line using a micro-manipulator. Current
induced in the metal plate reduces the self-inductance of the
delay lines. Since delay between oscillator stages is proportional
to line inductance, the frequency increases when the inductance
is lowered. However, the plate must be placed within approxim from Fig. 11) to be
mately 10 m of the IC surface (
effective. Conductivity of the metal plate is important (i.e., gold
or a similar metal is used), as resistive losses actually increase
the signal delay and slow down the oscillator. An unwanted secondary effect is additional interwinding capacitance that results
from placing another conductor in close proximity to the delay
line, which acts to reduce the oscillation frequency. The inductive effect dominates, however, with the net result that the center
frequency is adjustable from 4.45 to 5.5 GHz with negligible effect on phase noise.
The oscilloscope eye pattern of Fig. 12 is measured in response to 10-Gb/s PRBS input data (2 1). Error-free data recovery at 10 Gb/s was measured in BER tests with a recovered
clock jitter of 1.2 ps rms, or 8 ps p-p. The 5-Gb/s output data
eye has larger jitter than the clock due to pattern dependencies
that are likely introduced by bandwidth limitations of the 50output buffers used for testing. Note that a 1 : 8 or 1 : 16 demultiplexer would be used in a typical application, which relaxes the
bandwidth requirements for off-chip buffering of the recovered
data.
Measured jitter transfer, generation, and tolerance all meet
the SONET OC-192 requirements (measured jitter of 8 ps p-p),

with the exception of the small residual jitter tolerance. The


jitter tolerance exceeds specifications at low jitter frequencies
but is very close to the SONET mask at higher frequencies (see
Fig. 13). Poor electrical contact from probes to the chip, phase
error between the quadrature clocks, and mismatch between the
demux and PD latches are likely sources of degradation in the
jitter performance at higher frequencies. Poor electrical contact
is partly due to wear caused by mechanical scrubbing of the pad
by the probe tip. Repeated contacts were needed to trim the oscillation frequency before measuring the jitter tolerance, which
caused significant wear of the pad metal and inconsistent electrical contacts. CDR performance is summarized in Table III.
A photomicrograph of the 1.9
1.5 mm IC is shown in
Fig. 14. The input and output data lines are implemented in 50
microstrip. The pad configuration used was dictated by the RF
on-wafer probes used for test. In order to increase the isolation
between the oscillator and the data path circuits, power supplies
are kept separate. The layout also includes an extensive bottom
metal ground plane which provides the reference plane for the
microstrips as well as increasing the capacitance from substrate
to ground. The IC consumes 285 mW from a 1.8-V supply (not
including 50- test output drivers).

ACKNOWLEDGMENT
Circuit fabrication was facilitated by the Canadian Microelectronics Corporation. The authors thank Dr. Y. Greshishchev and
Dr. P. Schvan for providing access to test facilities at Nortel Networks Ottawa Laboratories.

ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18- m CMOS

REFERENCES
[1] M. J. Reizenman, Optical nets brace for even heavier traffic, IEEE
Spectr., pp. 4445, Jan. 2001.
[2] R. C. Walker, C. L. Stout, J.-T. Wu, B. Lai, C.-S. Yen, T. Hornak, and P. T.
Petruno, A two-chip 1.5-GBd serial link interface, IEEE J. Solid-State
Circuits, vol. 27, pp. 18051811, Dec. 1992.
[3] Y. M. Greshishchev, P. Schvan, M. Xu, J. Showell, J. Ohja, and J. E.
Rogers, A fully integrated SiGe receiver IC for 10-Gb/s data rate,
IEEE J. Solid-State Circuits, vol. 35, pp. 19491957, Dec. 2000.
[4] J. Savoj and B. Razavi, A 10Gb/s CMOS clock and data recovery circuit with frequency detection, in Proc. ISSCC, San Francisco, CA, Feb.
2001, pp. 7879.
[5] J. E. Rogers and J. R. Long, A 10Gb/s CDR/demux with LC delay
line VCO in 0.18m CMOS, in Proc. ISSCC, San Francisco, CA, Feb.
2002, pp. 254255.
[6] J. Hauenschild, C. Dorschky, T. Winkler von Mohrenfels, and R. Seitz,
A plastic packaged 10 Gb/s BiCMOS clock and data recovering 1 : 4demultiplexer with external VCO, IEEE J. Solid-State Circuits, vol. 31,
pp. 20562059, Dec. 1996.
[7] D. A. Johns and K. Martin, Analog Integrated Circuit Design, First
ed. New York: Wiley, 1997.
[8] J. R. Long and M. A. Copeland, The modeling, characterization, and
design of monolithic inductors for silicon RFICs, IEEE J. Solid-State
Circuits, vol. 32, pp. 357367, Mar. 1997.
[9] P. Andreani and S. Mattisson, On the use of MOS varactors in RF
VCOs, IEEE J. Solid-State Circuits, vol. 35, pp. 905910, June 2000.

1789

Jonathan E. Rogers (S00M01) received the


B.A.Sc. degree in engineering science (electrical
option) and the M.A.Sc. degree in electronics from
the University of Toronto, Toronto, ON, Canada, in
1991 and 2001, respectively.
During his undergraduate work, he spent 16
months working at Nortel, Ottawa, ON, where he
participated in the design and characterization of
SiGe MMIC for OC-192 applications. His graduate
work focused on the implementation of 10-Gb/s
clock and data recovery systems in deep submicrometer CMOS. In October of 2001, he joined Inphi Corporation, Westlake
Village, CA, where he is working to develop physical layer solutions for 10and 40-Gb/s communication systems.

John R. Long (S77A78M83) received the B.Sc.


degree in electrical engineering from the University
of Calgary, Calgary, AB, Canada, in 1984, and the
M.Eng. and Ph.D. degrees in electronics engineering
from Carleton University, Ottawa, ON, Canada, in
1992 and 1996, respectively.
His current research interests include low-power
transceiver circuitry for highly integrated radio applications and electronics design for high-speed data
communications systems.
Prof. Long is a member of the ISSCC, IEEE
BCTM and ESSCIRC conference technical program committees and is an
Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS. He received
the NSERC Doctoral Prize and Douglas R. Colton and Governor Generals
Medals for research excellence and a Best Paper Award from ISSCC 2000.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

713

A 0.5- m CMOS 4.0-Gbit/s Serial Link Transceiver


with Data Recovery Using Oversampling
Chih-Kong Ken Yang, Student Member, IEEE, Ramin Farjad-Rad, Student Member, IEEE,
and Mark A. Horowitz, Senior Member, IEEE

AbstractA 4-Gbit/s serial link transceiver is fabricated in a


MOSIS 0.5-m HPCMOS process. To achieve the high data rate
without speed critical logic on chip, the data are multiplexed when
transmitted and immediately demultiplexed when received. This
parallelism is achieved by using multiple phases tapped from a
PLL using the phase spacing to determine the bit time. Using an
8 : 1 multiplexer yields 4 Gbits/s, with an on-chip VCO running
at 500 MHz. The internal logic runs at 250 MHz. For robust data
recovery, the input is sampled at 32 the bit rate and uses a digital
phase-picking logic to recover the data. The digital phase picking
can adjust the sample at the clock rate to allow high tracking
bandwidth. With a 3.3-V supply, the chip has a measured bit
error rate (BER) of <10014 .

I. INTRODUCTION

HE increasing demand for data bandwidth in networking


has driven the development of high-speed and low-cost
serial link technology. Applications such as computer-tocomputer or computer-to-peripheral interconnection are requiring gigabit-per-second rates either over short distances
in copper or longer distances in fiber. CMOS technology is
used increasingly over GaAs or bipolar technologies because
of the development toward faster and faster devices. In 0.18m CMOS technology, the -channel
is expected to equal
or exceed that of the standard 0.5- m GaAs process. While
other technologies are limited in the number of transistors due
to yield or power, CMOS technology allows implementation of
complex digital logic enabling more integration of the backend processing, lowering the cost. Recent development has
shown CMOS capability to achieve Gbit/s data rates [1], [5],
[6], [8], [11]. This work pushes NRZ signaling rates to the
bandwidth limitations of the process technology and explores
the issues involved.
The primary components of a link are the transmitter, the
receiver, and the timing recovery circuits. Section II describes
the overall architecture of the link. Because many of the
circuits in the transmitter and receiver blocks have been
previously discussed [1], this paper focuses on the timing
recovery technique. Section III evaluates the impact of timing
recovery on performance and compares two different timing
recovery techniques: phase-locked loops versus oversampled
phase picking. This chip implements a phase-picking algorithm
that is discussed in Section IV. The measured performance of
Manuscript received September 1997; revised December 3, 1997.
The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305-4070 USA.
Publisher Item Identifier S 0018-9200(98)02225-2.

Fig. 1. Transmit architecture.

the entire transceiver chip is presented in Section V. Finally,


some conclusions are drawn from these results in Section VI.

II. ARCHITECTURE
A 0.5- m CMOS technology is not fast enough to directly
generate and receive a 4-Gbit/s stream (since the maximum
ring oscillator frequency is <2 GHz). Instead, we use parallelism to reduce the performance requirements of each circuit.
The transmitter generates the bit stream by an 8 : 1 multiplexer
that multiplexes current pulses directly onto the output channel
(Fig. 1). The receiver (Fig. 2) performs a 1 : 8 demultiplexing
by sampling with a bank of input samplers. Similar to the
transmitter, each sampler is triggered by individual clock
phases. Furthermore, clock/data recovery is achieved by a 3
oversampling of each data bit. Thus, the receiver requires a total of 24 clock phases to support both the oversampling and the
1 : 8 demultiplexing. Various techniques exist for generating
multiple clock phases [2], [3]. The receive side uses a sixstage ring oscillator ( -PLL) followed by phase interpolators
to generate intermediate phases (ick[23 : 0]) between the ring
-PLL, eight
oscillator edges (ck[11 : 0]) [1]. Similar to the
different clock phases tapped from a four-stage ring oscillator
( -PLL) control the transmitter multiplexing.
A timing recovery circuit extracts the clock from the multiple samples per bit by finding the positions of the data
transitions. Once the transitions are determined, a decision
logic selects the samples furthest from data transitions (phase
picking) as the received data byte. This approach is similar to

00189200/98$10.00 1998 IEEE

714

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Fig. 3. Transceiver test-chip block diagram.

limits the run length to <5 consecutive zeros or ones. The


PRBS sequence is a suitable substitute because it guarantees
a maximum run length of 7. The transmitter can be optionally
configured to transmit the PRBS sequence, a fixed sequence,
or the received data for testing.
III. TIMING RECOVERY

Fig. 2. Receive architecture.

what is done in UARTs, and was first applied to a high-speed


link by Lee et al. in [4].
Fig. 3 shows the full transceiver test-chip block diagram.
Since the sampling clocks are different phases, the sampled
results are resynchronized to a global clock. To facilitate
the digital design, the on-chip data are further demultiplexed
(2 : 1) to 250 MHz. Finally, in order to test the bit-error
rate (BER), an on-chip parallel pseudorandom bit sequence
sequence.
(PRBS) encoder and decoder are used for a
Serial data are commonly encoded with 8B10B coding which

The goal of the timing recovery scheme is to maximize


the timing marginthe amount that a sample position can err
with the data still properly received. Errors that impact the
timing margin can be classified into two sources: static phase
error, and jitter (dynamic phase error). Fig. 4 illustrates the
where
is the
timing margin
static sampling error, and
and
are the jitter on the data
transition and the sampling clock. Since the sampling position
is defined with respect to the data transition, jitter on both the
clock and the data additively reduces timing margin. With ideal
square pulses, as long as the sum of the magnitudes of the static
and dynamic phase error is less than a bit time, the phase error
does not impact signal amplitude. However, in a band-width
limited system (for this work, due to the process technology),
signal amplitude is lower with sampling phase error because
the signals have finite slew rates. Correspondingly, this reduces
the signal-to-noise ratio (SNR), hence impacting performance.
The amount of SNR degradation can be calculated based
on the shape of the signal waveform. For static phase error,
the SNR penalty is shown in Fig. 5 for a triangular signal
waveform and a sinusoidal signal waveform. When the sample
position phase offset is small, the sinusoidal waveform has a
lower penalty than a triangular waveform due to the lower
signal slew rate near the sample point.1 For jitter, the SNR
penalty is more complex to evaluate since it additionally
depends on the statistics of the noise. For example, we can
1 This

penalty is only applicable to transitions.

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

715

Fig. 4. Timing margin.

Fig. 6. BER versus SNR with various amounts of phase noise.

(a)

Fig. 5. SNR penalty for different phase offsets.

assume an idealized jitterless system with signal amplitude


and additive white Gaussian noise (AWGN) of standard
on the signal amplitude. In this system, we can
deviation
determine the performance (BER) for various SNR [14]:

(1)

This equation is plotted as the lowest dotted line in Fig. 6.


If we further assume jitter to be a AWGN as well, for a
triangular waveform, the phase noise can be translated into
(where the bit time
amplitude noise using
spans ). Since the noise sources are additive, the probability
of error can be simply expressed as

(2)

Fig. 6 illustrate the BER versus amplitude SNR for various


amounts of phase noise. The SNR penalty, as shown in
the figure, increases at higher SNR because the phase noise
eventually limits performance, a BER floor. For a sinusoidal
signal waveform (with a lower slew rate near the sample
point), the behavior is similar, except with lower SNR penalty.

(b)
Fig. 7. Clock recovery architectures: (a) phase picking block diagram and
(b) data/clock recovery architectures.

The amount of phase error and the jitter depends on the


implementation of the clock recovery circuit. Two techniques
are commonly used, a phase-locked loop (PLL) and a phase
picker. A PLL employs a feedback loop that actively servos
the sampling phase of an internal clock source based on the
phase of the input [7]. Fig. 7(a) illustrates a common VLSI
implementation using an on-chip voltage-controlled oscillator
(VCO) as the clock source, and a charge pump following the
phase detector to integrate the phase error. A phase picker,
as shown in Fig. 7(b), oversamples each bit, and uses the
oversampled information to determine the transition position
(phase) of the data. Based on the transition information, the
best sample is then selected as the data value (UART [10]).
Each of the two architectures has a different tradeoff in terms
of static phase error and jitter.
The static phase error of a PLL depends mainly on its
phase detector design. Ideally, sampling at the middle of the
bit window gives the maximum timing margin. However, if
the sampler has a setup time, the middle of the effective bit

716

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

window is shifted by the setup time. Not compensating this


shift causes significant static phase error. This error can be
reduced by using the data samplers as the phase detector.2
Additional phase error occurs due to inherent mismatches
within the phase detectors and/or charge pump. Furthermore,
any phase detector dead band (window in which the phase
detector does not resolve phase information) limits the phase
resolution, increasing the static phase error.
In a phase-picking architecture, the multiple samples per
bit are used to find the transitions, effectively behaving as the
phase detector. Sampler uncertainty limits the resolution of the
transition detection. Sources of this uncertainty are sampler
metastability window and data dependence of the sampler
setup time. The uncertainty window for the sampler design
used is <1/10 the bit time which does not impact performance
significantly. More importantly, in this architecture, the phase
information is quantized by the oversampling, causing a finite
quantization error of 1/2 the phase spacing between samples.
For a higher oversampling ratio, this static phase error is
less, but it has a significant cost of increasing the number
of input samplers, increasing the input capacitance, and hence
limiting the input bandwidth. For a 3 oversampling system,
the maximum static phase error is 1/6 the bit time.
In terms of jitter, a PLL tracks the phase of the input
data with a tracking bandwidth limited by the stability of
the feedback loop. The loop tracking is effectively a highpass filter that rejects the phase noise of the input at lower
frequencies. The noise not tracked appears as data jitter.
Furthermore, because the PLL frequency source is an onchip VCO, supply and substrate noise from on-chip digital
switching can introduce additional jitter. The impact of these
two sources is formulated for a second-order PLL in the
following equation as the first and second terms:

(3)
Constants that determine the loop bandwidth in the equation
(V/rad) the gain of the filter,
are depicted in Fig. 7(a) with
the stabilizing zero in the filter, and
(rad-hertz/V) the
gain of the VCO.
is the noise induced onto the VCO,
and
is the sensitivity of the VCO to this noise. Thus,
the total amount of effective jitter depends on the tracking
bandwidth of the loop, the amount of supply and substrate
noise, and the sensitivity of the loop elements to the noise.
Because the feedback loop has a loop delay of at least one
clock cycle, the bandwidth of the loop is often chosen to be
<1/10 of the oscillation frequency for sufficient phase margin
and stability. The delay makes tracking high-frequency phase
noise ineffective because, if the phase error from on transition
is independent of the phase of the next transition, correction
2 This causes additional difficulties because such phase detectors can only
determine if transitions are early or late. The control loop is bangbang
control instead of linear control, which is less stable, has inherent dithering,
and requires additional frequency acquisition aid. Although a DLL (delay-line
based PLL) [8] can be used to eliminate the stability and frequency acquisition
problems, the phase spacing, when tapping phases from the buffer stages, is
sensitive to the input clocks duty cycle and amplitude.

(a)

(b)
Fig. 8. Effect of tracking bandwidth on jitter.

based on the first transitions phase information could increase


the phase error for receiving the next bit.
The impact of different tracking bandwidth on jitter is illustrated in Fig. 8. The single sideband power spectral density
(PSD) of an oscillator, such as the VCO of the transmitter,
is shown to represent the phase noise in Fig. 8(a). Two
hypothetical PLLs with different bandwidths ( , and )3
behave as high-pass filters that reject the lower frequency
noise. Their transfer functions are overlaid in Fig. 8(a). The
resulting phase error is shown in the PSD of Fig. 8(b). Note
that this example excludes the additional noise from the phasetracking circuit [second term of (3)]. The integral of the area
beneath the curve is an indication of the amount of jitter [13]
[ for (2)]; thus, the phase noise of Circuit I is larger than
that of Circuit II. Additionally, if a second-order PLL is not
critically damped, the transfer function can exhibit peaking.
This peaking accumulates phase noise at its loop bandwidth,
increasing the noise.
For a phase picker, the sampling clocks experience similar
jitter problems from supply and substrate noise since the
phases for the oversampling are also generated from an onchip VCO. The primary difference is the tracking bandwidth.
A phase-picking system is a feedforward architecture (instead
of feedback); thus, there are no intrinsic bandwidth limitations.
The tracking rate depends on the rate at which new phase
decisions are made, which in turn depends on the logics
cycle time. The importance of this fast tracking is that it can
potentially track the accumulation of phase noise by the onchip multiphase generator (PLL). We delay the data by the
time to arrive at a decision so the corrections are applied to the
appropriate bit (although with a latency overhead). However,
the maximum phase change between two transitions must be
3 The actual shape of the tracking transfer function
implementation.

H (s)

varies with

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

less than , half the bit time, even if the peak-to-peak jitter can
greater than are
be much larger than a bit time. Changes
indistinguishable from a phase shift in the opposite direction,
.
Choosing between the two clock recovery systems depends
on the system requirements and noise behavior. We chose a
phase-picking architecture to explore the usefulness of the
higher phase-tracking capability. In such VLSI implementations, supply noise can be significant enough for the peak-topeak jitter to occupy a large fraction of the bit time, especially
since a PLL accumulates jitter. For the 4-Gbit/s link, we
chose a low oversampling ratio of 3 to maintain high input
bandwidth and to keep the number of clock phases manageable
(1 : 8 demultiplexing and 3 oversampling yields 24 phases).
With a bit time of 250 ps, the phase-picking scheme4 can track
the noise of the on-chip multiphase generator (PLL) from both
the transmit and receive sides to keep the total effective jitter
below the 83-ps quantization spacing. One limitation of the
phase-picker tracking is that the maximum rate of the tracking
depends on the data transition density. Since the PRBS signal
guarantees one transition per byte, the maximum tracking rate
of one sample spacing every transition is fast (83 ps/2 ns).
Although the tracking rate is high, the maximum static phase
error from the quantization is 41 ps (2% of the clock period,
8 bit time), causing an SNR penalty (Fig. 5). Whether or
not a 3 oversampled phase-picking approach with higher
tracking bandwidth than a PLL can achieve better performance
with the larger static phase error depends on the amount of
jitter induced by on-chip noise sources. If the lower SNR
penalty from the lower jitter compensates the higher SNR
penalty of larger static phase error, phase picking would be
the better choice.
IV. PHASE-PICKING ALGORITHM AND IMPLEMENTATION
The details of the phase-picking algorithm are illustrated in
Fig. 9. Picking the center sample requires finding and tracking
the bit boundaries. The decision logic first detects transitions
by an XOR of adjacent samples, indicating the bit boundary to
be in one of three possible positions. Fig. 10 shows an example
of the boundary detection with a portion of a sampled stream.
To find which of the three transition positions is the most
likely bit boundary, transitions corresponding to the same bit
boundary position are tallied. The position with the largest
total determines the bit boundaries.
The decision logic makes a new decision per byte of data. In
contrast to a higher order oversampling phase picker, the 3
oversampling limits the change of the selected sample position
to one sample position per byte. To guarantee sufficient
transitions for averaging any bit-to-bit variations of highfrequency noise (near the bit rate), the tally is across a sliding
window of 3 bytes. The transitions are accumulated from the
current byte, the previous byte, and the next byte (delaying the
data allows the noncausal information) so that the decision is
applied to the byte at the middle of the window. As a result
4 In

our system, the oscillator is at 250 MHz so the PLL bandwidth is


restricted to <25 MHz. This yields a 10 tracking rate difference between
the two systems.

717

Fig. 9. Phase-picking algorithm block diagram.

Fig. 10. Example of the phase-picking algorithm.

of the 3-byte sliding accumulation, the rate of phase change


that the algorithm can track is slower than the maximum of 83
ps/2 ns. The algorithm picks the correct sample if the majority
of the transition information within the 3-byte window (6 ns)
indicates the correct phase. For example, if the input phase
has a constant rate of change of <1 sample spacing per 3 ns
(corresponding to a frequency difference of 4%), the transition
information from >1.5 bytes of the 3-byte window would fall
in the same phase quantization. Then the tally and compare
would select the correct sample to track the phase change.
This indicates a maximum phase-tracking rate of 83 ps/3 ns.
and
-PLLs accumulation
The criterion of tracking both
is met because the VCO elements supply noise sensitivity
is
%/% (percent of frequency change per percent of
supply noise [1], [3]),5 corresponding to 30 ps/3 ns for a 10%
supply step, which is less than the tracking rate. If the phase
change is slower than 83 ps/3 ns, the 3-byte accumulation
offers some robustness by averaging any uncertainty in the
transition detection due to high-frequency bit-to-bit noise. A
smaller window of one byte can track phase faster, but has
poorer performance without sufficient transitions within that
byte to average the bit-to-bit variation. A larger window of 5
bytes (<83 ps/6 ns) would be too slow to track the - and
-PLLs phase accumulation under reasonable supply noise.
Once the transition position is determined, the middle
sample within the bit boundaries is selected as the data.
5 Although the maximum phase error accumulation rate is based on the
supply sensitivity of the VCO, the peak phase error depends on the loop
bandwidth. The Tx -PLL and Rx -PLL generating the multiple clock phases
have bandwidths of 15 and 5 MHz, respectively.

718

Fig. 11.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Comparison between center picking versus majority voting.

The selection is implemented by multiplexers selecting the


appropriate samples based on three select signals. In the case
where no transitions are detected, the three select signals
use previously stored values to maintain data through the
multiplexers.
The actual algorithm for deciding the received data value
from the oversampled information can be designed alternatively while still keeping the advantage of higher tracking
bandwidth of a feedforward architecture. Instead of selecting
the middle (phase pick), a simple alternative implementation
is to take a majority vote based on the three sampled values.
Fig. 11 shows the performance comparison. Majority voting
works well with nonbandwidth-limited signals that have highfrequency noise because it averages the noise over many
samples. In a bandwidth-limited system (low-pass filtered by
time constant), it performs worse because at least
the I/O
one of the two nonmiddle samples is required to be valid,
and the nonmiddle samples have a much higher probability
of error.
Arbitration is required when two transition positions have
equal counts. This occurs when two of the sample positions
straddle the center of the bit and the third sampler samples
at the transition. Picking either of the two straddling the
center gives equivalent performance. More complex logic can
be implemented by using the previous, current, and next
cycles comparison results to follow the direction of any phase
transition. However, this only improves the performance by
less than 1 dB.
If the peak-to-peak phase jitter is larger than one bit time, or
if the transmitter and receiver operate at different frequencies,
the tracking must allow bit(s) to overflow/underflow. For
example, if the SEL[2 : 0] signal changes from 001 to
100, the selected sample of the first cycle corresponds to
the same bit as the selected sample of the following cycle.
This underflow condition must be appropriately handled by
dropping one of the two samples. Typically, these samples
are of the same bit, and thus have the same value. However,
in the case where they are different, if phase movement
changes directions (the SEL signal returns to 001) in the
following cycles decision, dropping the latter one gives a
slight performance improvement. Similar to the underflow

Fig. 12. Chip micrograph.

where only 7 bits are received, the opposite transition from


100 to 001 causes an overflow, requiring an extra bit
(9 bits total) to be stored. These conditions are handled by a
bitwise FIFO built by shifting the input byte to accommodate
the one extra/less bit. If the aggregate shift increases beyond
1 byte, a bytewise FIFO handles the overflow/underflow byte.
The limited depth of the FIFO can only handle a finite number
of byte overflow. If the application requires handling long
streams of data with a slight frequency difference with the
local reference clock, the local frequency can be corrected
based on the phase information from the decision logic.6
V. TRANSCEIVER EXPERIMENTAL RESULTS
The transceiver chip was implemented in a 0.5- m CMOS
process offered through MOSIS. The 3 mm
3 mm die
photo is shown in Fig. 12. The chip is packaged in a 52pin CQFP package supplied by Vitesse Semiconductor which
has internal power planes for controlled impedance. The size
70 m to
of the I/O bond pads are reduced to 70 m
keep pad capacitance to a minimum because the capacitance
would otherwise limit the I/O bandwidth. With an effective
impedance at the I/O of 25
(for a doubly terminated 50line), the total I/O capacitance can not exceed 4.5 pF
for 4-Gbit/s operation without losing 10% of the bit height
to the
filtering. The 1 : 8 demultiplexing receiver and
8 : 1 multiplexing transmitter designs have capacitances of 2.2
and 1.2 pF, respectively, with 600 fF due to the pad and
metal interconnects. An input time constant of
ps is
estimated from measurements sweeping the reference voltage
for a single-ended input pulse. The width of the pulse with a
different reference voltage determines the time constant.
The performance of the link depends significantly on the
I/O circuits. The minimum receivable amplitude of 50 mV
was measured by using a fixed data pattern while changing
6 This

feature is not implemented as part of this test chip.

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

Fig. 13.

Transmitter data eye.

the amplitude. This indicates the worst case input offset in


the bank of samplers. The transmitter data eye at 3.0 Gbits/s
is shown in Fig. 13 with the output driving a PRBS
sequence. The measured data rate is limited by the triggering
bandwidth of the oscilloscope. The maximum speed of the
transmitter was 4.8 Gbits/s, and was limited by the maximum
frequency of the ring oscillator used in the clock generation.
The multiple-phased clock generation (PLL) is crucial to the
performance of the link because the phase spacing determines
the bit time in the multiplexing/demultiplexing architecture,
and the supply sensitivity and loop bandwidth determine the
amount of jitter that needs to be tracked. Mismatches can
cause one phase to be shifted with respect to the others. In
the transmitter, the shift enlarges one bit, but reduces the next.
By measuring the spacing between edges, we can evaluate the
ability to match the phases tapped from the oscillators and
interpolators [3]. The differential nonlinearity (DNL) of the
phase spacing is plotted for the transmitter in Fig. 14 at various
frequencies. The error is expressed as a percentage of the ideal
bit time for all eight phase positions. While transmitting the
PRBS pattern and using a trigger frequency of 1/8 the data
rate (internal clock rate), these spacings are measured with a
20-GHz bandwidth digital oscilloscope by the width of each of
the eight data-eye patterns.7 If we use the data-rate frequency
as a trigger instead of using a divided frequency, the data eye
of Fig. 13 overlaps all eight of the bits. The overlaid histogram
shows that the 333-ps bit time is degraded by 90 ps due to
equal contributions from jitter and errors in the transmitter
phase spacing.
The peak-to-peak variation, <7% of the bit time, indicates
very little degradation in bit width due to mismatches. The
dominant cause of these bit-width variations is the
and
7 The

719

measurement uncertainty is the DNL is 2 ps.

Fig. 14. Transmit-side DNL at various frequencies.

mismatches of the transistors in the clock generation circuits


[12]. The increase in error with decreasing oscillation frequency, shown in Fig. 14, is an indication of these mismatches.
) is less at lower oscillation
The gate overdrive (
frequencies, making the phase spacing more sensitive to these
mismatches. Fig. 15 shows the measurement of the DNL for
four chips. The darker line indicates the average at each phase
position. The variation of this average across phase positions
potentially indicates some systematic error. However, because
the average is over a sample size of only four chips, and
the variation of the average is significantly smaller than the
variation between chips, the random component is believed to
be the dominant source of static phase spacing error.
Although a systematic component of the offset can also be
expected from noise at any integer multiple of the oscillator

720

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Fig. 16. BER testing configuration.

Fig. 15.

Transmit-side DNL for four chips.

frequency, it is not apparent in Fig. 15. Normally, noise such


as substrate or supply noise at the same frequency as the
oscillator would modulate the oscillator, causing a duty-cycle
error which spreads the phases in the first half cycle and
compresses the phases in the second half cycle. Since most
(250
of the digital logic clock on this chip switches at
MHz), this effect of the clock buffer switching on the 500
MHz oscillator would cause different phase spacings for two
consecutive oscillator cycles. However, Fig. 15 shows that
the average phase spacing errors from the second cycle is
nearly the same as the first cycle, indicating that this coupling
is negligible. Also, any systematic components from path
mismatches (e.g., capacitive loading errors) are insignificant
compared to the random source.
On the receive side, the DNL of the sample spacing is also
measured, as was shown for a 0.8- m process technology [1]
to be <8% of the bit time. Receive clock phase spacing errors
reduce the effectiveness of the oversampling by increasing the
sample spacing, causing both increased static phase error and
larger jitter.
Jitter in the transmitter can be measured by a outputting a
fixed pattern and measuring the jitter on the data transition. We
can also measure the sampling clock jitter by looking at the
sampler output while sweeping a clean input transition. The
window in which the sampler output is uncertain indicates the
jitter with respect to the input. The supply sensitivitiy can also
be measured by the increase in jitter due to induced supply
noise with an internal switch that shorts between supply and
ground. The sensitivities of the transmit and receive PLLs are
0.2 and 0.3 ps/mV, respectively, with a similar peak-to-peak
quiescent jitter of 45 ps.
The BER testing is performed with two different configurations. The first measurement is by feeding the transmitted
output directly back into the input. This yielded a BER of
. The second configuration is by placing the chip in a
mock optical network (Fig. 16). A bit error rate tester (BERT)
is used to generate the data pattern. The pattern is modulated
onto a fiber-optic network. The optical power is measured by
siphoning 1/10 of the total optical power. The optical signal

Fig. 17. Measured BER versus SNR.

is received and amplified by a avalanche photodiode (APD)


followed by an amplifier. The output of the amplifier is either
returned to the BERT for the baseline measurement, or sent
into the chip configured in its transceiver mode. Because the
BERT and optical amplifiers have a bandwidth limitation at
3 Gbits/s, the experimental results of this configuration are
limited in data rate. As shown in Fig. 14, the phase spacing
at lower frequencies is worse, so the performance is slightly
worse than at 4 Gbits/s.
The BER versus SNR is plotted with SNR expressed in
optical power showing both the baseline and the DUT with
BER (Fig. 17). The SNR penalty
a 1.5-dB penalty at
for not having the selected sample at the middle of the data
eye is shown in Fig. 18. Because of the phase spacing errors
on the receive side, the penalty shown here is worse than
simulated. Since the quiescent jitter of the clock generation is
smaller than the sample spacing (<83 ps), the phase tracking
is not active. In order to test the effectiveness of the phase
picking, voltage steps are induced on the supply, causing 250-PLL. While this causes
ps jitter on both the -PLL and
the data eye to collapse, the receiver can still track this jitter
. Also, the transceiver is operated
and maintain BER
with the transmitter and receiver at different frequencies. The
chip was able to track a frequency difference of 1 MHz with
.
BER
Table I shows some additional performance measurements
of the chip. The total power dissipated is 1.5 W, with 1/3 from
the clock generation and 1/3 from the receive-side logic. The
BER is 90
minimum amplitude that can still maintain

YANG et al.: 4.0-Gbit/s SERIAL LINK TRANSCEIVER

721

when additional noise is induced. This low accumulated jitter


implies that the lower tracking bandwidth of a PLL-based
clock recovery circuit can potentially perform equally. The
design of such a system is nontrivial, and still has challenges
in maintaining small static phase offsets. However, since the
phase picking has significant hardware overhead in the extra
number of input samplers and large digital processing, a PLL
would potentially offer similar performance with lower area
and power.

ACKNOWLEDGMENT
The authors would like to thank S. Sidiropoulos, B. Amrutur, K. Falakshahi, Vitesse Semiconductor, Prof. T. Lee,
Prof. L. Kazovsky, and their research groups for invaluable
discussions and assistance.
Fig. 18.

Measured BER at various sampling phase.


TABLE I
TEST-CHIP PERFORMANCE

REFERENCES
[1] C.-K. Yang and M. Horowitz, A 0.8 m CMOS 2.5 Gbps oversampling
receiver and transmitter for serial links, IEEE J. Solid-State Circuits,
vol. 31, Dec. 1996.
[2] C. Gray et al., A sampling technique and its CMOS implementation
with 1-Gb/s bandwidth and 25 ps resolution, IEEE J. Solid-State
Circuits, vol. 29, Mar. 1994.
[3] J. Maneatis and M. Horowitz, Precise delay generation using coupled
oscillators, IEEE J. Solid-State Circuits, vol. 28, pp. 12731282, Dec.
1993.
[4] K. Lee et al., A CMOS serial link for fully duplex data communications, IEEE J. Solid-State Circuits, vol. 30, pp. 353364, Apr.
1995.
[5] A. Fiedler et al., A 1.0625Gb/s transceiver with 2 -oversampling and
transmit signal pre-emphasis, in ISSCC97 Dig. Tech. Papers, Feb.
1997, pp. 238239.
[6] A. Widmer et al., Single-chip 4
500 Mbaud CMOS transceiver,
IEEE J. Solid-State Circuits, vol. 31, pp. 20042014, Dec. 1996.
[7] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.
[8] W. Dally and J. Poulton, A tracking clock recovery receiver for 4-Gb/s
signaling, in Hot Interconnect97 Proc., Aug. 1997, p. 157.
[9] S. Sidiropoulos and M. Horowitz, A semi-digital DLL with unlimited
phase shift capability and 0.08400MHz operating range, in ISSCC95
Dig. Tech. Papers, Feb. 1995, pp. 332333.
[10] J. E. McNamara, Technical Aspects of Data Communication, 2nd ed.
Bedford, MA: Digital, 1982.
[11] S. Kim et al., An 800Mbps multi-channel CMOS serial link with 3
oversampling, in IEEE 1995 CICC Proc., Feb. 1995, p. 451.
[12] M. J. Pelgrom, Matching properties of MOS transistors, IEEE J.
Solid-State Circuits, vol. 24, p. 1433, Dec. 1989.
[13] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston,
MA: Artech House, 1994.
[14] J. Proakis, Communication Systems Engineering. Englewood Cliffs,
NJ: Prentice-Hall, 1994.

mV with an internal eye height of 65 mV. The 24 mV of


amplitude noise is primarily due to ringing from the package
inductance and on-chip output capacitance at the transmitter.
VI. CONCLUSION
Very high data rates are achievable in CMOS technologies by making extensive use of parallelism. Using an 8 : 1
demultiplexing at the input and a 8 : 1 multiplexing output
transmitter, we achieved a 4-Gbit/s transceiver while keeping
all internal signals <500 MHz in a 0.5- m process technology.
The fundamental limitations of this approach are the I/O
capacitance (increased due to the parallelism), the sampler
uncertainty, and the phase position accuracy of the multiple
clock phases.
Provisions were made in this design to handle very large
jitter accumulation of 83 ps/3 ns by a fast phase-picking
algorithm. The effectiveness of this architecture critically
depends on the jitter characteristics. Although a CMOS PLL
can potentially exhibit this large jitter due to supply noise,
the measured jitter while operating this transceiver is only 50
ps. This jitter is measured in a realistic noise environment
because of the presence of significant digital switching noise
from the large digital phase picker that can couple onto the
VCO elements. Since the jitter is less than the quantization
error, the advantage of the phase picking is only apparent

Chih-Kong Ken Yang (S93) received the B.S. and


M.S degrees in electrical engineering from Stanford
University, Stanford, CA, in 1992.
He is currently pursuing the Ph.D. degree at
Stanford University in the area of circuit design for
high-speed interfaces.
Mr. Yang is a member of Tau Beta Pi and Phi
Beta Kappa.

722

Ramin Farjad-Rad (S95) was born in Tehran,


Iran, in 1971. He received the B.Sc. degree in
electrical engineering from Sharif University of
Technology, Tehran, in 1993 and the M.Sc. degree
in electrical engineering from Stanford University,
Stanford, CA, in 1995, where he is currently a Ph.D.
candidate in electrical engineering.
He worked at SUN Microsystems Laboratories,
Mountain View, CA, on a 1.25-Gbit/s serial transceiver for the fiber channel standard during the
summer of 1995. Over the summer of 1996, he
worked at LSI Logic, Milpitas, CA, where he examined different multi-Gbit/s
serial transceiver architectures.
Mr. Farjad-Rad holds one U.S. patent, and is also the Bronze Medal Winner
of the 20th International Physics Olympiad, Warsaw, Poland.

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 5, MAY 1998

Mark A. Horowitz (S77M78SM95) received


the B.S. and M.S. degrees in electrical engineering
from MIT in 1978, and the Ph.D. degree from
Stanford University, Stanford, CA, in 1984.
He is the Yahoo Founders Professor of Electrical
Engineering and Computer Science at Stanford. His
research area is in digital system design, and he has
led a number of processor designs including MIPSX, one of the first processors to include an on-chip
instruction cache, TORCH, a statically scheduled,
superscalar processor, and FLASH, a flexible DSM
machine. He has also worked on a number of other chip design areas including
high-speed memory design, high-bandwidth interfaces, and fast floating point.
In 1990, he took a leave from Stanford to help start Rambus Inc., a company
designing high-bandwidth memory interface technology. His current research
includes multiprocessor design, low-power circuits, memory design, and highspeed links.
Dr. Horowitz is the recipient of a 1985 Presidential Young Investigator
Award and an IBM Faculty Development Award, as well as the 1993 Best
Paper Award from the International Solid-State Circuits Conference.

You might also like