Preface
Part I
Original Contributions
I. Galton
Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems
R. C. Walker
K. S. Kundert
Part II
Devices
Physics-Based Closed-Form Inductance Expression for Compact Modeling of Integrated Spiral Inductors
S. Jenei, B. K. J. C. Nauwelaers, and S. Decoutere (IEEE Journal of Solid-State Circuits, January 2002)
The Modeling, Characterization, and Design of Monolithic Inductors for Silicon RF IC's
J. R. Long and M. A. Copeland (IEEE Journal of Solid-State Circuits, March 1997)
Analysis, Design, and Optimization of Spiral Inductors and Transformers for Si RF IC's
A. M. Niknejad and R. G. Meyer (IEEE Journal of Solid-State Circuits, October 1998)
Stacked Inductors and Transformers in CMOS Technology
A. Zolfaghari, A. Chan, and B. Razavi (IEEE Journal of Solid-State Circuits, April 2001)
Estimation Methods for Quality Factors of Inductors Fabricated in Silicon Integrated Circuit Process Technologies
K. O (IEEE Journal of Solid-State Circuits, August 1998)
On-Chip Spiral Inductors with Patterned Ground Shields for Si-Based RF IC's
C. Patrick Yue and S. S. Wong (IEEE Journal of Solid-State Circuits, May 1998)
The Effects of a Ground Shield on the Characteristics and Performance of Spiral Inductors
S.-M. Yim, T. Chen, and K. O (IEEE Journal of Solid-State Circuits, February 2002)
Temperature Dependence of Q and Inductance in Spiral Inductors Fabricated in a Silicon-Germanium/BiCMOS Technology
R. Groves, D. L. Harame, and D. Jadus (IEEE Journal of Solid-State Circuits, September 1997)
Design of High-Q Varactors for Low-Power Wireless Applications Using a Standard CMOS Process
A.-S. Porret, T. Melly, C. C. Enz, and E. A. Vittoz (IEEE Journal of Solid-State Circuits, March 2000)
The Effect of Varactor Nonlinearity on the Phase Noise of Completely Integrated VCOs
J. W. M. Rogers, J. A. Macedo, and C. Plett (IEEE Journal of Solid-State Circuits, September 2000)
Building Blocks
A Low-Noise, Low-Power VCO with Automatic Amplitude Control for Wireless Applications
M. A. Margarit, J. L. Tham, R. G. Meyer, and M. J. Deen (IEEE Journal of Solid-State Circuits, June 1999)
A Fully Integrated VCO at 2 GHz
M. Zannoth, B. Kolb, J. Fenk, and R. Weigel (IEEE Journal of Solid-State Circuits, December 1998)
35-GHz Static and 48-GHz Dynamic Frequency Divider IC's Using 0.2-μm AlGaAs/GaAs-HEMT's
Z. Lao, W. Bronner, A. Thiede, M. Schlechtweg, A. Hulsmann, M. Rieger-Motzer, G. Kaufel, B. Raynor, and M. Sedler (IEEE Journal of Solid-State Circuits, October 1997)
A Family of Low-Power Truly Modular Programmable Dividers in Standard 0.35-μm CMOS Technology
C. S. Vaucher, I. Ferencic, M. Locher, S. Sedvallson, U. Voegeli, and Z. Wang (IEEE Journal of Solid-State Circuits, July 2000)
A 1.2 GHz CMOS Dual-Modulus Prescaler Using New Dynamic D-Type Flip-Flops
B. Chang, J. Park, and W. Kim (IEEE Journal of Solid-State Circuits, May 1996)
A 1.6-GHz Dual Modulus Prescaler Using the Extended True-Single-Phase-Clock CMOS Circuit Technique (E-TSPC)
J. N. Soares, Jr. and W. A. M. Van Noije (IEEE Journal of Solid-State Circuits, January 1999)
A 320 MHz, 1.5 mW @ 1.35 V CMOS PLL for Microprocessor Clock Generation
V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra (IEEE Journal of Solid-State Circuits, Nov. 1996)
A Low Jitter 0.3-165 MHz CMOS PLL Frequency Synthesizer for 3 V/5 V Operation
H. C. Yang, L. K. Lee, and R. S. Co (IEEE Journal of Solid-State Circuits, April 1997)
A Low-Jitter PLL Clock Generator for Microprocessors with Lock Range of 340-612 MHz
D. W. Boerstler (IEEE Journal of Solid-State Circuits, April 1999)
CMOS DLL-Based 2-V 3.2-ps Jitter 1-GHz Clock Synthesizer and Temperature-Compensated Tunable Oscillator
C J. Foley and M. P Flynn (IEEE Journal of Solid-State Circuits, March 2001)
A 1.5 V 86 mW/ch 8-Channel 622-3125-Mb/s/ch CMOS SerDes Macrocell with Selectable Mux/Demux Ratio
F. Yang, J. O'Neill, P. Larsson, D. Inglis, and J. Othmer (Dig. International Solid-State Circuits Conference, Feb. 2002)
A Low-Jitter Wide-Range Skew-Calibrated Dual-Loop DLL Using Antifuse Circuitry for High-Speed DRAM
S. J. Kim, S. H. Hong, J.-K. Wee, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung (IEEE Journal of Solid-State Circuits, June 2002)
Part VI
RF Synthesis
An Adaptive PLL Tuning System Architecture Combining High Spectral Purity and Fast Settling Time
C. S. Vaucher (IEEE Journal of Solid-State Circuits, April 2000)
A 2-V 900-MHz Monolithic CMOS Dual-Loop Frequency Synthesizer for GSM Receivers
W. S. T. Yan and H. C. Luong (IEEE Journal of Solid-State Circuits, Feb. 2001)
A CMOS Frequency Synthesizer with an Injection-Locked Frequency Divider for a 5-GHz Wireless LAN Receiver
H. R. Rategh, H. Samavati, and T. H. Lee (IEEE Journal of Solid-State Circuits, May 2000)
A Modeling Approach for Σ-Δ Fractional-N Frequency Synthesizers Allowing Straightforward Noise Analysis
M. H. Perrott, M. D. Trott, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Aug. 2002)
A Fully Integrated CMOS Frequency Synthesizer with Charge-Averaging Charge Pump and Dual-Path Loop Filter for PCS- and Cellular-CDMA Wireless Systems
Y. Koo, H. Huh, Y. Cho, J. Lee, J. Park, K. Lee, D.-K. Jeong, and W. Kim (IEEE Journal of Solid-State Circuits, May 2002)
A 1.1-GHz CMOS Fractional-N Frequency Synthesizer With a 3-b Third-Order Σ-Δ Modulator
W. Rhee, B.-S. Song, and A. Ali (IEEE Journal of Solid-State Circuits, Oct. 2000)
A 27-mW CMOS Fractional-N Synthesizer Using Digital Compensation for 2.5-Mb/s GFSK Modulation
M. H. Perrott, T. L. Tewksbury III, and C. G. Sodini (IEEE Journal of Solid-State Circuits, Dec. 1997)
A 2.5-Gb/s Clock and Data Recovery IC with Tunable Jitter Characteristics for Use in LAN's and WAN's
K. Kishine, N. Ishihara, K. Takiguchi, and H. Ichino (IEEE Journal of Solid-State Circuits, June 1999)
Clock/Data Recovery PLL Using Half-Frequency Clock
M. Rau, T. Oberst, R. Lares, A. Rothermel, R. Schweer, and N. Menoux (IEEE Journal of Solid-State Circuits, July 1997)
A 0.5-μm CMOS 4.0-Gbit/s Serial Link Transceiver with Data Recovery Using Oversampling
C.-K. K. Yang, R. Farjad-Rad, and M. A. Horowitz (IEEE Journal of Solid-State Circuits, May 1998)
SiGe Clock and Data Recovery IC with Linear-Type PLL for 10-Gb/s SONET Application
Y. M. Greshishchev and P. Schvan (IEEE Journal of Solid-State Circuits, Sept. 2000)
A 10-Gb/s CMOS Clock and Data Recovery Circuit with a Half-Rate Linear Phase Detector
J. Savoj and B. Razavi (IEEE Journal of Solid-State Circuits, May 2001)
A 10-Gb/s CMOS Clock and Data Recovery Circuit with Frequency Detection
J. Savoj and B. Razavi (Dig. International Solid-State Circuits Conference, Feb. 2001)
A 40-Gb/s Integrated Clock and Data Recovery Circuit in a 50-GHz f_T Silicon Bipolar Technology
M. Wurzer, J. Bock, H. Knapp, W. Zirwas, E. Schumann, and A. Felder (IEEE Journal of Solid-State Circuits, Sept. 1999)
A Fully Integrated 40-Gb/s Clock and Data Recovery IC With 1:4 DEMUX in SiGe Technology
M. Reinhold, C. Dorschky, E. Rose, R. Pullela, P. Mayer, E. Kunz, Y. Baeyens, T. Link, and J.-P. Mattia (IEEE Journal of Solid-State Circuits, Dec. 2001)
Index
[Figure: panels (a) and (b), capacitance versus VGS; region labels: accumulation, depletion, strong inversion; threshold VTH; structure on p-substrate.]
channel resistance, and (2) proper modeling for circuit simulations. We now study each issue.
Dynamic Range. Deep-submicron MOSFETs exhibit substantial overlap capacitance between the gate and source/drain
terminals. For example, in a typical 0.13-μm technology, a
transistor having minimum channel length, Lmin, displays an
overlap capacitance of 0.4 fF/μm and a gate-channel capacitance of 12 fF/μm². In other words, for an effective channel
length of 0.12 μm and a given width, the overlap capacitance
between the gate and source/drain terminals of a varactor constitutes 2 × 0.4 fF/(0.12 × 12 fF + 2 × 0.4 fF) ≈ 36% of the
total capacitance. Thus, even if the gate-channel component
varies by a factor of two across the allowable voltage range, the
overall dynamic range of the capacitance is given by (0.12 × 12
fF + 2 × 0.4 fF)/(0.12 × 6 fF + 2 × 0.4 fF) = 1.47.
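The arithmetic above can be checked with a short sketch. The per-unit capacitances and the effective length are the figures quoted in the text; the function name, the 1-μm width, and the `variation` parameter are illustrative assumptions (the result is independent of width because both components scale with it).

```python
# Values quoted in the text for a typical 0.13-um technology.
C_OVERLAP_PER_UM = 0.4   # overlap capacitance, fF per um of width, per side
C_GATE_PER_UM2 = 12.0    # gate-channel capacitance, fF per um^2
L_EFF_UM = 0.12          # effective channel length, um


def varactor_caps(width_um=1.0, variation=2.0):
    """Return (overlap fraction, Cmax/Cmin) for a MOS varactor.

    `variation` is the assumed ratio by which the gate-channel
    component changes across the allowable voltage range.
    """
    c_overlap = 2 * C_OVERLAP_PER_UM * width_um          # both sides
    c_channel_max = L_EFF_UM * width_um * C_GATE_PER_UM2
    c_channel_min = c_channel_max / variation
    overlap_fraction = c_overlap / (c_channel_max + c_overlap)
    dynamic_range = (c_channel_max + c_overlap) / (c_channel_min + c_overlap)
    return overlap_fraction, dynamic_range


frac, ratio = varactor_caps()
print(f"overlap fraction = {frac:.1%}, Cmax/Cmin = {ratio:.2f}")
```

Raising the channel length in this sketch (a larger `L_EFF_UM`) increases the ratio, which is the remedy discussed next, at the cost of Q.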
In order to widen the varactor dynamic range, the transistor
length can be increased, thereby raising the voltage-dependent
component while maintaining the overlap capacitance relatively constant. This remedy, however, leads to a greater resistance between the source and drain, lowering the Q. The resistance reaches a maximum for the most negative gate-source
voltage, at which the depletion region's width is maximum and
the path through the n-well the longest (Fig. 3). Note that
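$$Q_{var} = \int C_{var}\, dV_{GS} \tag{2}$$
$$Q_{var} = \frac{C_{max} - C_{min}}{2} V_0 \ln\cosh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2} V_{GS}, \tag{3}$$
$$I_{var} = \frac{dQ_{var}}{dt}. \tag{4}$$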
If used in charge-based analyses, Eq. (1) typically overestimates the tuning range of oscillators.
Fig. 3. Effect of n-well resistance in MOS varactor.
B. Inductors
The design of monolithic inductors has been studied extensively. The parameters of interest include the inductance, the
Q, the parasitic capacitance (i.e., the self-resonance frequency,
fSR), and the area, all of which trade with each other to some
extent. For a spiral structure such as that in Fig. 5, the line
width, the line spacing, the number of turns, and the outer
$$C_{var}(V_{GS}) = \frac{C_{max} - C_{min}}{2} \tanh\left(a + \frac{V_{GS}}{V_0}\right) + \frac{C_{max} + C_{min}}{2}. \tag{1}$$
Here, a and V0 allow fitting for the intercept and the slope,
respectively, and Cmin includes the overlap capacitance.
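As a rough illustration of why the capacitance model and the charge model must agree, the sketch below assumes the hyperbolic-tangent form of the C-V fit (consistent with the ln cosh term that survives in Eq. (3)) and checks numerically that the charge expression is its integral. The function names and all parameter values are hypothetical, chosen only for the demonstration.

```python
import math


def c_var(vgs, c_min, c_max, a, v0):
    """Assumed tanh fit of the varactor C-V characteristic."""
    return (0.5 * (c_max - c_min) * math.tanh(a + vgs / v0)
            + 0.5 * (c_max + c_min))


def q_var(vgs, c_min, c_max, a, v0):
    """Charge obtained by integrating c_var with respect to VGS."""
    return (0.5 * (c_max - c_min) * v0 * math.log(math.cosh(a + vgs / v0))
            + 0.5 * (c_max + c_min) * vgs)


def numeric_derivative(f, x, h=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Hypothetical fit parameters (capacitances in fF, voltages in V).
params = dict(c_min=1.52, c_max=2.24, a=0.0, v0=0.3)

for vgs in (-0.6, -0.2, 0.0, 0.3, 0.6):
    c_direct = c_var(vgs, **params)
    c_from_q = numeric_derivative(lambda v: q_var(v, **params), vgs)
    print(f"VGS = {vgs:+.1f} V: C = {c_direct:.4f} fF, dQ/dV = {c_from_q:.4f} fF")
```

A charge-based simulator effectively works with `q_var`; supplying it with a charge that is not the integral of the intended C-V curve changes the simulated tuning characteristic.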
The above model yields different characteristics in different
circuit simulation programs! Simulation tools that analyze
$$Q = \frac{R_p}{L_p\omega}. \tag{7}$$
(2) the flow of displacement current through the series combination of the inductor's parasitic capacitance and the substrate
resistance; (3) the flow of magnetically-induced ("eddy") currents in the substrate resistance. At low frequencies, the dc
resistance is dominant, and as the frequency rises, the other
components begin to manifest themselves.
With the above observations in mind, let us construct a circuit model for inductors. Depicted in Fig. 8(a) is a simple
model where Rs denotes the series resistance at the frequency
Fig. 6. (a) Parallel tank for definition of Q, (b) common-source stage using a
tank.
The utility of Eq. (7) can be seen in the example illustrated in Fig. 6(b). Here, the knowledge of Lp and
Q = Rp/(Lpω) directly provides the voltage gain and the
output swing, whereas the Q given by Eq. (6) serves no purpose.
The Q of inductors is limited by resistive losses: parasitic
resistances dissipate a fraction of the energy that is reciprocated between the inductor and the capacitor in a tank. Note
that the finite Q is also accompanied by generation of noise.
For example, in the circuit of Fig. 6(b), Rp produces an output noise voltage of Vn² = 4kTRp = 4kTQLpω per unit
bandwidth if Lp resonates with Cp.
The losses in inductors arise from three mechanisms (Fig.
7): (1) the series resistance of the spiral, including both low-frequency resistance and current crowding due to skin effect;
Fig. 9. (a) On-wafer measurement of inductor using coaxial probe, (b) calibration structure.
"signal" (S) pad and the other to the "ground" (G) pads. The
signal pad is sensed by the center conductor and the ground
pads by the outer shield of the coaxial probe.
Since the capacitance of the pads and the wires connecting to
the spiral is typically significant, the test device is accompanied
by a calibration structure [Fig. 9(b)], where the spiral itself
is omitted. The scattering (S) parameters of both structures
are measured by means of a network analyzer across the band
of interest and subsequently converted to Y parameters. Subtraction of the Y parameters of the calibration geometry from
those of the device under test yields the actual characteristics
of the spiral.
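The subtraction step can be sketched for a single frequency point as follows. This is a minimal illustration, assuming two-port data, a common 50-Ω reference impedance, and pad parasitics that appear purely in parallel with the spiral; the helper names and the component values in the usage example are hypothetical.

```python
IDENTITY = [[1, 0], [0, 1]]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def mat_sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]


def mat_scale(a, k):
    return [[k * a[i][j] for j in range(2)] for i in range(2)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def s_to_y(s, z0=50.0):
    """Y = (1/Z0) (I - S)(I + S)^-1 for equal real port impedances."""
    return mat_scale(mat_mul(mat_sub(IDENTITY, s),
                             mat_inv(mat_add(IDENTITY, s))), 1.0 / z0)


def y_to_s(y, z0=50.0):
    """S = (I - Z0 Y)(I + Z0 Y)^-1, the inverse of s_to_y."""
    zy = mat_scale(y, z0)
    return mat_mul(mat_sub(IDENTITY, zy), mat_inv(mat_add(IDENTITY, zy)))


def deembed(s_dut, s_cal, z0=50.0):
    """Subtract the calibration structure's Y parameters from the DUT's."""
    return mat_sub(s_to_y(s_dut, z0), s_to_y(s_cal, z0))


# Synthetic check: a series R-L branch between the ports plus pad capacitance.
omega = 2 * 3.141592653589793 * 2e9          # 2 GHz (hypothetical)
y_series = 1 / complex(5.0, omega * 5e-9)    # 5 ohm in series with 5 nH
y_spiral = [[y_series, -y_series], [-y_series, y_series]]
y_pads = [[1j * omega * 100e-15, 0], [0, 1j * omega * 100e-15]]

recovered = deembed(y_to_s(mat_add(y_spiral, y_pads)), y_to_s(y_pads))
print(recovered[0][0], y_series)
```

In practice the same subtraction is repeated at every measured frequency; series parasitics of the interconnect would require an additional calibration structure that this sketch does not model.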
An alternative method of measuring the Q of inductors is
illustrated in Fig. 10. Here, inductors are incorporated in
an oscillator and the tail current can be controlled externally.
In the laboratory measurement, the output is monitored on
a spectrum analyzer while Iss is reduced so as to place the
circuit at the edge of oscillation. Next, the value of Iss thus
obtained is used in the simulation of the oscillator and the
equivalent parallel resistance of each tank, Rp, is lowered
Fig. 12. (a) Cascade of inductively-loaded differential pairs, (b) layout of first
stage using a symmetric inductor, (c) layout of first stage using asymmetric
inductors.
Fig. 13. Stack of (a) metal 8 and metal 7 spirals, (b) metal 8 and metal 3
spirals.
Fig. 14. (a) Parallel combination of spirals to reduce metal resistance, (b)
tapered metal width, (c) patterned shield.
C. MOS Transistors
Fig. 15. (a) Differential stage for use in a ring oscillator, (b) effect of supply
noise.
Fig. 16. (a) Constant-current ring oscillator, (b) transistor-level implementation of (a).
rather than a voltage source, and frequency tuning is also accomplished through IDD. If IDD is designed for low sensitivity to VDD, then the oscillator remains relatively immune
to supply noise, the principal advantage of this configuration
over standard inverter-based rings that are directly connected
to the supply voltage.
In practice, the nonidealities associated with IDD limit the
supply rejection. Shown in Fig. 16(b) is a transistor implementation where M1 operates as a controlled current source. If
I1 is constant, V1 tracks VDD variations whereas VY does not,
yielding a change in IDD through channel-length modulation
in M1. Choosing long channels for M1 and M2 alleviates
this issue while necessitating wide channels as well to allow
a relatively small drain-source voltage for M1. However, the
resulting high drain junction capacitance of M1 at Y creates a
low-impedance path from VDD to this node at high frequencies. To suppress both resistive and capacitive feedthrough of
VDD noise, a bypass capacitor, CB, is tied from Y to ground.
However, the pole associated with this node now enters the
VCO transfer function, complicating the design of the PLL.
Let us now study the response of the circuit of Figs. 15(a)
and 16 to substrate noise, Vsub. In the former, Vsub manifests
itself through two mechanisms (Fig. 17): (1) by modulating the drain junction capacitance of M1 and M2 and hence
III. LC OSCILLATORS
LC oscillators have found wide usage in high-speed and/or
low-noise systems. Extensive research on inductors, varactors, and oscillator topologies has provided the grounds for
systematic design, helping to demystify the "black magic."
LC oscillators offer a number of advantages over ring structures: (a) lower phase noise for a given frequency and power
dissipation; (b) greater output voltage swings, with peak levels
that can exceed the supply voltage; and (c) ability to operate
at higher frequencies.
However, LC VCO design requires precise device and circuit modeling because (a) the narrow tuning range calls for
accurate prediction of the center frequency; (b) the phase noise
is greatly affected by the quality of inductors and varactors and
the noise of transistors. Also, occupying a large area, spiral
inductors pick up noise from the substrate and make it difficult
to incorporate many such oscillators on one chip.
The design of LC VCOs targets the following parameters:
center frequency, phase noise, tuning range, power dissipation,
voltage headroom, startup condition, output voltage swing,
and drive capability. The last two have often received less
attention, but they directly determine the design difficulty and
power consumption of the stages following the oscillator. That
is, a buffer placed after the VCO may consume more power
than the VCO itself!
A. Design Example
As an example of VCO design, let us consider the topology
shown in Fig. 18. Here, M1 and M2 present a small-signal
negative resistance of −2/gm1,2 between nodes X and Y,
[Eqs. (9) and (10) are illegible in the source; Eq. (10) gives Lp in terms of Iss, Q, and ω.]
where it is assumed the tank Q is limited by that of the inductor. Note that this calculation demands knowledge of the Q
before the inductance is computed, a minor issue because for a
given geometry and frequency of operation, the Q is relatively
independent of the inductance.5
We now determine the dimensions of M1 and M2. Increasing the channel length beyond the minimum value allowed
by the technology does not significantly lower γ unless the
length exceeds approximately 0.5 μm. For this reason, the
minimum length is usually chosen to minimize the capacitance contributed by the transistors. The transistors must be
wide enough to steer most of Iss while experiencing a voltage
swing of Vmin at nodes X and Y. Viewing M1 and M2 as a
differential pair, we note that M1 must turn off as VX − VY
reaches Vmin. For square-law devices,
$$V_{min} = \sqrt{\frac{2 I_{SS}}{\mu_n C_{ox} W/L}}, \tag{11}$$
and hence
$$W = \frac{2 I_{SS} L}{\mu_n C_{ox} V_{min}^2}, \tag{12}$$
but for short-channel devices, W must be obtained by simulations using proper device models. This choice of W typically
guarantees a small-signal loop gain greater than unity, enabling
the circuit to start at power-up.
With Lp computed from Eq. (10), the total capacitance
at nodes X and Y is calculated as Ctot = (Lpω²)⁻¹. This
capacitance includes the following fixed components: (1) the
parasitic capacitance of Lp, CLP; (2) the drain junction, gate-source, and gate-drain capacitances of M1 and M2, CDB +
CGS + 4CGD;6 and (3) the input capacitance of the next stage
(typically a buffer), CL. Thus, the allowable varactor capacitance is given by the difference between Ctot and the sum of
these components:
$$C_{var} = (L_p\omega^2)^{-1} - C_{LP} - C_{DB} - C_{GS} - 4C_{GD} - C_L. \tag{13}$$
... ωmax, where
$$\omega_{min} = \frac{1}{\sqrt{L_p (C_{var,max} + C_{fixed})}}, \tag{14}$$
$$\omega_{max} = \frac{1}{\sqrt{L_p (C_{var,min} + C_{fixed})}}, \tag{15}$$
and Cfixed = CLP + CDB + CGS + 4CGD + CL.
Figure 19(a) depicts the oscillator with MOS varactors directly tied to X and Y. Since the output common-mode level
Fig. 19. LC oscillator with (a) direct coupling and (b) capacitive coupling of
varactors to tanks.
5 We assume that, at a given frequency, the Q is relatively independent of
the inductance value.
6 Since CGD experiences a total voltage swing of 2Vmin, its Miller effect
translates to a factor of two for each transistor.
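The sizing steps in this design example can be sketched numerically. The sketch assumes square-law devices, so that W = 2·Iss·L/(μnCox·Vmin²), takes the varactor budget as Ctot minus the fixed capacitances, and takes the tuning limits as 1/sqrt(Lp·(Cvar + Cfixed)). All numerical values and function names below are hypothetical and serve only to exercise the formulas.

```python
import math


def required_width(i_ss, v_min, length, mu_cox):
    """Square-law width that lets one transistor turn off at Vx - Vy = Vmin."""
    return 2 * i_ss * length / (mu_cox * v_min ** 2)


def varactor_budget(l_p, f_osc, c_fixed):
    """Allowable varactor capacitance: Ctot = 1/(Lp w^2) minus fixed parts."""
    c_tot = 1.0 / (l_p * (2 * math.pi * f_osc) ** 2)
    return c_tot - c_fixed


def tuning_range(l_p, c_var_min, c_var_max, c_fixed):
    """Return (f_min, f_max) in Hz for the given varactor range."""
    f_min = 1.0 / (2 * math.pi * math.sqrt(l_p * (c_var_max + c_fixed)))
    f_max = 1.0 / (2 * math.pi * math.sqrt(l_p * (c_var_min + c_fixed)))
    return f_min, f_max


# Hypothetical design point: 2 nH tank, 5 GHz target, 200 fF of fixed capacitance.
L_P, F_OSC, C_FIXED = 2e-9, 5e9, 200e-15
c_var_nominal = varactor_budget(L_P, F_OSC, C_FIXED)
f_lo, f_hi = tuning_range(L_P, 0.8 * c_var_nominal, 1.2 * c_var_nominal, C_FIXED)
width = required_width(i_ss=2e-3, v_min=0.5, length=0.13e-6, mu_cox=300e-6)

print(f"Cvar budget = {c_var_nominal * 1e15:.1f} fF")
print(f"tuning range = {f_lo / 1e9:.3f} to {f_hi / 1e9:.3f} GHz")
print(f"W (square law) = {width * 1e6:.2f} um")
```

The fixed capacitance dilutes the varactor's own Cmax/Cmin, so the frequency range printed here is considerably narrower than the ±20% varactor variation might suggest. As the text notes, short-channel devices require simulation rather than the square-law width.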
Fig. 21. (a) VCO with fine and coarse digital control, (b) resulting characteristics.
REFERENCES
[1] B. Razavi, "Design of Monolithic Phase-Locked Loops
and Clock Recovery Circuits - A Tutorial," in Monolithic
Phase-Locked Loops and Clock Recovery Circuits, B.
Razavi, Ed., Piscataway, NJ: IEEE Press, 1996.
[2] P. Larsson, "Parasitic Resistance in an MOS Transistor
Used as On-Chip Decoupling Capacitor," IEEE J. Solid-State Circuits, vol. 32, pp. 574-576, April 1997.
[3] K. Kundert, Private Communication.
[4] M. Danesh et al., "A Q-Factor Enhancement Technique
for MMIC Inductors," Proc. IEEE Radio Frequency Integrated Circuits Symp., pp. 217-220, April 1998.
[5] A. Zolfaghari, A. Y. Chan, and B. Razavi, "Stacked Inductors and Transformers in CMOS Technology," IEEE
Journal of Solid-State Circuits, vol. 36, pp. 620-628,
April 2001.
[6] F. Behbahani et al., "A 2.4-GHz Low-IF Receiver for
Wideband WLAN in 0.6-μm CMOS," IEEE Journal of
Solid-State Circuits, vol. 35, pp. 1908-1916, December
2000.
[7] O. E. Akcasu, "High-Capacity Structures in a Semiconductor Device," US Patent 5,208,725, May 1993.
I. INTRODUCTION
II. DLL CHARACTERISTICS
Figure 3: Step response of PLL and DLL (with same loop characteristics).
the tracking of the phase of the input clock changes at
different frequencies. Based on the transfer function, the
loop bandwidth is ωbw = KPD·KDL·GF. For frequencies
within the loop bandwidth, the phase of the output clock will
track that of the reference input and reject noise within the
loop. The phase characteristics of the output clock above the
bandwidth of the loop depend on the phase behavior of the
delay-line input and the noise from the delay line. The noise
transfer function from a noise source lumped at the
delay-line output is a high-pass response.
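The low-pass tracking and high-pass noise behavior can be illustrated with a small sketch. It assumes a first-order loop whose filter is an ideal integrator, so that the tracking response is 1/(1 + s/ωbw) and the response to noise injected at the delay-line output is (s/ωbw)/(1 + s/ωbw), with ωbw = KPD·KDL·GF. These forms and the function names are assumptions made for illustration, not a transcription of the paper's equations.

```python
import math


def dll_tracking(omega, omega_bw):
    """Assumed first-order tracking response, evaluated at s = j*omega."""
    return 1 / complex(1, omega / omega_bw)


def dll_delay_line_noise(omega, omega_bw):
    """Assumed response to noise lumped at the delay-line output."""
    x = complex(0, omega / omega_bw)
    return x / (1 + x)


omega_bw = 2 * math.pi * 1e6   # hypothetical 1-MHz loop bandwidth
for ratio in (0.01, 0.1, 1.0, 10.0, 100.0):
    w = ratio * omega_bw
    track_db = 20 * math.log10(abs(dll_tracking(w, omega_bw)))
    noise_db = 20 * math.log10(abs(dll_delay_line_noise(w, omega_bw)))
    print(f"w/w_bw = {ratio:6.2f}: tracking {track_db:7.2f} dB, "
          f"delay-line noise {noise_db:7.2f} dB")
```

Under these assumptions both responses cross −3 dB at ωbw, the noise response rises at 20 dB/decade below it, and the two always sum to unity.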
$$\frac{1}{1 + 1/(K_{PD}K_{DL}G_F(s))} = \frac{1}{1 + s/(K_{PD}K_{DL}G_F)} \tag{1}$$
$$H(s) = \frac{s/(K_{PD}K_{DL}G_F)}{1 + s/(K_{PD}K_{DL}G_F)} \tag{2}$$
Figure 5: Delay line phase/delay characteristic.
Because the buffer input has finite slew rate, changing the
threshold effectively adjusts the output high and low
half-periods. The loop settles when the high and low
half-periods are equal. Figure 6 illustrates the reduction in
duty cycle as the threshold shifts from Vref1 to Vref2. Since
random variation of the duty cycle effectively appears as
jitter, single-ended implementations such as that shown in
the figure can be very sensitive to common-mode noise. For
this reason, differential architectures are preferred [3].
For low jitter on the output clock, the loop components
must be carefully designed. Many of the loop components
are very similar to those of a PLL and are well described in
[10]. For a charge-pump based loop filter, since the filter is
only first-order, a simple capacitor replaces the RC filter. As
in a PLL, noise on the control voltage directly translates into
jitter. Designers may use additional filtering to suppress the
noise. The loop element that has deviated the most from PLL
design, and that is critical for functionality and performance, is the
delay line.
III. DELAY-LINE ARCHITECTURES
Figure 8: Delay versus voltage for two different delay buffer elements: types (d) and (f) of Fig. 7 (resistive control (d), capacitive control (f)).
Figure 9: 180°-locked DLL to generate intermediate phases that are
a fraction of a cycle.
delay elements exhibit some nonlinearity. As a result, the
delay-line gain, KDL, is a function of the delay. Because a
DLL is unconditionally stable, the loop still functions with
the varying loop parameter. However, more linear elements
are better for designs that require a constant loop bandwidth.
To compensate for the variable KDL, designers add
programmability to the loop-filter capacitor.
The control signal for either type of delay elements can
be digital. In a digital implementation [11], the current
source is binary weighted and switched by a digital word.
For capacitively-controlled elements, the capacitance can be
binary weighted and switched. A nearly all-digital DLL is
then possible by using a simple counter to replace the analog
integrating filter.
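The counter-based loop described above can be sketched behaviorally. The model below assumes a bang-bang phase detector that reports only whether the delay line is too short or too long, a saturating up/down counter as the loop filter, and a delay line whose delay is linear in the digital code. The function name, the linear-delay assumption, and all numbers are hypothetical.

```python
def lock_counter_dll(target_ps, unit_ps, offset_ps=0.0, n_bits=8, steps=300):
    """Simulate an up/down counter steering a digitally controlled delay line.

    Returns a list of (code, delay_ps) pairs, one per phase comparison.
    """
    max_code = (1 << n_bits) - 1
    code = 0
    history = []
    for _ in range(steps):
        delay = offset_ps + code * unit_ps      # assumed linear delay line
        history.append((code, delay))
        if delay < target_ps:                   # output edge early: add delay
            code = min(code + 1, max_code)
        else:                                   # output edge late: remove delay
            code = max(code - 1, 0)
    return history


# Hypothetical example: lock to a 1003-ps delay with 10-ps steps.
trace = lock_counter_dll(target_ps=1003.0, unit_ps=10.0, offset_ps=200.0)
print(trace[-4:])
```

Once locked, the counter dithers between the two codes that bracket the target, so the residual phase error is bounded by one delay step; finer steps reduce this dithering jitter at the cost of more control bits.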
B. Phase Interpolation
Figure 11: Buffer-based phase interpolator linearity (horizontal axis: current partition).
Figure 14: Oversampled data recovery architecture.
Figure 15: Clock/data recovery using startable oscillator.
reference input with the oscillator output and tunes the delay
of the delay elements. Once locked, the resulting output
clock frequency is a multiple of the input reference
frequency. Recent designs [26] extend the frequency range
and use an interpolator instead of a multiplexer to blend the
delay-line feedback and the low-frequency reference clocks.
Both edge-combining multiplication and delay-line
multiplication reduce the phase noise of the output clock
because the core DLL does not have an oscillator that
accumulates phase error. After N cycles, where N is the
divide ratio, a new clean reference clock edge arrives and
resets any accumulated phase error to zero. The architecture
potentially lowers jitter by eliminating the peaking in the
transfer function and allows a high tracking bandwidth.
However, matching is critical in these designs. Mismatches
in the phase detector or charge pump result in a static phase
error that modulates the output frequency at the input
reference frequency. Similarly, in the edge combining
implementations, if the delay line is mismatched, the output
clock would contain significant reference tones. Designers
either choose the reference frequency carefully so that the
tones do not impact the system performance or employ
additional circuitry to compensate for the mismatches.
V. CONCLUSION
I. INTRODUCTION
PLL can be designed to generate a spectrally pure output signal at any integer multiple of the reference frequency, fref.
As indicated by the timing diagram in Figure 1, the loop
filter is updated by the charge pump once every reference period. This discrete-time behavior places an upper limit on the
loop bandwidth of approximately fref/10, above which the PLL
tends to be unstable [1]. In integrated circuit PLLs, it is common to further limit the bandwidth to approximately fref/20 to
allow for process and temperature variations.
The output frequency can be changed by changing N, but
N must be an integer, so the output frequency can be changed
only by integer multiples of the reference frequency. If finer
tuning resolution is required the only option is to reduce the
reference frequency. Unfortunately, this tends to reduce the
maximum practical loop bandwidth, thereby increasing the
settling time of the PLL, the noise contributed by the VCO,
and the in-band portions of the noise contributed by the reference source, the PFD, the charge pump, and the divider.
This fundamental tradeoff between bandwidth and tuning
resolution in integer-N PLLs creates problems in many applications. For example, a PLL that can be tuned from 2.402
GHz to 2.480 GHz in steps of 1 MHz is required to generate
the local oscillator signal in a direct conversion Bluetooth
transceiver [3]. An integer-N PLL capable of generating the
local oscillator signal from a commonly used crystal oscillator
frequency, 19.68 MHz, is shown in Figure 2. A reference frequency of fref = 40 kHz, the greatest common divisor of the
crystal frequency and the set of desired output frequencies, is
obtained by dividing the crystal oscillator signal by 492. The
resulting PLL output frequency is 60050 + 25k times the reference frequency, where k is an integer used to select the desired frequency step.
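The divider values in this example follow from a greatest-common-divisor calculation, sketched below. The crystal frequency, channel plan, and the fref/20 bandwidth guideline come from the text; the variable names are illustrative.

```python
from functools import reduce
from math import gcd

F_XTAL_HZ = 19_680_000
# Bluetooth channels: 2.402 GHz to 2.480 GHz in 1-MHz steps.
channels_hz = [2_402_000_000 + k * 1_000_000 for k in range(79)]

# The reference must divide the crystal frequency and every output frequency.
f_ref_hz = reduce(gcd, channels_hz, F_XTAL_HZ)
reference_divider = F_XTAL_HZ // f_ref_hz
feedback_dividers = [f // f_ref_hz for f in channels_hz]
max_bandwidth_hz = f_ref_hz / 20

print(f"f_ref = {f_ref_hz / 1e3:.0f} kHz, crystal divider = {reference_divider}")
print(f"N ranges from {feedback_dividers[0]} to {feedback_dividers[-1]}")
print(f"loop bandwidth limit = {max_bandwidth_hz / 1e3:.0f} kHz")
```

Replacing `F_XTAL_HZ` with a crystal frequency that is a multiple of 1 MHz raises the computed reference to 1 MHz, which shows why the choice of crystal constrains the achievable bandwidth of an integer-N design.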
The PLL achieves the desired output frequencies, but its
bandwidth is limited to approximately 2 kHz, i.e., fref/20. Unfortunately, with such a low bandwidth the settling time exceeds the 200 μs limit specified in the Bluetooth standard, and
the phase noise contributed by the VCO would be unacceptably high if it were implemented in present-day CMOS technology. One solution is to use a 1 MHz reference signal, but
this requires the crystal frequency to be an integer multiple of
1 MHz, or another PLL to generate a 1 MHz reference frequency. Unfortunately, in low cost consumer electronics applications such as Bluetooth, it is often desirable to be compatible with all of the popular crystal frequencies, so restricting the crystal frequencies to multiples of 1 MHz is not always
an option. In such cases, an additional PLL capable of generating the 1 MHz reference signal with very little phase noise
from any of the crystal frequencies is required, or, as de-
Figure 3: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has phase noise consisting of large spurious tones.
Figure 4: A fractional-N PLL that generates non-integer multiples of the reference frequency, but has a large amount of in-band phase noise.
can be used to circumvent this problem. The structure incorporates the same 9-level quantizer presented above, but in this
case the quantizer is preceded by two delaying discrete-time
integrators (i.e., accumulators), and surrounded by two feedback loops [8], [9]. Each discrete-time integrator has a transfer function of z⁻¹/(1 − z⁻¹), which implies that its nth output
sample is the sum of all its input samples for times k < n.
With the quantizer represented as an additive noise source as
depicted in Figure 6, the ΔΣ modulator can be viewed as a
two-input, single-output, linear time-invariant, discrete-time
system. It is straightforward to verify that
$$y[n] = x[n-2] + e_{\Delta\Sigma}[n], \tag{1}$$
where e_ΔΣ[n] is the overall quantization noise of the ΔΣ modulator and is given by
$$e_{\Delta\Sigma}[n] = e_q[n] - 2e_q[n-1] + e_q[n-2]. \tag{2}$$
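The input-output relation can be checked by simulation. The sketch below assumes one common realization of the structure: two delaying accumulators with feedback gains of 1 into the first and 2 into the second, and a 9-level quantizer with unit step that rounds to the nearest integer in the range −4 to 4. The feedback gains, the rounding rule, and the function names are assumptions chosen to be consistent with the relation above; the text does not spell them out.

```python
import math


def quantize(v, max_level=4):
    """9-level quantizer with unit step: round to nearest, clip to +/-4."""
    return max(-max_level, min(max_level, math.floor(v + 0.5)))


def delta_sigma(x):
    """Second-order modulator built from two delaying accumulators.

    Returns the output sequence y and the quantization error sequence e_q,
    where e_q[n] is the quantizer output minus the quantizer input.
    """
    v1 = v2 = 0.0            # accumulator states
    x_prev, y_prev = 0.0, 0  # one-sample delays
    ys, eqs = [], []
    for xn in x:
        v2_next = v2 + v1 - 2 * y_prev      # second accumulator, gain-2 feedback
        v1_next = v1 + x_prev - y_prev      # first accumulator, gain-1 feedback
        v1, v2 = v1_next, v2_next
        yn = quantize(v2)
        ys.append(yn)
        eqs.append(yn - v2)
        x_prev, y_prev = xn, yn
    return ys, eqs


alpha = 51 / 492                       # fractional value used in the text's example
x = [alpha] * 2000
y, e_q = delta_sigma(x)

# Compare y[n] with x[n-2] + e_q[n] - 2 e_q[n-1] + e_q[n-2].
worst = 0.0
for n in range(len(x)):
    x_d = x[n - 2] if n >= 2 else 0.0
    e1 = e_q[n - 1] if n >= 1 else 0.0
    e2 = e_q[n - 2] if n >= 2 else 0.0
    worst = max(worst, abs(y[n] - (x_d + e_q[n] - 2 * e1 + e2)))

print(f"largest deviation from the relation: {worst:.2e}")
print(f"mean output = {sum(y) / len(y):.4f}, input = {alpha:.4f}")
```

Because the error enters through a second difference, its sum over any interval telescopes to a bounded quantity, which is why the mean of the integer-valued output converges to the fractional input.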
Figure 7: (a) A power spectral density plot of the quantizer output in dB,
relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot
of the quantizer output, and (c) a time domain plot of the quantizer output
filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.
Figure 9: (a) A power spectral density plot of the ΔΣ modulator output in dB,
relative to the quantization step-size of Δ = 1, per Hz, (b) a time domain plot
of the ΔΣ modulator output, and (c) a time domain plot of the ΔΣ modulator
output filtered by a sharp lowpass filter with a cutoff frequency of 500 kHz.
Figure 10: The ΔΣ fractional-N PLL with the details of a commonly used
loop filter and a timing diagram relating to the charge pump output.
During the $n$th reference period the charge pump output is a current pulse of amplitude $I$ or $-I$ and duration $|t_n - \tau_n|$, where $t_n$ and $\tau_n$ are the times of the charge pump output transitions triggered by the positive-going edges of the divider output and reference signal, respectively. Therefore, the average current sourced or sunk by the charge pump during the $n$th reference period is $I(t_n - \tau_n)/T_{ref}$. In practice, the PFD is usually designed such that, except for a possible constant offset, this result holds even though the current sources have finite rise and fall times [2].
Figure 11: The ΔΣ fractional-N PLL linearized model. Except for the shaded region the model is identical to the corresponding integer-N PLL model.
The first step in deriving the model is to develop an expression for $t_n - \tau_n$. Ideally, $\tau_n = nT_{ref}$, but phase noise introduced by the reference source and PFD causes it to have the form
[equation (3) illegible in source]
where $\theta_{ref}(t)$ and $\theta_{PFD}(t)$ are the reference source and PFD phase noise functions, respectively. If the VCO output were ideal its positive-going edges would be spaced at uniform intervals of $T_{ref}/(N + \alpha)$, where $\alpha$ is the fractional part of the modulus (e.g., $\alpha = 51/492$ in Figure 5). Therefore, ideally,
[equation illegible in source]
which reduces to
[equation (4) illegible in source]
Subtracting (3) from (4) yields an expression for the average current sourced or sunk by the charge pump during the $n$th reference period:
$I(t_n - \tau_n)/T_{ref} =$ [right-hand side of equation (5) illegible in source]
As mentioned above, the phase noise terms are assumed to have bandwidths that are much smaller than the reference frequency. Consequently, the sampling of the phase noise functions in (5) can be neglected, and the charge pump output can be modeled as a smoothly varying function of time with an average value over each reference period equal to that of (5). With these approximations, (5) implies that
[equation (6) illegible in source]
where [symbol illegible in source] is the result of discrete-time integrating and converting to continuous-time the quantity $y[n] - \alpha$.
The ΔΣ fractional-N PLL linearized model follows directly from (6) and Figure 10. It is shown in Figure 11, where $i_n(t)$ represents the noise contributed by the charge pump current sources and the loop filter, and $Z(s)$ is the transfer function of the loop filter. The model specifies the phase noise transfer functions and loop dynamics of the PLL. For example, the model implies that
[equations (7) and (8) illegible in source; the legible fragments involve the loop gain $T(s)$ and the factor $1/(1 + T(s))$]
For the loop filter shown in Figure 10, the transfer function is
$$Z(s) = \frac{1}{C_1 + C_2} \cdot \frac{1 + sRC_2}{s\,[1 + sRC_1C_2/(C_1 + C_2)]}. \tag{9}$$
B. Differences Between the ΔΣ Fractional-N and Integer-N PLL Models
The shaded region in Figure 11 indicates the part of the model that is specific to ΔΣ fractional-N PLLs; except for the shaded region the model is identical to the corresponding model for integer-N PLLs. Therefore, each phase noise transfer function in an integer-N PLL is identical to the corresponding phase noise transfer function in a ΔΣ fractional-N PLL, except every occurrence of $N$ in the former is replaced by $N + \alpha$ in the latter. In most cases, $N \gg 1$ and $\alpha < 1$, so $N + \alpha \approx N$ and the corresponding transfer functions in integer-N and ΔΣ fractional-N PLLs are nearly identical in practice. Similarly, the loop dynamics and stability issues are nearly the same in ΔΣ fractional-N PLLs and integer-N PLLs.
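The ideal divider timing that underlies the model can be illustrated with exact arithmetic. The sketch below assumes an ideal VCO with period $T_{ref}/(N + \alpha)$ and a divider that counts $N + y[n]$ VCO periods in the $n$th reference period. For brevity it generates $y[n]$ with a first-order accumulator rather than the second-order modulator of the text; that substitution, the function name `divider_edge_times`, and the normalization $T_{ref} = 1$ are assumptions made only for illustration.

```python
from fractions import Fraction

def divider_edge_times(n_int, m, den, periods, t_ref=Fraction(1)):
    """Ideal divider output edge times for modulus N + y[n], with alpha = m/den."""
    alpha = Fraction(m, den)
    t_vco = t_ref / (n_int + alpha)   # ideal VCO period
    acc = 0                           # first-order accumulator state (assumed modulator)
    t = Fraction(0)
    times, ys = [], []
    for _ in range(periods):
        acc += m
        y = acc // den                # carry out: 0 or 1
        acc -= y * den
        t += (n_int + y) * t_vco      # divider counts N + y VCO periods
        times.append(t)
        ys.append(y)
    return times, ys, alpha, t_vco

times, ys, alpha, t_vco = divider_edge_times(122, 51, 492, 2000)

# Each edge deviates from n*T_ref by the accumulated (y[k] - alpha), scaled by the VCO period.
running = Fraction(0)
ok = True
for n, (t_n, y_n) in enumerate(zip(times, ys), start=1):
    running += y_n - alpha
    ok = ok and (t_n == n + t_vco * running) and (abs(t_n - n) < t_vco)
print(ok)
```

With this first-order accumulator the deviation of each divider edge from $nT_{ref}$ never reaches one VCO period, which is the sense in which the divider output tracks the reference on average.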
Figure 12: The example ΔΣ fractional-N PLL and frequency plan for generation of the Bluetooth wireless LAN RF channel frequencies. Labels legible in the figure: a 19.68 MHz reference, a divider modulus of $N + y[n]$, a second-order digital ΔΣ modulator with input $m/492$ and output $y[n] \in \{-1, 0, 1, 2\}$, a $\{0, 2^{-17}\}$ pseudo-random bit sequence, and an output frequency label of 2.402 GHz.
Frequency Plan:
To get k = 0, ..., or 18: set N = 122, m = k·25 + 26
To get k = 19, ..., or 38: set N = 123, m = (k − 19)·25 + 9
To get k = 39, ..., or 57: set N = 124, m = (k − 39)·25 + 17
To get k = 58, ..., or 79: set N = 125, m = (k − 58)·25
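The frequency plan can be checked with a few lines of exact arithmetic. This is a sketch under the assumption that the output frequency equals the reference frequency times $N + m/492$ and that channel $k$ lies at 2402 MHz + $k$ MHz; the function names `bluetooth_plan` and `output_mhz` are hypothetical.

```python
from fractions import Fraction

F_REF_MHZ = Fraction(1968, 100)  # 19.68 MHz reference, kept exact
DENOM = 492                      # denominator of the fractional part m/492

def bluetooth_plan(k):
    """Return (N, m) for channel index k, following the frequency plan of Figure 12."""
    if 0 <= k <= 18:
        return 122, 25 * k + 26
    if 19 <= k <= 38:
        return 123, 25 * (k - 19) + 9
    if 39 <= k <= 57:
        return 124, 25 * (k - 39) + 17
    if 58 <= k <= 79:
        return 125, 25 * (k - 58)
    raise ValueError("channel index outside the plan")

def output_mhz(n, m):
    """Assumed relation: f_out = f_ref * (N + m/492)."""
    return F_REF_MHZ * (n + Fraction(m, DENOM))

for k in (0, 1, 18, 19, 57, 58, 78):
    n, m = bluetooth_plan(k)
    print(k, n, m, float(output_mhz(n, m)))
```

Each step of 25 in $m$ moves the output by 19.68 MHz × 25/492 = 1 MHz, which is why every row of the plan uses a multiplier of 25.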
[equation (10) illegible in source; it gives the resulting phase noise PSD in dBc/Hz]
The argument of the log function has the form of a highpass function times a lowpass function, which is consistent with the claim in Section III that the PLL lowpass filters the primarily high frequency quantization noise from the ΔΣ modulator. It follows from (10) that the phase noise resulting from $e_m[n]$ can be decreased by reducing the PLL bandwidth or increasing the reference frequency. If a higher-order ΔΣ modulator is used, an equation similar to (10) results except that the exponent of the sinusoid is greater than two. This reduces the in-band portion of the quantization noise, but increases the out-of-band portion, which, depending upon the loop parameters of the PLL, can result in a somewhat lower overall phase noise. However, the PLL loop filter is highly constrained to maintain PLL stability, so the phase noise reduction that can be achieved by increasing the order of the ΔΣ modulator is limited in most applications [16].
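[equations (11) through (14) largely illegible in source; the legible fragments indicate that (11) expresses PM as an inverse tangent, that (12) has the form $f_{BW} \approx \frac{I K_{VCO} R}{2\pi N} \cdot \frac{b-1}{b}$, and that (14) gives the phase noise PSD $S_\theta(f)$]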
where PM is the phase margin of the PLL, $f_{BW}$ is the 3 dB bandwidth of the PLL, and $b = 1 + C_2/C_1$ is a measure of the separation between the two loop filter capacitors [22]. The derivations assume that $b$ is greater than about 10, and (14) is valid for frequencies greater than $(C_2 + C_1)/(2\pi R C_2 C_1)$.
These equations are sufficient to determine appropriate loop filter component values. For example, suppose $b$ is set to
49, so, as indicated by (11), the phase margin is approximately 70°. Solving (14) with the phase noise set to −130 dBc/Hz at $f = 3$ MHz indicates that $f_{BW} \approx 50$ kHz. Therefore, the phase noise resulting from ΔΣ modulator quantization noise is sufficiently suppressed with a 50 kHz bandwidth and a phase margin of 70°. With this information (12) can be solved to find $R = 960\ \Omega$, with which (13) and the definition of $b$ can be used to calculate $C_2 = 23$ nF and $C_1 = 480$ pF. It is straightforward to verify that the phase noise introduced by the loop filter resistor (the only noise source in the loop filter) is well below −130 dBc/Hz at offsets from the carrier of 3 MHz and above as required.
Figure 13 shows PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example PLL with the loop filter component values derived above. The heavy curve was calculated directly from the linearized model equations (7) through (10). The light curve was obtained through a behavioral computer simulation of the PLL. As is evident from the figure, the two curves agree very well, which suggests that the approximations made in obtaining the linearized model are reasonable.
Figure 13: Simulated and calculated PSD plots of the phase noise arising from ΔΣ modulator quantization noise for the example ΔΣ fractional-N PLL. (Legend: "Exact" simulation, Linearized Model; horizontal axis in Hz.)
An effect that does not have a counterpart in integer-N PLLs is the presence of zeros in the PSD of the phase noise arising from ΔΣ modulator quantization noise at multiples of the reference frequency. These zeros are a result of the discrete-to-continuous-time conversion of the ΔΣ modulator quantization noise; each zero is a sampling image of the dc zero imposed on the quantization noise by the ΔΣ modulator.
VI. ΔΣ FRACTIONAL-N PLL SPECIFIC PROBLEMS
One of the most significant problems specific to ΔΣ fractional-N PLLs is that they can be sensitive to modulus-dependent divider delays. In practice, each positive-going divider edge is separated from the VCO edge that triggered it by a propagation delay. Ideally, this propagation delay is independent of the corresponding divider modulus, in which case it introduces a constant phase offset but does not otherwise contribute to the phase noise. However, if the propagation delay depends upon the divider modulus and the number of ΔΣ modulator output levels is greater than two, the effect is that of a hard non-linearity applied to the ΔΣ modulator quantization noise. This tends to fold out-of-band ΔΣ modulator quantization noise to low frequencies and introduce spurious tones, which can significantly increase the PLL phase noise. The problem is analogous to that of multi-bit digital-to-analog converter step-size mismatches in analog ΔΣ data converters [23]. Unfortunately, circuit simulations are required to evaluate the severity of the problem on a case by case basis as both the extent of any modulus-dependent delays and their effect on the PLL phase noise are difficult to predict using hand analysis.
There are two well-known solutions to this problem. One solution is to resynchronize the divider output to the nearest VCO edge or at least a higher-frequency edge obtained from within the divider circuitry [22], [24]. The resynchronization erases memory of modulus-dependent delays and noise introduced within the divider circuitry, but care must be taken to ensure that the signal used for resynchronization is itself free of modulus-dependent delays. The primary drawback of the approach is that it increases power consumption.
The other solution is to use a ΔΣ modulator with single-bit (i.e., two-level) quantization. In this case, modulus-dependent delays give rise to phase error at the output of the divider that consists of a constant offset plus a scaled version of the ΔΣ modulator quantization noise. Since, by design, the ΔΣ modulator quantization noise has most of its power outside the PLL bandwidth, the modulus-dependent delays increase the phase noise only slightly. Unfortunately, ΔΣ modulators with single-bit quantization tend not to perform as well as ΔΣ modulators with multi-bit (i.e., more than two-level) quantization. For example, if the 9-level quantizer in the 48 Msample/s ΔΣ modulator example presented in Section IV were replaced by a one-bit quantizer, the dynamic range of the ΔΣ modulator in the zero to 500 kHz band would be reduced from 88.5 dB to approximately 65 dB. Moreover, unlike the 9-level quantizer case, the additive noise from the single-bit quantizer would not be white and would be correlated with the input sequence. Its variance would be input dependent and it would contain spurious tones.
These problems can be mitigated by using a higher-order ΔΣ modulator architecture to more aggressively suppress the in-band portion of the additive noise from the two-level quantizer. However, to maintain stability in a higher-order ΔΣ modulator with single-bit quantization, the useful input range of the ΔΣ modulator input signal must be reduced and more poles and zeros must be introduced within the feedback loop as compared to a multi-bit design with a comparable dynamic range. Even then, the problem of spurious tones persists, and it is difficult to predict where they will appear except through extensive simulation. Furthermore, to compensate for the restricted input range of the ΔΣ modulator the reference frequency must be large enough that all of the desired PLL output frequencies can be achieved. This can severely limit design flexibility. For example, if the magnitude of the ΔΣ modulator input signal were limited to less than 0.5 in the case
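The contrast drawn in this section between two-level and multi-level quantization can be illustrated with a small numerical sketch. With only two modulus values, any pair of delays lies exactly on a straight line in the modulator output, so the delay is a constant offset plus a scaled copy of the output. With more levels the delays generally do not lie on a line, and the remainder acts as a nonlinearity. The delay values below (in picoseconds) and the function name `affine_fit_residual` are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine_fit_residual(levels, delays_ps):
    """Largest deviation of the delays from the best straight line in the modulator output."""
    levels = np.asarray(levels, dtype=float)
    delays_ps = np.asarray(delays_ps, dtype=float)
    slope, offset = np.polyfit(levels, delays_ps, 1)  # least-squares line
    return float(np.max(np.abs(delays_ps - (slope * levels + offset))))

# Two-level modulator: delay is exactly "offset + scale * y".
two_level = affine_fit_residual([0, 1], [10.0, 13.0])

# Four-level modulator with hypothetical, unevenly spaced delays.
four_level = affine_fit_residual([-1, 0, 1, 2], [9.0, 10.0, 13.0, 13.5])

print(two_level, four_level)
```

The nonzero residual in the four-level case is the part of the delay that cannot be written as a scaled version of the modulator output; it is this part that folds out-of-band quantization noise into the PLL bandwidth.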
[equation (15) illegible in source]
The last term in (15), which is caused by having the ΔΣ modulator switch the divider modulus, represents a significant increase in the time during which the charge pump current sources are turned on each reference period. Consequently, the phase noise arising just from charge pump current source noise is larger in the ΔΣ fractional-N PLL by
$$A \log\left[\frac{\text{Average fractional-}N\text{ PLL charge pump "on time"}}{\text{Average integer-}N\text{ PLL charge pump "on time"}}\right]\ \text{dB},$$
where $A$ is a constant between 10 and 20. The value of $A$ depends upon the autocorrelation of the charge pump current source noise. For example, if the current source noise in successive charge pump pulses is completely uncorrelated, then $A$ is 10. Near the other extreme, $A$ is close to 20.
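As a rough worked example, the sketch below evaluates this increase for a hypothetical pair of average on times. It reads the expression above as $A \log_{10}$ of the on-time ratio, in dB; the function name and the 4 ns and 1 ns on times are assumptions made only for illustration.

```python
import math

def charge_pump_noise_increase_db(on_time_fractional, on_time_integer, a=10.0):
    """Increase in charge-pump-induced phase noise, read as A*log10(on-time ratio) in dB."""
    if not 10.0 <= a <= 20.0:
        raise ValueError("A is stated to lie between 10 and 20")
    return a * math.log10(on_time_fractional / on_time_integer)

# Hypothetical on times: 4 ns (fractional-N) versus 1 ns (integer-N).
print(charge_pump_noise_increase_db(4e-9, 1e-9, a=10.0))  # uncorrelated pulse-to-pulse noise
print(charge_pump_noise_increase_db(4e-9, 1e-9, a=20.0))  # near the fully correlated extreme
```

For a fourfold increase in on time, the penalty under this reading ranges from about 6 dB to about 12 dB depending on how strongly the current source noise is correlated between pulses.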
VII. TECHNIQUES TO WIDEN ΔΣ FRACTIONAL-N PLL LOOP BANDWIDTHS
A transmitter with virtually any modulation format can be implemented using D/A conversion to generate analog baseband or IF signals and upconversion to generate the final RF signal. However, many of the commonly used modulation formats in wireless communication systems such as MSK and FSK involve only frequency or phase modulation of a single carrier [25]. In such cases, the transmitted signal can be generated by modulating a radio frequency (RF) VCO, thereby eliminating the need for conventional upconversion stages and much of the attendant analog filtering. At least two approaches have been successfully implemented in commercial wireless transmitters to date. One is based on open-loop VCO modulation, and the other is based on ΔΣ fractional-N synthesis.
An example of a commercial transmitter that uses the open-loop VCO modulation technique is presented in [26] and [27], in this case for a DECT cordless telephone. Between transmit bursts, the desired center frequency is set relative to a reference frequency by enclosing the VCO within a conventional PLL. During each transmit burst the VCO is switched out of the PLL and the desired frequency modulation is applied directly to its input. The primary limitation of the approach is that it tends to be highly sensitive to noise and interference from other circuits. For example, in [27], the required level of isolation precluded the implementation of a single-
VIII. CONCLUSION
The additional concepts and issues associated with ΔΣ
REFERENCES
4. U. L. Rohde, Microwave and Wireless Synthesizers Theory and Design, John Wiley & Sons, 1997.
9. K. Uchimura, T. Hayashi, T. Kimura, A. Iwata, "Oversampling A-to-D and D-to-A converters with multistage noise shaping modulators," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-36, pp. 1899-1905, December 1988.
I. Galton, "One-bit dithering in delta-sigma modulator-based D/A conversion," Proc. of the IEEE International Symposium on Circuits and Systems, 1993.
16. W. Rhee, B. S. Song, A. Ali, "A 1.1-GHz CMOS fractional-N frequency synthesizer with a 3-b third-order ΔΣ modulator," IEEE Journal of Solid-State Circuits, vol. 35, no. 10, pp. 1453-1460, October 2000.
23. S. R. Norsworthy, R. Schreier, G. C. Temes, Eds., Delta-Sigma Data Converters, Theory, Design, and Simulation, New York: IEEE Press, 1997.
24. L. Lin, L. Tee, P. R. Gray, "A 1.4 GHz differential low-noise CMOS frequency synthesizer using a wideband PLL architecture," IEEE ISSCC Digest of Technical Papers, pp. 204-205, Feb. 2000.
25. J. G. Proakis, Digital Communications, fourth ed., McGraw Hill, 2000.
26. [entry illegible in source] Feb. 1995.
27. S. Heinen, K. Hadjizada, U. Matter, W. Geppert, V. Thomas, S. Weber, S. Beyer, J. Fenk, E. Matshke, "A 2.7 V 2.5 GHz bipolar chipset for digital wireless communication," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 40, pp. 306-307, Feb. 1997.
28. N. Filiol, et al., "A 22 mW Bluetooth RF transceiver with direct RF modulation and on-chip IF filtering," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 202-203, Feb. 2001.
29. M. H. Perrott, T. L. Tewksbury III, C. G. Sodini, "A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation," IEEE Journal of Solid-State Circuits, vol. 32, no. 12, pp. 2048-2059, Dec. 1997.
30. S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, B. McFarland, "An integrated 2.5GHz ΣΔ frequency synthesizer with 5 µs settling and 2Mb/s closed loop modulation," Digest of Technical Papers, IEEE International Solid-State Circuits Conference, vol. 43, pp. 200-201, Feb. 2000.

Ian Galton received the Sc.B. degree from Brown University in 1984, and the M.S. and Ph.D. degrees from the California Institute of Technology in 1989 and 1992, respectively, all in electrical engineering. Since 1996 he has been a professor of electrical engineering at the University of California, San Diego where he teaches and conducts research in the field of mixed-signal integrated circuits and systems for communications. Prior to 1996 he was with UC Irvine, the NASA Jet Propulsion Laboratory, Acuson, and Mead Data Central. His research involves the invention, analysis, and integrated circuit implementation of key communication system blocks such as data converters, frequency synthesizers, and clock recovery systems. The emphasis of his research is on the development of digital signal processing techniques to mitigate the effects of non-ideal analog circuit behavior with the objective of generating enabling technology for highly integrated, low-cost communication systems. In addition to his academic research, he regularly consults at several communications and semiconductor companies and teaches portions of various industry-oriented short courses on the design of data converters, PLLs, and wireless transceivers. He has served on a corporate Board of Directors and several corporate Technical Advisory Boards, and he is the Editor-in-Chief of the IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing.
I. INTRODUCTION
[Fig. 1 block labels: input NRZ data, variable delay block, retiming latch, retimed data, pulse conditioning (d/dt, X²), frequency extraction (BPF or PLL), recovered clock.]
Prior to the advent of fully monolithic designs, clock recovery was traditionally performed with some variant of the circuit in Fig. 1. The clock frequency component was typically extracted from the data stream using some combination of differentiation, rectification and filtering. The bandpass frequency filtering was provided by an LC tank, surface acoustic wave (SAW) filter, dielectric resonator or PLL. Because the clock recovery path was separate from the data retiming path, it was difficult to maintain optimum sampling phase alignment over process, temperature, data-rate, and voltage variations. Even the PLL techniques had the drawback of using phase detectors with different set-up times than the retiming flip-flop so that the recovered clock was not intrinsically aligned to the optimum sampling point in the data eye. Circuits utilizing SAW resonator filtering typically required hand matching of SAW and circuit temperature coefficients along with custom cut
Fig. 4. CDR PLL designs over time. The ratio of link speed to effective process transit frequency is plotted vs. year of publication. Multi-phase BB PLLs predominate as data rate approaches the process transit frequency limit. (The number of retiming phases used in each design is given in parentheses.) (Legend: Master Transition, Linear PLL, BB PLL; horizontal axis: year of publication, 1988 to 2002.)
The loop drives the falling edge of the VCO into alignment with the data transitions based on the binary-quantized phase error. Because the clock-to-Q delay of the retiming flip-flop is monolithically matched with the phase detector flip-flop, the PLL aligns the recovered clock precisely in the middle of the data eye with no first-order timing skew over process and temperature variations. Because the narrowest pulse is the output of a flip-flop, such detectors operate at the full speed at which a process is capable of building a functioning flip-flop. This ensures that the phase detector will not be the limiting factor in building the fastest possible retiming circuit.
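The action of a loop driven by a binary-quantized phase error can be sketched with a simple discrete-time model. The model below is an illustrative assumption, not the circuit of the text: at each update the phase error changes by a fixed drift (the frequency error times the update time) and by a fixed correction step whose sign is set by the sign of the current phase error. The names `simulate_first_order_bb`, `drift`, and `theta_bb` are hypothetical, and all quantities are in unit intervals.

```python
def simulate_first_order_bb(drift, theta_bb, steps=2000, theta0=0.0):
    """First-order bang-bang loop sketch.

    drift    -- phase change per update caused by the frequency error (assumed constant)
    theta_bb -- magnitude of the loop phase correction applied at every update
    """
    theta = theta0
    history = []
    for _ in range(steps):
        early = theta >= 0.0                       # binary decision: early or late, never "hold"
        theta += drift - (theta_bb if early else -theta_bb)
        history.append(theta)
    return history

locked = simulate_first_order_bb(drift=0.3e-3, theta_bb=1.0e-3)
slewing = simulate_first_order_bb(drift=1.5e-3, theta_bb=1.0e-3)

print(max(locked) - min(locked))   # bounded hunting, no larger than 2 * theta_bb
print(slewing[-1])                 # phase error runs away when drift exceeds theta_bb
```

In this model the loop hunts within a band no wider than twice the correction step as long as the drift per update is smaller than the step; once the drift exceeds the step, the correction can no longer keep up and the phase error grows without bound.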
[Figure residue: first-order bang-bang loop model and simulation plots showing in-lock and out-of-lock behavior (VCO frequency and phase error versus time in µseconds). The accompanying captions and equations (1) and (2) are not recoverable from the source.]
slightly too high and slightly too low creates a bounded hunting jitter ($J_{pp}$).
Jitter generation, lock range, and jitter tolerance are all inconveniently controlled by one parameter, $f_{bb}$.
To extend the loop tracking range independent of the jitter generation, an extra integrator is added between the phase detector and the VCO as in Fig. 9. Since the first-order loop dynamic produces a phase detector duty cycle proportional to the loop frequency error, this added integrator can be viewed as an automatic means for keeping the first-order portion of the loop properly centered on the average incoming data frequency. If certain assumptions are met, as described later, we can consider the system to be composed of two non-interacting loops. These are the loops labeled "bang-bang branch" and "integral branch." If the center frequency control loop is slow enough, the resulting loop behavior will be very similar to a simple first-order loop, but with an extended frequency lock range.
A. Stability Factor
The loop phase change in one update time due to the proportional connection is $\Delta\theta_{bb} = \beta V K_v t_{update}$. The [...]
$$\xi = \frac{\Delta\theta_{proportional}}{\Delta\theta_{integral}} = \frac{2\beta\tau}{t_{update}} \tag{3}$$
[equation (4) illegible in source]
From this, it can be seen that the second-order loop has two degrees of freedom, the loop phase step $\theta_{bb}$ (or equivalently, the
[Figure residue: Fig. 8 and Fig. 9; Fig. 9 shows the loop with an added integral branch driving the VCO. Captions are not recoverable from the source.]
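The stability factor can be cross-checked numerically. The sketch below assumes that the integral branch is an ideal integrator with time constant $\tau$, so that for a constant phase detector output $V$ its contribution to the VCO frequency ramps as $K_v V t/\tau$; that assumption, and the function names, are illustrative rather than taken from the text.

```python
def stability_factor(beta, tau, t_update):
    """Closed form: ratio of proportional to integral phase change in one update time."""
    return 2.0 * beta * tau / t_update

def phase_steps(beta, tau, t_update, v=1.0, kv=1.0, substeps=1000):
    """Phase change over one update time from each branch, for a constant detector output v."""
    # Proportional branch: constant frequency offset beta*v*kv held for t_update.
    d_prop = beta * v * kv * t_update
    # Integral branch (assumed ideal integrator): frequency ramps as kv*v*t/tau.
    # Integrate the ramp with the midpoint rule, which is exact for a linear integrand.
    dt = t_update / substeps
    d_int = sum(kv * v * ((i + 0.5) * dt) / tau * dt for i in range(substeps))
    return d_prop, d_int

d_prop, d_int = phase_steps(beta=0.02, tau=5e-6, t_update=4e-10)
print(d_prop / d_int, stability_factor(0.02, 5e-6, 4e-10))
```

The detector output $V$ and the VCO gain $K_v$ cancel in the ratio, which is why the stability factor depends only on $\beta$, $\tau$, and the update time.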
Fig. 10. Two equivalent second-order bang-bang loop block diagrams. The proportional phase-control signal flow is highlighted with a dashed line, and the integral frequency-control loop with a solid line.
Fig. 12 is a simulation in which the input frequency step is bigger than $f_{bb}$,
factor $\xi$ as a parameter.
load so that a loop can be easily designed to never slew for signals meeting a typical frequency-domain jitter tolerance specification.
IV. SLOPE OVERLOAD
A. Delta-Sigma Analogy
Fig. 14. Second-order loop response to sinusoidal input jitter.
tors through the last summing node prior to the quantizer. The update time interval is set to 1. The definition for bang-bang frequency step $f_{bb} = \beta K_v V$, and stability factor
Assuming no slew rate limiting, we can use the results from the ΔΣ analysis to justify replacing the loop quantizer with a unity gain element. The maximum input phase jitter in UI as a function
erance. The tolerance plots are single-pole slope for high $\xi$ and high jitter frequency, becoming double-pole at lower frequencies and small $\xi$. At high frequencies, all of the curves become asymptotic to the single-pole tolerance of a first-order bang-bang PLL. The operating region below each of these curves is where the ΔΣ approximation is valid, and where a linear loop analysis is justified.
Fig. 19. Loop redrawn replacing phase detector with unity gain element and additive quantization noise.
V. JITTER GENERATION
with parameters from [11].
generally taken to be the spectrum of the clock driving the data source or BERT, or in the case of a clock multiplying circuit, the spectrum of the reference clock corrected by 20 times the log of the loop frequency multiplication ratio.
It should be noted that the linearized loop model is only suitable for computation of the jitter spectrum but not for computing the actual sampling point phase error or other time-domain transient response. The linearized response only covers the dynamics of the outer frequency tracking loop, but does not capture the extra
[...] the loop phase step size. The total loop output jitter can be
result says that loops with large $\xi$ have output jitter which grows as the square root of the input jitter. Contrast this with a linear PLL which simply low-pass filters the input jitter and thus has an output jitter which grows linearly with the input jitter.
An approximate analysis of loop jitter can shed light on this curious square-root dependence of output jitter on input jitter. Assume a zero-mean input jitter distribution with a sigma $\sigma_j$. Using a linearized approximation to the standard probability distribution function, the probability of getting an "early" phase error indication for small loop phase deviation $\Delta\theta$,
is approximately
[equation illegible in source]
The expected phase change in the loop after one update time is
[equation illegible in source]
The discrete time equation for the average evolution of loop phase under the condition of a small input phase error can then be expressed as
[equation illegible in source]
[Figure residue: phase detector driven by a 50% duty-cycle clock from the VCO, with input data, transition samples, retimed data output, and UP/DOWN signals to the charge pump.]
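The linearization step in this argument can be illustrated numerically. The sketch below assumes Gaussian input jitter with standard deviation `sigma` and a small static phase offset `delta`, and compares the exact probability that the total phase error is positive with its first-order approximation $1/2 + \delta/(\sigma\sqrt{2\pi})$. Which sign the loop labels "early", the symmetric step of size `theta_bb`, and all function names are assumptions made for illustration.

```python
import math

def p_positive_exact(delta, sigma):
    """P(delta + jitter > 0) for zero-mean Gaussian jitter with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(delta / (sigma * math.sqrt(2.0))))

def p_positive_linear(delta, sigma):
    """First-order expansion of the Gaussian distribution function about zero offset."""
    return 0.5 + delta / (sigma * math.sqrt(2.0 * math.pi))

def expected_phase_step(delta, sigma, theta_bb):
    """Average correction per update if the loop steps -theta_bb on a positive
    decision and +theta_bb on a negative one (assumed sign convention)."""
    p = p_positive_linear(delta, sigma)
    return -theta_bb * (2.0 * p - 1.0)

sigma = 0.05   # hypothetical input jitter, in UI
delta = 0.005  # small loop phase deviation, in UI
print(p_positive_exact(delta, sigma), p_positive_linear(delta, sigma))
print(expected_phase_step(delta, sigma, theta_bb=0.001))
```

Under these assumptions the average correction is proportional to the phase deviation and inversely proportional to the input jitter, so larger input jitter lowers the effective gain of the detector.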
Table 1. Phase detector states. (The sample-value and UP/DOWN columns are not legible in the source.)
State   Meaning
0       hold
1       early
2       hold
3       late
4       late
5       hold
6       early
7       hold
The states 2 and 5 in Table 1 correspond to the normally impossible condition of sampling a "1" midway between two "0" bits. A custom truth table can use these states to detect a high bit-error-rate condition [10] or a VCO running grossly too slowly (e.g., by lumping these states into the "late" condition), or can treat them as an indication that a link has locked onto its own VCO crosstalk, perhaps through amplification of power supply noise picked up by a high-gain optical transimpedance amplifier [11].
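The state meanings in Table 1 can be reproduced with a few lines of logic. The sketch below assumes that the state number is the 3-bit value formed from three successive samples (a data sample, the transition sample between bits, and the next data sample), which is consistent with states 2 and 5 being the "impossible" patterns; the sample names `a`, `t`, `b` and the function name `decode_state` are hypothetical.

```python
def decode_state(a, t, b):
    """Classify three successive samples: data bit a, transition sample t, data bit b.

    Assumed convention: if the two data bits agree there is no transition to
    measure (hold); otherwise the transition sample matching the earlier bit
    means the clock sampled before the data edge (early), else after it (late).
    """
    if a == b:
        return "hold"
    return "early" if t == a else "late"

# State index assumed to be the binary value of (a, t, b).
TABLE = [decode_state((s >> 2) & 1, (s >> 1) & 1, s & 1) for s in range(8)]
for state, meaning in enumerate(TABLE):
    print(state, format(state, "03b"), meaning)
```

In this plain decoding, states 2 and 5 fall into "hold" because the two data bits agree; a custom truth table such as the one described above would reassign them.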
Although usable for tightly constrained block codes such as 8b/10b, binary phase detectors are essentially unusable for codes such as 10Gb Ethernet 64b/66b or SONET which can have very long runlengths of up to 66 or 80 bits, respectively.
B. Ternary Phase-Detector
The 3-state, or ternary phase detector provides superior jitter performance for data with long runs [12]. Ternary PDs neither charge nor discharge the loop filter during long runs, causing the loop to hold the current estimate of the data frequency. Such loops effectively "stop time" during long runs.
The peak idling jitter for ternary loops is unchanged from the simple 100% transition density analysis. The RMS jitter will be reduced by the average transition density. Because the loop phase cannot change during hold mode, the jitter tolerance will be derated by the average transition density. This can easily be taken into account by increasing $\theta_{bb}$ appropriately for the characteristics of the code to be used.
Since the stability factor is inversely dependent on update time, it is possible for binary PDs to become unstable with data patterns containing very long runs due to the delay in timely phase-error feedback.
Fig. 23. Setup for computing onset of loop instability with latency X.
[...] = S. From this we can compute an overshoot $\Delta\theta$. When the loop phase again crosses zero phase error, the phase detector is late in responding by a time X. This time is a combination of runlength, latency in the phase detector logic, and high-order poles in the VCO tuning characteristic.
[...] before the "braking" effect of the proportional branch starts to act. The onset of catastrophic instability occurs when [condition illegible in source], for this implies exponential growth of the acquisition transient. The convergence is guaranteed whenever [condition illegible in source].
Bang-bang CDR circuits have the unique advantages of inherent sampling phase alignment, adaptability to multi-phase sam-
ACKNOWLEDGMENT
The author is grateful for the contributions of Birdy Amrutur, Bill Brown, John Corcoran, Craig Corsetto, Dave DiPietro, Brian Donoghue, Jeff Galloway, Andrew Grzegorek, Tom Hornak, Jim Homer, Tom Knotts, Benny Lai, Adolf Leiter, Bill McFarland, Charles Moore, Rasmus Nordby, Cheryl Owen, Pat Petruno, Kent Springer, Guenter Steinbach, Hugh Wallace, Bin Wu, J.T. Wu, and Chu Yen for technical discussions and helpful insights into bang-bang loop behavior.
REFERENCES
[8] M. W. Hauser, "Principles of Oversampling A/D Conversion," J. Audio Eng. Soc., vol. 39, no. 1/2, pp. 3-26, Jan./Feb. 1991.
[11] R. C. Walker, C. Stout and C. Yen, "A 2.488Gb/s Si-Bipolar Clock and Data Recovery IC with Robust Loss of Signal Detection," in ISSCC Digest of Technical Papers, pp. 246-247, 466, Feb. 1997.
[19] R. Gu, J. M. Tran, H. Lin, A. Yee and M. Izzard, "A 0.5-3.5Gb/s Low-Power Low-Jitter Serial Data CMOS Transceiver," in ISSCC Digest of Technical Papers, pp. 352-353, 478, Feb. 1999.
[21] T. He and P. Gray, "A Monolithic 480 Mb/s AGC/Decision/Clock Recovery Circuit in 1.2 µm CMOS," IEEE Journal of Solid State Circuits, vol. 28, no. 12, pp. 1314-1320, Dec. 1993.
[22] P. Larsson, "A 2-1600MHz 1.2-2.5V CMOS Clock-Recovery PLL with Feedback Phase-Selection and Averaging Phase-Interpolation for Jitter Reduction," in ISSCC Digest of Technical Papers, pp. 356-357, Feb. 1999.
[35] Z. Wang, M. Berroth, J. Seibel, P. Hofmann, A. Hulsmann, Kohler, B. Raynor and J. Schneider, "19GHz Monolithic Integrated Clock Recovery Using PLL and 0.3 µm Gate-Length Quantum-Well HEMTs," in ISSCC Digest of Technical Papers, pp. 118-119, Feb. 1994.
Fig. 1. The block diagram of a frequency synthesizer. (Blocks: OSC, PFD, CP, LF, VCO, and a ÷N frequency divider.)
B. Direct Simulation
In many circumstances, SpectreRF* can be directly applied to
predict the noise performance of a PLL. To make this possible, the PLL must at a minimum have a periodic steady state
solution. This rules out systems such as bang-bang clock and
data recovery circuits and fractional-Af synthesizers because
they behave in a chaotic way by design. It also rules out any
PLL that is implemented with a phase detector that has a dead
zone. A dead zone has the effect of opening the loop and letting the phase drift seemingly at random when the phase of
the reference and the output of the voltage-controlled oscillator (VCO) are close. This gives these PLLs a chaotic nature.
To perform a noise analysis, SpectreRF must first compute
the steady-state solution of the circuit with its periodic steady
state (PSS) analysis. If the PLL does not have a periodic solution, as the cases described above do not, then it will not converge. There is an easy test that can be run to determine if a
circuit has a periodic steady-state solution. Simply perform a
transient analysis until the PLL approaches steady state and
† Spectre is a registered trademark of Cadence Design Systems.
For PLLs that are candidates for direct simulation using SpectreRF, simply configure the simulator to perform a PSS analysis followed by a periodic noise (PNoise) analysis. The period
of the PSS analysis should be set to be the same as the reference frequency as defined in Figure 1. The PSS stabilization
time (tstab) should be set long enough to allow the PLL to
reach lock. This process was successfully followed on a frequency synthesizer with a divide ratio of 40 that contained
2500 transistors, though it required several hours for the complete simulation [4].
C. When Direct Simulation Fails
The challenge still remains, how does one predict the phase
noise and jitter of PLLs that do not fit the constraints that
enable direct simulation? The remainder of this paper
attempts to answer that question for frequency synthesizers,
though the techniques presented are general and can be
applied to other types of PLLs by anyone who is sufficiently
determined.
D. Monte Carlo-Based Methods
Demir proposed an approach for simulating PLLs whereby a
PLL is described using behavioral models simulated at a high
level [5, 6]. The models are written such that they include jitter in an efficient way. He also devised a simulation algorithm
based on solving a set of nonlinear stochastic differential
equations that is capable of characterizing the circuit-level
noise behavior of blocks that make up a PLL [6, 7]. Finally,
he gave formulas that can be used to convert the results of the
noise simulations on the individual blocks into values for the
The process of predicting the jitter of a PLL with voltage-domain models involves:
1. Using SpectreRF to predict the noise of the individual
blocks that make up the PLL.
2. Converting the noise of the block to jitter.
3. Building high-level behavioral models of each of the
blocks that exhibit jitter.
4. Assembling the blocks into a model of the PLL.
5. Simulating the PLL to find the jitter of the overall system.
[Figure: phase-domain block diagram of the PLL showing the OSC, FD, PFD, CP, LF, and VCO blocks and the 1/N feedback divider.]
The mapping from voltage to frequency is designed to be linear, so a first-order model is often sufficient,
$f_{\text{out}} = K_{\text{vco}} v_c$.
(5)
(6)
(7)
(3)
where $K_{\text{det}} = I_{\max}$. Of course, this is only valid when the magnitude of the phase difference at the detector inputs is less than $2\pi$, at most. The behavior outside this range depends strongly on the type of phase detector used [3]. Even within this range, the phase detector may be better modeled with a nonlinear transfer characteristic. For example, there can be a flat spot in the transfer characteristics near 0 if the detector has a dead zone. However, it is generally not productive to model the dead zone in a phase-domain model.†
† This phase-domain model is a continuous-time model that ignores the sampling nature of the PFD. A dead zone interacts with the sampling nature of the PFD to create a chaotic limit cycle behavior that is not modeled with the phase-domain model. This chaotic behavior creates a substantial amount of jitter, and for this reason, most modern phase detectors are designed such that they do not exhibit dead zones.
A. Small-Signal Stability
This completes the derivation of the phase-domain models for each of the blocks. Now the full model is used to help predict the small-signal behavior of the PLL. Begin by using Figure 2 to write a relationship for its loop gain. Define
$G_{\text{fwd}} = \dfrac{K_{\text{det}} K_{\text{vco}} H(\omega)}{j\omega}$
(8)
to be the forward gain,
$G_{\text{rev}} = \dfrac{1}{N}$
(9)
to be the feedback factor, and
$T = G_{\text{fwd}} G_{\text{rev}}$
(10)
to be the loop gain. The loop gain is used to explore the small-signal stability of the loop. In particular, the phase margin is an important stability metric. It is the negative of the difference between the phase shift of the loop at unity gain and 180°, the phase shift that makes the loop unstable. It should be no less than 45° [14]. When concerned about phase noise or jitter, the phase margin is typically 60° or more to reduce peaking in the closed-loop gain, which results in excess phase noise.
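As a rough numerical illustration of the phase-margin guideline, the sketch below estimates the largest phase margin a charge-pump loop can reach with a second-order passive filter. The filter topology (c1 in series with r, the pair shunted by c2) and the function name are assumptions made for this example, not taken from the listings; the component values are the defaults that appear in Listing 1.

```python
import math

def max_phase_margin_deg(r, c1, c2):
    """Best-case phase margin for an assumed filter: c1 in series with r, shunted by c2.

    The VCO and the filter each contribute an integrator (-180 degrees total).
    The filter zero at 1/(r*c1) adds phase lead and the pole at 1/(r*cs) adds lag,
    where cs is the series combination of c1 and c2. The net lead peaks when the
    unity-gain frequency sits at the geometric mean of the zero and the pole.
    """
    cs = c1 * c2 / (c1 + c2)               # series combination of c1 and c2
    w_zero = 1.0 / (r * c1)                # zero frequency, rad/s
    w_pole = 1.0 / (r * cs)                # pole frequency, rad/s
    w_best = math.sqrt(w_zero * w_pole)    # crossover that maximizes the margin
    margin = math.atan(w_best / w_zero) - math.atan(w_best / w_pole)
    return math.degrees(margin), w_best

pm_deg, w_best = max_phase_margin_deg(r=10e3, c1=1e-9, c2=200e-12)
print(f"best-case phase margin: {pm_deg:.2f} degrees")
print(f"at a unity-gain frequency of {w_best / (2 * math.pi) / 1e3:.1f} kHz")
```

Under these assumptions a capacitor ratio of 5 only just clears the 45° floor, so reaching 60° or more would call for a larger ratio of c1 to c2.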
[Figure and equations: phase-domain model of the PLL with noise sources, followed by the closed-loop noise transfer functions $G_{\text{ref}}$, $G_{\text{det}}$, $G_{\text{fdm}}$, $G_{\text{fdn}}$, $G_{\text{in}}$, and $G_{\text{vco}}$.]
On this last transfer function, we have simply referred $\phi_{\text{det}}$ to the input by dividing through by the gain of the phase detector.
These transfer functions allow certain overall characteristics of phase noise in PLLs to be identified. As $\omega \to \infty$, $G_{\text{fwd}} \to 0$ because of the VCO and the low-pass filter, and so $G_{\text{ref}}, G_{\text{det}}, G_{\text{fdm}}, G_{\text{fdn}}, G_{\text{in}} \to 0$ and $G_{\text{vco}} \to 1$. At high frequencies, the noise of the PLL is that of the VCO. Clearly this must be so because the low-pass LF blocks any feedback at high frequencies.
Alternatively, one could build a Verilog-A model and use simulation to determine the result. The top level of such a model is shown in Listing 1. It employs noisy phase-domain models for each of the blocks. These models are given in Listings 3-7 and are described in detail in the next few sections (III-VI). In this example, the noise sources are coded into the models, but the noise parameters are not set at the top level to simplify the model. To predict the phase noise performance of the loop in lock, simply specify these parameters in Listing 1 and perform a noise analysis. To determine the effect of injected noise, first refer the noise to the output of one of the blocks, and then add a source into the netlist of Listing 1 at the appropriate place and perform an AC analysis.
`include "discipline.h"
module pll(out);
output out;
phase out;
parameter integer m = 1 from [1:inf);    // input divide ratio
parameter real Kdet = 1 from (0:inf);    // detector gain
parameter real Kvco = 1 from (0:inf);    // VCO gain
parameter real c1 = 1n from (0:inf);
parameter real c2 = 200p from (0:inf);
parameter real r = 10K from (0:inf);
parameter integer n = 1 from [1:inf);    // feedback divide ratio
phase in, ref, fb;
electrical c;
oscillator OSC(in);
divider #(.ratio(m)) FDm(in, ref);
phaseDetector #(.gain(Kdet)) PD(ref, fb, c);
loopFilter #(.c1(c1), .c2(c2), .r(r)) LF(c);
vco #(.gain(Kvco)) VCO(c, out);
divider #(.ratio(n)) FDn(out, fb);
endmodule
(17)
OSCILLATORS
a unit step s(t). The phase shift over time for an arbitrary input
disturbance u is
[Equations (18)-(25) and the accompanying log-log phase-noise plot.]
Use (21) to extract $f_c$. Then use both (21) and (26) to determine $n$ by choosing $\Delta f$ well above the flicker noise corner frequency, $f_c$, and the corner frequency of (25), $f_{\text{corner}}$, to avoid ambiguity, and well below $f_0$ to avoid the noise from other
sources that occur at these frequencies.
† Demir uses $c$ rather than $n$, where $n = c f_0^2$.
though they should not contain any white space. wpn was
chosen to represent white phase noise and fpn stands for
flicker phase noise.
The phase-domain models for the reference and voltage-controlled oscillators are given in Listings 3 and 4. The VCO
model is based on (6). Perhaps the only thing that needs to be
explained is the way that phase noise is modeled in the oscillators. Verilog-AMS provides the flicker_noise function for
modeling flicker noise, which has a power spectral density
proportional to $1/f^{\alpha}$ with $\alpha$ typically being close to 1. However, Verilog-AMS does not limit $\alpha$ to being close to one,
making this function well suited to modeling oscillator phase
noise, for which $\alpha$ is 2 in the white-phase noise region and
close to 3 in the flicker-phase noise region (at frequencies
below the flicker noise corner frequency). Alternatively, one
could dispense with the noise parameters and use the
noise_table function in lieu of the flicker_noise functions to
use the measured noise results directly. The "wpn" and "fpn"
When interested in the effect of signals coupled into the oscillator through the supplies or the substrate, one would compute the transfer function from the interfering source to the
phase output of the oscillator using either a PAC or PXF analysis. Again, one would simply assume that the perturbation in
the output of the oscillator is completely in the phase, which
is true except at very high offset frequencies. One then
employs (12) and (13) to predict the response at the output of
the PLL.
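The two-term oscillator phase-noise description used with flicker_noise earlier in this section (an exponent of 2 for the white region and 3 for the flicker region) can be checked numerically. The sketch below is only an illustration: the function name and the sample values of n and fc are hypothetical, and the units of n follow whatever the phase-domain listing uses.

```python
import math

def osc_phase_psd(f, n, fc):
    """Phase PSD of the assumed model: a n/f^2 term plus a n*fc/f^3 term.

    Mirrors a pair of flicker_noise sources with powers n and n*fc and
    exponents 2 and 3. The two terms are equal at f = fc.
    """
    return n / f**2 + n * fc / f**3

n, fc = 1.0, 1e3  # hypothetical noise parameter and flicker corner (Hz)

at_corner = osc_phase_psd(fc, n, fc)
# Slope per decade well above the corner, where the 1/f^2 term dominates.
slope_db = 10 * math.log10(osc_phase_psd(1e5, n, fc) / osc_phase_psd(1e6, n, fc))
print(f"PSD at the corner: {at_corner:.3e}")
print(f"fall-off per decade above the corner: {slope_db:.2f} dB")
```

Well below fc the same ratio approaches 30 dB per decade, which is one way to locate the flicker corner on a measured plot.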
IV. LOOP FILTER
Even in the phase-domain model for the PLL, the loop filter
remains in the voltage domain and is represented with a full
circuit-level model, as shown in Listing 5. As such, the noise
behavior of the filter is naturally included in the phase-domain model without any special effort, assuming that the
noise is properly included in the resistor model.
Listing 5 Loop filter model.
`include "discipline.h"
module loopFilter(n);
electrical n;
ground gnd;    // see footnote †
parameter real c1 = 1n from (0:inf);
parameter real c2 = 200p from (0:inf);
parameter real r = 10K from (0:inf);
electrical int;
analog begin
Theta(out) <+ flicker_noise(n, 2, "wpn")
+ flicker_noise(n*fc, 3, "fpn");
end
endmodule
† The ground statement is not currently supported in Cadence's Verilog-A implementation, so instead ground is explicitly passed into
the module.
V. PHASE DETECTOR AND CHARGE PUMP
[Figure 7: log-log plot of the noise spectrum, with a region where flicker sources dominate, a region where white sources dominate, the corner frequency f_c, and the noise bandwidth.]
Fig. 7. Extracting the noise parameters, n and f_c, for the PFD/CP. The graph is plotted on a log-log scale.
use the noise_table function in lieu of the white_noise and flicker_noise functions to use the measured noise results directly.
Listing 6 Phase-domain phase detector noise model.
`include "discipline.h"
`include "constants.h"
module phaseDetector(pin, nin, out);
input pin, nin; output out;
phase pin, nin;
electrical out;
parameter real gain = 1 from (0:inf);
// transfer gain (A/cycle)
parameter real n = 0 from [0:inf);
// white output current noise (A^2/Hz)
parameter real fc = 0 from [0:inf);
// flicker noise corner frequency (Hz)
analog begin
I(out) <+ gain * Theta(pin,nin) / (2*`M_PI);
I(out) <+ white_noise(n, "wpn")
+ flicker_noise(n*fc, 1, "fpn");
end
endmodule
A. Cyclostationary Noise.
Formally, the term cyclostationary implies that the autocorrelation function of a stochastic process varies with t in a periodic fashion [19, 20], which in practice is associated with a periodic variation in the noise power of a signal. In general, the noise produced by all of the nonlinear blocks in a PLL is strongly cyclostationary. To understand why, consider the noise produced by a logic circuit, such as the inverter shown in Figure 8. The noise at the output of the inverter, n_out, comes from different sources depending on the phase of the output signal, v_out. When the output is high, the output is insensitive to small changes on the input. The transistor M_P is on and the noise at the output is predominantly due to the thermal noise from its channel. This is region A in the figure. When the output is low, the situation is reversed and most of the output noise is due to the thermal noise from the channel of M_N. This is region B. When the output is transitioning, thermal noise from both M_P and M_N contributes to the output. In addition, the output is sensitive to small changes in the input. In fact, any noise at the input is amplified before reaching the output. Thus, noise from the input tends to dominate over the thermal noise from the channels of M_P and M_N in this region. Noise at the input includes noise from the previous stage and noise from both devices in the form of flicker noise and thermal noise from gate resistance. This is region C in the figure.
Fig. 8. Noise produced by an inverter (n_out) as a function of the output signal (v_out). In region A the noise is dominated by the thermal [...] in region C the output noise includes the thermal noise from both [...]
The challenge in estimating the effect of noise passing through a threshold is the difficulty in estimating the noise at the point where the threshold is crossed. There are several different ways of estimating the effect of this noise, but the simplest is to use the strobed noise feature of SpectreRF.* When
the strobed noise feature is active, the noise produced by the
circuit is periodically sampled to create a discrete-time random sequence, as shown in Figure 9. SpectreRF then computes the power-spectral density of the sequence. The sample
time would be adjusted to coincide with the desired threshold
crossings. Since the T-periodic cyclostationary noise process
is sampled every T seconds, the resulting noise process is stationary. Furthermore, the noise present at times other than at
the sample points is completely ignored.
[Figure 9: strobed noise, with the noisy signal sampled once per period to form a discrete-time sequence.]
$v_n(t) = v\left(t + \dfrac{\phi(t)}{2\pi f_0}\right)$
(28)
and
$n_i = v_n(iT) - v(iT)$.
(29)
Fig. 10. Extracting the noise parameters, n and f_c, for the divider.
Assume the phase noise is small and linearize v using a Taylor series expansion
$n_i \approx v(iT) + \dfrac{dv(iT)}{dt}\dfrac{\phi_i}{2\pi f_0} - v(iT) = \dfrac{dv(iT)}{dt}\dfrac{\phi_i}{2\pi f_0}$
(30)
$\phi_i = \dfrac{2\pi f_0}{dv(iT)/dt}\, n_i$
(31)
$S_\phi(f) = \left[\dfrac{2\pi f_0}{dv(t_c)/dt}\right]^2 S_n(f)$
(32)
where $S_n(f)$ and $S_\phi(f)$ are the power spectral densities of the
$n_i$ and $\phi_i$ sequences.
[...]
(33)
where $S_{\phi_0}$ and $T_0$ are the
phase noise and signal period at the
input to the first stage of the ripple counter.
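The small-signal linearization used above (voltage noise at a threshold crossing divided by the slew rate at that crossing, then scaled to phase) is easy to sanity-check numerically. The following is a minimal sketch under that assumption; the function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math

def voltage_noise_to_phase_psd(s_n, f0, slew_rate):
    """Convert a strobed voltage-noise PSD into a phase-noise PSD.

    Assumes small noise, so that a voltage error n at the crossing shifts the
    crossing time by n / slew_rate and the phase by 2*pi*f0 times that shift.
    PSDs scale with the square of that gain.
    """
    gain = 2 * math.pi * f0 / slew_rate   # rad per volt
    return gain**2 * s_n

def voltage_noise_to_time_error(v_noise, slew_rate):
    """Time shift of a threshold crossing caused by a small voltage error."""
    return v_noise / slew_rate

# Hypothetical values: 1e-16 V^2/Hz strobed noise, 1 GHz signal, 10 V/ns slew.
s_phi = voltage_noise_to_phase_psd(s_n=1e-16, f0=1e9, slew_rate=1e10)
dt = voltage_noise_to_time_error(v_noise=1e-3, slew_rate=1e10)
print(f"phase-noise PSD: {s_phi:.3e} rad^2/Hz")
print(f"crossing shift for 1 mV of noise: {dt:.1e} s")
```

A faster edge (larger slew rate) reduces both results, which is why sharp transitions at the threshold help keep divider and detector jitter low.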
With undesired variations in the supplies or in the substrate
the resulting phase noise in each stage would be correlated, so
one would need to compute the transfer function from the sig-
analog begin
Theta(out) <+ Theta(in) / ratio;
Theta(out) <+ white_noise(n, "wpn")
+ flicker_noise(n*fc, 1, "fpn");
end
endmodule
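[Figure: block diagram of a fractional-N synthesizer with PFD, CP, LF, VCO, a dual-modulus (÷N, N+1) frequency divider FD, and a modulator Mod; signals f_ref and f_out.]
FRACTIONAL-N SYNTHESIS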
The signals at the input and output of a PLL are often binary
signals, as are many of the signals within the PLL. The noise
on binary signals is commonly characterized in terms of jitter.
Jitter is an undesired perturbation or uncertainty in the timing
of events. Generally, the events of interest are the transitions
in a signal. One models jitter in a signal by starting with a
noise-free signal v and displacing time with a stochastic process j. The noisy signal becomes
$v_n(t) = v(t + j(t))$
(34)
between transitions. The next metric characterizes the correlations between transitions as a function of how far the transitions are separated in time.
(38)
$\phi(t) = 2\pi f_0\, j(t)$
(35)
where $f_0 = 1/T$ and
$v_n(t) = v\left(t + \dfrac{\phi(t)}{2\pi f_0}\right)$.
(36)
A. Jitter Metrics
Define $\{t_i\}$ as the sequence of times for positive-going zero
crossings, henceforth referred to as transitions, that occur in
$v_n$. The various jitter metrics characterize the statistics of this
sequence.
The simplest metric is the edge-to-edge jitter, $J_{ee}$, which is
the variation in the delay between a triggering event and a
response event. When measuring edge-to-edge jitter, a clean
jitter-free input is assumed, and so the edge-to-edge jitter $J_{ee}$
is
$J_{ee}(i) = \sqrt{\operatorname{var}(t_i)}$.
(37)
Edge-to-edge jitter assumes an input signal, and so is only
defined for driven systems. It is an input-referred jitter metric,
meaning that the jitter measurement is referenced to a point
on a noise-free input signal, so the reference point is fixed. No
such signal exists in autonomous systems. The remaining jitter metrics are suitable for both driven and autonomous systems. They gain this generality by being self-referred,
meaning that the reference point is on the noisy signal for
which the jitter is being measured. These metrics tend to be a
bit more complicated because the reference point is noisy,
which acts to increase the measured jitter.
<39>
Fig. 12. The various jitter metrics (panels: edge-to-edge jitter, k-cycle jitter, cycle-to-cycle jitter).
B. Types of Jitter
The type of jitter produced in PLLs can be classified as being
from one of two canonical forms. Blocks such as the PFD,
CP, and FD are driven, meaning that a transition at their output is a direct result of a transition at their input. The jitter
† Some people distinguish between k-cycle jitter and long-term jitter
by defining the long-term jitter $J_\infty$ as being the k-cycle jitter $J_k$ as
$k \to \infty$.
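Jitter Type  | Circuit Type                 | Period Jitter
synchronous  | driven (PFD/CP, FDs)         | $J_{ee} = \sqrt{\operatorname{var}(n_v(t_c))}\,/\,(dv(t_c)/dt)$
accumulating | autonomous (OSC, VCO)        | $J = \sqrt{aT}$

$J_k(i) = \sqrt{\operatorname{var}\left([(i+k)T + j_{\text{sync}}(t_{i+k})] - [iT + j_{\text{sync}}(t_i)]\right)}$
$J_k(i) = \sqrt{2}\, J_{ee}(i)$.
(46)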
Since $j_{\text{sync}}(t)$ is T-cyclostationary, $j_{\text{sync}}(t_i)$ is independent of $i$, and so are $J_{ee}$ and $J_k$. The factor of $\sqrt{2}$ in (46) stems
from the length of an interval including the independent variation from two transitions. From (46), $J_k$ is independent of $k$,
and so
$J_k = J$ for $k = 1, 2, \ldots$
(47)
Using similar arguments, one can show that with simple synchronous jitter,
$J_{cc} = \sqrt{3}\, J$,
(48)
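A quick Monte Carlo experiment can make the synchronous-jitter relationships concrete. The sketch below assumes simple synchronous jitter, that is, each transition is displaced from its ideal time by an independent Gaussian sample; the function names, the seed, and the numeric values are hypothetical choices for this illustration.

```python
import math
import random
import statistics

def sync_transitions(count, period, j_ee, seed=1):
    """Transition times iT + j_i with independent Gaussian displacements j_i."""
    rng = random.Random(seed)
    return [i * period + rng.gauss(0.0, j_ee) for i in range(count)]

def k_cycle_jitter(t, k):
    """Standard deviation of the length of k consecutive cycles."""
    return statistics.pstdev([t[i + k] - t[i] for i in range(len(t) - k)])

def cycle_to_cycle_jitter(t):
    """Standard deviation of the difference between adjacent periods."""
    periods = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    diffs = [periods[i + 1] - periods[i] for i in range(len(periods) - 1)]
    return statistics.pstdev(diffs)

J_EE = 1e-12                                   # assumed edge-to-edge jitter, 1 ps
t_sync = sync_transitions(20000, 1e-9, J_EE)
j_1 = k_cycle_jitter(t_sync, 1)                # period jitter J
j_10 = k_cycle_jitter(t_sync, 10)
j_cc = cycle_to_cycle_jitter(t_sync)
print(f"J / J_ee   = {j_1 / J_EE:.3f}")
print(f"J_10 / J   = {j_10 / j_1:.3f}")
print(f"J_cc / J   = {j_cc / j_1:.3f}")
```

With enough samples the three ratios should settle near the square root of 2, 1, and the square root of 3 respectively, which is the behavior the text describes for driven blocks such as the PFD/CP and the dividers.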
$v_n(t) = v(t + j_{\text{sync}}(t))$
exhibits synchronous jitter. If $\eta$ is further restricted to be a white Gaussian stationary or T-cyclostationary process, then $v_n(t)$ exhibits simple synchronous jitter. The essential characteristic of simple synchronous jitter is that the jitter in each event is independent or uncorrelated from the others, and (35) shows that it corresponds to white phase noise. Driven circuits exhibit simple synchronous jitter if they are broadband and if the noise sources are white, Gaussian and small. The sources are considered small if the circuit responds linearly to the noise, even though at the same time the circuit may be responding nonlinearly to the periodic drive signal.
Similarly, from (38),
[...]
(42)
[...]
(49)
where $t_c$ is the time of a threshold crossing in v (assuming the noise is small).
[Figure: a noise histogram of width Δv at the threshold maps, through the slope of the signal at the crossing time t_c, to a jitter histogram of width Δt.]
Generally $n_v$ is not stationary, but cyclostationary (refer back to Section VI-A). It is only important to know when the noisy periodic signal $v_n(t)$ crosses the threshold, so the statistics of $n_v$ are only significant at the time when $v_n(t)$ crosses the threshold,
$J_{ee} = \dfrac{\sqrt{\operatorname{var}(n_v(t_c))}}{dv(t_c)/dt}$.
(50)
With ripple counters, one usually only characterizes one stage at a time. The total jitter due to noise in the ripple counter is then computed by assuming that the jitter in each stage is independent (again, this is true for device noise, but not for noise coupling into the divider from external sources) and taking the square root of the sum of the squares of the jitter on each stage.
Unlike in ripple counters, jitter does not accumulate with synchronous counters. Jitter in a synchronous counter is independent of the number of stages and consists only of the jitter of
its clock along with the jitter of the last stage.
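The ripple-counter rule just described (independent stages combine as the square root of the sum of the squares) can be sketched in a few lines. The function name and the per-stage values below are hypothetical, and the rule only applies when the stage jitters really are independent, as with device noise.

```python
import math

def ripple_counter_jitter(stage_jitters):
    """Total jitter of a ripple counter whose stages jitter independently.

    Independent contributions add in variance, so the total is the square
    root of the sum of the squared per-stage jitters.
    """
    return math.sqrt(sum(j * j for j in stage_jitters))

# Hypothetical per-stage jitter values, in seconds.
stages = [3e-12, 4e-12]
total = ripple_counter_jitter(stages)
print(f"total ripple-counter jitter: {total:.2e} s")

# Eight identical stages: the total grows as sqrt(number of stages).
eight = ripple_counter_jitter([1e-12] * 8)
print(f"eight 1 ps stages: {eight:.3e} s")
```

Correlated disturbances, such as supply or substrate coupling, would add more nearly linearly and need a transfer-function treatment instead.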
2) Extracting the Jitter of the Phase Detector: The PFD/CP is
not followed by a threshold. Rather, it feeds into the LF,
which is sensitive to the noise emitted by the CP at all times,
not just during transitions. This argues that the noise of the
PFD/CP be modeled as a continuous noise current. However,
as mentioned earlier, doing so is problematic for simulators
and would require very tight tolerances and small time steps.
So instead, the noise of the PFD/CP is referred back to its
inputs. The inputs of the PFD/CP are edge triggered, so the
noise can be referred back as jitter.
transitions. In the PLL, the OSC and VCO exhibit accumulating jitter. Accumulating jitter is characterized by an undesired
variation in the time since the previous output transition, thus
the uncertainty of when a transition occurs accumulates with
every transition. Compared with a jitter free signal, the frequency of a signal exhibiting accumulating jitter fluctuates
randomly, and the phase drifts without bound. Thus, the jitter
appears as a modulation of the frequency of the output, which
is why it is sometimes referred to as frequency modulated or
FM jitter.
Again assume that $\eta$ is a stationary or T-cyclostationary process, then
$j_{\text{acc}}(t) = \int_0^t \eta(\tau)\, d\tau$
(55)
Similarly,
$J_{cc} = \sqrt{2}\, J$.
(59)
Generally, the jitter produced by the OSC and VCO are well
approximated by simple accumulating jitter if one can neglect
flicker noise.
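The accumulating case can be illustrated the same way with a small Monte Carlo run. The sketch below assumes simple accumulating jitter, where every period receives its own independent Gaussian error so that timing errors add up from cycle to cycle; names, seed, and values are hypothetical.

```python
import math
import random
import statistics

def accumulating_transitions(count, period, j, seed=2):
    """Transition times in which each period carries an independent error."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(count):
        times.append(t)
        t += period + rng.gauss(0.0, j)   # the error is never corrected
    return times

def k_cycle_jitter(t, k):
    """Standard deviation of the length of k consecutive cycles."""
    return statistics.pstdev([t[i + k] - t[i] for i in range(len(t) - k)])

def cycle_to_cycle_jitter(t):
    """Standard deviation of the difference between adjacent periods."""
    periods = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    diffs = [periods[i + 1] - periods[i] for i in range(len(periods) - 1)]
    return statistics.pstdev(diffs)

J = 1e-12                                      # assumed period jitter, 1 ps
t_acc = accumulating_transitions(40000, 1e-9, J)
j_1 = k_cycle_jitter(t_acc, 1)
j_16 = k_cycle_jitter(t_acc, 16)
j_cc = cycle_to_cycle_jitter(t_acc)
print(f"J_1 / J    = {j_1 / J:.3f}")
print(f"J_16 / J_1 = {j_16 / j_1:.3f}")
print(f"J_cc / J_1 = {j_cc / j_1:.3f}")
```

Here the k-cycle jitter grows as the square root of k (about 4 for k = 16) and the cycle-to-cycle jitter is about the square root of 2 times the period jitter, in contrast with the flat k-cycle jitter of driven blocks.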
$v_n(t) = v(t + j_{\text{acc}}(t))$
(56)
exhibits accumulating jitter. While $\eta$ is cyclostationary and so has bounded variance, (55) shows that the variance of $j_{\text{acc}}$, and hence the phase difference between $v(t)$ and $v_n(t)$, is unbounded.
If $\eta$ is further restricted to be a white Gaussian stationary or T-cyclostationary random process, then $v_n(t)$ exhibits simple
A. Extracting Accumulating Jitter
The jitter in autonomous blocks, such as the OSC or VCO, is
almost completely due to oscillator phase noise. Oscillator
phase noise is a variation in the phase of the oscillator as it
proceeds along its limit cycle.
In order to determine the period jitter $J$ of $v_n(t)$ for a noisy
oscillator, assume that it exhibits simple accumulating jitter
so that $\eta$ in (55) is a white Gaussian T-cyclostationary noise
process (this excludes flicker noise) with a power spectral
density of
$S_\eta(f) = a$,
(60)
and an autocorrelation function of
$R_\eta(t_1, t_2) = a\,\delta(t_1 - t_2)$,
(61)
where $\delta$ is a Kronecker delta function. Then
[...]
(71)
is
[...]
(72)
From (26),
[...]
(73)
[Figure: log-log plot of k-cycle jitter versus k, contrasting accumulating jitter from the VCO with synchronous jitter from the PFD/CP and FDs.]
Fig. 14. Differential LC oscillator.
The procedure starts by using an RF simulator such as SpectreRF to compute the normalized phase noise $L$. Its PNoise
analysis is used, with the maxsidebands parameter set to at
least 10 to adequately account for noise folding within the
oscillator.* In this case, $L = -110$ dBc at 100 kHz offset from
the carrier. Apply (74) to compute $a$ from $L$, where $L(\Delta f) = 10^{-11}$, $\Delta f = 100$ kHz, and $f_0 = 1.1$ GHz,
$a = 2 \times 10^{-11} \left(\dfrac{10^{5}}{1.1 \times 10^{9}}\right)^{2} = 165.3 \times 10^{-21}$.
(75)
The period jitter $J$ is then computed from (70),
$J = \sqrt{\dfrac{a}{f_0}} = \sqrt{\dfrac{165.3 \times 10^{-21}}{1.1\ \text{GHz}}} = 12.3$ fs.
(76)
In this example, the noise was extracted for the VCO alone. In
practice, the LF is generally combined with the VCO before
extracting the noise so that the noise of the LF is accounted
for.
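The arithmetic of (75) and (76) can be reproduced with a short script. This is only a check of the numbers quoted in the example; the function names are hypothetical, and the factor of 2 and the square-root relation are taken from the worked values in the text rather than from an independent derivation.

```python
import math

def a_from_phase_noise(l_dbc, delta_f, f0):
    """Jitter parameter a from the normalized phase noise L at offset delta_f.

    Follows the worked example: a = 2 * L * (delta_f / f0)^2, with L converted
    from dBc to a linear ratio.
    """
    l_linear = 10 ** (l_dbc / 10)
    return 2 * l_linear * (delta_f / f0) ** 2

def period_jitter(a, f0):
    """Period jitter J = sqrt(a * T) with T = 1 / f0."""
    return math.sqrt(a / f0)

a = a_from_phase_noise(l_dbc=-110, delta_f=100e3, f0=1.1e9)
jitter = period_jitter(a, f0=1.1e9)
print(f"a = {a * 1e21:.1f}e-21")
print(f"J = {jitter * 1e15:.1f} fs")
```

Because a scales with the square of the offset ratio, a 10 dB improvement in phase noise at the same offset reduces the period jitter by a factor of about 3.2.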
$J = \sqrt{aT}$
`include "discipline.h"
module divider (out, in);
$\Delta T$ is a random variable with variance
$\operatorname{var}(\Delta T) = 2\left(\dfrac{J}{\sqrt{2}}\right)^{2} = J^{2}$.
(78)
[...]
(79)
Therefore,
[...]
(81)
`include "discipline.h"
module osc (out);
analog begin
@(initial_step) begin
seed = 286;
next = 0.5/freq + $abstime;
end
The final model is given in Listing 12. This model can be easily
modified to fit other needs. Converting it to a model that generates sine waves rather than square waves simply requires
replacing the last two lines with one that computes and outputs the sine of the phase. When doing so, consider reducing
the number of jitter updates to one per period, in which case
the factor of 1.414 should be changed to 1.
Listing 13 is a Verilog-A model for a quadrature VCO that
exhibits accumulating jitter. It is an example of how to model
an oscillator with multiple outputs so that the jitter on the outputs is properly correlated.
D. Efficiency of the Models
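[Figure 16: block diagram of the VCO model: the input is scaled to a frequency, the frequency is integrated modulo 2π to give the phase, and the phase generates the output.]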
Listings 12 and 13, are constructed using three serial operations, as shown in Figure 16. First, the input signal is scaled to
compute the desired output frequency. Then, the frequency is
integrated to compute the output phase. Finally, the phase is
used to generate the desired output signal. The phase is computed with idtmod, a function that provides integration followed by a modulus operation. This serves to keep the phase
bounded, which prevents a loss of numerical precision that
would otherwise occur when the phase became large after a
long period of time. Output transitions are generated when the
phase passes $-\pi/2$ and $\pi/2$.
@(timer(next)) begin
n = !n;
dT = jitter*$dist_normal(seed,0,1);
next = next + 0.5/freq + 0.707*dT;
end
parameter real Vmin=0;
parameter real Vmax=Vmin+1 from (Vmin:inf);
parameter real Fmin=1 from (0:inf);
parameter real Fmax=2*Fmin from (Fmin:inf);
parameter real Vlo=-1, Vhi=1;
parameter real tt=0.01/Fmax from (0:inf);
parameter real jitter=0 from [0:0.25/Fmax); // period jitter
parameter real ttol=1u/Fmax from (0:1/Fmax);

parameter real Vmin=0;
parameter real Vmax=Vmin+1 from (Vmin:inf);
parameter real Fmin=1 from (0:inf);
parameter real Fmax=2*Fmin from (Fmin:inf);
parameter real Vlo=-1, Vhi=1;
parameter real jitter=0 from [0:0.25/Fmax); // period jitter
parameter real ttol=1u/Fmax from (0:1/Fmax);
parameter real tt=0.01/Fmax;
`include "discipline.h"
analog begin
@(initial_step) begin
accSeed = 286;
syncSeed = -459;
accSD = accJitter*sqrt(ratio/2);
syncSD = syncJitter;
next = 0.5/freq + $abstime;
end
Thus, to merge the divider into the VCO, the VCO gain must
be reduced by a factor of $N$, the period jitter increased by a
factor of $\sqrt{N}$, and the divider model removed.
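The merge rule just stated is simple enough to capture in a helper. The sketch below is an illustration only: the function names are hypothetical, and the sample VCO gain is an assumed value (the 1 GHz to 3 GHz range over an 8 V control span that appears in the example netlist gives 250 MHz/V).

```python
import math

def merge_divider_into_vco(kvco, j_vco, n):
    """Parameters of a VCO model with a divide-by-n divider folded into it.

    The gain drops by n and, because accumulating jitter grows as the square
    root of the number of cycles, the period jitter rises by sqrt(n).
    """
    return kvco / n, j_vco * math.sqrt(n)

def refer_jitter_to_vco_output(j_fd, n):
    """Undo the sqrt(n) scaling to recover the jitter at the true VCO output."""
    return j_fd / math.sqrt(n)

# Assumed values: 250 MHz/V gain, 6 ps VCO period jitter, divide ratio 10000.
k_merged, j_merged = merge_divider_into_vco(kvco=250e6, j_vco=6e-12, n=10000)
print(f"merged gain:   {k_merged:.3e} Hz/V")
print(f"merged jitter: {j_merged:.3e} s")
print(f"referred back: {refer_jitter_to_vco_output(j_merged, 10000):.3e} s")
```

Simulating the merged block avoids resolving every VCO cycle, which is where most of the simulation time would otherwise go.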
After simulation, it is necessary to refer the computed results,
which are from the output of the divider, to the output of the
VCO, which is the true output of the PLL. The period jitter at
the output of the VCO, $J_{\text{VCO}}$, can be computed with (82).
To determine the effect of the divider on $S_\phi(\omega)$, square both
sides of (82) and apply (70)
$J_{\text{FD}} = \sqrt{N}\, J_{\text{VCO}}$.
integer state;
analog begin
@(cross(V(ref), dir, ttol)) begin
if (state > -1) state = state - 1;
end
@(cross(V(vco), dir, ttol)) begin
if (state < 1) state = state + 1;
end
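(82)
$a_{\text{FD}} T_{\text{FD}} = N a_{\text{VCO}} T_{\text{VCO}}$
(83)
$T_{\text{VCO}} = T_{\text{FD}}/N$, and so
$a_{\text{FD}} = a_{\text{VCO}}$.
(84)
From (72),
$\dfrac{S_{\phi,\text{VCO}}}{f_{\text{VCO}}^{2}} = \dfrac{S_{\phi,\text{FD}}}{f_{\text{FD}}^{2}}$.
(85)
Finally, $f_{\text{VCO}} = N f_{\text{FD}}$, and so
$S_{\phi,\text{VCO}} = N^{2} S_{\phi,\text{FD}}$.
(86)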
Once FD_N is incorporated into the VCO, the VCO output signal is no longer observable; however, the characteristics of the
VCO output are easily derived from (82) and (86), which are
summarized in Table II.
It is interesting to note that while the frequency at the output
of FD_N is N times smaller than at the output of the VCO,
except for scaling in the amplitude, the spectrum of the noise
close to the fundamental is to a first degree unaffected by the
presence of FD_N. In particular, the width of the noise spec-
`include "discipline.h"
module vco (out, in);
parameter real Vmin=0;
parameter real Vmax=Vmin+1 from (Vmin:inf);
parameter real Fmin=1 from (0:inf);
parameter real Fmax=2*Fmin from (Fmin:inf);
parameter real ratio=1 from (0:inf);
parameter real Vlo=-1, Vhi=1;
parameter real tt=0.01*ratio/Fmax from (0:inf);
parameter real jitter=0 from [0:0.25*ratio/Fmax);
// VCO period jitter
parameter real ttol=1u*ratio/Fmax from (0:ratio/Fmax);
parameter real outStart=inf from (1/Fmin:inf);
real freq, phase, dT, delta, prev, Vout;
integer n, seed, fp;
analog begin
@(initial_step) begin
seed = -561;
delta = jitter * sqrt(2*ratio);
fp = $fopen("periods.m");
Vout = Vlo;
end

Table II (characteristics of the VCO output in terms of those at the output of FD_N)
Frequency   | $f_{\text{VCO}} = N f_{\text{FD}}$
Jitter      | $J_{\text{VCO}} = J_{\text{FD}}/\sqrt{N}$
Phase Noise | $S_{\phi,\text{VCO}} = N^{2} S_{\phi,\text{FD}}$
To understand why FD_N does not affect the width of the noise
spectrum, recall that while we started with a jitter that varied
continuously with time, j(t) in (34), for either efficiency or
modeling reasons we eventually sampled it to end up with a
discrete-time version. The act of sampling the jitter causes the
spectrum of the jitter to be replicated at the multiples of the
sampling frequency, which adds aliasing. This aliasing is visible, but not obvious, at high frequencies in Figure 18. However, especially with accumulating jitter, the phase noise
amplitude at low frequencies is much larger than the aliased
noise, and so the close-in noise spectrum is largely unaffected
by the sampling. The effect of FD_N is to decimate the sampled
jitter by a factor of N, which is equivalent to sampling the jitter signal, j(t), at the original sample frequency divided by N.
Thus, the replication is at a lower frequency, the amplitude is
lower, and the aliasing is greater, but the spectrum is otherwise unaffected.
XIV.
EXAMPLE
ahdl_include "osc.va"     // Listing 14
ahdl_include "pfd_cp.va"  // Listing 15
ahdl_include "vco.va"     // Listing 16
Osc (in) osc freq=25MHz ratio=125 \
    accJitter=20ps syncJitter=2ns
PFD (err in fb) pfd_cp Iout=500ua
C1 (err c) capacitor c=3.125nF
R (c 0) resistor r=10k
C2 (c 0) capacitor c=625pF
VCO (fb err) vco Fmin=1GHz Fmax=3GHz \
    Vmin=-4 Vmax=4 ratio=10000 \
    jitter=6ps outStart=10ms
JitterSim tran stop=60ms
[Figure: block diagram of the simulated PLL: Osc with ÷125 driving node in, PFD and CP driving node err, and VCO with ÷10,000 driving node fb.]
Fig. 17. Noise of the closed-loop PLL at the output of the VCO
when only the reference oscillator exhibits jitter (CL) versus the
noise of the reference oscillator mapped up to the VCO frequency
when operated open loop (OL). (Frequency axis: 300 Hz to 100 kHz.)
Fig. 18. Noise of the closed-loop PLL at the output of the VCO
when only the VCO exhibits jitter (CL) versus the noise of the VCO
when operated open loop (OL). (Frequency axis: 300 Hz to 100 kHz.)
Fig. 19. Noise of the closed-loop PLL at the output of the VCO
when only the PFD/CP, FD_M, and FD_N exhibit jitter (CL) versus the
noise of these components mapped up to the VCO frequency when
operated open loop (OL). (Frequency axis: 1 kHz to 100 kHz.)
Fig. 20. Closed-loop PLL noise performance compared to the open-loop noise performance of the individual components that make up
the PLL. The achieved noise is slightly larger than what is expected
from the components due to peaking in the response of the PLL. (Curves: VCO-OL, PFD/CP,FD-OL, OSC-OL, PLL-CL.)
XV. CONCLUSION
[1] Ken Kundert. "Introduction to RF simulation and its application." Journal of Solid-State Circuits, vol. 34, no. 9,
September 1999.
[17] F. X. Kaertner. "Analysis of white and $f^{-\alpha}$ noise in oscillators." International Journal of Circuit Theory and Applications, vol. 18, pp. 485-519, 1990.
[18] G. Vendelin, A. Pavio, U. Rohde. Microwave Circuit
Design. J. Wiley & Sons, 1990.
[19] W. Gardner. Introduction to Random Processes: With
Applications to Signals and Systems. McGraw-Hill,
1989.
[20] Joel Phillips and Ken Kundert. "Noise in mixers, oscillators, samplers, and logic: an introduction to cyclostationary noise." Proceedings of the IEEE Custom Integrated
Circuits Conference, CICC 2000. The paper and presentation are both available from
www.designers-guide.com.
[21] T. A. D. Riley, M. A. Copeland, and T. A. Kwasniewski.
"Delta-sigma modulation in fractional-N frequency synthesis." IEEE Journal of Solid-State Circuits, vol. 28, no.
5, May 1993, pp. 553-559.
[22] Frank Herzel and Behzad Razavi. "A study of oscillator
jitter due to supply and substrate noise." IEEE Transactions on Circuits and Systems - II: Analog and Digital
Signal Processing, vol. 46, no. 1, Jan. 1999, pp. 56-62.
[23] T. C. Weigandt, B. Kim, and P. R. Gray. "Jitter in ring
oscillators." 1994 IEEE International Symposium on
Circuits and Systems (ISCAS-94), vol. 4, 1994, pp. 27-30.
[24] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1991.
[25] J. J. Rael and A. A. Abidi. "Physical processes of phase
noise in differential LC oscillators." Proceedings of the
IEEE Custom Integrated Circuits Conference, CICC
2000.
[26] J. McNeill. "Jitter in Ring Oscillators." IEEE Journal of
Solid-State Circuits, vol. 32, no. 6, June 1997.
[27] H. Chang, E. Charbon, U. Choudhury, A. Demir, E. Felt,
E. Liu, E. Malavasi, A. Sangiovanni-Vincentelli, and I.
Vassiliou. A Top-Down Constraint-Driven Methodology
for Analog Integrated Circuits. Kluwer Academic Publishers, 1997.
[28] A. Oppenheim, R. Schafer. Digital Signal Processing.
Prentice-Hall, 1975.
[29] F. Harris. "On the use of windows for harmonic analysis
with the discrete Fourier transform." Proceedings of the
IEEE, vol. 66, no. 1, January 1978.
I. INTRODUCTION
[Figure 3: (a) receive path with low-noise amplifier, band-pass filter, and frequency synthesizer; (b) transmit path with amplifier and band-pass filter.]
Fig. 3. Effect of phase noise on (a) receive and (b) transmit paths.
Fig. 4. LC oscillator.
III. DEFINITIONS OF Q
Fig. 5. Common definitions of Q: (2) $Q = 2\pi \cdot \dfrac{\text{Energy Stored}}{\text{Energy Dissipated per Cycle}}$; (3) $Q = \dfrac{\omega_0}{2}\,\dfrac{d\phi}{d\omega}$.
IV. LINEAR OSCILLATORY SYSTEM
Oscillator circuits in general entail compressive nonlinearity, fundamentally because the oscillation amplitude is not
defined in a linear system. When a circuit begins to oscillate,
the amplitude continues to grow until it is limited by some
other mechanism. In typical configurations, the open-loop gain
of the circuit drops at sufficiently large signal swings, thereby
preventing further growth of the amplitude.
In this paper, we begin the analysis with a linear model. This
approach is justified as follows. Suppose an oscillator employs
strong automatic level control (ALC) such that its oscillation
amplitude remains small, making the linear approximation
$$\frac{Y}{X}[j(\omega_0 + \Delta\omega)] \approx \frac{-1}{\Delta\omega\,\dfrac{dH}{d\omega}} \qquad (5)$$
Since for $\omega \approx \omega_0$, $|H| = 1$, (6) can be written as
Fig. 9. Oscillatory system with nonunity-gain feedback.
$H(j\omega) = \ldots$ (13)
Fig. 10. CMOS VCO: (a) block diagram and (b) implementation of one stage.
is equal to $\overline{I_n^2} = 8kT(g_{m1} + g_{m3})/3 \approx 8kT/R$. Thus,
oscillatory system. Simulations indicate that for the oscillator topologies considered here, these two components have approximately equal magnitudes. Thus, the nonlinearity folds all the noise components below $\omega_0$ to the region above and vice versa, effectively doubling the noise power predicted by (6). Such components are significant if they are close to $\omega_0$ and are herein called high-frequency multiplicative noise. This phenomenon is illustrated in Fig. 12. (Note that a component at $3\omega_0 \pm \Delta\omega$ is also translated to $\omega_0 \pm \Delta\omega$, but its magnitude is negligible.)
This effect can also be viewed as sampling of the noise by the differential pairs, especially if each stage experiences hard switching. As each differential pair switches twice in every period, a noise component at $\omega_n$ is translated to $2\omega_0 \pm \omega_n$. Note that for highly nonlinear stages, the Taylor expansion considered above may need to include higher order terms.
The nonlinearity in the differential stages of Fig. 10, especially as they turn off, causes noise components to be multiplied by the carrier (and by each other). If the input/output characteristic of each stage is expressed as $V_{out} = \alpha_1 V_{in} + \alpha_2 V_{in}^2 + \alpha_3 V_{in}^3$, then for an input consisting of the carrier and a noise component, e.g., $V_{in}(t) = A_0 \cos\omega_0 t + A_n \cos\omega_n t$, the output exhibits the following important terms:
$$V_{out1}(t) \propto \alpha_2 A_0 A_n \cos(\omega_0 \pm \omega_n)t$$
$$V_{out2}(t) \propto \alpha_3 A_0 A_n^2 \cos(\omega_0 - 2\omega_n)t$$
$$V_{out3}(t) \propto \alpha_3 A_0^2 A_n \cos(2\omega_0 - \omega_n)t$$
e
.
.
00
..J
*-
00
M1
+ 1,s
(a)
R
4
L
k
+ ss
(b)
Fig. 15. Gain stage with (a) stationary and (b) cyclostationary noise.
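$$V_{out}(t) = A_0 \cos\left(\omega_0 t + K_{VCO}\int I_n \cos\omega_n t \, dt\right) \qquad (17)$$
For $K_{VCO} I_n/\omega_n \ll 1$,
$$V_{out}(t) \approx A_0 \cos\omega_0 t + \frac{A_0 K_{VCO} I_n}{2\omega_n}\left[\cos(\omega_0 + \omega_n)t - \cos(\omega_0 - \omega_n)t\right] \qquad (19)$$
(20)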
As with other analog circuits, oscillators exhibit a tradeoff between power dissipation and noise. Intuitively, we note
that if the output voltages of N identical oscillators are added
in phase (Fig. 16), then the total carrier power is multiplied
by N2, whereas the noise power increases by N (assuming
noise sources of different oscillators are uncorrelated). Thus,
the phase noise (relative to the carrier) decreases by a factor
N at the cost of a proportional increase in power dissipation.
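The scaling argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes ideal in-phase voltage addition and fully uncorrelated noise sources, exactly as the text stipulates.

```python
import math

def phase_noise_improvement_db(n_oscillators):
    """Relative phase-noise improvement when n identical oscillators are summed.

    Assumptions (from the text): output voltages add in phase, so carrier
    power scales as n**2, while uncorrelated noise powers add, scaling as n.
    """
    carrier_power_gain = n_oscillators ** 2
    noise_power_gain = n_oscillators
    return 10.0 * math.log10(carrier_power_gain / noise_power_gain)

for n in (1, 2, 4, 8):
    # Power dissipation grows in proportion to n as well.
    print(f"N = {n}: improvement = {phase_noise_improvement_db(n):.2f} dB, power x{n}")
```

Each doubling of the number of oscillators, and hence of the power, buys about 3 dB of phase noise.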
Using the equations developed above, we can also formulate this trade-off. For example, from (16), since $G_m R \approx 2$, we have
TABLE I
COMPARISON OF THREE-STAGE AND FOUR-STAGE RING OSCILLATORS

                     3-Stage VCO               4-Stage VCO
Open-Loop Q          $3\sqrt{3}/4$ (≈ 1.3)     $\sqrt{2}$ (≈ 1.4)
Power Dissipation    1.8 mW                    3.6 mW
Even though the gain stage of Fig. 10 is designed as a differential circuit, it nonetheless suffers from some sensitivity to supply and substrate noise (Fig. 17). Two phenomena account for this. First, device mismatches degrade the symmetry of the circuit. Second, the total capacitance at the common source of the differential pair (i.e., the source junction capacitance of M1 and M2 and the capacitance associated with the tail current source) converts the supply and substrate noise to current, thereby modulating the delay of the gain stage. Simulations indicate that even if the tail current source has a high dc output impedance, a 1-mV$_{pp}$ supply noise component at 10 MHz generates sidebands 60 dB below the carrier at $\omega_0 \pm (2\pi \times 10\ \text{MHz})$.
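To put the quoted sideband level in perspective, it can be converted into an equivalent supply sensitivity. The sketch below is a rough estimate under stated assumptions, not a result from the paper: it treats the supply noise as pure narrowband frequency modulation (each sideband is then $\beta/2$ relative to the carrier, with $\beta$ the modulation index), and it reads the supply noise amplitude as 1 mV peak-to-peak. The function name is hypothetical.

```python
def supply_sensitivity_hz_per_volt(sideband_dbc, f_mod_hz, v_peak_to_peak):
    """Estimate VCO supply sensitivity from a sideband level.

    Assumes narrowband FM by a single sinusoid: sideband/carrier = beta / 2,
    where beta = (peak frequency deviation) / (modulation frequency).
    """
    beta = 2.0 * 10.0 ** (sideband_dbc / 20.0)
    peak_deviation_hz = beta * f_mod_hz
    v_peak = v_peak_to_peak / 2.0
    return peak_deviation_hz / v_peak

k_supply = supply_sensitivity_hz_per_volt(-60.0, 10e6, 1e-3)
print(f"Equivalent supply sensitivity: {k_supply / 1e6:.1f} MHz/V")
```

Under these assumptions the figures in the text correspond to a supply sensitivity of roughly 40 MHz/V.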
$$g_m R = \frac{C_A}{C_A - C_D}$$
and
Fig. 18. (a) CMOS relaxation oscillator, (b) circuit of (a) redrawn, (c) noise current of one transistor, and (d) transformed noise current.
A. Simulation Issues
The time-varying nature of oscillators prohibits the use
of the standard small-signal ac analysis available in SPICE
and other similar programs. Therefore, simulations must be
performed in the time domain. As a first attempt, one may
generate a pseudo-random noise with known distribution,
introduce it into the circuit as a SPICE piecewise linear
waveform, run a transient analysis for a relatively large number
of oscillation periods, write the output as a series of points
equally spaced in time, and compute the fast Fourier transform
(FFT) of the output. The result of one such attempt is shown in
Fig. 19. It is important to note that 1) many coherent sidebands
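The post-processing step of this flow (equally spaced samples, then a Fourier transform) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' setup: the waveform is synthesized rather than taken from SPICE, a direct single-bin DFT stands in for the FFT, a Hann window is applied, and coherent sampling is assumed (every tone completes an integer number of cycles in the record). All names are hypothetical.

```python
import cmath
import math

def dft_bin(samples, k):
    """Direct DFT of one bin, normalized by the record length."""
    n = len(samples)
    return sum(s * cmath.exp(-2j * math.pi * k * i / n)
               for i, s in enumerate(samples)) / n

def sideband_dbc(samples, carrier_bin, sideband_bin):
    """Sideband level relative to the carrier after Hann windowing."""
    n = len(samples)
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]
    windowed = [s * w for s, w in zip(samples, hann)]
    carrier = abs(dft_bin(windowed, carrier_bin))
    sideband = abs(dft_bin(windowed, sideband_bin))
    return 20.0 * math.log10(sideband / carrier)

# Synthetic "transient output": a carrier plus one small sideband tone.
N_POINTS = 1024
waveform = [math.cos(2.0 * math.pi * 64 * i / N_POINTS)
            + 1e-3 * math.cos(2.0 * math.pi * 80 * i / N_POINTS)
            for i in range(N_POINTS)]
print(f"Sideband level: {sideband_dbc(waveform, 64, 80):.1f} dBc")
```

With coherent sampling the window does not leak the carrier into the sideband bin, so the injected level is recovered; with a real transient output the record rarely contains an integer number of periods, which is one reason windowing matters.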
B. Oscillator Simulations
In order to compute the response of oscillators to each noise source, we approximate the noise per unit bandwidth at frequency $\omega_n$ with an impulse (a sinusoid) of the same power at that frequency. As shown in Fig. 21, the sinusoidal noise is injected at various points in the circuit and the output spectrum is observed. This approach is justified by the fact that random Gaussian noise can be expressed as a Fourier series of sinusoids with random phase [8], [9]. Since only one sinusoid is injected in each simulation, the interaction among noise components themselves is assumed negligible, a reasonable approximation because if two noise components at, say, -60 dB are multiplied, the product is at -120 dB.
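The substitution of a sinusoid for noise in a unit bandwidth amounts to matching powers. The sketch below illustrates that bookkeeping; the function names and the example density are hypothetical, and the only physics assumed is that a sinusoid of peak amplitude $A$ carries mean-square value $A^2/2$.

```python
import math

def equivalent_sinusoid_amplitude(noise_psd, bandwidth_hz=1.0):
    """Peak amplitude of a sinusoid carrying the same power as the noise
    contained in `bandwidth_hz` around the frequency of interest.

    noise_psd is a mean-square density (e.g., A^2/Hz or V^2/Hz).
    """
    return math.sqrt(2.0 * noise_psd * bandwidth_hz)

def product_level_db(level_a_db, level_b_db):
    """Level of the product of two components given in dB relative to the
    carrier: multiplying amplitudes adds their dB values."""
    return level_a_db + level_b_db

# Hypothetical noise current density of 8e-22 A^2/Hz in a 1-Hz bandwidth.
print(equivalent_sinusoid_amplitude(8e-22))   # peak amplitude in amperes
print(product_level_db(-60.0, -60.0))         # interaction of two noise tones
```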
In the simulations, the oscillators were designed for a center
frequency of approximately 970 MHz. Each circuit and its
linearized models were simulated in the time domain in steps
IX. EXPERIMENTAL RESULTS
A. Measurements
Fig. 20. Simple simulation revealing effect of pulse waveforms: (a) single sinusoidal source and (b) sinusoidal source along with a square wave generator. (Horizontal axis: frequency, × 100 MHz.)
Fig. 22. Simulated output spectra of (a) linear model and (b) actual circuit of a three-stage CMOS oscillator. (Horizontal axis: frequency, × 100 MHz.)
Fig. 23. Simulated output spectra of (a) linear model and (b) actual circuit of a four-stage CMOS oscillator. (Horizontal axis: frequency, × 100 MHz.)
[Schematic annotations: $g_m$ = 1/(214 Ω); M5-M6: W/L = 13.4 µm/0.5 µm, $g_m$ = 1/(630 Ω); M9: W/L = 13.4 µm/0.5 µm; I = 790 µA, $g_m$ = 1/(530 Ω); W/L = 100 µm/0.5 µm, $g_m$ = 1/(84 Ω); R = 275 Ω; C = 0.6 pF; $I_{SS}$ = 3 mA.]
Fig. 25. Measured output spectrum of ring oscillator (10 dB/div. vertical scale). (a) 5 MHz/div. horizontal scale and 1 MHz resolution bandwidth, (b) 1 MHz/div. horizontal scale and 100 kHz resolution bandwidth.
B. Discussion
Using the above measured data points and assuming a noise
shaping function as in (10) with a linear noise-power trade-off
(Fig. 16), we can make a number of observations.
(wg)zL
aw
Vswingz
__
1
IDD
(35)
ACKNOWLEDGMENT
The author wishes to thank V. Gopinathan for many illuminating discussions and T. Aytur for providing the
oscillator simulation and measurement results.
REFERENCES
[1] A. A. Abidi and R. G. Meyer, "Noise in relaxation oscillators," IEEE J. Solid-State Circuits, vol. SC-18, pp. 794-802, Dec. 1983.
[2] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. ISCAS, June 1994.
[3] A. A. Abidi, "Direct conversion radio transceivers for digital communications," in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 186-187.
[4] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," Proc. IEEE, pp. 329-330, Feb. 1966.
[5] B. Razavi, K. F. Lee, and R.-H. Yan, "Design of high-speed low-power frequency dividers and phase-locked loops in deep submicron CMOS," IEEE J. Solid-State Circuits, vol. 30, pp. 101-109, Feb. 1995.
[6] Y. P. Tsividis, Operation and Modeling of the MOS Transistor. New York: McGraw-Hill, 1987.
[7] J. A. Connelly and P. Choi, Macromodeling with SPICE. Englewood Cliffs, NJ: Prentice-Hall, 1992.
[8] S. O. Rice, "Mathematical analysis of random noise," Bell System Tech. J., pp. 282-332, July 1944, and pp. 46-156, Jan. 1945.
[9] P. Bolcato et al., "A new and efficient transient noise analysis technique for simulation of CCD image sensors or particle detectors," in Proc. CICC, 1993, pp. 14.8.1-14.8.4.
[10] T. Kwasniewski et al., "Inductorless oscillator design for personal communications devices-A 1.2 µm CMOS process case study," in Proc. CICC, May 1995, pp. 327-330.
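Correspondence
Corrections to "A General Theory of Phase Noise in Electrical Oscillators"
Kevin J. McGee
I. INTRODUCTION
$$\mathcal{L}\{\Delta\omega\} = 10 \cdot \log\left(\frac{\dfrac{\overline{i_n^2}}{\Delta f}\displaystyle\sum_{n=0}^{\infty} c_n^2}{4 q_{max}^2 \Delta\omega^2}\right)$$
$$\Delta\omega_{1/f^3} = \omega_{1/f} \cdot \left(\frac{c_0}{2\Gamma_{rms}}\right)^2 \approx \omega_{1/f} \cdot \frac{1}{2}\left(\frac{c_0}{c_1}\right)^2$$
This will result in the factor of 1/2 becoming redundant in (29), i.e.,
$$\mathcal{L}\{\Delta\omega\} = 10 \cdot \log\left[\frac{kT}{V_{max}^2} \cdot \frac{1}{R_p \cdot (C\omega_0)^2} \cdot \left(\frac{\omega_0}{\Delta\omega}\right)^2\right]$$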
AND
BIT REVERSERS
I. INTRODUCTION
DUE to their integrated nature, ring oscillators have become an essential building block in many digital and communication systems. They are used as voltage-controlled oscillators (VCOs) in applications such as clock recovery circuits for serial data communications [1]-[4], disk-drive read channels [5], [6], on-chip clock distribution [7]-[10], and integrated frequency synthesizers [10], [11]. Although they have not found many applications in radio frequency (RF), they can be used for some low-tier RF systems.
Recently, there has been some work on modeling jitter and phase noise in ring oscillators. References [12] and [13] develop models for the clock jitter based on time-domain treatments for MOS and bipolar differential ring oscillators, respectively. Reference [14] proposes a frequency-domain approach to find the phase noise based on a linear time-invariant model for differential ring oscillators with a small number of stages.
In this paper, we develop a parallel treatment of frequency-domain phase noise [15] and time-domain clock jitter for ring oscillators. We apply the phase-noise model presented in [16] to obtain general expressions for jitter and phase noise of the ring oscillators.
The next section briefly reviews the phase-noise model presented in [16]. In Section III, we apply the model to timing jitter and develop an expression for the timing jitter of oscillators, while Section IV provides the derivation of a closed-form expression to calculate the rms value of the impulse sensitivity function (ISF). Section V introduces expressions for jitter and phase noise in single-ended and differential ring oscillators
Manuscript received April 8, 1998; revised November 2, 1998.
A. Hajimiri is with the California Institute of Technology, Pasadena, CA
91125 USA.
S. Limotyrakis and T. H. Lee are with the Center for Integrated Systems,
Stanford University, Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(99)04200-6.
where
$$\phi(t) = \frac{1}{q_{max}}\int_{-\infty}^{t}\Gamma(\omega_0\tau)\, i(\tau)\, d\tau \qquad (5)$$
where $i(t)$ represents the noise current injected into the node of interest. Note that the integration arises from the closed-loop nature of the oscillator. The single-sideband phase-noise spectrum due to a white-noise current source is given by [16]$^1$
$$\mathcal{L}\{f_{off}\} = \frac{\Gamma_{rms}^2}{8\pi^2 f_{off}^2} \cdot \frac{\overline{i_n^2}/\Delta f}{q_{max}^2} \qquad (6)$$
where $\Gamma_{rms}$ is the rms value of the ISF, $\overline{i_n^2}/\Delta f$ is the single-sideband power spectral density of the noise current source, and $f_{off}$ is the frequency offset from the carrier. In the case of multiple noise sources injecting into the same node, $\overline{i_n^2}/\Delta f$ represents the total current noise due to all the sources and is given by the sum of individual noise power spectral densities [17]. If the noise sources on different nodes are uncorrelated, the waveform (and hence the ISF) of all the nodes are the same except for a phase shift, assuming identical stages. Therefore, the total phase noise due to all $N$ noise sources is $N$ times the value given by (6) (or $2N$ times for a differential ring oscillator).
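As a numerical illustration of how the white-noise expression is used, the sketch below evaluates the phase noise in dBc/Hz. It is hedged in two ways: it assumes that (6) has the form given in [16], $\mathcal{L}\{f_{off}\} = \Gamma_{rms}^2/(8\pi^2 f_{off}^2) \cdot (\overline{i_n^2}/\Delta f)/q_{max}^2$, and every numerical value below is hypothetical rather than taken from the measurements in this paper.

```python
import math

def phase_noise_dbc_per_hz(gamma_rms, noise_psd, q_max, f_off, n_sources=1):
    """Single-sideband phase noise for white current noise.

    Assumed form of (6): Gamma_rms^2 / (8 pi^2 f_off^2) * noise_psd / q_max^2,
    multiplied by the number of uncorrelated, identical noise sources.
    gamma_rms : rms value of the ISF (dimensionless)
    noise_psd : current noise density per source, A^2/Hz
    q_max     : maximum charge swing on the node, C
    f_off     : offset from the carrier, Hz
    """
    ratio = (n_sources * gamma_rms ** 2 * noise_psd
             / (8.0 * math.pi ** 2 * f_off ** 2 * q_max ** 2))
    return 10.0 * math.log10(ratio)

# Hypothetical five-stage single-ended ring: 50 fC swing, 1e-22 A^2/Hz per node.
level = phase_noise_dbc_per_hz(gamma_rms=0.5, noise_psd=1e-22,
                               q_max=50e-15, f_off=1e6, n_sources=5)
print(f"L(1 MHz) = {level:.1f} dBc/Hz")
```

The $1/f_{off}^2$ dependence means the result drops by about 6 dB per octave of offset, and $N$ uncorrelated sources raise it by $10\log N$ dB.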
From (5), it follows that the upconversion of low-frequency noise, such as $1/f$ noise, is governed by the dc value of the ISF. The corner frequency between the $1/f^3$ and $1/f^2$ regions in the spectrum of the phase noise is called $f_{1/f^3}$ and is related to the $1/f$ noise corner $f_{1/f}$ through the following equation [16]:
$$f_{1/f^3} = f_{1/f} \cdot \frac{\Gamma_{dc}^2}{\Gamma_{rms}^2} \qquad (7)$$
where $\Gamma_{dc}$ is the dc value of the ISF. Since the height of the positive and negative lobes of the ISF is determined by the slope of the rising and falling edges of the output waveform, respectively, symmetry of the rising and falling edges can reduce $\Gamma_{dc}$ and hence the upconversion of $1/f$ noise.
III. JITTER
In an ideal oscillator, the spacing between transitions is constant. In practice, however, the transition spacing will be variable. This uncertainty is known as clock jitter and increases with measurement interval $\Delta\tau$ (i.e., the time delay between the reference and the observed transitions), as illustrated in Fig. 3. This variability accumulation (i.e., jitter accumulation) occurs because any uncertainty in an earlier transition affects all the following transitions, and its effect persists indefinitely. Therefore, the timing uncertainty when $\Delta\tau$ seconds have elapsed is the sum of the uncertainties associated with each transition.
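Jitter accumulation can be illustrated with a small Monte Carlo experiment. The sketch below is a toy model, not the paper's analysis: it assumes that every transition contributes an independent, zero-mean Gaussian timing error with the same standard deviation, in which case the accumulated uncertainty grows as the square root of the number of transitions. The function name and all values are hypothetical.

```python
import math
import random

def accumulated_jitter_std(sigma_per_transition, n_transitions,
                           trials=2000, seed=1):
    """Sample standard deviation of the summed timing error after
    n_transitions, assuming independent Gaussian errors per transition."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, sigma_per_transition)
                  for _ in range(n_transitions))
              for _ in range(trials)]
    mean = sum(totals) / trials
    variance = sum((t - mean) ** 2 for t in totals) / (trials - 1)
    return math.sqrt(variance)

SIGMA_S = 1e-12  # hypothetical 1 ps of uncertainty per transition
for m in (1, 4, 16, 64):
    measured = accumulated_jitter_std(SIGMA_S, m)
    print(f"m = {m:3d}: measured {measured / SIGMA_S:5.2f} sigma, "
          f"sqrt(m) = {math.sqrt(m):5.2f}")
```

Correlated sources, such as supply or substrate noise, would violate the independence assumption and accumulate faster than this square-root law.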
The statistics of the timing jitter depend on the correlations among the noise sources involved. The case of each transition being affected by independent noise sources has been considered in [12] and [13]. The jitter introduced by each stage is assumed to be totally independent of the jitter introduced by other stages, and therefore the total variance of the jitter is given by the sum of the variances introduced at each stage. For ring oscillators with identical stages, the variance will be given by $m\sigma_s^2$, where $m$ is the number of transitions during $\Delta\tau$ and
1 A more accurate treatment [17] shows that the phase noise does not grow without bound as $f_{off}$ approaches zero (it becomes flat for small values of $f_{off}$). However, this makes no practical difference in this discussion.
Fig. 5. ISF for ring oscillators of the same frequency with different number of stages.
in (8) is
(12)
IV. CALCULATION OF THE ISF FOR RING OSCILLATORS
To calculate phase noise and jitter using (6) and (12), one needs to know the rms value of the ISF. Although one can always find the ISF through simulation, we obtain a closed-form approximate equation for the rms value of the ISF of ring oscillators, which usually makes such simulations unnecessary. It is instructive to look at the actual ISF of ring oscillators to gain insight into what constitutes a good approximation. Fig. 5 shows the shape of the ISF for a group of single-ended CMOS ring oscillators. The frequency of oscillation is kept constant (through adjustment of channel length), while the number of stages is varied from 3 to 15 (in odd numbers). To calculate the ISF, a narrow current pulse is injected into one of the nodes of the oscillator, and the resulting phase shift is measured a few cycles later in simulation.
As can be seen, increasing the number of stages reduces the peak value of the ISF. The reason is that the transitions of the normalized waveform become faster for larger $N$. Since the sensitivity during the transition is inversely proportional to the slope, the peak of the ISF drops. It should be noted that only the peak of the ISF is inversely proportional to the slope, and
(13)
Fig. 8. RMS values of the ISFs for various single-ended ring oscillators versus number of stages.
Fig. 9. RMS values of the ISFs for various differential ring oscillators versus number of stages.
where
(19)
and
(20)
and
(23)
(30)
(24)
where $V_{char}$ is the characteristic voltage of the device. For long-channel mode of operation, it is defined as
Any extra disturbance, such as substrate and supply noise, or noise contributed by extra circuitry or asymmetry in the waveform will result in a larger number than (23) and (24). Note that lowering threshold voltages reduces the phase noise, in agreement with [12]. Therefore, the minimum achievable phase noise and jitter for a single-ended CMOS ring oscillator, assuming that all symmetry criteria are met, occurs for zero threshold voltage
which results in a larger phase noise and jitter than the long-channel case by a factor of
Again, note the absence of any dependency on the number of stages.
B. Differential CMOS Ring Oscillators
Now consider a differential MOS ring oscillator with resistive load. The total power dissipation is
$$P = N I_{tail} V_{DD} \qquad (31)$$
where $N$ is the number of stages, $I_{tail}$ is the tail bias current of the differential pair, and $V_{DD}$ is the supply voltage. The frequency of oscillation can be approximated by
(32)
(25)
(26)
As can be seen, the minimum phase noise is inversely proportional to the power dissipation and grows quadratically with the oscillation frequency. Further, note the lack of dependence on the number of stages (for a given power dissipation and oscillation frequency). Evidently, the increase in the number of noise sources (and in the maximum power due to the higher transition currents required to run at the same frequency) essentially cancels the effect of decreasing $\Gamma_{rms}$ as $N$ increases, leading to no net dependence of phase noise on $N$. This somewhat surprising result may explain the confusion that exists regarding the optimum $N$, since there is not a strong dependence on the number of stages for single-ended CMOS ring oscillators. Note that (25) and (26) establish the lower bound and therefore should not be used to calculate the phase noise and jitter of an arbitrary oscillator, for which (6) and (12) should be used, respectively.
We may carry out a similar calculation for the short-channel case. For such devices, the drain current may be expressed as
(27)
where $E_c$ is the critical electric field and is defined as the value of electric field resulting in half the carrier velocity expected from low field mobility. Combining (17) with (27), we obtain the following expression for the drain current noise of a MOS device in short channel:
(28)
(33)
where $R_L$ is the load resistor and $V_{char}$ takes its long-channel value for a balanced stage in the long-channel limit and its short-channel value in the short-channel regime. The phase noise and jitter due to all $2N$ noise sources is $2N$ times the value given by (6) and (12). Using (16), the expression for the phase noise of the differential MOS ring oscillator is
(34)
and is given by
(35)
(29)
Equations (34) and (35) are valid in both long- and short-channel regimes of operation with the right choice of $V_{char}$.
Note that, in contrast with the single-ended ring oscillator, a differential oscillator does exhibit a phase noise and jitter dependency on the number of stages, with the phase noise degrading as the number of stages increases for a given frequency and power dissipation. This result may be understood as a consequence of the necessary reduction in the charge swing that is required to accommodate a constant frequency of oscillation at a fixed power level as $N$ increases. At the same time, increasing the number of stages at a fixed total power dissipation demands a proportional reduction of tail-current sources, which will reduce the swing, and hence $q_{max}$, by a factor of $1/N$.
C. Bipolar Differential Ring Oscillator
A similar approach allows us to derive the corresponding results for a bipolar differential ring oscillator. In this case, the power dissipation is given by (31) and the oscillation frequency by (32). The total noise current is given by the sum of collector shot noise and load resistor noise
$$\frac{\overline{i_n^2}}{\Delta f} = 2qI_c + \frac{4kT}{R_L} \qquad (36)$$
where $q$ is the electron charge and $I_c$ is the collector current during the transition. Using these relations, the phase noise and jitter of a bipolar ring oscillator are again given by (34) and (35) with the appropriate choice of $V_{char}$.
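The two contributions named above can be compared numerically with the standard shot-noise and thermal-noise densities, $2qI_c$ and $4kT/R_L$. The sketch below is only an illustration: the function name, the bias current, and the load resistance are hypothetical and are not taken from the oscillators measured in this paper.

```python
BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def bipolar_stage_noise_psd(collector_current, load_resistance,
                            temperature=300.0):
    """Total noise current density (A^2/Hz) at one node of a bipolar stage:
    collector shot noise plus load-resistor thermal noise."""
    shot = 2.0 * ELECTRON_CHARGE * collector_current
    thermal = 4.0 * BOLTZMANN * temperature / load_resistance
    return shot + thermal

# Hypothetical stage: 1 mA collector current, 1 kOhm load, 300 K.
total = bipolar_stage_noise_psd(1e-3, 1e3)
shot_only = 2.0 * ELECTRON_CHARGE * 1e-3
print(f"total = {total:.3e} A^2/Hz, shot-noise share = {shot_only / total:.1%}")
```

For these example values the collector shot noise dominates the load-resistor noise by more than an order of magnitude.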
VI. OTHER NOISE SOURCES
Other noise sources, such as tail-current source noise in a
differential structure, or substrate and supply noise sources,
may play an important role in the jitter and phase noise of
ring oscillators. The low-frequency noise of the tail-current
source affects phase noise if the symmetry criteria mentioned
in Section II are not met by each half circuit. In such cases,
the ISF for the tail-current source has a large dc value, which
increases the upconversion of low-frequency noise to phase
noise. This upconversion is particularly prominent if the tail
device has a large $1/f$ noise corner.
Substrate and supply noise are among other important
sources of noise. There are two major differences between
these noise sources and internal device noise. First, the power
spectral density of these sources is usually nonwhite and often
demonstrates strong peaks at various frequencies. Even more
important is that the substrate and supply noise on different
nodes of the ring oscillator have a very strong correlation.
This property changes the response of the oscillator to these
sources.
To understand the effect of this correlation, let us consider the special case of having equal noise sources on all the nodes of the oscillator. If all the inverters in the oscillator are the same, the ISF for different nodes will only differ in phase by multiples of $2\pi/N$, as shown in Fig. 10. Therefore, the total phase due to all the sources is given by superposition of (5)
(37)
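The consequence of these phase shifts can be sketched with a phasor sum. Assuming, as a simplification, that the ISF of node $k$ is the common ISF shifted by $2\pi k/N$, its $m$-th Fourier component acquires the factor $e^{jm2\pi k/N}$; with equal (fully correlated) sources on all nodes, the contributions near the $m$-th harmonic therefore add as a geometric sum. The function name below is hypothetical.

```python
import cmath
import math

def correlated_harmonic_gain(harmonic, n_stages):
    """Magnitude of the summed response of n_stages identical nodes to equal
    noise sources, for the ISF Fourier component at the given harmonic.
    Each node's component is rotated by harmonic * 2*pi*k / n_stages."""
    total = sum(cmath.exp(1j * harmonic * 2.0 * math.pi * k / n_stages)
                for k in range(n_stages))
    return abs(total)

N_STAGES = 5
for m in range(0, 11):
    print(f"harmonic {m:2d}: gain = {correlated_harmonic_gain(m, N_STAGES):.3f}")
```

In this idealized picture the sum vanishes unless the harmonic index is an integer multiple of $N$ (including dc), where the contributions add fully in phase.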
Fig. 11. Sideband power below carrier for equal sources on all five nodes at $nf_0 + f_m$.
The jitter and phase noise behavior are different for differential ring oscillators. As (34) suggests, jitter and phase noise increase with an increasing number of stages. Hence if the $1/f$ noise corner is not large, and/or proper symmetry measures have been taken, the minimum number of stages (three or four) should be used to give the best performance. This recommendation holds even if the power dissipation is not a primary issue. It is not fair to argue that burning more power in a larger number of stages allows the achievement of better phase noise, since dissipating the same total power in a smaller number of stages results in better jitter/phase noise as long as it is possible to maximize the total charge swing.
Another insight one can obtain from (34) and (35) is that the jitter of a MOS differential ring oscillator at a given power dissipation and oscillation frequency is smaller than that of a differential bipolar ring oscillator, at least for today's range of circuit and process parameters. As we go to shorter channel lengths, the characteristic voltage for the MOS devices given by (30) becomes smaller, and thus phase noise degrades with scaling. Bipolar ring oscillators do not suffer from this problem.
LC oscillators generally have better phase noise and jitter compared to ring oscillators for two reasons. First, a ring oscillator stores a certain amount of energy in the capacitors during every cycle and then dissipates all the stored energy during the same cycle, while an LC resonator dissipates only $2\pi/Q$ of the total energy stored during one cycle. Thus, for a given power dissipation in steady state, a ring oscillator suffers from a smaller maximum charge swing $q_{max}$. Second, in a ring oscillator, the device noise is maximum during the transitions, which is the time where the sensitivity, and hence the ISF, is the largest [16].
TABLE I
INVERTER-CHAIN RING OSCILLATORS
TABLE II
CURRENT-STARVED INVERTER-CHAIN RING OSCILLATORS
agreement with the measured results. The die photo of the chip
containing these oscillators is shown in Fig. 13. The slightly
superior phase noise of the three-stage ring oscillator (number
4) can be attributed to lower oscillation frequency and longer
channel length (and hence smaller ).
Table III summarizes the results obtained for differential ring oscillators of various sizes and lengths with the inverter topology shown in Fig. 12(c), covering a large span of frequencies up to 5.5 GHz. All these ring oscillators are implemented in the same 0.25-µm 2.5-V process, and all the oscillators, except the one marked with N/A, have the tuning circuit shown. The resistors are implemented using an unsilicided polysilicon layer. The main reason to use poly resistors is to reduce $1/f$ noise upconversion by making the waveform on each node closer to the step response of an RC network, which is more symmetrical. The value of these load resistors and the $W/L$ ratios of the differential pair are shown in Table III. A fixed 2.5-V power supply is used, resulting in different total power dissipations. As before, the measured phase noise is in good agreement with the predicted phase noise using (34) and (6). The die photo of oscillator number 26 can be found in Fig. 14.
To illustrate further how one obtains the phase-noise predictions shown in Table III, we elaborate on the phase-noise
TABLE III
DIFFERENTIAL RING OSCILLATORS
Fig. 12. Inverter stages for (a) inverter-chain ring oscillators, (b) current-starved inverter-chain ring oscillators, and (c) differential ring oscillators.
Fig. 13.
Fig. 14.
Fig. 15.
Fig. 16. RMS jitter versus measurement interval for the four-stage, 2.8-GHz differential ring oscillator (oscillator number 12).
(41)
These bias voltages are chosen in such a way as to keep a constant oscillation frequency while changing only the ratio of rise time to fall time. The $1/f^3$ corner of the phase noise is measured for different ratios of the pullup and pulldown currents while keeping the frequency constant. One can observe a sharp reduction in the corner frequency at the point of symmetry in Fig. 17.
IX. CONCLUSION
An analysis of the jitter and phase noise of single-ended
and differential ring oscillators was presented. The general
noise model, based on the ISF, was applied to the case of ring
oscillators, resulting in a closed-form expression for the phase
noise and jitter of ring oscillators [(6), (23), (34)]. The model
was used to perform a parallel analysis of jitter and phase
noise for ring oscillators. The effect of the number of stages
Therefore
(42)
For a white-noise current source, the autocorrelation function is an impulse scaled by the noise power spectral density; therefore
(43)
which is
for
(44)
(45)
where $E[\cdot]$ represents the expected value. Since the autocorrelation function of $\phi(t)$ is defined as
(46)
the timing jitter in (45) can be written as
(47)
The relation between the autocorrelation and the power spectrum is given by the Khinchin theorem [21], i.e.,
(48)
(49)
It may be useful to know that the power spectrum of the phase can be approximated by the phase noise for large offsets [22]. As can be seen from the foregoing, the rms timing jitter has less information than the phase-noise spectrum and can be calculated from phase noise using (49). However, unless extra information about the shape of the phase-noise spectrum is known, the inverse is not possible in general.
In the special case where the phase noise is dominated by white noise, the phase noise and the jitter are given by (6) and (12). Therefore, $\kappa$ can be expressed in terms of phase noise in the $1/f^2$ region as
(50)
where $\mathcal{L}\{f_{off}\}$ is the phase noise measured in the $1/f^2$ region at an offset frequency of $f_{off}$ and $f_0$ is the oscillation frequency. Therefore, based on (8), the rms cycle-to-cycle jitter will be given by
(51)
Note that for (50) and (51) to be valid, the phase noise at $f_{off}$ should be in the $1/f^2$ region.
Fig. 18. Approximate waveform and the ISF for asymmetric rising and falling edges.
APPENDIX B
NONSYMMETRIC RISING AND FALLING EDGES
We approximate the ISF in this Appendix by the function depicted in Fig. 18. The rms value of the ISF is
(52)
where $f'_{rise}$ and $f'_{fall}$ are the maximum slope during the rising and falling edge, respectively, and $A$ represents the asymmetry of the waveform and is defined as
(53)
noting that
(54)
Combining (52) and (54) results in the following:
(55)
which reduces to (16) in the special case of $A = 1$, i.e., symmetric rising and falling edges. The dc value of the ISF, $\Gamma_{dc}$, can be calculated from Fig. 18 in a similar manner and is given by
(56)
Using (7), the $1/f^3$ corner is given by
(57)
As can be seen for a constant rise-to-fall ratio, the $1/f^3$ corner decreases inversely with the number of stages; therefore, ring oscillators with a smaller number of stages will have a larger $1/f^3$ noise corner. As a special case, if the rise and fall time are symmetric, $A = 1$, and the $1/f^3$ corner approaches zero.
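The effect of edge asymmetry on the $1/f^3$ corner can be explored numerically. The sketch below is an illustration under explicit assumptions rather than a reproduction of the Appendix: it models the ISF as two triangular lobes, a positive one of height $1/s_r$ and base $2/s_r$ for the rising edge and a negative one of height $1/s_f$ and base $2/s_f$ for the falling edge (with $s_r$, $s_f$ the maximum normalized slopes), and it assumes the corner relation $f_{1/f^3} = f_{1/f}\,\Gamma_{dc}^2/\Gamma_{rms}^2$ from [16]. All function names and numerical values are hypothetical.

```python
import math

def triangular_isf(x, slope_rise, slope_fall):
    """Assumed two-lobe triangular ISF over one period, x in [0, 2*pi).
    Both slopes must be at least 2/pi so that each lobe fits in half a period."""
    w_r = 2.0 / slope_rise           # base width of the positive lobe
    w_f = 2.0 / slope_fall           # base width of the negative lobe
    if x < w_r:
        return (1.0 / slope_rise) * (1.0 - abs(x - w_r / 2.0) / (w_r / 2.0))
    x2 = x - math.pi                 # falling edge placed half a period later
    if 0.0 <= x2 < w_f:
        return -(1.0 / slope_fall) * (1.0 - abs(x2 - w_f / 2.0) / (w_f / 2.0))
    return 0.0

def isf_dc_and_rms(slope_rise, slope_fall, samples=20000):
    """Numerical dc and rms values of the ISF (midpoint rule)."""
    dx = 2.0 * math.pi / samples
    values = [triangular_isf((k + 0.5) * dx, slope_rise, slope_fall)
              for k in range(samples)]
    dc = sum(values) / samples
    rms = math.sqrt(sum(v * v for v in values) / samples)
    return dc, rms

def corner_ratio(slope_rise, slope_fall):
    """Ratio f_{1/f^3} / f_{1/f} = (Gamma_dc / Gamma_rms)^2."""
    dc, rms = isf_dc_and_rms(slope_rise, slope_fall)
    return (dc / rms) ** 2

F_FLICKER = 1e6  # hypothetical 1/f corner of the devices, Hz
for s_r, s_f in ((1.5, 1.5), (2.0, 1.5), (2.0, 1.0)):
    print(f"slopes ({s_r}, {s_f}): 1/f^3 corner = "
          f"{F_FLICKER * corner_ratio(s_r, s_f):.3g} Hz")
```

For this triangular model the closed forms are $\Gamma_{dc} = (1/s_r^2 - 1/s_f^2)/(2\pi)$ and $\Gamma_{rms}^2 = (1/s_r^3 + 1/s_f^3)/(3\pi)$, so the corner vanishes for symmetric edges and grows with the mismatch between the two slopes.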
ACKNOWLEDGMENT
The authors would like to thank M. A. Horowitz, G. Nasserbakht, A. Ong, C. K. Yang, B. A. Wooley, and M. Zargari for
helpful discussions and support. They would further like to
thank Texas Instruments, Inc., and Stanford Nano-Fabrication
facilities for fabrication of the oscillators.
REFERENCES
[1] L. DeVito, J. Newton, R. Croughwell, J. Bulzacchelli, and F. Benkley, "A 52 and 155 MHz clock-recovery PLL," in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 142-143.
[2] A. W. Buchwald, K. W. Martin, A. K. Oki, and K. W. Kobayashi, "A 6-GHz integrated phase-locked loop using AlGaAs/GaAs heterojunction bipolar transistors," IEEE J. Solid-State Circuits, vol. 27, pp. 1752-1762, Dec. 1992.
[3] B. Lai and R. C. Walker, "A monolithic 622 Mb/s clock extraction data retiming circuit," in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 144-144.
[4] R. Farjad-Rad, C. K. Yang, M. Horowitz, and T. H. Lee, "A 0.4 µm CMOS 10 Gb/s 4-PAM pre-emphasis serial link transmitter," in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 198-199.
[5] W. D. Llewellyn, M. M. H. Wong, G. W. Tietz, and P. A. Tucci, "A 33 Mb/s data synchronizing phase-locked loop circuit," in ISSCC Dig. Tech. Papers, Feb. 1988, pp. 12-13.
[6] M. Negahban, R. Behrasi, G. Tsang, H. Abouhossein, and G. Bouchaya, "A two-chip CMOS read channel for hard-disk drives," in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 216-217.
[7] M. G. Johnson and E. L. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. 23, pp. 1218-1223, Oct. 1988.
[8] I. A. Young, J. K. Greason, and K. L. Wong, "A PLL clock generator with 5-110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599-1607, Nov. 1992.
[9] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, "A wide-bandwidth low-voltage PLL for PowerPC™ microprocessors," IEEE J. Solid-State Circuits, vol. 30, pp. 383-391, Apr. 1995.
[10] I. A. Young, J. K. Greason, J. E. Smith, and K. L. Wong, "A PLL clock generator with 5-110 MHz lock range for microprocessors," in ISSCC Dig. Tech. Papers, Feb. 1992, pp. 50-51.
[11] M. Horowitz, A. Chen, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, "PLL design for a 500 Mb/s interface," in ISSCC Dig. Tech. Papers, Feb. 1993, pp. 160-161.
[12] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. ISCAS, June 1994.
[13] J. McNeill, "Jitter in ring oscillators," IEEE J. Solid-State Circuits, vol. 32, pp. 870-879, June 1997.
[14] B. Razavi, "A study of phase noise in CMOS oscillators," IEEE J. Solid-State Circuits, vol. 31, pp. 331-343, Mar. 1996.
[15] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Phase noise in multi-gigahertz CMOS ring oscillators," in Proc. Custom Integrated Circuits Conf., May 1998, pp. 49-52.
[16] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, pp. 179-194, Feb. 1998.
[17] A. Hajimiri and T. H. Lee, The Design of Low Noise Oscillators. Boston, MA: Kluwer Academic, 1999.
[18] A. A. Abidi, "High-frequency noise measurements of FETs with small dimensions," IEEE Trans. Electron Devices, vol. ED-33, pp. 1801-1805, Nov. 1986.
I. INTRODUCTION
PHASE-LOCKED loops (PLLs) operating on digital signals are fundamental in microelectronic systems. Indeed,
they realize essential functions such as clock distribution and
clock recovery. They can thus be found in large digital circuits
such as microprocessors and in mixed-signal ICs for digital
communications. These devices process digital signals in which
the phase information is contained in the transition times. They
usually make use of sequential phase detectors, which require
charge pumps, and are thus called charge-pump PLLs [1].
The specifications for these PLLs are extremely aggressive,
especially when embedded in high-speed digital communication systems. Process variations can adversely affect circuit
performance and result in low yield. To increase the number
of good parts, some of the components can be trimmed. This
is, however, a very expensive process. Furthermore, aging or
different operating conditions can later affect circuits such that
they no longer meet specifications. The solution is to allow
the PLL to self-calibrate. With this property, the expensive
trimming stage can be avoided and changes in the operating
conditions can be tracked. However, the challenge now is to
integrate on-chip a characterization scheme. This involves two
functional blocks: a stimulus source and a response analyzer.
Manuscript received July 15, 1997; revised October 20, 1997. This work
was supported by NSERC and the Micronet, a Canadian federal network
of centers of excellence dealing with microelectronic devices, circuits, and
systems for ultra large scale integration.
The authors are with Microelectronics and Computer Systems Laboratory,
McGill University, Montreal, PQ H3A 2A7, Canada.
Publisher Item Identifier S 0018-9200(98)01021-X.
Fig. 3. Models of charge-pump phase-locked loop: (a) continuous time and
(b) discrete time.
Fig. 6. Phase spectrum: (a) input signal jitter and (b) output signal jitter.
Fig. 9. Spectrum of test jitter signals: (a) signal band and (b) Nyquist
interval.
Fig. 10. Generating delta-sigma encoded sinusoids: (a) direct digital frequency synthesis, (b) delta-sigma oscillator, and (c) fixed-length periodic
byte stream.
Fig. 13. Signal-to-noise ratio of the output jitter versus the PLL relative
bandwidth.
Fig. 14.
Experimental setup.
TABLE I
EXPERIMENT PARAMETERS
XI. CONCLUSIONS
We have presented a PLL jitter transfer function measurement technique which is entirely digital except for the possible
addition of a charge pump. The technique is suitable for on-chip measurement since it does not require trimming and the
silicon overhead is small. Two methods were introduced for
the creation of jitter, allowing tradeoffs between test clock
frequency on one side and loading, complexity, and accuracy
on the other. Experimental results were presented which
suggest that this scheme could be successfully implemented on
silicon.
ACKNOWLEDGMENT
The authors acknowledge the suggestions of B. Gerson from
PMC Sierra.
REFERENCES
Benoît R. Veillette (S'97) was born in Trois-Rivières, Quebec, Canada, on January 1, 1971.
He received the B.Eng. (Honors) degree and the
M.Eng. degree from McGill University, Montreal,
PQ, Canada, in 1993 and 1995, respectively. He
is now completing the Ph.D. degree in electrical
engineering at the same institution.
His current research interests are in delta-sigma
modulation, analog integrated circuits for communications, and mixed-signal testing.
Differential LC Oscillators
Systems Laboratory
University of California
Los Angeles, CA 90095-1594
Introduction
There is unprecedented interest among circuit designers today in gaining insight into the mechanisms of phase noise in LC oscillators, for only with this insight is it possible to optimize oscillator circuits using low-quality integrated resonators to comply with the exacting phase noise specifications of modern wireless systems. Various numerical simulators are now available to assist the circuit designer [1], [2], [3], in some cases accompanied by qualitative interpretations [4]. At present, therefore, the situation of the oscillator designer is similar to that of the amplifier designer who is equipped only with SPICE, but who lacks physical insight and methods for simple yet accurate analysis with which to optimize a circuit.
Over the years, various attempts at phase noise analysis have produced results that are variations on Leeson's classic heuristic derivation without formal proof [5], [6]. These analyses are based on a linear model of an LC resonator in steady-state oscillation through application of either feedback or negative conductance. The results confirm Leeson by showing that phase noise is proportional to noise-to-carrier ratio and inversely proportional to the square of resonator quality factor. However, without knowledge of the constant of proportionality, which Leeson leaves as an unspecified noise factor, the actual phase noise cannot be predicted.
The results are validated against SpectreRF simulations and measurements on two differential CMOS oscillators tuned by resonators with very different Q's.
0-7803-5809-0/00/$10.00 © 2000 IEEE. IEEE 2000 CUSTOM INTEGRATED CIRCUITS CONFERENCE
$+\,C\sin(\omega_0 - \omega_m)t + D\cos(\omega_0 - \omega_m)t$
and here $A = -C = i_n \times (L\omega_0^2/4\omega_m)$, while $B \approx D \approx 0$. The relative signs of A and C prove that the steady-state response to current noise in the resonator's resistor is phase noise in the oscillator. The single-sideband phase noise density is found by the ratio of the sideband power at a given frequency to the power in the fundamental oscillation frequency. Thus, the thermally induced phase noise density due to resonator loss is:
where $N_1 = 2$, the number of loss sources (in the left and right resonators), and $N_2 = 4$ because uncorrelated quadrature noise originating at $\omega_0 \pm \omega_m$ contributes to SSB phase noise at offset $\omega_m$.
Differential Pair Noise
Noise originating in the differential pair is unlike the previous two cases. There, only certain parts of the noise spectrum contributed significantly to the total phase noise. White noise in the resonator is filtered at harmonics of the resonant frequency. White noise in the tail current only experiences a significant conversion gain around the second harmonic of the oscillation frequency. However, the simple model says that an impulse train samples white noise in the differential pair, which, if true, will cause it to accumulate without bound at any specified offset frequency $\omega_m$.
In reality, any practical differential pair requires a non-zero input voltage excursion to switch, and this is provided by the oscillation waveform across the resonator. Therefore, noise in the differential pair is actually not sampled by impulses, but by time windows of finite width. The window height is proportional to transconductance, and the width is set by the tail current and the slope of the oscillation waveform at zero crossing. The input-referred noise spectral density of the differential pair is inversely proportional to transconductance. Thus, the narrower the sampling window, that is, the larger the sampling bandwidth, the lower the noise spectral density [11]. Analysis shows that the noise-bandwidth product is constant, and produces pure phase noise. After taking into account the accumulation of frequency translations throughout the sampling bandwidth, the following compact yet exact expression is reached:
We note that [8] has arrived at a similar analysis for the first two sources of noise, but was unable to obtain a closed-form expression for this last term.
amplitude of oscillation is smaller than the power supply, the differential pair acts as a pure current switch driving the resonator and $V_0 = (4/\pi) R I_T$ [13]. Then the second term comprising F simplifies to $2\gamma$. This means that as tail current increases, and assuming $g_{mbias}R$ is held constant, the noise factor remains constant and phase noise improves as $V_0^2$, that is, as $I_T^2$. This has been observed by others [13]. However, beyond a critical tail current the amplitude $V_0$ is pegged constant, limited by supply voltage. Further increases in $I_T$ will cause the differential pair's contribution to noise factor to rise, degrading phase noise proportionally to $I_T$ (Figure 4). Therefore, for least phase noise the tail current should be just enough to drive the amplitude to its maximum possible value.
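The two amplitude regimes described above can be illustrated with a small numerical sketch. It assumes the current-limited amplitude $(4/\pi) R I_T$ quoted in the text and models the supply limit as a hard clamp at a hypothetical maximum amplitude `v_max`; the function names and the example values are illustrative and do not come from the paper.

```python
import math

def tank_amplitude(i_tail, r_tank, v_max):
    """Oscillation amplitude (V) for a tail current i_tail (A).

    Current-limited regime: amplitude = (4/pi) * R * I_T, as quoted in the text.
    Voltage-limited regime: amplitude pegged at v_max (assumed hard clamp).
    """
    return min((4.0 / math.pi) * r_tank * i_tail, v_max)

def critical_tail_current(r_tank, v_max):
    """Tail current (A) at which the amplitude just reaches v_max."""
    return math.pi * v_max / (4.0 * r_tank)

# Hypothetical example values: 500-ohm tank resistance, 2-V maximum amplitude.
R_TANK, V_MAX = 500.0, 2.0
i_crit = critical_tail_current(R_TANK, V_MAX)
for i_tail in (0.5 * i_crit, i_crit, 2.0 * i_crit):
    amp = tank_amplitude(i_tail, R_TANK, V_MAX)
    print(f"I_T = {i_tail * 1e3:.2f} mA -> amplitude = {amp:.2f} V")
```

Under these assumptions, `critical_tail_current` marks the bias point the text recommends: the smallest tail current that drives the amplitude to its maximum.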
Fundamental Sources of FM in Oscillators
In 1934, Groszkowski [15], while studying electronic oscillators, realized that the steady-state oscillation frequency seldom coincides with the natural frequency of the resonator which tunes the oscillator. He found that the discrepancy arises because the active device in the oscillator, such as the differential pair current switch in the circuit considered here, drives the resonator with a harmonic-rich waveform. The harmonics will flow into the lower impedance capacitor (Figure 5) and upset the exact reactive power balance between the L and the C required for steady state. Now the frequency of oscillation must shift down until the reactive power in the inductor increases to equal the reactive power in the capacitor due to the fundamental and all harmonics. The shift, $\Delta\omega$, is:
[Equation not recoverable: $\Delta\omega/\omega_0$ as a sum over harmonics $n \ge 2$, in terms of $Q$.]
Thus the sensitivity of the reactance to bias current or offset voltage in the differential pair is estimated, which is another means
whereby flicker noise modulates the frequency of oscillation.
Validation of Analysis
The phase noise model was validated on two CMOS differential LC oscillators. One oscillator uses a low-Q, on-chip inductor, while the other uses off-chip inductors with large Q. Flicker noise is modelled as a bias-independent, gate-referred voltage source [14]. The measured data and SpectreRF simulations are plotted with predictions based on this paper. Excellent agreement (Figure 7) is found across the entire spectrum, which encompasses thermally induced phase noise and upconverted flicker noise.
K. S. Kundert, Introduction to RF simulation and its application, IEEE Journal of Solid-State Circuits, pp. 1298–1319, 1999.
Figure 5. Harmonics of oscillating current flow into capacitor, increasing its reactive energy. Steady state frequency shifts down until inductor energy balances.
Figure 2. Differential LC oscillator biased by tail current.
[Figure residue: approximate model of noise pulses (sampling impulse train); phase noise versus offset frequency (kHz); phase noise versus bias current $I_T$.]
Figure 7. Validation of the analysis presented in this paper. Measured phase noise is compared with predictions from analysis, and with SpectreRF simulations. (a) 0.35-µm CMOS 1.1 GHz oscillator using resonator with loaded Q of 6. (b) 0.25-µm CMOS 830 MHz oscillator using discrete inductor with loaded Q of 25.
I. INTRODUCTION
Fig. 1. Typical plot of the phase noise of an oscillator versus offset from
carrier.
$\omega_0$ is the frequency, and $\phi$ is an arbitrary, fixed phase reference. Therefore, the spectrum of an ideal oscillator with no random fluctuations is a pair of impulses at $\pm\omega_0$. In a practical oscillator, however, the output is more generally given by
$$V_{out}(t) = A(t) \cdot f[\omega_0 t + \phi(t)] \quad (1)$$
where $\phi(t)$ and $A(t)$ are now functions of time and $f$ is a periodic function with period $2\pi$. As a consequence of the fluctuations represented by $A(t)$ and $\phi(t)$, the spectrum of a practical oscillator has sidebands close to the frequency of oscillation, $\omega_0$.
There are many ways of quantifying these fluctuations (a comprehensive review of different standards and measurement methods is given in [4]). A signal's short-term instabilities are usually characterized in terms of the single sideband noise spectral density. It has units of decibels below the carrier per hertz (dBc/Hz) and is defined as
$$L_{total}\{\Delta\omega\} = 10 \cdot \log\left[\frac{P_{sideband}(\omega_0 + \Delta\omega,\ 1\ \text{Hz})}{P_{carrier}}\right] \quad (2)$$
where $P_{sideband}(\omega_0 + \Delta\omega,\ 1\ \text{Hz})$ represents the single sideband power at a frequency offset of $\Delta\omega$ from the carrier with a measurement bandwidth of 1 Hz. Note that the above definition includes the effect of both amplitude and phase fluctuations, $A(t)$ and $\phi(t)$.
The advantage of this parameter is its ease of measurement. Its disadvantage is that it shows the sum of both amplitude and phase variations; it does not show them separately. However, it is important to know the amplitude and phase noise separately because they behave differently in the circuit. For instance, the effect of amplitude noise is reduced by the amplitude-limiting mechanism and can be practically eliminated by the application of a limiter to the output signal, while the phase noise cannot be reduced in the same manner. Therefore, in most applications, $L_{total}\{\Delta\omega\}$ is dominated by its phase portion, $L_{phase}\{\Delta\omega\}$, known as phase noise, which we will simply denote as $L\{\Delta\omega\}$.
Fig. 4. (a) Impulse injected at the peak, (b) impulse injected at the zero
crossing, and (c) effect of nonlinearity on amplitude and phase of the oscillator
in state-space.
Note that the factor of 1/2 arises from neglecting the contribution of amplitude noise. Although the expression for the noise in the $1/f^2$ region is thus easily obtained, the expression for the $1/f^3$ portion of the phase noise is completely empirical. As such, the common assumption that the $1/f^3$ corner of the phase noise is the same as the $1/f$ corner of device flicker noise has no theoretical basis.
The above approach may be extended by identifying the individual noise sources in the tuned tank oscillator of Fig. 2 [8]. An LTI approach is used and there is an embedded assumption of no amplitude limiting, contrary to most practical cases. For the RLC circuit of Fig. 2, [8] predicts the following:
(7)
where $A$ is yet another empirical fitting parameter, and $R_{eff}$ is the effective series resistance, given by
(8)
where $R_L$, $R_C$, $R_p$, and $C$ are shown in Fig. 2. Note that it is still not clear how to calculate $A$ from circuit parameters. Hence, this approach represents no fundamental improvement over the method outlined in [3].
III. MODELING OF PHASE NOISE
Fig. 5. (a) A typical Colpitts oscillator and (b) a five-stage minimum size
ring oscillator.
capacitor and will not affect the current through the inductor. It can be seen from Fig. 4 that the resultant change in $A(t)$ and $\phi(t)$ is time dependent. In particular, if the impulse is applied at the peak of the voltage across the capacitor, there will be no phase shift and only an amplitude change will result, as shown in Fig. 4(a). On the other hand, if this impulse is applied at the zero crossing, it has the maximum effect on the excess phase $\phi(t)$ and the minimum effect on the amplitude, as depicted in Fig. 4(b). This time dependence can also be observed in the state-space trajectory shown in Fig. 4(c). Applying an impulse at the peak is equivalent to a sudden jump in voltage at the peak point of the trajectory, which results in no phase change and changes only the amplitude, while applying an impulse at the zero-crossing point results only in a phase change without affecting the amplitude. An impulse applied sometime between these two extremes will result in both amplitude and phase changes.
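The time dependence described above can be checked on an idealized model. The sketch below assumes a lossless LC tank whose normalized state is (capacitor voltage, scaled inductor current) on the unit circle, and treats a charge impulse as a jump in the voltage coordinate only. The function names and the normalization to the peak capacitor charge are illustrative choices, not the paper's simulation setup.

```python
import math

def phase_shift(theta, dq_over_qmax):
    """Excess phase (rad) after a charge impulse hits an ideal LC tank.

    theta: oscillation phase (rad) at the injection instant, with the
           capacitor voltage taken as cos(theta) (peak at theta = 0).
    dq_over_qmax: injected charge normalized to the peak capacitor charge.
    """
    # Normalized state just before the impulse: x is the capacitor voltage,
    # y is the scaled inductor current.
    x, y = math.cos(theta), math.sin(theta)
    # The impulse changes the capacitor voltage only.
    x += dq_over_qmax
    d = math.atan2(y, x) - theta
    # Wrap the difference into (-pi, pi].
    return math.atan2(math.sin(d), math.cos(d))

def amplitude_change(theta, dq_over_qmax):
    """Change in normalized amplitude caused by the same impulse."""
    x, y = math.cos(theta) + dq_over_qmax, math.sin(theta)
    return math.hypot(x, y) - 1.0

for name, theta in (("peak", 0.0), ("zero crossing", math.pi / 2.0)):
    print(name, phase_shift(theta, 0.01), amplitude_change(theta, 0.01))
```

In this idealized model an impulse at the peak changes only the amplitude, an impulse at the zero crossing changes almost only the phase, and for small charges the phase shift scales linearly with the injected charge.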
There is an important difference between the phase and
amplitude responses of any real oscillator, because some
form of amplitude limiting mechanism is essential for stable
oscillatory action. The effect of this limiting mechanism is
pictured as a closed trajectory in the state-space portrait of
the oscillator shown in Fig. 4(c). The system state will finally
approach this trajectory, called a limit cycle, irrespective of
its starting point [10]–[12]. Both an explicit automatic gain
control (AGC) and the intrinsic nonlinearity of the devices
act similarly to produce a stable limit cycle. However, any
fluctuation in the phase of the oscillation persists indefinitely,
with a current noise impulse resulting in a step change in
phase, as shown in Fig. 3. It is important to note that regardless
of how small the injected charge, the oscillator remains time
variant.
Having established the essential time-variant nature of the systems of Fig. 3, we now show that they may be treated as linear for all practical purposes, so that their impulse responses $h_\phi(t,\tau)$ and $h_A(t,\tau)$ will characterize them completely.
The linearity assumption can be verified by injecting impulses with different areas (charges) and measuring the resultant phase change. This is done in the SPICE simulations of the 62-MHz Colpitts oscillator shown in Fig. 5(a) and the five-stage 1.01-GHz, 0.8-µm CMOS inverter chain ring oscillator shown in Fig. 5(b). The results are shown in Fig. 6(a) and (b), respectively. The impulse is applied close to a zero crossing,
Fig. 6. Phase shift versus injected charge for oscillators of Fig. 5(a) and (b).
Fig. 7. Waveforms and ISFs for (a) a typical LC oscillator and (b) a typical
ring oscillator.
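where $i(t)$ represents the input noise current injected into the node of interest. Since the ISF is periodic, it can be expanded in a Fourier series
$$\Gamma(\omega_0\tau) = \frac{c_0}{2} + \sum_{n=1}^{\infty} c_n \cos(n\omega_0\tau + \theta_n) \quad (12)$$
where the coefficients $c_n$ are real-valued, and $\theta_n$ is the phase of the $n$th harmonic. As will be seen later, $\theta_n$ is not important for random input noise and is thus neglected here. Using the above expansion for $\Gamma(\omega_0\tau)$ in the superposition integral, and exchanging the order of summation and integration, we obtain
$$\phi(t) = \frac{1}{q_{max}}\left[\frac{c_0}{2}\int_{-\infty}^{t} i(\tau)\,d\tau + \sum_{n=1}^{\infty} c_n \int_{-\infty}^{t} i(\tau)\cos(n\omega_0\tau)\,d\tau\right] \quad (13)$$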
B. Phase-to-Voltage Transformation
Equation (13) allows computation of $\phi(t)$ for an arbitrary input current $i(t)$ injected into any circuit node, once the various Fourier coefficients of the ISF have been found.
As an illustrative special case, suppose that we inject a low frequency sinusoidal perturbation current $i(t)$ into the node of interest at a frequency of $\Delta\omega \ll \omega_0$
$$i(t) = I_0\cos(\Delta\omega\, t) \quad (14)$$
where $I_0$ is the maximum amplitude of $i(t)$. The arguments of all the integrals in (13) are at frequencies higher than $\Delta\omega$ and are significantly attenuated by the averaging nature of the integration, except the term arising from the first integral, which involves $c_0$. Therefore, the only significant term in $\phi(t)$ will be
$$\phi(t) \approx \frac{I_0 c_0 \sin(\Delta\omega\, t)}{2 q_{max} \Delta\omega} \quad (15)$$
As a result, there will be two impulses at $\pm\Delta\omega$ in the power spectral density of $\phi(t)$, denoted as $S_\phi(\omega)$.
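The low-frequency special case can be checked numerically. The sketch below is illustrative only: it assumes a two-term ISF, $\Gamma(x) = c_0/2 + c_1\cos(x)$, a superposition integral of the form $\phi(t) = (1/q_{max})\int \Gamma(\omega_0\tau)\,i(\tau)\,d\tau$ with zero current before $t = 0$, and normalized units. The parameter values and function names are hypothetical.

```python
import math

def isf(x, c0=0.2, c1=1.0):
    """Assumed two-term ISF: Gamma(x) = c0/2 + c1*cos(x) (illustrative only)."""
    return c0 / 2.0 + c1 * math.cos(x)

def excess_phase(t_end, i0, dw, w0, q_max, c0=0.2, c1=1.0, steps_per_cycle=200):
    """Midpoint-rule estimate of (1/q_max) * integral of Gamma(w0*tau) * i(tau)
    over 0..t_end, with i(tau) = i0*cos(dw*tau)."""
    dt = (2.0 * math.pi / w0) / steps_per_cycle
    n = int(round(t_end / dt))
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * dt
        total += isf(w0 * tau, c0, c1) * i0 * math.cos(dw * tau) * dt
    return total / q_max

W0 = 2.0 * math.pi      # normalized carrier: one cycle per unit time
DW = 0.01 * W0          # injection frequency well below the carrier
I0, Q_MAX, C0 = 1.0, 1.0, 0.2
T_END = 25.0            # a quarter of the slow period, where sin(DW*t) = 1

numeric = excess_phase(T_END, I0, DW, W0, Q_MAX, c0=C0)
predicted = I0 * C0 * math.sin(DW * T_END) / (2.0 * Q_MAX * DW)
print(numeric, predicted)
```

With these assumed values the carrier-frequency term of the ISF averages out almost entirely, so the numerical integral stays close to the single $c_0$ term.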
As an important second special case, consider a current at a frequency close to the carrier injected into the node of interest, given by $i(t) = I_1\cos[(\omega_0 + \Delta\omega)t]$. A process similar to that of the previous case occurs except that the spectrum of
Fig. 9. Simulated power spectrum of the output with current injection at (a) $f_m = 50$ MHz and (b) $f_0 + f_m = 1.06$ GHz.
Fig. 10. Simulated and calculated sideband powers for the first ten coefficients.
Fig. 11. Conversion of noise to phase fluctuations and phase-noise sidebands.
Fig. 12. (a) PSD of $\phi(t)$ and (b) [...] spectrum, $L\{\Delta\omega\}$.
The phase noise $1/f^3$ corner, $\Delta\omega_{1/f^3}$, is the frequency where the sideband power due to the white noise given by (21) is equal to the sideband power arising from the $1/f$ noise given by (23), as shown in Fig. 12. Solving for $\Delta\omega_{1/f^3}$ results in the following expression for the $1/f^3$ corner in the phase noise spectrum:
(24)
Fig. 13. Collector voltage and collector current of the Colpitts oscillator of Fig. 5(a).
Fig. 14. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the Colpitts oscillator of Fig. 5(a).
E. Predicting Output Phase Noise with Multiple Noise Sources
As can be seen, the cyclostationary noise can be treated as a stationary noise applied to a system with an effective ISF given by
$$\Gamma_{eff}(x) = \Gamma(x) \cdot \alpha(x) \quad (27)$$
where $\alpha(x)$ can be derived easily from device noise characteristics and operating point. Hence, this effective ISF should be
Fig. 15. $\Gamma(x)$, $\Gamma_{eff}(x)$, and $\alpha(x)$ for the ring oscillator of Fig. 5(b).
Fig. 16. (a) Waveform and (b) ISF for the asymmetrical node. (c) Waveform and (d) ISF for one of the symmetrical nodes.
Fig. 18. Simulated and predicted sideband power for low frequency injection versus PMOS to NMOS W/L ratio.
Fig. 22. Power of the sidebands caused by low frequency injection into symmetric and asymmetric nodes of the ring oscillator.
Fig. 23. Phase noise measurements for a five-stage single-ended CMOS ring oscillator. $f_0 = 232$ MHz, 2-µm process technology.
Fig. 24. Phase noise measurements for an 11-stage single-ended CMOS ring oscillator. $f_0 = 115$ MHz, 2-µm process technology.
Fig. 25. Effect of symmetry in a seven-stage current-starved single-ended CMOS VCO. $f_0 = 60$ MHz, 2-µm process technology.
Fig. 26. Sideband power versus the voltage controlling the symmetry of the waveform. Seven-stage current-starved single-ended CMOS VCO. $f_0 = 50$ MHz, 2-µm process technology.
Fig. 27. Phase noise measurements for a four-stage differential CMOS ring oscillator. $f_0 = 200$ MHz, 0.5-µm process technology.
constant at 60 MHz. As can be seen, making the waveform more symmetric has a large effect on the phase noise in the $1/f^3$ region without significantly affecting the $1/f^2$ region.
Another experiment on the same circuit is shown in Fig. 26, which shows the phase noise power spectrum at a 10 kHz offset versus the symmetry-controlling voltage. For all the data points, the control voltages are adjusted to keep the oscillation frequency at 50 MHz. As can be seen, the phase noise reaches a minimum by adjusting the symmetry properties of the waveform. This reduction is limited by the phase noise in the $1/f^2$ region and the mismatch in transistors in different stages, which are controlled by the same control voltages.
The seventh experiment is performed on a four-stage differential ring oscillator, with PMOS loads and NMOS differential stages, implemented in a 0.5-µm CMOS process. Each stage is tapped with an equal-sized buffer. The tail current source has a quiescent current of 108 µA. The total capacitance on each of the differential nodes is calculated to be [...] fF and the voltage swing is [...] V, which results in [...] fF. The total channel noise current on each node
Fig. 28. Sideband power versus capacitive division ratio. Bipolar LC Colpitts
oscillator f0 = 100 MHz.
use ratios of about four (corresponding to [...]) in Colpitts oscillators [17].
VI. CONCLUSION
This paper has presented a model for phase noise which
explains quantitatively the mechanism by which noise sources
of all types convert to phase noise. The power of the model
derives from its explicit recognition of practical oscillators
as time-varying systems. Characterizing an oscillator with the
ISF allows a complete description of the noise sensitivity
of an oscillator and also allows a natural accommodation of
cyclostationary noise sources.
This approach shows that noise located near integer multiples of the oscillation frequency contributes to the total
phase noise. The model specifies the contribution of those
noise components in terms of waveform properties and circuit
parameters, and therefore provides important design insight by
identifying and quantifying the major sources of phase noise
degradation. In particular, it shows that symmetry properties of the oscillator waveform have a significant effect on the upconversion of low frequency noise and, hence, the $1/f^3$ corner of the phase noise can be significantly lower than the $1/f$ device noise corner. This observation is particularly important for MOS devices, whose inferior $1/f$ noise has been thought to preclude their use in high-performance oscillators.
APPENDIX
CALCULATION OF THE IMPULSE SENSITIVITY FUNCTION
In this Appendix we present three different methods to calculate the ISF. The first method is based on direct measurement of the impulse response and calculating $\Gamma(x)$ from it. The second method is based on an analytical state-space approach to find the excess phase change caused by an impulse of current from the oscillation waveforms. The third method is an easy-to-use approximate method.
for a few cycles afterwards. By sweeping the impulse injection time across one cycle of the waveform and measuring the resulting time shift $\Delta t$, $h_\phi(t,\tau)$ can be calculated noting that $\Delta\phi = 2\pi\Delta t/T$, where $T$ is the period of oscillation. Fortunately, many implementations of SPICE have an internal feature to perform the sweep automatically. Since for each impulse one needs to simulate the oscillator for only a few cycles, the simulation executes rapidly. Once $h_\phi(t,\tau)$ is found, the ISF is calculated by multiplication with $q_{max}$. This method is the most accurate of the three methods presented.
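The post-processing step of this first method can be sketched as follows. The snippet is a minimal illustration, assuming the time shifts have already been obtained from a simulator sweep. It converts each shift to a phase change with $\Delta\phi = 2\pi\Delta t/T$, divides by the injected charge to get the phase impulse response, and multiplies by the maximum charge swing to get the ISF. The function name and the synthetic data (an ideal oscillator with ISF equal to $-\sin x$) are hypothetical.

```python
import math

def isf_from_time_shifts(time_shifts, period, dq, q_max):
    """Convert measured time shifts (s) into ISF samples.

    Each shift dt gives a phase change 2*pi*dt/period; dividing by the
    injected charge dq gives the phase impulse response, and multiplying
    by q_max gives the dimensionless ISF sample.
    """
    return [2.0 * math.pi * dt / period * q_max / dq for dt in time_shifts]

# Synthetic sweep standing in for simulator output (illustrative values).
PERIOD, DQ, Q_MAX = 1e-9, 1e-15, 1e-12
phases = [2.0 * math.pi * k / 16 for k in range(16)]
shifts = [-math.sin(x) * (DQ / Q_MAX) * PERIOD / (2.0 * math.pi) for x in phases]

gamma = isf_from_time_shifts(shifts, PERIOD, DQ, Q_MAX)
for x, g in zip(phases, gamma):
    print(f"x = {x:.3f} rad -> ISF sample = {g:+.4f}")
```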
B. Closed-Form Formula for the ISF
An $n$th-order system can be represented by its trajectory in an $n$-dimensional state-space. In the case of a stable oscillator, the state of the system, represented by the state vector, $\vec{X}$, periodically traverses a closed trajectory, as shown in Fig. 29. Note that the oscillator does not necessarily traverse the limit cycle with a constant velocity.
In the most general case, the effect of a group of external impulses can be viewed as a perturbation vector $\Delta\vec{X}$ which suddenly changes the state of the system to $\vec{X} + \Delta\vec{X}$. As discussed earlier, amplitude variations eventually die away, but phase variations do not. Application of the perturbation impulse causes a certain change in phase in either a negative or positive direction, depending on the state-vector and the direction of the perturbation. To calculate the equivalent time shift, we first find the projection of the perturbation vector on a unit vector in the direction of motion, i.e., the normalized velocity vector
(31)
the speed
(32)
which results in the following equation for excess phase caused by the perturbation:
(33)
In the specific case where the state variables are node voltages, and an impulse is applied to the $i$th node, there will be a change in $V_i$ given by (10). Equation (33) then reduces to
(34)
(35)
(36)
It can be seen that this expression for the ISF is maximum during transitions (i.e., when the derivative of the waveform function is maximum), and this maximum value is inversely proportional to the maximum derivative. Hence, waveforms with larger slope show a smaller peak in the ISF function.
In the special case of a second-order system, one can use the normalized waveform $f$ and its derivative as the state variables, resulting in the following expression for the ISF:
$$\Gamma(x) = \frac{f'}{f'^2 + f''^2} \quad (37)$$
where $f''$ represents the second derivative of the function $f$. In the case of an ideal sinusoidal oscillator $f = \cos(x)$, so that $\Gamma(x) = -\sin(x)$, which is consistent with the argument of Section III. This method has the attribute that it computes the ISF from the waveform directly, so that simulation over only one cycle of $f$ is required to obtain all of the necessary information.
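The second-order case lends itself to a quick numerical check. The sketch below assumes the closed-form expression takes the form $\Gamma(x) = f'/(f'^2 + f''^2)$ for a normalized waveform $f$, and estimates the derivatives with central finite differences. The helper name and step size are illustrative choices, not part of the paper.

```python
import math

def isf_second_order(f, x, h=1e-4):
    """Estimate Gamma(x) = f' / (f'^2 + f''^2) for a normalized waveform f.

    Derivatives are approximated with central differences of step h.
    """
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return d1 / (d1 * d1 + d2 * d2)

# Ideal sinusoidal oscillator: f = cos(x), for which the ISF should be -sin(x).
samples = [2.0 * math.pi * k / 12 for k in range(12)]
for x in samples:
    print(f"x = {x:.3f}: ISF = {isf_second_order(math.cos, x):+.5f}, "
          f"-sin(x) = {-math.sin(x):+.5f}")
```

The same helper can be applied to any smooth, normalized waveform sampled over one cycle, which is the practical appeal of the method.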
C. Calculation of ISF Based on the First Derivative
This method is actually a simplified version of the second
approach. In certain cases, the denominator of (36) shows little
variation, and can be approximated by a constant. In such a
case, the ISF is simply proportional to the derivative of the
waveform. A specific example is a ring oscillator with
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 46, NO. 1, JANUARY 1999
Transactions Briefs
A Study of Oscillator Jitter Due
to Supply and Substrate Noise
Frank Herzel and Behzad Razavi
Fig. 1. Single-ended ring oscillator: (a) block diagram and (b) implementation of one stage.
I. INTRODUCTION
High-speed digital circuits such as microprocessors and memories
employ phase locking at the board-chip interface to suppress timing
skews between the on-chip clock and the system clock [1]–[3].
Fabricated on the same substrate as the rest of the circuit, the
phase-locked loop (PLL) must typically operate from the global
supply and ground busses, thus experiencing both substrate and
supply noise. The noise manifests itself as jitter at the output of the
PLL, primarily through various mechanisms in the voltage-controlled
oscillator (VCO). As exemplified by measured results reported in the
literature, we show that the contribution of device electronic noise
to jitter is typically much less than that due to supply and substrate
noise.
This paper describes the effect of supply and substrate noise on
the performance of single-ended and differential ring oscillators,
providing insights that prove useful in the design of other types of
oscillators as well. Section II summarizes the oscillators studied in
this work and Section III defines various types of jitter. Sections IV
and V quantify the jitter due to thermal noise in the oscillation
loop and frequency-modulating noise, respectively. Sections VI and
VII apply the developed results to the analysis of supply and
substrate noise, and Section VIII presents the dependence of jitter
upon parameters such as device size, the number of stages, and power
dissipation.
II. RING OSCILLATORS UNDER INVESTIGATION
Fig. 2. Differential ring oscillator: (a) block diagram and (b) implementation of one stage.
III. DEFINITIONS OF JITTER
because the latter type hardly changes when the oscillator is placed in the loop.
A more general quantification of the jitter is possible by means of the steady-state autocorrelation function (ACF) defined as
$$C_{\Delta T}(m) = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}\Delta T_{n+m}\,\Delta T_n. \quad (4)$$
To obtain an intuitive understanding of this quantity, we insert (4) with $m = 0$ in (2), obtaining
$$\Delta T_c = \sqrt{C_{\Delta T}(0)}. \quad (5)$$
Equation (5) states that the ACF with zero argument is the squared cycle jitter. For a nonzero argument, the ACF decreases with increasing $m$, finally approaching zero for $m \to \infty$. This indicates that the timing error $\Delta T_n$ has a finite memory. In order to express the cycle-to-cycle jitter by the ACF, we rewrite (3) as
$$\Delta T_{cc}^2 = \lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}(\Delta T_{n+1} - \Delta T_n)^2 = 2C_{\Delta T}(0) - 2C_{\Delta T}(1). \quad (6)$$
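These definitions translate directly into finite-sample estimators. The sketch below is illustrative: it replaces the limits with averages over a finite record of periods, and uses synthetic, uncorrelated Gaussian period errors as stand-in data. The function names, the seed, and the noise level are assumptions, not values from the paper.

```python
import math
import random

def timing_errors(periods):
    """Deviation of each period from the mean period."""
    mean = sum(periods) / len(periods)
    return [t - mean for t in periods]

def acf(d_t, m):
    """Finite-sample estimate of the autocorrelation of the timing errors."""
    n = len(d_t) - m
    return sum(d_t[k + m] * d_t[k] for k in range(n)) / n

def cycle_jitter(periods):
    """RMS deviation of the period from its mean."""
    d_t = timing_errors(periods)
    return math.sqrt(sum(d * d for d in d_t) / len(d_t))

def cycle_to_cycle_jitter(periods):
    """RMS difference between consecutive periods."""
    diffs = [periods[k + 1] - periods[k] for k in range(len(periods) - 1)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Synthetic record: 1-ns nominal period with 1-ps uncorrelated Gaussian error.
rng = random.Random(0)
periods = [1e-9 + rng.gauss(0.0, 1e-12) for _ in range(20000)]
d_t = timing_errors(periods)

jc = cycle_jitter(periods)
jcc = cycle_to_cycle_jitter(periods)
print(jc, math.sqrt(acf(d_t, 0)))                     # cycle jitter vs. ACF at lag 0
print(jcc ** 2, 2.0 * acf(d_t, 0) - 2.0 * acf(d_t, 1))  # squared cc jitter vs. ACF form
```

For a finite record the two sides of the cycle-to-cycle identity differ only by edge terms, so they agree closely for long records.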
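(1)
$$\Delta T_c = \lim_{N\to\infty}\sqrt{\frac{1}{N}\sum_{n=1}^{N}\Delta T_n^2}. \quad (2)$$
$$\Delta T_{cc} = \lim_{N\to\infty}\sqrt{\frac{1}{N}\sum_{n=1}^{N}(T_{n+1} - T_n)^2} \quad (3)$$
$$S_\phi(\omega) = \frac{(\omega_0^3/4\pi)\,\Delta T_{cc}^2}{(\omega - \omega_0)^2} \quad (7)$$
and $\omega - \omega_0$ is the offset
(8)
(9)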
In [9], the rms value of absolute jitter has been divided by the
square root of the measurement time to obtain a time-independent
figure of merit. This is not possible for supply and substrate noise,
since (7)–(9) are derived for white noise in the feedback loop. Supply
and substrate noise, however, are generally not white.
(12)
$$\Delta T_n \approx -\frac{V_m K_0}{f_0^2}\cos\omega_m t. \quad (13)$$
(14)
$$\Delta T_c = \frac{V_m K_0}{\sqrt{2}\, f_0^2}. \quad (15)$$
For [...] and [...] cycle-to-cycle jitter
Equations (15) and (16) express the jitter in terms of the low-frequency sensitivity and the modulation frequency. They will be verified numerically in Sections VI and VII. The main benefit of these equations is that the calculation of the jitter is reduced to the calculation or measurement of the oscillation frequency as a function of the supply or substrate voltage.
From (15), we note that cycle jitter is independent of frequency so long as the quasi-static approximation (11) holds. By contrast, cycle-to-cycle jitter increases with frequency. For $f_m \ll f_0$ we find from (16)
$$\Delta T_{cc} \approx \frac{V_m K_0\,\omega_m}{\sqrt{2}\, f_0^3}. \quad (17)$$
Note that the cycle-to-cycle jitter $\Delta T_{cc}$ is approximately proportional to the modulation frequency $f_m$. This can be interpreted by noting that $\Delta T_{cc}$ is a double-differential quantity, as is evident from (5) and (6).
Having reduced the jitter calculation to the static sensitivity $K_0$, we need to extract this quantity from simulations. For this purpose, we apply a dc voltage perturbation to the supply. Fig. 6 shows the oscillation frequency of the SERO and the DRO as a function of the supply voltage. The frequency varies linearly with the supply voltage over a relatively wide range of $V_{DD}$. The slope of the curves in Fig. 6 represents the low-frequency sensitivity $K_0$, indicating that the SERO is much more sensitive than the DRO. Using these values in (15) and (16), we can predict the jitter from supply and substrate noise easily.
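The quasi-static picture can be tried out numerically. The sketch below is an illustration under stated assumptions: the oscillation frequency follows the noise voltage instantaneously as $f_0 + K_0 V_m\cos(\omega_m t)$ with $K_0$ in Hz/V, the cycle jitter is taken as $V_m K_0/(\sqrt{2} f_0^2)$, and the cycle-to-cycle jitter for $f_m \ll f_0$ as $V_m K_0 \omega_m/(\sqrt{2} f_0^3)$. All numerical values and function names are hypothetical.

```python
import math

def cycle_jitter_fm(v_m, k0, f0):
    """Quasi-static cycle jitter for sinusoidal noise of amplitude v_m."""
    return v_m * k0 / (math.sqrt(2.0) * f0 ** 2)

def cc_jitter_fm(v_m, k0, f0, f_m):
    """Cycle-to-cycle jitter in the low modulation frequency approximation."""
    return v_m * k0 * 2.0 * math.pi * f_m / (math.sqrt(2.0) * f0 ** 3)

def simulate_periods(v_m, k0, f0, f_m, n_cycles):
    """Generate periods of an oscillator whose frequency tracks the noise."""
    t, periods = 0.0, []
    for _ in range(n_cycles):
        f_inst = f0 + k0 * v_m * math.cos(2.0 * math.pi * f_m * t)
        period = 1.0 / f_inst
        periods.append(period)
        t += period
    return periods

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical values: 1-GHz oscillator, 1 GHz/V sensitivity, 10-mV noise at 10 MHz.
V_M, K0, F0, F_M = 0.01, 1e9, 1e9, 1e7
periods = simulate_periods(V_M, K0, F0, F_M, 20000)
mean_t = sum(periods) / len(periods)

sim_c = rms([t - mean_t for t in periods])
sim_cc = rms([periods[k + 1] - periods[k] for k in range(len(periods) - 1)])
print(sim_c, cycle_jitter_fm(V_M, K0, F0))
print(sim_cc, cc_jitter_fm(V_M, K0, F0, F_M))
```

Raising `F_M` in this sketch leaves the simulated cycle jitter nearly unchanged while the cycle-to-cycle jitter grows, which is the behavior the text describes.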
VI. JITTER DUE TO SUPPLY NOISE
Fig. 6. Oscillation frequency of (a) the single-ended ring oscillator and (b)
the differential ring oscillator as a function of static supply voltage.
Fig. 7. Cycle jitter and cycle-to-cycle jitter of (a) the SERO and (b) the DRO
as a function of supply voltage noise frequency. The solid lines represent the
quasi-static FM expressions.
TO
SUBSTRATE NOISE
FOR
LOW JITTER
Fig. 10.
DRO.
(b)
Fig. 12. Illustration of the relationship between power consumption and
noise for (a) device electronic noise and (b) supply noise.
In applications where the required oscillation frequency is considerably lower than the maximum speed of the technology, a ring oscillator may incorporate more than three stages. Thus, the optimum number of stages with respect to the jitter is of interest.
Shown in Table II and plotted in Fig. 13 is the jitter of three-stage and six-stage oscillators designed for a frequency of 500 MHz with constant tail current and voltage swings. We note that the minimum values of cycle jitter and cycle-to-cycle jitter are smaller in a three-stage topology. This is because for the three-stage oscillator, the reduction of the oscillation frequency to the desired value is obtained by means of the fixed capacitances $C_L$ rather than by the voltage-dependent capacitances of the transistors. Hence, a smaller fraction of the total load capacitance is subject to variations with supply and substrate noise.
The addition of a fixed capacitor to each stage nonetheless entails the issue of substrate noise coupling to the bottom plate of the capacitor. In order to minimize this effect, a grounded shield must isolate the capacitor from the substrate, as illustrated in Fig. 14. Here,
TABLE I
IMPACT OF POWER CONSUMPTION
TABLE II
THREE-STAGE VERSUS SIX-STAGE OSCILLATOR
Fig. 13. Jitter of the three-stage and the six-stage DRO versus gate width for an oscillation frequency of 500 MHz.
Fig. 14. Grounded shield used under the capacitor to block substrate noise.
$$V(t) = V_0 \cos[\omega_0 t + \phi(t)] \qquad (18)$$
$$\phi(t) = \int_0^t \Delta\omega(u)\,du + \phi(0). \qquad (19)$$
(20)
1 (1 )
1 ()
1 ()
$$\langle \Delta\omega(t+\tau)\,\Delta\omega(t)\rangle = 2D\,\delta(\tau) \qquad (21)$$
where $D$ is the diffusivity and $\delta(\tau)$ the Delta function. The
probability density of $\phi(t)$ represents a Gaussian distribution centered
at $\phi(0)$ with variance
$$\sigma^2 = 2Dt. \qquad (22)$$
On the other hand, the phase noise can be expressed by the cycle-to-cycle jitter by inserting (32) in (25), yielding
As evident from (22), the variance diverges with time. The autocorrelation of $V(t)$ is known [12] and reads
$$\langle V(t+\tau)\,V(t)\rangle = \frac{V_0^2}{2}\exp(-D|\tau|)\cos(\omega_0\tau).$$
(24)
(! 0 !0)2 + D2 :
This quantity is often normalized to $V_0^2/2$ and referred to as relative
$$\frac{2D}{(\omega - \omega_0)^2 + D^2}.$$
()
=0
$$2\pi f_0 T_n = 2\pi + \Delta\phi_n.$$
(27)
1n .
(28)
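$$\Delta T_n = \frac{\Delta\phi_n}{2\pi f_0} = \Delta\phi_n \frac{T}{2\pi}. \qquad (29)$$
Hence, the cycle jitter $\Delta T_c$ of the period during one cycle is related
to $\Delta\phi_c$ according to
$$\Delta T_c = \Delta\phi_c \frac{T}{2\pi}. \qquad (30)$$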
For white noise sources, two successive periods are uncorrelated.
Since cycle-to-cycle jitter represents the difference between two
periods, the variance of cycle-to-cycle jitter is twice as large as the
variance of one period, yielding
(31)
$$\Delta T_{cc}^2 = \frac{8\pi D}{\omega_0^3}$$
with
$$\omega_0 = \frac{2\pi}{T}.$$
(32)
(33)
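The statement that uncorrelated periods give a cycle-to-cycle variance twice the variance of one period can be checked with a small Monte Carlo sketch. The Gaussian period model, the 2-ns mean period, and the 1-ps rms jitter are hypothetical illustration values, not results from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_c = 1e-12                                            # hypothetical rms cycle jitter (1 ps)
periods = 2e-9 + sigma_c * rng.standard_normal(200_000)    # uncorrelated successive periods

rms_cycle = np.std(periods)                                     # rms deviation of one period
rms_cycle_to_cycle = np.sqrt(np.mean(np.diff(periods) ** 2))    # rms difference of adjacent periods
ratio = rms_cycle_to_cycle / rms_cycle
print(ratio)   # close to sqrt(2) when successive periods are uncorrelated
```

For correlated disturbances such as supply or substrate noise the ratio is generally different, which is why the two jitter measures are treated separately in the main text.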
The cycle-to-cycle jitter can now be expressed in terms of the single-sideband phase noise by inserting (32) in (26) to give
$$S = \frac{f_0^3\, \Delta T_c^2}{(f - f_0)^2}$$
(36)
=1
$$\Delta T_{cc} = \sqrt{2}\, \Delta T_c.$$
(35)
(26)
For the deviation of the $n$th period $T_n$ from the mean period
$T = 1/f_0$, we then find
(25)
=
$$\Delta\phi_c^2 = 2DT.$$
1Tcc4
where $f_0 = \omega_0/(2\pi)$. Equation (36) turns out to be a special case of
(35) for $\omega - \omega_0 \gg D$.
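The link between white-noise cycle jitter and far-offset phase noise can be sketched numerically. The sketch assumes the relation ΔT_c² = 4πD/ω0³, which is half of the cycle-to-cycle variance 8πD/ω0³ given above, and a Lorentzian spectrum normalized as 2D/((ω − ω0)² + D²). The function names and the example numbers (500 MHz, 1 ps, 1 MHz offset) are hypothetical.

```python
import math

def diffusivity_from_cycle_jitter(dt_c, f0):
    """D from the rms cycle jitter, assuming dT_c^2 = 4*pi*D / w0^3."""
    w0 = 2.0 * math.pi * f0
    return dt_c**2 * w0**3 / (4.0 * math.pi)

def lorentzian(d, f0, f):
    """Normalized spectrum 2D / ((w - w0)^2 + D^2)."""
    dw = 2.0 * math.pi * (f - f0)
    return 2.0 * d / (dw**2 + d**2)

def far_offset(dt_c, f0, f):
    """Far-offset form f0^3 * dT_c^2 / (f - f0)^2, as in (36)."""
    return f0**3 * dt_c**2 / (f - f0) ** 2

# Hypothetical example: 500-MHz oscillator, 1-ps rms cycle jitter, 1-MHz offset
F0, DT_C, OFFSET = 500e6, 1e-12, 1e6
D = diffusivity_from_cycle_jitter(DT_C, F0)
s_full = lorentzian(D, F0, F0 + OFFSET)
s_far = far_offset(DT_C, F0, F0 + OFFSET)
print(10.0 * math.log10(s_full), 10.0 * math.log10(s_far))
```

At offsets much larger than D the two expressions agree to within a relative error of about (D/Δω)², which illustrates why the far-offset form is a special case of the Lorentzian.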
The absolute jitter increases proportionally to the square root of the
measurement interval t as evident from (22). Hence, the absolute
phase jitter is
()
A similar expression has been derived for ring oscillators in [10] and
reads in our notation
D
SV (!) = V02
For ! 0 !0
(23)
S (!) =
S =
(34)
(37)
=
(38)
REFERENCES
[1] I. A. Young, J. K. Greason, and K. L. Wong, "A PLL clock generator with 5 to 110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, "A wide-bandwidth low-voltage PLL for PowerPC microprocessors," IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] R. Bhagwan and A. Rogers, "A 1 GHz dual-loop microprocessor PLL with instant frequency shifting," in IEEE Proc. ISSCC, San Francisco, CA, Feb. 1997, pp. 336–337.
[4] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," in Proc. IEEE, pp. 329–330, Feb. 1966.
[5] B. Razavi, "A study of phase noise in CMOS oscillators," IEEE J. Solid-State Circuits, vol. 31, pp. 331–343, Mar. 1996.
[6] T. Kwasniewski et al., "Inductorless oscillator design for personal communications devices: A 1.2 μm CMOS process case study," in Proc. CICC, May 1995, pp. 327–330.
[7] F. X. Kärtner, "Analysis of white and $f^{-\alpha}$ noise in oscillators," Int. J. Circuit Theory Appl., vol. 18, pp. 485–519, Sept. 1990.
[8] W. Anzill and P. Russer, "A general method to simulate noise in oscillators based on frequency domain techniques," IEEE Trans. Microwave Theory Tech., vol. 41, pp. 2256–2263, Dec. 1993.
[9] J. A. McNeill, "Jitter in ring oscillators," IEEE J. Solid-State Circuits, vol. 32, pp. 870–879, June 1997.
[10] T. C. Weigandt, B. Kim, and P. R. Gray, "Analysis of timing jitter in CMOS ring oscillators," in Proc. IEEE Int. Symp. Circuits and Systems (ISCAS'94), London, U.K., June 1994, vol. 4, pp. 27–30.
[11] C. W. Gardiner, Handbook of Stochastic Methods. Berlin: Springer-Verlag, 1983.
[12] R. L. Stratonovich, Topics in the Theory of Random Noise. New York: Gordon and Breach, 1967.
I. INTRODUCTION
MODERN transceivers for wireless communication consist of low-noise amplifiers, power amplifiers, mixers,
DSP chips, filters, and frequency synthesizers. These building
blocks have been realized using hybrid technologies and require
interfacing circuits, which increases the power consumption and
limits the maximum operating speed of the transceivers. For
this reason, it has become increasingly attractive to design and
monolithically integrate all these building blocks on a single
chip.
Designing fully integrated frequency synthesizers for this integration is always desirable but most challenging. The first
requirement is to achieve high-frequency operation with reasonable power consumption. However, the most critical challenges for the frequency synthesizer are the phase-noise and
spurious-level performance. Finally, small chip area is essential
to monolithic system integration.
In recent years, monolithic frequency synthesizers with
good phase-noise performance have been reported [1]–[3].
However, those designs operate at supply voltages of at least
2.7 V and power consumption of more than 50 mW. Moreover,
fractional-$N$ frequency synthesizers suffer from fractional
spurs, which degrade their spurious-tone performance.
This paper presents a monolithic dual-loop frequency
synthesizer for the GSM 900 system, implemented in a
0.5-μm CMOS process, that achieves high operating frequency
(935.2–959.8 MHz), low power consumption (34 mW), low
phase noise (−121.8 dBc/Hz at 600 kHz), low spurious level
(−82.0 dBc at 11.3 MHz), and fast switching time (830 μs).
Section II derives the design specification of the frequency
synthesizer for GSM 900. Section III describes the architecture for the proposed dual-loop design. In Section IV,
Manuscript received December 29, 1999; revised October 5, 2000.
The authors are with the Department of Electrical and Electronic Engineering,
Hong Kong University of Science and Technology, Clear Water Bay, Kowloon,
Hong Kong (e-mail: eetak@ee.ust.hk; eeluong@ee.ust.hk).
Publisher Item Identifier S 0018-9200(01)00927-1.
(1)
MHz
(2)
Fig. 2. SNR degradation due to the phase noise and spurious level.
SNR
dBc/Hz at 600 kHz
(3)
MHz.
(4)
D. Switching Time
In GSM 900 systems, time-division multiple access (TDMA) is adopted within each frequency channel. As shown in Fig. 3, each frequency channel is divided into eight time slots. Signals are received and transmitted in time slot #1 and slot #4, respectively. For system monitoring purposes, a time slot between slot #6 and slot #7 is adopted. Therefore, the most critical switching time is from the transmission period (slot #4) to the system monitoring period (slot #6.5), which is equal to 865 μs. However, to allow for the settling time of the other components, the switching time of the frequency synthesizer is recommended to be kept within one time slot (577 μs) [5].
III. DUAL-LOOP DESIGN
To reduce the switching time and the chip area of a synthesizer, a high loop bandwidth and a high reference frequency are desired. Moreover, to suppress the phase-noise contribution of the reference signal and reduce frequency-divider complexity, a lower frequency-division ratio is desirable. Therefore, a dual-loop frequency synthesizer is proposed [6]. As shown in Fig. 4, the dual-loop design consists of two reference signals and two phase-locked loops (PLLs) in cascade configuration. In the feedback path of the high-frequency loop, a mixer is adopted to provide the frequency shift. The output frequency of the synthesizer is expressed as follows:
(5)
where the symbols denote the frequencies of the two reference signals and the frequency division ratios.
Fig. 4.
Due to the dual-loop architecture, the comparison frequencies of the low-frequency and high-frequency loops are scaled up from 200 kHz to 1.6 and 11.3 MHz, respectively. Therefore, the loop bandwidths of both PLLs can be increased so that the switching time and the chip area can be reduced. Compared to single-loop integer-$N$ designs, the frequency-division ratio of the programmable divider is reduced from 4236–4449 to 226–349. Such a reduction in the division ratio significantly simplifies the frequency-divider design and reduces the phase-noise contribution of the input reference signal.
In the proposed dual-loop synthesizer, the divide-by-32 divider and the high-frequency loop together greatly attenuate the phase noise and the spurious tones of the low-frequency loop. As such, the low-frequency loop can be designed to have a larger loop bandwidth and a loop filter as small as one-fifth of the loop filter in the high-frequency loop. The low-frequency loop requires additional components, including the phase-frequency detector (PFD1), the charge pump (CP1), and the frequency divider, but they are all quite small and have very little impact on the chip area. In addition, VCO1 is implemented by a ring oscillator, which occupies a much smaller chip area compared to VCO2. Altogether, the dual-loop design requires no more than 25% overhead in the chip area compared to a fractional-$N$ design with the same loop bandwidth.
Although the input-reference frequency of the low-frequency loop is scaled up by 8 times, the required frequency range of the oscillator VCO1 in the low-frequency loop is also scaled up from 25 to 200 MHz. On the other hand, the phase noise of the ring oscillator is attenuated by the frequency divider and is then amplified by the high-frequency loop; the total phase-noise attenuation from VCO1 output to the synthesizer output is 18 dB. Consequently, this voltage-controlled oscillator (VCO) requires a high operating frequency (600 MHz), a wide frequency range (200 MHz), and a low phase noise (−103 dBc/Hz at 600 kHz). A novel ring VCO design that meets all of these tough specifications will be presented in the next section.
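The 18-dB figure can be related to the divide-by-32 divider with a short calculation. The sketch below assumes that an ideal divide-by-N stage attenuates phase noise by 20·log10(N) and that the remainder is the gain of the high-frequency loop. That gain is inferred here and is not stated explicitly in the text.

```python
import math

DIVIDE_RATIO = 32           # divide-by-32 divider, from the text
NET_ATTENUATION_DB = 18.0   # VCO1 output to synthesizer output, from the text

# Assumption: an ideal divide-by-N attenuates phase noise by 20*log10(N).
divider_attenuation_db = 20.0 * math.log10(DIVIDE_RATIO)

# Inferred quantity: phase-noise gain of the high-frequency loop.
loop_gain_db = divider_attenuation_db - NET_ATTENUATION_DB
loop_gain = 10.0 ** (loop_gain_db / 20.0)

print(round(divider_attenuation_db, 1), round(loop_gain_db, 1), round(loop_gain, 2))
```

Under these assumptions the divider contributes about 30 dB of attenuation, so the high-frequency loop would add back roughly 12 dB, which corresponds to a frequency multiplication of about four.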
(6)
is transconductance,
is channel conductance, and
where
is the total capacitance at output node. Oscillation starts
is large enough to overcome the output load
when the
.
V, transistors
are turned
When control voltage
, and the oscillator operates at maximum freon to cancel
, transistor
are
quency. When control voltage
Fig. 5. Circuit implementation of the ring oscillator VCO1. (a) Ring oscillator. (b) Delay cell. (c) Half circuit of delay cell.
turned off
, and the oscillator operates at minimum
, and frequency range
are
frequency. By (6),
expressed as follows:
(7)
is proportional to
, nMOS transistors
Since
are adopted as the input devices to minimize power consumption. From (7), 50% tuning range can be achieved when
.
Based on the approximate impulse sensitivity function (ISF)
and the analysis presented in [8], the phase noise of the oscillator is estimated to be approximately −107 dBc/Hz at 600-kHz
offset. On the other hand, using SpectreRF [9], the phase noise
is simulated to be −111.7 dBc/Hz at 600 kHz.
B. LC Oscillator VCO2
As the far-offset phase noise is dominated by VCO2, an LC oscillator is adopted to meet the stringent phase-noise specification. Fig. 6 shows the schematic of the LC oscillator. Cross-coupled transistors are used to start and to maintain oscillation with lower parasitics. PN-junction varactors implemented by p+ diffusion on the n-well are used for frequency tuning. The common-mode output voltage is designed at 1.1 V to enhance the driving of the frequency divider. To reduce the phase-noise contribution due to flicker noise, pMOS transistors are used as the current source.
To design an LC oscillator which satisfies the phase-noise requirement with minimum power consumption, inductors with
large inductance and small series resistance are desired. Therefore, two-layer inductors are adopted [10] for which the inductance and the quality factor can be scaled up by 4 and 2 times,
respectively. For the same reason, pn-junction varactors are interdigitized with p islands surrounded by n-well contacts to
enhance the quality factor. Finally, the transconductance
of transistor
is designed so that it does not overcompen-
Fig. 6.
TABLE I
SYSTEM DESIGN OF PROGRAMMABLE-FREQUENCY DIVIDER
control signal
1, the gated inverter is by-passed,
and the prescaler is a divide-by-12 divider. When control signal
0, the final state
010 will be de0 and the input signal is delayed by
tected, at which
one clock cycle. Thus the function of divide-by-13 is achieved.
The back-carrier-propagation approach allows low-frequency
signals (more significant bits) to switch to the final state much
earlier than high-frequency signals (less significant bits) and
thus reduces power consumption for a given speed.
E. Charge Pumps and Loop Filters
input signal by
, and its output is counted by both and
counters. After the counter has counted pulses, the
counter changes the state of the modulus control line
and
counter counts
the prescaler divides input by . Then the
the remaining cycles to reach overflow. As a whole, the
programmable divider generates one complete cycle for every
input cycles. The operation
repeats after the counter is reset.
As the frequency-division ratio (226–349) can be achieved with different combinations of the prescaler modulus and the two counter division ratios, the best combination in terms of performance needs to be identified and chosen as the design. As one counter must finish before the other counter resets it, the division ratio of the resetting counter should be larger than that of the counter it resets. To optimize the power consumption, the operating frequencies and number of bits of both counters should be minimized.
Table I shows the different combinations which can implement the desired division ratio. Case 1 requires the highest operating frequencies and number of bits for the two counters, so it is not adopted. Case 4 has the problem that this ordering of the two counter values is violated. Case 3 appears to be the best one, but Case 2 is chosen as the final design because it is much easier to implement an asynchronous divide-by-12 frequency divider than a divide-by-14 divider.
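The counting scheme can be illustrated with a minimal sketch of a pulse-swallow divider. The names P (program counter) and S (swallow counter) are assumptions, since the symbols did not survive in the text. The divide-by-12/13 prescaler corresponds to the Case 2 choice, and the helper functions are hypothetical.

```python
def split_ratio(ratio, modulus):
    """Split a division ratio into (program count P, swallow count S)."""
    p, s = divmod(ratio, modulus)
    if s >= p:
        raise ValueError("swallow count must stay below program count")
    return p, s

def count_input_cycles(p, s, modulus):
    """Count prescaler input cycles for one output cycle of the divider."""
    cycles = 0
    for pulse in range(p):                       # P counter runs until it overflows
        # The prescaler divides by modulus + 1 while the S counter is still counting.
        cycles += modulus + 1 if pulse < s else modulus
    return cycles

MODULUS = 12   # divide-by-12/13 prescaler (Case 2 in the text)
for ratio in range(226, 350):
    p, s = split_ratio(ratio, MODULUS)
    assert count_input_cycles(p, s, MODULUS) == ratio

print(split_ratio(226, MODULUS), split_ratio(349, MODULUS))
```

In this sketch every ratio from 226 to 349 yields a swallow count below the program count, so the ordering constraint described above is satisfied for a modulus of 12.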
The dual-modulus prescaler is implemented by the back-carrier-propagation approach as shown in Fig. 9 [14]. When
(8)
Fig. 10.
where the symbols in (8) denote the phase-detector gain, the VCO gain, and the total capacitance, zero time constant, and pole time constant of the corresponding loop filters, respectively. Since the transfer function is a low-pass function, the reference phase noise is highly attenuated at high offset frequency. It also shows that the close-in phase noise of the reference signals is amplified by the frequency-division ratio. In this work, the division ratio is reduced from 4449 to 349, and the phase-noise contribution from the reference signals is suppressed accordingly.
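The effect of the smaller division ratio on the reference phase noise can be estimated with a short calculation. The sketch assumes the usual in-band result that reference phase noise is amplified by 20·log10(N) for a division ratio N. The function name is hypothetical, and the resulting figure is an estimate rather than a value quoted from the paper.

```python
import math

def reference_noise_gain_db(division_ratio):
    """In-band gain from reference phase noise to the output, assuming 20*log10(N)."""
    return 20.0 * math.log10(division_ratio)

single_loop_db = reference_noise_gain_db(4449)   # largest single-loop ratio, from the text
dual_loop_db = reference_noise_gain_db(349)      # largest dual-loop ratio, from the text
improvement_db = single_loop_db - dual_loop_db

print(round(single_loop_db, 1), round(dual_loop_db, 1), round(improvement_db, 1))
```

Under this assumption the reduction from 4449 to 349 corresponds to roughly 22 dB less amplification of the close-in reference phase noise.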
Another important source of the close-in phase noise is
the charge-pump noise. The transfer functions between the
(9)
It shows that small frequency-division ratio and large phasedetector gain or large charge-pump current are preferred for
phase-noise consideration. Another factor not included in (9)
which also affects the charge-pump noise is its turn-on time.
In this proposed synthesizer, the charge-pump turn-on time is
designed to be equal to 1/10 of the input period so that it is
long enough to eliminate the phase-frequency detector (PFD)
dead-zone problem, but at the same time is short enough to minimize the charge-pump phase-noise contribution.
The phase-noise contribution of the loop-filter resistors in
both PLLs can be estimated using their equivalent noise currents as follows:
(12)
(10)
which are bandpass functions with peaks appearing between the
zero and the pole of the loop filter. To suppress the phase-noise
peaking, large loop-filter capacitors are desired at the cost of
large chip area.
For the phase-noise contribution of the VCOs, the transfer
function between the VCO phase noise and output phase noise
can be found to be
dB close-in
which shows that there exists
phase-noise suppression for the low-frequency loop.
The estimated phase noise of the whole synthesizer is
−81.4 dBc/Hz at 20.9 kHz and −123.8 dBc/Hz at 600 kHz.
The contribution of each component is shown in Fig. 12, which
shows that the close-in phase noise (<100 kHz) is dominated
by the charge pump CP1 and loop filter LF1, while the far-offset
phase noise (>100 kHz) is dominated by the LC oscillator.
(11)
Fig. 12. Estimated phase noise of the whole dual-loop frequency synthesizer and contribution of each component at the synthesizer output.
V. EXPERIMENTAL RESULTS
Fig. 13 shows the die photo of the dual-loop frequency synthesizer, and the active area of the synthesizer is 2.64 mm².
Fig. 13.
For characterization and measurement of passive devices, testing structures for spiral inductors and varactors are included on the same die with the synthesizer and are measured by a network analyzer. To de-embed the probing-pad parasitics, an open-pad structure is also measured.
A. Measurement of Inductors
Fig. 14 shows the inductance, series resistance, and quality factor of the on-chip spiral inductor. The measured inductance is close to simulation results and drops at frequencies close to the self-resonant frequency. However, the series resistance (30.2 Ω) is almost three times larger than the expected value (11.6 Ω). The increase in series resistance is mainly caused by eddy current induced within the substrate and n-well fingers [16]. As the series resistance increases significantly, the port-1 quality factor is limited to 1.6 at 900 MHz.
B. Measurement of Varactors
The measurement results of the pn-junction varactor at 900 MHz are shown in Fig. 15. As the p+ diffusion of the varactors used in the LC oscillator is connected to the output of the LC oscillator core, the varactors are biased at 1.16 V, which is the dc bias of the oscillator core during the measurement. The measured capacitance is close to the estimated results in the reverse-biased region. The series resistance is around 2 Ω due to the minimum junction spacing and the nonminimum junction width. The quality factor is around 30 in the operating region of the oscillator.
C. Measurement of Ring Oscillator VCO1
The phase noise of the oscillators is measured by a direct phase-noise measurement [17]. First, the carrier power is determined at large video (VBW) and resolution bandwidths (RBW). Then, the resolution bandwidth is reduced until the noise edges and not the envelope of the resolution filter are displayed. Finally, the phase noise is measured at the corresponding frequency offset from the carrier. To make sure that the measured phase noise is valid, the displayed values must be at least 10 dB above the intrinsic noise of the analyzer.
Fig. 16 shows the measurement results of the ring oscillator VCO1. The operating frequency is measured to be between 324.0 and 642.2 MHz, over which the measured phase noise is between −111 and −108 dBc/Hz at 600 kHz. The power consumption is around 10 mW.
D. Measurement of LC Oscillator VCO2
Fig. 14. Measurement results and equivalent circuit model of the spiral inductors at 900 MHz.
Fig. 15. Measurement results and bias condition of the pn-junction varactors at 900 MHz.
Fig. 16.
Fig. 17.
Fig. 18.
Fig. 19.
Table II summarizes the measured performance of the proposed frequency synthesizer, and Table III lists the performance
of other fully integrated synthesizers for comparison. The proposed synthesizer operates at a single 2-V supply while all other
designs require supply voltages of at least 2.7 V. Note that since
the designs [1] and [3] operate at higher frequencies, their power
consumption should be scaled down accordingly for a fair comparison. With this frequency normalization, the power consumption of the proposed dual-loop synthesizer is still comparable to
that of the other designs.
The synthesizer presented in [1] is a fractional-$N$ design
with a 26.6-MHz comparison frequency and a loop bandwidth
of 45 kHz. By comparison, the low-frequency loop of this
work has a comparison frequency of only 1.6 MHz but a loop
bandwidth of up to 40 kHz because of the relaxed requirement
on the spurious tones. On the other hand, the high-frequency
loop uses an 11.3-MHz comparison frequency but a loop
bandwidth limited to 27 kHz, which is the real limiting factor
of the switching-time performance. Therefore, it is believed
that a switching time close to that in [1] can be achieved by
Fig. 20.
respectively. For the same reason, the total loop-filter capacitance can be smaller than 60 pF, which greatly reduces chip area.
However, the situation would be much different if channel programmability is included.
Although the proposed synthesizer consists of two loop filters, the chip area is only slightly larger than that of the
design in [3] due to the use of linear capacitors and silicide-blocked resistors. Compared to the designs in [1] and [3], the
spurious levels are between −75 and −85 dBc, which indicates
that CMOS designs suffer the same problem from the substrate
coupling between the reference signal and VCO. However, for
the close-in phase noise, the proposed dual-loop synthesizer suffers from the 15-dB increase at the peak due to the charge pumps
and loop filters as discussed in Section V-E.
I. Generation of the Second Reference Sources
TABLE III
PERFORMANCE COMPARISON OF RECENT WORK ON FULLY INTEGRATED FREQUENCY SYNTHESIZERS
In addition to the close-in phase noise, the far-offset phase-noise requirement would also be more stringent by the same
amount. Assuming that VCO1 is adopted in the third PLL, its
phase noise could be improved to −116 dBc/Hz at 600 kHz
at a 204-MHz operation. With a 30.6-dB filtering effect of the
high-frequency loop, the phase-noise contribution by the VCO
of the third PLL would become −146.6 dBc/Hz at 600 kHz,
which would have negligible effect on the overall phase noise.
Basically, the implementation of the third PLL would be similar to that of the low-frequency loop, and its chip area would be
less than 10% of the total area of the dual-loop synthesizer. Since
the third VCO operates at half the frequency of VCO1, its power
consumption would be 25% of that of VCO1 (approximately 2.5 mW). Similarly, since the divide-by-128 divider also operates at half of the
frequency, it would consume only half the power as compared
to the divider (approximately 0.3 mW). Although the charge-pump current
should be increased by 100 times to 640 μA, the average
current would be only 64 μA since the turn-on time is
only around 1/10 of the input period. In conclusion, by introducing the third PLL to generate the second reference signal,
the additional power required would be less than 3 mW, and the
increase in the total chip area would still be less than 10%.
VI. CONCLUSION
A 900-MHz monolithic CMOS dual-loop frequency synthesizer with good phase-noise performance for GSM receivers is
presented. Compared to other fully integrated synthesizer designs, this proposed synthesizer operates at much lower supply
voltage and consumes approximately the same power with
frequency normalization. Implemented in a standard 0.5-μm
CMOS technology and at 2-V supply voltage, the synthesizer
has a power consumption of 34 mW. At 900 MHz, the measured
phase noise is −121.8 dBc/Hz at 600-kHz frequency offset,
[14] P. Larsson, "High-speed architecture for a programmable frequency divider and a dual-modulus prescaler," IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[15] I. A. Young, J. K. Greason, and K. L. Wong, "PLL clock generator with 5 to 110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[16] W. S. T. Yan, "A 2-V 900-MHz monolithic CMOS dual-loop frequency synthesizer for GSM receivers," M.Phil. thesis, Hong Kong University of Science and Technology, 1999. [Online]. Available: http://www.ee.ust.hk/~eetak
[17] T. Fredrich, "Direct phase noise measurements using a modern spectrum analyzer," Microwave J., vol. 35, pp. 94–114, Aug. 1992.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997
I. INTRODUCTION
VCO unperturbed by its dynamics. This technique allows a significant reduction in components; no mixers are required since
the VCO performs the frequency translation, and only one D/A
converter is required to produce the modulation signal. Power
savings are thus achieved, as demonstrated by the fact that
the design in [5] appears to consume nearly half the power
of the mixer-based designs in [2]–[4]. Unfortunately, since
the synthesizer is inactive during modulation, the nominal
frequency setting of the VCO tends to drift as a result of
leakage currents. In addition, undesired perturbations, such
as the turn-on transient of the power amp, can dramatically
shift the output frequency. As stated in [6], the isolation
requirements for this method exclude the possibility of a one-chip solution. Therefore, while the approach offers a significant
advantage in terms of power dissipation, the goal of high
integration is lost.
Finally, approach (c) can be viewed as indirect modulation
of the VCO through appropriate control of a frequency synthesizer that sets the VCO frequency and yields the simplest
transmitter solution of those presented. The synthesizer has a
digital input which allows elimination of the D/A converter
that is required when directly modulating the VCO. Since the
synthesizer controls the VCO during modulation, the problem
of frequency drift during modulation is eliminated. Also,
isolation requirements at the VCO input are greatly reduced at
frequencies within the PLL bandwidth. The primary obstacle
faced with this architecture is that a severe constraint is placed
on the maximum achievable data rate due to the reliance on
feedback dynamics to perform modulation.
This paper presents a compensation method and key circuits
that allow modulation of a frequency synthesizer at rates that
are over an order of magnitude faster than its bandwidth.
Application of the technique allows a high data rate (>1 Mb/s)
transmitter with good spectral efficiency to be realized with
only two components: a frequency synthesizer and a digital
transmit filter. By avoiding additional components such as
mixers and D/A converters in the modulation path, a low
power transmitter architecture is achieved. Since off-chip filters are not required, high integration is accomplished as well.
The technique can be used in transmitter applications where
frequency modulation is desired, and a moderate tolerance is
allowed on the modulation index. (When using compensation,
the accuracy of the modulation index, which is defined as the
ratio of the peak-to-peak frequency deviation of the transmitter
output to its data rate, is limited by variations in the openloop gain of the PLL [7].) To provide proof of concept of the
technique, we present results from a 1.8-GHz prototype that
supports Gaussian frequency shift keying (GFSK) modulation,
the same modulation method used in DECT, at data rates in
excess of 2.5 Mb/s.
We begin by reviewing a fractional-$N$ synthesizer method
presented in [8]–[11] that provides a convenient structure with
which to apply the technique. It is shown that high data rates
and good noise performance are difficult to achieve with this
topology. A method is proposed to overcome these problems,
followed by discussion of issues that ensue from its use. A
description of key circuits in the prototype is then given,
which include an on-chip loop filter, a 64-modulus divider,
calculated as
(1)
where the symbols denote the loop filter transfer function, the VCO gain (in Hz/V), and the nominal divide value,
respectively. (See [7] for modeling details.) An analogy between the fractional-$N$ ΣΔ modulator and a ΣΔ D/A converter
can be made by treating the output frequency of the PLL as
an analog voltage.
A key issue in the system is that the ΣΔ modulator adds
quantization noise at high frequency offsets from the carrier.
In general, the noise requirements for a transmitter are very
strict in this range to avoid interfering with users in adjacent
channels. In the case of the DECT standard, the phase noise
density can be no higher than −131 dBc/Hz at a 5-MHz
offset [6]. Noise at low frequency offsets is less critical for
a transmitter and need only be below the modulation signal
by enough margin to ensure an adequate signal-to-noise ratio.
Sufficient reduction of the ΣΔ quantization noise can be accomplished through proper choice of the ΣΔ sample rate, which is assumed to be equal to the reference frequency, and the PLL transfer function. (Note that this problem is analogous to that encountered in the design of ΣΔ D/A converters, except that the noise spectral density at high frequencies, rather than the overall signal-to-noise ratio, is the key parameter.) One way of achieving a low spectral density for the noise is to use a high sample rate for the ΣΔ modulator so that the quantization noise is distributed over a wide frequency range and its spectral density reduced. Alternatively, the attenuation offered by the PLL transfer function can be increased; this is accomplished by decreasing its cutoff frequency or increasing its order.
Unfortunately, a low cutoff frequency carries a penalty of lowering the achievable data rate of the transmitter. This fact can be observed from Fig. 3; the modulation data must pass through the dynamics of the PLL so that its bandwidth is restricted by that of the PLL transfer function. Given this constraint, low ΣΔ noise must be achieved through proper setting of the ΣΔ sample rate and PLL order.
It is worthwhile to quantify required values of these parameters for a given data rate and noise specification. To do so,
we first choose
to be a Butterworth response of order
TABLE I
THEORETICALLY ACHIEVABLE DATA RATES
USING COMPENSATION FOR SECOND-ORDER PLL
(5)
For
described in (3), these derivatives are well defined
and can be calculated analytically. A final form is derived by
substituting (3) into (5) to yield
(6)
A. Achievable Data Rates
increases
Equation (6) reveals that the signal swing of
in proportion to
for large values of
Since
is the ratio of the modulation data rate to
the bandwidth of the PLL, we see that high data rates lead
to large signal swings of the modulation signal when using
compensation. Intuitively, this behavior makes sense since the
attenuation of
must be overcome by the compensated
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 12, DECEMBER 1997
control was used to accurately place the overall pole and zero
positions of the PLL transfer function.
An additional issue related to mismatch arises from practical concerns in the PLL implementation. The achievement
of a large dynamic range in the charge pump is aided by
including an integrator in the loop filter (see Section IV-B2),
which yields an overall PLL transfer function as
(7)
IV. IMPLEMENTATION
To show proof of concept of the proposed compensation
method, the system depicted in Fig. 7 was built using a
custom CMOS fractional-N synthesizer that contains several
key circuits. Included are an on-chip, continuous-time filter
that requires no tuning or external components, a digital
MASH ΣΔ modulator with six output bits that achieves
low-power operation through pipelining, and a 64-modulus
divider that supports any divide value between 32 and 63.5
in half-cycle increments. An external divide-by-two prescaler
is used so that the CMOS divider input operates at half the
VCO frequency, which modifies the range of divide values to
include all integers between 64 and 127.
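The divide-value arithmetic in this paragraph can be checked with a short sketch. The helper names are hypothetical; only the numbers 32, 63.5, the half-cycle step, and the divide-by-two prescaler come from the text.

```python
def divide_values(n_min=32.0, n_max=63.5, step=0.5):
    """Divide values supported by the CMOS divider, in half-cycle increments."""
    count = int(round((n_max - n_min) / step)) + 1
    return [n_min + i * step for i in range(count)]

def effective_values(values, prescale=2):
    """Divide values seen from the VCO when an external prescaler precedes the divider."""
    return [int(v * prescale) for v in values]

if __name__ == "__main__":
    core = divide_values()
    overall = effective_values(core)
    print(len(core), overall[0], overall[-1])
```

The 64 moduli of the core divider map one-to-one onto the integers 64 through 127 once the prescaler is included.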
Fig. 12.
Fig. 13. Effect of transient time and mismatch on duty cycle range.
(8)
Fig. 14.
Fig. 15.
C. ΣΔ Modulator
Fig. 17.
Fig. 18.
filter
, which is implemented with two pipelined adders
and a delay element,
A delay is inserted between these
two adders in order to pipeline their sum path, which requires
a matching delay
in the path above for time alignment.
A delay
must also be included in the output path of
the first stage to compensate for the time delay incurred
through the second stage. Since a signal once placed in the
pipe shifted domain can be sent through any number of
cascaded, pipelined adders and/or integrators, only one pipe
shift and align shift are needed in the entire structure.
Fig. 18 illustrates the implementation of the overall digital
path using pipelining. To save area, the circuits were pipelined
every two bits as opposed to one, and pipe shifting was not
applied to the carrier frequency signal since it is constant
during modulation. To achieve flexibility, the compensated
digital transmit filter was implemented in software, as opposed
to a ROM, and the resulting digital data stream fed into the
custom CMOS IC.
V. MEASURED PERFORMANCE
The primary performance criteria by which a transmitter is
judged are its accuracy in modulation and its noise performance. We now describe the characterization of the prototype
in relation to these issues.
Fig. 19 shows measured eye diagrams from the prototype
using an HP 89441A modulation analyzer. To illustrate the
impact of mismatch between the compensation filter and PLL
dynamics, measurements were taken under three different
values of open-loop gain. These results indicate that the
modulation performance of the transmitter is quite good even
when the open-loop gain is in error by 25%; the effects of
Fig. 19. Measured eye diagrams at 2.5 Mb/s for three different open-loop gain settings: (a) −25% gain error, (b) 0% gain error, and (c) 25% gain error.
TABLE III
VALUES OF NOISE SOURCES WITHIN PLL
where
is 30 MHz/V. The value of the kT/C noise current produced by the switched capacitor operation,
is
calculated as
(11)
Fig. 20.
MHz
(10)
(13)
where
In the case of
the division of
by two is an
approximation based on the fact that the dominant charge
pump noise source is switched in and out at each opamp input
with a nominal duty cycle of 50%. Note that
is given
by (2).
A plot of the spectra in (13) is shown in Fig. 21(a).
Computation of these spectra assumed the parameter values
Fig. 21. Noise spectra of synthesizer: (a) calculated: (1) charge pump induced, (3) ΣΔ induced, (4) overall; and (b) measured synthesizer and open-loop VCO noise.
As seen in this diagram, the noise from the charge pump
dominates at low frequencies, and the influence of the
quantization noise dominates at high frequencies.
Fig. 21(b) shows measured noise results from the transmitter
prototype taken with an HP 3048A phase noise measurement
Fig. 1. (a) U-NII and HIPERLAN frequency bands and (b) channel allocation
in our U-NII band WLAN system.
I. INTRODUCTION
we present our proposed architecture of the frequency synthesizer which takes advantage of an injection-locked frequency
divider (ILFD) to reduce the overall power consumption.
Section IV-A is dedicated to the design of the VCO and
demonstrates how on-chip spiral inductors can be optimized to
reduce the VCO power consumption and to improve the phase
noise performance at the same time. Section IV-B describes
the design issues of ILFDs as well as the optimization of
on-chip spiral inductors for wide-locking-range and low-power
ILFDs. The pulse swallow frequency divider, charge pump,
and loop filter are the subjects of Sections IV-C, IV-D, and
IV-E, respectively. The measurement results are presented in
Section V and conclusions are made in Section VI.
TABLE I
POWER CONSUMPTION OF FULLY INTEGRATED
WIRELESS RECEIVERS
aperture phase detector is used to compare the phase of the reference signal and the VCO output at every rising edge of the reference signal for only a time window which is a small fraction
of the reference period. Thus no frequency divider is required
in this PLL. The idea of a dividerless frequency synthesizer, although suitable for systems such as a GPS receiver where only
one LO signal is required, is not readily applied to wireless systems which require multiple LO frequencies with a small frequency separation.
Fig. 2. Frequency synthesizer block diagram.
load for the VCO. The output signal is fed back to the gates of
M1 and M2 and is summed with the incident signal across the
gates and sources of M1 and M2. The nonlinearity of M1 and
M2 generates intermodulation products which allow sustained
oscillation at a fraction of the input frequency [6]. As shown
in [6], in the special case of a divide-by-two and a third-order
nonlinearity, the phase-limited locking range of an ILFD
can be expressed as
(1)
where
ω_r is the free-running oscillation frequency;
Δω is the frequency offset from ω_r;
V_i is the incident amplitude;
H_0 is the impedance of the RLC tank at resonance;
Q is the quality factor of the RLC tank;
a_2 is the second-order coefficient of the nonlinearity.
As (1) suggests, a larger incident amplitude as well as a larger
H_0/Q result in a larger achievable Δω, which we refer to as the
locking range. In an LC oscillator H_0/Q = ω_r L, so the largest
practical inductance should be used to maximize the locking
range.
A larger quadratic nonlinearity (a_2) also increases the locking
range. So a circuit architecture with a large second-order nonlinearity is favorable for a divide-by-two ILFD and in fact the
circuit in Fig. 4 has such a characteristic. The common source
connection node of the differential pair moves at twice the frequency of the output signal even in the absence of the incident
signal. So this circuit has a natural tendency for divide-by-two
operation when the incident signal is effectively injected into
node Vx.
To further extend the locking range, the ILFD is designed
such that the resonant frequency of its output tank tracks the
input frequency. Accumulation mode MOS varactors are used to
tune the ILFD and its control voltage is tied to the VCO control
voltage (Fig. 2). The locking range of the ILFD therefore does
not limit the tuning range of the PLL beyond what is determined
by the VCO.
D. Charge Pump
Fig. 6 shows the circuit diagram of the charge pump and loop
filter. The charge pump has a differential architecture. However,
only a single output node drives the loop filter. To prevent
the other output node from drifting to the rails when neither of the
up and down signals (U and D) is active, the unity gain buffer
shown in Fig. 6 is placed between the two output nodes. This
buffer keeps the two output nodes at the same potential and
thus reduces the charge pump offset. The power of the spurious
sidebands in the synthesized output signal is thereby reduced.
In this charge pump the current sources are always on and the
PMOS and NMOS switches are used to steer the current from
one branch of the charge pump to the other.
E. Loop Filter
Resistor R1 and capacitor C1 in the loop filter (Fig. 6) generate a pole at the origin and a zero at 1/(R1 C1). Capacitor C2
and the combination of R3 and C3 are used to add extra poles at
frequencies higher than the PLL bandwidth to reduce reference
feedthrough and decrease the spurious sidebands at harmonics
of the reference frequency. The thermal noise of R1 and R3,
although filtered by the loop, directly modulates the VCO control voltage and can cause substantial phase noise in the VCO
if the resistors are not sized properly. The capacitors and resistors of the loop filter should be properly chosen to perform the
required filtering function and maintain the stability of the loop
without introducing too much noise. Fig. 7 shows a linearized
phase-locked loop model. In a third-order loop, the loop filter
contains only R1, C1, and C2, and its impedance can be written
as
(7)
(4)
(8)
(3)
is the VCO gain constant and
where
current. The phase margin of the loop is
(6)
(2)
and
where
function of the third-order PLL is
Fig. 7.
where
and
. The crossover frequency for the maximum
phase margin is shown in (9), at the bottom of the page.
Finally, for ω_c to be the crossover frequency, it should satisfy
(10), shown at the bottom of the page.
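The dependence of the phase margin on component ratios can be illustrated for the simpler third-order loop. The sketch below is an assumption-laden illustration, not the paper's equations: it takes the standard series R1-C1 branch in parallel with C2, for which the zero is 1/(R1 C1) and the pole is (C1 + C2)/(R1 C1 C2), and it uses the textbook result that the phase margin peaks at the geometric mean of the two. The function names are hypothetical.

```python
import math

def phase_margin_deg(wc, r1, c1, c2):
    """Phase margin of a third-order charge-pump PLL at crossover wc (rad/s)."""
    wz = 1.0 / (r1 * c1)                 # zero of the loop filter
    wp = (c1 + c2) / (r1 * c1 * c2)      # non-zero pole of the loop filter
    return math.degrees(math.atan(wc / wz) - math.atan(wc / wp))

def max_phase_margin_deg(c1_over_c2):
    """Maximum phase margin, reached at wc = sqrt(wz * wp); depends only on C1/C2."""
    b = 1.0 + c1_over_c2                 # b = wp / wz
    return math.degrees(math.atan(math.sqrt(b)) - math.atan(1.0 / math.sqrt(b)))

if __name__ == "__main__":
    # Illustrative component values, not taken from the paper.
    r1, c1, c2 = 10e3, 100e-12, 10e-12
    wz = 1.0 / (r1 * c1)
    wp = (c1 + c2) / (r1 * c1 * c2)
    wc = math.sqrt(wz * wp)
    print(phase_margin_deg(wc, r1, c1, c2), max_phase_margin_deg(c1 / c2))
```

Scaling R1, C1, and C2 while keeping C1/C2 fixed moves the crossover frequency but leaves the maximum phase margin unchanged.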
As in the third-order loop, the maximum phase margin is not a
function of the absolute values of the Rs and Cs and is only a
function of their ratios. The loop
filter design recipe for the fourth-order loop is modified as follows.
1) Find the VCO gain constant from the VCO simulation.
2) Choose a desired phase margin and find the corresponding ratios
from (8) and (9).
(9)
(10)
Fig. 9.
Fig. 11.
Fig. 12.
Fig. 10. ILFD locking range and power consumption as a function of incident
amplitude.
TABLE II
ILFD PERFORMANCE SUMMARY
TABLE III
MEASURED SYNTHESIZER PERFORMANCE
I. INTRODUCTION
specification [3]. The lower the phase noise of the LO signal is,
the less unwanted signal around the carrier is modulated within
the in-band channel.
Table I shows worldwide mobile frequency standards and RF
phase-locked loop (PLL) requirements. CDMA systems require
fast switching time with precise accuracy of the channel frequency. The channel raster is 30 kHz in cellular and 50 kHz
in PCS systems and, to support the dual-band solution of the
CDMA system, the frequency resolution of the synthesizer must
be 10 kHz. This is a major limiting factor in the reduction of
the locking time and root mean square (rms) phase error. It also
makes it difficult to achieve single-chip integration due to the
loop filter that has a large time constant.
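The 10-kHz resolution follows from the two channel rasters: a single synthesizer step that lands on both a 30-kHz grid and a 50-kHz grid must divide both. A minimal sketch, with a hypothetical helper name:

```python
from functools import reduce
from math import gcd

def common_resolution_khz(rasters_khz):
    """Largest frequency step (kHz) that lands on every channel raster in the list."""
    return reduce(gcd, rasters_khz)

if __name__ == "__main__":
    print(common_resolution_khz([30, 50]), "kHz")
```

With an integer-N loop this common step would also set the comparison frequency, which is why the text treats it as a limit on locking time and phase error.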
In Section II, the special features of the proposed frequency
synthesizer are discussed. Section III describes several building
blocks of the synthesizer. The measurement results are given in
Section IV, and conclusions are presented in Section V.
II. SYSTEM ARCHITECTURE
The proposed PLL is a monolithic integrated circuit that performs dual-band RF synthesis for CDMA wireless communication applications without any external device. Fig. 2 shows the
block diagram of the fractional-N-type frequency synthesizer
architecture. The external reference frequency is mainly
19.68 MHz, and 19.8 and 19.2 MHz are also supported. The
voltage-controlled oscillator (VCO) oscillates at 1.7 GHz for
TABLE I
WORLDWIDE MOBILE FREQUENCY STANDARDS AND RF-PLL REQUIREMENTS
Fig. 3. Timing diagram of reference and VCO inputs of PFD in locked state
when the fraction is 1/3.
Fig. 2.
Fractional-
the PCS band, and at 900 MHz for the cellular band, and employs bonding wire as the inductor of the inductance-capacitance (LC) tank.
To meet the frequency resolution of 10 kHz, the reference frequency is 10 kHz
and loop bandwidth is only 1 kHz with an integer-N-type
prescaler. To reduce rms phase error, it is very important for the
PLL to have a wide bandwidth. The fractional-N architecture,
compared with its integer-N counterpart, has a wider loop
bandwidth with the same frequency resolution. However, it
suffers from a major drawback called fractional spur.
The fractional-N structure is employed for the prescaler in
the proposed architecture. Channel selection and other control
signals are provided through a serial interface. To suppress the
fractional spur, a special type of charge pump is designed. The
new scheme of the dual-path loop filter enables flexible filter
design for on-chip integration. The VCO combined with current-feedback bias and coarse tuning enables constant power
and low gain of the VCO.
III. SYNTHESIZER BUILDING BLOCKS
A. Charge Pump With Charge-Averaging Scheme
While the reference spur occurs in integer-N synthesis due
to charge-pump mismatch, fractional-N synthesis suffers from
the fractional spur caused by the phase difference in the locked
state as well as charge-pump mismatch. Fig. 3 shows the timing
diagram of the reference and the VCO inputs of the phase/frequency detector (PFD) in the locked state.
The fractional spur mainly stems from the variation of the
prescaler division factor, in other words, the phase difference
Fig. 6.
Fig. 4.
Fig. 8.
Fig. 9. Bonding wire for inductor of VCO. (a) Pad and lead frame. (b)
Modeling.
C. Voltage-Controlled Oscillator
Two major issues in the design of the VCO are low phase
noise to meet the overall noise figure criteria and high gain
linearity for robust stability. Phase noise is mainly dependent
on the quality factor (Q) of the LC tank [5]. Although an
on-chip spiral inductor has recently been widely explored [6], a
bonding-wire inductor is superior to a spiral inductor in terms
of resistance, i.e., quality factor. In addition, the bonding-wire
inductor has constant inductance over a wide frequency range.
Fig. 9 shows bonding-wire inductor modeling. Two pads are
connected to the differential output of the VCO and the ends
of two lead frames are connected as a short or by external
inductance, according to the operating band, PCS or cellular.
factor of the inductance is expressed as
The
with parasitic components ignored. If the parameters of the
nH,
,
pH,
bonding wire are
m , which are typical values in the QFN20 package,
the Q factor of the inductor at 1.7 GHz is 43.
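As a rough cross-check of the quoted quality factor, the sketch below uses the textbook relation Q = ωL/R with parasitics ignored. The paper's own expression and parameter values are not recoverable here, so the inputs are assumptions: the 1.05-nH figure is the total inductance per VCO output reported in the experimental section (1.36 nH self-inductance minus 0.31 nH mutual inductance), and whether the Q of 43 was computed with that value is not confirmed. Function names are hypothetical.

```python
import math

def inductor_q(freq_hz, inductance_h, series_res_ohm):
    """Textbook quality factor of a series R-L branch: Q = omega * L / R."""
    return 2.0 * math.pi * freq_hz * inductance_h / series_res_ohm

def series_resistance_for_q(freq_hz, inductance_h, q):
    """Series resistance implied by a given Q at a given frequency."""
    return 2.0 * math.pi * freq_hz * inductance_h / q

if __name__ == "__main__":
    l_total = 1.36e-9 - 0.31e-9          # assumed effective inductance, 1.05 nH
    r = series_resistance_for_q(1.7e9, l_total, 43.0)
    print(f"implied series resistance: {r:.3f} ohm")
```

Under these assumptions the implied series resistance is a few tenths of an ohm, which is consistent with the claim that a bonding wire outperforms an on-chip spiral in terms of resistance.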
Fig. 10 shows the circuit diagram of the VCO. The oscillation frequency is controlled by the combination of fixed and
variable capacitors. Two methods have been previously reported for implementation of the fixed capacitor. One
is a metal-to-metal capacitor [1] and the other is a MOS transistor [7]. The former is used since it is superior in terms of VCO
pushing characteristics. Metal-to-metal capacitors and switches
are used for coarse tuning, which are controlled by the coarse
tuning controller. The size of the switch should be
sufficiently large in order to avoid degradation of the Q factor
of the LC tank. As a variable capacitor, an accumulation-mode
MOS transistor is used for fine tuning. The MOS capacitor has
an inherently nonlinear capacitance. However, with the coarse tuning
scheme, the control voltage moves within 0.2 V around half
of the supply voltage, thereby obtaining almost linear gain.
Fig. 10. Circuit diagram of (a) VCO and (b) bias circuit.
Fig. 11.
Fig. 13. Coarse tuning controller. (a) Block diagram. (b) Timing diagram of
operation.
Fig. 14.
Die microphotograph.
of 40 MHz/V, and 30 MHz/V is typical. The total range is divided into 64 levels by coarse tuning, and the frequency spacing
of two adjacent curves is approximately 7 MHz in the PCS band.
D. Coarse Tuning Architecture
Fig. 12.
VCO tuning range of (a) cellular band, and (b) PCS band.
TABLE II
SUMMARY OF SYNTHESIZER PERFORMANCE
Fig. 15.
at one output of the VCO is 1.05 nH. All control signals are
fed through a serial interface. The VCO at the bottom and the
loop filter on the left have a common analog supply and ground,
and the others are all connected to the digital supply and ground.
The power consumption of the total chip is 60 mW and the VCO
alone dissipates 12.3 mW.
Fig. 15 shows the measured carrier spectrum with the center
frequency of 980 MHz. The output power is 1.2 dBm with an
inductive load, which is sufficient for the output power requirements. Fig. 16 shows the measured phase noise in the cellular
and PCS band. Phase noise is −106 dBc/Hz at 100-kHz offset
and −127 dBc/Hz at 1-MHz offset in the cellular band, and
−104 dBc/Hz at 100-kHz offset and −121 dBc/Hz at 1-MHz
offset in the PCS band. Fractional spurs are suppressed to the
phase noise level. Table II shows the performance summary.
V. CONCLUSION
In this paper, we demonstrate a fully integrated CMOS
frequency synthesizer designed for PCS- and cellular-CDMA
wireless systems. A charge-averaging scheme for reducing
fractional spurs and a dual-path loop filter architecture are
proposed. The new bias circuit of the VCO compensates for the
variation of output swing of the VCO caused by the variation
of bonding-wire inductance, and the proposed coarse tuning
technique achieves a small VCO gain and a wide operating
frequency range of the VCO simultaneously. The frequency
synthesizer fabricated in a 0.35-μm CMOS technology offers
−127-dBc/Hz and −121-dBc/Hz phase noise at 1-MHz offset
with 980 MHz and 1.76 GHz of carrier frequency, respectively.
Fig. 16.
Measured PLL output phase noise. (a) Cellular band. (b) PCS band.
REFERENCES
IV. EXPERIMENTAL RESULTS AND SUMMARY
The proposed frequency synthesizer has been fabricated in
a 0.35-μm CMOS technology. Fig. 14 shows the die photograph of the synthesizer with an area of 2.5 mm × 2.0 mm including pads. The circuit has been measured with a nominal
3.0-V supply and a 2.7-V worst case. The bonding wire of the
QFN20 package used in the VCO has 1.36 nH of self-inductance and 0.31 nH of mutual inductance, so the total inductance
Yido Koo was born in Seoul, Korea, in 1973. He received the B.S. and M.S. degrees from the School
of Electrical Engineering, Seoul National University,
Seoul, Korea, in 1996 and 1998, respectively, where
he is currently working toward the Ph.D. degree.
His research interests include RF building blocks
and systems for wireless communication and high-speed interface for data communications. Currently,
he is developing a low-noise frequency synthesizer
for CDMA and GSM applications.
Deog-Kyoon Jeong received the B.S. and M.S. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1981 and 1984, respectively, and the Ph.D. degree in electrical engineering
and computer sciences from the University of California, Berkeley, in 1989.
From 1989 to 1991, he was with Texas Instruments
Inc., Dallas, TX, where he was a Member of Technical Staff and worked on the modeling and design of
BiCMOS gates and the single-chip implementation
of the SPARC architecture. He joined the faculty of
the Department of Electronics Engineering and Inter-University Semiconductor
Research Center, Seoul National University, as an Assistant Professor in 1991.
He is currently an Associate Professor of the School of Electrical Engineering,
Seoul National University. His main research interests include high-speed I/O
circuits, VLSI systems design, microprocessor architectures, and memory systems.
Abstract: A general model of phase-locked loops (PLLs) is derived which incorporates the influence of divide value variations.
The proposed model allows straightforward noise and dynamic
analyses of ΣΔ fractional-N frequency synthesizers and other
PLL applications in which the divide value is varied in time. Based
on the derived model, a general parameterization is presented that
further simplifies noise calculations. The framework is used to analyze the noise performance of a custom ΣΔ synthesizer implemented in a 0.6-μm CMOS process, and accurately predicts the
measured phase noise to within 3 dB over the entire frequency
offset range spanning 25 kHz to 10 MHz.
Index Terms: Delta, dithering, divider, fractional-N, frequency, modeling, noise, phase-locked loop, PLL, quantization
noise, sigma, synthesizer.
Fig. 1. Block diagram of a ΣΔ frequency synthesizer.
I. INTRODUCTION
FRACTIONAL-N FREQUENCY SYNTHESIZERS
.
Assuming a constant reference frequency, consecutive values
for are related for all as
However, the pulsed behavior of the PFD output adds some complexity in deriving the value of that gain, so our derivation will
consist of two steps. The first step relates the input phase difference to the Δt[k] sequence. The second step relates the
Δt[k] sequence to an impulse approximation of the E(t)
waveform.
The relationship of Δt[k] to the phase difference,
Φ_ref[k] − Φ_div[k], is defined as
(1)
To verify the above definition, one observes from Fig. 2 that a
phase error of 2π causes Δt[k] to be T.
The impact of the Δt[k] sequence on the PLL dynamics is cumbersome to model analytically since the pulse-width modulated
PFD output has a nonlinear influence on the PLL dynamics.
However, a simple approximation greatly eases our efforts: we
simply represent the PFD output as an impulse sequence rather
than a modulated pulse sequence. Fig. 3 illustrates this approximation; pulses in E(t) are represented as impulses with area
equal to their corresponding pulse, as described by
(2)
We discuss the significance of the above expression when we
derive the frequency-domain model of the PLL in Section IV.
Our justification for the impulse approximation is
heuristic: each PFD output pulse has much smaller width
than the loop filter impulse response, and therefore acts like an
impulse when the two are convolved together. Obviously, the
accuracy of this approximation depends on how much smaller
the PFD output pulse widths are compared to the dominant
time constant of the loop filter. Since the PFD pulses must
be smaller than a reference period, high accuracy is achieved
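The quality of the impulse approximation can be quantified for a simple case. The sketch below is not from the paper: it assumes a first-order low-pass loop filter with time constant τ and compares, for times after the pulse has ended, the response to a rectangular pulse of width w with the response to an impulse of equal area placed at the start of the pulse. For that filter the ratio of the two responses is (e^(w/τ) − 1)/(w/τ), independent of time. The function name is hypothetical.

```python
import math

def pulse_to_impulse_ratio(width_over_tau):
    """Ratio of pulse response to equal-area impulse response after the pulse ends.

    Assumes a first-order low-pass h(t) = exp(-t/tau)/tau; the argument is w/tau > 0.
    """
    x = width_over_tau
    return math.expm1(x) / x

if __name__ == "__main__":
    for x in (1.0, 0.1, 0.01):
        error = pulse_to_impulse_ratio(x) - 1.0
        print(f"w/tau = {x:<5} relative error = {100 * error:.2f} %")
```

The error shrinks roughly as half of w/τ, so a pulse one hundred times shorter than the filter time constant is reproduced to about half a percent in this simplified setting.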
XOR-Based PFD
which, if we ignore
, is the tristate expression multiplied by a factor of 2. Thus, if we ignore the phase offset of
and the square wave
, the XOR-based PFD has an identical model to that of the tristate topology except that its gain is
increased by a factor of 2.
Fig. 4.
XOR-based
C. Voltage-Controlled Oscillator
For our purposes, only two equations are needed to model
the VCO. The first relates deviations in the VCO phase, defined
as Φ_out(t), to changes in the VCO input voltage, v_in(t). Since
VCO phase is the integral of VCO frequency, and deviations in
VCO frequency are calculated as K_v v_in(t), where K_v is in units
of hertz per volt, we have
$\Phi_{out}(t) = 2\pi K_v \int_{-\infty}^{t} v_{in}(\tau)\, d\tau$ (3)
The second equation relates the absolute VCO phase, defined as
Φ_vco(t), to deviations in the VCO phase and the nominal VCO
frequency, f_nom:
$\Phi_{vco}(t) = 2\pi f_{nom} t + \Phi_{out}(t)$ (4)
Our modeling efforts will be primarily focused on deviations
in the VCO phase, so that (3) is of the most interest. However,
(4) is required in the divider derivation that follows.
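The integrating action of the VCO can be sketched in discrete time. This is an illustration under stated assumptions, not the paper's model code: the input is a sampled control-voltage deviation, the integral is approximated by a running sum, and the function name and the sample values are hypothetical. The 30-MHz/V gain matches the value quoted for the prototype.

```python
import math

def vco_phase_deviation(v_in, dt, kv_hz_per_v):
    """Accumulate VCO phase deviation (rad) from sampled input-voltage deviations.

    v_in: voltage deviations in volts, dt: sample spacing in seconds,
    kv_hz_per_v: VCO gain in hertz per volt.
    """
    phase = 0.0
    out = []
    for v in v_in:
        phase += 2.0 * math.pi * kv_hz_per_v * v * dt
        out.append(phase)
    return out

if __name__ == "__main__":
    # A constant 1-mV deviation held for 1 us with a 30-MHz/V VCO.
    trace = vco_phase_deviation([1e-3] * 1000, 1e-9, 30e6)
    print(f"phase after 1 us: {trace[-1]:.4f} rad")
```

A constant voltage offset therefore produces a phase ramp, which is why white noise at the VCO input appears at the output with a 20-dB/dec slope.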
D. Divider
Modeling of the divider will be accomplished by first re, to the VCO phase deviations,
lating the PFD pulse widths,
, and the divide value sequence,
. Given this relationship, the divider model is backed out using the PFD gain
expression in (1).
We begin by noting that the divider output edges occur
, completes
whenever the absolute VCO phase,
radian increments of phase. As stated in (4),
is
, and phase variations,
composed of a ramp in time,
. These statements are collectively illustrated in Fig. 5.
occur at the rising edges of the
Note that changes in
divider.
Fig. 5.
(t).
(8)
which, since
written as
and
, is equivalently
(6)
We combine the two key equations into one formulation by substitution of (6) into (5):
The final form of the desired equation is obtained by modifying (8) according to the following statements:
,
,
Define
.
.
Approximate
As such, we obtain
(9)
with the
We obtain the desired divider model by replacing
is zero.
PFD gain expression in (1) and assuming
(10)
(7)
Equation (7) is a difference equation relating all variables of
interest; to remove the differences we sum the formulation over
all positive time samples up to sample :
XOR-based PFD has a factor of two larger gain than the tristate design, which is captured by the factor in the PFD model.
For convenience in analysis to follow, we also define an abstract
, as the output of the divider accumulation action.
signal,
Some observations are in order. First, the divider effectively
samples the continuous-time output phase deviation of the
VCO, Φ_out(t), and then divides its value by N_nom. The output
phase of the divider, Φ_div[k], is influenced by the integration
of deviations in the divider value, n[k]. The integration of
n[k] is a consequence of the fact that the divider output is a
phase signal, whereas n[k] causes an incremental change in
the frequency of the divider output. Second, the PFD, charge
pump, and loop filter translate the discrete-time error signal
formed by Φ_ref[k] and Φ_div[k] to the continuous-time input of
the VCO, v_in(t). These elements, along with the divider, also
act as a D/A converter for mapping changes in n[k] to v_in(t).
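The accumulation described in these observations can be sketched as follows. The exact form of the paper's divider equation is not recoverable here, so the sign convention and the one-sample delay are assumptions based on the physical argument that raising the divide value by n delays the next divider edge by n VCO cycles. Names are hypothetical; the nominal value 92 is used only as an example.

```python
import math

def divider_phase(phi_out, n_dev, n_nom):
    """Divider output phase deviation per reference cycle (rad).

    phi_out: sampled VCO phase deviations, n_dev: divide-value deviations n[k],
    n_nom: nominal divide value. Past divide-value deviations are accumulated
    and subtracted, then everything is scaled by 1/n_nom.
    """
    acc = 0.0
    phi_div = []
    for phi, n in zip(phi_out, n_dev):
        phi_div.append((phi - 2.0 * math.pi * acc) / n_nom)
        acc += n
    return phi_div

if __name__ == "__main__":
    # Holding the divide value one above nominal makes the divider phase ramp down.
    print(divider_phase([0.0, 0.0, 0.0], [1, 1, 1], 92))
```

A constant offset in the divide value thus produces a ramp in divider phase, which is the frequency-to-phase integration the text refers to.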
A. Pseudocontinuous Approximation
that is sampled with period and then
Consider a signal
, as described by
converted to an impulse sequence
Fig. 8.
where
. The frequency-domain relationship
and
is found by taking the Fourier transform
between
of the above expression, which leads to
copies of
within
except for the baseband copy,
which allows us to approximate the relationship between
and
in the frequency domain as a simple scaling operation
. In so doing, we ignore aliasing effects that will occur
of
at frequencies beyond
if there is frequency content in
to
. However, our analysis will
the range of
be reasonably accurate when performing closed-loop analysis
for most frequencies of interest in our application. The double
outline of the box in the figure is meant to serve as a reminder
that a sampling operation is taking place.
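For reference, the standard impulse-sampling relations behind this approximation can be written out. The symbol names x(t), x_s(t), and T are assumptions, since the original symbols are not recoverable; the relations themselves are the usual Fourier-transform result for an impulse-sampled signal.

```latex
x_s(t) = \sum_{k=-\infty}^{\infty} x(kT)\,\delta(t - kT)
\quad\Longleftrightarrow\quad
X_s(f) = \frac{1}{T}\sum_{k=-\infty}^{\infty} X\!\left(f - \frac{k}{T}\right)
\approx \frac{1}{T}\,X(f), \qquad |f| < \frac{1}{2T}.
```

The final step keeps only the baseband copy, which is reasonable when the blocks that follow the sampler attenuate the copies centered at multiples of 1/T.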
B. Resulting Model
The time-domain block diagram in Fig. 6 is now readily
converted to the frequency domain by taking the z-transform
of the discrete-time blocks, the Fourier transform of the
continuous-time blocks, and by applying the approximation
of the sampling operation discussed above. Fig. 8 displays
the resulting model. Note that all blocks are parameterized
by the common variable f, which denotes frequency in hertz,
under the assumption that all discrete-time sequences interact
with the continuous-time blocks as modulated impulse trains
of period T. Also note that all the signals in the PLL are
still denoted in the time domain even though they interact
Fig. 9. Detailed view of PLL noise sources and examples of their respective
spectral densities.
V. PARAMETERIZATION OF PLL
We now parameterize the PLL dynamics depicted in Fig. 8 in
terms of a single function which we will call G(f). Using this
parameterization, we then develop a general noise model for frequency synthesizers in which all the relevant transfer functions
are described in terms of G(f).
A. Derivation
To parameterize the PLL dynamics, it is convenient to define
a base function that provides a simple description of all the PLL
transfer functions of interest. It turns out that the following definition works well for this purpose.
(11)
where
Divider/reference jitter corresponds to noise-induced
variations in the transition times of the Reference or Divider
output waveforms. A periodic reference spur is caused
by use of the XOR-based PFD, or by the tristate PFD when its
output duty cycle is nonzero. Charge-pump noise is caused by
noise produced in the transistors that compose the charge-pump
circuit. Finally, VCO noise includes the intrinsic noise of the
VCO and voltage noise at the output of the loop filter. For convenience in later discussion, we have lumped these noise sources
into two categories, VCO noise and detector noise, as shown in
Fig. 9.
Fig. 10 displays the transfer function relationships from each
of the above noise sources to the synthesizer output. The derivation of these transfer functions is straightforward based on Fig. 9
and the G(f) parameterization derived earlier. Note that two
different parameterizations are shown to describe the impact of
divide value variations on the PLL output phase. The alternate
model relates changes in the divide value more directly to
the PLL output frequency. Its derivation follows by noting that
the order of linear time-invariant blocks can be switched, and
that
(13)
for
Note that the validity of the dynamic model, and its alternate,
presented in Fig. 10, has been verified in previous work discussed in [9], [13]. The validity of the noise model will be verified in Section VII.
Calculation of spectral noise densities using Fig. 10 is
complicated by the fact that both discrete-time (DT) and
to pro(15)
Case 3) DT input
CT output
to produce a
(16)
In Case 3), we assume that the DT input interacts with the CT
filter as a modulated impulse train of period T.
The above spectral density calculations and Fig. 10 allow us
to accurately calculate the influence of the various noise sources
on the PLL output. A few qualitative observations are also in
order. Detector noise is low-pass filtered by the PLL dynamics,
while VCO noise is high-pass filtered by the PLL dynamics. The
overall noise power in the PLL output, whose integral over frequency corresponds to the time-domain jitter of the PLL output,
is a function of the PLL bandwidth. If the PLL bandwidth is very
low, VCO noise will dominate over a wide frequency range due
to the abundant suppression of detector noise. Likewise, a high
PLL bandwidth will suppress VCO noise over a wide frequency
range at the expense of allowing more detector noise through.
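The low-pass and high-pass shaping described above can be illustrated with a sketch. It is not the paper's parameterization: it assumes a second-order Butterworth response for the closed-loop function G(f), applies G(f) to detector noise and 1 − G(f) to VCO noise, and omits every scale factor. Function names are hypothetical, and the 84-kHz cutoff is only an example value.

```python
import math

def butterworth2(f, fo):
    """Second-order Butterworth low-pass evaluated at frequency f (Hz), cutoff fo (Hz)."""
    s = 1j * f / fo
    return 1.0 / (1.0 + math.sqrt(2.0) * s + s * s)

def noise_gains_db(f, fo):
    """Return (detector-noise gain, VCO-noise gain) in dB at offset f > 0."""
    g = butterworth2(f, fo)
    return 20.0 * math.log10(abs(g)), 20.0 * math.log10(abs(1.0 - g))

if __name__ == "__main__":
    for f in (1e3, 84e3, 10e6):
        det, vco = noise_gains_db(f, 84e3)
        print(f"{f:>10.0f} Hz  detector {det:7.1f} dB   VCO {vco:7.1f} dB")
```

Well below the cutoff the detector noise passes almost unattenuated while VCO noise is suppressed, and the roles reverse well above it, so moving the cutoff trades one contribution against the other.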
VI. ΣΔ SYNTHESIZER MODEL
ΣΔ Modulator
Fig. 11. ΣΔ modulator.
Fig. 12. Parameterized model of a ΣΔ synthesizer.
Fig. 13.
(18)
If the quantization noise spectra of
is white, then
Fig. 14.
(19)
Fig. 15.
The parameters of the system were set such that the PLL had a
bandwidth of 84 kHz:
kHz
kHz
kHz
(20)
Fig. 15 expands the block diagram of the prototype to indicate the circuits of relevance and their respective noise contributions. A few comments are in order. First, a reference frequency
of 20 MHz was chosen to achieve an acceptably low impact
of ΣΔ quantization noise while still allowing low-power implementation of the digital logic. This choice of reference frequency, in turn, set the nominal divide value required to achieve an output
carrier frequency of 1.84 GHz. The VCO gain was set to
30 MHz/V by the external VCO. The capacitor value was chosen
as large as practical in order to obtain good noise performance;
it was constrained to 30 pF due to area constraints on the die of
the custom IC.
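The relationship between the reference frequency, the carrier frequency, and the nominal divide value can be checked directly. The sketch only performs the division implied by the text; the function name is hypothetical, and the resulting value is an inference from the two quoted frequencies rather than a figure stated in the surviving text.

```python
def nominal_divide_value(f_out_hz, f_ref_hz):
    """Nominal divide value of a PLL whose divider output locks to the reference."""
    return f_out_hz / f_ref_hz

if __name__ == "__main__":
    print(nominal_divide_value(1.84e9, 20e6))
```

A higher reference frequency spreads the quantization noise over a wider band but lowers the divide value and raises the clock rate of the digital logic, which is the compromise the paragraph describes.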
B. Noise Analysis
Table I displays the value of each noise source shown
in Fig. 15. Many of these values were obtained through ac
simulation of the relevant circuits in HSPICE. Note that all
noise sources other than the ΣΔ quantization noise are assumed to be white, so that
the values of their variance suffice for their description. This
assumption holds for the input-referred VCO noise,
provided that the output phase noise of the VCO rolls off at
−20 dB/dec [22], [23]; the −20 dB/dec rolloff is achieved in
the model since the input-referred VCO noise, which has a flat spectral density,
passes through the integrating action of the VCO. The actual
VCO deviates from the −20 dB/dec rolloff at low frequencies
due to 1/f noise, and at high frequencies due to a finite noise
floor. However, the assumption of −20 dB/dec rolloff suffices
for the frequency offsets of interest.
MHz
(21)
noise current was calculated as
(22)
where k is Boltzmann's constant and the temperature is in kelvins. Finally, the spectral density of the ΣΔ
quantization noise was calculated as
(23)
where m is the order of the ΣΔ modulator.
The noise sources in Table I can be classified as either charge-pump noise, VCO noise, or quantization noise. For convenience, we will assume that the VCO noise is referred to the input of the VCO, so that it passes through the VCO transfer function before influencing the VCO output phase. Given the
(25)
Note that we have assumed that the charge-pump noise and the VCO noise are white, and have used the parameter value that applies when an XOR-based PFD is used.
The task that remains is to determine the variances of the charge-pump noise and the VCO noise. Examination of Fig. 15 reveals that charge-pump noise is a function of the following noise sources:
Fig. 16.
(26)
while VCO noise is a function of the noise sources
(27)
We will quickly infer the value of these two functions in this paper; the reader is referred to [13] for more detail.
Let us first determine the charge-pump noise. Examination of Table I reveals that one of its noise sources is an order of magnitude larger than the other three. Since this dominant noise source is switched alternately between the positive and negative terminals of OP1, its contribution to the charge-pump noise will be pulsed in nature. At a nominal duty cycle of 50%, we would expect its energy to be split equally between the positive and negative terminals of OP1, which sets the resulting charge-pump noise variance. This intuitive argument was verified using a detailed C simulation of the PLL [24]. Note that a more accurate estimate will take into account any offset in the nominal duty cycle of the phase detector output, and the transient response of the charge pump.
Now let us determine the VCO noise. Since Table I reveals that its two constituent noise sources are of the same order of magnitude, we simply add these components. This expression is accurate at frequencies less than the unity-gain bandwidth of OP1; the corresponding noise source is passed to the output of OP1 with a gain of approximately one in this region. At frequencies beyond OP1's bandwidth, the expression is conservatively high since that noise source is attenuated in this frequency range.
Based on the above information, plots of the spectra in (24) are shown in Fig. 16. For convenience, we have also overlaid measured results from Fig. 17 for easy comparison, which will be discussed shortly. As shown in Fig. 16, the influence of detector noise dominates at low frequencies, and the influence of VCO and quantization noise dominates at high frequencies. Note that the calculations use the function described by (19) with the parameter values specified in (20).
Fig. 17. Measured closed-loop synthesizer noise and open-loop VCO noise.
ACKNOWLEDGMENT
The authors would like to thank the Hong Kong University of
Science and Technology, and in particular, J. Lau, P. Chan, and
P. Ko, for their support in the writing of this paper.
REFERENCES
[1] T. A. Riley, M. A. Copeland, and T. A. Kwasniewski, "Delta-sigma modulation in fractional-N frequency synthesis," IEEE J. Solid-State Circuits, vol. 28, pp. 553-559, May 1993.
[2] M. A. Copeland, "VLSI for analog/digital communications," IEEE Commun. Mag., vol. 29, pp. 25-30, May 1991.
[3] B. Miller and B. Conley, "A multiple modulator fractional divider," in Proc. 44th Annu. Symp. Frequency Control, May 1990, pp. 559-567.
[4] B. Miller and B. Conley, "A multiple modulator fractional divider," IEEE Trans. Instrum. Meas., vol. 40, pp. 578-583, June 1991.
[5] W. Rhee, B.-S. Song, and A. Ali, "A 1.1-GHz CMOS fractional-N frequency synthesizer with 3-b third-order sigma-delta modulator," IEEE J. Solid-State Circuits, vol. 35, pp. 1453-1460, Oct. 2000.
[6] B. Miller, "Technique enhances the performance of PLL synthesizers," Microw. RF, pp. 59-65, Jan. 1993.
[7] T. Kenny, T. Riley, N. Filiol, and M. Copeland, "Design and realization of a digital delta-sigma modulator for fractional-N frequency synthesis," IEEE Trans. Veh. Technol., vol. 48, pp. 510-521, Mar. 1999.
[8] T. A. Riley and M. A. Copeland, "A simplified continuous phase modulator technique," IEEE Trans. Circuits Syst. II, vol. 41, pp. 321-328, May 1994.
[9] M. Perrott, T. Tewksbury, and C. Sodini, "A 27-mW CMOS fractional-N synthesizer using digital compensation for 2.5-Mb/s GFSK modulation," IEEE J. Solid-State Circuits, vol. 32, pp. 2048-2060, Dec. 1997.
[10] S. Willingham, M. Perrott, B. Setterberg, A. Grzegorek, and W. McFarland, "An integrated 2.5-GHz sigma-delta frequency synthesizer with 5 microseconds settling and 2-Mb/s closed-loop modulation," in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2000, pp. 200-201.
[11] N. Filiol, T. Riley, C. Plett, and M. Copeland, "An agile ISM band frequency synthesizer with built-in GMSK data modulation," IEEE J. Solid-State Circuits, vol. 33, pp. 998-1008, July 1998.
[12] N. Filiol, C. Plett, T. Riley, and M. Copeland, "An interpolated frequency-hopping spread-spectrum transceiver," IEEE Trans. Circuits Syst. II, vol. 45, pp. 3-12, Jan. 1998.
[13] M. H. Perrott, "Techniques for high data rate modulation and low power operation of fractional-N frequency synthesizers with noise shaping," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, 1997.
[14] A. Hill and A. Surber, "The PLL dead zone and how to avoid it," RF Design, pp. 131-134, Mar. 1992.
[15] M. Thamsirianunt and T. A. Kwasniewski, "A 1.2-μm CMOS implementation of a low-power 900-MHz mobile radio frequency synthesizer," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), 1994, p. 16.2.
[16] J. A. Crawford, Frequency Synthesizer Handbook. Norwood, MA: Artech, 1994.
[17] E. A. Lee and D. G. Messerschmitt, Digital Communication, 2nd ed. Norwell, MA: Kluwer, 1994.
[18] J. Candy and G. Temes, Oversampling Delta-Sigma Data Converters. New York: IEEE Press, 1992.
[19] S. Norsworthy, R. Schreier, and G. Temes, Delta-Sigma Data Converters: Theory, Design, and Simulation. New York: IEEE Press, 1997.
[20] A. Sripad and D. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoust. Speech Signal Proc., vol. ASSP-25, pp. 442-448, Oct. 1977.
[21] W. Bennett, "Spectra of quantized signals," Bell Syst. Tech. J., vol. 27, pp. 446-472, July 1948.
[22] D. Leeson, "A simple model of feedback oscillator noise spectrum," Proc. IEEE, vol. 54, pp. 329-330, Feb. 1966.
[23] A. Hajimiri and T. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, pp. 179-194, Feb. 1998.
Michael H. Perrott received the B.S. degree in electrical engineering from New Mexico State University,
Las Cruces, in 1988, and the M.S. and Ph.D. degrees
in electrical engineering and computer science from
the Massachusetts Institute of Technology (M.I.T.),
Cambridge, in 1992 and 1997, respectively.
From 1997 to 1998, he was with Hewlett-Packard
Laboratories, Palo Alto, CA, working on high-speed
circuit techniques for synthesizers. In 1999,
he was a visiting Assistant Professor at the Hong
Kong University of Science and Technology, where
he taught a course on the theory and implementation of frequency synthesizers.
From 1999 to 2001, he was with Silicon Laboratories, Austin, TX, where he developed circuit and signal-processing techniques to achieve high-performance
clock and data recovery circuits. He is currently an Assistant Professor in the
Department of Electrical Engineering and Computer Science at M.I.T., where
his research focuses on high-speed circuit and signal processing techniques for
data links and wireless applications.
I. INTRODUCTION
Fig. 1. (a) Conventional PLL architecture. (b) Proposed PLL architecture with
delayed charge pump circuit.
It is important to note that the problem of ripple becomes increasingly more serious as the supply voltage is scaled down and/or the operating frequency goes up. The relative magnitude of the primary sidebands at the output of the VCO is determined by three quantities: the peak amplitude of the first harmonic of the ripple, the gain of the VCO, and the synthesizer reference frequency. For a given relative tuning range (e.g., 10%), the gain of LC VCOs must increase if the supply voltage goes down. For the VCO gain (in MHz/V) and reference frequency (in MHz) considered here, the fundamental ripple amplitude must be less than 63 μV to guarantee sidebands 60 dB below the carrier.
In order to arrive at the stabilization technique, consider the PLL architecture shown in Fig. 1(b). Here, the primary charge pump drives a single capacitor while a secondary charge pump injects charge after some delay. The total current flowing through the capacitor is thus equal to
(1)
(2)
where the delay is assumed to be much smaller than the loop time constant. Consequently, the transfer function of the PFD/CP/LPF combination can be expressed as
(3)
Under a simplifying assumption, we have
(4)
obtaining a zero at
(5)
Proper choice of the delay can, therefore, stabilize the loop.
The damping factor and the settling time of the loop can be written, respectively, as
(6)
(7)
In order to achieve a sufficiently low zero frequency, the delay must be large or the scaling factor between the two charge pumps must be close to unity. Since the accuracy in the definition of this factor is limited by mismatches between the two charge pumps, the delay must still be a large value. For the example parameter values considered, a delay of approximately 500 ns is required to ensure a well-behaved loop response.
The architecture of Fig. 1(b) suffers from two critical drawbacks. First, it requires that the delay stage provide a very large delay and accommodate a wide range of Up and Down pulsewidths. Specifically, when the loop is locked, the Up and Down pulses are less than 500 ps wide. The tradeoff between delay and bandwidth, therefore, makes the design of the delay line difficult. As depicted in Fig. 1(b), if the bandwidth of each stage in the delay line is reduced so as to yield a large delay, then the narrow Up and Down pulses are heavily attenuated, giving rise to a dead zone. Conversely, if the bandwidth of
Fig. 2. Actual implementation of PLL with delay sampling circuit and continuous-time approximation of delay network.
With the chosen reference frequency, the output frequency covers the 2.4-GHz ISM band. The output of the swallow counter is pipelined by a flip-flop to allow a relaxed design for the level converter and the swallow counter. The buffer following the VCO suppresses the kickback noise of the prescaler when the modulus changes. It also prevents the input capacitance of the prescaler from limiting the tuning range of the VCO.
A. VCO Design
The VCO topology is shown in Fig. 5(a). To provide both negative and positive voltages across the MOS varactors, the sources of the two transistors are grounded and the circuit is biased from the top. The inductors are realized as shown in Fig. 5(b), with the bottom spiral moved down to metal 2 so as to reduce the parasitic capacitance [7]. Each inductor is about 14 nH, occupies an area of 180 μm × 180 μm, and exhibits a Q of 4 and a parasitic capacitance of 100 fF.
Fig. 3. (a) MATLAB behavioral simulations for the ripples on the control lines. (b) Time-domain settling and VCO output spectrum during lock for Type A
(delay-sampling loop filter) and Type B (conventional loop filter).
B. Pulse-Swallow Counter
Shown in Fig. 6(a), the pulse-swallow counter consists of a prescaler, a program counter, and a swallow counter. The pipelining in the swallow counter allows the use of a small divider ratio in the prescaler. Simulations suggest that a 4/5 topology minimizes the overall divider power dissipation.
The prescaler must divide the 2.4-GHz signal while dissipating little power. Depicted in Fig. 6(b), the circuit employs three current-steering flip-flops with diode-connected loads. The use of NOR gates obviates the need for power- and headroom-hungry level shift circuits (or large input swings) required in NAND gates. The program counter and the swallow counter incorporate static flip-flops to ensure reliable operation at low frequencies.
IV. SIMULATION TECHNIQUES
The two node voltages of interest in Fig. 2(b) can be expressed, respectively, by the following discrete-time equations
(12)
Fig. 5.
Fig. 6.
(13)
where the driving term denotes the input phase error. Even though the z-transform of the two node voltages can be derived, the high order of the resulting polynomials makes it difficult to derive the closed-loop response in analytical form. Thus, the two equations, along with the rest of the PLL, are realized in MATLAB. The VCO is modeled as a phase accumulator, with each new value of phase obtained as the previous phase plus the product of the time interval and the new frequency. The phase detector is simply a subtractor, generating the phase error for the above equations. The linear discrete-time model facilitates the choice of the charge pump current and the remaining loop-filter values for fast settling and minimum ripple on the oscillator control line.
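The two modeling ideas named above, a VCO treated as a phase accumulator and a phase detector treated as a subtractor, can be illustrated with a short behavioral loop. This is only a sketch under stated assumptions: the loop filter is a generic proportional-plus-integral stage rather than the delay-sampling filter of Fig. 2, it is written in Python rather than MATLAB, and every numeric value and name (`simulate_pll`, `kp`, `ki`, and so on) is hypothetical.

```python
def simulate_pll(n_steps=2000, dt=1e-6, f_ref=1e6, n_div=100,
                 f_free=99e6, kvco=10e6, kp=1.26, ki=3.95e4):
    """Behavioral PLL: returns the VCO frequency (Hz) at every time step.

    Phases are kept in cycles. All parameter values are illustrative only.
    """
    phase_ref = 0.0   # reference phase, cycles
    phase_vco = 0.0   # VCO phase, cycles
    integ = 0.0       # integral path of the assumed PI loop filter, volts
    freqs = []
    for _ in range(n_steps):
        phase_ref += f_ref * dt
        # Phase detector as a subtractor (divided VCO phase vs. reference).
        err = phase_ref - phase_vco / n_div
        integ += ki * err * dt
        vctrl = kp * err + integ
        freq = f_free + kvco * vctrl
        # VCO as a phase accumulator: previous phase + interval * new frequency.
        phase_vco += dt * freq
        freqs.append(freq)
    return freqs

freqs = simulate_pll()
final_freq = freqs[-1]
```

With these placeholder gains the loop is critically damped with a natural frequency near 10 kHz, so the output settles to n_div * f_ref = 100 MHz well within the simulated 2 ms.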
Fig. 7 shows the settling behavior of the synthesizer as predicted by MATLAB and transistor-level implementation. The
simple discrete-time model yields a moderate accuracy while
requiring orders of magnitude less simulation time.
B. Transistor-Level Model
The impact of various PFD, CP, and VCO nonidealities upon
the loop dynamics must ultimately be studied in a realistic transistor-level implementation. We present two techniques that reduce the simulation time from days to minutes.
Fig. 7.
The first method is based on time contraction, whereby the reference frequency is scaled up by a factor of 100 and the main loop filter capacitor (shown in Fig. 2) and the divide ratio are scaled down by the same factor. All other loop parameters remain unchanged. From (10) and (11), we note that scaling the capacitor and the divide ratio by 100 maintains a constant damping factor while scaling the settling time by 100. Since the PFD operates reliably at 100 MHz with no dead zone, this method directly reduces the simulation time by a factor of 100.
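The scaling argument can be illustrated with the textbook second-order charge-pump PLL expressions. These are used here as a stand-in, since (10) and (11) are specific to the delay-sampling loop; the series-resistor filter, the function name, and all component values are assumptions made for illustration only.

```python
import math

def second_order_pll(i_cp, kvco_rad, n_div, c_main, r_series):
    """Natural frequency (rad/s) and damping factor of a textbook
    charge-pump PLL with a series RC loop filter (assumed stand-in model)."""
    wn = math.sqrt(i_cp * kvco_rad / (2.0 * math.pi * n_div * c_main))
    zeta = 0.5 * r_series * math.sqrt(
        i_cp * kvco_rad * c_main / (2.0 * math.pi * n_div))
    return wn, zeta

# Hypothetical baseline values.
base = dict(i_cp=100e-6, kvco_rad=2.0 * math.pi * 100e6,
            n_div=2400.0, c_main=1e-9, r_series=22e3)
wn0, zeta0 = second_order_pll(**base)

# Time contraction: capacitor and divide ratio both scaled down by 100.
scaled = dict(base, n_div=base["n_div"] / 100.0, c_main=base["c_main"] / 100.0)
wn1, zeta1 = second_order_pll(**scaled)

speedup = wn1 / wn0  # settling time scales as 1 / (zeta * wn)
```

Because the damping factor depends on the ratio of capacitance to divide ratio while the natural frequency depends on their product, scaling both by the same factor leaves the damping unchanged and shortens the settling time by that factor.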
Fig. 8 depicts an example of time contraction by a factor of
10. Note that the time axis has a logarithmic scale. It can be
observed that the loop settling behavior scales accurately by the
same factor.
In the second method, the divider is realized as a simple behavioral model in HSPICE that uses a handful of ideal devices
and its complexity is independent of the divide ratio. Illustrated
in Fig. 9, the principle of the behavioral divider is to pump a
Fig. 8. Time contraction.
Fig. 10. Die photo.
Fig. 9.
Fig. 11.
I. INTRODUCTION
TABLE I
RECEPTION BANDS WITH CORRESPONDING TUNING SYSTEM PARAMETERS
Fig. 3. Settling transient for different values of the loop parameter, normalized to f t. (a) Settling error, represented as the frequency error relative to the frequency step, versus f t. (b) Settling error, represented as the logarithm of the magnitude of that relative error, versus f t.
In (3), the quantities are the locking time (s); the amplitude of the frequency jump (Hz); the maximum frequency error (Hz) at the locking time; and the effective damping coefficient, which can be read from Fig. 6.
Two points about the present treatment of the transient response need further explanation. First, the presented results are based on a linear continuous-time model for the discrete-time charge-pump PLL. It is known in the literature [6] that the continuous-time approach is a good approximation for the discrete-time PLL if the reference (sampling) frequency of the loop is at least a factor of ten higher than its open-loop bandwidth. Therefore, the value of the loop bandwidth, calculated with (3), has to be checked against the loop's reference frequency. If the ratio of reference frequency to loop bandwidth is smaller than ten, then the actual settling behavior will deviate from the calculations.
The second point is that usual implementations of the phase frequency detector have a limited linear phase error detection range, namely, from -2π to +2π [9]. When the instantaneous phase error becomes larger than 2π, the PFD interprets the error information as the phase error reduced by 2π. This effect leads to a longer settling time than predicted with (3). The maximum value of the phase error was found to obey a relationship involving the main divider ratio and a fitting factor for the influence of the phase margin on the maximum phase error. Numerical values for the fitting factor, obtained from transient simulations, lie in the range [0.7, 0.8]. Hence, the maximum phase error is contained in the interval ±2π when the bound given by this relationship does not exceed 2π. If this condition is satisfied, then the (discrete-time) transient response is accurately predicted by the continuous-time linear model.
Inaudible RDS background scanning requires settling times of 1 ms, defined as a residual settling error of 6 kHz for a 20-MHz frequency jump. The nominal loop phase margin is set to 50°, which corresponds to an effective damping coefficient of five. On the other hand, it is appropriate to use a lower value for the effective damping coefficient in the calculations (e.g., 2.5), to provide enough margin for variations in the nominal values of loop bandwidth and phase margin. Solving (3) for these settling specifications leads to a nominal value of 3.2 kHz for the loop bandwidth.
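The quoted bandwidth can be reproduced with a short calculation. Equation (3) is not legible in this copy, so the sketch below assumes it has the exponential-settling form t_lock = ln(f_step / f_err) / (zeta_e * f_c), where zeta_e stands for the damping-related parameter read from Fig. 6. That form and the symbol names are assumptions, adopted because they are consistent with the numbers given in the text.

```python
import math

def loop_bandwidth_for_settling(f_step_hz, f_err_hz, t_lock_s, zeta_e):
    """Loop bandwidth (Hz) needed to settle within f_err_hz after a jump of
    f_step_hz in t_lock_s, under the assumed form of (3)."""
    return math.log(f_step_hz / f_err_hz) / (zeta_e * t_lock_s)

# Specification from the text: 20-MHz jump, 6-kHz residual error, 1 ms,
# with the conservative parameter value of 2.5.
bandwidth_hz = loop_bandwidth_for_settling(20e6, 6e3, 1e-3, 2.5)
```

Under this assumption the result is roughly 3.2 kHz, in line with the nominal value stated above.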
Fig. 7. FM noise density and residual FM for loop bandwidths of 800 Hz and 3 kHz.
The loop bandwidth that satisfies different settling requirements can be calculated with the help of (3). Settling specifications, however, often require loop bandwidths that are not optimal with respect to spectral purity performance, as will become clear in the next subsection.
B. Phase Noise Performance and Loop Bandwidth
The dependency of the total phase noise of a PLL tuning
system on the phase noise of the loop components is well known
in the literature [3], [5], [10]. The phase noise of the VCO is suppressed inside the loop bandwidth, whereas the (phase) noise
from the other building blocks is transferred to the VCO output,
multiplied by the closed-loop transfer function of the PLL: a
low-pass function that suppresses their noise contribution outside the loop bandwidth. There is a crossover point for the
loop bandwidth, where the noise contribution from the dividers
and charge pump becomes dominant with respect to the noise
from the VCO.
For terrestrial FM reception, the LO signal residual frequency noise (residual FM) determines the ultimate receiver's SNR performance. The SNR specification for the application is 64 dB, defined for a reference level of 22.5-kHz peak deviation with 50-μs deemphasis. Complying with the specification requires the residual FM in the LO signal to be less than 10 Hz rms.
The frequency (FM) noise density of the LO signal is linked to its phase noise power density through the square of the offset frequency [5]. The phase noise power density equals twice the single-sideband noise-to-carrier ratio, so that the FM noise density is twice the squared offset frequency times that ratio. Finally, the residual FM can be calculated
(4)
The lower and upper integration limits in (4) depend on the signal bandwidth of the application [3]. For terrestrial FM reception, the lower limit is 20 Hz and the higher is 20 kHz. Fig. 7 presents the simulated frequency noise (FM noise) power density and the residual FM, which is plotted as a function of the upper limit, with the lower limit fixed at 20 Hz. The FM noise density and the residual FM are plotted for values of loop bandwidth of 800 Hz and of 3 kHz. For 3 kHz, the residual FM amounts to 40 Hz rms, which is 12 dB higher than the specification. A loop bandwidth of 800 Hz, on the other hand, leads to a residual FM of 8 Hz rms, which satisfies the SNR requirement.
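A minimal numerical sketch of the residual FM calculation follows. It assumes (4) is the square root of the integral of the FM noise density 2 f^2 L(f) between the two limits, with L(f) the single-sideband noise-to-carrier ratio expressed as a power ratio. Deemphasis weighting is ignored, and the phase-noise profile passed in is a placeholder rather than data from Fig. 7.

```python
import math

def residual_fm_hz(ssb_dbc_per_hz, f_low=20.0, f_high=20e3, n=20001):
    """RMS residual FM (Hz) from a callable giving L(f) in dBc/Hz.

    Integrates 2 * f^2 * L(f) with the trapezoidal rule on a log-spaced grid.
    """
    log_lo, log_hi = math.log10(f_low), math.log10(f_high)
    freqs = [10 ** (log_lo + (log_hi - log_lo) * i / (n - 1)) for i in range(n)]
    dens = [2.0 * f * f * 10 ** (ssb_dbc_per_hz(f) / 10.0) for f in freqs]
    area = 0.0
    for i in range(n - 1):
        area += 0.5 * (dens[i] + dens[i + 1]) * (freqs[i + 1] - freqs[i])
    return math.sqrt(area)

def excess_over_spec_db(measured_hz_rms, spec_hz_rms):
    """How far a residual FM value lies above a specification, in dB."""
    return 20.0 * math.log10(measured_hz_rms / spec_hz_rms)

# Placeholder profile: flat -100 dBc/Hz across the band (not measured data).
flat_result = residual_fm_hz(lambda f: -100.0)
excess_db = excess_over_spec_db(40.0, 10.0)
```

The second helper reproduces the comparison in the text: 40 Hz rms against a 10 Hz rms limit is about 12 dB over the specification.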
The contributions of different noise sources to the total frequency noise density, in the case of an 800-Hz loop bandwidth,
are displayed in Fig. 8. The contribution of the VCO to the
residual FM equals that of the other synthesizer building blocks.
This is a good compromise, and 800 Hz was chosen as the nominal loop bandwidth for in-lock situations.
The settling specification requires a bandwidth of 3.2 kHz.
The SNR constraint, on the other hand, asks for 800 Hz. These
conflicting requirements can be combined when the loop bandwidth is made adaptive as a function of the operating mode: frequency jump or in-lock.
Adapting the value of the loop bandwidth during frequency
jumps is easily accomplished by switching the nominal value of
the charge-pump current [6], [13]. This method, however, often
causes disturbances in the VCO tuning voltage (the so-called
secondary glitch effect) at the moment the current is switched
from high to low values. These disturbances are highly undesirable, as they have to be corrected by the loop in small bandwidth mode. What is more, the secondary glitches may cause
audible disturbances in analog systems and increase the bit error
rate in digital systems.
To provide stability for a small bandwidth loop requires a
transfer function zero located at low frequencies (large time constant). A low-frequency zero, however, is undesirable for operation in high bandwidth mode. It causes the phase margin to be
too high, which increases the settling time. Note that the effective damping coefficient decreases for high values of phase margin (see Fig. 6).
Fig. 8. Contributions from different noise sources to the total FM noise density and residual FM (20 Hz to 20 kHz) with 800-Hz loop bandwidth.
Therefore, for optimal settling time and phase noise, one has
not only to switch the value of the loop bandwidth but also to
change the location of the zero in the transfer function.
C. Reference Spurious Signals and Loop Filter Attenuation
The use of phase frequency detectors yields the minimum levels of spurious breakthrough at the reference frequency [11]. The spurious signals are due to compensation of leakage currents or to imperfections in the charge-pump implementation. Standard FM modulation theory and the small-angle approximation lead to the following equation for the amplitude of the spurious signal (in dBc), which is at a given offset frequency from the carrier:
(5)
The quantities in (5) are the offset frequency from the carrier (Hz), the amplitude of the ac current component at that frequency (A), the impedance of the loop filter at that frequency (V/A), and the VCO gain (Hz/V).
Fig. 9.
The amplitude of the ac current component is twice the value of the loop-filter dc leakage current [12] in loops operating with well-designed charge pumps. In cases where the charge pump has charge-sharing problems and/or charge injection into the loop filter, the ac current may become dominated by these second-order effects. The imperfections can lead to spurious components with (much) higher amplitudes than would be expected based on the leakage current alone.
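As a worked illustration of the FM argument, the sketch below uses the narrowband FM result that each sideband sits at half the modulation index relative to the carrier. The ripple voltage is the ac current times the filter impedance, the peak deviation is that voltage times the VCO gain, and the modulation index is the deviation divided by the offset frequency. The function names and all numeric values are hypothetical, and the expression is offered as an assumed reading of (5), not a transcription of it.

```python
import math

def spur_level_dbc(i_ac_a, z_filter_ohm, kvco_hz_per_v, f_offset_hz):
    """Spur amplitude (dBc) from narrowband FM theory (small-angle case)."""
    peak_deviation_hz = i_ac_a * z_filter_ohm * kvco_hz_per_v
    mod_index = peak_deviation_hz / f_offset_hz
    return 20.0 * math.log10(mod_index / 2.0)

def max_filter_impedance_ohm(spur_limit_dbc, i_leak_a, kvco_hz_per_v,
                             f_offset_hz):
    """Largest loop-filter impedance at the offset frequency that still meets
    the spur limit, taking the ac current as twice the dc leakage current."""
    i_ac = 2.0 * i_leak_a
    return (2.0 * f_offset_hz * 10 ** (spur_limit_dbc / 20.0)
            / (i_ac * kvco_hz_per_v))

# Hypothetical case: 1-nA leakage, 10-MHz/V VCO gain, 100-kHz offset, -80 dBc.
z_max = max_filter_impedance_ohm(-80.0, 1e-9, 10e6, 100e3)
check_dbc = spur_level_dbc(2e-9, z_max, 10e6, 100e3)
```

The second function is simply the first one solved for the impedance, so feeding its result back in returns the spur limit that was asked for.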
Rearranging the above equation leads to a formula that relates the required filter attenuation at the spur offset frequency to the specified maximum level of spurious signals, to the dc leakage current, and to the VCO gain
(6)
Fig. 11. Bode plots of the adaptive loop during frequency jumps and in-lock.
Fig. 12.
creases the loop phase margin and increases the settling time in
high-bandwidth mode.
Therefore, to provide optimal settling, low-power dissipation,
and good spurious performance, one has not only to switch the
value of the loop bandwidth but also to bypass (some) RC sections of the loop filter. The PLL architecture presented here
complies with these requirements.
actions. Loop 1 and Loop 2 share the crystal oscillator, the reference divider, and the main divider.
A smooth takeover from Loop 1, after a frequency jump,
avoids secondary glitch effects. The high-current charge
pump CP2 is only active during tuning. CP2 is controlled by the
dead-zone (DZ) block. DZ generates a smooth transition into a
well-defined dead zone for CP2 when lock is achieved, so that
sudden disturbances of the VCO tuning voltage are avoided.
Additional freedom for optimization of the loop parameters
is obtained by using two separate charge-pump outputs and by
applying the charge-pump currents to different nodes of the loop
filter. In this way, the location of the zeros for frequency jumps and in-lock can be set in a continuous way, without switching of loop components, which is a source of secondary glitch problems. Furthermore, the path from Icpl to Vtune may contain additional filtering sections for, e.g., attenuation of spurious signals and/or fractional-N quantization noise [14]. These filter sections may be bypassed by Icph to increase the phase margin in high-bandwidth mode.
B. Loop-Filter Implementation
The ideas described above are demonstrated with the help
of Figs. 10 and 11. Fig. 10 presents the loop-filter configuration and component values used in the global tuner IC (Fig. 1).
Fig. 11 shows the optimized Bode diagrams of the adaptive PLL
(in FM mode) with the loop filter of Fig. 10.
During frequency jumps both CP1 and CP2 are active; the loop filter zero frequency is 1/(2πRbCa) and lies at a high frequency, matching the 0-dB open-loop frequency. It enables stability and fast tuning to be achieved. The nominal loop bandwidth in this mode is 3.2 kHz, and the phase margin is 50°. After the frequency jump only CP1 is active. The zero of the loop filter moves to a lower frequency, 1/[2π(Ra + Rb)Ca], without the switching of loop-filter components. The low-frequency zero increases the phase margin in-lock.
When the loop is in-lock, an extra pole is introduced at 1/(2πRcCc), which increases the 100-kHz reference suppression by about 20 dB. During frequency jumps, these elements are bypassed by CP2, increasing the phase margin in high-bandwidth mode. If the loop bandwidth were increased by simply switching the amplitude of CP1, one would end up with an unstable loop, because of a phase margin of less than 10° in high-bandwidth mode.
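The movement of the zero and the effect of the extra pole can be put into numbers with a short sketch. It takes each corner frequency as 1/(2πRC) and assumes that Ra and Rb act in series with Ca when only CP1 is active. The component values below are invented for illustration; the actual values of Fig. 10 are not reproduced in this text.

```python
import math

def corner_hz(r_ohm, c_farad):
    """Corner frequency of an RC time constant."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def single_pole_attenuation_db(f_hz, pole_hz):
    """Attenuation of a first-order low-pass pole at frequency f_hz."""
    return 10.0 * math.log10(1.0 + (f_hz / pole_hz) ** 2)

# Hypothetical component values (not those of Fig. 10).
RA, RB, CA = 68e3, 12e3, 10e-9
RC, CC = 10e3, 1.6e-9

zero_jump_hz = corner_hz(RB, CA)        # CP1 and CP2 active
zero_lock_hz = corner_hz(RA + RB, CA)   # only CP1 active
pole_lock_hz = corner_hz(RC, CC)        # extra in-lock pole
extra_suppression_db = single_pole_attenuation_db(100e3, pole_lock_hz)
```

With a pole placed about a decade below 100 kHz, a single RC section contributes roughly 20 dB at the reference frequency, which is consistent with the suppression figure quoted above.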
C. Dead-Zone Implementation
The new element in the adaptive PLL architecture is the combination of the DZ block with the high-current charge pump CP2. The function of DZ is to provide CP2 with a well-defined dead zone. The dead zone is centered symmetrically around the locking position of charge pump CP1 [see Fig. 13(a)].
The logic diagram of the DZ/CP2 combination is depicted in Fig. 12. The figure shows how the different logic functions influence the duty cycle of the up and dn signals from the phase frequency detector (PFD2). At the input of DZ, the up and dn signals have a finite duty cycle, even for an in-lock situation. The finite duty cycle eliminates dead-zone problems in CP1. The XOR and AND gates are used to cancel the finite in-lock duty cycle. The processed up and dn signals are then applied to low-pass filters and slicers, whose function is to prevent pulses that have too small a duty cycle from reaching CP2. The cutoff frequency of the low-pass filters, the discrimination level of the slicers, and the turn-on time of CP2 determine the size of the dead zone around the lock position.
Fig. 13.
A tradeoff among settling performance, circuit implementation, and robustness arises when the magnitude of the dead zone has to be determined. Let us start by discussing circuit aspects.
The dead zone of charge pump CP2 should be centered around the locking position of the loop for optimum settling and spectral purity performance. The locking position, however, is a function of the output voltage of charge pump CP1. The effect is depicted in Fig. 13. One sees that, as the tuning voltage Vtune increases, there is a shift of the locking position to positive values of the time difference. The reason lies in the finite output resistance of the active element used in CP1. Different current gains in CP1's UP and DOWN branches need to be compensated by up and dn signals with different duty cycles at the locking point. Different duty cycles are accomplished by a shift in the loop's locking position.
Fig. 13 shows situations where the gain in the UP branch of the pump decreases as Vtune increases. The ideal operating situation is depicted in Fig. 13(a). Situation (b) is still allowed from the point of view of spectral purity but has asymmetrical settling performance. Finally, (c) depicts a situation that should never happen: the locking position shifts so much that the high-current charge pump CP2 becomes active and degrades the in-lock spectral purity. Therefore, increasing the size of CP2's dead zone eases the design of charge pump CP1 and increases the robustness of the system.
On the other hand, the size of CP2's dead zone influences the settling performance of the adaptive loop. The influence of the dead zone on the transient response was simulated with behavioral models. The results are displayed in Fig. 14, together with the settling requirements that ensure inaudible background scanning functionality. Table II presents the settling time for different settling accuracies and different values of the dead zone. A dead-zone value of infinity corresponds to the situation where only CP1 is active
Fig. 14.
TABLE II
SIMULATED IN-LOCK SNR AND SETTLING TIME (ms) FOR A 20-MHz
FREQUENCY JUMP FOR DIFFERENT VALUES OF THE DEAD ZONE AND
DIFFERENT SETTLING ACCURACIES
Fig. 15.
V. CIRCUIT IMPLEMENTATION
A die micrograph of the total tuner IC is displayed in Fig. 15.
The adaptive PLL has been integrated with the other functional
blocks of Fig. 1 in a 5-GHz, 2-μm bipolar technology [15].
Fig. 16. Architecture of the main programmable divider.
A. Programmable Dividers
The architecture of the main divider is depicted in Fig. 16.
The high-frequency part of the programmable divider is based
on the programmable prescaler concept described in [12] and
consists of a chain of 2/3 divider cells. The modular architecture
enables easy optimization of power dissipation and robustness
for process variations. The division range of the basic prescaler
configuration is extended by the low-frequency programmable
counter. The logic functions of the PLL were implemented with
current routing logic techniques (CRL) [12], [16]. The low-frequency part of the main and reference dividers operate with low
current levels to limit total power dissipation. To decrease the
phase noise of the reference signal going to the phase detectors,
this signal is reclocked in a high-current D-flip-flop (D-FF). The
clean crystal signal is used to clock the D-FF. The total main divider current consumption is 5 mA. The first 2/3 cell consumes
2.1 mA.
Fig. 17.
Fig. 18.
B. Oscillators
The LC VCO uses an external tank circuit. It can be tuned from 150 to 250 MHz, with a voltage tuning range from 0.5 to 8 V. The VCO phase noise is -100 dBc/Hz at 10 kHz, for a carrier frequency of 237 MHz. The VCO core consumes 1.5 mA. The 20.5-MHz reference crystal oscillator operates in linear mode, to avoid harmonics interfering in the FM reception bands. Quadrature generation for the image rejection FM mixers (see Fig. 1) is accomplished in a divide-by-two circuit (FM DIV), with the exception of reception in the American Weather Band (WX). In that case, I/Q signals are generated with an RC-CR network directly from the VCO. This avoids the need to have the VCO operating at 346 MHz, and a change in the LC VCO tuned circuit during WX reception.
C. Charge Pumps
Fig. 17 shows the simplified circuit diagram of the low-current charge pump CP1. The up and dn signals from the phase detector drive the input differential pairs, which switch the currents in the PNP current switches Q1 and Q2 on and off. The collector outputs of Q1 and Q2 are kept at equal dc levels by the dc feedback arrangement provided by Q3 and Q4. This prevents asymmetry in the source and sink currents, ensuring good centering of the charge-pump characteristics for all tuning voltages. Q5 and
Fig. 19.
Fig. 20. Spectral purity measurements in FM mode: (a) reference spurious breakthrough and (b) close to the carrier.
> 300 V. Fin = 97:1 MHz, AF freq = 1 kHz. SNR meas.: FMdev = 22:5
VII. CONCLUSION
VI. MEASUREMENTS
The measured charge-pump currents as a function of the
time difference between the phase detector inputs are shown
in Fig. 18. Good centering of the two charge-pump outputs
is observed, and there is enough margin for variations in
the in-lock position of CP1. The measured settling transient
response is displayed in Fig. 19. The settling performance
complies to the settling requirements and enables inaudible
background scanning in single-tuner RDS applications.
The frequency spectrum of the VCO in FM mode is presented in Fig. 20(a) and (b). Fig. 20(a) shows the spurious reference breakthrough at 100 kHz to be below −81 dBc. A further 6-dB improvement in noise and spurious breakthrough is obtained before the VCO signal reaches the FM mixers, due to the division by two in the FM DIV divider (see Fig. 1). Fig. 20(b) displays the phase noise spectrum close to the carrier. Spectrum measurements done in AM mode showed a reference spurious breakthrough of −57 dBc at an offset of 20 kHz from the carrier. For AM, the improvement in phase noise and spurious performance amounts to 26 dB, due to the division by 20 between the VCO and the AM mixers.
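The 6-dB and 26-dB figures quoted above are consistent with the usual rule that dividing a frequency by N scales phase deviations by 1/N, that is, an improvement of 20·log10(N) dB. A minimal sketch of that arithmetic (the function name is illustrative and not taken from the paper):

```python
import math

def divider_improvement_db(n: int) -> float:
    """Phase-noise / spur improvement in dB from dividing a frequency by n."""
    return 20.0 * math.log10(n)

fm_gain_db = divider_improvement_db(2)    # FM path: division by two
am_gain_db = divider_improvement_db(20)   # AM path: division by 20
print(f"FM: {fm_gain_db:.2f} dB, AM: {am_gain_db:.2f} dB")
```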
Finally, the SNR and THD of the total FM receiver chain are displayed in Fig. 21 as a function of the antenna input signal level. For low input levels, the noise is dominated by RF input noise and by the quality of the building blocks in the signal processing chain: low-noise amplifier, mixers, and demodulator. For high input levels (> 300 μV), the dominant noise source becomes the LO signal. The excellent measured FM sensitivity, 2.0 μV for 26-dB SNR, and the ultimate SNR of 65 dB verify the spectral purity of the tuning system and of the RF channel.
502
REFERENCES
[1] B. Razavi, A 900 MHz/1.8 GHz CMOS transmitter for dual-band applications, IEEE J. Solid-State Circuits, vol. 34, pp. 573579, May 1999.
[2] K. Kianush and C. S. Vaucher, A global car radio IC with inaudible
signal quality checks, in IEEE Int. Solid-State Circuits Conf. Dig. Tech.
Papers, 1998, pp. 130131.
[3] W. P. Robins, Phase Noise in Signal Sources, 2nd ed, ser. 9. London,
U.K.: Inst. Elect. Eng., 1996.
[4] H. Adachi, H. Kosugi, T. Awano, and K. Nakabe, High-speed frephase-locked loop,
quency-switching synthesizer using fractional
IEICE Trans. Electron., pt. 2, vol. 77, no. 4, pp. 2028, 1994.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New
York: Wiley, 1997.
[6] F. M. Gardner, Charge-pump phase-lock loops, IEEE Trans.
Commun., vol. 28, no. 11, pp. 18491858, Nov. 1980.
[7] H. Meyr and G. Ascheid, Synchronization in Digital Communications. New York: Wiley, 1990.
[8] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[9] B. Razavi, Ed., Monolithic Phase-Locked Loops and Clock Recovery
Circuits. New York: IEEE Press, 1996.
[10] V. F. Kroupa, Noise properties of PLL systems, IEEE Trans.
Commun., vol. C-30, pp. 22442552, Oct. 1982.
[11] C. S. Vaucher, Synthesizer architectures, in Analog Circuit Design, R.
J. van de Plassche, Ed. Norwell, MA: Kluwer, 1997.
Cicero S. Vaucher (M'98) was born in São Francisco de Assis, Brazil, in 1968. He graduated in electrical engineering from the Universidade Federal do
Rio Grande do Sul, Porto Alegre, Brazil, in 1989.
He joined the Integrated Transceivers group
of Philips Research Laboratories, Eindhoven,
The Netherlands, in 1990, where he works on
implementations of low-power building blocks for
frequency synthesizers, on synthesizer architectures
for low-noise/high-tuning-speed applications, and
on CAD modeling of PLL synthesizers.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 10, OCTOBER 2000
I. INTRODUCTION
(1)
The conventional PD is implemented in conjunction with a charge-pump loop filter in the PLL, as illustrated in Fig. 3. To determine the transfer function of the PD, assume there is a time interval $\tau$ between the two input signals of the PD. The output current of the charge-pump circuit is then a pulse of duration $\tau$, and the amplitude of the charge-pump current is $I_p$. In the continuous-time approximation, the average value per input signal period can be given as
(2)
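The body of (2) is not legible in this copy. A hedged reconstruction, assuming the standard charge-pump result (cf. [4]) and using assumed symbol names ($\tau$ for the pulse duration, $I_p$ for the pump-current amplitude, $T$ for the input period, $\Delta\phi$ for the equivalent phase error), would read:

```latex
\bar{i}_d \;=\; I_p\,\frac{\tau}{T} \;=\; \frac{I_p}{2\pi}\,\Delta\phi ,
\qquad \Delta\phi = 2\pi\,\frac{\tau}{T}
```

Under this assumption the PD gain is $I_p/2\pi$ amperes per radian, which is the slope of the linear characteristic in Fig. 4(a).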
The transfer function curve of a linear PD is shown in Fig. 4(a), where the vertical axis represents the charge injected into the loop filter during one period of the input signal. The characteristic of a nonlinear PD, as shown in Fig. 4(b), can be divided into two regions [7]. Within the lock-in region it has the same characteristic as the linear PD, but the steeper characteristic outside the lock-in region reduces the acquisition time. When designing a PLL with the nonlinear PD, the central slope is first chosen to meet the noise and modulation requirements of the PLL, as with a standard PD. Then, the slope outside the lock-in region is gradually increased to improve acquisition speed. The proposed nonlinear PD can be built with delay cells and standard PD circuits, as shown in Fig. 4(c). The standard PD is a digital circuit, triggered by the positive edges of the input reference signal and the output feedback signal. Taking the delays of the delay cells into account, the PDs decide in which of these regions the phase difference lies. According to the value of the time difference, the charge pump outputs the corresponding current, controlled by the up signals or the down signals. The behavior model of the nonlinear PD can be explained by the waveforms of Fig. 4(d). According to the time difference between both input signals, the up signals are used to increase and the down signals are used to decrease the frequency of the feedback signal. Like the conventional PD, the nonlinear PD always generates the right signal to equalize the frequencies of both input signals. The time interval is positive
Fig. 1.
Fig. 2.
Fig. 3. Phase detector with charge-pump filter.
(3)
where $I_p$ is the pump current of the charge-pump circuit. The impedance $Z(s)$ is the series connection of a capacitor and a resistor, with a capacitor added in parallel, as shown in Fig. 2. The impedance of this filter can be expressed as
(4)
The open-loop gain of the PLL equals
(5)
(6)
The open-loop gain of this third-order PLL can be calculated in terms of the frequency $\omega$, as follows:
(7)
and its phase margin can be determined in terms of
(8)
In order to maintain the same loop gain and phase margin for
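The loop-gain and phase-margin expressions for the third-order loop are not legible in this copy. The sketch below assumes the textbook model of a charge-pump PLL whose filter is a resistor R in series with C1, with C2 in parallel: G(s) = (I_p/2π)·Z(s)·K_vco/(N·s). All component values, the divider ratio, and the function names are hypothetical and chosen only for illustration.

```python
import math

def time_constants(r, c1, c2):
    """Zero and pole time constants of (R in series with C1) in parallel with C2."""
    tau_z = r * c1
    tau_p = r * c1 * c2 / (c1 + c2)
    return tau_z, tau_p

def open_loop_gain(omega, i_p, k_vco, n_div, r, c1, c2):
    """|G(j*omega)| for the assumed G(s) = (I_p / 2*pi) * Z(s) * K_vco / (N * s)."""
    tau_z, tau_p = time_constants(r, c1, c2)
    z_mag = math.hypot(1.0, omega * tau_z) / (
        omega * (c1 + c2) * math.hypot(1.0, omega * tau_p))
    return i_p / (2 * math.pi) * z_mag * k_vco / (n_div * omega)

def phase_margin_deg(omega, r, c1, c2):
    """Phase margin if the loop crosses unity gain at omega (rad/s)."""
    tau_z, tau_p = time_constants(r, c1, c2)
    return math.degrees(math.atan(omega * tau_z) - math.atan(omega * tau_p))

# Hypothetical values, not taken from the paper.
R, C1, C2 = 10e3, 1e-9, 100e-12
K_VCO = 2 * math.pi * 20e6      # VCO gain in rad/s per volt
N_DIV = 448                     # feedback division ratio

tau_z, tau_p = time_constants(R, C1, C2)
omega_c = 1.0 / math.sqrt(tau_z * tau_p)   # crossover that maximizes phase margin
pm = phase_margin_deg(omega_c, R, C1, C2)
# The gain is linear in I_p, so the pump current for unity gain at omega_c is:
i_p = 1.0 / open_loop_gain(omega_c, 1.0, K_VCO, N_DIV, R, C1, C2)
print(f"crossover = {omega_c:.3e} rad/s, phase margin = {pm:.1f} deg, I_p = {i_p * 1e6:.0f} uA")
```

In this model the phase margin depends only on the ratio of the two time constants, while the pump current sets where the gain crosses unity. Holding both fixed is one way to read the requirement of keeping the same loop gain and phase margin.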
Fig. 4. (a) Characteristic of the conventional linear phase detector. (b) Characteristic of the nonlinear phase detector. (c) Block diagram of the nonlinear phase
detector. (d) Operation of the nonlinear phase detector.
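The difference between the characteristics of Fig. 4(a) and Fig. 4(b) can be sketched as a piecewise-linear function: the central slope is kept inside the lock-in region and a steeper slope is used outside it. This is an idealized, continuous model under assumed parameter names (`lock_in`, `boost`); the actual circuit selects between discrete charge-pump currents using delay cells.

```python
import math

def linear_pd(dphi: float, gain: float = 1.0) -> float:
    """Conventional PD: output proportional to the phase (time) difference."""
    return gain * dphi

def nonlinear_pd(dphi: float, gain: float = 1.0,
                 lock_in: float = 0.5, boost: float = 4.0) -> float:
    """Same slope as the linear PD for |dphi| <= lock_in, boost*gain outside."""
    mag = abs(dphi)
    if mag <= lock_in:
        return gain * dphi
    # Continue from the boundary value with the steeper outer slope.
    out = gain * lock_in + boost * gain * (mag - lock_in)
    return math.copysign(out, dphi)

for dphi in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(dphi, linear_pd(dphi), nonlinear_pd(dphi))
```

Inside the lock-in region the two functions coincide, so the small-signal loop dynamics are unchanged. Only large errors see the larger correction.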
Fig. 5.
Fig. 6.
Fig. 8. Implementation of phase-frequency detector. (a) Half-transparent cell. (b) Phase-frequency detector (PFD). (c) Timing diagram of PFD.
described later, the two outputs of the PD connected to the charge pump should be interchanged. The charge pump adopted here is based on the one described in [11]. It suppresses charge sharing from the parasitic capacitance by means of a pair of switched-current sources.
C. Dual-Modulus Prescaler
The dual-modulus prescaler is the high-frequency building
block in the frequency synthesizer. This circuit, shown in Fig. 9,
divides the frequency of the VCO output signal by a factor of
32 or 33, depending on the logic value of the control signal
Fig. 9.
Fig. 10.
Fig. 11. VCO schematic.
Fig. 12. IC microphotograph.
Fig. 13.
Fig. 14. Measured output spectrum of the frequency synthesizer. (a) With span 50 MHz. (b) With span 1 MHz.
Fig. 15. Measured output spectrum of the frequency synthesizer with added frequency doubler.
Fig. 16.
Fig. 17.
V. CONCLUSION
In this paper, a PLL with the DAPD is implemented in a 0.35-μm CMOS process. The proposed DAPD can be applied to enhance the switching speed of the PLL while maintaining a better noise bandwidth. When the DAPD is added to the PLL, it controls the charge pump and loop filter and still maintains loop stability, with the same phase margin as in the steady state. The prototype frequency synthesizer using this structure is implemented at 448 MHz, and the output phase noise is −99 dBc/Hz at 100-kHz offset. By adding a frequency doubler, the synthesizer can operate at 896 MHz, and the output phase noise is −91 dBc/Hz at 100-kHz offset from the carrier.
ACKNOWLEDGMENT
The authors would like to thank the SHARP Technology
Company, Japan, for the fabrication of the chip.
REFERENCES
[1] F. M. Gardner, Phaselock Techniques, 2nd ed. New York, NY: Wiley, 1979.
[2] P. Larsson, "Reduced pull-in time of phase-locked loops using a simple nonlinear phase detector," IEE Proc. Commun., vol. 142, no. 4, pp. 221–226, Aug. 1995.
[3] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991.
[4] F. M. Gardner, "Charge-pump phase-locked loops," IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[5] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York, NY: McGraw-Hill, 1984.
[6] M. V. Paemel, "Analysis of a charge-pump PLL: A new model," IEEE Trans. Commun., vol. 42, pp. 2490–2498, July 1994.
[7] C. Y. Yang, W. C. Chung, and S. I. Liu, "Effectively reduced pull-in time of PLL with nonlinear phase comparator," in 8th VLSI/CAD Symp., Taiwan, R.O.C., Aug. 1997, pp. 205–208.
[8] D. Byrd, C. Davis, and W. O. Keese, "A fast locking scheme for PLL frequency synthesizer," National Semiconductor, Santa Clara, CA, Application Note, July 1995.
[9] J. Yuan and C. Svensson, "Fast CMOS nonbinary divider and counter," Electron. Lett., vol. 29, pp. 1222–1223, June 1993.
[10] S. Kim, K. Lee, Y. Moon, D. K. Jeong, Y. Choi, and H. K. Kim, "A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL," IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[11] I. A. Young, J. K. Greason, and K. L. Wong, "A PLL clock generator with 5- to 110-MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998
Abstract: A phase-locked-loop (PLL)-based frequency synthesizer incorporating a phase detector that operates on a windowing technique eliminates the need for a frequency divider. This new loop architecture is applied to generate the 1.573-GHz local oscillator (LO) for a Global Positioning System receiver. The LO circuits in the locked mode consume only 36 mW of the total 115-mW receiver power, as a result of the power saved by eliminating the divider. The PLL's loop bandwidth is measured to be 6 MHz, with a reference spurious level of −47 dBc. The front-end receiver, including the synthesizer, is fabricated in a 0.5-μm, triple-metal, single-poly CMOS process and operates on a 2.5-V supply.
Index Terms: Frequency synthesizers, Global Positioning System, phase detection, phase-locked loops, radio-frequency integrated circuits, radio receivers.
Fig. 1. GPS receiver architecture.
I. INTRODUCTION
The growing demand for portable, low-cost wireless-communication devices has spurred interest in radio-frequency integrated circuits. Part of offering a completely integrated solution involves identifying a low-power, monolithic gigahertz local oscillator (LO) implementation. A quartz-crystal-based oscillator cannot be used directly for the LO, since the fundamental modes of inexpensive quartz crystals are limited to approximately 30 MHz [1], and overtone orders of 50 are impractical. However, a crystal oscillator can be used as the reference in a static-modulus phase-locked-loop (PLL) frequency synthesizer. As is well known, the stability of the frequency-multiplied reference is retained by a wideband loop. This ability to synthesize a stable high-frequency source is beneficial, but it comes at the expense of significant power consumption. This paper addresses the power issue by introducing a new type of phase detector capable of phaselocking the synthesizer's frequency-multiplied output to its reference input, without the use of a divider. Eliminating the need for the divider allows the synthesis of a 1.573-GHz output on only 36 mW of power in this technology.
Section II examines the PLL-based LO used for the Global
Positioning System (GPS) receiver architecture shown in
Fig. 1 [2] and introduces the element that eliminates the need
Manuscript received May 7, 1998; revised August 4, 1998.
The authors are with the Center for Integrated Systems, Stanford University,
Stanford, CA 94305 USA.
Publisher Item Identifier S 0018-9200(98)09432-3.
Fig. 2. Integer-
Fig. 3. Phaselock techniques. (a) Phaselocked signals. (b) Phaselock with a divider and PFD. (c) PFD alone; negative charge pump current commands the VCO to decrease its frequency, breaking phaselock. (d) Phaselock with an APD.
GHz integer-N synthesizer built in a 0.6-μm CMOS technology reported a total power consumption of 90 mW, of which 22.5 mW were used by the divider [4]. A further disadvantage of the divider is the on-chip interference generated by its high-speed digital transitions. This is particularly worrisome if the synthesizer is to be integrated with the front end's sensitive low-noise amplifier.
To reduce power consumption and high-frequency noise, a windowing technique that eliminates the divide-by-N block for phase comparisons is investigated here. To appreciate how windowing may be of benefit, it is worthwhile to revisit the phenomenon of locking in a conventional PLL. To retain phaselock, it is necessary to align every Nth rising or falling edge of the voltage-controlled oscillator (VCO) with a corresponding reference edge. Phaselock is demonstrated in Fig. 3(a) for N = 4, where every fourth rising VCO edge lines up with a rising reference edge. A divider with a phase-frequency detector (PFD) accomplishes edge alignment by first dividing down the VCO by the right multiple so that edge
B. Loop Theory
Having provided an overview of APD operation, we now develop a linearized APD PLL model relating input and output phase. This model is important for quantitative loop design and ensures that the synthesized output has the desired stability and noise performance.
From the description of the late and early APD signals given in the previous subsection, the average charge pump current over one reference cycle is given by
(1)
where $I_{CP}$ is the magnitude of the charge pump current, $t_{VCO}$ is the time of the first VCO rising edge in the window, $t_{ref}$ is the time of the reference rising edge in the window, and $\omega_{ref}$ is the angular reference frequency. The current can be expressed as a function of the reference and VCO phases by relating these phases to $t_{ref}$ and $t_{VCO}$, assuming small phase errors. Expressions relating edge time to signal phase are
(2)
(4)
giving
(5)
where $K_{PD}$ is the phase-detector gain constant. Note that even though there is no explicit divider in the loop, the VCO phase is divided by $N$ in (5), just as in a conventional loop.
This model can be used in place of the APD in Fig. 4, and the other blocks in the same figure can be replaced by their corresponding linear time-invariant (LTI) models, yielding the overall system model shown in Fig. 7. Fig. 7 is an LTI representation of the APD PLL in lock, from which the phase transfer function is readily found to be
(6)
where $K_{VCO}$ is the VCO gain constant and $Z(s)$ is the loop filter's impedance, expressed in the $s$-domain.
C. APD Characteristic
Fig. 9. APD characteristic over a $2\pi/N$ interval.
Fig. 11. $M = 2$, $N = 7$ subharmonic-lock mode.
and
(9)
(10)
From this information, the portion of the APD characteristic
shown in Fig. 9 can be constructed.
The influence of two parameters warrants special attention: the delay between the time the window opens and when the reference edge occurs, and $N$, the ratio between the VCO and reference frequencies. Decreasing this delay shifts the characteristic diagonally up (along the line of the characteristic), and increasing it shifts the characteristic diagonally down. It is desirable to have the delay equal to half the synthesized frequency's period. By designing for this condition, the APD characteristic will be centered to provide a symmetrical correction range. The parameter $N$ affects the
D. Subharmonic-Lock Modes
The existence of subharmonic-lock modes explains the need
for an acquisition aid. During each window, which opens
periodically at the reference rate, the APD makes a single
phase comparison. It is this property that allows an APD to
phaselock the VCO's output to an integer multiple of the reference input. But the ability to examine the phase of two signals at different frequencies introduces more modes than just the desired integer-lock modes. Additional subharmonic-lock modes occur if the net current delivered over multiple cycles of the reference is zero, allowing the loop to stay locked at an undesired frequency [5].
If we designate by $M$ the number of reference cycles over which the net charge delivered to the loop filter is zero, then an expression relating the reference frequency to the VCO frequency when phaselock occurs is $f_{VCO} = (N/M) f_{ref}$. Fig. 11 displays the points on the APD characteristic between which the loop ping-pongs for the specific case where $M = 2$ and $N = 7$. Because $M = 2$, the charge pump alternates between pumping up on one cycle and pumping down on the next cycle, balancing the charge to the loop filter over two cycles.
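To make the subharmonic-lock condition concrete, the sketch below assumes the relation f_VCO = (N/M)·f_ref, with M the number of reference cycles over which the net charge is zero. The symbol names and the 100-MHz reference are assumptions for illustration only; the equation itself is not legible in this copy.

```python
from fractions import Fraction

def lock_frequency(f_ref_hz: int, n: int, m: int = 1) -> Fraction:
    """VCO frequency for an (n, m) lock mode, assuming f_vco = (n / m) * f_ref."""
    return Fraction(n, m) * f_ref_hz

def is_subharmonic(n: int, m: int) -> bool:
    """True when n/m is not an integer, i.e. not a desired integer-lock mode."""
    return Fraction(n, m).denominator != 1

F_REF = 100_000_000                      # hypothetical reference frequency in Hz
desired = lock_frequency(F_REF, 4)       # integer lock, M = 1
unwanted = lock_frequency(F_REF, 7, 2)   # the M = 2, N = 7 case
print(int(desired), int(unwanted), is_subharmonic(7, 2))
```

With M = 2 the loop settles halfway between two integer multiples of the reference, which is why a frequency acquisition aid is needed to steer the VCO to the intended integer mode.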
Fig. 12.
(13)
Fig. 13.
Fig. 14.
Fig. 15.
Fig. 17.
Fig. 18.
The PLL has a wide bandwidth of 6 MHz, and the APD circuit
consumes only one-quarter of the total synthesizer power. With
the elimination of the divider, the main power consumer in
the synthesizer is now the VCO.
IV. CONCLUSION
A new method for performing phase detection that eliminates the divide-by-N function within a PLL has been presented. A frequency acquisition aid circuit, which can be powered down once lock is established, is required. By using an aperture phase detector, a 1.573-GHz local oscillator can be synthesized on roughly half the power of a loop containing a conventional divider. Additionally, elimination of the divider reduces the frequency of transitions that might cause substrate and supply bounce. The power savings and noise reduction make the APD PLL an attractive design for low-power, integrated frequency synthesizers.
ACKNOWLEDGMENT
The authors gratefully acknowledge Rockwell International
for fabricating the receiver and Dr. C. Hull and Dr. P. Singh
for their valuable assistance. In addition, the authors acknowledge Tektronix, Inc., for supplying simulation tools and E.
McReynolds for his invaluable support of, and assistance with,
CMOS modeling issues. Last, the authors thank IBM for
generous student support through IBM fellowships.
REFERENCES
[1] M-tron Engineering Notes, Dec. 1997.
[2] D. K. Shaeffer, A. R. Shahani, S. S. Mohan, H. Samavati, H. R. Rategh, M. Hershenson, M. Xu, C. P. Yue, D. J. Eddleman, and T. H. Lee, "A 115-mW, 0.5-μm CMOS GPS receiver with wide dynamic-range active filters," IEEE J. Solid-State Circuits, vol. 33, pp. 2219–2231, Dec. 1998.
[3] T. H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits. Cambridge: Cambridge Univ. Press, 1998.
[4] J. F. Parker and D. Ray, "A 1.6 GHz CMOS PLL with on-chip loop filter," IEEE J. Solid-State Circuits, vol. 33, pp. 337–343, Mar. 1998.
[5] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.
Min Xu (S'97), for a photograph and biography, see this issue, p. 2231.
C. Patrick Yue (S'93), for a photograph and biography, see this issue, p. 2231.
Daniel J. Eddleman (S'98), for a photograph and biography, see this issue, p. 2231.
Arvin R. Shahani, for a photograph and biography, see this issue, p. 2041.
Derek K. Shaeffer (S'98), for a photograph and biography, see this issue, p. 2230.
S. S. Mohan (S'98), for a photograph and biography, see this issue, p. 2231.
Hirad Samavati (S'98), for a photograph and biography, see this issue, p. 2041.
Hamid R. Rategh (S'98), for a photograph and biography, see this issue, p. 2231.
Maria del Mar Hershenson (S'98), for a photograph and biography, see this issue, p. 2231.
Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue, p. 2041.
I. INTRODUCTION
Fig. 1.
The frequency resolution of a conventional integer-N frequency synthesizer is the same as the reference frequency of the synthesizer PLL. Therefore, narrow channel spacing is accompanied by a small loop bandwidth, which leads to slow dynamics [10]. In the case of a fractional-N frequency synthesizer, the output frequency is a fractional multiple of the reference frequency, so that narrow channel spacing is achieved along with a higher phase detector frequency. Consequently, the loop bandwidth can be widened, and a faster settling time and lower close-in phase noise of the frequency synthesizer are achieved [10].
The noninteger dividing values in a fractional-N synthesizer can be realized by the periodic dithering of the dividing ratio between integer values [11]. However, the dithering leads to a periodic phase error and introduces spurious tones in the output spectrum. To resolve the spurious noise problem, several methods, such as phase interpolation using a digital-to-analog (D/A) converter [12] or altering the dithering pattern using a ΔΣ modulator [10], [11], have been proposed and used, but they still have limitations such as increased power dissipation and spurious noise.
On the other hand, if multiphase clock signals are available, the noninteger divider can be implemented directly, without dithering or interpolation. Fig. 3 shows the operation of the fractional divider using the proposed edge-combining technique. The output of an integer frequency divider is sequentially latched by the eight-phase voltage-controlled oscillator (VCO) output signals, as shown in Fig. 3(a). As a result, a set of phase-shifted waveforms is obtained, and the amount of the phase shift is 1/8 of the VCO period. By manipulating the waveform set, a waveform whose period is a fractional multiple of the VCO period is generated. For example, if the desired fractional value is 1/8, the pulse-switching block multiplexes the delayed waveforms with the following periodic sequence:
Fig. 3. (a) Block diagram of the fractional divider. (b) Operation of the
fractional divider when = 1.
Fig. 4. Edges of the multiphase signals. (a) With delay mismatches. (b)
Without delay mismatches.
from
to
when the output pulse is multiplexed from
out8 to out1.
When the PLL is locked, the output frequency of the PLL is a multiple of the reference frequency whose fractional part is set in steps of 1/8, and the synthesizer operates as a modulo-8 fractional-N frequency synthesizer. The fractional value determines the switching sequence of the pulse-switching block. The switching sequence and division ratio are controlled by separate control logic.
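The edge-combining idea can be sketched in a few lines. The sketch assumes eight phases named out1 to out8 (as in the text), a fractional setting k/8, and an output frequency of (N + k/8)·f_ref. The symbols k and N and the function names are illustrative assumptions, since the original expression is not legible in this copy.

```python
def phase_sequence(k: int, cycles: int, phases: int = 8) -> list[int]:
    """Index (1-based) of the VCO phase selected on each divider output cycle.

    Each cycle advances the selection by k phase steps and wraps modulo `phases`.
    """
    return [(i * k) % phases + 1 for i in range(cycles)]

def output_frequency(f_ref: float, n: int, k: int, phases: int = 8) -> float:
    """Locked output frequency, assuming f_out = (n + k / phases) * f_ref."""
    return (n + k / phases) * f_ref

print(phase_sequence(1, 9))            # fractional value 1/8: out1, out2, ..., out8, out1
print(phase_sequence(3, 4))            # fractional value 3/8
print(output_frequency(1.0, 10, 1))    # division ratio of 10 + 1/8
```

Each wrap from out8 back to out1 corresponds to one full extra VCO period having been accumulated, which is where the integer division ratio has to be adjusted.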
III. DELAY MISMATCH PROBLEM AND FRACTIONAL SPURS
Although multiphase clocks from a ring oscillator can be used to implement a fractional-N frequency synthesizer, an important problem still exists: how to deal with the delay mismatches between the delay cells in the VCO. Ideally, the phase differences among the multiphase signals from a ring oscillator are precisely equal, as shown in Fig. 4(a). However, in practice, the edges of the multiphase clocks are not uniformly spaced, as shown in Fig. 4(b). The delay mismatches arise from several causes, such as device parameter mismatch, device size mismatch, and so on.
Fig. 5.
Since each edge of the fractional-N divider output is periodically synchronized with one of the multiphase signals from the VCO, the timing information on the delay mismatches is contained in the divider output. Therefore, when the PLL is locked, the delay mismatches introduce periodic phase errors at the input of the phase-frequency detector (PFD). Due to the periodic phase errors, fractional spurs appear in the output spectrum of the synthesizer. Also, if I and Q signals are tapped from the VCO, an I/Q phase offset will appear. Therefore, the delay mismatches must be eliminated to realize a low-phase-noise frequency synthesizer.
The relationship between the phase errors and the fractional spurs is derived from the noise transfer function of the PLL. Fig. 5 shows the s-domain model of the PLL. If only the effect of the divider noise is considered, the output phase noise becomes
(1)
Fig. 6. Simulated output spectrum of the PLL with the maximum phase offset of 2.5°.
TABLE I
PARAMETERS FOR SPURIOUS NOISE SIMULATION OF THE PLL
where
PFD gain;
loop filter transfer function;
VCO gain;
dividing ratio;
output phase noise;
phase noise generated from the fractional divider.
For the edge-combining frequency synthesizer, the phase noise generated from the fractional divider is mainly caused by the delay mismatches between the delay cells, and is periodic because the fractional-N dividing is performed by periodic combining of the phase-shifted waveforms. Assuming that this phase noise is sinusoidal, the output voltage of the PLL is presented as
(2)
(3)
Fig. 7 shows the input waveforms of the PFD when the PLL is locked and the fractional dividing ratio is set to 1/8. $\phi_i$ is the relative phase error corresponding to the $i$th output signal of
Fig. 7. Periodic phase error at the PFD input when PLL is locked without calibration.
(4)
If the calibration circuit shifts the rising phase of the first output by a small correction, the phase errors are temporarily changed to
(5)
(7)
If this operation is performed on each output of the VCO one by one, the phase errors are changed as shown at the bottom of the page, and this is the completion of one iteration of the calibration procedure. The resulting phase errors after the first iteration are given by
(8)
where $\phi_i^{(k)}$ is the phase error due to the $i$th output signal after $k$ cycles of calibration, and $\Delta_i^{(k)}$ is the amount of the calibration for the $i$th VCO output in the $k$th iteration.
If the above iteration is performed continuously until the stopping condition is satisfied for all delay cells, the final value of the phase error due to the 1st VCO output becomes
(9)
Similarly
(10)
Consequently, all the phase errors become zero once the calibration is complete. Note that the calibration algorithm performs correctly even if the amounts of the individual phase corrections differ and even if the order of the calibration is changed. Fig. 8 shows the trend of the phase error during the calibration, simulated in MATLAB. Here, mismatches of at most 10 mV are assumed.
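The convergence claim can be checked with a small behavioral model. This is a sketch under stated assumptions, not the authors' MATLAB simulation. It assumes that the locked PLL aligns to the mean of the eight edge positions, so each measured error is the deviation from that mean, and that each calibration step moves one output by a fraction `gain` of its measured error. The initial mismatch values are hypothetical.

```python
def calibrate(offsets, gain=0.5, iterations=30):
    """Sequentially trim each phase toward the mean; return max |error| per iteration."""
    d = list(offsets)
    history = []
    for _ in range(iterations):
        for i in range(len(d)):
            mean = sum(d) / len(d)          # reference the locked loop settles to
            d[i] -= gain * (d[i] - mean)    # correct only output i in this step
        mean = sum(d) / len(d)
        history.append(max(abs(x - mean) for x in d))
    return history

# Hypothetical initial phase offsets of the eight outputs, in degrees.
initial = [0.9, -1.4, 2.5, -0.3, 1.1, -2.0, 0.4, -1.2]
history = calibrate(initial)
print(f"after 1 iteration: {history[0]:.3f} deg, after 30: {history[-1]:.2e} deg")
```

Changing `gain` or the order in which the outputs are visited changes the convergence speed but not the end result in this model, which is in line with the remark that the algorithm tolerates unequal correction amounts and a different calibration order.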
V. CIRCUIT IMPLEMENTATION
A. Overall Structure
As shown in Fig. 2, a loop for the calibration is combined with the main fractional-N synthesizer. The calibration loop periodically measures the phase error due to delay mismatch at the PFD, and compensates for the mismatches by updating the offset control voltage of the delay cells one after another. This update operation must be performed only when the PLL is locked,
initially:
1st step:
2nd step:
...
8th step:
Fig. 10.
Fig. 11.
Fig. 12. Output spectrum of the synthesizer PLL. (a) Without calibration. (b) With calibration.
VII. CONCLUSION
A self-calibrated 1.8-GHz PLL for fractional-N frequency synthesis is fabricated in a 0.35-μm CMOS process. A ring-type oscillator is used to generate the multiphase signals, and a self-calibration loop reduces the output fractional spurs caused by delay mismatches between the delay cells. The phase offset of the I/Q signals from the ring oscillator is also reduced. With this calibration scheme, the fractional spur of the PLL is attenuated by 25 dB, and the maximum phase offset is thereby reduced to less than 0.2°.
REFERENCES
Fig. 13.
TABLE II
PERFORMANCE SUMMARY OF THE SELF-CALIBRATED PLL
Chan-Hong Park (S'92) received the B.S. and M.S. degrees in electrical engineering from the Korea Advanced
Institute of Science and Technology (KAIST),
Taejon, Korea, in 1994 and 1996, respectively. He
is currently working toward the Ph.D. degree in
electrical engineering at KAIST.
From 1994, he has been with the Department
of Electrical Engineering, KAIST, as a Graduate Researcher, where he has been involved in
designing 100Base-T transceiver ICs, low-noise
phase-locked loops, and RF front-ends for wireless
communications. His research interests include CMOS RF circuits for
wireless communication, high-frequency analog IC design, and mixed-mode
signal-processing IC design.
Ook Kim (M'86) received the M.S. and Ph.D. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1988 and 1994, respectively.
He was with the Electronics and Telecommunications Research Institute, Taejon, Korea, from 1994 to
1998, and with SK Telecom, Seoul, Korea, from 1998
to 1999. Since 1999, he has been with Silicon Image
Inc., Sunnyvale, CA. He was a Visiting Researcher
at the Department of Electrical and Electronic Engineering, Adelaide University, Adelaide, Australia,
during 1992, and a Visiting Scholar at the Department of Electrical Engineering,
Stanford University, Stanford, CA, during 1999. His research interests are in
CMOS mixed mode circuit design, high-speed data conversion, wireless circuit
technology, and high-speed data communication.
Transceiver architecture.
I. INTRODUCTION
proper choice of device dimensions and bias current, a differential swing of 0.5 V can be achieved at this port. Note that if a
frequency doubler were used, the output would be single-ended
and difficult to convert to differential form at such a high frequency.
The tuning of the oscillator poses several difficulties: the varactor diode must exhibit a small series resistance and remain
reverse-biased even with large swings in the oscillator, and the
varactor capacitance must be large enough to yield the required
tuning range, but at the cost of increasing the power dissipation
or the phase noise. This design incorporates a p+-n diode inside an n-well, strapped with metal to reduce the n-well series resistance [4]. Such a structure suffers from a large parasitic n-well/substrate capacitance, making it desirable to connect the anode of the diode to the oscillator. This is accomplished as illustrated in Fig. 4(b), where only one of the two oscillators is shown for clarity. Here, the control voltage varies the dc potential at two nodes by varying the on-resistance of a transistor. However, the sharp variation of this on-resistance leads to a significant change in the gain of the VCO. To make the transition smoother, another transistor, in series with a resistor, is added as shown in Fig. 4(c). A clamp transistor keeps the tail current source in saturation. Otherwise, the oscillator may turn off during synthesizer loop transients.
Since the minimum voltage at the sensed node is only a few hundred millivolts above ground, an NMOS differential pair cannot directly sense the 5.2-GHz signal at this node. Instead, a common-gate stage is used [Fig. 4(d)]. But if its bias is constant, then the stage turns off over part of the tuning range. Modifying the circuit as shown in Fig. 4(e) ensures that the common-gate stage carries a constant bias current across the full tuning range.
The choice of the inductors and capacitance of the varactors
entails a compromise between the phase noise and the tuning
range. In this design, 7-nH inductors are used, each contributing
a parasitic capacitance of 120 fF. The cross-coupled transistors
are relatively wide to ensure startup, yielding approximately
175 fF of gate-source capacitance. The differential pairs coupling the oscillators also load the tank. As a result, the varactor
capacitance for 2.6-GHz operation must not exceed 160 fF.
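The capacitance budget above can be cross-checked with the ideal LC resonance formula f = 1/(2π√(LC)). The sketch treats each 7-nH inductor as resonating with the lumped capacitance at its node, which is a simplifying assumption since the design itself relies on a distributed inductor model. The variable names are illustrative.

```python
import math

def tank_capacitance(f_hz: float, l_henry: float) -> float:
    """Total capacitance that resonates with l_henry at f_hz (ideal LC tank)."""
    return 1.0 / ((2 * math.pi * f_hz) ** 2 * l_henry)

c_total = tank_capacitance(2.6e9, 7e-9)

# Contributions quoted in the text, in farads.
c_inductor = 120e-15    # inductor parasitic capacitance
c_gate = 175e-15        # gate-source capacitance of the cross-coupled pair
c_varactor = 160e-15    # upper bound quoted for the varactor

c_left = c_total - (c_inductor + c_gate + c_varactor)
print(f"total = {c_total * 1e15:.0f} fF, left for coupling pairs and wiring = {c_left * 1e15:.0f} fF")
```

Under this simple model roughly 535 fF is available in total, so a 160-fF varactor leaves on the order of 80 fF for the coupling differential pairs and interconnect.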
The inductors are realized as stacked spirals [5] made of metal 4 and metal 3 with a width of 6 μm. Since the tuning range is inevitably narrow, it is critical to predict the oscillation frequency accurately. A distributed model is used for each inductor, yielding an error of only a few percent in the measured frequency of oscillation.
B. Frequency Divider
The design of a 2.6-GHz programmable divider with a reasonable power dissipation in 0.4-μm CMOS technology is quite difficult. A number of circuit techniques are introduced in this work to ameliorate the power-speed tradeoff.
The divider is based on a pulse-swallow topology. Shown in Fig. 5(a) is a conventional implementation, consisting of a dual-modulus prescaler, a fixed-ratio program counter, and a programmable swallow counter. The RS latch is typically included in the swallow counter and is drawn explicitly here for clarity. The prescaler begins the operation by dividing by $N+1$ until the swallow counter is full. The RS latch is then set, changing
Fig. 6. Prescaler.
Fig. 7. Divide-by-2/3 circuit: (a) conventional topology and (b) circuit used in this work.
ment in this path. Fig. 9 illustrates the issue. When the ÷9 operation of the prescaler is finished, the circuit would have at most
seven input cycles to change the modulus to eight. In this particular prescaler, the timing budget is actually about five input
cycles, approximately 1.9 ns. Thus, with no pipelining, the last
pulse generated by the prescaler in the ÷9 mode must propagate
through the level converter, the first ÷2 stage in the swallow
counter, the subsequent logic, the RS latch, and the three-input
NOR gate in less than 1.9 ns. Such a delay constraint necessitates the use of current steering in this path, raising the power
dissipation and complicating the design. With pipelining, on
the other hand, the maximum tolerable delay increases to about
eight input cycles, approximately 3.1 ns.
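The two delay budgets follow directly from the input period. As a quick check, assuming the prescaler input runs at the 2.6-GHz oscillator frequency (the function name is only illustrative):

```python
def timing_budget_ns(input_cycles: int, f_in_hz: float) -> float:
    """Time available, in nanoseconds, for a given number of input clock cycles."""
    return input_cycles / f_in_hz * 1e9


F_IN = 2.6e9  # prescaler input frequency, Hz (assumed equal to the VCO frequency)

budget_no_pipelining = timing_budget_ns(5, F_IN)    # modulus-control path without pipelining
budget_with_pipelining = timing_budget_ns(8, F_IN)  # same path with pipelining

print(f"no pipelining:   {budget_no_pipelining:.2f} ns")
print(f"with pipelining: {budget_with_pipelining:.2f} ns")
```

Five cycles give about 1.92 ns and eight cycles about 3.08 ns, matching the rounded figures in the text.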
Fig. 10.
Fig. 11. (a) Addition of correction circuit to charge pump. (b) Simple folding circuit. (c) Folding circuit with one reference voltage.
For this reason, the loop-filter capacitors are formed as poly-metal
sandwiches (albeit with much less density than MOS capacitors).
Another issue in the design of the loop filter of Fig. 10 relates
to the thermal noise produced by the filter resistor. Low-pass filtered by
the filter capacitors, this noise modulates the VCO, raising the output phase
noise. The thermal noise on the control voltage per unit bandwidth is given by
(1)
(2)
Fig. 12. Die photograph.
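With the values chosen in this design, the output phase noise
reaches −138 dBc/Hz at 10-MHz offset for the VCO gain of this design (quoted in GHz/V).
While it is desirable to reduce the value of the filter resistor, the required increase in
the filter capacitance leads to a severe area penalty because of
the low density of the poly-metal capacitors. Note that since the
stability factor is proportional to the resistance and to the square root of the capacitance, if the resistance is,
say, halved, then the capacitance must be quadrupled to maintain a constant stability factor
(for a given charge-pump current).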
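The halve-and-quadruple tradeoff can be illustrated with the textbook damping-factor expression for a charge-pump PLL, ζ = (R/2)·√(I_P·K_VCO·C/(2πM)). This expression, the symbol names, and the numerical values below are assumptions for illustration; the paper's own equation is not legible in this copy.

```python
import math


def damping_factor(r_ohm, c_farad, i_cp_amp, k_vco, divide_ratio):
    """Textbook charge-pump PLL damping factor; k_vco in rad/s/V (assumed form)."""
    return (r_ohm / 2.0) * math.sqrt(
        i_cp_amp * k_vco * c_farad / (2.0 * math.pi * divide_ratio)
    )


def capacitance_for_damping(zeta, r_ohm, i_cp_amp, k_vco, divide_ratio):
    """Capacitance needed to reach a target damping factor with a given resistor."""
    return (2.0 * zeta / r_ohm) ** 2 * 2.0 * math.pi * divide_ratio / (i_cp_amp * k_vco)


# Hypothetical operating point, chosen only to exercise the formulas
R = 50e3                        # filter resistor, ohms
C = 10e-12                      # filter capacitor, farads
I_CP = 10e-6                    # charge-pump current, amperes
K_VCO = 2.0 * math.pi * 500e6   # VCO gain, rad/s/V
M = 500                         # divide ratio

zeta_original = damping_factor(R, C, I_CP, K_VCO, M)
# Halve the resistor (less thermal noise) and solve for the capacitor that keeps zeta fixed.
c_required = capacitance_for_damping(zeta_original, R / 2.0, I_CP, K_VCO, M)

print(f"capacitance ratio after halving R: {c_required / C:.2f}")
```

Because ζ scales with R·√C, the ratio is 4 regardless of the particular values chosen, which is what makes the area penalty unavoidable for a given charge-pump current.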
D. Correction Circuit
The gain of the VCO varies substantially across the tuning
range, resulting in considerable change in the settling behavior.
As depicted in Fig. 11(a), it is desirable to vary the charge-pump
current such that the product of the VCO gain and the charge-pump current, and
hence the loop dynamics, remain relatively constant. Rather than use piecewise
linearization [2], this work incorporates an analog folding technique. Fig. 11(b) shows a possible solution. Here, the two
input transistors are off if the control voltage is well below 1.1 V, and hence the output current is at its full value. As the control voltage
approaches 1.1 V, the first transistor turns on while the second is off. Thus, the output current drops,
reaching a low value as the first transistor carries most of the tail current and
its neighbor a negligible current. As the control voltage
approaches and exceeds 1.3 V, the second transistor turns
on and the output current eventually returns to its full value. This design actually utilizes the topology shown in Fig. 11(c), where only one reference
voltage is required and each differential pair provides a built-in
offset by virtue of skewed device dimensions. The characteristic
is similar to that shown for Fig. 11(b), with the output current driving the current mirrors in the charge pump.
The reference voltage of 1.2 V in Fig. 11(c) assumes that
the gain of the VCO reaches its maximum at a control voltage of 1.2 V.
This value is somewhat process- and temperature-dependent,
limiting (according to simulations) the suppression of the VCO
nonlinearity to about one order of magnitude.
Fig. 13. Measured spectra at 2.6 and 5.2 GHz in locked condition.
Fig. 14.
Fig. 15.
V. EXPERIMENTAL RESULTS
The frequency synthesizer has been fabricated in a 0.4-μm
digital CMOS technology. All of the inductors and capacitors
are included on the chip. Fig. 12 is a photograph of the die,
which measures 1.75 × 1.15 mm². The circuit has been tested
with a 2.6-V supply.
Figs. 13(a) and (b) depict the output spectra in the locked
condition. The phase noise at 10-MHz offset is equal to −115
dBc/Hz at 2.6 GHz and −100 dBc/Hz at 5.2 GHz. A significant
part of the phase noise at 5.2 GHz is attributed to the considerable loss of the output 50-Ω buffer. Fig. 14 shows the 2.6-GHz
output along with the reference sidebands. The sidebands are
Fig. 16.
TABLE I
SYNTHESIZER PERFORMANCE
Christopher Lam received the B.Sc. and M.Sc. degrees in electrical engineering from the University of
California, Los Angeles, in 1997 and 1999, respectively.
He is currently with the Wireless Communication
Group, National Semiconductor, Santa Clara,
CA. His interests include phase-locked loops and
communication circuits.
divider is switched periodically and the control voltage is monitored. The 0.8-pF capacitor results from the trace on the printed
circuit board, and the active probe presents an input capacitance
of 2 pF. Since the loop-filter capacitors are themselves in the picofarad range, the addition of these
parasitics markedly degrades the stability. Therefore, a 100-kΩ
resistor is placed in series with the active probe to mimic the
role of the corresponding loop-filter elements. The low-pass filter thus formed has a corner
frequency comparable to the loop bandwidth, and the 0.8-pF capacitor still produces ringing in the time response. Fig. 16 shows
the measured control voltage, indicating a settling time on the
order of 40 μs.
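To give a feel for the corner frequency mentioned above, the sketch below treats the series resistor and the probe input capacitance as a single-pole RC low-pass filter. Ignoring the 0.8-pF trace capacitance and any loop-filter loading is an assumption made purely to keep the estimate simple.

```python
import math


def rc_corner_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


R_SERIES = 100e3  # series resistor added in front of the active probe, ohms
C_PROBE = 2e-12   # input capacitance of the active probe, farads

f_corner = rc_corner_hz(R_SERIES, C_PROBE)
print(f"probe low-pass corner: {f_corner / 1e3:.0f} kHz")
```

The result is roughly 800 kHz, which the text describes as comparable to the loop bandwidth.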
Table I summarizes the measured performance of the synthesizer.
VI. CONCLUSION
The speed and quality of the devices available in an IC technology directly affect the choice of transceiver architectures,
synthesizer topologies, and circuit configurations. In order to
optimize the overall system performance, the transceiver and
the synthesizer must be designed concurrently, with particular
attention to the frequency planning.
Behzad Razavi (S'87–M'90) received the B.Sc. degree from Sharif University of Technology, Tehran,
Iran, in 1985 and the M.Sc. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1988 and 1992,
respectively, all in electrical engineering.
He was with AT&T Bell Laboratories, Holmdel,
NJ, and subsequently Hewlett-Packard Laboratories,
Palo Alto, CA. Since September 1996, he has been
an Associate Professor of electrical engineering at the
University of California, Los Angeles. His current research includes wireless transceivers, frequency synthesizers, phase-locking and clock recovery for high-speed data communications, and data converters. He was an Adjunct Professor at Princeton University, Princeton, NJ, from 1992 to 1994, and at Stanford University in 1995.
He is a member of the Technical Program Committees of the Symposium on
VLSI Circuits and the International Solid-State Circuits Conference (ISSCC),
in which he is Chair of the Analog Subcommittee. He is the author of Principles
of Data Conversion System Design (New York: IEEE Press, 1995), RF Microelectronics (Englewood Cliffs, NJ: Prentice-Hall, 1998), and Design of Analog
CMOS Integrated Circuits (New York: McGraw-Hill, 2000), and the editor of
Monolithic Phase-Locked Loops and Clock Recovery Circuits (New York: IEEE
Press, 1996).
Dr. Razavi received the Beatrice Winner Award for Editorial Excellence at the
1994 ISSCC, the Best Paper Award at the 1994 European Solid-State Circuits
Conference, the Best Panel Award at the 1995 and 1997 ISSCC, the TRW Innovative Teaching Award in 1997, and the Best Paper Award at the IEEE Custom
Integrated Circuits Conference in 1998. He has also served as Guest Editor and
Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS.
A CMOS Monolithic ΔΣ-Controlled Fractional-N
Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE
I. INTRODUCTION
THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main
cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the
implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic
CMOS integration of RF circuits both by academics and industry [1]–[3].
The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the
full integration of a transceiver front end in CMOS, including
a low-IF receiver and a direct upconversion transmitter [1]. To
achieve a high degree of integratability and fast settling under
low-noise constraints, a ΔΣ fractional-N synthesizer topology
has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the
classic phase-locked loop (PLL) synthesizer by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened
and noise shaped by the ΔΣ action and ultimately filtered by
the loop filter. To prevent degradation of the spectral purity by
Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department
Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.
Fig. 1. Principle of ΔΣ fractional-N synthesis.
SYNTHESIZER
A. Introduction
A block diagram of a ΔΣ fractional-N synthesizer is shown
in Fig. 1. The ΔΣ modulator output controls the instantaneous
division modulus of the prescaler, such that the mean division
modulus is N + n/2^k, with k the number of bits of the ΔΣ
modulator and n the input word. The corresponding phase
changes at the prescaler output are quantized, leading to possible
spurious tones and quantization noise. By selecting higher order
ΔΣ modulators, the spurious energy is whitened and shaped to
high-frequency noise, which can be removed by the low-pass
loop filter. As a result, for a given frequency resolution, an arbitrarily high reference frequency can be chosen, by assigning the proper number
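The idea that a modulator toggling the prescaler modulus yields a fractional mean division ratio can be illustrated with the simplest possible case: a first-order accumulator whose carry selects between two adjacent integer moduli. This is only a sketch under that assumption. The synthesizer described here uses higher-order modulators (MASH and single-loop multibit), and the names and numbers below are hypothetical.

```python
def modulus_sequence(n_int: int, input_word: int, bits: int, length: int) -> list:
    """Division moduli produced by a first-order accumulator (illustrative model).

    Each reference cycle adds input_word to a bits-wide accumulator; a carry
    selects modulus n_int + 1, otherwise n_int is used.
    """
    modulo = 1 << bits
    acc = 0
    sequence = []
    for _ in range(length):
        acc += input_word
        carry = acc >= modulo
        if carry:
            acc -= modulo
        sequence.append(n_int + 1 if carry else n_int)
    return sequence


# Hypothetical example: integer part 67, a 4-bit accumulator, input word 3
seq = modulus_sequence(n_int=67, input_word=3, bits=4, length=16)
mean_modulus = sum(seq) / len(seq)
print(seq)
print(f"mean division modulus: {mean_modulus}")
```

Over one full accumulator period (2^bits cycles) the carry fires exactly input_word times, so the mean modulus equals the integer part plus input_word/2^bits, here 67 + 3/16. A first-order loop like this produces strongly periodic patterns, which is the motivation for the higher-order modulators discussed in the text.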
ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability
Modulators
Fig. 5. Simulation results. The phase error for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses CP[i] for
(c) the MASH modulator and (d) the single-loop multibit modulator.
Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a)
the MASH modulator and (b) the single-loop multibit modulator.
TABLE I
SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE
OF THE FOURTH-ORDER TYPE-II PLL
Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left)
the dummy current branch, denoted by the suffix d, and the output branch.
Conversion
at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed
parasitic charge injection of the switch transistors. The current
switches are implemented with pMOS and nMOS transistors to
compensate charge injection. Finally, a timing control scheme
[Fig. 9(a)] is developed to control the charge-pump switches.
The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch
and the dummy current branch of the charge pump [Fig. 9(b)].
Fig. 9(a) shows the dummy and output control signals. The
dummy control signal is delayed versus the output control signal by
modifying the thresholds of the second inverter string (indicated
by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize
rise and fall times and force a perfect π rad relation between
nMOS and pMOS control signals, latches at the outputs of both
inverter strings are implemented. Capacitors at the control outputs slow the rising and falling edges to prevent large charge injections by fast switching.
Fig. 10.
To linearize the phase-to-current conversion, the phase detection
is performed by a zero-dead-zone PFD [15], to prevent a hard
nonlinearity around zero phase error. Due to the delay added in
the PFD, both the up and down current sources are on for small
or zero phase errors, enabling the PFD to react to very small
phase errors. The on-time fraction of the charge pump due to
the delay is less than 10%. This value is a tradeoff between
dead-zone prevention and sensitivity to noise coupling when
the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events
in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively
quiet environment. To make sure that the gains for positive and
negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current source 1/f noise, which can seriously affect the
in-band noise, is decreased. Additionally, the timing control of
Fig. 9(a) provides synchronization between the two filter paths
and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain.
HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal
transistor matching.
V. EXPERIMENTAL RESULTS
Fig. 10 shows the IC microphotograph and the measurement
setup in which it is embedded. The ΔΣ fractional-N measurements are performed by controlling the PLL divider moduli with
an HP80000 data generator, which generates the 4-bit control
word. The 4-bit ΔΣ output bit stream is generated using Matlab.
This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at
16
N PLL at 1.76592
TABLE II
SUMMARY OF MEASURED SPECIFICATIONS COMPARED
TO THE DCS-1800 SPECIFICATIONS
Fig. 13. Phase noise measurement with the MASH converter at 1.76592 GHz
compared to the simulated ΔΣ noise at the output of the PLL (dashed), and
with the simulated PLL output without ΔΣ control (dash-dotted).
noisy control pulses are close to the LC tank and the bonding
wires of the VCO power supply. Without proper shielding, the
VCO phase noise is seriously degraded by this noise coupling.
In Fig. 13, the measured phase noise is compared with the ΔΣ noise as simulated in Section III (dashed). The dash-dotted line
is the simulated phase noise of the PLL without ΔΣ control. The
simulated ΔΣ noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The
phase noise at high offsets is increased versus the simulated PLL
results due to noise coupling. Second-order tones are larger in
measurements, since the models in the simulator do not include
second-order effects and noise coupling. Tones at 520 kHz are
believed to come from subharmonic tones present in the ΔΣ
modulator output [5], which are amplified by mixing through
noise coupling. When comparing the results for the MASH and
the single-loop modulator, the measured differences are less pronounced than the simulated ones (see Fig. 6). The measured
phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples
of the reference frequency.
The measured settling time of the PLL is 226 μs for a
104-MHz frequency step. The power consumption of the PLL
is 70 mW from a 2-V power supply. The fully integrated
low-phase-noise VCO is responsible for almost 66% of the
total power consumption. The IC area is 2 × 2 mm², including
bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications
[1]. The specifications of the IC prototype comply with
DCS-1800; only one measured value is degraded, owing to the limited
resolution of the measurement setup.
VI. CONCLUSION
A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL
frequency synthesizer is implemented in a standard 0.25-μm
CMOS technology. The monolithic fourth-order type-II PLL
[11] B. De Muer and M. S. J. Steyaert, "A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high
input sensitivity," in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC),
The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, "Low-phase-noise fully integrated
CMOS frequency synthesizers," Ph.D. dissertation, Katholieke Univ.
Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, "A 1.8-GHz
highly tunable low-phase-noise CMOS VCO," in Proc. IEEE Custom
Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, "Fully integrated CMOS frequency
synthesizers for wireless communications," in Analog Circuit Design,
W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell,
MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.
Bram De Muer (S'00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc.
degree in electrical engineering in 1996 from the
Katholieke Universiteit Leuven, Belgium, where
he is currently working toward the Ph.D. degree
on high frequency low-noise integrated frequency
synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with
ESAT-MICAS laboratories since 1996. His research
is focused on integrated low-phase-noise VCOs with
on-chip planar inductors and high-speed prescaler
design, leading to fully integrated ΔΣ
fractional-N synthesizers in CMOS
technology.
I. INTRODUCTION
Fig. 2. (a) External clock and 12 multiphase clocks relationship. (b) Multiphase clock layout.
that the next oversampled bit after the first 1-to-0 transition
was sampled near the center of data eye pattern. Each pin
of the data bus tracks the start phase of a data transfer
separately. After each pin catches the start of a data transfer,
the demultiplexed data of each pin is retimed into a single
internal clock domain. Since this process can be done in one
clock cycle, the masters can respond quickly as distance from
the signal source changes.
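The start-phase tracking described above can be sketched in a few lines for a single pin. The sketch assumes an ideal 3× oversampled stream, takes the sample immediately after the first observed 1-to-0 transition as the eye center, and then keeps every third sample. The function name and the test pattern are hypothetical, and the real receiver does this in hardware with multiphase clocks.

```python
def recover_bits(samples: list, oversampling: int = 3) -> list:
    """Pick one sample per bit from an oversampled stream (illustrative model).

    The first 1-to-0 transition marks the start of a transfer; the sample after
    the first observed 0 is assumed to sit near the center of the data eye.
    """
    for i in range(1, len(samples)):
        if samples[i - 1] == 1 and samples[i] == 0:
            center = i + 1  # next oversampled bit after the transition
            return samples[center::oversampling]
    return []  # no start of transfer seen


# Hypothetical line activity: idle high, then a start bit (0) followed by data
bits_on_wire = [1, 1, 0, 1, 0, 0, 1]
oversampled = [b for b in bits_on_wire for _ in range(3)]

recovered = recover_bits(oversampled)
print(recovered)
```

With this ideal input the function returns the start bit and the data that follow it, `[0, 1, 0, 0, 1]`. Because each pin would run its own copy of this search, skew between pins only shifts where each search locks, which is the property the scheme relies on.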
Since this scheme allows skew not only in clock line but
also among data lines, there is a possibility that some of
the demultiplexed parallel data are one internal clock cycle
earlier or later than the other demultiplexed data after retiming.
detector, charge pump, loop filter, clock divider, and a voltage-controlled oscillator (VCO). With a six-stage differential VCO,
12 clock phases are available to oversample the incoming data
and to serialize parallel data into a serial bit stream.
One of the critical building blocks of the PLL is the phase
frequency detector (PFD). A low precision PFD has a wide
dead zone (undetectable phase difference range), which results
in increased jitter. The jitter caused by the large dead zone can
be reduced by increasing the precision of the phase frequency
detector. Fig. 6 shows a conventional implementation of a
static PFD [8]. This conventional PFD is an asynchronous state
machine. The delay time to reset all internal nodes determines
the circuit speed. The critical path of the conventional PFD
is shown in bold lines in Fig. 6. The critical path forms a
feedback path with six gate delays. The dead-zone occurs when
the loop is in a lock mode and the output of the charge pump
Fig. 11.
does not change for small changes in the input signals at the
PFD. Any width of the dead-zone directly translates to jitter
in the PLL and must be avoided.
To overcome the speed limitation and to reduce the dead
zone, a new dynamic-logic-style PFD was designed. A
similar dynamic comparator was reported before [9], but our
implementation requires fewer transistors. Fig. 7
shows the circuit diagram of the PFD. Conventional static
logic circuitry was replaced by dynamic logic gates. As a
result, the number of transistors in the PFD core is reduced
from 44 to 16. The critical path of this PFD, also shown
in Fig. 7, consists of a three-gate feedback path. The shortened feedback path delay and
dynamic operation allow high precision in high-frequency
operation.
Fig. 8 shows the relation between the dead zone of the PFD and
the phase error of PLL. If the phase difference of EXT clock
and VCO clock is smaller than the dead zone, the PFD cannot
detect the phase difference. So the phase error signal of PFD
will remain zero, resulting in unavoidable phase error between
EXT clock and VCO clock. The minimum peak-to-peak phase
error caused by this dead zone is
Minimum Peak-to-Peak Phase Error
(1)
Fig. 13.
Fig. 14.
level settles to
(2)
C. Oversampler
The oversampler used in the data receiver is shown in
Fig. 14. Each oversampler is a cascaded sense amplifier and
uses four clocks for correct, timely sampling. It is very
important to reduce the probability of metastability by careful
design and layout. The same size is used for both PMOS and
NMOS in the core synchronizing amplifier to maximize the
loop bandwidth.
IV. EXPERIMENTAL RESULTS
Two prototype chips, master and slave, have been fabricated
in a 0.6-μm double-metal CMOS process. Fig. 15 shows the
microphotograph of the fabricated master chip. This chip
occupies 4100 μm × 4300 μm including pad area. The master
chip incorporates a common skew-insensitive I/O macro block,
a bus protocol handler, and a self-test circuit for chip and
system diagnostics. The common skew-insensitive I/O macro
block includes a charge-pump PLL for multiphase generation,
oversamplers, I/O buffers, parallel-to-serial converters, and a
bias generator for internal use. The core area for the skew-insensitive I/O macro block is 3600 μm × 700 μm for a 4-pin
interface. The microphotograph of the fabricated slave chip
is shown in Fig. 16. It has the same die size as the master
chip. Many blocks are shared with the master chip. The skew-insensitive I/O macro block and the charge-pump PLL are the
same as those of the master. The slave chip includes a small
internal fast SRAM to verify correct read/write operations.
The measured charge-pump PLL jitter histogram of the
master and the slave chips is shown in Fig. 17. Since the two
chips use the same PLL, they show similar jitter performance.
The rms jitter is 15.7 ps when the tested chip is active. The
peak-to-peak jitter was measured to be less than 150 ps. This
PLL jitter characteristic is especially important for multiphase
operation.
Fig. 18 shows an output data waveform at 960 Mb/s. The
master chip is sending data to the bus according to the
predetermined bus protocol. The jitter at the output data is
larger than the jitter at the charge pump PLL clock due to the
extra modulation effect of supply voltage fluctuation to data
Fig. 17.
Fig. 18.
TABLE I
MAIN FEATURES OF THE CHIP
Core Area         3.6 mm × 0.7 mm
Technology        0.6-μm double-metal CMOS
Supply Voltage    3.3 V
Data Rate         960 Mb/s
PLL jitter        15.7 ps rms
Power
Fig. 19.
V. CONCLUSION
A new high-speed skew-insensitive I/O scheme has been
described in this paper. Two chips that incorporate the new
I/O scheme using the low-jitter PLL technique have been
fabricated in a 0.6-μm double-metal CMOS process. The three-times
oversampling technique relaxes the strict setup- and hold-margin
requirements of high-speed chip-to-chip interfaces.
A newly designed fast phase frequency detector and a VCO circuit with high
noise immunity improve the jitter performance of the
PLL. The measured PLL rms jitter was 15.7 ps. Accurate
multiphase clock generation for oversampling the bus signal
was made possible by utilizing the low-jitter PLL. Using
these techniques, skew-insensitive data transfer was tested.
This skew-insensitive I/O scheme is useful for high-speed
ASIC-to-memory and ASIC-to-ASIC interfaces, and it
will become more important as chip-to-chip data transfer
speeds increase.
REFERENCES
[1] M. Horiguchi et al., "An experimental 220 MHz 1 Gb DRAM," in
ISSCC 1995 Dig. Tech. Papers, pp. 252–253.
[2] M. Horowitz et al., "PLL design for a 500 MB/s interface," in ISSCC
1993 Dig. Tech. Papers, pp. 160–161.
[3] T. H. Lee et al., "A 2.5 V CMOS delay-locked loop for an 18 Mbit,
500 Megabytes/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp.
1491–1496, Dec. 1994.
[4] E. Reese et al., "A phase tolerant 3.8 GB/s data-communication router
for a multiprocessor supercomputer backplane," in ISSCC 1994 Dig.
Tech. Papers, Feb. 1994, pp. 296–297.
[5] S. Kim et al., "A pseudo-synchronous skew-insensitive I/O scheme for
high bandwidth memories," in Proc. Symp. VLSI Circuits, June 1994,
pp. 41–42.
[6] M. Bazes and R. Ashuri, "A novel CMOS digital clock and data
decoder," IEEE J. Solid-State Circuits, vol. 27, pp. 1934–1940, Dec.
1992.
[7] S. Kim et al., "An 800 Mbps multi-channel CMOS serial link with 3×
oversampling," in Proc. IEEE Custom Integrated Circuit Conf., 1995,
pp. 451–454.
[8] I. Young et al., "A PLL clock generator with 5 to 110 MHz lock range for
microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607,
Nov. 1992.
[9] H. Notani et al., "A 622-MHz CMOS phase-locked loop with precharge-type phase frequency detector," in Proc. Symp. VLSI Circuits, June 1994,
pp. 129–130.
I. INTRODUCTION
Fig. 2.
trol voltages generated from two main loop charge pumps. The
multiplexer selects one of four clocks, and the selected clock
feeds the clock buffer, whose function is to convert the low swing to
full CMOS level as well as to provide the chip-wide output clock.
The output clock drives the phase detector, which compares
it to the reference clock. The output of the phase detector is used
by two charge pumps and four loop filters to control the delay
time of each main loop VCDL. Four-to-one clock switching
is implemented by the window finder and the state decoder
block. The window finder monitors the boundary where the selected clock is switched and forces the state decoder to update
the two-bit selection code at the switching event. The selection
code not only controls the clock selection at the multiplexer but also
changes the configuration of the two charge pumps and four loop
filters to accommodate the clock switching. Duty cycle correction (DCC) is employed to remove the duty cycle imperfections
of the input clock and the output clock. Finally, although the two input clocks, the input clock and the reference clock, can
be merged into one clock input, a lower-jitter clock source is preferred as the input clock, if possible, since it determines the jitter
characteristics of the whole DLL.
In this architecture, the clock selection scheme enables the
output clock to cover the entire phase range (modulo 2π). Furthermore, seamless clock switching is possible by optimizing
the main loop VCDL delay control scheme. Moreover, the phase
locking is achieved by fully analog control in all loops, so that
we can apply low-skew and low-jitter techniques, established in
conventional DLLs.
C. Reference Loop Design
The objective of the reference loop is to provide quadrature clocks to the main loop. Since the main loop uses these
multiphase clocks as references, the phase distribution in the
output clocks should be preserved against a possible harmonic
lock. The reference loop phase detector depicted in Fig. 3(a) has
the capability to detect and escape up to the second harmonic
lock. This design is made of two level-sensitive AND/NAND logic gates,
which require the 45° and 90° clocks as well as the 0° and 180° clocks.
At one-period lock, the clocks and UP/DN output waveforms are
shown in Fig. 3(b). The phase detector asserts its UP and DN
outputs for equal durations, owing to the 45° clock, in order to avoid a
dead-zone problem, although the phase offset of the reference
loop has a negligible effect on the offset of the main loop output
clock. At the second harmonic lock, shown in Fig. 3(c), the
phase detector detects that the loop is in harmonic lock by means of
the 90° clock and asserts only the UP output to escape the harmonic
lock. Because the delay range of the delay line is limited, third- and higher-order harmonic locks are ruled out, since the reference loop is
composed only of delay cells with no additional delay elements
such as the clock buffer.
D. Main Loop Design
The main loop design is focused on the selection control and
delay control of the main loop VCDL to achieve the infinite
delay range by using four finite-length VCDLs. Fig. 4(a) shows
the conceptual timing diagram of the main loop VCDL selection
clock is selected as
, the
control. Assuming
786
Fig. 4. Selection control of the main loop VCDL. (a) Conceptual timing
diagram. (b) Block diagram of the control logic.
Fig. 3. Reference loop phase detector. (a) Block diagram. (b) Operation at one-period lock. (c) Operation at two-period lock.
Fig. 5. Delay control of the main loop VCDL. (a) Single control with other clocks fixed. (b) Differential control with same speed. (c) Differential control with
3× speed difference.
Fig. 6. Configuration of the main loop phase detector, charge pumps, and loop
filters.
Fig. 7.
to
clock, 01 state. As a result, the proposed DLL covers
the entire phase range and remains in the lock state for switching in either direction by optimizing the control schemes of multiple VCDLs. Therefore, since this architecture makes it possible to emulate an infinite-length VCDL by using multiple finite-length VCDLs, the DLL overcomes the problems of conventional DLLs, namely the limited delay range and the initial
phase relationship constraint.
III. LOW JITTER SCHEME AND DCC
A. Low-Jitter Scheme
The jitter performance of the DLL is degraded by various
noise sources, typically in the form of supply and substrate noise
in high speed and highly integrated circuits. To reduce the jitter,
the loop bandwidth should be set as high as possible but must
have an upper limit for stability issues. Thus, low-jitter DLL
designs strongly depend on the delay characteristics of a delay
line with supply-noise injection. In order to design the delay
line with low supply-noise sensitivity, the replica biasing for
the delay control must be considered in noisy environment. The
replica biasing circuit, which consists of a half-replica of a differential delay cell and an operational amplifier (op-amp), sets
the low swing level of the delay cell to the reference voltage.
In the conventional replica biasing, the reference voltage tracks the
supply variation by the same amount. Unfortunately, this is
not the optimal solution. The variation of the op-amp gain and
Fig. 9. (a) Block diagram of the duty cycle corrector [3]. (b) Circuit diagram
of the proposed duty detection stage.
The sensitivity of this voltage ratio to process variations should
be analyzed to guarantee reliable operation. For example, the
sensitivity to the threshold-voltage variation of a transistor
can be obtained by (3), shown at the bottom of the
page. The sensitivity is evaluated for a threshold-voltage variation of 100 mV.
Similar analyses with other process parameters also show that
the predetermined ratio is kept nearly constant under moderate
process variations. This replica biasing circuit is commonly applied to all VCDLs of the reference loop and the main loop to
achieve the low-jitter characteristics through the whole DLL.
B. Duty Cycle Corrector
The duty cycle of clock signals within the DLL deviates from
its ideal value of 50% due to various asymmetries in signal paths
and voltage offsets in an off-chip generated reference clock.
For applications in which the timing of both edges of the clock
is critical, a duty cycle corrector (DCC) is required to maximize timing margins. The DCC [3] in Fig. 9(a) is configured as
an error-voltage feedback loop with a corrector stage and a duty
detection stage. The duty detection stage outputs a differential control voltage that is proportional to the
duty cycle error of the input clocks. This differential control voltage then effectively introduces an offset voltage
to the clock inputs at the corrector stage to correct the
duty cycle of the output clocks.
(3)
Fig. 12.
Fig. 10. Simulated mismatch sensitivity characteristics of the DCC with the
proposed duty detection stage.
Fig. 11.
Fig. 13. Jitter histograms at 400 MHz. (a) Quiet supply. (b) Added 2.5-MHz
300-mV square-wave supply noise.
TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROTOTYPE DLL
0.57 V (nMOS) and 0.55 V (pMOS). The gate-oxide thickness is 5.8 nm. Fig. 11 shows the layout of the prototype chip.
The active area of the DLL occupies 0.13 mm².
The waveforms in Fig. 12 show the two-bit selection code
with the reference clock input grounded, while running the input
clock at its nominal frequency of 400 MHz. In this configuration, the main loop phase detector always asserts DN signals.
Therefore, the selection code is continuously updated in the
sequence 00, 01, 10, 11. This corresponds to
unlimited rotation of the output clock through the full
0°–360° range.
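The rotation of the selection code can be pictured as a two-bit counter that wraps around. The sketch below is a behavioral assumption only: each list entry stands for one switching event flagged by the window finder, a DN event advances the code (matching the measured sequence), and an UP event is assumed to step it back. The names are hypothetical and do not describe the actual state decoder logic.

```python
def rotate_selection_code(code: int, pd_output: str) -> int:
    """Next two-bit selection code after one switching event (behavioral model)."""
    if pd_output == "DN":
        step = 1    # matches the measured order 00, 01, 10, 11
    elif pd_output == "UP":
        step = -1   # assumed: the opposite direction reverses the rotation
    else:
        raise ValueError(f"unknown phase detector output: {pd_output}")
    return (code + step) % 4  # wrap-around gives unlimited rotation


def code_sequence(start: int, pd_outputs: list) -> list:
    """Selection codes, as two-bit strings, visited for a list of switching events."""
    codes = [start]
    for out in pd_outputs:
        codes.append(rotate_selection_code(codes[-1], out))
    return [format(c, "02b") for c in codes]


print(code_sequence(0, ["DN"] * 5))
```

With the phase detector stuck at DN, as in the grounded-reference test, the model keeps cycling through the four codes, which is the behavior shown in Fig. 12.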
Fig. 13(a) and (b) shows the jitter histograms of the DLL
clock output at 400 MHz. Fig. 13(a) shows 6.7 ps RMS and
54 ps peak-to-peak jitter characteristics with a quiet power
supply. With a 300-mV 2.5-MHz square-wave supply noise, the
peak-to-peak jitter increases to 150 ps, as shown in Fig. 13(b).
The ratio of the peak-to-peak jitter to the RMS jitter is well
maintained in spite of supply-noise injection. Supply-noise
sensitivity is measured to be 0.32 ps/mV.
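As a quick check on the figures above, the following sketch recomputes the supply-noise sensitivity from the two peak-to-peak jitter measurements. It assumes, since the text does not state the definition, that sensitivity means the increase in peak-to-peak jitter divided by the injected noise amplitude; the function name is illustrative only.

```python
def supply_noise_sensitivity(jitter_noisy_ps, jitter_quiet_ps, noise_mv):
    """Added peak-to-peak jitter per millivolt of injected supply noise (ps/mV).

    Assumed definition: (noisy jitter - quiet jitter) / noise amplitude.
    """
    return (jitter_noisy_ps - jitter_quiet_ps) / noise_mv


# Values quoted for Fig. 13: 54 ps (quiet supply), 150 ps (300-mV square-wave noise).
sensitivity = supply_noise_sensitivity(150.0, 54.0, 300.0)
print(f"{sensitivity:.2f} ps/mV")
```

Under this assumed definition the result agrees with the 0.32 ps/mV quoted in the text.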
Table I summarizes the DLL performance characteristics.
The DLL operates over a frequency range of 150 to 600 MHz
with a 2.5-V supply. Static phase error between the reference
clock and the output clock of the DLL is less than 20 ps.
Operating at 400 MHz, the DLL dissipates 60 mW.
V. CONCLUSION
We have described a dual-loop DLL architecture that provides
an unlimited delay range by using multiple VCDLs. The
reference loop generates four evenly spaced clocks without
the possibility of harmonic lock. Clock selection in the main loop
enables the DLL to cover the entire phase range, and seamless
clock switching is achieved by optimizing the main loop
VCDL delay range control. Thus, this architecture can emulate
an infinite-length VCDL with multiple finite-length VCDLs.
To obtain low supply-noise sensitivity, the low-jitter scheme
generates a reduced swing voltage compared to supply noise
for the delay compensation of a delay line. Finally, the duty cycle
corrector achieves high immunity to process mismatches by
using a configuration of two stacked source-coupled pairs.
A prototype fabricated using 0.25-µm CMOS technology
[1] M. Johnson and E. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. Yoshimura, Y. Nakase, N. Watanabe, Y. Morooka, Y. Matsuda, M. Kumanoya, and H. Hamano, "A delay-locked loop and 90° phase shifter for 800-Mb/s double data rate memories," in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 66–67.
[3] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5-V CMOS delay-locked loop for an 18-Mb 500-Mbyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[4] S. Sidiropoulos and M. A. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[5] K. Minami et al., "A 1-GHz portable digital delay-locked loop with infinite phase capture ranges," in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 350–351.
[6] Y.-J. Jung, S.-W. Lee, D. Shim, W. Kim, C.-H. Kim, and S.-I. Cho, "A low-jitter dual-loop DLL using multiple VCDLs with a duty cycle corrector," in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 50–51.
[7] M. Bazes, "Two novel fully complementary self-biased CMOS differential amplifiers," IEEE J. Solid-State Circuits, vol. 26, pp. 165–168, Feb. 1991.
[8] M. J. M. Pelgrom, H. P. Tuinhout, and M. Vertregt, "Transistor matching in analog CMOS applications," in IEDM Dig. Tech. Papers, Dec. 1998, pp. 915–918.
Yeon-Jae Jung was born in Korea in 1974. He received the B.S. and M.S. degrees from the School
of Electrical Engineering, Seoul National University,
Seoul, Korea, in 1997 and 1999, respectively, where
he is currently working toward the Ph.D. degree.
He has worked on architectures and CMOS circuits
for high-speed I/O interfaces. His current research interests include high-speed CMOS circuits and communication ICs.
Soo-In Cho was born in Seoul, Korea, in 1957. He received the B.S. degree in
electronics engineering from Seoul National University, Seoul, Korea, in 1979.
He joined the Semiconductor Research and Development Center, Samsung
Electronics Company, Ltd., Kyungki-Do, Korea, in 1979, where he was engaged
in the design of CMOS logic LSI. Since 1983, he has been working on MOS
dynamic memory design.
Abstract: This paper describes a phase-locked loop (PLL)-based frequency synthesizer. The voltage-controlled oscillator
(VCO) utilizing a ring of single-ended current-steering amplifiers
(CSA) provides low noise, wide operating frequencies, and operation over a wide range of power supply voltage. A programmable
charge pump circuit automatically configures the loop gain and
optimizes it over the whole frequency range. The measured PLL
frequency ranges are 0.3–165 MHz and 0.3–100 MHz at 5-V and 3-V
supplies, respectively (the VCO frequency is twice the PLL output).
The peak-to-peak jitter is 81 ps (13 ps rms) at 100 MHz. The chip
is fabricated with a standard 0.8-µm n-well CMOS process.
Index Terms: CMOS phase-locked loop, current-steering amplifier, current-steering logic, frequency synthesizer, low noise,
low voltage VCO.
I. INTRODUCTION
loop on the VCO can also be used for reducing the PLL
jitter; however, it is likely to cause glitches or overshoots
whenever the frequency transition mode is activated due to
complicated feedback loops [8]. Hence, it is not a suitable
approach for microprocessor applications, wherein a smooth
transition between frequencies is usually required. While the
second type of VCO rejects power supply noise well, the
frequency range of operation may not be sufficient for some
applications. To widen the frequency range of differential-pair-based VCOs, complex MOS resistors [7] can be used at the
cost of a higher supply voltage and a more complex design.
The VCO design in this work utilizes a simple CSA circuit
[1]. Fig. 3(a) shows a CSA cell, which consists of a current
source and a pair of NMOS devices: an input device and a
diode-connected load device. When the input is high, the input
device turns on, sinking the bias current, while the load device
shuts off. Under this condition, the on-resistance of the input
device defines the output low voltage. When the input is low, the
input device turns off and the bias current is steered to the load
device. Under this condition, the resistance of the diode-connected
load device defines the output high voltage. By varying the bias
current, a current-controlled CSA-based ring oscillator is
formed with an output voltage swing (typically between 1 and 2 V) given by
(4)
Equation (4) indicates that the output swing varies with the bias
current. Thus, the voltage swing of the CSA cell
increases correspondingly with frequency. This is a desirable
feature because the signal level improves at high frequency,
where the power supply switching noise becomes worse. Since
the voltage swing is limited by the diode-connected load device,
the current source always operates in the saturation region;
consequently, very little switching noise is generated. For an
n-well process, the PMOS current source can be guarded by its
own well and isolated from the noisy p-substrate. The current
source also buffers the output from the power supply, thereby reducing
the noise injected from the supply to the output. Any ground noise
(coupled from other circuitry within the chip) is rejected by
the CSA as common-mode noise because both its output
and input are referred to the same ground. By referring the
charge pump, loop filter, V/I converter, VCO, and other analog
Fig. 3. (a) Current steering amplifier (CSA) cell. (b) VCO using a three-stage
CSA ring oscillator.
Fig. 6. Measured PLL output at 155 MHz after 5 ms delay, i.e., after 775 000 clock cycles. The rms jitter (standard deviation) is 28 ps as shown.
TABLE I
SUMMARY OF MEASURED PLL PERFORMANCE
Frequency Range                         0.3–165 MHz
Period (Short-Term) Jitter at 100 MHz   13 ps rms, 81 ps peak-to-peak
Long-Term Jitter at 155 MHz             28 ps rms after 5 ms delay
Supply Voltage Range                    2.5 V to 7 V
Output Duty Cycle                       50%, VCO frequency is twice PLL output
VCO Linearity                           2% from 10–200 MHz
VCO Current Consumption                 500 µA at 200 MHz
Crosstalk between 2 PLLs                50 dB down
comparison of test results between the use of this PLL and the
original crystal clock showed no perceptible difference even
under high magnification. Fig. 7 shows a smooth frequency
transition of the PLL from 33 to 100 MHz. No frequency
glitches and overshoots were observed during the transition
time. Since there are two independent PLLs on the same
chip, crosstalk between the two PLLs was also measured.
The signal coupling between the two PLLs is at least 50 dB
down. The measured performance of the PLL is summarized
in Table I.
V. CONCLUSION
In this paper, we demonstrated the design of a fully integrated CMOS PLL circuit that achieves a wide operating
frequency range and low jitter (both short-term and long-term) over a wide range of power supply voltage. The key
I. BACKGROUND
II. INTRODUCTION
This paper describes a fully integrated PLL-based clock
generator/phase aligner used for the POWER3 microprocessor.
The microprocessor is fabricated in IBM CMOS6S technology
and contains approximately 12 million transistors. With the microprocessor actively executing instructions, this PLL achieved
cycle-to-cycle jitter of 10.0 ps rms, 80 ps P-P in its application
environment and 8.4 ps rms, 62 ps P-P with the microprocessor
in a reset state with a portion of the clock tree active.
A simplified block diagram of the PLL clock generator is
shown in Fig. 1. The external reference, or BUSCLK, enters
a receiver and is divided by two by a divider stage before
entering the phase/frequency detector (PFD). The internal
feedback signal from the feedback divider is compared to this
divided reference by the PFD, which generates an error signal
that is used by the charge pump and filter network to control
the voltage-controlled oscillator (VCO). The output frequency
of the VCO is divided down and used as the main
processor clock (PCLK) after passing through four levels of
clock buffering in an H-tree clock distribution network. The
processor clock is passed through a delay-matching receiver
before entering the feedback divider, completing the feedback path.
Since at equilibrium the inputs of the PFD will be matched
in frequency (and phase), the processor-to-bus frequency ratio
is set by the ratio of the divider values,
allowing integer or noninteger frequency synthesis by
changing divider ratios. Since the technique does not require
clock choppers [2], the duty cycle and phase alignment are
relatively insensitive to environment and process tolerances.
The output of the VCO is also connected to a second frequency divider,
which is used for the L2 cache clock (L2CLK), so the processor-to-L2
clock-frequency ratio is also adjustable to integer or noninteger
ratios. Other phase-synchronous clocks may be designed in
similar fashion, and quadrature or interstitial clocks may be
created by a polarity change at the divider input. Using the
structure of Fig. 1, the VCO frequency is equal to
the VCO output divider ratio times the processor clock frequency.
When this divider ratio is even, the processor clock edges are generated from only
one VCO clock edge; hence a nearly ideal 50% processor
clock duty cycle may be achieved through its independence
from the VCO duty cycle.
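The divider relationships described above can be sketched numerically. This is a minimal illustration rather than the authors' design equations: the parameter names `ref_div`, `fb_div`, and `out_div` and the example frequencies are hypothetical, and the only assumption is that the two PFD inputs have equal frequency at lock.

```python
from fractions import Fraction


def synthesize(f_bus_hz, ref_div, fb_div, out_div):
    """Return (processor clock, VCO frequency) for a locked divider-based PLL.

    At lock the PFD inputs match: f_bus / ref_div == f_pclk / fb_div.
    The VCO runs out_div times faster than the processor clock.
    """
    f_pclk = Fraction(f_bus_hz) * Fraction(fb_div, ref_div)
    f_vco = f_pclk * out_div
    return f_pclk, f_vco


def duty_independent_of_vco(out_div):
    """With an even output divider, both PCLK edges derive from the same VCO edge."""
    return out_div % 2 == 0


# Hypothetical example: 100-MHz bus clock, reference divided by two,
# feedback divider of 7, VCO output divider of 2.
f_pclk, f_vco = synthesize(100_000_000, ref_div=2, fb_div=7, out_div=2)
print(float(f_pclk) / 1e6, "MHz processor clock")  # 3.5x the bus clock
print(float(f_vco) / 1e6, "MHz VCO")
```

Using exact fractions keeps noninteger ratios such as 7/2 free of rounding error, which makes it easy to tabulate the ratios that a given set of divider values can reach.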
III. PROCESS TECHNOLOGY
The microprocessor and integral clock generator PLL are
fabricated in a five-layer CMOS process with 0.4-µm feature
TABLE I
CMOS6S PROCESS SUMMARY
Fig. 5. (a) Process compensation circuit. (b) Temperature compensation
circuit.
G. Common-Mode Control
It is possible for common-mode voltages to develop in
the filter from leakage, drift, or device mismatch. Since the
common-mode voltage can introduce frequency offsets in the
VCO or even inhibit operation in extreme cases, the circuit
shown in Fig. 8 was used in conjunction with the filter clamps
described earlier. The common-mode voltage of the filter
is sensed by generating currents proportional to the two filter
node voltages and summing them across a load device to produce
a sense voltage. A differential amplifier compares this sense
voltage to a reference voltage and generates a current that is
proportional to the common-mode voltage. This current is mirrored by two
identical current sources, which bleed current from both filter
capacitors simultaneously without affecting the differential
voltage between them. The maximum drain currents for this
structure, which correspond to the case when both clamps
have activated, are approximately 16 µA. For typical cases
where the common-mode voltage is below 600 mV, the bleed
currents are 1 µA. Stability of the network is assured by
heavy dominant-pole compensation.
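The claim that identical bleed currents lower the common-mode level without disturbing the differential voltage can be illustrated with a short sketch. It assumes two equal, ideal filter capacitors; the node voltages, capacitance, and time step are hypothetical values chosen only for the example.

```python
def bleed_common_mode(v_p, v_n, i_bleed, c_filter, dt):
    """Discharge both filter capacitors with identical currents for time dt.

    Assumes equal, ideal capacitors, so each node drops by the same amount.
    """
    dv = i_bleed * dt / c_filter  # same voltage drop on each node
    return v_p - dv, v_n - dv


v_p, v_n = 0.90, 0.70  # hypothetical filter node voltages (V)
new_p, new_n = bleed_common_mode(v_p, v_n, i_bleed=1e-6, c_filter=100e-12, dt=1e-6)

print("differential before/after:", v_p - v_n, new_p - new_n)
print("common mode before/after:", (v_p + v_n) / 2, (new_p + new_n) / 2)
```

Any mismatch between the two bleed currents or capacitors would convert part of the correction into a differential error, which is why the text specifies identical current sources.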
H. Voltage-Controlled Oscillator
The VCO design is based upon a delay-interpolating ring
oscillator structure [9]–[11], as shown in Fig. 9. In contrast
to the current-starved and current-modulated VCOs, which
are very commonly used for microprocessor clock generators,
delay-interpolating VCOs have relatively low-to-moderate
VCO gains and are well suited to fully differential control
and signal path circuit implementations. The lower VCO
gain of the delay-interpolating VCOs produces significantly
less jitter due to coupled noise than higher gain structures.
The limited operating frequency range for delay-interpolating
VCOs, which must be less than 2 : 1 to ensure monotonicity,
may be effectively augmented by selecting suitable divider
ratios or by adding programmability to the VCO signal paths.
Fig. 11. Cycle-to-cycle processor clock jitter: (a) quiet processor and (b) active processor.
The frequency limits of the VCO are determined by the
longest and shortest path delays through the structure. Fig. 9
shows an example in which the high-frequency limit corresponds to a period composed
of three delay units and one mixer unit, and the low-frequency
limit corresponds to a period composed of six delay units and one mixer
unit. These frequency limits also affect the VCO gain (for
a given mixer design) as well as the center frequency. The
frequency limits may be independently controlled using the
multiplexers shown in Fig. 9, allowing flexible control of the
VCO operating range and greater than ten-to-one adjustment
range for VCO gain.
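The path-delay limits described above can be checked with a small sketch. The unit delays below are hypothetical, and the linear blend between the fast and slow paths is a first-order assumption about the mixer, not a model taken from the paper. Any constant factor relating path delay to oscillation period cancels in the ratio.

```python
def period_limits(t_delay, t_mixer, fast_units=3, slow_units=6):
    """Shortest and longest path delays through the interpolating ring."""
    t_fast = fast_units * t_delay + t_mixer
    t_slow = slow_units * t_delay + t_mixer
    return t_fast, t_slow


def interpolated_delay(t_fast, t_slow, weight):
    """First-order model (an assumption): linear blend, weight=1 selects the fast path."""
    return weight * t_fast + (1.0 - weight) * t_slow


# Hypothetical unit delays: 100 ps per delay element, 150 ps for the mixer.
t_fast, t_slow = period_limits(t_delay=100e-12, t_mixer=150e-12)
ratio = t_slow / t_fast
print(f"tuning range {ratio:.2f} : 1")
print(f"mid-setting delay {interpolated_delay(t_fast, t_slow, 0.5) * 1e12:.0f} ps")
```

Because the mixer delay appears in both paths, the ratio (6·t_delay + t_mixer)/(3·t_delay + t_mixer) stays below 2 for any positive mixer delay, consistent with the less-than-2:1 range stated in the text.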
The delay elements and mixer designs are based upon
PMOS source-coupled pair differential amplifiers with NMOS
load networks [Fig. 10(a) and (b)], which allow voltage-controlled swing adjustment through effective load-line
translation by adjusting a control voltage. The high impedance
provided by the current source improves the supply noise
rejection for the source-coupled pair, and the N-well improves
the isolation from the p bulk substrate noise. The variation of
the threshold voltage due to the bulk effect is eliminated using
bulk-to-source biasing throughout the structure. Sensitivity of
the VCO to low-repetition-rate, 100-mV steps on VDD and
AVDD is 0.418 ps/mV. Center-frequency common-mode voltage sensitivity is 3.5% over the full input range dictated by
VI. CONCLUSION
REFERENCES
[1] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3–880 MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[2] J. Alvarez, H. Sanchez, G. Gerosa, and R. Countryman, "A wide-bandwidth low-voltage PLL for PowerPC microprocessors," IEEE J. Solid-State Circuits, vol. 30, pp. 383–391, Apr. 1995.
[3] J. Cho, "Digitally-controlled PLL with pulse width detection mechanism for error correction," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 334–335.
[4] I. Young, J. Greason, and K. Wong, "A PLL clock generator with 5–110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[5] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[6] P. E. Gronowski, P. Bannon, M. Bertone, R. Blake-Campos, G. Bouchard, W. Bowhill, D. Carlson, R. Castelino, D. Donchin, R. Fromm, M. Gowan, A. Jain, B. Loughlin, S. Mehta, J. Meyer, R. Mueller, A. Olesin, T. Pham, R. Preston, and P. Rubinfeld, "A 433 MHz 64b quad-issue RISC microprocessor," in ISSCC Dig. Tech. Papers and Slide Supplement, Feb. 1996, pp. 222–223.
[7] Z. Zhang, H. Du, and M. Lee, "A 360 MHz 3V CMOS PLL with 1 V peak-to-peak power supply noise tolerance," in ISSCC Dig. Tech. Papers, Feb. 1996, pp. 134–135.
[8] D. H. Wolaver, Phase-Locked Loop Circuit Design. Englewood Cliffs, NJ: Prentice-Hall, 1991, pp. 59–61.
[9] J. F. Ewen, A. Widmer, M. Soyuer, K. Wrenner, B. Parker, and H. Ainspan, "Single-chip 1062 Mbaud CMOS transceiver for serial data communication," in ISSCC Dig. Tech. Papers, Feb. 1995, pp. 32–33.
[10] B. Lai and R. Walker, "A monolithic 622 Mb/s clock extraction and data retiming circuit," in ISSCC Dig. Tech. Papers, Feb. 1991, pp. 144–145.
[11] S. K. Enam and A. Abidi, "NMOS ICs for clock and data regeneration in gigabit-per-second optical fiber receivers," IEEE J. Solid-State Circuits, vol. 27, pp. 1763–1774, Dec. 1992.
[12] D. W. Boerstler and K. Jenkins, "A phase-locked loop clock generator for a 1 GHz microprocessor," in Symp. VLSI Circuits Dig. Tech. Papers, June 1998, pp. 212–213.
[13] J. Silberman, N. Aoki, D. Boerstler, J. Burns, S. Dhong, A. Essbaum, U. Ghoshal, D. Heidel, P. Hofstee, K. Lee, D. Meltzer, H. Ngo, K. Nowka, S. Posluszny, O. Takahashi, I. Vo, and B. Zoric, "A 1.0 GHz single-issue 64b PowerPC integer processor," in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 230–231.
[14] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO," in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 396–397.
Abstract: This paper describes a delay-locked loop (DLL) circuit having two advancements, a dual-loop operation for a wide
lock range and programmable replica delays using antifuse circuitry and an internal voltage generator for post-package skew calibration. The dual-loop operation uses information from the initial
time difference between the reference clock and the internal clock to select
one of the differential internal loops. This extends the lock range
of the DLL to lower frequencies. In addition, incorporation of
the programmable replica delay using antifuse circuitry and the
internal voltage generator allows for the elimination of skews between the external clock and the internal clock that arise from on-chip
and off-chip variations after the package process. The proposed
DLL, fabricated on a 0.16-µm DRAM process, operates over the
wide range of 42–400 MHz with a 2.3-V power supply. The measured
results show 43-ps peak-to-peak jitter and 4.71-ps rms jitter while consuming 52 mW at 400 MHz.
Index Terms: Delay-locked loop, dual-loop operation,
high-speed DRAM, programmable replica delay, skew calibration.
I. INTRODUCTION
THE DELAY-LOCKED loop (DLL) has become an indispensable component in high-speed synchronous DRAMs
such as DDR SDRAM. Since the DLL determines the operation range of the DRAM and has a large effect on the data valid
window, a high-performance DLL that has a wider range and
lower jitter is essential for increasing the speed of DRAM. A
DLL can be categorized into either of two types, the digital
and the analog type. Although the digital DLL has robustness,
process portability, and design simplicity, it is difficult to use on
a very high-bandwidth DRAM (over 600 Mb/s) due to poor jitter
performance [1], [2]. Therefore, in spite of sensitivity on process
variation, the analog DLL, which ensures lower jitter by the continuous characteristics of analog operation, is more suitable in
the higher speed DRAM. In addition to the jitter performance,
another important issue of the DLL is the lock range. Process
variation makes the lock range of the analog DLL more limited
and results in a narrower operation range of the DRAM. The
limited range of the DLL limits the flexibility of implementation
on memory applications and increases test costs in mass production. For solving the limited lock-range problems, various types
Manuscript received October 2, 2001; revised January 29, 2002.
S. J. Kim, S. H. Hong, J. H. Cho, P. S. Lee, J. H. Ahn, and J. Y. Chung
are with the Advanced Design Team, Memory Research and Development,
Hynix Semiconductor Inc., Ichon-si, Kyoungki-Do 467-701, Korea (e-mail:
sejun.kim@hynix.com).
J.-K. Wee is with the Department of Electronics Engineering, Hallym University, Chunchun-si, Kangwon-Do 200-702, Korea.
Publisher Item Identifier S 0018-9200(02)04934-X.
Fig. 1. (a) Block diagram and delay characteristic of conventional DLL. (b), (c) Cases of lock failure at the initial control voltage.
Fig. 2. (a) Concept of proposed DLL. (b), (c) Two cases of loop selection according to
the initial time difference.
Fig. 4. (a) Case where the clock buffer produces an incorrect initial time
difference without reset controller. (b) Schematic and (c) timing diagram of the
reset controller.
Fig. 5. (a) Delay cell of the replica bias circuit. (b) Cross-sectional view of
bias line in the proposed DLL.
shown in Fig. 4(a). Fig. 4(b) shows the schematic of the reset
controller. When IRESET is asserted, the clock buffer produces
the three clocks (ICLK, ICLKB, and REFCLK). ICLK and
ICLKB, which are the differential clocks, are input to VCDL
and REFCLK is input to the phase detectors as the reference
clock. Fig. 5(a) shows the delay cell and the biasing scheme
used in the proposed DLL. To reduce the supply voltage sensitivity, the VCDL is implemented as a series of differential
delay cells with symmetric loads, as shown in Fig. 5(a) [9]. VBP
is a control voltage from the loop filter and VBN is generated
by the replica bias circuit [boxed area in Fig. 5(a)]. The replica
bias circuit keeps the swing constant, independent of VBP,
which provides better jitter performance and a wider operation
range [10]. To shield the analog biases from external
noise, VBP and VBN are physically enclosed with inter- and
intra-layers, as shown in Fig. 5(b). This shielding technique
improves the jitter characteristic. DCLK and DCLKB in Fig. 3,
which are converted from the small-swing output of the VCDL to a
full-swing output by amplifiers, each form negative feedback
loops and are also converted to LCLK and LCLKB by the replica
delays. LCLK and LCLKB are input to the phase detectors
off-chip such as output load, clock slew-rate, and so on, may result in unavoidable skew. There are two methods for skew elimination: wafer-level trimming by laser [12] and post-package tuning by
antifuse [13]. Wafer-level tuning is not effective because the
wafer tester is not precise and off-chip conditions cannot be
taken into account. Although post-package tuning by antifuse is more
practical, the previous post-package method has some problems.
It applies an external high voltage
through pins to rupture the antifuse. Providing a sufficiently high
voltage for rupturing the antifuse can cause physical damage to
other circuits connected to the pin and can negatively affect the
reliability of the device. To remove the high-voltage problem,
an antifuse programming scheme using an internal negative voltage
is used [14]. Fig. 9 shows the programmable replica delay circuit. The circuitry has three functional parts, the replica delay
of clock, the replica delay of the output buffer, and the tun-
Fig. 10. Antifuse circuit for skew calibration and SEM photograph of the
antifuse.
Fig. 13. Synchronized waveforms at (a) 42 MHz and (b) 400 MHz.
Fig. 14. Measured jitter characteristics at 400 MHz in (a) a quiet supply and (b) with injected 1-MHz
Fig. 15. (a) Measured skew at 400 MHz before skew calibration and after skew calibration. (b) Range (full range not displayed for limitation of tester) and
resolution of skew calibration.
TABLE I
PERFORMANCE CHARACTERISTICS OF THE PROPOSED DLL
clock and the internal clock of the DLL. Also, an improved skew
calibration method demonstrated a practical post-package skew
calibration using the antifuse circuitry and the internal negative
voltage generator. The proposed DLL, fabricated on a 0.16-µm
DRAM process, achieves a wide range from 42 to 400 MHz,
with 43-ps peak-to-peak jitter and 4.71-ps rms jitter at 400 MHz,
which makes it applicable to high-speed DRAMs.
ACKNOWLEDGMENT
The authors are grateful to H. Ryu and Dr. Y. Kim for helpful
discussion about COB-type PCB design.
REFERENCES
[10] I. A. Young et al., "A PLL clock generator with 5 to 110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. 27, pp. 1599–1607, Nov. 1992.
[11] F. Herzel et al., "A study of oscillator jitter due to supply and substrate noise," IEEE Trans. Circuits Syst. II, vol. 46, pp. 56–62, Jan. 1999.
[12] T. Hamamoto et al., "A skew and jitter suppress DLL architecture for high-frequency DDR SDRAMs," in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 76–77.
[13] S. Kuge et al., "A 0.18-µm 256-Mb DDR-SDRAM with low-cost post-mold-tuning method for DLL replica," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2000, pp. 402–403.
[14] K. S. Min et al., "A post-package bit-repair scheme using static latches with bipolar-voltage programmable antifuse circuit for high-density DRAMs," in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 67–68.
architectures.
I. INTRODUCTION
equation is required and used in the proposed AAPLL. The update equation is given by (2), in terms of the loop gain,
which is proportional to the loop bandwidth.
(2)
of the capacitor exceeds the discharging rate. Hence the
capacitor voltage and the pumping
current in the charge pump increase. As a result, the phase detector gain increases and so does the loop bandwidth. On the
other hand, when the phase error signal is off for the most part
conventional differential VCOs with a current bias, this VCO allows the AAPLL to operate under a single 1.5-V power supply,
consuming 1.5 mA. Because the output signal of the VCO swings
rail-to-rail, no additional level shifter with a carefully designed
replica bias circuit is required to generate CMOS level outputs.
The latch, configured with pMOS devices, sharpens the
edges of the output signal so that added noise has little chance
to be converted into jitter. This latch therefore helps reduce
the VCO jitter [9], [10].
C. Duty-Cycle Corrector
Maintaining a 50% duty-cycle ratio for a clock signal is
extremely important in most high-speed clock recovery and
clock generation applications because several systems, such
as double-data-rate (DDR) SDRAMs and pipelined microprocessors, use the negative transition edges of a clock signal
in order to increase total system throughput. This is often
achieved by running a VCO at twice the desired clock
frequency and then dividing the VCO frequency by 2. Other
approaches use a feedback-type duty-cycle corrector. Since
precise placement of the falling edge between two successive
rising edges of the VCO output signal is generally controlled by
an additional feedback loop, such duty-cycle correctors require
an extra training period to stabilize the feedback loop.
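The divide-by-2 approach mentioned above can be illustrated with a short sketch. It models an ideal toggle flip-flop clocked on rising edges only and ignores edge jitter and flip-flop delay mismatch; the 2500-ps VCO period and 35% VCO duty cycle are hypothetical values.

```python
def vco_edges(period_ps, duty, cycles):
    """Rising and falling edge times (ps) of a VCO with an arbitrary duty cycle."""
    rising = [n * period_ps for n in range(cycles)]
    falling = [t + duty * period_ps for t in rising]
    return rising, falling


def divide_by_two(rising):
    """Ideal toggle flip-flop clocked on rising edges only; falling edges are ignored."""
    return rising[0::2], rising[1::2]  # output rising edges, output falling edges


def duty_cycle(rising, falling):
    """High time divided by period, measured on the first full cycle."""
    period = rising[1] - rising[0]
    return (falling[0] - rising[0]) / period


# Hypothetical VCO: 2500-ps period with a 35% duty cycle.
vco_rise, _vco_fall = vco_edges(period_ps=2500, duty=0.35, cycles=8)
out_rise, out_fall = divide_by_two(vco_rise)
print(duty_cycle(out_rise, out_fall))
```

Because only the VCO rising edges reach the divider output, the VCO's own duty cycle drops out, at the cost of running the oscillator at twice the output frequency.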
The AAPLL uses a feedforward-type duty-cycle corrector
instead of the feedback type in order to eliminate the extra feedback hardware and the training period, as shown in Fig. 9(a).
The duty-cycle corrector utilizes multiphase signals generated
from a multistage differential VCO. The signal in Fig. 9(b)
selected from the multiphase signals turns on MOS
Fig. 9. Feedforward-type duty-cycle corrector. (a) Duty-cycle corrector
schematic. (b) Conceptual diagram of the correcting operation.
A. Stability
(5)
Here, the parameters in (5) include the VCO gain, the nMOS threshold voltage,
and the size of a resistor of the loop filter in Fig. 3. The stability limit for the bandwidth voltage,
which is one of the observable values in the measurement setup,
is visualized in Fig. 11 for various capacitor and resistor
values. The figure shows that all the sequences
are within the stable region for the various resistor and
capacitor values.
B. Output Jitter
Recently, it was reported that a CP-PLL has an optimum loop
bandwidth that generates minimum jitter in the steady state [11].
A clean tone, which is assumed to have only a noise floor and no
random-walk phase noise, is used as the reference signal for the
jitter derivation. Because the AAPLL eventually achieves
steady-state locking with a clean reference signal like other conventional PLLs, the output cycle-to-cycle jitter of the AAPLL
can be calculated by
(6)
Here, the four quantities in (6) are the internal jitter
from the VCO, the jitter of the input signal, the rms value of
the charge-pump current variation, and the rms value of the VCO
control voltage noise in the steady state, respectively.
Fig. 13. Simulation results for the locking behavior of the AAPLL and
conventional ones. (a) Fixed narrow-bandwidth PLL. (b) Fixed wide-bandwidth
PLL.
C. Behavioral Simulation of a Locking Feature
Closed-form analysis of locking behaviors for the AAPLL is
difficult because of its nonlinear operation. In this paper, a simulation-based approach like the Monte Carlo Method is used instead. The AAPLL is modeled in a SPICE circuit simulator and
extensively tested by the circuit simulator. Fig. 12 shows the simulation setup for the AAPLL. Fig. 13 compares the simulated
locking behavior of the AAPLL with that of a conventional PLL.
The bandwidths of the conventional PLL are selected to have two
typical values. One is optimized for the initial locking, and the
other for the steady-state tracking. The gray line in Fig. 13(a) indicates an incoming signal in the phase domain. The solid line
in the figures shows the phase of the AAPLL output signal from
initial locking to steady-state tracking. The phase variation of
the conventional PLL optimized for steady-state tracking with a
narrow bandwidth is shown in the same figure as a dashed line.
Fig. 18. Experimental results of the locking for a 180–220-MHz input signal
by four steps.
In Fig. 13(b), the phase change of the conventional PLL optimized for initial locking is also shown as a dashed line. This simulation result gives several characteristics of the AAPLL. The
AAPLL controlled by the recursive algorithm achieves fast lock
in the initial locking period, comparable to the speed obtained
from a wide-bandwidth PLL because the consecutive error signals rapidly increase the loop bandwidth of the AAPLL. In the
steady-state tracking mode, the AAPLL substantially rejects the
input jitter due to the narrower loop bandwidth.
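The qualitative behavior described above can be reproduced with a very small behavioral model. The sketch below is not the SPICE model used in the paper: it assumes a first-order, discrete-time phase-domain loop, and the gain values, forgetting factor, and proportional coefficient are hypothetical numbers chosen only to show the trend.

```python
def fixed_gain(k):
    """Conventional loop: the gain (bandwidth) never changes."""
    return lambda abs_err, state: (k, state)


def adaptive_gain(k_min=0.02, k_max=0.3, forget=0.9, prop=0.05):
    """Gain grows with recent phase error and decays once the error is small."""
    def gain(abs_err, state):
        state = forget * state + prop * abs_err  # recursive bandwidth update
        return min(k_max, k_min + state), state
    return gain


def cycles_to_lock(gain_fn, step=1.0, tol=0.01, max_cycles=5000):
    """Cycles for a first-order phase-domain loop to settle within tol of a phase step."""
    phase_out, state = 0.0, 0.0
    for n in range(1, max_cycles + 1):
        err = step - phase_out
        k, state = gain_fn(abs(err), state)
        phase_out += k * err
        if abs(step - phase_out) < tol:
            return n
    return max_cycles


narrow = cycles_to_lock(fixed_gain(0.02))  # optimized for steady-state tracking
wide = cycles_to_lock(fixed_gain(0.3))     # optimized for initial locking
adaptive = cycles_to_lock(adaptive_gain())
print("cycles to lock (narrow, wide, adaptive):", narrow, wide, adaptive)
```

In this toy model the adaptive loop locks far sooner than the fixed narrow-bandwidth loop, and its gain returns to the narrow value once the error has died away, which mirrors the trade-off the simulation results in Fig. 13 are said to show.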
V. EXPERIMENTAL RESULTS
The AAPLL is fabricated in a 0.6-µm single-poly triple-metal
n-well CMOS process [12]. The die size for the AAPLL is
0.11 mm². The total power consumption is less than 15 mW
with a single 3-V supply. A microphotograph of the AAPLL
Fig. 20. 50% duty-cycle correction operation over the entire frequency range.
Fig. 21. VCO linearity.
TABLE I
AAPLL CHARACTERISTICS SUMMARY
using analog technique and hence the required die size and the
power consumption are minimal.
APPENDIX
As shown in Fig. 2, the OR gate gives the control signal for
the switch according to the phase error signal in the bandwidth controller. When the phase error of the signal is high, the
controller signal from the OR gate feeds current to the bandwidth capacitor,
and the voltage across the capacitor increases at a constant rate.
As a result, the bandwidth voltage increases in proportion to the
normalized phase error. After the charging process, the controller signal
from the OR gate disconnects the path from the current source
and connects to the resistor, so the capacitor discharges
through the resistor. The switching action occurs every
clock cycle period.
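The charge and discharge sequence described above can be sketched as a per-cycle update of the bandwidth capacitor voltage. This is a first-order model under stated assumptions, not the paper's closed-form expression: charging is taken to last for the fraction of the clock period equal to the normalized phase error, followed by an ideal RC discharge for the rest of the period. All component values are hypothetical.

```python
import math


def update_bandwidth_voltage(v, phase_error, t_clk, i_charge, c_bw, r_bw):
    """One clock cycle: constant-current charge while the error is high, then RC discharge."""
    e = min(max(phase_error, 0.0), 1.0)  # normalized phase error, clipped to [0, 1]
    v_charged = v + (i_charge / c_bw) * e * t_clk
    return v_charged * math.exp(-(1.0 - e) * t_clk / (r_bw * c_bw))


def simulate(errors, v0=0.0, t_clk=5e-9, i_charge=10e-6, c_bw=1e-12, r_bw=50e3):
    """Bandwidth-voltage trace for a sequence of per-cycle normalized phase errors."""
    v, trace = v0, []
    for e in errors:
        v = update_bandwidth_voltage(v, e, t_clk, i_charge, c_bw, r_bw)
        trace.append(v)
    return trace


acquisition = simulate([0.5] * 200)                   # large, persistent phase error
tracking = simulate([0.0] * 50, v0=acquisition[-1])   # error removed after lock
print(f"{acquisition[-1]:.3f} V during acquisition, {tracking[-1]:.4f} V after lock")
```

In this model the exponential term plays the role of a forgetting factor and the charging term that of a proportional coefficient, so the bandwidth voltage rises while the phase error persists and decays once the loop is locked.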
The voltage of the bandwidth capacitor at a given time
can be written as
(7)
Joonsuk Lee (S'99) received the B.S. and M.S. degrees in electrical engineering and computer sciences
from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1995 and 1997,
respectively. Since 1997 he has been working toward
the Ph.D. degree at the same university.
From 1999 to 2000, he was with IBM Microelectronics, Boston, MA, as an Analog and Mixed Signal
Designer involved in a high-performance sigma-delta
ADC/DAC project with Motorola, Lowell, MA. His
research interests include PLL/DLL, timing recovery
algorithms, high-speed SDRAM interface, and LAN and mixed-mode signal
processing technique for telecommunication ICs.
Mr. Lee is the Gold Medal winner of the Human-Tech Thesis Prize from Samsung Electronics Co. Ltd. in 1997, the Gold Medal winner of the Chip Design
Contest from LG Semicon Co. Ltd. in 1998, and the Gold Medal winner of the
Integrated Design Center (IDEC) Award in 1998.
(8)
,
. In the initial locking mode,
the AAPLL does the locking operation based on (8). Once
the AAPLL finished the phase and frequency locking, the
. In this case, the forgetting
phase error is far less than
factor and the proportional coefficient can be replaced by
and
.
Here
REFERENCES
[1] J. Dunning et al., "An all-digital phase-locked loop with 50-cycle lock time suitable for high-performance microprocessors," IEEE J. Solid-State Circuits, vol. 30, pp. 412–422, Apr. 1995.
[2] B. Kim, D. N. Helman, and P. R. Gray, "A 30-MHz hybrid analog/digital clock recovery circuit in 2-µm CMOS," IEEE J. Solid-State Circuits, vol. 25, pp. 1385–1394, Dec. 1990.
[3] M. Mizuno et al., "A 0.18-µm CMOS hot-standby phase-locked loop using a noise immune adaptive-gain voltage-controlled oscillator," ISSCC Dig. Tech. Papers, pp. 268–269, Feb. 1995.
[4] G. Roh, Y. Lee, and B. Kim, "An optimum phase-acquisition technique for charge-pump phase-locked loops," IEEE Trans. Circuits Syst. II, vol. 44, pp. 729–740, Sept. 1997.
[5] B. Chun, Y. Lee, and B. Kim, "Design of variable loop gain of dual-loop DPLL," IEEE Trans. Commun., vol. 45, pp. 1520–1522, Dec. 1997.
[6] P. F. Driessen, "DPLL bit synchronizer with rapid acquisition using adaptive Kalman filtering techniques," IEEE Trans. Commun., vol. 42, pp. 2673–2675, Sept. 1994.
[7] B. Kim, "Dual-loop DPLL gear-shifting algorithm for fast synchronization," IEEE Trans. Circuits Syst. II, vol. 44, pp. 577–586, July 1997.
I. INTRODUCTION
Fig. 3. Complementary delay line with inverter delay elements for improved phase resolution.
B. Delay-Line Improvements
To overcome some of these limitations, we developed a
complementary delay line as shown in Fig. 3 for our DLL.
In this architecture, two parallel delay lines with weak cross
coupling are driven by complementary input signals ClkIn and
Fig. 5. Phasor diagram with phasors of signals from the taps of a complementary delay line with one inverter delay = 50°.
indicating the first 180° of delay in the delay lines. The first
state transition in the EOC code indicates the first true tap
from the delay line that provides a signal with phase that
lags the phase of the signal from Tap 1 by more than 180°.
With this information, the DLL logic knows when to switch
between the true and complement taps of the delay line to
ensure full coverage of all 360° of phase space, with phase
steps of at most one inverter delay. Use of the EOC code also
prevents negative phase steps in the phase-transfer function as
taps are successively selected from the delay line. This allows
the complementary delay lines to provide infinite, monotonic
phase range for the DLL. The clocking signal for the EOC
detector, SampClk, is synchronized to the signal from Tap 1
by a replica timing network (not shown).
To illustrate the principle of infinite phase generation using
the EOC code with this delay-line scheme, refer to Fig. 5,
which shows a phasor diagram of the signals from the first
five true and complement taps of a complementary delay line
like the one shown in Fig. 3. The figure assumes that the
PVT conditions and operating frequency are such that the
propagation delay of each inverter stage is equal to 50° of
phase. In the figure, the solid lines correspond to signals from
the true taps, while dashed lines correspond to signals from
the complement taps. Because Tap 5 delivers a signal that is
delayed by 200° from the signal at Tap 1, the EOC detector's
thermometer code would indicate that Tap 5 is the first true
tap to provide a signal with phase beyond 180° relative to the
signal from Tap 1. With this information, the DLL knows to
switch between the true and complement taps after four stages.
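The tap-selection arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, each stage is assumed to add exactly the same phase, and a complement tap is assumed to be the true tap shifted by exactly 180°. It is not the DLL's actual logic.

```python
def tap_phases(stage_deg):
    """Return (first true tap beyond 180 deg, list of selectable phases in deg).

    Assumptions (not from the paper): ideal, identical stage delays, and
    complement taps that are exactly 180 deg away from the true taps.
    """
    # Count the stages needed before a true tap lags Tap 1 by more than 180 deg.
    stages = 0
    while stages * stage_deg <= 180.0:
        stages += 1
    first_tap_beyond_180 = stages + 1  # taps are numbered from 1
    # Use the true taps up to that point, then the matching complement taps.
    true_taps = [k * stage_deg for k in range(stages)]
    comp_taps = [180.0 + p for p in true_taps]
    return first_tap_beyond_180, true_taps + comp_taps


def max_phase_step(phases):
    """Largest step between adjacent phases, including the wrap at 360 deg."""
    steps = [b - a for a, b in zip(phases, phases[1:])]
    steps.append(360.0 - phases[-1] + phases[0])
    return max(steps)


first_tap, phases = tap_phases(50.0)
print(first_tap, phases, max_phase_step(phases))
```

With a 50° stage delay the sketch identifies Tap 5 and switches to the complement taps after four stages, so no step around the full 360° exceeds one inverter delay.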
Although the delay-line improvements discussed above reduced the required power and area of the delay line, improved
its jitter accumulation performance, enabled infinite phase
range, and improved the available phase resolution by a factor
of two, this phase resolution was still not good enough to
meet the requirements of our memory system. In the 0.4-µm
process we used, the propagation delay of one inverter over all
anticipated PVT conditions varied from 100 to 300 ps. This
is much larger than the worst case phase step specification of
50 ps. Therefore, to ensure compliance with this specification,
the DLL's phase resolution needed to be improved by at least
six times over what the delay line provided.
To solve this problem, we used inverter phase blending. A simple, single-stage phase-blender circuit is shown
in Fig. 6(a). This circuit receives two phase-adjacent input
signals, which are separated in phase by one
inverter delay. The phase blender directly passes these two
signals with a simple delay to produce two output signals.
However, it also uses a pair of phase-blending inverters to
interpolate between these two input signals to produce a third
output signal, having a phase between that of the other two.
This effectively doubles the available phase resolution.
However, it is not sufficient to use equal-sized inverters
for the phase blending. Fig. 6(b) illustrates a simple model
[12] used for determining the ideal relative sizes of the two
phase-blending inverters to ensure that the phase of the blended output lies
directly between that of the two passed outputs. The model approximates
the two inverters with two simple switched current sources
sharing a common resistance-capacitance (RC) load. For two
rising edge input signals separated in time, the model
yields the equation
(2)
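The effect of inverter sizing in this kind of model can be explored numerically. The sketch below is a hypothetical reading of the description, not the paper's equation (2): two step current sources with relative strengths w and 1 - w discharge a shared RC load, the voltage is normalized to fall from 1 to 0, the switching threshold is taken as 0.5, and the input edge spacing is set equal to RC. All names and parameter values are assumptions chosen for illustration.

```python
import math


def blended_output(t, w, dt, rc=1.0):
    """Normalized output voltage of the shared RC load (1 at start, 0 at end).

    Source A (strength w) switches on at t = 0; source B (strength 1 - w)
    switches on at t = dt. Each contributes a first-order RC settling term.
    """
    v = 1.0
    if t > 0.0:
        v -= w * (1.0 - math.exp(-t / rc))
    if t > dt:
        v -= (1.0 - w) * (1.0 - math.exp(-(t - dt) / rc))
    return v


def crossing_time(w, dt, rc=1.0, level=0.5):
    """Time at which the output falls through `level`, found by bisection."""
    lo, hi = 0.0, dt + 50.0 * rc
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if blended_output(mid, w, dt, rc) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def blend_fraction(w, dt, rc=1.0):
    """Position of the blended edge between the early (0.0) and late (1.0) edges."""
    t_early = crossing_time(1.0, dt, rc)
    t_late = crossing_time(0.0, dt, rc)
    return (crossing_time(w, dt, rc) - t_early) / (t_late - t_early)


for w in (0.50, 0.60):
    print(f"w = {w:.2f}: blended edge at {blend_fraction(w, dt=1.0):.3f} of the interval")
```

Under these assumptions, equal weighting (w = 0.50) places the blended edge noticeably later than the midpoint, while giving the earlier-switching inverter more weight (w = 0.60) moves it closer to the midpoint. This is in line with the text's statement that equal-sized inverters are not sufficient.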
Fig. 6. Phase blending for phase-resolution improvement. (a) Single-stage phase-blender circuit, (b) simple model of phase-blending inverters, (c) plot of signal voltages in the simple model for w = W_A/(W_A + W_B) = 0.50, (d) phase-blender output signal edges for w = 0.50, and (e) phase-blender output signal edges for w = 0.60.
Fig. 9. (a) A 16 : 1 duty-cycle correcting multiplexer circuit. (b) Duty-cycle correction control circuit.
Fig. 10.
Fig. 11. Test-chip micrograph showing on the left side (a) the analog DLL
of [6] and on the right side (b) the new digital DLL integrated into identical
interface cells.
Fig. 12. Measured transmit eye diagrams at 3.3 V and 400 MHz of the high-speed interface cells with (a) the analog DLL of [6] and (b) the new digital DLL.
V. MEASURED PERFORMANCE
A. Test Chip
Both the digital DLL presented here and an implementation
of the analog DLL of Donnelly et al. [6] were integrated into
identical high-speed CMOS interface cells on opposite sides
of a single test chip. A micrograph of this test chip is shown in
Fig. 11. The test chip I/O was laid out symmetrically so that
either interface cell could be tested on the same hardware by
simply removing the test chip from the test socket, rotating
it 180°, and reinserting it into the socket. This allowed a
true side-by-side comparison of the two DLLs operating in a
system. The test-chip circuits were fabricated using a standard
0.4-µm, 3.3-V CMOS process with 0.65-V threshold voltages.
B. Test Results
Unless indicated otherwise, all test results described in this
section were measured with the analog and digital DLLs
operating in their respective high-speed interface cells at 3.3 V
and 400 MHz (800 Mb/s/pin) using the same test vectors.
Additionally, the test chip included noise-generator circuits,
which produced digital switching noise during the testing of
both interfaces.
Fig. 12(a) and (b) shows eye diagrams of the two interfaces
with the analog and digital DLLs, respectively. The diagrams
indicate the output timing performance of the interface cells
in the test system. Although the interface with the analog
DLL provided slightly better timing performance, 320 ps peak-to-peak
versus 380 ps peak-to-peak for the interface with the digital DLL, the
performances of both interfaces (and therefore, both DLLs)
were comparable. This is surprisingly good considering the
extensive use of poor PSRR elements, such as inverters, in
the signal path of the digital DLL. (Note: I/O circuit duty-cycle distortion produced the unequal eyes in both diagrams.
This is unrelated to the DLLs.)
Fig. 13(a) and (b) shows receive shmoo diagrams for the
two interfaces with the analog and digital DLLs, respectively.
The diagrams indicate the CMOS interfaces' valid timing windows for receiving data. On the diagrams, one axis is supply
voltage (2.5 to 4.0 V) while the other axis indicates input
data positioning along a bit period (800 Mb/s, i.e., 1.25 ns).
The normal data position is in the center of the bit period. A
black dot in the diagram indicates incorrectly received data for
that combination of bit position and supply voltage. Ideally, the window
should be entirely white, but realistically, it is limited by jitter
from the DLL and other sources. Therefore, this test measures
the amount of tolerable skew on the input timing over a range
of supply voltages. Although the interface with the analog DLL
delivers better timing performance than the interface with the
digital DLL (1.02 versus 0.92 ns), both meet the component
specification of 0.85 ns.
Fig. 14 is a circle plot of the measured phase of the DLL's
output signal ClkOut, illustrating the DLL's ability to provide
infinite phase range. One axis indicates delay [or phase, as in
(1)] of the ClkOut signal relative to a fixed 400-MHz signal.
The other axis indicates cycle count. These data were measured by
probing the on-chip DLL output signal (ClkOut) and forcing
the DLL's phase-detector output low. This caused the DLL's
output phase to continually advance over time. The term "circle
plot" is used because this diagram is equivalent to sweeping a
phasor that represents the phase of ClkOut around the phase
plane, thereby drawing a circle in the phase plane. Because
the phase of ClkOut is measured relative to a fixed 400-MHz
signal, the plotted delay appears modulo 2.5 ns, the period of that signal.
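The modulo behavior of the plotted delay can be illustrated with a short sketch. The function name and the 0.3-ns step size below are hypothetical; only the 400-MHz reference (2.5-ns period) comes from the text.

```python
def wrapped_delay_ns(total_delay_ns, f_ref_mhz=400.0):
    """Delay as it would appear when measured against a fixed reference clock.

    The reference period in ns is 1000 / f_ref_mhz (2.5 ns at 400 MHz), and
    any accumulated delay is only observable modulo that period.
    """
    period_ns = 1e3 / f_ref_mhz
    return total_delay_ns % period_ns


# A phase that advances continually (hypothetical 0.3-ns steps) wraps every 2.5 ns.
accumulated = [0.3 * n for n in range(12)]
print([round(wrapped_delay_ns(d), 2) for d in accumulated])
```

Plotted against cycle count, the wrapped values form the repeating sawtooth that the circle plot shows as the output phase sweeps around the phase plane.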
Fig. 13. Measured shmoo diagrams showing the 400-MHz receive timing windows of the high-speed interface cells with (a) the analog DLL of [6]
and (b) the new digital DLL.
Fig. 14. Measured circle plot illustrating the infinite phase transfer characteristic of the digital DLL.
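Fig. 15. Measured DLL power (a) as a function of frequency for VDD = 3.3 V and (b) as a function of supply voltage for f = 400 MHz.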
TABLE I
ANALOG AND DIGITAL DLL PERFORMANCE SUMMARY AT 3.3 V AND 400 MHz
uses less power and area, and provides better timing performance (smaller long-term jitter) and phase resolution (smaller
maximum phase step), both DLLs enable the interface cells to
meet the component requirements when operating in the test
system. Additionally, the digital DLL has a higher maximum
operating frequency, works at lower supply voltages, and
requires much less effort to port to other processes (one versus
four man-months).
Fig. 15(a) and (b) shows plots of measured DLL power
versus frequency at VDD = 3.3 V and measured DLL power
versus supply voltage at f = 400 MHz, respectively. Although
both plots show that the digital DLL dissipated more power
than the analog DLL for all measured conditions, the plots illustrate the different characteristics of the power consumed by
the two DLLs. As mentioned earlier, the power of both DLLs
is distributed between IV power in the constant-current stages
and CV²f power in the CMOS stages. The curves in Fig. 15(a)
show that the digital DLL's power dissipation has a greater
dependence on frequency than does the analog DLL's power.
The curves in Fig. 15(b) show that the digital DLL's power
dissipation has a predominantly square-law dependence on
supply voltage, whereas the analog DLL's power dissipation
has a mixed square-law and linear dependence. These trends
confirm that the power of the analog DLL has a relatively
higher IV term, whereas the power of the digital DLL has a
relatively higher CV²f term. This indicates that digital DLLs
have the potential for providing better power scaling than
analog DLLs as supply voltages decrease in the future.
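The scaling argument can be made concrete with a two-term power model, P = I·V + C·V²·f. The sketch below uses made-up coefficients purely to show the trend; they are not measured values from either DLL, and the function name is hypothetical.

```python
def dll_power(vdd, freq_hz, i_static, c_eff):
    """Two-term power model: constant-current (I*V) plus switching (C*V^2*f)."""
    return i_static * vdd + c_eff * vdd ** 2 * freq_hz


F = 400e6  # operating frequency in Hz, as in the text

# Hypothetical coefficient sets: one dominated by the IV term, one by CV^2f.
iv_heavy = dict(i_static=20e-3, c_eff=10e-12)
cv2f_heavy = dict(i_static=2e-3, c_eff=60e-12)

for name, coeffs in (("IV-heavy", iv_heavy), ("CV2f-heavy", cv2f_heavy)):
    p_hi = dll_power(3.3, F, **coeffs)
    p_lo = dll_power(1.65, F, **coeffs)
    print(f"{name}: power at half supply = {p_lo / p_hi:.2f} of power at 3.3 V")
```

Halving the supply cuts a pure IV term to one half but a pure CV²f term to one quarter, so in this model the design with the larger CV²f share benefits more from supply scaling.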
Finally, we have shown in Table I and in Fig. 15(b) that
the digital DLL operates at lower supply voltages than the
analog DLL. Although the operation of the digital DLL was
limited to 1.7 V, this limitation was due to our use of several
analog elements in the digital DLL (i.e., it was a mostly digital
DLL). The digital DLL used an analog clock amplifier, two
analog duty-cycle error detectors (see Fig. 10), and an analog
quadrature phase detector (in a second loop, not shown). Using
an analog design for these circuit blocks in the digital DLL
was faster to implement without preventing evaluation of the
key digital blocks in the DLL, but their use determined the
minimum supply voltage of the digital DLL.
VI. CONCLUSION
We have described the architecture of a portable digital
DLL and demonstrated that it provides jitter performance
comparable to an analog DLL when fabricated in the same
3.3-V, 0.4-µm standard CMOS process. Several circuits were
developed to enable the DLL to provide very fine phase
resolution, infinite phase range, and good duty-cycle performance throughout the signal path. Despite its relatively simple
architecture, the digital DLL meets all system specifications,
and it operates down to lower supply voltages than its analog
counterpart. Utilizing essentially only simple digital CMOS
gates, the DLL can be ported to new processes in minimal time. For these reasons, this digital DLL provides an
alternative to analog DLLs for clock alignment applications.
ACKNOWLEDGMENT
The authors thank J. McBride and P. Gordon for layout
support and S. Sidiropoulos for helpful insights.
REFERENCES
[1] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.
[2] J.-M. Han, J. Lee, S. Yoon, S. Jeong, C. Park, I. Cho, S. Lee, and D. Seo, "Skew minimization techniques for 256 Mb synchronous DRAM and beyond," in VLSI Circuits Dig. Tech. Papers, June 1996, pp. 192–193.
[3] A. Hatakeyama, H. Mochizuki, T. Aikawa, M. Takita, Y. Ishii, H. Tsuboi, S. Fujioka, S. Yamaguchi, M. Koga, Y. Serizawa, K. Nishimura, K. Kawabata, Y. Okajima, M. Kawano, H. Kojima, K. Mizutani, T. Anezaki, M. Hasegawa, and M. Taguchi, "A 256 Mb SDRAM using register-controlled digital DLL," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 72–73.
[4] T. Lee, K. Donnelly, J. Ho, J. Zerbe, M. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[5] S. Sidiropoulos and M. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[6] K. Donnelly, Y. Chan, J. Ho, C. Tran, S. Patel, B. Lau, J. Kim, P. Chau, C. Huang, J. Wei, L. Yu, R. Tarver, R. Kulkarni, D. Stark, and M. Johnson, "A 660MB/s interface megacell portable circuit in 0.3 µm–0.7 µm CMOS ASIC," IEEE J. Solid-State Circuits, vol. 31, pp. 1995–2003, Dec. 1996.
[7] N. Kushiyama, S. Ohshima, D. Stark, H. Noji, K. Sakurai, S. Takase, T. Furuyama, R. Barth, A. Chan, J. Dillon, J. Gasbarro, M. Griffin, M. Horowitz, T. Lee, and V. Lee, "A 500-Megabyte/s data-rate 4.5M DRAM," IEEE J. Solid-State Circuits, vol. 28, pp. 490–508, Apr. 1993.
[8] M. Hasegawa, M. Nakamura, S. Narui, S. Ohkuma, Y. Kawase, H. Endoh, S. Miyatake, T. Akiba, K. Kawakita, M. Yoshida, S. Yamada, T. Sekigguchi, I. Asano, Y. Tadaki, R. Nagai, S. Miyaoka, K. Kajigaya, M. Horiguchi, and Y. Nakagome, "A 256 Mb SDRAM with subthreshold leakage current suppression," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 80–81.
[9] T. Saeki, Y. Nakaoka, M. Fujita, A. Tanaka, K. Nagata, K. Sakakibara, T. Matano, Y. Hoshino, K. Miyano, S. Isa, E. Kakehashi, J. Drynan, M. Komuro, T. Fukase, H. Iwasaki, J. Sekine, M. Igeta, N. Nakanishi, T. Itani, K. Yoshida, H. Yoshino, S. Hashimoto, T. Yoshii, M. Ichinose, T. Imura, M. Uziie, K. Koyama, Y. Fukuzo, and T. Okuda, "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with synchronous mirror delay," ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 374–375.
[10] B. Garlepp, K. Donnelly, J. Kim, P. Chau, J. Zerbe, C. Huang, C. Tran, C. Portmann, D. Stark, Y. Chan, T. Lee, and M. Horowitz, "A portable digital DLL architecture for CMOS interface circuits," in VLSI Circuits Dig. Tech. Papers, June 1998, pp. 214–215.
[11] M. Griffin, J. Zerbe, A. Chan, Y. Jun, Y. Tanaka, W. Richardson, G. Tsang, M. Ching, C. Portmann, Y. Li, B. Stonecypher, L. Lai, K. Lee, V. Lee, D. Stark, H. Modarres, P. Batra, J. Louis-Chandran, J. Privitera, T. Thrush, B. Nickell, J. Yang, V. Hennon, and R. Sauve, "A process independent 800 MB/s DRAM bytewide interface featuring command interleaving and concurrent memory operation," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 156–157.
[12] S. Sidiropoulos, "High-performance interchip signalling," Ph.D. dissertation, Computer Systems Laboratory, Stanford University, Stanford, CA, Apr. 1998. Available as Tech. Rep. CSL-TR-98-760 from http://elib.stanford.edu/.
[13] I. Young, M. Mar, and B. Bhushan, "A 0.35 µm CMOS 3–880 MHz PLL N/2 multiplier and distribution network with low jitter for microprocessors," in ISSCC 1997 Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[14] V. von Kaenel, D. Aebischer, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in ISSCC 1996 Dig. Tech. Papers, Feb. 1996, pp. 132–133.
[15] V. von Kaenel, D. Aebischer, R. van Dongen, and C. Piguet, "A 600 MHz CMOS PLL microprocessor clock generator with a 1.2 GHz VCO," in ISSCC 1998 Dig. Tech. Papers, Feb. 1998, pp. 396–397.
Charles Huang received the B.S. degree in electrical engineering from the University of Fuzhou,
China, in 1982 and the M.S. degree in electrical
engineering from the University of Arkansas, Fayetteville, in 1990.
He was with ULSI and SGI, working in the area
of PLL and cache circuit design. He joined Rambus,
Inc., Mountain View, CA, in 1994, where he has
been engaged in high-speed CMOS DLL and I/O
circuit design.
Thomas H. Lee (S'87–M'87), for a photograph and biography, see this issue,
p. 585.
Donald Stark received the B.S. degree from the
Massachusetts Institute of Technology, Cambridge,
in 1985 and the M.S. and Ph.D. degrees from
Stanford University, Stanford, CA, in 1987 and
1991, respectively, all in electrical engineering.
His research interests at Stanford included circuit
design and CAD tools for analysis of voltage and
current distributions in VLSI circuits. From 1987
to 1991, he was also a Member of the Western
Research Laboratory, Digital Equipment Corp., Palo
Alto, CA, working on CAD development and ECL
circuit design. From 1991 to 1993, he was with the Semiconductor Device
Engineering Laboratory, Toshiba Corp., Kawasaki, Japan, working on DRAM
design. In 1993, he joined Rambus, Inc., Mountain View, CA, where he
currently works on DRAM, high-speed I/O design, and CAD.
Mark A. Horowitz, for a photograph and biography, see p. 528 of the April
1999 issue of this JOURNAL.
Abstract: This paper describes a register-controlled symmetrical delay-locked loop (RSDLL) for use in a high-frequency
double-data-rate DRAM. The RSDLL inserts an optimum delay
between the clock input buffer and the clock output buffer,
making the DRAM output data change simultaneously with the
rising or falling edges of the input clock. This RSDLL is shown to
be insensitive to variations in temperature, power-supply voltage,
and process after being fabricated in 0.21-µm CMOS technology.
The measured rms jitter is below 50 ps when the operating
frequency is in the range of 125–250 MHz.
Index Terms: Delay-locked loops, double-data rate, DRAM.
I. INTRODUCTION
switches from LOW to HIGH. An added benefit of the two-NAND delay element is that two point-of-entry control signals
are now available. Both are used by the shift register to solve
the possible problem caused by the power-up ambiguity in the
shift register.
Fig. 3. Phase detector used in RSDLL.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 32, NO. 11, NOVEMBER 1997
Abstract: This paper describes a dual delay-locked loop architecture which achieves low jitter, unlimited (modulo 2π) phase
shift, and large operating range. The architecture employs a core
loop to generate coarsely spaced clocks, which are then used by
a peripheral loop to generate the main system clock through
phase interpolation. The design of an experimental prototype in
a 0.8-µm CMOS technology is described. The prototype achieves
an operating range of 80 kHz–400 MHz. At 250 MHz, its peak-to-peak jitter with quiescent supply is 68 ps, and its jitter supply
sensitivity is 0.4 ps/mV.
Index Terms: Clock synchronization, delay-locked loops, phase
interpolation, phase-locked loops.
I. INTRODUCTION
conventional approaches, Section II presents the dual interpolating DLL architecture. Section III discusses circuit design
issues that arose in the prototype implementation of the architecture in a 0.8-µm CMOS technology. Section IV discusses
the experimental results, and concluding remarks follow in
Section V.
II. ARCHITECTURE
A. Conventional DLLs
A simplified block diagram of a conventional DLL [1] is
outlined in Fig. 1. The components are a voltage controlled
delay line (VCDL), a phase detector, a charge pump, and a
first-order loop filter. The input reference clock drives the
delay line which comprises a number of cascaded variable
delay buffers. The output clock clk drives the loop phase
detector (depicted in this example as a conventional flip-flop).
The output of the phase detector is integrated by the charge
pump and the loop filter capacitor to generate the loop control
voltage. The loop negative feedback drives the control
voltage to a value that forces a zero phase error between the
output clock and the reference clock.
This simple design offers many advantages compared to
VCO-based PLLs. Due to frequency acquisition constraints,
PLLs usually resort to a specific type of phase detector, the
state-machine-based phase frequency detector (PFD). In contrast, DLLs can be easily implemented by using bang-bang
control, i.e., the control signal of the loop, rather than being
proportional to the phase error magnitude, can simply be a
binary "up" or "down" indication. Thus, in a bang-bang
DLL the phase detector can be a replica of the input data
receiver resulting in an optimal placement of the sampling
clock in the center of the input receiver's sampling uncertainty
window. Additionally, since DLLs do not use a VCO, phase
errors induced by supply or substrate noise do not accumulate
over many clock cycles. This improved noise immunity is
core loop
(in seconds) is the delay established by the
core loop delay line, while the input delay
is the delay
for which the core loop phase detector and charge pump do not
generate an error signal. Since the core loop VCDL spans half
a clock cycle,
is equal to half an input clock period. By
using these loop variables, the input-to-output transfer function
of the core loop can be easily derived
(1)
where
(in rads/s) is the pole of the core loop as determined
by the charge pump current, the phase detector and delay line
gain, and the loop filter capacitor. Similarly, the noise-to-delay
error transfer function of the core loop can be shown to be
(2)
where
is the additional delay introduced in the core
loop from supply or substrate noise, and
is the delay
error seen by the core loop phase detector. This transfer
function indicates that noise induced delay errors can be
tracked up to the loop bandwidth and that the response of
the loop to a supply step consists of an initial step followed
by a decaying exponential with a time constant equal to
.
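The step response described here (an initial step followed by an exponential decay) can be reproduced with a small numerical sketch. It assumes a generic first-order loop in which the correction integrates the detected error at a rate set by the loop pole; this is an illustrative model consistent with the description, not a reconstruction of the paper's equations, and the function name and values are hypothetical.

```python
import math


def delay_error_after_step(step, pole_rad_s, t_end, n_steps=100_000):
    """Residual delay error at time t_end after a noise-induced delay step.

    Forward-Euler simulation of a first-order loop: the loop correction
    integrates the error seen by the phase detector at rate `pole_rad_s`.
    """
    dt = t_end / n_steps
    correction = 0.0
    for _ in range(n_steps):
        error = step - correction           # error seen by the phase detector
        correction += pole_rad_s * error * dt
    return step - correction


pole = 2.0 * math.pi * 1e6                   # hypothetical 1-MHz loop bandwidth
step_ps = 100.0                              # hypothetical 100-ps supply-induced step
for n_tau in (0.0, 1.0, 3.0):
    err = delay_error_after_step(step_ps, pole, n_tau / pole) if n_tau else step_ps
    print(f"t = {n_tau:.0f} time constants: error = {err:.1f} ps")
```

In this model the full step appears at the output immediately and then decays by a factor of e for every time constant of 1/pole, which is why only disturbances slower than the loop bandwidth are tracked out.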
Before proceeding to analyze the response of the dual loop,
it should be noted that the linearized model of Fig. 3 uses
a simplifying assumption. The assumption is that the delay
error introduced by supply or substrate variations is
identical in both loops and does not depend on the state
of the phase selection multiplexers. Since the supply and
substrate sensitivity of the peripheral loop depends on the
phase selection and will be typically higher due to the presence
of the final CMOS system clock buffer, this assumption is
not necessarily accurate. However, it does not affect the
conclusions drawn below about the stability of the loop, since
it only removes a modifying constant, which is equal to the
ratio in the delay sensitivities of the two loops. This constant
only affects the relative location of the poles and zeros of the
resulting transfer function, and, as it will be shown below,
the loop is unconditionally stable irrespective of the relation
between the individual poles and zeros. Using the model of
Fig. 3, it is straightforward to show that the transfer function
of the peripheral loop is identical in form to
that of the core loop. This result agrees with intuition since
in the dual loop system reference clock perturbations do not
error later. When the pole frequencies of the two loops are
very close, the system overshoots since the peripheral loop
compensates for the output delay error at approximately the
same rate as the core loop. The worst case overshoot of
approximately 4.5% of the initial disturbance occurs when the
peripheral loop bandwidth is twice that of the core loop. As the
peripheral loop bandwidth increases, the overshoot becomes
progressively smaller since the peripheral loop corrects for
both the peripheral and core delay errors. Subsequently, the
influence of the slower core loop correction on the output
delay error is compensated by the peripheral loop. Therefore,
even in the worst case, the dual loop cascade exhibits only
minor overshoot.
III. CIRCUIT DESIGN
A. Overview
A more detailed block diagram of the dual loop is shown in
Fig. 5. This design uses a separate local differential clock as
the input to the delay line. Although the use of this clock
is not inherent in the loop architecture, it minimizes the
supply sensitivity in applications such as source synchronous
interfaces. To minimize the effects of input clock duty cycle
imperfections and common-mode mismatches, a duty cycle
adjuster (DCA) [2] is employed after the first clock receiving
buffer. The 50% duty cycle clock drives the core DLL. The
core delay line consists of six differential buffers. An extra pair
of buffers generates two clocks which drive the core
loop 180° phase detector. The output of the phase detector
controls the charge pump which forces these two clocks to
be 180° out of phase. Since all the buffers in the core delay line
(including the extra pair) have the same size, all the core VCDL
stages have the same fan-out and delay. Therefore, forcing
the two clocks to be 180° out of phase will generate six clocks evenly
spaced by 30° at the outputs of the core delay line.
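The resulting clock phases follow directly from the locked span and the number of stages. The sketch below assumes ideal, identical stages and takes the first clock as the 0° reference; the function names are hypothetical, and the inversion step simply models selecting the opposite polarity of a differential clock.

```python
def core_clock_phases(n_buffers=6, span_deg=180.0):
    """Phases (deg) of the core delay-line clocks when the line spans span_deg."""
    step = span_deg / n_buffers              # 30 deg for six buffers over 180 deg
    return [k * step for k in range(n_buffers)]


def with_inversion(phases):
    """Add the inverted (180-deg shifted) copy of each differential clock."""
    return phases + [p + 180.0 for p in phases]


core = core_clock_phases()
print(core)
print(with_inversion(core))
```

Together with inversion, six clocks spaced by 30° give twelve coarse phases covering the full 360°, which is the set a phase selection stage can choose between before interpolation.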
The phase selection and phase inversion multiplexers are
differential elements controlled by the core loop control voltage. In order to eliminate jitter-sensitive slow clocks, all
buffers in the clock path need to have approximately the same
Fig. 6. (a) Core loop delay buffer and (b) charge pump.
Fig. 12. (a) Simplified FSM algorithm and (b) resulting loop behavior.
Fig. 10.
Fig. 11.
Fig. 14.
Fig. 15.
Fig. 16.
case jitter (68 ps) with quiescent supply. The jitter histogram
consists of the superposition of two Gaussian distributions
resulting from the switching of the peripheral loop between
two adjacent interpolation intervals. The distance between the
peaks of the two superimposed distributions is about 40 ps,
which is in fair agreement with the simulation results. With
the noise generation circuits injecting a 750-mV 1-MHz square
wave on the chip supply, the peak-to-peak jitter increases to
400 ps (Fig. 16). It should be noted that simulation results
indicate that approximately 50% of this jitter is not inherent to
the loop, but is due to the supply sensitivity of the succeeding
static CMOS clock buffer and off-chip driver.
Fig. 17 illustrates the linearity of the interpolation process
in the peripheral loop. The figure shows the histogram of
the output clock with the peripheral loop FSM continuously
rotating that clock. The histogram was generated by keeping
Fig. 17.
TABLE I
PROTOTYPE PERFORMANCE SUMMARY
400 MHz. The phase offset between the reference clock and
the output clock of the loop is less than 40 ps. Operating at
250 MHz, the dual DLL draws 31 mA dc from a 3.3-V power
supply.
V. SUMMARY
Although DLLs are easier to design than PLLs and offer
better jitter performance, their main disadvantage is their
limited phase capture range. This disadvantage limits their
application to completely synchronous environments and complicates start-up circuitry. This paper presented a dual DLL
architecture which removes this limitation by using a core DLL
to generate coarsely spaced clocks which are then used by a
peripheral DLL to generate the output clock by using phase
interpolation. This architecture has unlimited (modulo 2π)
phase shift capability, therefore removing boundary conditions
and phase relationship constraints between the system clocks.
The only requirement is that the DLL input and reference
clocks are plesiochronous, making the dual DLL suitable for
clock recovery applications. In addition, the digital nature
of the peripheral loop control enables implementation of
[1] M. Johnson and E. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. 23, Oct. 1988.
[2] T. Lee et al., "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 MB/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] M. Izzard et al., "Analog versus digital control of a clock synchronizer for a 3 Gb/s data with 3.0 V differential ECL," in Dig. Tech. Papers 1994 Symp. VLSI Circuits, June 1994, pp. 39–40.
[4] M. Horowitz et al., "PLL design for a 500 MB/s interface," in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1993, pp. 160–161.
[5] S. Sidiropoulos and M. Horowitz, "A semi-digital delay locked loop with unlimited phase shift capability and 0.08–400 MHz operating range," in Dig. Tech. Papers Int. Solid State Circuits Conf., Feb. 1997, pp. 332–333.
[6] J. Maneatis and M. Horowitz, "Precise delay generation using coupled oscillators," IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec. 1993.
[7] J. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
Abstract: A delay-locked loop (DLL) with wide-range operation and fixed latency of one clock cycle is proposed. This DLL
uses a phase selection circuit and a start-controlled circuit to
enlarge the operating frequency range and eliminate harmonic
locking problems. Theoretically, the operating frequency range of
the DLL can be from 1/(N × T_Dmax) to 1/(3 × T_Dmin), where
T_Dmin and T_Dmax are the minimum and maximum delay of a
delay cell, respectively, and N is the number of delay cells used
in the delay line. Fabricated in a 0.35-µm single-poly triple-metal
CMOS process, the measurement results show that the proposed
DLL can operate from 6 to 130 MHz, and the total delay time
between input and output of this DLL is just one clock cycle.
Over the entire operating frequency range, the maximum rms
jitter does not exceed 25 ps. The DLL occupies an active area of
880 µm × 515 µm and consumes a maximum power of 132 mW
at 130 MHz.
Index Terms: Delay-locked loops, latency, phase-locked loops,
wide range.
I. INTRODUCTION
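and $T_{VCDL\max}$, respectively. As a result, the period of the input signal, $T_{CLK}$, should satisfy the following inequality [7]:
$$\mathrm{Max}\left(T_{VCDL\min},\ \tfrac{2}{3}\,T_{VCDL\max}\right) < T_{CLK} < \mathrm{Min}\left(2\,T_{VCDL\min},\ T_{VCDL\max}\right) \tag{1}$$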
Equation (1) shows that the DLL is prone to the false locking problem when process variations are taken into account [7]. Therefore, several solutions [6]–[10] have been proposed to overcome this problem. They are described as follows.
First, the basic idea is to use a phase-frequency detector (PFD) [5], because it has a capture range of $(-2\pi, 2\pi)$, wider than that of other phase detectors. So, the PFD is a better choice for wide-range operation. However, the PFD cannot be used in the DLL alone without any control circuit because the DLL will try to lock to zero delay. A PFD combined with a control circuit is presented in [6]. Nevertheless, in some cases, especially for high-frequency operations, the initial delay between ref_clk and vcdl_clk, as shown in Fig. 1, may be larger than two clock cycles and harmonic locking will occur.
Second, a solution called an all-analog DLL using a replica
delay line [7] has been developed to solve the narrow frequency
range problem of a conventional DLL. If the delay range of the
,
VCDL satisfies the relation
the DLL will have a maximum operation range of 7:1.
Third, a digital-controlled DLL called the self-correcting
DLL is proposed in [8]. The problem of false locking is
solved by the addition of a lock-detect circuit and the modified
phase detector. Although this self-correcting DLL avoids false
locking, the outputs of the VCDL are required to have an exact
50% duty cycle.
The DLL developed in [9] uses a stage selector for fast-locked
and wide-range operations, but the DLL requires an additional
VCDL, which increases the area. A similar DLL can automatically change its lock mode to extend the operation range, but
the latency of the DLL will be larger than one clock cycle [10].
The approach presented in this work uses a phase selection circuit to automatically decide how many delay cells should be used. This enables the DLL to operate over a wide frequency range. A new start-controlled circuit is also presented for the DLL to solve false locking problems and keep the latency at one clock cycle. An exact 50% duty cycle is not necessary.
III. ARCHITECTURE OF THE PROPOSED DLL
The architecture of the proposed DLL is shown in Fig. 3. It is composed of a conventional analog DLL, a phase selection circuit, and a start-controlled circuit. Before the DLL begins to lock, the phase selection circuit chooses the output of an appropriate delay cell as the feedback signal (vcdl_clk) according to the frequency of the input signal. In other words, the number of delay cells in use may change with the input frequency. The minimum delay $T_{D\min}$ of the delay line is determined by one unit-delay cell. The maximum delay is $N \times T_{D\max}$, where $N$ is the number of unit-delay cells. Thus, the operating frequency range of the DLL can be from $1/(N \times T_{D\max})$ to $1/(3 \times T_{D\min})$.
The linear model of the DLL is shown in Fig. 4, where the summer stands for a phase detector and the remaining parameters are the charge-pump current, the period of the input reference clock, the capacitor value in the loop filter, and the gain of the VCDL, which is proportional to the number of delay cells. In the steady-state locked condition, the transfer function can be expressed as [11]
(2)
CHANG et al.: WIDE-RANGE DELAY-LOCKED LOOP WITH FIXED LATENCY OF ONE CLOCK CYCLE
Fig. 6. Schematic of edge detection circuit. (a) Edge detection circuits. (b) Clock edge generation. (c) Latch N.
input and output of the VCDL will increase until it reaches one
clock period of the input signal. Thus, the DLL will not fall into
false locking and the latency is fixed to one clock cycle no matter
how long a delay the VCDL provides.
IV. CIRCUIT DESCRIPTION
A. Phase Selection Circuit
The phase selection circuit consists of two blocks: an edge detector and a multiplexer with a decoder, as shown in Fig. 5. The
schematic and timing diagram of the edge detector are shown
in Figs. 6 and 7, respectively. To guarantee that the latency of
the DLL is just one clock cycle, the first two clock phases in
Fig. 6 are reserved for measurement. In practice, the first two
clock phases could be included in the phase selection circuit to
improve the operating frequency range of the DLL. At the initial
state, the signal startb is set low to reset the edge detector outputs (i.e., d3–d10), and the delay of the VCDL is set to its minimum value. When the signal startb goes high, the edge detector will detect the rising edges of the input signals in sequence during the next two rising edges of ref_clk. Referring to Fig. 7(a), suppose that the signals all have rising edges in sequence during one clock cycle; the outputs (d3–d10) are then all high and the multiplexer will select phase 10 as the output signal, vcdl_clk. However, if the input frequency is higher, suppose that the timing diagram is similar to Fig. 7(b). All the inputs have rising edges during one clock cycle, but only phases 1–4 have rising edges in sequence, so the selected phase is 4. The vcdl_clk will be low until the selected phase is chosen. After the vcdl_clk is decided, the DLL will start the locking process, which will be explained later. The decoder converts the signals (d3–d10) into 3-bit control signals, which switch the number of capacitors used in the loop filter for tuning the loop bandwidth.
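The selection rule described above can be sketched behaviorally. The following Python model is only an illustration, not the authors' circuit: it assumes ten equal unit delays held at their minimum value, and it picks the last phase (between 3 and 10) whose rising edge still arrives within one reference period. The function name, parameter names, and delay values are hypothetical.

```python
def select_phase(t_clk, t_unit, n_phases=10, first_selectable=3):
    """Return the index of the phase fed back as vcdl_clk.

    Behavioral assumption: phase k rises k * t_unit after ref_clk, and the
    edge detector marks every phase whose edge arrives within one period.
    """
    last_in_cycle = int(t_clk // t_unit)  # phases whose edge fits in one period
    # Phases 1 and 2 are reserved, so never select below first_selectable.
    return max(first_selectable, min(n_phases, last_in_cycle))


# Hypothetical numbers: 1 ns unit delay at the minimum-delay setting.
slow_clock_phase = select_phase(t_clk=20.0, t_unit=1.0)  # all ten phases fit
fast_clock_phase = select_phase(t_clk=4.5, t_unit=1.0)   # only phases 1-4 fit
print(slow_clock_phase, fast_clock_phase)
```

With these assumed numbers the slow clock selects phase 10 and the faster clock selects phase 4, mirroring the two cases of Fig. 7 as described in the text.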
B. Start-Controlled Circuit
The schematic of the start-controlled circuit and the associated PFD are shown in Fig. 8. It is composed of only two
rising-edge trigger D-flip-flops (DFFs), two NAND gates, and
PFD are in low level. When startb goes high, setupb will also go high. After two consecutive falling edges of vcdl_clk trigger the DFFs, the down signal of the PFD will be activated and let the delay of the VCDL increase. The delay of the VCDL will increase until it is equal to one clock period of the input signal due to the nature of the negative feedback architecture. Since the start-controlled circuit forces the delay of the VCDL to its minimum value and controls the delay of the VCDL to increase until its delay is equal to one clock period, the DLL will not fall into false locking. In order to get equal delays for path1 and path2, dummy loads should be added at point A. In comparison with [6], this start-controlled circuit has two advantages: the proposed circuit is simple, and the duty cycle of ref_clk and vcdl_clk is not required to be exactly 50%.
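A toy model may help show why starting from the minimum delay avoids harmonic locking. The sketch below is an assumption-laden illustration rather than the published circuit: the delay is forced to its minimum, then increased in small hypothetical steps until it first reaches one reference period, so the loop never passes a two-period alignment. It assumes the minimum delay is shorter than one period.

```python
def lock_from_minimum(t_clk, d_min, d_max, step=0.01, max_cycles=100_000):
    """Ramp the VCDL delay up from its minimum until it reaches one period.

    Returns the locked delay, or None if the delay line cannot reach t_clk.
    All quantities share one arbitrary time unit.
    """
    delay = d_min                          # startb low: delay forced to minimum
    for _ in range(max_cycles):
        if delay >= t_clk:                 # first alignment reached is 1 x t_clk
            return delay
        if delay >= d_max:                 # delay line too short for this clock
            return None
        delay = min(delay + step, d_max)   # "down" pulses increase the delay
    return None


# The delay line below could also reach 2 x t_clk = 20, a harmonic lock point,
# but the upward ramp from the minimum stops at the first alignment.
locked = lock_from_minimum(t_clk=10.0, d_min=3.0, d_max=25.0)
print(locked)
```

The design point illustrated here is that the direction of the search is fixed, so the first lock point encountered is always one clock period.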
C. Other Circuits
In this work, the dynamic-logic-style PFD [13] is adopted to avoid the dead-zone problem and improve the operating speed. To mitigate charge injection errors induced by the parasitic capacitors of the switches and current source transistors, the charge-pump circuit developed in [11] is used here. The delay cell circuit is similar to that in [11]. The schematics of these circuits are shown in Figs. 10–12. The control voltage of the loop filter is directly connected to nMOS rather than pMOS. Therefore, the transfer curve of delay versus control voltage is monotonically decreasing, as shown in Fig. 13.
V. EXPERIMENTAL RESULTS
The prototype chip is fabricated in a 0.35-µm single-poly triple-metal standard CMOS process. The microphotograph of the chip is shown in Fig. 14. The capacitors used in the loop filter are integrated in the chip and formed by metal-to-metal capacitors. The experimental results show that the DLL can operate in the frequency range of 6–130 MHz. Figs. 15 and 16 show the first four cycles of the DLL in the locking process when the operating frequency is 6 and 130 MHz, respectively. After the signal startb is high, the phase selection circuit will select one of the outputs of the VCDL as close as possible to the next rising edge of the input clock, ref_clk. Figs. 15 and 16 also show that after the signal startb is high, the first rising edge of the output clock of the VCDL, vcdl_clk, leads that of the input clock, ref_clk. Since the signal startb will set the control voltage in Fig. 3 to $V_{DD}$, the proposed phase detector and the charge-pump circuit will discharge the loop filter to increase the delay of the VCDL. This aligns the phases of the input clock and the output clock of the VCDL. Fig. 17 shows the jitter histogram when the DLL operates at 130 MHz. Fig. 18 shows the measured rms jitter at different frequencies. Table I gives the performance summary. The proposed DLL can be seen to have a wide operating range and a fixed latency of one clock cycle.
VI. CONCLUSION
A DLL with wide-range operation and fixed latency of one clock cycle is proposed. First, the multiphase outputs of the VCDL are all sent to the phase selection circuit. Then the phase selection circuit automatically selects one of the delayed outputs as the feedback signal. As a result, this DLL can operate over a wide range without suffering from harmonic locking problems. Ideally, this DLL can operate from $1/(N \times T_{D\max})$ to $1/(3 \times T_{D\min})$. The experimental results also demonstrate the functionality of the proposed DLL. Moreover, at different operating frequencies, the jitter performance is in an acceptable range and the latency is just one clock cycle. Since the speed of the proposed circuits can be increased if a more advanced process is used, the performance of the DLL, such as the operating frequency range, can be improved with little hardware and design effort. The power consumption of the digital part in the DLL and the total die area will also be reduced.
REFERENCES
[1] B. Razavi, Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. Piscataway, NJ: IEEE Press, 1996.
[2] F. M. Gardner, "Charge-pump phase-lock loops," IEEE Trans. Commun., vol. COM-28, pp. 1849–1858, Nov. 1980.
[3] R. E. Best, Phase-Locked Loops: Theory, Design and Applications. New York: McGraw-Hill, 1998.
[4] R. L. Aguitar and D. M. Santos, "Multiple target clock distribution with arbitrary delay interconnects," Electron. Lett., vol. 34, no. 22, pp. 2119–2120, Oct. 1998.
[5] R. B. Watson Jr. and R. B. Iknaian, "Clock buffer chip with multiple target automatic skew compensation," IEEE J. Solid-State Circuits, vol. 30, pp. 1267–1276, Nov. 1995.
[6] C. H. Kim et al., "A 64-Mbit 640-Mbyte/s bidirectional data strobed, double-data-rate SDRAM with a 40-mW DLL for a 256-Mbyte memory system," IEEE J. Solid-State Circuits, vol. 33, pp. 1703–1710, Nov. 1998.
[7] Y. Moon, J. Choi, K. Lee, D. K. Jeong, and M. K. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low-jitter performance," IEEE J. Solid-State Circuits, vol. 35, pp. 377–384, Mar. 2000.
[8] D. J. Foley and M. P. Flynn, "CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature-compensated tunable oscillator," IEEE J. Solid-State Circuits, vol. 36, pp. 417–423, Mar. 2001.
[9] H. Yahata, T. Okuda, H. Miyashita, H. Chigasaki, B. Taruishi, T. Akiba, Y. Kawase, T. Tachibana, S. Ueda, S. Aoyama, A. Tsukinori, K. Shibata, M. Horiguchi, Y. Saiki, and Y. Nakagome, "A 256-Mb double-data-rate SDRAM with a 10-mW analog DLL circuit," in Symp. VLSI Circuits Dig. Tech. Papers, June 2000, pp. 74–75.
[10] Y. Okuda, M. Horiguchi, and Y. Nakagome, "A 66–400 MHz adaptive-lock-mode DLL circuit duty-cycle error correction," in Symp. VLSI Circuits Dig. Tech. Papers, June 2001, pp. 37–38.
[11] J. G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[12] A. Chandrakasan, W. J. Bowhill, and F. Fox, Design of High-Performance Microprocessor Circuit. New York: IEEE Press, 2001, p. 240.
[13] S. Kim et al., "A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL," IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 11, NOVEMBER 2000
Abstract: A novel clock network composed of multiple synchronized phase-locked loops is analyzed, implemented, and tested. Undesirable large-signal stable (mode-locked) states dictate the transfer characteristic of the phase detectors; a matrix formulation of the linearized system allows direct calculation of system poles for any desired oscillator configuration. A 16-oscillator 1.3-GHz distributed clock network in 0.35-µm CMOS is presented here.
Index Terms: Clock network, multiple oscillator system, phase-locked loop.
I. INTRODUCTION
THE CLOCK distribution network of a modern microprocessor uses a significant fraction of the total chip power
and has substantial impact on the overall performance of the
system. For example, the 72-W 600-MHz Alpha processor [1]
dissipates 16 W in the global clock distribution, and another
23 W in the local clocks: more than half the power goes to
driving the clock net. The clock uncertainty budget for a global
clock is 10% of a clock period, which translates to a 10% reduction in maximum operating speed; as argued below, this penalty
is likely to increase for currently popular clock architectures.
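As a quick check of the power figures quoted above, the short Python snippet below adds the global and local clock power and compares the sum with the total chip power. It uses only the numbers stated in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the 600-MHz Alpha processor [1].
total_power_w = 72.0
global_clock_w = 16.0
local_clock_w = 23.0

# Fraction of chip power spent on driving the clock net.
clock_fraction = (global_clock_w + local_clock_w) / total_power_w
print(f"clock power fraction: {clock_fraction:.1%}")
```

The result, 39 W out of 72 W, is a little over 54%, consistent with the statement that more than half the power goes to driving the clock net.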
Most conventional microprocessors use a balanced tree to distribute the clock [1]–[3]. Because the delays to all nodes are
nominally equal, trees may be expected to have low skew. However, at gigahertz clock speeds a large fraction of skew and jitter
comes from random variations in gate and interconnect delay.
The majority of jitter in a clock tree is introduced by buffers and
inter-line coupling to the clock wires; a relatively small amount
comes from noise in the source oscillator [4]. Therefore, a primary consideration in clock design is matching delay along the
clock path.
As clock speed increases, signal delay across a chip becomes
comparable to a clock cycle. For example, a 2-cm-long wire in
a 0.25-µm process has a delay of 0.86 ns, while the clock might
be as high as 1 GHz; scaling to 4 GHz, the same wire (with
optimal buffering) will have a delay of approximately 0.43 ns,
compared to a clock period of 0.25 ns. In all practical cases a
signal that takes longer than a clock cycle to propagate would
be pipelined, and hence re-clocked. The fundamental weakness
of tree distribution (and networks that depend on tree matching)
Manuscript received March 24, 2000; revised June 24, 2000. This paper was
supported by the MARCO Focused Research Center on Interconnects, which
was funded at the Massachusetts Institute of Technology through a subcontract
from the Georgia Institute of Technology, and supported in part by a Graduate
Fellowship from the Intel Corporation.
V. Gutnik was with M.I.T. Microsystems Technology Lab, Cambridge, MA
02139 USA. He is now with Silicon Laboratories, Austin, TX 78749 USA
(e-mail: gutnik@mit.edu).
A. P. Chandrakasan is with M.I.T. Microsystems Technology Lab, Cambridge, MA 02139 USA (e-mail: anantha@mtl.mit.edu).
Publisher Item Identifier S 0018-9200(00)09441-5.
D. Active Feedback
GUTNIK AND CHANDRAKASAN: ACTIVE GHz CLOCK NETWORK USING DISTRIBUTED PLLs
Fig. 5. DLL architecture.
the use of multiple independent clocks [8], this approach produces a single fully synchronized clock. The rest of this section
examines small and large signal stability of a distributed PLL.
A. Small Signal
In a multiple-oscillator PLL large-signal and small-signal behavior are interrelated. In normal operation, the oscillators are
phase-locked, and jitter depends on the network response to
noise. Because startup is expected to take a negligibly small
fraction of time, the connection of the oscillators is optimized
for small-signal behavior rather than to make initial acquisition
more efficient. The linearized small-signal behavior, valid when
the oscillators are nearly in phase, is analyzed first.
B. General Derivation
, as
(3)
Fig. 10. One-dimensional PLL array; symmetrical with the dotted-line
connections.
The loop filter becomes a corresponding matrix. The matrix that describes the connections between oscillators is intuitively meaningful. The network of oscillators is similar to a lumped circuit with a node for each oscillator and a branch for each connection between pairs of oscillators. Node voltages in this circuit represent oscillator phase, and branch currents represent the error signals on the output of the PD. The connection matrix is the conductance matrix for this circuit with unity-conductance branches; the matrix for a four-oscillator network is shown in (1). Each off-diagonal entry $(i, j)$ is $-1$ if there is a PD between node $i$ and node $j$; each diagonal entry $(i, i)$ is the number of detectors attached to node $i$.
(1)
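The construction of such a conductance matrix can be sketched in a few lines of Python. This is an illustrative example only: the four-node chain topology below is an assumption (the paper's own four-oscillator example did not survive extraction), and the code uses the usual nodal-analysis sign convention, with off-diagonal entries of -1 for each unity-conductance branch.

```python
def conductance_matrix(n_nodes, branches):
    """Build the unity-conductance matrix of an oscillator network.

    branches is a list of (i, j) node pairs, one per phase detector.
    Diagonal entries count the detectors attached to each node.
    """
    g = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in branches:
        g[i][j] -= 1   # coupling between the two nodes of the branch
        g[j][i] -= 1
        g[i][i] += 1   # one more detector attached to node i
        g[j][j] += 1   # and to node j
    return g


# Hypothetical topology: four oscillators connected in a chain 0-1-2-3.
chain_branches = [(0, 1), (1, 2), (2, 3)]
G = conductance_matrix(4, chain_branches)
for row in G:
    print(row)
```

Every row sums to zero, which reflects the fact that a common phase shift of all oscillators produces no error signal at any detector.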
This system has multiple poles at the same place where a single-oscillator PLL has single poles.
On the other hand, in a perfectly symmetrical array, the input to each oscillator $i$ is the phase of oscillators $i-1$ and $i+1$ (Fig. 10, with the dotted-line connections). The loop-filter matrix is the same because the physical arrangement of nodes is identical, but the conductance matrix changes:
(4)
To achieve the same phase margin in the symmetrical array as in the asymmetrical one, it is necessary to lower the gain. This can be shown with a geometrical argument: in the symmetrical array, when the phase of oscillator $i$ changes by a small amount, the change is measured at two PDs, so oscillator $i$ feels twice the feedback that it would have felt in the asymmetrical array, and at the same time, oscillators $i-1$ and $i+1$ both adjust in the opposite direction, giving four times the effective gain. Hence, the gain must be decreased by a factor of approximately four. Mathematically, the largest eigenvalue of the asymmetrical conductance matrix is 1, but the largest eigenvalue of the symmetrical one is 3.5. Poles of the symmetrical system, solved via (2), are plotted in Fig. 12(a). The key difference between the two arrays is the system's response to noise. In both cases, noise at frequencies higher than the unity-gain frequency is attenuated. For frequencies much lower than the unity-gain frequency, the response can be calculated via (2). Fig. 11 shows a Bode plot of noise at one node in response to a noise source at another node. Noise performance of the asymmetrical array is much worse for intermediate frequencies because there is no feedback, so errors propagate forever. In the symmetrical array, the feedback limits the influence of preceding stages, and this in turn attenuates noise. For this reason, networks with feedback are preferred, despite the more complicated stability calculation.
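The difference in how a disturbance spreads through one-way and two-way connected arrays can be illustrated with a deliberately simplified simulation. The Python sketch below is not the paper's model: it assumes a first-order discrete-time phase update with no loop filter, a hypothetical gain of 0.2, and a unit phase kick at the first oscillator of a four-node chain.

```python
def settle(n_nodes, bidirectional, gain=0.2, steps=400):
    """Return the final phases after a unit phase kick at node 0."""
    phase = [0.0] * n_nodes
    phase[0] = 1.0  # unit disturbance injected at the first oscillator
    for _ in range(steps):
        updated = list(phase)
        for i in range(n_nodes):
            error = 0.0
            if i > 0:                                # PD toward the previous node
                error += phase[i - 1] - phase[i]
            if bidirectional and i < n_nodes - 1:    # PD toward the next node
                error += phase[i + 1] - phase[i]
            updated[i] = phase[i] + gain * error
        phase = updated
    return phase


one_way = settle(4, bidirectional=False)  # each node follows only its predecessor
two_way = settle(4, bidirectional=True)   # each node also feeds back to it
print([round(p, 3) for p in one_way])
print([round(p, 3) for p in two_way])
```

In this toy model the one-way chain passes the full disturbance to every downstream node, while the two-way chain spreads it so that each node ends at one quarter of the kick. This is only a qualitative picture of the feedback argument made in the text.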
2) Two-Dimensional Array: A two-dimensional array is analyzed in exactly the same way as a one-dimensional array, except that the gain has to decrease by another factor of two because the center oscillators see four neighbors rather than two. A 16-element array in a $4 \times 4$ grid is implemented in this work. Its poles are shown in Fig. 12(b).
Fig. 11. Comparison of noise responses for symmetrical and asymmetrical networks.
Fig. 12. Root locus for 1-D and 2-D PLL arrays. (a) 1-D array. (b) 2-D array.
Fig. 13. Mode-locking example.
(6)
so this state is a stationary point. This is intuitively easy to see, in reference to Fig. 13: each oscillator leads one neighbor, and lags behind another neighbor by exactly the same amount. The net phase error is zero, so clearly there is no restoring force to drive the phases to 0. Because the nonlinearity does not change for small deviations from this state, dynamics about it are the same as those about 0 and hence this state is stable. The locking of a distributed oscillator to nonzero relative phases has been called mode-locking [9]. At startup, each oscillator in a distributed PLL starts at a random phase, so there is a nonzero chance of converging to a mode-locked state. Simulations show that for a network like the one shown here, the system ends mode-locked from a fraction of random initial states. The probability goes up rapidly with the size of the system; a larger array ends up mode-locked well over 99% of the time.
Pratt and Nguyen proved several useful properties about systems in mode-lock [9]. The key result, generalized for non-Cartesian networks, is that for a system in mode-lock, there must be a phase difference $\Delta\phi$ between two oscillators such that $|\Delta\phi| \ge 2\pi/n$, where $n$ is the number of nodes in the largest minimal loop in the network and a minimal loop is a loop in the graph that cannot be decomposed into multiple loops. This result suggests a way to distinguish between mode-locked states and the desired 0-phase state: in mode-lock, there must be at least one branch with a large phase error. If the gain of the PD is designed to be negative for a phase difference larger than $2\pi/n$, then all mode-locked states are made unstable without affecting the in-phase equilibrium. Pratt and Nguyen suggest that XOR PDs preclude mode-lock in a rectangular network of oscillators because the response decreases for phase errors larger than $\pi/2$ [9]. This result follows directly from the result derived above: in a rectangular array, the largest minimal loop has four nodes, so $2\pi/n = \pi/2$. A PD described in the next section would be useful in nonrectangular networks, and where more gain near 0 phase is desirable.
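The stationary mode-locked state can be checked numerically on a small example. The Python sketch below makes several assumptions that are not from the paper: a ring of four oscillators, an idealized linear phase detector whose output is the phase difference wrapped into the interval from -π to π, and phases spaced evenly around the ring.

```python
import math


def wrap(delta):
    """Wrap a phase difference into the interval [-pi, pi)."""
    return (delta + math.pi) % (2 * math.pi) - math.pi


def net_errors(phases):
    """Sum of the two neighbor phase errors seen by each oscillator in a ring."""
    n = len(phases)
    return [
        wrap(phases[(i - 1) % n] - phases[i]) + wrap(phases[(i + 1) % n] - phases[i])
        for i in range(n)
    ]


n = 4
# Hypothetical mode-locked state: each oscillator leads its neighbor by 2*pi/n.
mode_locked = [2 * math.pi * i / n for i in range(n)]
errors = net_errors(mode_locked)
largest_branch = max(
    abs(wrap(mode_locked[(i + 1) % n] - mode_locked[i])) for i in range(n)
)
print(errors, largest_branch)
```

Each oscillator sees equal and opposite errors from its two neighbors, so the net error is zero even though every branch carries a phase difference of 2π/n. A detector whose gain turns negative beyond that branch error would, by the argument in the text, destabilize this state.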
IV. IMPLEMENTATION
(5)
each only to a small section of the chip (tile) (Fig. 8). PDs at the boundaries between tiles produce error signals that are summed by an amplifier in each tile and used to adjust the frequency of the node oscillator.
Because the proposed network has many nodes, the power and size constraints on each node are even more stringent than the constraints on a single global PLL. The oscillator, PD, and loop filter of a working demonstration chip, fabricated in a standard 0.35-µm single-poly triple-metal process, are considered in turn below.
A. Oscillator
The demonstration chip used an nMOS-loaded differential ring oscillator as a voltage-controlled oscillator (VCO) (Fig. 14). Each differential inverter comprises a differential pair, a tail-current transistor, and nMOS load transistors. The nMOS loads allow fast oscillation and shield the output signal from supply noise.
is a low-pass version of
generated by subthreshold
; supply noise coupling in through
leakage through PFET
of
is bypassed by
. The oscillation frequency
is only dependent on the supply voltage through capacitor
, and feedback
nonlinearity and the output conductance of
and
.
of the PLL compensates drift of
B. Phase Detector (PD)
The PD, shown in Fig. 15, has a sufficient nonlinearity, higher
gain at small input phase difference and less high-frequency
) is an nMOScontent than an XOR PD. The core (
PD was placed between one of the nodes and the chip clock
input to lock the network to an external reference. The output
of the 16 oscillators was divided by 64 and driven off chip. At
V, the divided outputs were seen to be frequency
locked at 17 to 21 MHz, corresponding to oscillator phase lock
at 1.1 to 1.3 GHz. An oscilloscope plot of four locked output
signals is shown in Fig. 19. Long-term jitter between neighbors
is less than 30 ps. Cycle-to-cycle jitter is less than 10ps. The
oscillators, amplifiers and all the biasing draws 130 mA at 3 V.
A chip plot is shown in Fig. 20. (The rest of the area on the
mm
mm chip is taken up by test circuits.)
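The measured numbers above can be cross-checked with simple arithmetic. The Python snippet below multiplies the divided output frequencies by the divide ratio of 64 and computes the power drawn from the quoted current and voltage; the function and variable names are illustrative only.

```python
DIVIDE_RATIO = 64  # the oscillator outputs were divided by 64 before leaving the chip


def oscillator_freq_ghz(divided_freq_mhz):
    """Convert a measured divided-output frequency back to the oscillator frequency."""
    return divided_freq_mhz * DIVIDE_RATIO / 1000.0


low_ghz = oscillator_freq_ghz(17.0)
high_ghz = oscillator_freq_ghz(21.0)
power_w = 0.130 * 3.0  # 130 mA drawn at 3 V

print(low_ghz, high_ghz, power_w)
```

The divided outputs of 17 and 21 MHz correspond to about 1.09 and 1.34 GHz, in line with the quoted 1.1 to 1.3 GHz, and the oscillators, amplifiers, and biasing together draw about 0.39 W.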
VI. CONCLUSION
Design and measurements on this chip confirm that generating and synchronizing multiple clocks on chip is feasible. Neither the power nor the area overhead of multiple PLLs is substantial compared to the cost of distributing the clock by conventional means. Most importantly, a distributed clock network
can take advantage of improved devices by shrinking the size of
the cells, lowering the overall skew and jitter, so performance
will scale with the speed of devices, rather than with the much
slower improvement of on-chip interconnect speed.
REFERENCES
[1] D. W. Bailey and B. J. Benschneider, "Clocking design and analysis for a 600-MHz Alpha microprocessor," J. Solid State Circuits, vol. 33, no. 11, pp. 1627–1633, Nov. 1998.
[2] C. F. Webb, "A 400-MHz S/390 microprocessor," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 168–169.
[3] T. Yoshida, "A 2-V 250-MHz multimedia processor," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 266–267.
[4] I. A. Young, M. F. Mar, and B. Bhushan, "A 0.35-µm CMOS 3–880-MHz PLL N/2 clock multiplier and distribution network with low jitter for microprocessors," in ISSCC Dig. Tech. Papers, Feb. 1997, pp. 330–331.
[5] H. B. Bakoglu, J. T. Walker, and J. D. Meindl, "A symmetric clock-distribution tree and optimized high-speed interconnections for reduced clock skew in ULSI and WSI circuits," in IEEE Int. Conf. Computer Design, NY, Oct. 1986, pp. 118–122.
[6] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, "Characterization and modeling of clock skew with process variations," in Proc. IEEE 1999 Custom Integrated Circuits Conf., pp. 441–444.
[7] G. Geannopoulos and X. Dai, "An adaptive digital deskewing circuit for clock distribution networks," in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 400–401.
[8] F. Anceau, "A synchronous approach for clocking VLSI systems," J. Solid State Circuits, vol. SC-17, no. 1, pp. 51–56, Feb. 1982.
[9] G. A. Pratt and J. Nguyen, "Distributed synchronous clocking," IEEE Trans. Parallel and Distributed Systems, Mar. 1995.
Vadim Gutnik (M'00) received the B.S. degree in electrical engineering and materials science from the University of California, Berkeley, in 1994, and the S.M. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1996 and 2000, respectively.
Previous research interests have included micromechanical resonators and variable-voltage power supplies. He is currently working as a Design Engineer at Silicon Laboratories, Austin, TX.
Dr. Gutnik received an NDSEG fellowship in 1994, and the Intel Foundation Fellowship in 1997.
difference increases from zero, one output is asserted for the full duration of an input pulse, while the other output is asserted for only the remainder of the input pulse duration after the first input pulse ends, which is equal to the input phase difference. Thus the detector has high gain near zero phase error that drops off to zero as the input phase difference approaches the input pulse width (Figure 10.5.4).
Auknowlcdgii I C RIS:
The nutliors ~clmowlrxlge~ ~ p p from
~ r tlie
t
MRR.CO Focused Resenrch Contcr mi Interconnects fuiidod nlrM12 through a suhconLrnct from Georgia Tccli. Vodim Gutnilr was partly nuppo~.tedby a
gmdutite fellowship from 1nt.el Corp.
0-7803-5853~8/00/$10.00
02000 IEEE
[Figure residue: tile layout (chip boundary, tile boundary, loop filter, VCO) and schematic labels (M2, M4, M7, Vout); plot against simulation time (microseconds).]
[Figure residue: plot against spacing between signal line and ground line.]
[Table residue: rows include number of channels and sampling rate. Chip is 2 × 8 mm².]
I. INTRODUCTION
Fig. 1. Block diagram of (a) a conventional DLL and (b) a DLL locking
operation and operating frequency range limitation.
(1)
(2)
The range of stuck-free clock period is determined by inequality (2). If the target clock period satisfies inequality (2), the DLL works without the stuck problem. However, it should be noted that inequality (2) attains its maximum range only under a specific relation between the minimum and maximum VCDL delays. In addition, under another relation between these delays, there is no clock period that satisfies inequality (2), and the DLL is prone to the stuck problem. Since the PVT variations of the VCDL delay can be as much as 2:1 in a typical CMOS process, the stuck-free condition can be satisfied over only a very narrow range of clock periods, and thus a time-consuming and tedious circuit trimming job is required when process migrations are performed across different processes.
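To make the stuck-free window concrete, the Python sketch below evaluates it for a few delay ranges. The inequality itself did not survive extraction here, so the code assumes the commonly used form for a conventional DLL, in which the minimum VCDL delay must lie between half a period and one period and the maximum delay between one and one and a half periods. That form, the function name, and the example numbers are assumptions, not taken from the text.

```python
def stuck_free_range(t_min, t_max):
    """Return the (low, high) clock-period window with no stuck problem, or None.

    Assumed condition: 0.5*T < t_min < T and T < t_max < 1.5*T, which gives
    max(t_min, 2*t_max/3) < T < min(2*t_min, t_max).
    """
    low = max(t_min, 2.0 * t_max / 3.0)
    high = min(2.0 * t_min, t_max)
    return (low, high) if low < high else None


# Hypothetical VCDL delay ranges, in arbitrary time units.
print(stuck_free_range(1.0, 1.5))  # a usable window exists
print(stuck_free_range(1.0, 2.0))  # a narrower window
print(stuck_free_range(1.0, 3.0))  # delay range too wide: no window at all
```

Under this assumed condition, widening the delay range does not widen the usable clock-period window and eventually removes it, which is the limitation the paragraph describes.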
B. Digital DLL's and Dual-Loop DLL's
Digital DLL's [3], [4] have been developed to overcome the
narrow frequency range problem of conventional analog DLL's.
A simplified block diagram of a typical digital DLL [3] is outlined in Fig. 2(a). Multistage delay cells in the VCDL provide
fixed and quantized delay times. The finite-state machine selects one clock output with closest phase to the reference clock's
by using digital control bits instead of using an analog control
voltage.
Therefore, major drawbacks in the digital DLL's are large
skew due to quantized delay time and large jitter due to control-bit updates during operation. To increase the resolution and
cover a wide delay range, a large delay cell array must be used,
and that inevitably increases chip area and power consumption.
In order to cope with these problems, Garlepp et al. [4] proposed a phase blending technique in a hierarchical structure for
improved phase resolution. However, the inherent problems of
digital DLL's are not solved entirely.
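The quantization skew of a digital DLL can be illustrated with a small model. The Python sketch below is a simplification under stated assumptions: the delay line has equal fixed unit delays, and the finite-state machine is reduced to picking the tap whose total delay is closest to the reference period. The function name and the numbers are hypothetical.

```python
def closest_tap(t_ref, t_unit, n_taps):
    """Pick the delay-line tap closest to one reference period.

    Returns (tap_index, residual_skew), where tap k has a delay of k * t_unit.
    """
    best = min(range(1, n_taps + 1), key=lambda k: abs(k * t_unit - t_ref))
    return best, best * t_unit - t_ref


# Hypothetical numbers: 1.0 ns unit delay, 10.3 ns reference period, 16 taps.
tap, skew = closest_tap(t_ref=10.3, t_unit=1.0, n_taps=16)
print(tap, skew)
```

When the delay line is long enough to cover the reference period, the residual skew is bounded by half a unit delay. Reducing it therefore requires finer and more numerous delay cells, which is the area and power trade-off noted in the text.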
Dual-loop DLL's [6], [7] have been proposed to minimize the
problems of digital DLL's. A simplified block diagram of architecture proposed in [6] is shown in Fig. 2(b). A fine delay line,
which is analog controlled, is attached to a digital DLL in the
subsequent stage. In [7], a phase interpolator is cascaded to a
digital DLL for unlimited phase capture range. These dual-loop
architectures achieve both a wide frequency range and relatively
low jitter performance. However, due to digital DLL's inherent
nature, jitter histogram of the generated clock shows the superposition of two Gaussian distributions [7] resulting from the
control-bit updates. In addition, the overhead of chip area and
power consumption is significant.
Fig. 5. (a) Delay capture range of the replica delay line and (b) gain curve of
CSPD.
(3)
or equivalently in terms of
Max
(4)
If the delay range of the controlled delay cell stays within a certain bound, the DLL will have a frequency range determined by the entire delay range of the delay cell. However, even if we make the delay range wider than that bound in an effort to increase the frequency range, the lock range is limited to only 7:1.
In some applications where the frequency range must be larger than 7:1, changing the pump-current ratio of the CSPD can make the frequency range wider. For example, frequency ranges of 9:1 and 11:1 can be obtained with suitable pump-current ratios.
In high-frequency operations, the relevant time intervals may be too short to drive the XNOR gate. So, a divide-by-two circuit and a pair of delay cells are used to slow down the frequency of Ref-CLK [11]. The new configuration shown in Fig. 6 is effectively the same as the one in Fig. 4 but offers more robust operation at high frequencies.
C. Core DLL
Fig. 7 shows a simplified block diagram of the core DLL. It
consists of a VCDL, a dynamic phase detector, a charge pump,
and a loop filter. The core DLL generates eight-phase clock outputs through eight delay cells (DC's) in the VCDL. The core
DLL is the same as a conventional analog DLL except that it
has another control voltage Vcr. Vcr from the replica delay line coarsely determines the delay time of the VCDL so that it is equal to the clock period in the locked state. In the locked state, the eighth clock output, CLK7 in Fig. 7, is aligned with Ref-CLK.
In high-frequency operations, there may be some static phase
mismatch between CLK7 and Ref-CLK due to the long rise/fall
times of signal transition edges compared with the period of
the clock, so fine-tuning is required. The dynamic phase
detector (PD) in the core DLL generates control signal Vcp, fine-tunes
…, and removes residual phase mismatch so that the
rising edge of Ref-CLK is exactly aligned with that of CLK7.
Fig. 8. (a) Core DLL with cell-level duty-cycle correction and (b) rising and
falling edge alignment.
Fig. 10.
Fig. 11.
Fig. 9. Triply controlled DC. (a) Circuit diagram of a DCE and (b)
configuration of a triply controlled DC.
the duty cycle of Ref-CLK only. With cell-level duty cycle correction, not only CLK7 but also the other intermediate clock outputs maintain a 50% duty cycle without any additional circuits.
Although two control voltages Vcp and Vduty are simultaneously adjusted in the coupled negative feedback loops, the stability is guaranteed by making one of its loops have a sufficiently
low bandwidth.
IV. CIRCUIT DESIGN
A. Triply Controlled Delay Cell
According to the noise analyses of [12] and [13], a
fast-slewing (short rise/fall time) delay cell with a fully
switching capability offers less phase noise. Although offering
a full swing output, a shunt-capacitor delay cell [14], with its
capacitor, would increase the chip area and power. Therefore,
we decided to use the current-starved inverter [15] as a basic
controlled delay cell. Since the current-starved inverter does
not require a level conversion circuit, which is required for a
differential delay cell, it has less chip area and power, although
substrate and supply noise might cause detrimental influence.
A triply controlled delay cell is used as the basic delay cell
element (DCE). The circuit diagram of the DCE and the configuration of one unit of DC are shown in Fig. 9. Four DCE's and
two inverters compose a DC and make its rising/falling delay
times symmetric. The delay time … of the triply controlled
delay cell is determined by six control signals: Vcr, Vcr_b, Vcp,
Vcp_b, Vduty, and Vduty_b. Of those signals, Vcr and Vcr_b
come from the replica delay line. In the DCE, the sizes of
MP1 and MN1 are made larger than the others' so that Vcr and
Vcr_b can control … and … primarily. The other control
signals, which are generated by the core DLL, make only small
adjustments to … and … for the fine-tuning of … .
Vcp and Vcp_b are used to align the rising edges of Ref-CLK
and CLK7. Vduty and Vduty_b are responsible for maintaining
the correct duty cycle and, thus, aligning the falling edges.
Fig. 12.
Clock waveforms at 62.5 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.
with a conventional bang-bang type of phase detector or a proportional phase detector with wider pulse width.
V. EXPERIMENTAL RESULTS
The test chip has been fabricated using a 0.35-µm, N-well,
triple-metal CMOS process. The threshold voltages in this
process are 0.42 V (NMOS) and 0.22 V (PMOS). The
gate-oxide thickness is 75 nm. Fig. 11 shows a microphotograph of the fabricated chip. The chip integrates the DLL with
an on-chip decoupling capacitance of 270 pF. The active area
of the DLL occupies 0.08 mm2 and the decoupling capacitor
0.12 mm2. Since the pulse currents of the multiphase clock
outputs are interspersed, the ac component of the supply
current is present at the eighth harmonic frequencies of the
clock. Therefore, the 270-pF on-chip capacitor is adequate to
reduce the on-chip supply noise induced by switching of digital
circuits.
The prototype chip operates from 62.5 to 250 MHz with a
3.3-V power supply. Fig. 12(a) shows the waveforms of CLK0
and CLK2 at 62.5 MHz. These clock outputs are the first and
the third clocks, respectively, and have a 90° phase difference.
Fig. 12(b) shows the waveforms of CLK0 and CLK4, which
are an inversion of each other with a 180° phase difference.
Fig. 13(a) and (b) shows the same waveforms at 250 MHz. In
spite of some ringing due to capacitance and inductance of the
board and measurement instrument, the measurement results
show that the clock outputs are aligned with precise phase relationships of less than 1% error over an operating frequency
range from 62.5 to 250 MHz. The delay range of the VCDL is
estimated to be between 4 and 16 ns. With minor change of device sizes of the VCDL, the operating frequency range could be
extended toward a higher frequency range.
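The relationship between the reference frequency, the VCDL delay, and the tap phases can be checked with a short calculation. The sketch below assumes an ideal locked loop in which the eight delay cells split one reference period equally and CLK0 is the output of the first cell; the function name `tap_phases` is illustrative and not taken from the paper.

```python
def tap_phases(ref_mhz, n_cells=8):
    """Return (period_ns, cell_delay_ns, phases_deg) for an ideal locked DLL.

    Assumption: the total VCDL delay equals one reference period and the
    delay cells are identical, so tap k (CLK0..CLK7) lags Ref-CLK by
    (k + 1) / n_cells of a period.
    """
    period_ns = 1e3 / ref_mhz
    cell_delay_ns = period_ns / n_cells
    phases_deg = [(k + 1) * 360.0 / n_cells for k in range(n_cells)]
    return period_ns, cell_delay_ns, phases_deg


for f_mhz in (62.5, 250.0):
    period, cell, phases = tap_phases(f_mhz)
    print(f_mhz, period, cell, phases[2] - phases[0], phases[4] - phases[0])
```

Under these assumptions the total delay is 16 ns at 62.5 MHz and 4 ns at 250 MHz, which matches the estimated 4-to-16-ns delay range, and the CLK0/CLK2 and CLK0/CLK4 spacings come out as 90° and 180°.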
Fig. 14(a) and (b) shows the jitter histograms in the clock
output CLK7. The frequency of Ref-CLK is 250 MHz. Fig. 14(a)
shows 4-ps rms and 29-ps peak-to-peak jitter characteristics
in a quiet power supply, where only the DLL is activated in
the chip. When other digital circuits are turned on, rms and
peak-to-peak jitter are increased to 6.4 and 44 ps, respectively,
and internal supply noise of about 200 mV is measured. If a
500-mV, 1.1-MHz square wave is injected externally on the
power supply, the peak-to-peak jitter increases to 83 ps, as
shown in Fig. 14(b). At 250 MHz, jitter supply sensitivity is
measured to be only 0.11 ps/mV. Furthermore, from 62.5 to 250
MHz, the clock outputs show almost flat jitter performance.
Since the delay range of the VCDL in the core DLL is primarily
set by Vcr and Vcr_b, the gain of the VCDL is nearly flat over
a wide range of operating frequency. The jitter performance of
the proposed DLL is better than or at least comparable to other
wide-range DLL's [2]–[7].
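The quoted supply sensitivity can be cross-checked from the two peak-to-peak jitter figures. The sketch below assumes that sensitivity is defined as the increase in peak-to-peak jitter divided by the amplitude of the injected supply noise; the paper does not spell out the definition, so this is only one plausible reading.

```python
def supply_sensitivity_ps_per_mv(jitter_quiet_ps, jitter_noisy_ps, noise_mv):
    """Jitter increase per millivolt of injected supply noise (assumed definition)."""
    return (jitter_noisy_ps - jitter_quiet_ps) / noise_mv


# 29 ps p-p on a quiet supply, 83 ps p-p with a 500-mV square wave injected.
sensitivity = supply_sensitivity_ps_per_mv(29.0, 83.0, 500.0)
print(round(sensitivity, 3))  # 0.108
```

The result of about 0.108 ps/mV agrees with the reported value of 0.11 ps/mV.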
Table I summarizes the DLL performance characteristics.
The power dissipation is proportional to the operating frequency. Operating at 250 MHz, the DLL draws 12.6-mA dc
from a 3.3-V power supply.
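As a worked example of the power figures, the dc power at 250 MHz follows directly from the supply voltage and current. The scaling to other frequencies below is only a sketch that takes the stated proportionality literally (no frequency-independent static term); the helper names are illustrative.

```python
def dc_power_mw(supply_v, current_ma):
    """DC power in milliwatts from supply voltage (V) and current (mA)."""
    return supply_v * current_ma


def scaled_power_mw(power_ref_mw, f_ref_mhz, f_mhz):
    """Power at f_mhz, assuming power is strictly proportional to frequency."""
    return power_ref_mw * f_mhz / f_ref_mhz


p_250 = dc_power_mw(3.3, 12.6)
p_62_5 = scaled_power_mw(p_250, 250.0, 62.5)
print(round(p_250, 2), round(p_62_5, 2))
```

With the measured 12.6 mA at 3.3 V this gives about 41.6 mW at 250 MHz; the 62.5-MHz value is an extrapolation, not a measurement.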
Fig. 13.
Clock waveforms at 250 MHz. (a) CLK0, CLK2 and (b) CLK0, CLK4.
Fig. 14.
Jitter histograms at 250 MHz in (a) a quiet supply and (b) with added 1.1-MHz, 500-mV square wave noise.
TABLE I
PERFORMANCE CHARACTERISTICS OF PROTOTYPE CHIP
VI. CONCLUSION
By including a replica delay line with a CSPD, the core DLL
operates in a wide frequency range from 62.5 to 250 MHz. Since
the replica delay line occupies a quarter of the area of the core
DLL, the area cost and power consumption of the prototype
chip are much smaller than those of other wide-range DLL's
[2]–[7]. Both the analog-control scheme and the flat gain of
the VCDL offer a low-jitter performance of 4-ps rms and 29-ps
peak-to-peak, and a low supply sensitivity of 0.11 ps/mV. The
DLL incorporates dynamic phase detectors and triply controlled
delay cells with cell-level duty-cycle correction capability in
order to generate equally spaced eight-phase clocks.
The DLL can be used not only as an internal clock buffer of
microprocessors and memory IC's but also as a multiphase clock
generator for gigabit serial interfaces. With a faster VCDL with
minor change of device sizes, the DLL will operate at a higher
and wider frequency range.
REFERENCES
[1] M. Johnson and E. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. 23, pp. 1218–1223, Oct. 1988.
[2] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 Megabyte/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[3] A. Efendovich, Y. Afek, C. Sella, and Z. Bikowsky, "Multifrequency zero-jitter delay-locked loop," IEEE J. Solid-State Circuits, vol. 29, pp. 67–70, Jan. 1994.
[4] B. W. Garlepp, K. S. Donnelly, J. Kim, P. S. Chau, J. L. Zerbe, C. Huang, C. V. Tran, C. L. Portmann, D. Stark, Y.-F. Chan, T. H. Lee, and M. A. Horowitz, "A portable digital DLL for high-speed CMOS interface circuits," IEEE J. Solid-State Circuits, vol. 34, pp. 632–644, May 1999.
[5] S. Tanoi, T. Tanabe, K. Takahashi, S. Miyamoto, and M. Uesugi, "A 250–622 MHz deskew and jitter-suppressed clock buffer using two-loop architecture," IEEE J. Solid-State Circuits, vol. 31, pp. 487–493, Apr. 1996.
[6] K. Lee, Y. Moon, and D.-K. Jeong, "Dual loop delay-locked loop," U.S. patent pending.
[7] S. Sidiropoulos and M. A. Horowitz, "A semi-digital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[8] S. Kim, K. Lee, Y. Moon, D.-K. Jeong, Y. Choi, and H. K. Lim, "A 960-Mb/s/pin interface for skew-tolerant bus using low jitter PLL," IEEE J. Solid-State Circuits, vol. 32, pp. 691–700, May 1997.
[9] D.-L. Chen and M. O. Baker, "A 1.25 Gb/s, 460 mW CMOS transceiver for serial data communication," in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 242–243.
[10] Y. Moon, D.-K. Jeong, and G. Kim, "Clock dithering for electromagnetic compliance using spread spectrum phase modulation," in IEEE ISSCC Dig. Tech. Papers, Feb. 1999, pp. 186–187.
[11] Y. Moon, J. Choi, K. Lee, D.-K. Jeong, and M.-K. Kim, "A 62.5–250 MHz multi-phase delay-locked loop using a replica delay line with triply controlled delay cells," in Proc. IEEE Custom Integrated Circuits Conf., May 1999, pp. 299–302.
[12] B. Kim, "High speed clock recovery in VLSI using hybrid analog/digital techniques," Ph.D. dissertation, Univ. of California, Berkeley, Memo. UCB/ERL M90/50, June 1990.
[13] A. Hajimiri, S. Limotyrakis, and T. H. Lee, "Jitter and phase noise in ring oscillators," IEEE J. Solid-State Circuits, vol. 34, pp. 790–804, June 1999.
[14] M. Bazes, "A novel precision MOS synchronous delay line," IEEE J. Solid-State Circuits, vol. SC-20, pp. 1265–1271, Dec. 1985.
[15] D.-K. Jeong, G. Borriello, D. A. Hodges, and R. H. Katz, "Design of PLL-based clock generation circuits," IEEE J. Solid-State Circuits, vol. SC-22, pp. 255–261, Apr. 1987.
Yongsam Moon (S'96) was born in Incheon, Korea, on March 1, 1971. He received the B.S. and M.S. degrees in electronics engineering from Seoul National
University, Seoul, Korea, in 1994 and 1996, respectively, where he is currently
pursuing the Ph.D. degree.
He has been working on architectures and CMOS circuits for microprocessors. His current research interests include clock and data recovery for high-speed communication and high-speed I/O interface circuits.
Jongsang Choi was born in Korea on September 11, 1974. He received the B.S.
and M.S. degrees in electronics engineering from Seoul National University,
Seoul, Korea, in 1997 and 1999, respectively, where he is currently pursuing
the Ph.D. degree.
He has been working on architectures and CMOS circuits for high-speed communication. His current research interests include high-speed CMOS circuits
and gigabit network systems.
Deog-Kyoon Jeong (S'87–M'89) received the B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, Korea, in 1981 and
1984, respectively, and the Ph.D. degree in electrical engineering and computer
sciences from the University of California at Berkeley, Berkeley, CA, in 1989.
From 1989 to 1991, he was with Texas Instruments Incorporated, Dallas, TX,
where he was a Member of the Technical Staff. He worked on modeling and
design of BiCMOS circuits and single-chip implementation of the SPARC architecture. Since 1991, he has been on the Faculty of the School of Electrical Engineering, Seoul National University, Seoul, Korea, as an Associate Professor.
His research interests include high-speed circuits, microprocessor architectures,
and memory systems.
Min-Kyu Kim was born in Seoul, Korea, in 1965. He received the B.S., M.S.,
and Ph.D. degrees in electronics engineering from Seoul National University,
Seoul, Korea, in 1988, 1990, and 1998, respectively.
From 1995 to 1996, he was with the Electronics and Telecommunications
Research Institute, Taejon, Korea, working on the development of high-speed
communication IC's for ATM switches. Since 1998, he has been working on
high-speed serial link technologies at Silicon Image, Inc., Cupertino, CA. His
current interests include circuit design for high-speed communication systems
and digital-interface display systems.
I. INTRODUCTION
Fig. 2. (a) Three-stage VCDL. (b) Waveforms with correct lock. (c)
Waveforms with false lock.
FOLEY AND FLYNN: CMOS DLL-BASED CLOCK SYNTHESIZER AND TEMPERATURE-COMPENSATED TUNABLE OSCILLATOR
Fig. 4.
Fig. 7.
(a)
(b)
Fig. 5. Nine-stage VCDL waveforms with (a) correct lock and (b) false lock.
from the lock-detect circuit disables the phase detector and activates the up control signal. Similarly, the lock detect under
output activates dn. Following power-on reset the lock-detect
circuit is initialized by setting over active. This ensures a faster
acquisition time for the DLL because the filter capacitor is continuously charged to a voltage level corresponding to 1.25 reference clock periods. At this VCDL delay, the release signal is
activated and the phase detector gains control of the loop and
brings the DLL to lock. The state of the output phases corresponding to a delay of nine reference clock periods is the same
as that corresponding to a single reference clock period delay.
This circuitry is therefore only capable of detecting incorrect
delays up to eight periods of the reference clock. This is not a
limitation of the design as any delays above this would be outside the delay range of the VCDL. In general, the error detection
logic can detect an incorrect lock delay up to N − 1 periods of
the reference clock, where N is equal to the number of VCDL
output phases.
D. Voltage-Controlled Delay Line (VCDL)
Fig. 6 shows one of the VCDL delay stages. The stage is
designed to operate from a supply as low as 1.8 V and is similar
to that used in [7]. The stage propagation delay is proportional
to the tail current for the output charging and to the voltage-controlled resistor (VCR) resistance for the output discharging.
Fig. 8.
Fig. 9.
synthesizer employs the DLL structure shown in Fig. 3 to generate the multiple clock phases that are then combined to produce the output clock. There are two steps in the generation of
the output clock. The first step combines the nine DLL output
phases, (1:9), to generate three clocks ck1, ck2, and ck3. Fig. 8
shows the clock waveforms. These three clocks are phase separated by one-ninth of a reference clock period and have a frequency three times that of the reference clock. Fig. 9 shows how
the 1, 4, and 7 output phases are combined in an optimized
AND-OR structure with symmetrical delays to generate the ck1
clock. Using identical logic the 2, 5, and 8 phases produce
the ck2 clock and the 3, 6, and 9 phases produce the ck3
clock.
The second step in generating the synthesizer output clock
is to combine these three clocks in another AND-OR structure
to produce a differential output clock, … and …, running at
nine times the reference clock frequency; see Fig. 8. This design
produces a 1.62-GHz output clock frequency for a 180-MHz
reference clock frequency. For a 0.5-µm 3.3-V CMOS process
there is a bandwidth limitation of approximately 500 MHz for
reliable on-chip clock transmission [8]. The high bandwidth
available at the chip outputs is utilized (determined by the external pull-up resistor and load capacitance) [8] to produce the
1.62-GHz clock as shown in Fig. 10. The AND function of the
clock generation is performed in the chip core and the analog OR
function is performed in the I/O ring. External pull-up resistors
set the output swing and match the output impedance to that of
the test equipment. Damping resistors are included to avoid any
oscillations resulting from the combination of the lead and pin
inductance and load capacitance. This removes the necessity to
double bond these high-frequency outputs.
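The two-step edge combination can be illustrated with a small discrete-time model. This is a sketch under stated assumptions, not the gate-level circuit of Fig. 9: the nine taps are modeled as ideal 50% duty-cycle clocks spaced by one-ninth of the reference period, and the AND-OR structure is assumed to be the three-input majority function (a·b + b·c + c·a), which is one AND-OR combination that triples the frequency of three equally spaced phases. All names are illustrative.

```python
T = 180  # time steps per reference period (chosen to be divisible by 18)


def phase(i, t):
    """Tap i (1..9): ideal 50% duty-cycle clock delayed by i * T / 9."""
    return ((t - i * T // 9) % T) < T // 2


def majority(a, b, c):
    """AND-OR combination: (a AND b) OR (b AND c) OR (c AND a)."""
    return (a and b) or (b and c) or (c and a)


def ck(k, t):
    """Intermediate clock k (1..3), built from taps k, k + 3, and k + 6."""
    return majority(phase(k, t), phase(k + 3, t), phase(k + 6, t))


def out(t):
    """Output clock: the same AND-OR structure applied to ck1, ck2, ck3."""
    return majority(ck(1, t), ck(2, t), ck(3, t))


def rising_edges(signal):
    """Count rising edges of a periodic signal over one reference period."""
    return sum(1 for t in range(T) if signal(t) and not signal((t - 1) % T))


edges_ck1 = rising_edges(lambda t: ck(1, t))
edges_out = rising_edges(out)
print(edges_ck1, edges_out)  # 3 9
```

In this model ck1 shows three rising edges and the output nine rising edges per reference period, consistent with a 180-MHz reference producing a 1.62-GHz output.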
Fig. 11.
Fig. 12.
Fig. 13.
Fig. 14.
Fig. 15.
Fig. 16.
Fig. 17.
Fig. 18.
= 1:8 V.
composed of the same delay stages as the VCDL and its temperature (and process) variations will therefore be the same (apart
from some minor random mismatch effects and thermal gradients across the die). vcntl thus compensates for the VCDL and
VCO temperature fluctuations. The last VCO stage has an additional tuning voltage, tune, which fine tunes the VCO frequency.
By varying the tune voltage it is possible to tune the VCO center
frequency to within 3%. A wider tuning range can be achieved
by varying the frequency of the DLL reference clock, ckref.
The schematic of the last VCO stage is shown in Fig. 12. This
stage is identical to the other VCO and VCDL stages except that
the VCR contains a transistor which is connected to the external
tune voltage. In all other stages this transistor is connected to
ground. The extra charging current required in this VCO stage
is provided by the controlled current source bias.
V. CIRCUIT LAYOUT
The synthesizer and temperature-compensated tunable
oscillator were fabricated on a standard 0.5-µm triple-metal
single-poly digital CMOS process. The die photomicrograph
of the device, containing both the synthesizer and temperature-compensated tunable oscillator, is shown in Fig. 13. The
synthesizer has an active area of 0.6 mm² and the temper-
TABLE I
MEASURED SYNTHESIZER CHARACTERISTICS
the tune voltage. As can be seen from the plot, the relationship
is close to linear. It is possible to tune the frequency around a
center frequency in the range from 200 to 500 MHz by selecting
an appropriate input reference frequency. This ensures that this
scheme can be used for a wide variety of applications. The measured jitter on the 400-MHz output was 29 ps rms and 180 ps
peak-to-peak. Table I shows the measured synthesizer characteristics. Table II summarizes the measured characteristics of
the temperature-compensated tunable oscillator.
VII. CONCLUSION
In this paper, a robust self-correcting low-jitter DLL was used
as the basis for a low-voltage high-frequency synthesizer and a
temperature-compensated tunable oscillator. The DLL does not
require the VCDL control voltage to be set on power-up. The
DLL can recover from missing reference clock pulses and it
can track step changes in a variable reference clock frequency.
The synthesizer has significantly lower edge jitter than the traditional PLL-type synthesizer [9] and other reported DLL circuits
[10], [11]. The temperature-compensated tunable oscillator provides a temperature-stable tunable frequency that varies by just
1.8% over the 0 °C to 85 °C temperature range.
TABLE II
MEASURED TUNABLE OSCILLATOR CHARACTERISTICS
ACKNOWLEDGMENT
The authors wish to acknowledge contributions from the
following Parthus Technologies employees: J. Ryan, J. Horan,
C. Cahill, F. Fuster, J. Collins, B. Kinsella, M. Erett, and
S. Murphy. The authors also wish to thank R. Fitzgerald from
the NMRC for the die photo micrographs. The device was fabricated on the ESM (Newport) Wafer Fab through Europractice.
REFERENCES
hibits better jitter performance than that reported for the higher
voltage DLLs (3.3-V supply, 0.35-µm CMOS, 4-ps rms jitter) in
[2] and (5-V supply, 0.7-µm CMOS, 10-ps rms jitter) in [3]. The
measured jitter (rms) variation versus synthesizer output frequency for a 3.3-V supply is shown in Fig. 15. With the supply
reduced to 1.8 V, the rms jitter was measured at 4.9 ps for an
output frequency of 720 MHz. Fig. 16 shows this 720-MHz synthesizer output. Mismatched propagation delays and interblock
routing in the frequency multiplication block (Fig. 9) resulted
in 100-ps interperiod jitter.
Fig. 17 shows the temperature-compensated tunable oscillator frequency variation with temperature. Varying the ambient
temperature from 0 °C to 85 °C resulted in a total frequency
variation of 1.8%. Fig. 18 shows the variation of frequency with
[1] B. Kim, T. C. Weingandt, and P. R. Gray, "PLL/DLL system noise analysis for low-jitter clock synthesizer design," in Proc. ISCAS, June 1994, pp. 151–154.
[2] Y. Moon, J. Choi, K. Lee, D. Jeong, and M. Kim, "An all-analog multiphase delay-locked loop using a replica delay line for wide-range operation and low jitter," IEEE J. Solid-State Circuits, vol. 35, pp. 377–384, Mar. 2000.
[3] M. Mota and J. Christiansen, "A high-resolution time interpolator based on a delay-locked loop and an RC delay line," IEEE J. Solid-State Circuits, vol. 34, pp. 1360–1366, Oct. 1999.
[4] D. Foley and M. Flynn, "CMOS DLL-based 2-V 3.2-ps jitter 1-GHz clock synthesizer and temperature compensated tunable oscillator," in Proc. IEEE Custom Integrated Circuits Conf., May 2000, pp. 371–374.
[5] G. Chien and P. R. Gray, "A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications," in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 202–203.
[6] H. Chen, E. Lee, and R. Geiger, "A 2-GHz VCO with process and temperature compensation," in Proc. ISCAS, June 1999, pp. 11 569–11 572.
[7] A. Young, J. K. Greason, and K. L. Wong, "A PLL clock generator with 5 to 110 MHz of lock range for microprocessors," IEEE J. Solid-State Circuits, vol. SC-27, pp. 1599–1607, Nov. 1992.
[8] M. Horowitz, C.-K. K. Yang, and S. Sidiropoulos, "High-speed electrical signaling: Overview and limitations," IEEE Micro, vol. 18, pp. 12–24, Jan./Feb. 1998.
[9] H. C. Yang, L. K. Lee, and R. S. Co, "A low-jitter 0.3–165 MHz CMOS PLL synthesizer for 3-V/5-V operation," IEEE J. Solid-State Circuits, vol. 32, pp. 582–586, Apr. 1997.
[10] J. G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[11] S. Sidiropoulos and M. A. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
4.1
Fuji Yang, Jay ONeill, Patrik Larsson, Dave Inglis, Joe Othmer
Agere Systems, Holmdel, NJ
2.5-3.125Gb/s serial links are commonly used for chip-to-chip
interconnects in high-speed network systems. In SONET OC-768
application, at least 16 on-chip SerDes transceivers are required
to guarantee total full duplex I/O throughput of 40Gb/s.
Published 2.5Gb/s SerDes transceivers consume between 150
and 200mW, not suitable for applications requiring hundreds of
on-chip SerDes transceivers [1]. Developing a low-power SerDes
transceiver is important for high throughput network ICs [2].
Another challenge is reduction of inter-channel noise coupling
when integrating many transceivers on the same chip. This low-power 8-channel SerDes macrocell employs a shared-PLL architecture. As shown in Figure 4.1.1, on the transmitter side, the
on-chip TxPLL provides a half-rate clock to all transmitters. On
the receiver side, the RxPLL distributes I- and Q-phase clocks to
8 receivers. Each receiver has a phase interpolator to generate
an output phase-aligned with the in-coming data for clock and
data recovery. Sharing a single PLL between a group of transmitters or receivers reduces the power and avoids the potential
multi-VCO coupling problem found in a conventional one-PLL-per-channel configuration. The macrocell realized in a 0.16µm
CMOS process consumes an average power of 86mW per channel
at 1.5V power supply.
The transmitter 16:1 or 20:1 serialization starts with 4 shift-register based selectable 4:1 or 5:1 multiplexers. Their 4 outputs are
sent to a tree-based 4:1 multiplexer (Figure 4.1.1). A pMOS CML
output driver with on-chip 50Ω terminations is employed. The
output signal referenced to the ground makes the interface independent of the power supply. The output amplitude is set to
1Vpp, diff.
The receiver employs an interleaved integrate-and-dump frontend (Figure 4.1.1) [3, 4]. The integrate-and-dump operation
improves the SNR and eliminates the quadrature clock required
in a conventional half-rate front-end [5]. The integrator outputs
are de-multiplexed by the decision-latches controlled respectively by ck2i and ck2q, which are divide-by-2 clocks of the recovered
clock. The decision-latch outputs d1-d4 are fed into 4 shift-registers to realize the 4:16 or 4:20 de-serialization. The integrator is
implemented in a way similar to that proposed in Reference
[4], but with a pMOS input stage. It has a gain of 2, allowing
relaxed offset and noise requirements of the latches. The receiver achieves 30mVpp,diff sensitivity with BER < 10⁻¹² at 2.5Gb/s.
The clock recovery is by a DLL based on an analog phase interpolator [6]. In contrast to the implementation in Reference [6], a
four-quadrant phase mixer is used here. Referring to Figure
4.1.2, the DLL consists of a bang-bang phase detector (PD), a PD
polarity control circuit, an amplitude control circuit, I- and
Q-charge-pumps and the four-quadrant mixer-based phase
interpolator. The analog phase interpolation is by mixing the I- and
Q-phase clocks from the RxPLL with respective weights
α (=Va-Vref) and β (=Vb-Vref): CLK = α*(I-IB) + β*(Q-QB). Va and Vb are
independently generated by I- and Q-charge-pumps. The weights
α and β, ranging from negative to positive, directly control the
quadrant changes. This eliminates the potential phase discontinuity at quadrant crossings found in the circuit of Reference [6].
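The way signed weights select the output phase across all four quadrants can be sketched numerically. The model below assumes ideal sinusoidal quadrature clocks, with the differential I clock proportional to cos(θ) and the differential Q clock proportional to sin(θ); the names `alpha` and `beta` stand for the two weights (Va − Vref) and (Vb − Vref), and the functions are illustrative rather than part of the design.

```python
import math


def interpolated_phase_deg(va, vb, vref):
    """Output phase (degrees, 0..360) of alpha*cos + beta*sin for signed weights."""
    alpha = va - vref  # weight on the differential I clock
    beta = vb - vref   # weight on the differential Q clock
    return math.degrees(math.atan2(beta, alpha)) % 360.0


def mixed_clock(va, vb, vref, theta_rad):
    """Weighted sum of ideal quadrature clocks at angle theta_rad."""
    return (va - vref) * math.cos(theta_rad) + (vb - vref) * math.sin(theta_rad)


# One point per quadrant, with Vref = 1.0 V taken as an arbitrary example value.
for va, vb in ((1.5, 1.5), (0.5, 1.5), (0.5, 0.5), (1.5, 0.5)):
    print(va, vb, round(interpolated_phase_deg(va, vb, 1.0), 1))
```

Because each weight passes continuously through zero when its control voltage crosses Vref, the phase given by atan2 moves smoothly from one quadrant to the next in this idealized model.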
Figure 4.1.3 shows the schematic of one 4-quadrant mixer, where
0-7803-7335-9
2002 IEEE
Figure 4.1.3: Four-quadrant mixer schematic.
Figure 4.1.4: Relation between the phase shift and the weights Va and Vb.
Technology:
Supply Voltage: 1.5V
Power dissipation
Active area
BER
Receiver Sensitivity: 30mVpp,diff
87.1psPkPk
400ppm at 2.5Gb/s
622-3125Mb/s
Tx output VDD sensitivity: 0.06ps/mV
Output Amplitude: 1Vpp,diff
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 33, NO. 12, DECEMBER 1998
I. INTRODUCTION
WITH the fast growth of the wireless application market, there is a growing need for smaller designs and
higher levels of integration for the reduction of costs and
size. Because of the very poor performance of integrated
resonators on silicon ICs, local RF oscillators are difficult
to integrate with regard to the phase noise requirements.
At the moment, external resonators are used with external
hyperabrupt tuning diodes. The integrated tuning diodes have
low performance because it is not possible to produce a
hyperabrupt pn-junction in our standard bipolar process. The
limiting factor is the inductor of the resonator, which can only
achieve quality factors of about four. Mobile telecommunication
standards like global system for mobile communication (GSM)
or digital communication system (DCS-1800) require a noise
performance that cannot be reached in our technology with a
fully integrated local oscillator, but the requirements
for cordless phones, such as those of the Digital European Cordless
Telecommunications (DECT) standard, seem to be achievable.
In a DECT system, the critical point concerning oscillator
phase noise is the emissions due to modulation. There the
emitted power at the output in the third adjacent channel is
specified to be smaller than 20 nW, which equals −47 dBm
[1]. With a maximum output power of 25 dBm, there is
a difference of 72 dB. The specified measurement filter to
get this power has a bandwidth of 1 MHz. So the noise
requirement at this point becomes −132 dB/Hz, which results
from: −47 dBm − 25 dBm − 60 dB Hz = −132 dB/Hz. This is the difference
1 Hz. The frequency offset for this specification is the start
frequency of the measurement filter. As this is in the third
Manuscript received April 10, 1998; revised July 17, 1998.
M. Zannoth, B. Kolb, and J. Fenk are with Siemens AG, Muenchen D81541, Germany.
R. Weigel is with the University of Linz, Institute for Communication and
Information Engineering, Linz A-4040 Austria.
Publisher Item Identifier S 0018-9200(98)08854-4.
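The DECT phase-noise requirement derived in the introduction can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (20-nW emission limit, 25-dBm output power, 1-MHz measurement bandwidth); the function and variable names are illustrative.

```python
import math


def watts_to_dbm(p_watts):
    """Convert a power in watts to dBm (decibels relative to 1 mW)."""
    return 10.0 * math.log10(p_watts / 1e-3)


emission_dbm = watts_to_dbm(20e-9)   # limit in the third adjacent channel
carrier_dbm = 25.0                   # maximum output power
filter_bw_hz = 1e6                   # measurement filter bandwidth

# Noise relative to the carrier, normalized to a 1-Hz bandwidth.
noise_db_per_hz = emission_dbm - carrier_dbm - 10.0 * math.log10(filter_bw_hz)
print(round(emission_dbm, 1), round(noise_db_per_hz, 1))  # -47.0 -132.0
```

The 1-MHz filter contributes 60 dB of bandwidth normalization, so the 72-dB carrier-to-emission difference becomes a requirement of about −132 dB/Hz.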
MHz
MHz
WITH
COUPLED INDUCTORS
III. RESULTS
To get the signal into a 50-Ω measurement system a buffer
is added (see Fig. 9). As the signal is taken directly from the
SUMMARY
Supply voltage
12 mA
Current of amplifier
6 mA
Output power
−8 dBm
Center frequency
1.96 GHz
−55 MHz/V
150 MHz
−102 dBc/Hz
−136 dBc/Hz
Suppression of harmonics
23 dB
Fig. 11.
2.7 V
Fig. 10.
TABLE I
VCO CHARACTERISTICS
OF THE
2 215 m
600 m 2 600 m
215 m
[5]
we get a value of −135 dB/Hz at an offset frequency of
4.7 MHz. The effective series resistance is about 16 Ω.
It is taken from the series resistances of the inductor, the
tuning diodes, and the coupling capacitances. The amplitude
of the oscillation was simulated to be 1.5 … (3 …)
at a resonance frequency of 2 GHz. The noise factor of the
amplifier was assumed to be about two. The expressions from
[3] give an approximation for the expected noise. For better
calculation, nonlinear effects have to be considered. In [12],
such calculations as the correlation between waveform and
phase noise are shown.
The simulation with SpectreRF shows nearly the same
values as the measured ones. In Fig. 11 the simulated and
measured noise are shown up to an offset frequency of 6 MHz.
In the simulation the equivalent circuits are taken from above.
The simulator uses nonlinear methods [13], [14] to calculate
noise. It gives nearly the same results as the measurement and
the small signal approximation from above.
The measurement shows that the free running oscillator has
a resonant frequency of 1.96 GHz, a phase noise of −136
IV. CONCLUSION
A fully integrated bipolar VCO is realized (see Fig. 12) that
achieves a measured phase noise of −136 dB/Hz at 4.7 MHz.
The oscillator has a linear tuning characteristic with a tuning
range of 150 MHz at a center frequency of 1.96 GHz. Further
characteristics are given in Table I.
In this design two metal layers are used to build vertically
coupled integrated inductors. These have quality factors of
about four. Integrated varactor diodes are implemented by
using base-emitter diodes of transistors. With this design the
noise requirements of the DECT specification of −132 dB/Hz
at 4.7 MHz frequency offset are achieved with a margin of
4 dB. The output power is −8 dBm at 50 Ω, with a center
frequency of 1.96 GHz. For the use of this oscillator in a DECT
product, the varactor capacitance will be increased until the
required center frequency of 1.88 GHz is reached. The design
has been realized in a standard high-volume bipolar process with
an f_T of 25 GHz.
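How much the tank capacitance has to grow to move the center frequency to the DECT band can be estimated from the LC resonance formula. This is a rough sketch that assumes an ideal LC tank with fixed inductance, so that frequency scales as 1/sqrt(LC); parasitics and the varactor's share of the total capacitance are ignored, and the function name is illustrative.

```python
def capacitance_ratio(f_now_ghz, f_target_ghz):
    """Required C_target / C_now for an ideal LC tank with fixed L.

    From f = 1 / (2 * pi * sqrt(L * C)) it follows that C scales as 1 / f**2.
    """
    return (f_now_ghz / f_target_ghz) ** 2


# Measured center frequency of 1.96 GHz, required DECT center frequency of 1.88 GHz.
ratio = capacitance_ratio(1.96, 1.88)
print(f"{(ratio - 1.0) * 100.0:.1f} %")  # 8.7 %
```

Under these assumptions the total tank capacitance would need to increase by roughly 9%; the increase in the varactor itself would be larger if fixed parasitic capacitance makes up part of the tank.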
REFERENCES
[1] ETSI, Digital European Cordless Telecommunications (DECT) Common Interface, Part 2: Physical Layer, Oct. 1992.
[2] L. L. Larson, RF and Microwave Circuit Design for Wireless Communications. Boston: Artech House, 1996.
[3] D. B. Leeson, "A simple model of feedback oscillator noise spectrum," Proc. IEEE (Lett.), pp. 329–330, Feb. 1966.
[4] G. Sauvage, "Phase noise in oscillators: A mathematical analysis of Leeson's model," IEEE Trans. Instrum. Meas., vol. IM-26, pp. 408–410, Dec. 1977.
[5] J. Craninckx and M. S. J. Steyaert, "A 1.8-GHz low-phase-noise CMOS VCO using optimized hollow spiral inductors," IEEE J. Solid-State Circuits, vol. 32, pp. 736–744, May 1997.
[6] G. Palmisano, M. Paparo, F. Torrisi, and P. Vita, "Noise in fully integrated PLLs," in Proc. 6th Workshop Advances in Analog Circuit Design AACD'97, Como, Italy, pp. 1–19.
[7] J. Crols, P. Kinget, J. Craninckx, and M. Steyaert, "An analytical model of planar inductors on lowly doped silicon substrates for high frequency analog design up to 3 GHz," in IEEE Symp. VLSI Circuit Dig. Tech. Papers, 1996, pp. 28–29.
[8] J. N. Burghartz, M. Soyuer, and K. A. Jenkins, "Microwave inductors and capacitors in standard multilevel interconnect silicon technology," IEEE Trans. Microwave Theory Tech., vol. 44, pp. 100–104, Jan. 1996.
[9] L. Dauphinee, M. Copeland, and P. Schvan, "A balanced 1.5 GHz voltage controlled oscillator with an integrated LC resonator," in Proc. ISSCC'97, Session 23, Analog Techniques, pp. 390–391.
[10] I. B. Jansen, K. Negus, and D. Lee, "Silicon bipolar VCO family for 1.1 to 2.2 GHz with fully-integrated tank and tuning circuits," in Proc. ISSCC'97, Session 23, Analog Techniques, p. 392.
[11] B. Razavi, "A 1.8 GHz CMOS voltage-controlled oscillator," in Proc. ISSCC'97, Session 23, Analog Techniques, pp. 388–389.
[12] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, pp. 179–194, Feb. 1998.
[13] CADENCE, Oscillator Noise Analysis in SpectreRF, application note to SpectreRF, 1998.
[14] F. X. Kärtner, "Untersuchung des Rauschverhaltens von Oszillatoren," Ph.D. dissertation, Tech. Univ. Munich, Munich, Germany, 1988.
I. INTRODUCTION
II. CIRCUIT
The transistor schematic of the ncPFD is shown in Fig. 3(a).
The detector has a 0-rad phase offset. The main part of the
circuit is the nc stage [4]. Delays (two inverters) are inserted
at the reference and slave inputs in order to remove the dead
zone in the phase characteristics around rad phase error. In
Fig. 4, waveforms for the circuit in Fig. 3(a) are shown when
the slave input lags the reference input.
Manuscript received March 11, 1997; revised August 21, 1997.
The author is with Electronic Devices, Department of Physics and Measurement Technology, Linköping University, S-58183 Linköping, Sweden.
Publisher Item Identifier S 0018-9200(98)00732-X.
Fig. 3. (a) The ncPFD in zero-degree phase offset version. (b) Modified
version with π rad phase offset.
Fig. 4. Waveforms for the case when slave lags after the reference signal.
The pulse width of the up signal is larger than for the down signal.
Fig. 5. Phase characteristics of the ncPFD (solid line), conPFD (dashed line),
and the ptPFD (dash-dot line) from SPICE level-2 simulations of extracted
layout, VDD = 3.0 V and f = 50 MHz.
Fig. 7. Phase characteristics for three cases with different duty cycles. The
reference input duty cycle is 50% for all cases and the slave input has the
duty cycles 45%, 50%, and 55% for the dashed, solid, and dashed-dotted
lines, respectively.
Fig. 10. Waveforms for the case when the slave has a higher frequency than
the reference signal. The down signal has higher duty cycle than the up signal.
Fig. 8. The width of the dead zones of the ncPFD (solid), ptPFD (dashed),
and conventional PFD (dash-dot) as function of frequency. The frequency
resolution is 100 MHz and the supply voltage is 5.0 V. The plot is based on
SPICE simulations of extracted layout.
Fig. 11. Frequency sensitivity for the ncPFD (solid), ptPFD (dash-dot),
and conPFD (dashed). The plot is based on behavioral simulations with
20 different initial phases for each frequency and the mean-value for each
frequency is plotted. The reference frequency is 50 MHz.
Fig. 12. Frequency sensitivity for the ncPFD for a number of frequencies.
The plot is based on behavioral simulations with 20 different initial phases
for each frequency. The solid line is the mean value and the symbols are
the minimum and maximum values. The reference frequency is 50 MHz.
Fig. 14. Lock-in process of a third-order PLL with the ncPFD as phase
frequency detector. The loop filter and PLL data are shown in the upper right
corner.
Fig. 13. Frequency sensitivity for the ncPFD when the slave frequency is
4/5 of the reference frequency. For the initial phases of 0.0, 2.5, and 5.0 ns
the sensitivity is zero.
this false locking will not be stable, since a small phase change
results in a nonzero sensitivity and drives the loop back to lock.
One way to add small phase changes to the simulation is to
include phase noise which is always present in an oscillator.
When we add phase noise of approximately 300 ps peak-to-peak to the simulations, the normalized minimum sensitivity,
which was zero, increases to approximately 0.01. The
improvement is not large, but the sensitivity will be
nonzero and positive for all phases. Hence, false locking is
avoided. To further enhance the phase noise during the lock-in
process, one could use dithering techniques, i.e., add the signal
from a noise/signal source to the control voltage of the VCO.
V. BEHAVIORAL MIXED-MODE SIMULATIONS
In order to understand the sensitivity to frequency errors
and lock-in properties of the proposed detector, a complete
third-order charge pump PLL system was simulated using a
multilevel mixed-mode simulator, Lsim [5]. The PFD was
represented by a schematic simulated in switch mode. The
VCO, phase-noise generator, and charge pump are represented
by behavioral models written in the hardware description
VI. EXPERIMENTS
The phase detection properties of the ncPFD have been
verified experimentally with a test chip. The test chip is a line
receiver for serial data that utilizes several parallel samplers
to receive bit rates of 2.0 Gb/s [7]. The phase detector was
used in a delay-locked loop (DLL) which generates control
signals for the sampling switches used in the line receiver.
The ncPFD, Fig. 3(a), was used as a -rad phase detector and
the delay line was half a wavelength long.
The skew between the reference and slave signals is not
possible to measure directly. This quantity has been measured
indirectly through measurement error compensation circuits to
be about 125 ps at
MHz. Unfortunately, there is no
control of how large the measurement error is.
The circuit blocks used to measure the offset are shown in
Fig. 15. The two clocks that we want to compare come from
the beginning and the end of the delay line. They are fed into
two matched inverter chains where the propagation delays for
rising and falling edges are matched against process variations
[8]. The delays from the multiplexer inputs to the oscilloscope
screen for the two signal paths are not matched. Two measurements are made to compensate for this: one where the delay
line input signal goes uninverted through Output buffer 1 and
one where the same signal goes inverted through Output
buffer 2. The measured skew including the measurement error
Fig. 15. DLL, phase offset measurement circuitry, and NMOS transistor to
access the control voltage.
Fig. 16. Oscilloscope screen dump of the drain voltage of an NMOS
transistor with external pull-up resistor where the gate is connected to the
control voltage. Four different lock-in procedures are shown. The initial
control voltages are 0.0, 1.0, 2.0, and 3.0 V for the curves from top to
bottom, respectively.
inv
inv
inv
inv
mux
mux
mux
mux
Buf
Buf
Buf
Buf
(1)
VII. CONCLUSIONS
(2)
where
is the real skew and inv
and inv
are the
delays through the four inverters long chains for falling and
rising edges through the left and right chain, respectively.
Similarly, inv
and inv
are for the five inverters long
chains. And mux
and mux
are the delays through the
multiplexers. The Buf
and Buf
are the delays through
the output-buffers and through the oscilloscope input-channels.
The sum of the skews (1) and (2) is
skew
skew
inv
inv
inv
inv
(3)
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 11, NOVEMBER 2001
Abstract: Rotary traveling-wave oscillators (RTWOs) represent a new transmission-line approach to gigahertz-rate clock
generation. Using the inherently stable LC characteristics of
on-chip VLSI interconnect, the clock distribution network becomes a low-impedance distributed oscillator. The RTWO operates
by creating a rotating traveling wave within a closed-loop differential transmission line. Distributed CMOS inverters serve as both
transmission-line amplifiers and latches to power the oscillation
and ensure rotational lock. Load capacitance is absorbed into the
transmission-line constants whereby energy is recirculated giving
an adiabatic quality. Unusually for an LC oscillator, multiphase
(360°) square waves are produced directly. RTWO structures
are compact and can be wired together to form rotary oscillator
arrays (ROAs) to distribute a phase-locked clock over a large chip.
The principle is scalable to very high clock frequencies. Issues
related to interconnect and field coupling dominate the design
process for RTWOs. Taking precautions to avoid unwanted signal
couplings, the rise and fall times of 20 ps, suggested by simulation,
may be realized at low power consumption. Experimental results
of the 0.25-μm CMOS test chip with 950-MHz and 3.4-GHz rings
are presented, indicating 5.5-ps jitter and 34-dB power supply
rejection ratio (PSRR). Design errors in the test chip precluded
meaningful rise and fall time measurements.
Index Terms: Clocks, MOSFET oscillators, phase-locked oscillators, phased arrays, synchronization, timing circuits, transmission line resonators, traveling-wave amplifiers.
I. INTRODUCTION
The basic ROA architecture is shown in Fig. 1. A representative multigigahertz rotary clock layout has 25 interconnected
RTWO rings placed onto a 7 × 7 array grid. Each ring consists
of a differential line driven by shunt-connected antiparallel inverters distributed around the ring. This arrangement produces
a single clock edge in each ring which sweeps around the ring
at a frequency dependent on the electrical length of the ring.
Pulses are synchronized between rings by hard wiring which
forces phase lock.
Fig. 2 illustrates the theory behind the individual RTWO.
Fig. 2(a) depicts an open loop of differential transmission line
(exhibiting LC characteristics) connected to a battery through
an ideal switch. When the switch is closed, a voltage wave begins to travel counterclockwise around the loop. Fig. 2(b) shows
a similar loop, with the voltage source replaced by a cross-connection of the inner and outer conductors to cause a signal inversion. If there were no losses, a wave could travel on this ring
indefinitely, providing a full clock cycle every other rotation of
the ring (the Möbius effect).
In real applications, multiple antiparallel inverter pairs are
added to the line to overcome losses and give rotation lock.
Rings are simple closed loops and oscillation occurs spontaneously upon any noise event. Unbiased, startup can occur in
Fig. 3. Waveforms of line voltage and line current for the 3.4-GHz clock
simulation example.
B. Waveforms
Fig. 3 shows simulated waveforms of a 3.4-GHz RTWO taken
at an arbitrary position on the ring. The design has the following
characteristics for reference:
Fig. 1.
phase.
Fig. 2. Idealized theory underlying the RTWO. (a) Open loop of differential
conductors to a battery via a switch. (b) Similar loop but with the voltage source
replaced by the inner and outer conductors cross-connected.
m
Conductors: Width
m
Pitch
m
Ring Length
Metallization: 1.75 μm copper
nH
Loop inductance total
Process: 0.25-μm CMOS
Nch total width: 2000 μm
Pch total width: 5000 μm
Number of inverters: 24 pairs.
Very large distributed transistor widths give substantial capacitive loading to the lines, thus lowering velocity to give a
reasonably low clock rate from a compact oscillator structure.
In application, up to 75% of this capacitance can come from
load capacitance, reducing the size of the drive transistors accordingly.
The upper traces of Fig. 3 show the simulated voltage waveforms on the differential line at points labeled A0, B0. The lower
traces show the current in the conductors to be 200 mA, while
the supply current is simulated at 84 mA with 4.5 mA of
ripple. This clearly illustrates that energy is recycled by the basic
operation of the RTWO. Just driving the 34 pF of capacitance
present would require 275 mA at this frequency (from $I = CVf$).
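The energy-recycling argument can be checked with a rough calculation. The sketch below assumes the usual charge-per-cycle estimate I = C·V·f and a 2.5-V swing (the swing is an assumption taken from the 2.5-V signal used in the later coupling example); the function and variable names are illustrative only. With these assumptions the estimate comes out near 289 mA, the same order as the 275 mA quoted above, and well above the 84 mA simulated supply current.

```python
def dynamic_current(capacitance_f, swing_v, frequency_hz):
    """Average current to charge a capacitance once per clock cycle: I = C * V * f."""
    return capacitance_f * swing_v * frequency_hz


C_TOTAL = 34e-12            # total capacitance from the text, in farads
V_SWING = 2.5               # ASSUMED voltage swing, in volts
F_CLK = 3.4e9               # clock frequency from the text, in hertz
I_SUPPLY_SIMULATED = 84e-3  # simulated RTWO supply current from the text, in amperes

# Current a conventional (non-recycling) driver would need under these assumptions.
i_conventional = dynamic_current(C_TOTAL, V_SWING, F_CLK)

# Fraction of that charge-up current the RTWO does not draw from the supply.
recycled_fraction = 1.0 - I_SUPPLY_SIMULATED / i_conventional

print(f"conventional drive current: {i_conventional * 1e3:.0f} mA")
print(f"fraction not drawn from supply: {recycled_fraction:.2f}")
```

A smaller effective swing or frequency would bring the estimate closer to the quoted figure; the point of the sketch is only the ratio between the two currents.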
C. Phase Locking
Interconnected rings, as in Fig. 1(a), will run in lockstep, ensuring that the relative phases at all points of an ROA are known.
It is possible to use a large array of interconnected rings to distribute a clock signal over a large die area with low clock skew.
For example, referring to Fig. 1(a), all the points marked with
the equals sign have the same relative phase as that arbitrarily
marked as 0°. At any point along the loop, the two
signal conductors have waveforms 180° out of phase (two-phase
nonoverlapping clock). A full 360° is measured along the complete closed path of the loop. In principle, an arbitrary number of
clock phases can be extracted. Phase advances or retards depend
on the direction of rotation, and Fig. 4 shows the current-voltage
relationships for clockwise and counterclockwise rotation.
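The phase available at a tap can be sketched as a function of position. The helper below is a hypothetical illustration, not part of the paper: it assumes the phase grows linearly with distance, that one physical lap of the ring corresponds to 180° (so the complete closed path through the crossover gives 360°), and that the two conductors at any point differ by 180°.

```python
def tap_phase_deg(position_um, ring_len_um, other_conductor=False, clockwise=True):
    """Clock phase in degrees at a tap, relative to the point marked 0 degrees.

    position_um     distance along the ring from the 0-degree reference
    ring_len_um     physical length of one lap of the ring
    other_conductor True to tap the complementary conductor (adds 180 degrees)
    clockwise       rotation direction; reversing it reverses the phase order
    """
    phase = 180.0 * (position_um % ring_len_um) / ring_len_um
    if other_conductor:
        phase += 180.0
    if not clockwise:
        phase = -phase
    return phase % 360.0


# Quarter of the way around a 3200-um ring, on each conductor and in each direction.
print(tap_phase_deg(800.0, 3200.0))                        # 45.0
print(tap_phase_deg(800.0, 3200.0, other_conductor=True))  # 225.0
print(tap_phase_deg(800.0, 3200.0, clockwise=False))       # 315.0
```

Under these assumptions, four equally spaced taps on one lap, together with their complementary conductors, give eight phases spaced 45° apart.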
D. Network Rules
Although the square-ring shape is convenient to show diagrammatically, it is only one example of a more general network solution which requires ROAs to conform closely to the
following rules.
1) Signal inversion must occur on all (or most) closed paths.
2) Impedance should match at all junctions.
3) Signals should arrive simultaneously at junctions.
From 1) above, any odd number of crossovers are allowed on
the differential path and regular crossovers forming a braided
or twisted pair effect can dramatically reduce the unwanted
coupling to wires running alongside the differential line.
The differential lines would typically be fabricated on the top
metal layer of a CMOS chip where the reverse-scaling trend of
VLSI interconnect offers increasingly high performance [10].
E. Fields and Currents
Fig. 5 illustrates a three-dimensional section of the ring structure connected to a pair of CMOS inverters expanded to show
the four individual transistors. The main current flow in the differential conductors is shown by solid arrows, the magnetic field
surrounding these conductors by dashed loops, and the capacitance charge/signal-boost current flowing through the transistors by dashed lines.
An important feature of differential lines is the existence of a
well-defined go and return path which gives predictable inductance characteristics in contrast to the uncertain return-current path for single-ended clock distribution [11].
Capacitance arises mainly from the transistor gate and depletion capacitance and interconnect capacitance does not dominate.
indicates intrinsic gate resistance, i.e., the ohmic path
implies a
through which the gate charge flows. The term
parasitic gate term, but in reality, most of this resistance is in
the series circuit of the channel under the gate electrode. This is
shared by the D-S channel, as illustrated by the triangular region
(shown with transistors operating in the pinchoff region).
sponding to the two latched states. Except at the wavefront location where amplification takes place, the ring structures will
be terminated ohmically to the supplies.
The four-transistor full-bridge circuit minimizes supply
current ripple to the cross-conduction period.
G. Frequency and Impedance Relations
In simulation models (and indeed as fabricated), the RTWO
transmission line is built up from multiple RLC segments, and
therefore, these primary line constants must be identified.
Fig. 8(a) is the basic RF macromodel of a short length
(SegLen) of RTWO line with all significant RF components
and parasitics annotated (as per Fig. 5). Suffixes identify
per-unit-length (perlen), lumped (lump), and total (or loop) values.
A number of such segments are connected together, plus a crossover,
to produce a closed ring of length RingLen.
Fig. 8(b) is a capacitive equivalent circuit for the transistor
and load capacitances. AC0 indicates an ac ground point.
Fig. 8. Development of the rotary clock model. (a) Complete RF circuit.
(b) Capacitance circuit.
The differential lumped capacitance of one such segment is given approximately by
(1)
where the terms denote:
interconnect capacitance for the line AB;
gate overlap and Miller-effect feedback capacitance;
total channel capacitance;
drain depletion capacitance to bulk (substrate);
load capacitance added to a line.
(Note that a scaling factor is used to convert the in-parallel to ground
values into in-series differential values of capacitance.)
The interconnect capacitance is usually a small part of total capacitance and accurate formulas are available [12] if needed.
To calculate the per-unit-length differential inductance, i.e.,
accounting for mutual coupling, we use [13], expressed below.
(2)
where the geometric parameters are:
conductor separation;
conductor width;
conductor thickness.
The phase velocity is given by
$v_p = \dfrac{1}{\sqrt{L_{\text{perlen}}\, C_{\text{perlen}}}}$ (3)
where $C_{\text{perlen}} = C_{\text{lump}} / \text{SegLen}$.
For heavily loaded RTWO structures, $v_p$ can be as low as 0.03
of $c$ (where $c$ is the free-space velocity, i.e., $3 \times 10^8$ m/s).
The clock frequency is given approximately by
$f_{\text{osc}} \approx \dfrac{v_p}{2 \cdot \text{RingLen}}$ (4)
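The link between ring length, phase velocity, and clock frequency can be sketched numerically. The snippet assumes f ≈ v_p / (2·RingLen), which follows from the Introduction's statement that a full clock cycle takes two rotations of the ring, together with the standard lossless-line relation v_p = 1/√(LC). The function names are illustrative, and the 3.4-GHz, 3200-μm ring is the example quoted elsewhere in the text.

```python
import math

C0 = 299_792_458.0  # free-space velocity in m/s


def phase_velocity(l_perlen, c_perlen):
    """Phase velocity of a lossless line from per-unit-length L (H/m) and C (F/m)."""
    return 1.0 / math.sqrt(l_perlen * c_perlen)


def clock_frequency(v_p, ring_len):
    """Clock frequency when one clock cycle needs two laps of the ring."""
    return v_p / (2.0 * ring_len)


def required_velocity(f_clk, ring_len):
    """Phase velocity needed for a given clock frequency and ring length."""
    return 2.0 * ring_len * f_clk


RING_LEN = 3200e-6  # ring length in metres
F_CLK = 3.4e9       # clock frequency in hertz

v_p = required_velocity(F_CLK, RING_LEN)
print(f"v_p = {v_p:.3e} m/s = {v_p / C0:.4f} c")
print(f"f   = {clock_frequency(v_p, RING_LEN) / 1e9:.2f} GHz")
```

The required velocity comes out at roughly 0.07 of the free-space velocity, which illustrates why heavy capacitive loading is needed to obtain a low clock rate from such a compact ring.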
TABLE I
CHANGES OF CHARACTERISTICS WITH
(a)
(b)
(c)
Fig. 9. A four-port junction of two RTWO rings carrying anticlockwise signals, with a noncoincident signal arrival time.
Most of the remaining losses in Table I are attributed to cross-conduction and parasitic
losses.
is a real loss mechanism
for gigahertz signals, and RTWO rise/fall times can be doubled
improves
by this phenomenon. In newer CMOS processes,
with shorter channel length.
(7)
and when the intrinsic gate resistance is low relative to the reactance of the gate capacitance.
(8)
RTWO rise and fall times are controllable by setting the cutoff
frequency of the transmission lines.
(9)
Edges become faster and cross-conduction losses are reduced
when the structure is more distributed.
, where
Table I lists characteristic changes with
with
, and
held
constant.
The most significant power loss mechanism for the RTWO is
power dissipated in the interconnect, given by
(10)
Fig. 11. Segment of chip layout showing 90° routing beneath clock lines and
a tap to clock (CLK: CLK) loads.
Fig. 10.
Fig. 12.
(11)
D. Frequency/Impedance Adjustment
When the above condition is met, the capacitance can be taken
as being effectively lumped on the main RTWO ring at the tap
point for the purposes of predicting oscillator frequency and ring
impedance.
Although not immediately apparent, this condition is achievable in practice due to three factors. The first factor is that the
tap line velocity is relatively fast for SiO2 dielectric, approximately $0.5c$, while the main RTWO oscillator ring might
be operating at a much lower velocity. The second factor is that the
tap length only has to be long enough to reach within a single
RTWO ring. The third factor is that it requires two signal rotations on the RTWO to complete a clock cycle. These three factors work together to make the RTWO rings physically small
compared to the expected speed-of-light dimensions. The distances to be spanned by the fast tap wires are therefore short
enough that transmission-line effects on these lines are unimportant, certainly at the clock fundamental frequency and even
at higher harmonics.
This can be illustrated by reference to a specific 3.4-GHz
RTWO, 3200 μm long with 20-ps rise/fall times. Within one of
these rise or fall periods, a stub transmission line with velocity
$0.5c$ is able to communicate a signal over a distance of 3 mm.
For a stub length of 400 μm (to reach the center of the ring), this
equates to 3.75 round-trip times along the stub.
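The stub figures can be reproduced with a short calculation. The sketch assumes a stub velocity of half the free-space velocity, which is what a 3-mm reach in 20 ps implies and is typical of an SiO2 dielectric; the names are illustrative.

```python
C0 = 299_792_458.0  # free-space velocity in m/s


def reach(velocity, duration):
    """Distance a signal travels along a line in the given time."""
    return velocity * duration


def round_trips(stub_len, velocity, duration):
    """Number of out-and-back traversals of a stub completed in the given time."""
    return reach(velocity, duration) / (2.0 * stub_len)


V_STUB = 0.5 * C0   # ASSUMED stub velocity
T_EDGE = 20e-12     # rise/fall time from the text, in seconds
STUB_LEN = 400e-6   # stub length from the text, in metres

distance = reach(V_STUB, T_EDGE)
trips = round_trips(STUB_LEN, V_STUB, T_EDGE)

print(f"reach within one edge: {distance * 1e3:.2f} mm")
print(f"round trips along the stub: {trips:.2f}")
```

Several round trips fit inside a single edge, which is why the tap capacitance can be treated as lumped on the main ring.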
Fig. 12 shows simulated waveforms with 2 pF of total
to-ground capacitance at the end of one such stub. Reflected
energy gives rise to the ringing which is evident with this level
of capacitance. The line resistance of the stubs must be low to
maintain reflective energy conservation.
The ratiometric factors outlined above between ring length,
frequency, rise/fall time, and stub lengths are expected to hold as
ROAs are scaled to higher frequencies and smaller ring lengths
without requiring special stub tuning measures.
Capacitive Loading Limits: Substantial total-chip capacitive
loading can be tolerated by the RTWO relative to conventionally
resonant systems [8], [22], [23]. However, the loading effects of
interconnect, active, and stub capacitances cannot be increased
without limit. The consequential lowering of line impedance increases
circulating currents until $I^2R$ losses become a concern.
Eventually, the impedance becomes so low relative to the loop
resistance that the relation (6) cannot be maintained, whereupon
oscillation ceases altogether.
Fig. 13. Differential line with varying trace separations and capacitive inverter
loadings indicating the effects of altering several parameters.
Matching within a die should be better, but temperature gradients and transistor size variations as they affect capacitance will
lead to phase velocity changes requiring correction by the Skew
Control mechanism (described in Section III-A).
Temperature can alter frequency through variation of
and
. Inductance variation is assumed to be negligible
compared to capacitance variation and is not considered. Gate, but for
oxide thickness variation could potentially affect
SiO dielectric, with properties similar to quartz, this can be ignored. More significant are temperature variations of drain de.
pletion capacitance and of transistor
To tune an ROA clock to an exact reference frequency, allowing limited speed-binning and reduced internal phase mismatches, closed-loop control of distributed switched capacitors
[9] or varactors [25] is envisaged.
E. Active Compensation for Interconnect Losses
Resistive interconnect losses make it difficult to communicate high-frequency clock signals over a large chip without
waveshape distortion and attenuation, which impacts on the
practicality of reflective energy conservation schemes [6], [22],
[23]. The skin effect loss mechanism has been evident in clock
tree conductors for some time [26] and is frequency dependent.
High-speed H-trees tend to use hierarchical buffers within the
trees to maintain amplitude and edge rates.
Active compensation of VLSI differential transmission
lines to overcome clock attenuation was shown by Bumann
and Langmann [27] to be applicable to sine-wave signals.
Shunt-connected negative impedance convertors (NICs) were
used with linear compensation to prevent oscillations.
The distributed inverters used within RTWOs afford active
compensation for transmission-line losses, raising the apparent
$Q$ of the resonant rings and helping to maintain a uniformly high
clock amplitude around the structure.
F. Logic Styles
Two-phase latched logic [28] is the style most compatible
with RTWO. It is highly skew tolerant and through dataflow-aware placement [27] offers the potential to exploit the full 360°
of clock phase to reduce clock-related surging [29], which in
future systems could exceed 500 A [30]. Conventional single-phase D-latch designs can be driven where timing improvements through skew scheduling [31] might be possible. A locally four-phase system to support domino logic [2] could be
implemented by wrapping two loops of RTWO line around the
region to clock. Unfortunately, all of these techniques are beyond the capability of current logic synthesis tools.
TABLE II
VARIATIONS WITH TEMPERATURE
TABLE III
VARIATIONS WITH DC SUPPLY VOLTAGE V
TABLE IV
INDUCED NOISE AS A FUNCTION OF VICTIM DISTANCE AND LENGTH
Fig. 14.
trace and one end connected to ground. Note the more sensitive
noise scale.
The absolute maximum coupling occurs if victim distance is
allowed to go to zero. In this case, mutual coupling between aggressor and victim is 100% with no cancellation effects from the
other differential trace. As a numerical example, it follows that
a 2.5-V signal with a rise time of 20 ps on a transmission line
with a velocity of about $0.07c$ has the 2.5-V gradient over 430 μm of
length (Fig. 4 illustrates the concept). Over the 60-μm length
discussed above, this equates to 348 mV. Slower edge rates,
faster transmission lines, and lower supply voltages reduce this
figure proportionally.
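The worst-case coupling figure follows from spreading the signal swing linearly over the physical length of the edge. The sketch below assumes that linear gradient and uses only the numbers quoted above; the helper names are illustrative. It also derives the line velocity implied by a 430-μm edge lasting 20 ps.

```python
C0 = 299_792_458.0  # free-space velocity in m/s


def edge_velocity(edge_length, rise_time):
    """Line velocity implied by an edge of given physical length and duration."""
    return edge_length / rise_time


def worst_case_coupled_voltage(swing_v, edge_length, victim_length):
    """Voltage difference seen across a victim lying along a linear voltage ramp."""
    return swing_v * victim_length / edge_length


SWING = 2.5          # signal swing in volts
RISE_TIME = 20e-12   # rise time in seconds
EDGE_LEN = 430e-6    # physical length of the edge in metres
VICTIM_LEN = 60e-6   # victim length in metres

v_line = edge_velocity(EDGE_LEN, RISE_TIME)
v_coupled = worst_case_coupled_voltage(SWING, EDGE_LEN, VICTIM_LEN)

print(f"implied line velocity: {v_line / C0:.3f} c")
print(f"worst-case coupled voltage: {v_coupled * 1e3:.1f} mV")
```

Doubling the rise time or the line velocity doubles the edge length and halves the coupled voltage, which is the proportionality stated in the text.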
Long-range inductive noise coupling from the differential
transmission line is expected to be small, since (from a distance)
the go and return currents are equal and opposite.
Potential problems exist in short-range magnetic coupling to
wiring in the vicinity of the clock lines. Inductance is lowered
Fig. 15.
Fig. 16.
versus V
Fig. 19.
Fig. 17.
ring.
[29] L. Benini et al., "Clock skew optimization for peak current reduction,"
J. VLSI Signal Processing, vol. 16, pp. 117–130, 1997.
[30] International Technology Roadmap for Semiconductors (1999).
[Online]. Available: http://public.itrs.net/files/1999_SIA_Roadmap/Design.pdf
[31] I. S. Kourtev and E. G. Friedman, Timing Optimization Through Clock
Skew Scheduling. Boston, MA: Kluwer, 2000.
[32] MultiGig, Ltd. Rotary Explorer. [Online]. Available: http://www.multigig.com/software.htm
[33] M. Kamon, M. J. Tsuk, and J. K. White, "FASTHENRY: A multipole-accelerated 3-D inductance extraction program," IEEE Trans. Microwave
Theory Tech., vol. 42, no. 9, pp. 1750–1758, Sept. 1994.
[34] BSIM Research Group. (2000–2001) The BSIM4 Short-Channel
Transistor Model. Univ. of California at Berkeley. [Online]. Available:
http://www-device.eecs.berkeley.edu/~bsim3/bsim4.html
Steve Lipa (S'00) received the B.S. degree in electrical engineering from the University of Virginia,
Charlottesville, in 1980, and the M.S. degree in
electrical engineering from North Carolina State
University, Raleigh, in 1993. He is currently working
toward the Ph.D. degree in electrical engineering at
North Carolina State University.
He is currently a Research Assistant and Laboratory Manager with the Microelectronics Systems
Laboratory at North Carolina State University. He
has ten years of experience as an Integrated Circuit
Design Engineer, primarily in the design of high-speed digital logic circuits.
His current research is in the area of high-speed clock distribution.
Abstract: The implementation of a dual-modulus prescaler (divide by 128/129) using an extension of the true-single-phase-clock
(TSPC) technique, the extended TSPC (E-TSPC), is presented.
The E-TSPC [1], [2] consists of a set of composition rules for
single-phase-clock circuits employing static, dynamic, latch, data-precharged, and NMOS-like CMOS blocks. The composition
rules, as well as the CMOS blocks, are described and discussed.
The experimental results of the complete dual-modulus prescaler,
implemented in a 0.8 μm CMOS process, show a maximum 1.59
GHz operation rate at 5 V with 12.8 mW power consumption.
They are compared with the results from other recent implementations showing that the proposed E-TSPC circuit can reach high
speed with both smaller area and lower power consumption.
Index Terms: CMOS digital, high-speed circuits, prescalers,
single-phase-clock design.
I. INTRODUCTION
Manuscript received February 16, 1998; revised May 25, 1998. This work
was supported in part by FAPESP and CNPq, Brazil.
The authors are with the LSI/PEE, Escola Politecnica, University of
Sao Paulo, Sao Paulo, S.P. 05508-900 Brazil (e-mail: navarro@lsi.usp.br;
noije@lsi.usp.br).
Publisher Item Identifier S 0018-9200(99)00410-2.
effective channel length) is presented. The purpose of the prescaler
implementation is to evaluate the potential of the E-TSPC technique.
This paper is organized as follows. In Section II, the principal
features of the E-TSPC technique, blocks and design rules,
are presented. In Section III, some different dual-modulus implementations are analyzed. Experimental results and comparisons are reported in Section IV, and the principal conclusions
are drawn in Section V.
II. E-TSPC CIRCUIT BLOCKS AND COMPOSITION RULES
A. Basic CMOS Blocks
An E-TSPC circuit may use any of the following blocks: CMOS
static block, n-dynamic block [Fig. 1(a)], p-dynamic block
[Fig. 1(c)], n-latch block [Fig. 1(e)], p-latch block [Fig. 1(g)],
and high (PH) and low (PL) data-precharged blocks (Fig. 2).
In Fig. 1, the clocked transistors of the n- and p-latches are
placed close to the power rail, following the suggestion of [11].
This configuration can attain a higher speed but suffers from charge-sharing problems. Clocked transistors close to either the power
rail or the block output are admissible latch configurations.
In data-precharged blocks [10], some input signals, called
precharging inputs or pc-inputs, control the output precharge
(see Fig. 2). If all PH block pc-inputs are high, or if all PL
block pc-inputs are low, then the PH or PL block is precharged.
In this case, the PH block output goes low, and the PL
block output goes high. In Fig. 2, a CMOS static block that
executes an example logic function is drawn, along
with all equivalent PH and PL blocks [Fig. 2(b) and (c)]. The
pc-inputs of each block are also indicated. The PH and PL
blocks that have the output precharged when the clock is low
will be called n-Dp blocks; similarly, the PH and PL blocks
that have the output precharged when the clock is high will
be called p-Dp blocks.
B. Composition Rules
First, the definition of data chains, fundamental to the design
rules, is given.
Definition: An n-data chain is any noncyclic signal propagation path:
1) containing at least one n-latch, one n-dynamic, or one
n-Dp block;
2) starting in a circuit external input, or in the output of a
p-latch, p-dynamic, or p-Dp block; when this output is
followed by static blocks in the normal data flow, the
data chain starts in the output of the last static block;
Fig. 1. Construction blocks of the E-TSPC circuit technique: (a) n-dynamic and (b) NMOS-like n-dynamic blocks; (c) p-dynamic and (d) NMOS-like p-dynamic
blocks; (e) n-latch and (f) NMOS-like n-latch blocks; and (g) p-latch and (h) NMOS-like p-latch blocks.
Fig. 2. Transformation from (a) a static block into data-precharged blocks: (b) PH blocks and (c) PL blocks.
Fig. 3. Example of n-data chains. The blocks mentioned in the text are named and hatched in the figure.
Composition Rule ( ):
The -data chain must have one of the following two
configurations:
) at least one dynamic block and one latch;
) at least two latches and an even number of inversions
(latches or static blocks) between them.
It is worth noting that these five composition rules are very
similar to the five rules proposed in the NORA technique [6].
In a circuit where all data chains obey the five rules, it can be
proved that (six theorems presented in [1] and [2]):
a) all data-precharged blocks are precharged during the
holding phase of the data chains to which they belong;
b) the dynamic and the data-precharged blocks are not
incorrectly discharged during the evaluation phase;
c) the output of the data-chain last latch is steady during
the holding phase of the data chain.
C. Exception Rule
Although the above-described rules are necessary to avoid
race problems, typical TSPC systems do not follow some of
them. The most common exception is found in connecting
two D-flip-flops (D-FFs), as shown in Fig. 4. In such a
TABLE I
CONDITIONS FOR CORRECT OPERATION OF THE NMOS-LIKE BLOCKS
Fig. 6. Transistor schematic of the divide-by-4/5 counter DG4. The transistor width or, when the length is different from 0.8 μm, the transistor width/length,
in μm, is also indicated in the figure.
TABLE II
MAXIMUM SPEED AND POWER-CONSUMPTION RESULTS FOR
THE FOUR DESIGNED DIVIDE-BY-4/5 COUNTERS (SPICE
SIMULATIONS, SLOW PARAMETERS, AND VDD = 5 V)
Design   Speed (GHz)   Power (μW/MHz)
DG1      0.98          3.27
DG2      1.28          4.45
DG3      1.39          4.85
DG4      1.67          5.62
TABLE III
AREA, SPEED, AND POWER-CONSUMPTION
RESULTS FOR FOUR DIFFERENT PRESCALERS
Fig. 8. Measured results for the prescaler maximum frequency (fmax ), left
axis (3 ), and current consumption at fmax , right axis (o), as a function of
the power supply.
and
blocks, Fig. 6, drive the clock signal to
the divide-by-32 counter. All four designs have similar
configuration.
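The way a divide-by-4/5 counter and a divide-by-32 counter combine into a divide-by-128/129 prescaler can be sketched behaviorally. The model below assumes the common dual-modulus arrangement in which the 4/5 counter divides by 5 on exactly one of the 32 cycles of the following counter when the higher modulus is selected; this control scheme is an assumption for illustration, not taken from the paper's schematic, and the function name is hypothetical.

```python
def input_cycles_per_output(select_129, main_count=32):
    """Count input clock cycles consumed per output cycle of the prescaler.

    select_129  True selects the higher modulus (one divide-by-5 cycle per output)
    main_count  modulus of the counter that follows the divide-by-4/5 stage
    """
    total = 0
    for cycle in range(main_count):
        # ASSUMED control: swallow one extra input cycle on the first
        # divide-by-4/5 cycle only, and only when the higher modulus is selected.
        divide_by_five = select_129 and cycle == 0
        total += 5 if divide_by_five else 4
    return total


print(input_cycles_per_output(False))  # 128
print(input_cycles_per_output(True))   # 129
```

Only the divide-by-4/5 stage runs at the full input rate in such an arrangement, which is why its maximum speed and power per megahertz matter most for the prescaler as a whole.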
IV. EXPERIMENTAL RESULTS
A CMOS Monolithic ΔΣ-Controlled Fractional-N
Frequency Synthesizer for DCS-1800
Bram De Muer, Student Member, IEEE, and Michel S. J. Steyaert, Senior Member, IEEE
I. INTRODUCTION
THE END of the 20th century was characterized by the unrivaled growth of the telecommunication industry. The main
cause was the introduction of digital signal processing in wireless communications, driven by the development of high-performance low-cost CMOS technologies for VLSI. However, the
implementation of the RF analog front end remains the bottleneck. This is reflected in the large effort put into monolithic
CMOS integration of RF circuits both by academics and industry [1]–[3].
The goal of this work is the monolithic integration in standard CMOS technology of a frequency synthesizer to enable the
full integration of a transceiver front end in CMOS, including
a low-IF receiver and a direct upconversion transmitter [1]. To
achieve a high degree of integratability and fast settling under
low-noise constraints, a ΔΣ fractional-N synthesizer topology
has been chosen [4] (Fig. 1). ΔΣ fractional-N synthesis circumvents the severe speed–spectral purity–resolution tradeoff of the
classic phase-locked loop (PLL) synthesizer, by providing synthesis of fractional multiples of the reference frequency. Spurious tones that emerge from the fractional division are whitened
and noise shaped by the ΔΣ action and ultimately filtered by
the loop filter. To prevent degradation of the spectral purity by
Manuscript received November 5, 2001; revised January 31, 2002.
The authors are with the Katholieke Universiteit Leuven, Department
Elektrotechniek, ESAT-MICAS, B-3001 Heverlee, Belgium (e-mail: bram.demuer@esat.kuleuven.ac.be).
Publisher Item Identifier S 0018-9200(02)05856-0.
Fig. 1. Principle of ΔΣ fractional-N synthesis.
SYNTHESIZER
A. Introduction
A block diagram of a ΔΣ fractional-N synthesizer is shown in Fig. 1. The ΔΣ modulator output controls the instantaneous division modulus of the prescaler, such that the mean division modulus is $N + K/2^{n}$, with $n$ the number of bits of the ΔΣ modulator and $K$ the input word. The corresponding phase changes at the prescaler output are quantized, leading to possible spurious tones and quantization noise. By selecting higher order ΔΣ modulators, the spurious energy is whitened and shaped to high-frequency noise, which can be removed by the low-pass loop filter. As a result, for a given frequency resolution, an arbitrarily high reference frequency can be chosen, by assigning the proper number
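The averaging behind the mean division modulus can be illustrated with a minimal sketch. It assumes a first-order accumulator in place of the higher order ΔΣ modulators discussed in the text, and the names `fractional_n_moduli`, `n_int`, `k_word`, and `n_bits` are hypothetical, chosen only for this illustration. The divider uses modulus N + 1 whenever the accumulator overflows and N otherwise, so that over 2^n cycles the mean modulus is N + K/2^n.

```python
def fractional_n_moduli(n_int, k_word, n_bits, cycles):
    """Return the sequence of division moduli chosen by a first-order accumulator.

    Assumption: a first-order accumulator stands in for the higher order
    modulators of the text; it shows the averaging, not the noise shaping.
    """
    full_scale = 1 << n_bits          # accumulator wraps at 2**n_bits
    acc = 0
    moduli = []
    for _ in range(cycles):
        acc += k_word
        carry = acc >= full_scale     # overflow selects the N + 1 modulus
        if carry:
            acc -= full_scale
        moduli.append(n_int + (1 if carry else 0))
    return moduli


if __name__ == "__main__":
    # Illustrative numbers only: N = 67, K = 5, n = 4 bits, one full period.
    seq = fractional_n_moduli(67, 5, 4, 16)
    print(seq)
    print(sum(seq) / len(seq))
```

A first-order accumulator repeats with a short period, which is exactly what produces the spurious tones mentioned above; higher order modulators trade that periodicity for shaped high-frequency noise.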
ΔΣ modulator. The internal modulator accuracy is 16 bit. From the five output bits, only four are used for stability
Modulators
Fig. 5. Simulation results. The phase error for (a) the MASH modulator and (b) the single-loop multibit modulator. The FFT of the current pulses CP[i] for (c) the MASH modulator and (d) the single-loop multibit modulator.
Fig. 7. Discrete time autocorrelation estimate of the modulator outputs for (a)
the MASH modulator and (b) the single-loop multibit modulator.
TABLE I
SUMMARY OF THE LOOP PROPERTIES AND PERFORMANCE OF THE FOURTH-ORDER TYPE-II PLL
Fig. 9. (a) Timing control circuit and signals to control the dummy and the output current branch of the charge pump. (b) Charge-pump circuit with (at the left)
the dummy current branch, denoted by the suffix d, and the output branch.
Conversion
at a fixed level (see Fig. 8). Additionally, the charge-pump current is designed to be at least an order of magnitude larger than the fixed parasitic charge injection of the switch transistors. The current switches are implemented with pMOS and nMOS transistors to compensate charge injection. Finally, a timing control scheme [Fig. 9(a)] is developed to control the charge-pump switches. The up and down control pulses of the PFD are converted to synchronized control signals to drive both the output current branch and the dummy current branch of the charge pump [Fig. 9(b)]. Fig. 9(a) shows the dummy and output control signals. The dummy control is delayed versus the output control by modifying the thresholds of the second inverter string (indicated by high and low) such that the current always flows, preventing hard on/off switching of the current sources. To equalize rise and fall times and force a perfect π rad relation between nMOS and pMOS control signals, latches at the outputs of both inverter strings are implemented. Capacitors at the control outputs slow the rising and falling edges to prevent large charge injections by fast switching.
Fig. 10.
To linearize the conversion, the phase detection is performed by a zero-dead-zone PFD [15], to prevent a hard nonlinearity around zero phase error. Due to the delay added in the PFD, both the up and down current sources are on for small or zero phase errors, enabling the PFD to react to very small phase errors. The on-time fraction of the charge pump due to the delay is less than 10%. This value is a tradeoff between dead-zone prevention and sensitivity to noise coupling when the charge pumps are on. To further minimize digital noise coupling, the sampling in the PFD and the computational events in the ΔΣ modulator and prescaler are offset in phase. Consequently, the phase-error decision making is done in a relatively quiet environment. To make sure that the gains for positive and negative phase-error detection are equal, the current source transistors are oversized to ensure sufficient matching. As a side effect, the current source 1/f noise, which can seriously affect the in-band noise, is decreased. Additionally, the timing control of Fig. 9(a) provides synchronization between the two filter paths and the switches of the charge pumps themselves, thereby ensuring equal positive and negative phase-error detection gain. HSPICE simulations of the PFD charge-pump circuit are performed and show no dead zone and no gain mismatch with ideal transistor matching.
V. EXPERIMENTAL RESULTS
Fig. 10 shows the IC microphotograph and the measurement setup in which it is embedded. The ΔΣ fractional-N measurements are performed by controlling the PLL divider moduli with an HP80000 data generator, which generates the 4-bit control word. The 4-bit ΔΣ output bit stream is generated using Matlab. This provides a flexible way to test different kinds of ΔΣ modulators, without the need for redesigns. All presented measurements are performed with a 26-MHz reference frequency and at
16
N PLL at 1.76592
TABLE II
SUMMARY OF MEASURED SPECIFICATIONS COMPARED TO THE DCS-1800 SPECIFICATIONS
Fig. 13. Phase noise measurement with the MASH converter at 1.76592 GHz compared to the simulated ΔΣ noise at the output of the PLL (dashed), and with the simulated PLL output without ΔΣ control (dash-dotted).
noisy control pulses are close to the LC tank and the bonding wires of the VCO power supply. Without proper shielding, the VCO phase noise is seriously degraded by this noise coupling.
In Fig. 13, the measured noise and the ΔΣ noise as simulated in Section III (dashed) are compared. The dash-dotted line is the simulated phase noise of the PLL without ΔΣ control. The simulated ΔΣ noise leakage closely matches the measured results, except at very low offsets due to the limited memory. The phase noise at high offsets is increased versus the simulated PLL results due to noise coupling. Second-order tones are larger in measurements, since the models in the simulator do not include second-order effects and noise coupling. Tones at 520 kHz are believed to come from subharmonic tones present in the ΔΣ modulator output [5], which are amplified by mixing through noise coupling. When comparing the results for the MASH and the single-loop modulator, the differences in the measured results are less pronounced than in the simulated results (see Fig. 6). The measured phase noise for the single-loop modulator is, however, a few decibels lower than for the MASH modulator. Note that all measurements are performed for frequencies close to integer multiples of the reference frequency.
The measured settling time of the PLL is 226 μs for a 104-MHz frequency step. The power consumption of the PLL is 70 mW from a 2-V power supply. The fully integrated low-phase-noise VCO is responsible for almost 66% of the total power consumption. The IC area is 2 × 2 mm², including bonding pads and bypass capacitors. Table II shows the measured specifications compared to the DCS-1800 specifications
[1]. The specifications of the IC prototype comply with the
is degraded due to the limited
DCS-1800, only the
resolution of the measurement setup.
VI. CONCLUSION
A monolithic 1.8-GHz ΔΣ-controlled fractional-N PLL frequency synthesizer is implemented in a standard 0.25-μm CMOS technology. The monolithic fourth-order type-II PLL
[11] B. De Muer and M. S. J. Steyaert, "A single-ended 1.5-GHz 8/9 dual-modulus prescaler in 0.7-μm CMOS with low phase-noise and high input sensitivity," in Proc. Eur. Solid-State Circuits Conf. (ESSCIRC), The Hague, Sept. 1998, pp. 256–259.
[12] J. Craninckx and M. S. J. Steyaert, "Low-phase-noise fully integrated CMOS frequency synthesizers," Ph.D. dissertation, Katholieke Univ. Leuven, Belgium, 1997.
[13] B. De Muer, M. Borremans, N. Itoh, and M. S. J. Steyaert, "A 1.8-GHz highly tunable low-phase-noise CMOS VCO," in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Orlando, FL, May 2000, pp. 585–588.
[14] B. De Muer and M. S. J. Steyaert, "Fully integrated CMOS frequency synthesizers for wireless communications," in Analog Circuit Design, W. Sansen, J. H. Huijsing, and R. J. van de Plassche, Eds. Norwell, MA: Kluwer, 2000, pp. 287–323.
[15] F. M. Gardner, Phaselock Techniques. New York: Wiley, 1979.
Bram De Muer (S'00) was born in Sint-Amandsberg, Belgium, in 1973. He received the M.Sc. degree in electrical engineering in 1996 from the Katholieke Universiteit Leuven, Belgium, where he is currently working toward the Ph.D. degree on high-frequency low-noise integrated frequency synthesizers at the ESAT-MICAS laboratories.
He has been a Research Assistant with ESAT-MICAS laboratories since 1996. His research is focused on integrated low-phase-noise VCOs with on-chip planar inductors and high-speed prescaler design, leading to fully integrated ΔΣ fractional-N synthesizers in CMOS technology.
I. INTRODUCTION
Fig. 1.
time-to-market demands architectures providing easy optimization of power dissipation, fast design time and simple layout
work. High reusability, in turn, requires an architecture which
provides easy adaptation of the input frequency range and of
the maximum and minimum division ratios of existing designs.
The choice of the divider architecture is therefore essential
for achieving low-power dissipation, high design flexibility and
high reusability of existing building blocks. A modular architecture complies with these requirements, as shall be demonstrated
in this paper. The focus of the paper is first on the truly modular
architecture and on the implementation of the circuits. Then the
power optimization procedure and the design of the input amplifier are briefly discussed. Finally, a collection of measured
data and the conclusions are presented.
II. PROGRAMMABLE DIVIDER ARCHITECTURES
A. Architecture Based on a Dual-Modulus Prescaler
Fig. 1 depicts the divider architecture based on a dual-modulus prescaler [4], [5]. The design of the dual-modulus prescaler itself has been extensively treated in the literature [4], [6]–[10]. On the other hand, the architecture of Fig. 1 has some undesirable characteristics. One readily notices the lack of modularity of the concept: besides the dual-modulus prescaler, the architecture requires two additional counters for the generation of a given division ratio. The programmable counters (which are, in fact, fully programmable dividers, albeit not operating at the full RF frequency) represent a substantial load at the output of the dual-modulus prescaler, so that power dissipation is increased. Besides, the additional design and layout effort required for the programmable counters increases the time-to-market of new products. These properties led us to conclude that the dual-modulus-based architecture is not an interesting option for the realization of building blocks with high reusability, high flexibility, and short design time.
Fig. 2. Programmable prescaler. (a) Basic architecture. (b) With extended division range.
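$$T_{\mathrm{out}} = \left(2^{n} + 2^{n-1} p_{n-1} + 2^{n-2} p_{n-2} + \cdots + 2 p_{1} + p_{0}\right) T_{\mathrm{in}} \qquad (1)$$
In (1), $T_{\mathrm{in}}$ is the period of the input signal, and $p_{0}, \ldots, p_{n-1}$ are the binary programming values of the cells 1 to $n$, respectively. The equation shows that all integer division ratios ranging from $2^{n}$ (if all $p_{i} = 0$) to $2^{n+1} - 1$ (if all $p_{i} = 1$) can be realized.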
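As a quick check of the division range of a chain of 2/3 divider cells, the following sketch evaluates the division ratio from the programming bits. It assumes the standard relation for the basic architecture, in which a chain of n cells divides by 2^n plus the binary value of the programming word; the function name `division_ratio` and the bit ordering (least significant bit first, belonging to cell 1) are assumptions made for this illustration.

```python
def division_ratio(p_bits):
    """Division ratio of a chain of 2/3 cells in the basic architecture.

    p_bits[i] is the programming bit of cell i + 1 (assumed ordering, LSB first).
    Assumed relation: ratio = 2**n + sum(p_i * 2**i), with n = len(p_bits).
    """
    n = len(p_bits)
    if any(bit not in (0, 1) for bit in p_bits):
        raise ValueError("programming bits must be 0 or 1")
    return (1 << n) + sum(bit << i for i, bit in enumerate(p_bits))


if __name__ == "__main__":
    n_cells = 8
    print(division_ratio([0] * n_cells))  # lower end of the range, 2**n
    print(division_ratio([1] * n_cells))  # upper end of the range, 2**(n+1) - 1
```

Under this relation, a ratio of 511 corresponds to eight cells with every programming bit set, and 1023 to nine cells; whether the dividers measured later use exactly these chain lengths is not stated in the surviving text.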
A. Realized Circuits
The modular architecture of Fig. 2(b) was applied in the realization of a family of fully programmable frequency dividers.
1In principle, it is also possible to divide by 3 , but the gap between this
value and the continuous division range makes it useless in standard synthesizer
applications.
Fig. 3. Family of truly modular programmable dividers, and corresponding division range of the different implementations.
Fig. 4.
Fig. 6.
TABLE I
SCALING OF CURRENTS IN THE 2/3 DIVIDER CELLS
E. Input Amplifiers
The input amplifier provides the required amplification of the
voltage-controlled oscillator (VCO) signal to digital levels,
determined by the sensitivity specifications and by the divider
IV. MEASUREMENTS
The control currents for the UHF and L-band dividers can be
set externally, through input pins. The input amplifier current is
Fig. 7. Sensitivity of the UHF divider, for different divider current settings. Division ratio = 511; nominal current is I = 10 μA.
Fig. 8. Sensitivity curves of the L-band divider, for a few divider and amplifier current settings. Division ratio = 1023.
Fig. 10. Minimum input level for correct division of the frequency dividers, as a function of the input amplifier's consumption.
Fig. 11. Phase noise of the reference and UHF divider, measured at 10 MHz. - - - Reference divider (I
Reference divider (I = 10 , F = 20 MHz, 20 dBm).
Fig. 12.
= 10 , F
= 20 MHz,
010 dBm).
level of the 20-MHz input signal. For the UHF divider, however, no dependency of the noise floor on the level of the input signal at 640 MHz was observed. An increase of 25% in current led to a change in noise floor from −122 dBc/Hz (nominal bias, I = 10 μA) to −124 dBc/Hz (with I = 12.5 μA). The noise floor of the reference divider went from −127.5 dBc/Hz down to −130 dBc/Hz, with increased bias. Fig. 11 shows that the high-frequency cells of the UHF divider (see Fig. 3) contribute significantly to the phase noise, especially in the 1/f region. An increase of 1/f noise of about 15 dB is observed, compared to the noise of the single reference divider's cell.
C. Power Efficiency
Fig. 12 presents the power efficiency of the UHF and L-band dividers, in comparison to recently published data on low-power dividers and tuning systems. Power efficiency is defined here as the ratio of the divider's maximum operation frequency to its
ACKNOWLEDGMENT
The authors wish to thank G. van Veenendaal, of PS-Systems Laboratory, Eindhoven, The Netherlands, for evaluation
work done on the programmable dividers. Many thanks go to
D. Kasperkovitz and J. de Haas for the support provided during
the project.
REFERENCES
[1] S. Wu and B. Razavi, "A 900 MHz/1.8 GHz CMOS receiver for dual-band applications," IEEE J. Solid-State Circuits, vol. 33, pp. 2178–2185, Dec. 1998.
[2] J. Craninckx and M. Steyaert, "A fully integrated CMOS DCS-1800 frequency synthesizer," IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[3] Q. Huang et al., "GSM transceiver front-end circuits in 0.25-μm CMOS," IEEE J. Solid-State Circuits, vol. 34, pp. 292–303, Mar. 1999.
[4] Y. Kado et al., "An ultralow power CMOS/SIMOX programmable counter LSI," IEEE J. Solid-State Circuits, vol. 32, pp. 1582–1587, Oct. 1997.
[5] U. L. Rohde, RF and Microwave Digital Frequency Synthesizers. New York, NY: Wiley, 1997.
[6] T. Seneff et al., "A sub-1 mA 1.6 GHz silicon bipolar dual modulus prescaler," IEEE J. Solid-State Circuits, vol. 29, pp. 1206–1211, Oct. 1994.
[7] J. Craninckx and M. Steyaert, "A 1.75 GHz/3 V dual-modulus divide-by-128/129 prescaler in 0.7 μm CMOS," IEEE J. Solid-State Circuits, vol. 31, pp. 890–897, July 1996.
[8] F. Piazza and Q. Huang, "A low power CMOS dual modulus prescaler for frequency synthesizers," IEICE Trans. Electron., vol. E80-C, pp. 314–319, Feb. 1997.
[9] P. Larsson, "High-speed architecture for a programmable frequency divider and a dual-modulus prescaler," IEEE J. Solid-State Circuits, vol. 31, pp. 744–748, May 1996.
[10] J. Navarro Soares, Jr. and W. A. M. Van Noije, "A 1.6 GHz dual-modulus prescaler using the extended true-single-phase-clock CMOS circuit technique (E-TSPC)," IEEE J. Solid-State Circuits, vol. 34, pp. 97–102, Jan. 1999.
[11] C. S. Vaucher and D. Kasperkovitz, "A wide-band tuning system for fully integrated satellite receivers," IEEE J. Solid-State Circuits, vol. 33, pp. 987–998, July 1998.
[12] N. H. Sheng et al., "A high-speed multimodulus HBT prescaler for frequency synthesizer applications," IEEE J. Solid-State Circuits, vol. 26, pp. 1362–1367, Oct. 1991.
[13] C. S. Vaucher, "An adaptive PLL tuning system architecture combining high spectral purity and fast settling time," IEEE J. Solid-State Circuits, vol. 35, pp. 490–502, Apr. 2000.
[14] C. S. Vaucher and Z. Wang, "A low-power truly modular 1.8 GHz programmable divider in standard CMOS technology," in Proc. 25th Eur. Solid-State Circuits Conf., Sept. 1999, pp. 406–409.
[15] M. Mizuno et al., "A GHz MOS adaptive pipeline technique using MOS current-mode logic," IEEE J. Solid-State Circuits, pp. 784–791, June 1996.
[16] A. R. Shahani et al., "Low-power dividerless frequency synthesis using aperture phase detection," IEEE J. Solid-State Circuits, vol. 33, pp. 2232–2239, Dec. 1998.
I. INTRODUCTION
The next section of the paper presents the CDR architecture and design issues. Section III deals with the design of the
building blocks. Section IV describes the experimental results.
II. ARCHITECTURE
The choice of the CDR architecture is primarily determined
by the speed and supply voltage limitations of the technology
as well as the power dissipation and jitter requirements of the
system.
In a generic CDR circuit, shown in Fig. 1, the phase detector compares the phase of the incoming data to the phase of
the clock generated by the voltage-controlled oscillator (VCO),
producing an error that is proportional to the phase difference
between its two inputs. The error is then applied to a charge
pump and a low-pass filter so as to generate the oscillator control
voltage. The clock signal also drives a decision circuit, thereby
retiming the data and reducing its jitter.
If attempted in a 0.18-μm CMOS technology, the architecture of Fig. 1 poses severe difficulties for 10-Gb/s operation. Although exploiting aggressive device scaling, the CMOS process used in this work provides marginal performance for such speeds. For example, even simple digital latches or three-stage ring oscillators fail to operate reliably at these rates. These issues make it desirable to employ a half-rate CDR architecture, where the VCO runs at a frequency equal to half of the input data rate. The concept of the half-rate clock has been used in [2]–[5]. However, [2] and [3] incorporate a bang-bang phase detector, possibly creating a large ripple on the control line of the oscillator and hence high jitter. The circuit reported in [4] inherently has a smaller output jitter as a result of using a linear phase detector, but it fails to operate at speeds above 6 Gb/s in 0.18-μm CMOS technology. The circuit of [5] benefits from a new linear phase detection scheme, but it may not operate properly with certain data patterns.
Another critical issue in the architecture of Fig. 1 relates to the
inherently unequal propagation delays for the two inputs of the
phase detector: most phase detectors that operate properly with
random data (e.g., a D flip-flop) are asymmetric with respect to
the data and clock inputs, thereby introducing a systematic skew
between the two in phase-lock condition. Since it is difficult
to replicate this skew in the decision circuit, the generic CDR
architecture suffers from a limited phase margin, unless the raw
speed of the technology is much higher than the data rate.
The problem of the skew demands that phase detection and data regeneration occur in the same circuit such that the clock still samples the data at the midpoint of each bit even in the presence of a finite skew. For example, the Hogge PD [6] automatically sets the clock phase to the optimum point in the data eye (but it fails to operate properly with a half-rate clock).
The above considerations lead to the CDR architecture shown in Fig. 2. Here, a half-rate PD produces an error proportional to the phase difference between the 10-Gb/s data stream and the 5-GHz output of the VCO. Furthermore, the PD automatically retimes and demultiplexes the data, generating two 5-Gb/s sequences. Although the focus of this work is point-to-point communications, a full-rate retimed output is also generated to provide flexibility in testing and exercise the ultimate speed of the technology. The VCO has both fine and coarse control lines, the latter allowing inclusion of a frequency-locked loop in future implementations.
In this work, a new approach to performing linear phase detection using a half-rate clock is described. Owing to its simplicity, this technique achieves both high speed and low power dissipation while minimizing the ripple on the oscillator control voltage.
It is interesting to note that half-rate architectures do suffer from one drawback: the deviation of the clock duty cycle from 50% translates to bimodal jitter. As depicted in Fig. 3, since both clock edges sample the data waveform, the clock duty cycle distortion pushes both edges away from the midpoint of the bits. Typical duty cycle correction techniques used at lower speeds are difficult to apply here as they suffer from significant dynamic mismatches themselves. Thus, special attention is paid to symmetry in the layout to minimize bimodal jitter.
Fig. 1.
Fig. 2.
A. VCO
and
makes it difficult to stack differential pairs under
, the current variation is performed through mirror arrangements driven by pMOS differential pairs. Fig. 5 depicts
the small-signal gain and phase response of each delay stage.
While providing a phase shift of 60°, each stage achieves a gain
Fig. 4. (a) Three-stage ring oscillator. (b) Implementation of each stage. (c)
Transistor-level schematic.
Fig. 6. VCO gain partitioning. (a) Fine control. (b) Coarse control.
Fig. 8.
Fig. 7.
celled because both pulses are present only when a data transition occurs.
For linear phase comparison between data and a half-rate
clock, each transition of the data must produce an error pulse
whose width is equal to the phase difference. Furthermore, to
avoid a dead zone in the characteristics, a reference pulse
must be generated whose area is subtracted from that of the error
pulse, thus creating a net value that falls to zero in lock.
The above observations lead to the PD topology shown in Fig. 7(a). The circuit consists of four latches and two XOR gates. The data is applied to the inputs of two sets of cascaded latches, each cascade constituting a flip-flop that retimes the data. Since the flip-flops are driven by a half-rate clock, the two output sequences are the demultiplexed waveforms of the original input sequence if the clock samples the data in the middle of the bit period.
The operation of the PD can be described using the waveforms depicted in Fig. 7(b). The basic unit employed in the circuit is a latch whose output carries information about the zero crossings of both the data and the clock. The output of each latch tracks its input for half a clock period and holds the value for the other half, yielding the waveforms shown in Fig. 7(b) for the outputs of the two input latches. The two waveforms differ because their corresponding latches operate on opposite clock edges. Produced as the XOR of these two waveforms, the Error signal is equal to ZERO for the portion of time that identical bits of the two waveforms overlap, and equal to the XOR of two consecutive bits for the rest. In other words, Error is equal to ONE only if a data transition has occurred.
It may seem that the Error signal uniquely represents the phase difference, but that would be true only if the data were periodic. The random nature of the data and the periodic behavior of the clock in fact make the average value of Error pattern dependent. For this reason, a reference signal must also be generated whose average conveys this dependence. The two retimed output waveforms contain the samples of the data at the rising and falling edges of the clock. Thus, their XOR contains pulses as wide as half the clock period for every data transition, serving as the reference signal.
While the two XOR operations provide both the Error and the
Reference pulses for every data transition, the pulses in Error
are only half as wide as those in Reference. This means that
the amplitude of Error must be scaled up by a factor of two
with respect to Reference so that the difference between their
averages drops to zero when clock transitions are in the middle
of the data eye. The phase error with respect to this point is then
linearly proportional to the difference between the two averages.
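The scaling argument above can be checked numerically with a minimal sketch. It assumes an idealized model, not taken from the paper: every data transition produces an Error pulse of width T_b/2 plus the timing offset from the eye center, and a Reference pulse of width T_b (half the clock period, since the half-rate clock period is 2 T_b). The function name `pd_average_output` and its parameters are hypothetical.

```python
import random


def pd_average_output(bits, bit_period, timing_offset):
    """Average of (2 * Error - Reference), normalized to unit pulse amplitude.

    Assumed idealized pulse widths per data transition:
      Error:     bit_period / 2 + timing_offset
      Reference: bit_period (half of the half-rate clock period)
    """
    transitions = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    total_time = bit_period * (len(bits) - 1)
    avg_error = transitions * (bit_period / 2 + timing_offset) / total_time
    avg_reference = transitions * bit_period / total_time
    # Error is weighted by two so that the difference vanishes in lock.
    return 2 * avg_error - avg_reference


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 1) for _ in range(1000)]
    t_bit = 100e-12  # 10 Gb/s
    for offset in (-20e-12, -10e-12, 0.0, 10e-12, 20e-12):
        print(offset, pd_average_output(data, t_bit, offset))
```

In this model the output is zero at the center of the eye and grows linearly with the offset, with a slope proportional to the transition density of the data, which is why the Reference signal is needed to remove the pattern dependence.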
In order to generate a full-rate output, the demultiplexed sequences are combined by a multiplexer that operates on the
half-rate clock as well. This output can also be used for testing
purposes in order to obtain the overall bit-error rate (BER) of
the receiver.
It is important to note that the XOR gates in Fig. 7 must be
symmetric with respect to their two differential inputs. Otherwise, differences in propagation delays result in systematic
phase offsets. Each of the XOR gates is implemented as shown
in Fig. 8 [7]. The circuit avoids stacking stages while providing
perfect symmetry between the two inputs. The output is single-ended but the single-ended Error and Reference signals produced by the two XOR gates in the phase detector are sensed with
respect to each other, thus acting as a differential drive for the
charge pump. The operation of the XOR circuit is as follows. If
the two logical inputs are not equal, then one of the input transistors on the left and one of the input transistors on the right
off. If the two inputs are identical,
turn on, thus turning
. Since the average
one of the tail currents flows through
current produced by the Error XOR gate is half of that generated
is scaled differently,
by the Reference XOR gate, transistor
making the average output voltages equal for zero phase differreduces the
ence. Channel length modulation of transistor
precision of current scaling between the two XOR gates. This effect can be avoided by increasing the length of the device.
The gain of the PD is determined by the value of the resistor
and the tail current sources ( ). The voltage
is generated on chip in order to track the variations over temperature and
Fig. 10.
Fig. 9.
Determination of PD gain.
Fig. 11.
Lock acquisition.
Fig. 13. (a) Spectrum of the recovered clock. (b) Recovered clock in the time
domain.
Fig. 12.
Chip photograph.
a value relatively close to its value in phase lock. The loop goes
through a transition of 350 ns before it locks. The ripple on the
control line in phase lock is approximately 1 mV.
IV. EXPERIMENTAL RESULTS
The CDR circuit has been fabricated in a 0.18-μm CMOS process. Fig. 12 shows a photograph of the chip. Electrostatic discharge (ESD) protection diodes are included for all pads except the high-speed lines. Nonetheless, since all of these lines have a 50-Ω termination, they exhibit some tolerance to ESD. The circuit is tested in a chip-on-board assembly. In this prototype, the width of the poly resistors was not sufficient to guarantee the nominal sheet resistance. As a result, the fabricated resistor values deviated from their nominal value by 30%, and the VCO center frequency was proportionally lower than the simulated value at the nominal supply voltage (1.8 V). The supply was increased to 2.5 V to achieve reliable operation at 10 Gb/s. While such a high supply voltage creates hot-carrier effects in rail-to-rail CMOS circuits, it is less detrimental in this design because no transistor in the circuit experiences a gate-source or drain-source voltage of more than 1 V. This issue is nonetheless resolved in a second design [9] by proper choice of resistor dimensions. The circuit is brought close to lock with the aid of the VCO coarse control before phase locking takes over.
Fig. 13(a) shows the spectrum of the clock in response to a
. The effect of the noise
10-Gb/s data sequence of length
shaping of the loop can be observed in this spectrum. The phase
supply. The VCO, the PD, and the clock and data buffers consume 20.7, 33.2, and 18.1 mW, respectively.
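The per-block figures can be cross-checked against the total reported for the circuit with a short calculation. This is only a bookkeeping sketch: the block values and the 2.5-V supply are taken from the text, while the variable names and the derived supply current are illustrative and are not stated in the paper.

```python
# Per-block power figures quoted in the text, in milliwatts.
blocks_mw = {
    "VCO": 20.7,
    "PD": 33.2,
    "clock and data buffers": 18.1,
}

supply_v = 2.5  # supply voltage used in the measurements

total_mw = sum(blocks_mw.values())
current_ma = total_mw / supply_v  # derived value, not reported in the text

if __name__ == "__main__":
    print(f"total power: {total_mw:.1f} mW")
    print(f"implied supply current: {current_ma:.1f} mA")
    for name, power in blocks_mw.items():
        print(f"{name}: {100 * power / total_mw:.0f}% of total")
```

The three blocks sum to the 72 mW quoted for the circuit excluding the output buffers, so the breakdown is internally consistent.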
V. CONCLUSION
CMOS technology holds great promise for optical communication circuits. The raw speed resulting from aggressive scaling
along with high levels of integration provide a high performance
at low cost. A 10-Gb/s clock and data recovery circuit designed
in 0.18-μm CMOS technology performs phase locking, data regeneration, and demultiplexing with 1 ps of RMS jitter.
REFERENCES
Fig. 15.
the SONET specifications, but the jitter analyzer must then generate large jitter and drives the loop out of lock. The loop bandwidth can be reduced to the SONET specifications if a means of
frequency detection is added to the loop [9]. The circuit is then
much less susceptible to loss of lock due to the jitter generated
by the analyzer.
Fig. 15 depicts the retimed data. The demultiplexed data
outputs are shown in Fig. 15(a). The difference between the
waveforms results from systematic differences between the
bond wires and traces on the test board. Fig. 15(b) depicts the
full-rate output. Using this output, the BER of the system can
, the BER is
be measured. With a random sequence of
. However, a random sequence of
smaller than
results in a BER of
. This BER can be reduced if
the bandwidth of the output buffer driving the 10-Gb/s data is
increased. Furthermore, if the value of the linear resistors is
adjusted to their nominal value, the increased operating speed
of the back-end multiplexer results in an improved BER [9].
The CDR circuit exhibits a capture range of 6 MHz and a
tracking range of 177 MHz. The total power consumed by the
circuit excluding the output buffers is 72 mW from a 2.5-V
0-7803-6608-5
2001 IEEE
Figure 5.3.6: (a) Recovered data and clock, (b) measured jitter transfer
characteristics.
Abstract: Clock and data recovery (CDR) circuits are key electronic components in future optical broadband communication systems. In this paper, we present a 40-Gb/s integrated CDR circuit applying a phase-locked loop technique. The IC has been fabricated in a 50-GHz $f_T$ self-aligned double-polysilicon bipolar technology using only production-like process steps. The achieved data rate is a record value for silicon and comparable with the best results for this type of circuit realized in SiGe and III–V technologies.
Index Terms: Bipolar digital integrated circuits, clocks, data communication, high-speed integrated circuits, phase-locked loops, synchronization.
Fig. 1. Block diagram of the fiber-optic link.
I. INTRODUCTION
Fig. 2(a) shows the concept of the CDR used for the fiber-optic link (Fig. 1) in more detail. The main processing blocks are the demultiplexer, consisting of two master-slave D-flip-flops (DFF1, DFF2) in parallel, and an additional master-slave D-flip-flop (DFF3), which forms the phase detector together with DFF2 and the XOR gate. All these functions are integrated in a single chip. The fixed 90° phase shifter, voltage-controlled oscillator (VCO), and loop filter have been realized externally with commercially available components.
Fig. 2. CDR circuit: (a) block diagram and (b) timing diagram.
Fig. 2(b) shows the timing diagram. The incoming 40-Gb/s
data signal is applied to flip-flops DFF1, DFF2, and DFF3.
DFF1 is toggled by CLK, DFF2 by the inverted clock, and DFF3 by the
90° delayed clock signal. This results in the sampling of the
input in the vicinity of midbit and each following potential
transition. If a transition is present, the phase relationship of
the data and the clock can be deduced to be early or late. If
the midbit clock CLK is too early, DFF3 samples the same
bit; if it is too late, DFF3 samples the following bit. Under
locked conditions, DFF3 samples at the edge of the data eye.
The XOR compares the output samples of DFF2 and DFF3.
The result is fed to the loop filter. The output signal of the
loop filter serves as the control signal of the VCO.
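The early/late decision described above can be illustrated with a minimal behavioral sketch in Python. It assumes an ideal NRZ waveform with instantaneous transitions and models only one mid-bit sample, the sample taken 90° of the half-rate clock later (half a bit period), and the XOR. The function names, the `clock_offset` parameter, and the normalization to one unit interval are illustrative assumptions, not part of the original design.

```python
def sample(bits, t, ui=1.0):
    """Value of an ideal NRZ waveform at time t; bit k occupies [k*ui, (k+1)*ui)."""
    return bits[int(t // ui)]


def xor_phase_detector(bits, k, clock_offset, ui=1.0):
    """XOR of the mid-bit sample and the 90-degree-delayed sample around bit k.

    clock_offset is the clock timing error in the same unit as ui
    (negative = clock early, positive = clock late).
    """
    t_mid = (k + 0.5) * ui + clock_offset  # nominal mid-bit sampling instant
    t_edge = t_mid + 0.5 * ui              # 90 degrees of a half-rate clock = 0.5 UI
    return sample(bits, t_mid, ui) ^ sample(bits, t_edge, ui)


# A 0 -> 1 transition between bit 0 and bit 1:
early = xor_phase_detector([0, 1], 0, -0.1)  # delayed sample still sees bit 0
late = xor_phase_detector([0, 1], 0, +0.1)   # delayed sample already sees bit 1
```

In this simplified model the XOR output is 0 when the clock is early, 1 when it is late and a transition is present, and 0 whenever two consecutive bits are equal, which is why the loop needs data transitions to obtain phase information.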
The advantages of this concept are that all components
operate at half the data rate and that the input is demultiplexed
at the same time. The disadvantage is that the input signal has
to drive three DFFs in parallel.
IV. CIRCUIT AND DESIGN PRINCIPLES
TABLE I
DEVICE PARAMETERS (JUNCTION CAPACITANCES ARE ZERO-BIAS VALUES)
Meeting the required speed rather than low power consumption was the main aim of this design. All transistor
sizes are individually optimized with respect to the function
of the transistor in the circuit. Extra attention was given to
the on-chip wiring. The lines on the chips were classified as
critical or uncritical. For example, the lines driven by
emitter followers are critical because they support ringing,
while the lines driven by current switches are uncritical [10].
The critical lines are then shortened at the cost of the uncritical
ones. The longer signal lines are realized as microstrip lines
(with the lowest metallization layer as a ground plane), mainly
to improve simulation accuracy. This leads to the layout shown
in Fig. 4.
V. CHIP TECHNOLOGY
The circuit has been fabricated in a self-aligned double-polysilicon bipolar technology [12]. The fabrication starts
with buried layer formation. A 1-µm epitaxial layer is grown
to compromise between high transit frequency and low
external collector-base capacitance. The isolation consists
of a channel stopper implantation combined with LOCOS field
oxide. The active base is formed by 5-keV BF2 implantation.
This low implantation energy in combination with optimized
annealing conditions allows for very steep base profiles. This
results in a narrow base width of about 50 nm and thus enables
a high transit frequency. A selectively implanted collector
improves the current-carrying capability of the transistors to
the high optimum collector current density of about 2
mA/µm². To minimize narrow emitter effects, an in situ doped
emitter-polysilicon layer is used [13]. This prevents a reduction
of cutoff frequency even for 0.5-µm design rules. A three-level
metallization completes the process.
Fig. 5 shows a schematic cross section of a transistor.
Except for epitaxy, only process steps of a 0.5-µm CMOS
production environment are necessary. The maximum transit
frequency of the transistors is [...] GHz at [...] V
and [...] mA/µm². Table I summarizes typical parameters
for transistors with an effective emitter size of
[...] µm². The minimum gate delay for an ECL differentially
operating ring oscillator with an output voltage swing of 800
mVp-p is measured to be 15.4 ps. This value is achieved for
a current per gate of 1.6 mA (see Fig. 6).
Fig. 6. Measured ECL gate delay versus current per gate with an 800-mV
differential voltage swing.
VI. MOUNTING AND MEASUREMENT SETUP
VIII. CONCLUSION
An integrated clock and data recovery circuit operating up
to 40 Gb/s has been realized in a 0.5-µm/50-GHz f_T silicon
bipolar technology using only production-like process steps.
This data rate is the highest reported value for this type of
circuit in a silicon technology. This demonstrates that all
Fritz Schumann received the Diplomingenieur degree in electrical engineering from Technical University Berlin, Germany, in 1981.
Subsequently, he worked in the field of RF and
microwave hybrid circuit and system design for
telecommunication and radar applications. In 1992,
he joined the silicon bipolar IC design group, Corporate Research and Development, Siemens AG,
Munich, Germany. Since then, he has realized ICs
for wireless and fiber-optic communication systems
up to 60 Gb/s.
Alfred Felder was born in Bruneck, South Tyrol, Italy, in 1963. He received
the Diplomingenieur and Ph.D. degrees in electrical engineering from the
Technical University Vienna, Austria, in 1989 and 1993, respectively.
He joined Corporate Research and Development, Siemens AG, Munich,
Germany, in 1989, where he has been engaged in the development of analog
and digital high-speed silicon bipolar ICs for future optical communication
systems in the gigabit-per-second range. From 1996 to 1998, he was Manager
of the Technology Department of Siemens K.K. The department is the liaison
office of the Corporate Technology of Siemens AG in Japan, responsible
for the cooperation with Japanese companies in research. Since 1998, he
has been heading the business operation Signal Processing & Control within
the Siemens Semiconductor Group in Japan and has been responsible for
marketing of microcontrollers and digital signal processors.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 36, NO. 12, DECEMBER 2001
I. INTRODUCTION
Fig. 2. CDR/DEMUX architecture.
Small-scale integrated analog and digital building blocks implemented in this process have been demonstrated for 40-Gb/s
operation [2], [3].
IV. CDR/DEMUX ARCHITECTURE
The half-bit-rate architecture of the CDR/DEMUX IC (Fig. 2)
is based on the concept reported in [4] and has already demonstrated its functionality for 10-Gb/s applications.
The nonlinear bang-bang PD described in [5] was modified
for interlaced operation. The combination of the PD and the
Fig. 3. PD principle.
The internal 40-GHz VCO is based on a differential Colpitts topology using microstrip transmission lines instead of a
spiral inductor (Fig. 5). At an oscillation frequency of 40 GHz, a
grounded microstrip line can be modeled quite accurately compared to other forms of inductors.
The microstrip transmission lines exhibit an inductive input
impedance, since for the odd mode they see a virtual ground
termination. It is physically implemented with the signal line on
the upper thick metallization layer and a shielding ground plane
on the lowest metal layer. This yields maximum inductive input
impedance per line length combined with minimum resistive
losses, which optimizes the quality factor Q.
As the PLL employs a high-speed and a low-speed filter
in parallel, the VCO has two separate frequency tuning inputs.
These tuning inputs feed two ac-coupled reverse-biased varactor diodes controlling the VCO frequency. The varactors have minimum size, resulting in a maximum VCO frequency modulation bandwidth as required by the
high-bandwidth bang-bang PD architecture.
It should be noted that the optimization of the free-running
VCO phase noise is not the major design goal. Since the
bang-bang PLL architecture provides very high bandwidth with
Fig. 5.
Fig. 7 shows the circuit diagram of the latch. The latch structure can be subdivided into the latch core and the local clock
input stage.
The steepness of the PD curve is strongly determined by the
metastability and the clock phase margin (CPM) of the latch, as
the latches sample the bit transition when the PLL is in lock. Two high
current-biased emitter followers in the data path provide a high
CPM and a small metastability region.
For optimal clock distribution, a local clock input stage consisting of termination resistors and two emitter followers
is included in each latch cell, since a total clock line length
of several millimeters cannot be avoided. For this reason, current interfaces between each clock buffer and the latches are
employed. The concept of the clock distribution is illustrated
in Fig. 8. The two clock buffers are based on open-collector
transadmittance stages. As stated before, a local clock input
stage consisting of termination resistors is included in
every latch cell. Each clock buffer is loaded with ten latches.
Impedance matching can be easily achieved by designing
the line impedance according to the number of loading
latches. The value of the line impedance can be locally adapted with respect
to signal splits, as shown in Fig. 8. If the line impedances
are chosen [...] (with n being the number of loading
latches), the clock amplitude can be increased. This is due
to the inductive peaking effect of the transmission line, as
indicated in Fig. 9.
The layout of the latch, which is shown as part of Fig. 10,
employs orthogonal data and clock inputs. This implementation
minimizes line length on the high-speed data path by running a
data channel directly through the cascaded latch cells (Fig. 10).
In addition, clock channels run beside the cells to simplify the
clock-tree routing.
C. Bang-Bang PD
The measured PD transfer function is
illustrated in Fig. 15. The curve shows a fairly steep slope at the
lock-in point, which implies good sampling capability at the data
transition and a small metastability region, corresponding to the
observed high CPM of 240°.
D. Jitter Generation and Jitter Tolerance
The rms jitter of the recovered clock is measured using
a sampling oscilloscope. By excluding the trigger jitter of the
test equipment ([...] ps) from the original measurement
result of less than 1.2 ps, the overall CDR rms jitter generation
can be calculated to be approximately 0.7 ps.
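The correction for instrument jitter can be sketched as follows, under the common assumption that the CDR jitter and the oscilloscope trigger jitter are statistically independent, so that their rms values add in quadrature. The 1.0-ps trigger jitter used below is a hypothetical value chosen only for illustration; the function name is likewise an assumption.

```python
import math


def deembed_rms_jitter(measured_rms_ps, trigger_rms_ps):
    """Remove an independent instrument contribution from a measured rms jitter.

    Assumes uncorrelated contributions: measured^2 = device^2 + trigger^2.
    """
    if trigger_rms_ps > measured_rms_ps:
        raise ValueError("trigger jitter cannot exceed the measured jitter")
    return math.sqrt(measured_rms_ps**2 - trigger_rms_ps**2)


# Hypothetical trigger jitter of 1.0 ps, measured total of 1.2 ps
cdr_rms_ps = deembed_rms_jitter(1.2, 1.0)
```

With these illustrative numbers the result is about 0.66 ps, of the same order as the approximately 0.7 ps quoted in the text.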
For SONET and SDH systems, jitter tolerance masks are defined. As standardization is not finalized for 40-Gb/s systems
yet, the tolerance mask is extrapolated from the 10-Gb/s specification. The measured jitter tolerance curve, given in Fig. 16, exhibits sufficient margin relative to the extrapolated BELLCORE
mask [10]. For jitter frequencies higher than the corner frequency
of [...] MHz of the PLL, the jitter tolerance is determined
by the CPM of the first latches and the PLL has no influence.
For jitter frequencies lower than [...] MHz, the
jitter tolerance decreases with 20 dB per decade. Large amounts
of jitter can be tolerated at low frequencies, since the filter provides high gain in this frequency range.
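The two regimes described above can be captured in a simple piecewise model: a flat floor set by the clock phase margin above the PLL corner frequency, and a tolerance that rises by 20 dB per decade as the jitter frequency drops below the corner. The sketch below is only an idealized illustration; the corner frequency and floor value passed in the example are hypothetical, and the function name is an assumption.

```python
def jitter_tolerance_ui(f_jitter_hz, f_corner_hz, floor_ui):
    """Idealized jitter tolerance: flat above the corner, 20 dB/decade below it."""
    if f_jitter_hz >= f_corner_hz:
        return floor_ui                        # limited by the latch clock phase margin
    return floor_ui * f_corner_hz / f_jitter_hz  # loop tracks the jitter: 1/f behavior


# Hypothetical corner of 1 MHz and floor of 0.5 UI
tol_below = jitter_tolerance_ui(1e5, 1e6, 0.5)  # one decade below the corner
tol_above = jitter_tolerance_ui(1e7, 1e6, 0.5)  # one decade above the corner
```

One decade below the corner the modeled tolerance is ten times (20 dB) the floor, while above the corner it stays at the floor.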
Fig. 9.
Fig. 10.
Fig. 11.
Fig. 12.
Fig. 13.
Fig. 14.
Fig. 15.
Fig. 16.
Fig. 17.
BER measurement.
OSNR measurement.
VII. CONCLUSION
In this paper, the implementation of a fully integrated
CDR/DEMUX for 40-Gb/s TDM application in a state-of-the-art SiGe HBT process has been demonstrated. Key success factors in the design are, first, the robust half-bit-rate architecture
with bang-bang PD, and second, its implementation by partitioning the whole chip into key building blocks interconnected
via robust current interfaces. Therefore, schematic design and
layout have to be done concurrently.
This concept has been applied throughout the IC, but mainly
in the 40-Gb/s data and the 20-GHz clock distribution. Optical
system measurements show the feasibility of the CDR/DEMUX
as a single-chip-receiver IC.
REFERENCES
[1] T. F. Meister et al., SiGe base bipolar technology with 74-GHz f_max
and 11-ps gate delay, Proc. IEEE Int. Electron Devices Meeting
(IEDM), pp. 739-742, Dec. 1995.
[2] J. Müllrich, T. F. Meister, M. Rest, W. Bogner, A. Schöpflin, and H.-M.
Rein, 40-Gbit/s transimpedance amplifier in SiGe bipolar technology
for the receiver in optical-fiber links, Electron. Lett., vol. 34, pp.
452-453, 1998.
[3] A. Felder, M. Möller, M. Wurzer, M. Rest, T. F. Meister, and H.-M.
Rein, 60-Gbit/s regenerating demultiplexer in SiGe bipolar technology, Electron. Lett., vol. 33, pp. 1984-1985, 1997.
[4] J. Hauenschild et al., A plastic packaged 10-Gb/s BiCMOS clock and
data recovery 1:4-demultiplexer with external VCO, IEEE J. Solid-State Circuits, vol. 31, pp. 2056-2059, Dec. 1996.
[5] J. D. H. Alexander, Clock recovery from random binary signals,
Electron. Lett., vol. 11, pp. 541-542, Oct. 1975.
[6] M. Wurzer et al., A 40-Gb/s integrated clock and data recovery circuit
in a 50-GHz f_T silicon bipolar technology, IEEE J. Solid-State Circuits, vol. 34, pp. 1320-1324, Sept. 1999.
[7] M. Wurzer et al., 42-GHz static frequency divider in a Si/SiGe bipolar
technology, in IEEE ISSCC Dig. Tech. Papers, Feb. 1997, pp. 123123.
[8] Z. Lao et al., 55-GHz dynamic frequency divider IC, Electron. Lett.,
vol. 34, no. 20, pp. 1973-1974, 1998.
[9] H.-M. Rein et al., Design considerations for very-high-speed
Si-Bipolar ICs operating up to 50-Gb/s, IEEE J. Solid-State Circuits,
vol. 31, pp. 1076-1090, Aug. 1996.
[10] SONET OC-192 Transport System Generic Criteria, Bellcore,
GR-1377-CORE, Dec. 1998.
Mario Reinhold was born in Mülheim/Ruhr, Germany, in 1972. He received the Diplom-Ingenieur degree in electrical engineering from the Ruhr-University Bochum, Germany, in 1998.
He joined Lucent Technologies, Optical Networking Group, Nürnberg, Germany, in 1998,
where his activities focused on the development
of various analog and digital high-speed bipolar
ICs for 40-Gb/s and advanced 10-Gb/s fiber-optic
communication systems. Since 2001, he has been
with CoreOptics Inc., Nürnberg, Germany, working
on a next-generation 40-Gb/s chipset.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 35, NO. 12, DECEMBER 2000
I. INTRODUCTION
Fig. 2. Two basic self-aligned CDR architectures. (a) With a linear PLL. (b) With a binary PLL.
Fig. 3.
tion into a frequency-control (low-frequency) part and a phase-control (high-frequency) part. In general, a binary PLL is similar in system behavior to a double-integration delta modulator
with prediction [12] acting in the data-phase domain. Based on the
analogy with the signal frequency response in a delta modulator,
the phase-jitter transfer function has a nonlinear (slew-rate-limited) mechanism for the phase-jitter frequency response. It is a
single-pole-like response with the bandwidth inversely proportional to the input jitter amplitude:
(1)
where the quantities in (1) are the 3-dB bandwidth of the binary PLL jitter transfer
function, the jitter amplitude, the
bang-bang frequency step, and the average data transition
density factor (maximum [...] for a [...]
pattern).
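The inverse dependence of bandwidth on jitter amplitude can be illustrated with a first-order slew-rate argument. The sketch below is not the paper's equation (1), whose exact form and constants are not reproduced here. It assumes that a frequency step of Δf shifts the clock phase at Δf unit intervals per second, that the loop only receives corrections on data transitions (scaling the slew rate by the transition density), and that sinusoidal jitter of peak amplitude A at frequency f has a maximum slope of 2πfA. All names and example values are illustrative assumptions.

```python
import math


def slew_limited_bandwidth_hz(freq_step_hz, transition_density, jitter_amp_ui_peak):
    """Highest jitter frequency a slew-rate-limited loop can still follow.

    Tracking holds while 2*pi*f*A <= transition_density * freq_step_hz,
    with A the peak jitter amplitude in unit intervals.
    """
    max_slew_ui_per_s = transition_density * freq_step_hz
    return max_slew_ui_per_s / (2.0 * math.pi * jitter_amp_ui_peak)


# Hypothetical values: 5-MHz frequency step, transition density 0.5
bw_small_jitter = slew_limited_bandwidth_hz(5e6, 0.5, 0.1)
bw_large_jitter = slew_limited_bandwidth_hz(5e6, 0.5, 0.2)
```

Doubling the jitter amplitude halves the tracking bandwidth in this model, which matches the qualitative behavior stated in the text.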
The shape of the jitter tolerance function can be characterized by means of a jitter-tolerance scale function
[5]. The modeled binary PLL jitter tolerance response
is shown in Fig. 3 for two frequency steps [...]. For
comparison, the Bellcore SONET mask is also shown on the same
plot with the corresponding unit interval (UI) values on the right
side of the graph.1 To satisfy the mask, the jitter transfer bandwidth
defined by (1) should be set above 4 MHz at a jitter amplitude
of [...] ps and minimum average data transition
density [...].
The jitter generation in a binary CDR is proportional to the
frequency step and to the delay in the PLL loop, measured as the
number of 100-ps clock periods required to propagate a signal
from the phase detector output to its input:
(2)
Equations (1) and (2) were used to find a frequency step
and a delay
acceptable for the SONET applications as
1 In a 10-Gb/s system, UI = 100 ps.
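Because jitter figures in this paper are quoted both in picoseconds and in unit intervals, a small conversion helper may be useful. It relies only on the definition of the unit interval as one bit period (100 ps at 10 Gb/s, as stated in the footnote); the function names are illustrative.

```python
def unit_interval_ps(bit_rate_gbps):
    """Duration of one unit interval (one bit period) in picoseconds."""
    return 1000.0 / bit_rate_gbps


def ps_to_ui(jitter_ps, bit_rate_gbps):
    """Express a jitter amplitude given in picoseconds as a fraction of a UI."""
    return jitter_ps / unit_interval_ps(bit_rate_gbps)


ui_10g = unit_interval_ps(10.0)    # 100 ps
mask_ui = ps_to_ui(15.0, 10.0)     # 15 ps corresponds to 0.15 UI at 10 Gb/s
```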
TABLE I
COMPARISON OF LINEAR AND BINARY TYPE CDR ARCHITECTURES
[...] stages
implemented with output current steering in
a differential pair [13]. Stage [...] has a fixed gain and also
provides an open-collector transmission line interface to drive the
CDR data bus. To alleviate conflicting requirements for large
data swing and low noise figure at low input amplitudes, the
AGC has two gain ranges: -7 to 7 dB (low gain range) and 7 to 20
dB (high gain range). Two differential pairs with a large and
a small emitter degeneration resistor are used to switch
the gain ranges. The measured AGC S11 is better than -15 dB in
a frequency range up to 10 GHz, and the noise figure is 13.5 dB. The ac
bandwidth is adjustable in a range of 8-10 GHz.
C. CDR
As compared to original version of binary CDR [7], [11], in
the CDR presented here, the data decision and clock recovery
processes are split. This allows for independent optimization
of data decision threshold (slicing) without affecting clock recovery process. There are four decision channels in the CDR, all
driven by the CDR data bus. Channels 1 and 2 are identical decision circuits, as shown in the block diagram of Fig. 7(a). The
additional decision channel allows operation with two different
input slicing levels. The data decision threshold is set by a differential slicer circuit based on an emitter follower [Fig. 7(b)]. In a
long-haul receiver application, a high-performance limiting amplifier [2 in Fig. 7(a)] is required. Note that the AGC stabilizes
only a long-time averaged amplitude measured at AGC output
with a peak detector (not shown in Figs. 5 and 6). The limiting
amplifier stage was designed for 40-dB gain with bandwidth of
more than 16 GHz and input AM to output PM conversion less
than 1 ps in 20-dB input dynamic range. Two 20-dB gain-limiting amplifier stages similar to [14] were employed. The digital sampler is based on a master-slave-master (MSM) flip-flop
configuration [Fig. 7(c)]. It helps to reduce the latching metastability region and to increase the clock phase margin in the decision circuit, as opposed to a master-slave D-type flip-flop (DFF).
The schematic and layout of the latch were optimized for a minimum
latching time constant.
Fig. 5.
The BPD is formed with decision channels 3 and 4, and two
DFF circuits. The BPD takes three data samples according to the
timing diagram in Fig. 5. It generates a binary output with respect to the lead/lag phase of the VCO clock. The BPD uses only
one edge of the data transition, and is set to a tri-state condition
at the other edge or in the absence of data transitions according
to a truth table (Table II). As a result, recovered clock jitter is
not affected by asymmetry between the rising and the falling
edges or by the incoming data pattern. The BPD phase resolution is critical to the CDR jitter performance. Latch metastability at time sample T0 (see Fig. 5) limits the resolution. This
problem was circumvented by using a MSM-based decision circuit preceded by a limiting amplifier, as described above. Phase
resolution better than 1 ps was measured in the BPD circuit, as
shown in Fig. 8. A phase-delay sweep was arranged with two
clock generators: 5.0125 GHz for the data and 10.0000 GHz
for the clock. Frequency shift of 12.5 MHz for the data provided binary pulses at the output of the phase detector with
40-ns period corresponding to a 100-ps delay sweep. To analyze the output pulses, a digital scope was synchronized from
the phase-detector output using an HP 54118A trigger amplifier. This method allowed measurement of the output transition
region with the accuracy of 0.5 ps per one data cycle (200 ps).
Measured transition region at the phase detector output contains
two data cycles, confirming phase detector time resolution to be
less than 1 ps.
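The numbers quoted for the phase-delay sweep can be cross-checked with a short calculation. The sketch below interprets the 12.5-MHz shift as the offset of the data generator from half the clock frequency, which is an assumption about the setup; the function and variable names are illustrative.

```python
def sweep_parameters(f_clock_hz, f_data_hz):
    """Return (offset, phase step per data cycle, time to sweep one clock period)."""
    f_ref_hz = f_clock_hz / 2.0            # data frequency with zero offset
    delta_f_hz = f_data_hz - f_ref_hz      # frequency shift of the data generator
    t_ref_s = 1.0 / f_ref_hz               # 200 ps for a 10-GHz clock
    t_clock_s = 1.0 / f_clock_hz           # 100 ps for a 10-GHz clock
    # Phase slips by delta_f/f_data of a reference cycle in every data cycle.
    step_per_data_cycle_s = delta_f_hz / f_data_hz * t_ref_s
    # Time needed for the slip to accumulate to one full clock period.
    sweep_period_s = (t_clock_s / t_ref_s) / delta_f_hz
    return delta_f_hz, step_per_data_cycle_s, sweep_period_s


delta_f, step_s, period_s = sweep_parameters(10.0e9, 5.0125e9)
```

For the stated generator settings this yields a 12.5-MHz offset, a step of roughly 0.5 ps per data cycle, and a 40-ns period for a 100-ps delay sweep, in line with the figures given in the text.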
The CDR frequency-control loop and the phase-control loop
are separated as described in Section II. The bang-bang part of
TABLE II
TRUTH TABLE OF BINARY PHASE DETECTOR
Fig. 9.
Fig. 10.
Fig. 11.
D. 1 : 8 Demultiplexer
The 1 : 8 DEMUX (Fig. 10) is similar in architecture to
the design presented in [15]. Seven 1 : 2 demultiplexer circuits are cascaded, with each stage optimized for the clock
frequency required. Each 1 : 2 demultiplexer consists of a
master-slave-master flip-flop to capture the lead bit on the positive edge of the clock and a master-slave flip-flop to capture
the second bit using the negative edge of the clock. Utilizing
an extra latch in the MSM flip-flop ensures that the 1 : 2 data
outputs are aligned for further processing. The frequency of the
incoming CDR clock is divided by two at each demultiplexer
stage with a delay equal to the data delay in the 1 : 2 block.
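The tree structure of the 1 : 8 DEMUX can be illustrated with a purely behavioral sketch: three levels of 1 : 2 splitting require 1 + 2 + 4 = 7 blocks. The code below ignores clocking, latch alignment, and delays, and the function names are illustrative assumptions.

```python
def demux_1to2(bits):
    """Split a serial stream into two half-rate streams (first and second bit of each pair)."""
    return bits[0::2], bits[1::2]


def demux_tree(bits, stages):
    """Cascade 1:2 blocks; return the output streams and the number of blocks used."""
    streams = [bits]
    blocks = 0
    for _ in range(stages):
        next_streams = []
        for stream in streams:
            first, second = demux_1to2(stream)
            next_streams.extend([first, second])
            blocks += 1
        streams = next_streams
    return streams, blocks


outputs, n_blocks = demux_tree(list(range(16)), stages=3)
```

With 16 numbered input bits, each of the eight outputs carries every eighth bit, and the block count confirms the seven 1 : 2 circuits mentioned in the text.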
E. Built-in PRBS Generator
Fig. 12.
Fig. 14. SiGe receiver eye diagrams measured in CDR mode at 9.1 Gb/s. Input data: 80 mVp-p PRBS 2^[...] - 1.
cm in the test fixture was defined by the perimeter required for mounting I/O connectors in the housing metal box. A
large number of required I/O were used for testing purposes. The
receiver IC does not require external components, except decoupling and integration capacitors mounted beside the die. The
recovered clock and data eye diagrams at 9.1 Gb/s are shown
in Fig. 14. The IC input sensitivity is less than 4.5 mV at [...],
measured at the 1 : 8 demultiplexer output with the
AGC gain set to 20 dB (data eye closure in the test fixture was
de-embedded). The DATA : 8 transition distortion apparent in Fig. 14
is due to the long ribbon cable attached to the demultiplexed outputs of the test fixture. In mission mode, the receiver consumes 4.5 W
from 5 V.
The CDR IC performance was fully characterized at 10 Gb/s
on-wafer with a probe card. The die micrograph of the CDR
is shown in Fig. 15. It consists of an exact copy of the receiver
CDR layout plus the output buffers located in the DEMUX partition. In all of the measurements, input data were supplied single-ended while the unused differential input was terminated with 50 Ω.
The typical CDR eye diagrams measured with 20-mV
PRBS data are shown in Fig. 16. The input sensitivity was measured
to be 14 mV at [...], as compared to 13.4 mV
simulated considering thermal and shot noise in the decision
channel.
Phase noise of the recovered clock was measured with an
HP 4352B as a power spectrum density (Fig. 17). 10-Gb/s input
data were supplied with amplitude of 100 mV and
Fig. 15.
Fig. 16.
Fig. 18.
Fig. 19. CDR jitter tolerance. Performance measured with the clock reference
level modulation test is marked with symbol [...].
Fig. 17. Phase-noise comparison of the CDR recovered clock, free running
VCO and data pattern generator (BERT) clock.
clock gives jitter RMS value of 0.78 ps. Phase noise was found
to be PRBS pattern independent up to a pattern of [...].
The OC-192 jitter compliant performance (at 9.953 28 Gb/s)
was verified with a jitter analyzer MX177 701 from Anritsu.
Jitter generation (in 80-MHz bandwidth) was measured to be
5.4 ps peak-to-peak and 0.8 ps RMS as compared to 10 ps peak-to-peak or 1 ps RMS
recommended by Bellcore [2]. The RMS jitter is very close to
the 0.78-ps value obtained in the phase-noise measurement.
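The link between a phase-noise spectrum and an RMS jitter figure can be sketched with the standard small-angle relation: the mean-square phase deviation is twice the integral of the single-sideband phase noise L(f) over the offset band, and the RMS time jitter is the RMS phase divided by 2π times the carrier frequency. The example spectrum below (flat at -110 dBc/Hz up to 80 MHz on a 10-GHz carrier) is hypothetical and is not taken from Fig. 17; the function name is an assumption.

```python
import math


def rms_jitter_from_phase_noise(freqs_hz, l_dbc_per_hz, f_carrier_hz):
    """Integrate SSB phase noise (dBc/Hz) over offset frequency; return rms jitter in seconds."""
    linear = [10.0 ** (value / 10.0) for value in l_dbc_per_hz]
    area = 0.0
    for i in range(1, len(freqs_hz)):
        # Trapezoidal integration on a linear frequency axis
        area += 0.5 * (linear[i] + linear[i - 1]) * (freqs_hz[i] - freqs_hz[i - 1])
    phase_rms_rad = math.sqrt(2.0 * area)  # factor 2 accounts for both sidebands
    return phase_rms_rad / (2.0 * math.pi * f_carrier_hz)


# Hypothetical flat spectrum: -110 dBc/Hz from 0 to 80 MHz on a 10-GHz clock
jitter_s = rms_jitter_from_phase_noise([0.0, 80e6], [-110.0, -110.0], 10e9)
```

For this illustrative spectrum the integral gives an RMS phase of 0.04 rad, or roughly 0.64 ps at 10 GHz. A measured spectrum would need a finer frequency grid, ideally integrated on a logarithmic axis.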
Jitter transfer measurement (Fig. 18) showed, as predicted
by modeling, single-pole-like characteristics with no jitter
peaking. Jitter tolerance (Fig. 19) has a very wide safety margin
for the SONET mask with a minimum of 40 ps (15 ps is
recommended). The shape of the measured CDR jitter tolerance
response differs from the modeled one in Fig. 3 because of test
setup limitations. This is seen from the measured jitter tolerance
of the test setup (BERT) with no CDR in the data path (shown
in the same plot of Fig. 19). Only in the frequency range of
40 kHz to 2 MHz is the measured jitter tolerance determined by CDR
performance. In this frequency range, measured and modeled
jitter tolerance coincide. The upper frequency range of the
jitter tolerance response was also remeasured with a different
method, based on the reference voltage [...] (see Fig. 5)
ACKNOWLEDGMENT
The authors thank their colleagues S. Szilagyi for the
microwave test fixture design, and D. Marchesan and
Dr. S. Voinigescu for useful discussions and distributed
components modeling. Special thanks to R. Hadaway for his
directions and to IBM Corporation for fabrication.
REFERENCES
[1] B. Beggs, GaAs HBT 10-Gb/s Product, in 1999 IEEE MTT-S Int. Microwave Symp. Workshop, Anaheim, CA, June 13-19, 1999.
[2] SONET OC-192, Transport system generic criteria, Bellcore,
GR-1377-CORE, no. 4, Mar. 1998.
[3] R. C. Walker et al., A 10-Gb/s Si-bipolar Tx/Rx chipset for computer
data transmission, in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302-303.
[4] T. Morikawa et al., A SiGe single-chip 3.3-V receiver IC for 10-Gb/s
optical communication systems, in ISSCC Dig. Tech. Papers, Feb.
1999, pp. 380-381.
[5] Y. Greshishchev and P. Schvan, SiGe clock and data recovery IC
with linear-type PLL for 10-Gb/s SONET application, in Proc. 1999
Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp.
169-172.
[6] Y. M. Greshishchev, P. Schvan, J. L. Showell, M.-L. Xu, J. J. Ojha, and
J. E. Rogers, A fully integrated SiGe receiver IC for 10-Gb/s data rate,
in ISSCC Dig. Tech. Papers, Feb. 2000, pp. 52-53.
[7] J. D. H. Alexander, Clock recovery from random binary signals, Electron. Lett., vol. 11, pp. 541-542, Oct. 1975.
[8] C. R. Hogge, A self-correcting clock recovery circuit, J. Lightwave
Technology, vol. 3, pp. 1312-1314, Dec. 1985.
[9] J. Hauenschild et al., A two-chip receiver for short-haul links up to
3.5-Gb/s with PIN-preamp module and CDR-MUX, in ISSCC Dig.
Tech. Papers, Feb. 1998, pp. 308-309.
[10] J. Hauenschild et al., A plastic packaged 10-Gb/s BiCMOS clock and
data recovering 1 : 4-demultiplexer with external VCO, IEEE J. Solid-State Circuits, vol. 31, pp. 2056-2059, Dec. 1996.
[11] R. C. Walker et al., A two-chip 1.5-GBd serial link interface, IEEE J.
Solid-State Circuits, vol. 27, pp. 1805-1811, Dec. 1992.
[12] R. Steele, Delta Modulation Systems. New York/Toronto: Wiley, 1975.
[13] M. Soda, T. Suzaki, and T. Morikawa et al., A Si bipolar chip set for
10-Gb/s optical receiver, in ISSCC Dig. Tech. Papers, Feb. 1992, pp.
100-101.
Peter Schvan (M'89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics
from Eötvös Loránd University, Budapest, in 1975
and the Ph.D. degree in electrical engineering from
Carleton University, Ottawa, Ontario, Canada, in
1985.
In 1985, he joined Nortel Networks, Ottawa, where
he started working in the area of BiCMOS and
bipolar technology development, yield prediction,
device characterization, and modeling. Recently, his
work has been extended to the design of multi-gigabit
circuits and systems. He is currently Senior Manager of a group responsible
for evaluating various high-performance technologies and demonstrating
advanced circuit concepts required for fiber optic communication systems. He
has authored and co-authored numerous publications.
Jugnu J. Ojha (M'00) received the B.Eng. degree from Dalhousie University
and the Technical University of Nova Scotia in 1987. He received the M.Sc. and
Ph.D. degrees from McMaster University, Hamilton, Ontario, Canada, in 1990
and 1994, respectively. His graduate work involved research in electronic and
optoelectronic devices, as well as optoelectronic properties of semiconductors.
He was with Nortel Networks, Ottawa, Ontario, Canada, from 1994 to 2000,
where he worked on a wide range of technologies, including design of circuits
for 10 and 40 Gb/s optical transmission systems using SiGe and InP HBTs. He
also led a program in MEMS technology, with a focus on optical applications,
including optical crossconnects. His other activities included next-generation
optical network development, as well as research on optical properties of SiGe
materials and devices. He recently joined Caspian Networks in Palo Alto, CA,
as Senior Advisor in Optical Networking.
Jonathan E. Rogers (S'00) received the B.A.Sc degree in engineering sciences (electrical option) from
the University of Toronto, Ontario, Canada, in 1999.
He is currently working toward the M.A.Sc in electronics at the University of Toronto. His area of research is clock and data recovery systems in deep
sub-micron CMOS.
In May, 1997, he joined Nortel, Ottawa, Ontario,
for a 16-month internship, where he performed clock
and data recovery system characterization, VCO design, and high-speed measurements on SiGe MMICs
under the guidance of Dr. Y. Greshishchev.
I. INTRODUCTION
adjustment is further complicated by packaging issues, specifically that of aligning the recovered clock after the off-chip filter
and the decision IC.
In PLL-based CDRs, the clock phase in the decision circuit
is automatically synchronized to sample the center of the time
slot of each bit. Also, the PLL can be integrated onto a single
IC, greatly reducing temperature drift and phase relationship
problems.
Previous implementations of digital PLL-based CDRs at
40 Gb/s have employed half bit-rate clocking of CDR demultiplexer (DEMUX) combinations, to reduce the bandwidth
required in the buffers and digital gates [2]-[4]. By clocking
at 20 GHz to tolerate lower transistor bandwidth, the 2
parallel phase detector requires higher circuit complexity
with about twice the transistor count and a precise four-phase
voltage-controlled oscillator (VCO).
In this paper, we leverage the InP heterojunction bipolar transistor (HBT) technology operating at the full 40-Gb/s data rate
to simplify the CDR architecture.
II. CDR ARCHITECTURE
The typical transceiver architecture for a 40-Gb/s lightwave
system is shown in Fig. 1. The CDR IC of this work is highlighted in the receiver path.
The core of the PLL-based CDR is the phase detector. The
phase detector used here simultaneously recovers both clock and
data. The digital early-late phase detector [5], consisting of data
flip-flops and combinatory logic gates, is shown in Fig. 2. In the
locked state, one latch chain samples the center of the incoming
data time slot, while another chain samples the data zero crossings. The combinatory logic block, with inputs from the three
latch chains, determines if the clock is early or late with
respect to the incoming data transition. This logic generates the
UP-DOWN control signal for the VCO.
Fig. 3.
Two of the three signals are generated by decision circuits (respectively, after
two and four latches). The phase difference between them is
180° (one bit). The third signal is generated after three latches with an inverted
clock. If the clock phase is correct, it is always in the middle
of the other two. EXOR and NAND gate combinatory logic
converts the three signals to the UP-DOWN pulses for controlling
the VCO. The logic equations are as follows.
UP = [...]
DOWN = [...]
The clock is early (slows down the VCO) if [...].
The clock is late (speeds up the VCO) if [...].
The clock is correct if [...]. Obviously, data transitions are required for clock
recovery.
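The decision principle of an early-late detector of the kind introduced in [5] can be sketched in a few lines. This is a textbook-style illustration, not the gate-level EXOR/NAND logic of this chip: the sample names `a`, `t`, and `b` (previous bit center, zero-crossing sample, next bit center) and the returned labels are hypothetical.

```python
def early_late_decision(a, t, b):
    """Classify the clock phase from three consecutive binary samples.

    a: sample at the center of the earlier bit
    t: sample at the nominal data zero crossing between the two bits
    b: sample at the center of the later bit
    """
    before = a ^ t  # 1 if the data changed between a and t
    after = t ^ b   # 1 if the data changed between t and b
    if before == after:
        return "hold"   # no transition (or inconsistent samples): no phase information
    if after:
        return "early"  # t still equals the earlier bit: the clock sampled too soon
    return "late"       # t already equals the later bit: the clock sampled too late


decisions = [early_late_decision(0, 0, 1), early_late_decision(0, 1, 1),
             early_late_decision(1, 1, 1)]
```

The detector produces a correction only when the two bit-center samples differ, which illustrates why data transitions are required for clock recovery.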
The chip also incorporates 15-dB gain-limiting amplifier
buffers at the data I/O, at the divide-by-2 clock output (for the
external DEMUX) and at the divide-by-32 clock output (for the
external coarse adjustment low-frequency lock loop). The static
frequency dividers use similar latches as those in the digital
phase detector. Figs. 3 and 4, respectively, show the final chip
block diagram and photograph.
The block diagram of Fig. 3 is laid over Fig. 4. To maintain
symmetry in the 40-Gb/s data and 40-GHz clock signals, the
phase detector layout contains four symmetric rows (as in the
block diagram of Fig. 2).
III. CDR DESIGN AND FABRICATION
Differential emitter-coupled logic (ECL) with 400-mV
differential voltage swings is used. To simplify the layout
process at a high bit rate, standardized digital and analog blocks
are designed and used.
All high-frequency inputs are terminated with on-chip 50-Ω
resistors. Propagation delay of the 40-GHz clock to the three
phase detector latch chains and to the divider chain is a critical issue.
Matching propagation delay is achieved by symmetric layout
Fig. 5. Buffer amplifier cascaded TAS-TIS architecture and transistor level schematic.
Fig. 10. CDR IC used as single channel 1 : 4 DEMUX (40-Gb/s data, 10-GHz
clock). Electrical output at 10 Gb/s and BER versus optical power measurement
of the optical link of Fig. 8.
optical power. The received optical power is -29.5 dBm for the
typical system required bit-error rate (BER) of 10^[...].
V. CONCLUSION
A complex (1350 transistors and 1.75-mm square)
mixed-signal CDR chip with on-chip VCO, amplifiers, decision circuit, and clock dividers was successfully fabricated
with a state-of-the-art InP HBT technology. Fully functional
chips at speed were obtained from the first iteration. Data is
retimed at 40 Gb/s and a good control signal is made available
to the on-chip VCO. The CDR IC is used as a DEMUX to
convert an optical 40-Gb/s 2^[...] PRBS RZ signal to an
electrical 10-Gb/s 2^[...] PRBS NRZ signal in an optical link
experiment. An optical sensitivity of -29.5 dBm is measured
at 10^[...] BER.
REFERENCES
[1] R. Yu, R. Pierson, P. Zampardi, K. Runga, A. Campana, D. Meeker, K. C.
Wang, A. Peterson, and J. Bowers, Packaged clock recovery integrated
circuits for 40-Gb/s optical communication links, in GaAs IC Symp.
Tech. Dig., 1996, pp. 129-132.
[2] M. Wurzer, J. Bock, H. Knapp, W. Zirwas, F. Schumann, and A. Felder,
A 40-Gb/s integrated clock and data recovery circuit in a 50-GHz f_T
silicon bipolar technology, IEEE J. Solid-State Circuits, vol. 34, pp.
1320-1324, Sept. 1999.
[3] J. Hauenschild, C. Dorschky, T. W. von Mohrenfels, and R. Seitz, A
plastic packaged 10-Gb/s BiCMOS clock and data recovering 1 : 4 demultiplexer with external VCO, IEEE J. Solid-State Circuits, vol. 31,
pp. 2056-2059, Dec. 1996.
[4] M. Reinhold, C. Dorschky, R. Pullela, E. Rose, P. Mayer, P. Paschke, Y.
Baeyens, J. P. Mattia, and F. Kunz, A fully integrated 40-Gb/s clock and
data recovery/1 : 4 DEMUX IC in SiGe technology, IEEE J. Solid-State
Circuits, vol. 36, pp. 1937-1945, Dec. 2001.
[5] J. D. H. Alexander, Clock recovery from random binary signals, Electron. Lett., vol. 11, pp. 541-542, 1975.
[6] H.-M. Rein, Design considerations for very high speed Si-bipolar ICs
operating up to 50 Gb/s, IEEE J. Solid-State Circuits, vol. 31, pp.
1076-1090, Aug. 1996.
[7] M. Sokolich, D. Doctor, Y. Brown, A. Kramer, J. Jensen, W. Stanchina,
S. Thomas, C. Fields, D. Ahmari, M. Liu, R. Martinez, and J. Duvall,
A low power 52.9-GHz static divider implemented in a manufacturable
180-GHz InAlAs/InGaAs HBT IC technology, in GaAs IC Symp. Tech.
Dig., 1998, pp. 117-120.
[8] G. Georgiou, P. Paschke, R. Kopf, R. Hamm, R. Ryan, A. Tate, J. Burm,
C. Schullien, and Y.-K. Chen, High gain limiting amplifier for 10-Gb/s
lightwave receivers, in Proc. 11th Int. Conf. InP and Related Materials,
1999, pp. 71-74.
Alan H. Gnauck (M'98–SM'00) received the B.S. degree in physics and the
M.S. degree in electrical engineering from Rutgers University, New Brunswick,
NJ, in 1975 and 1986, respectively.
In 1982, he joined AT&T (now Lucent Technologies) Bell Laboratories. He
has designed and built multigigabit amplifiers, multiplexers, demultiplexers,
and optical receivers, and performed record-breaking optical transmission
experiments at single-channel rates from 2 to 40 Gb/s. He has investigated
coherent detection, chromatic-dispersion compensation techniques, CATV
hybrid fiber-coax architectures, wavelength-division-multiplexed (WDM)
systems, and system impacts of fiber nonlinearities. His WDM transmission
experiments include the first demonstration of terabit transmission. He is a
Technical Committee Member of the Optical Fiber Communications Conference (OFC) 2003. He holds twelve patents in optical fiber communications.
His current research interests include the study of WDM systems with
single-channel rates of 40 Gb/s.
Dr. Gnauck is an Associate Editor for IEEE PHOTONICS TECHNOLOGY
LETTERS.
Rajasekhar Pullela received the B.Tech. degree in electrical and communications engineering from the Indian Institute of Technology, Madras, India, in
1993. From 1993 to 1998, he was a graduate student researcher at the University
of California, Santa Barbara. During this period, he received M.S. and Ph.D. degrees in electrical engineering, studying device physics and high-speed circuit
design.
From 1998 to 2000, he was a Member of Technical Staff with Lucent Technologies, Bell Laboratories, Murray Hill, NJ, designing high-speed ICs for fiber-optic communication systems. Since 2000, he has been with Gtran, Inc., Newbury Park, CA.
I. INTRODUCTION
DIGITAL signal processing is becoming economical in consumer applications. The main requirement there is low
cost in mass production. Digital processing and transmission
has to be carried out with low power and in cheap IC packages.
Data transmission between different digital signal processing ICs influences significantly the power consumption and
the system cost. For video signal transmission in 100 Hz TV
sets, typically 16 data lines in parallel are driven with 27 MHz
rail-to-rail nonreturn-to-zero (NRZ) data signals. Sharp data
transitions are in use to ensure reliable synchronous operation.
A power-saving alternative could be found in low-swing high-speed serial data transmission in the range of 500 Mb/s or
more. However, this kind of high-speed data transmission
has to be asynchronous. The most economic solution avoids
separate transmission of the clock. In that case, clock recovery
from the NRZ data stream is required. In this paper we describe
a phase-locked loop (PLL) which is designed to process more
than 1 Gb/s data in a 0.5-μm CMOS technology.
II. ARCHITECTURE
The PLL generally consists of three building blocks (Fig. 1):
1) phase comparator, detecting the phase difference between the data and the recovered clock;
2) loop filter, filtering the phase detector output and forming the control signal for the oscillator;
3) voltage controlled oscillator (VCO).
Manuscript received December 15, 1996; revised February 6, 1997. This
work was supported in part by the German Ministry for Education and
Research under Contract 01M2880A.
M. Rau was with the University of Ulm, Germany. He is now with
Siemens AG, 81359 Munich, Germany.
T. Oberst was with the University of Ulm, Germany. He is now with
DASA, D-89077 Ulm, Germany.
R. Lares and A. Rothermel are with the Microelectronics Department,
University of Ulm, D-89081 Ulm, Germany.
R. Schweer is with Thomson Multimedia, D-78048 Villingen-Schwenningen, Germany.
N. Menoux is with Thomson, 38240 Meylan, France.
Publisher Item Identifier S 0018-9200(97)04386-2.
The unusual feature in our design is the phase detector, which
uses a delay-locked loop (DLL) to generate multiple sampling
clocks. Thus, the VCO can run at only half the data rate,
which means that we can detect a 1-Gb/s serial data stream
with a 500-MHz VCO. This relieves the timing constraints
in the phase detector logic and results in well correlated and
data independent control signals. Also, at the lower frequency
the VCO tuning range is large enough to compensate all
technology parameter variations. With this architecture we
could achieve higher data rates.
The block diagram of the circuit is shown in Fig. 2. No
external components are required for the PLL. The loop filter
capacitor is integrated on chip together with the VCO, the
phase comparator, and a charge pump. The data stream is
retimed in two flip-flops with the inverted and noninverted
clock. Two flip-flops are required because the clock has only
half the data rate. These two half-speed data streams are
combined in a multiplexer, forming an output stream at the
original data rate. A lock-in circuit is realized on chip, because
the phase comparator is not frequency sensitive.
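The half-rate retiming described above can be illustrated with a small behavioral sketch. It is not the circuit itself: the function names are hypothetical, bits are modeled as list elements, and the assignment of even-indexed bits to the noninverted clock is an assumption made only for illustration.

```python
def half_rate_retime(bits):
    """Split a full-rate bit stream into two half-speed streams.

    Assumption for illustration: even-indexed bits are captured by the
    flip-flop on the noninverted clock, odd-indexed bits by the flip-flop
    on the inverted clock.
    """
    noninverted = bits[0::2]
    inverted = bits[1::2]
    return noninverted, inverted


def multiplex(noninverted, inverted):
    """Interleave the two half-speed streams back to the original rate."""
    out = []
    for i, bit in enumerate(noninverted):
        out.append(bit)
        if i < len(inverted):
            out.append(inverted[i])
    return out


data = [1, 0, 0, 1, 1, 1, 0, 1, 0]
stream_a, stream_b = half_rate_retime(data)
restored = multiplex(stream_a, stream_b)
```

Each flip-flop only has to handle every second bit, which is why a clock at half the data rate is sufficient; the multiplexer restores the original bit order.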
III. PHASE COMPARATOR
The PLL adjusts the clock to an incoming data stream.
Because of the random nature of data there is not necessarily
a data transition at every clock cycle. The loop has to handle
sequences of consecutive zeroes or ones in the data stream.
The following phase comparator output signal properties are
essential.
First, the phase comparator must not give any output signal
if there is no data edge. Second, the duration of the control
signal pulses at the data transitions is important, especially if
there are few of them. In general, for a good loop performance
the control signal should be proportional to the phase error.
However, for very high operating frequencies, analog signals
depend on the data pattern and become highly nonlinear,
because they do not settle during the bit duration. It was
found by simulation that different phase detectors with analog
outputs [1], [2] limit the PLL operating frequency. On the other
hand, clock recovery schemes based on sampling techniques
[3], [4] result in uniform digital control pulses. They are
Fig. 4. Operation of the phase detector: (a) data at sampling time B equals the data at the preceding sampling time A → data transition is late → frequency up, (b) data at sampling time B equals the data at the following sampling time A → data transition is early → frequency down, (c) data at sampling time A equals the data at the preceding sampling time A → no data edge, no control signal output.
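The three cases in the Fig. 4 caption amount to a small decision rule on three consecutive samples. The following is a minimal behavioral sketch of that rule, assuming ideal binary samples; the function names and the +1/0/-1 encoding of "frequency up", "no output" and "frequency down" are illustrative choices, not taken from the paper.

```python
def phase_detector(a_prev, b, a_next):
    """Return +1 (frequency up), -1 (frequency down) or 0 (no output).

    a_prev and a_next are the samples at two consecutive sampling times A,
    b is the sample at the sampling time B between them.
    """
    if a_prev == a_next:
        return 0      # case (c): no data edge, no control signal
    if b == a_prev:
        return +1     # case (a): transition is late, frequency up
    return -1         # case (b): b equals a_next, transition is early


def run_detector(samples):
    """Apply the rule to a sample list ordered A, B, A, B, ..., A."""
    return [
        phase_detector(samples[i], samples[i + 1], samples[i + 2])
        for i in range(0, len(samples) - 2, 2)
    ]


decisions = run_detector([0, 0, 1, 1, 1, 0, 0])
```

Because the output is zero whenever the two A samples agree, long runs of identical bits produce no control pulses, which is the first property required of the phase comparator.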
Fig. 5. DLL to generate all 90° phase-shifted sampling clocks with high
accuracy.
AND
LOOP FILTER
V. VCO
Both high oscillation frequency and a wide tuning range
are required. We choose a ring oscillator design with variable
load capacitors (Fig. 7) based on [5]. Duty cycle is not an
issue here, because the flip-flops all are triggered with the
same edge; the DLL generates the required phase shifts. This
circuit can safely cope with all parameter variations. Fig. 8
shows the VCO tuning characteristic.
Fig. 11. VCO clock output and data output eye pattern at 1 Gb/s with a (2^15 − 1)-bit length pseudorandom input sequence.
Fig. 10. Maximum data rate versus pseudorandom sequence length for error-free receiving during time of measurement (complies with error rate smaller than 1 × 10^−11).
I. INTRODUCTION
IN A CLOCK and data recovery (CDR) circuit with integrated phase-locked loop (PLL), the reference clock is
extracted from the incoming data stream and is automatically
aligned to the center of the data pulse independent of its pattern.
Two CDR ICs have been reported operating at 10 Gb/s: one
with a linear PLL (LPLL) using a modified Hogge-type phase
detector (PD) [1] and the other with a binary PLL using a
bang-bang (Alexander)-type PD [2]. CDR jitter characteristics
critical to SONET optical receiver design are Jitter Transfer
(Bandwidth and Jitter Peaking), Jitter Tolerance, and Jitter
Generation. In a linear CDR (LCDR), jitter transfer characteristics are independent of the jitter amplitude and can be
analytically predicted according to LPLL theory. This feature
can be important for SONET applications, particularly if data
is to be retransmitted and jitter transfer must be well controlled.
This paper describes a 10 Gb/s LCDR with less than 1 ps
rms pattern-independent jitter generation required for SONET
application [3]. This jitter generation represents the best published CDR result. A number of techniques have been used to
achieve low jitter. First, a charge pump with tri-state is employed. This is a well known technique frequently used in conjunction with frequency-PD PLLs [4]. This technique is also
called switched-filter PLL [5]. Generally, this approach provides a hold mode in the PLL filter in the absence of data transitions and prevents variation of the voltage-controlled oscillator
Fig. 3. […] functions of […]: […], […].
The CDR circuit was designed for […] with […] 4 MHz. Fig. 3 shows the analytically calculated […] compared to the OC-192 SONET mask. The bandwidth does not satisfy the SONET requirement (to be below 120 kHz), but the jitter peaking is within the recommended 0.1-dB jitter gain.
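To make the link between loop damping and jitter peaking concrete, the sketch below evaluates the textbook closed-loop response of a second-order PLL with one zero, H(s) = (2ζω_n s + ω_n²)/(s² + 2ζω_n s + ω_n²). This form is an assumption standing in for the paper's own expressions, which did not survive in this copy; the names `fn` (natural frequency in Hz) and `zeta` (damping factor), and the example values, are hypothetical.

```python
import math


def jitter_transfer(f, fn, zeta):
    """Complex jitter transfer H(j*2*pi*f) of the assumed second-order loop."""
    s = 1j * 2 * math.pi * f
    wn = 2 * math.pi * fn
    return (2 * zeta * wn * s + wn ** 2) / (s ** 2 + 2 * zeta * wn * s + wn ** 2)


def peaking_db(fn, zeta, points=2000):
    """Largest gain in dB on a logarithmic sweep from fn/1000 to fn*1000."""
    peak = 0.0
    for k in range(points + 1):
        f = fn * 10 ** (-3 + 6 * k / points)
        peak = max(peak, abs(jitter_transfer(f, fn, zeta)))
    return 20 * math.log10(peak)


heavily_damped = peaking_db(fn=1e5, zeta=5.0)
lightly_damped = peaking_db(fn=1e5, zeta=0.5)
```

In this model the peaking never vanishes completely, but it falls below the 0.1-dB figure quoted above once the loop is heavily overdamped, while a lightly damped loop peaks by several decibels.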
The CDR jitter tolerance is a measure of how much
peak-to-peak sinusoidal jitter can be added to the incoming
data before causing data errors due to misalignment of the
data and the recovered clock. In a CDR with LPLL, jitter
tolerance is defined by a jitter transfer function and the PLL
slew-rate capabilities [7]. Considering no slew rate limitations
in the loop (this is usually the case in a well designed CDR),
the frequency response of the jitter tolerance can be described
by the following function:
[…] (4)
Fig. 2. […]
[…] of […] and […] are 1 V differential. The IC can also be used in a data recovery mode only, when the VCO clock […] is overdriven with an external signal […].
[Equations (2), (3), (5), and (6) are not recoverable from this copy.]
GRESHISHCHEV AND SCHVAN: CLOCK AND DATA RECOVERY IC FOR SONET APPLICATION
Fig. 5. […]
Fig. 6. PD simulated response.
[…] [MHz] (7)
[…] [MHz] (8)
Fig. 7. […]
Fig. 8. […]
D. PLL Filter
The PLL filter is split into internal […] and […], and external […] and […] components (see Fig. 1). Resistors R make jitter performance less sensitive to the external parasitics and coupled noise. Capacitor […] (along with […] in Fig. 8) performs smoothing of the PD output pulses and introduces the required attenuation […] used in (8). In the PLL, resistor […] is limited by the maximum voltage drop required for normal circuit operation. Pulse smoothing also relaxes this constraint.
E. Cross-Talk Isolation
Two differential output buffers ([…] in Fig. 1) provide adjustable differential voltage swing up to 1 V. The buffers are physically separated from the VCO and PD with transmission line interfaces to prevent jitter generation due to cross-talk via substrate and common grounds. The VCO is also separated from the PD with a similar transmission line interface. All of the blocks have separate power-supply systems routed according to the isolation and analog-digital ground splitting techniques described in [6]. All of the CDR circuits are fully differential. The 10 Gb/s inputs and outputs are terminated on-chip with 50-Ω resistors.
IV. SIMULATION
Five levels of hierarchical PLL analysis were carried out:
analytical, behavioral linear, behavioral mixed-mode, circuit
schematic level, and post layout with distributed parasitics. The
last four levels are HSPICE-based. A mixed-mode behavioral
library of linear and digital components was developed. All
levels of simulation give consistent results, with increasing
Fig. 11. […]
Fig. 12. CDR 9.529-Gb/s eye diagrams and the recovered clock. Input data 30 mV, 2^[…] − 1 PRBS pattern.
The IC performed as simulated, except for the VCO oscillation frequency which was 5% lower than simulated. Measurements were done on-wafer using membrane probes from Cascade Microtech. The PLL was locked by an external sweeping of
the VCO frequency using the 10-GHz adjust input (see Fig. 1).
The locking range is 25 MHz and the PLL stays locked within a
200-MHz frequency range. Fig. 12 shows recovered clock and
CDR eye diagrams at 9.529-Gb/s data rate and 30 mV, 2^[…] − 1
PRBS input signal. Measured sensitivity was 14 mV at a BER of
10^[…]. This value is close to the simulated 13.4 mV (Section III), which indicates that a sufficient level of cross-talk isolation is achieved. The jitter tolerance was measured for jitter
amplitudes below 40 ps. This jitter was generated by modulating the clock slicing level […] (see Fig. 1) with an external signal from dc to 100-MHz frequency range. No data errors were detected associated with this jitter. This demonstrates
jitter tolerance of more than 40 ps compared to the 15 ps
SONET mask above 4 MHz.
To verify the maximum bit rate of the IC, it was tested at
13.25 Gb/s in a data recovery mode with an external clock
(Fig. 13). Sensitivity at 12.5 Gb/s was measured to
be 15.5 mV. Data-recovery clock-delay margin of 77 ps
at 10 Gb/s was the same as the BER tester delay margin,
confirming picosecond timing resolution of the decision circuit.
The recovered clock jitter was measured, with a digital oscilloscope, to be 1.85 ps rms versus 1.68 ps rms jitter of the refer-
Fig. 13. […]
APPENDIX A
The CDR jitter transfer function is similar by definition to the PLL phase transfer function […]. For the second-order charge-pump PLL of Fig. 2, the phase transfer function is [9]
[…] (A.1)
This formula can be rewritten as
[…] (A.2)
where […] is a bandwidth of the asymptotic single-pole low-pass transfer function
[…] (A.3)
The jitter response […] approaches the low-pass response […] at […] or at […]. The asymptotic jitter tolerance shape function can be defined from (4) and (A.3) as
[…] (A.4)
APPENDIX B
The following PLL filter components are found by solving (1)–(3):
[…] (B.1)
(B.2)
ACKNOWLEDGMENT
The authors thank C. Kelly and P. Popescu for discussions,
J. E. Rogers for his contributions to layout design and simulations, M.-L. Xu for help with the output buffer layout, J. Showell
for assistance with the measurements, Dr. S. Voinigescu and D.
Marchesan for their expertise in SiGe components modeling,
and Dr. M. Copeland for advice on VCO phase noise analyses.
Special thanks to R. Hadaway for his support and to IBM corporation for fabrication.
REFERENCES
[1] T. Morikawa et al., "A SiGe single-chip 3.3 V receiver IC for 10Gb/s optical communication systems," in ISSCC Dig. Tech. Papers, Feb. 1999, pp. 380–381.
[2] R. C. Walker et al., "A 10Gb/s Si-bipolar Tx/Rx chipset for computer data transmission," in ISSCC Dig. Tech. Papers, Feb. 1998, pp. 302–303.
[3] Y. Greshishchev and P. Schvan, "SiGe clock and data recovery IC with linear-type PLL for 10 Gb/s SONET application," in Proc. 1999 Bipolar/BiCMOS Circuits and Technology Meeting, Sept. 1999, pp. 169–172.
[4] B. Razavi, "Design of monolithic phase-locked loops and clock recovery circuits - A tutorial," in Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design, B. Razavi, Ed. New York, NY: IEEE Press, 1996, pp. 405–420.
Peter Schvan (M'89) was born in Budapest, Hungary, in 1952. He received the M.S. degree in physics
from Eotvos Lorand University, Budapest, in 1975
and the Ph.D. degree in electrical engineering from
Carleton University, Ottawa, ON, Canada, in 1985.
In 1985, he joined Nortel Networks, Ottawa,
ON, Canada, where he worked in the area of
BiCMOS and bipolar technology development, yield
prediction, device characterization, and modeling.
Recently, his work has been extended to the design
of multigigabit circuits and systems. He is currently
Senior Manager of a group responsible for evaluating various high-performance technologies and demonstrating advanced circuit concepts required for
fiber-optic communication systems. He is the author or coauthor of numerous
publications.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999
I. INTRODUCTION
Fig. 1. Clock recovery PLLs. (a) Standard, (b) phase selection, and (c) feedback phase selection. ChP + F denotes charge pump + loop filter.
The smoothing effect of the loop filter can also be used for phase interpolation. If the Ctrl signal in Fig. 1(c) alternates between two different clock phases every second cycle of the reference clock, the result will be a VCO clock phase corresponding to the average of the two selected phases. In the test chip, four levels of averaging phase interpolation were implemented by circulating through four clock cycles and in each clock cycle selecting phase […] or […] as the feedback clock. A quarter phase interpolation generating […] is then achieved by selecting […] for three consecutive clock cycles, then selecting […] for the fourth cycle and repeating this sequence.
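The averaging scheme can be checked with a few lines of arithmetic. The sketch below is a simplified model that assumes the loop filter averages the selected phases perfectly over the four-cycle sequence; the function names, the 0/1 encoding of the two adjacent phases, and the example phase values are hypothetical.

```python
from fractions import Fraction


def averaging_pattern(quarter_steps, cycles=4):
    """Per-cycle selection: 0 picks the lower phase, 1 the next phase up."""
    if not 0 <= quarter_steps < cycles:
        raise ValueError("quarter_steps must be in the range 0..cycles-1")
    return [0] * (cycles - quarter_steps) + [1] * quarter_steps


def average_phase(pattern, lower_phase, upper_phase):
    """Mean phase seen by the loop, assuming ideal averaging in the filter."""
    total = sum(upper_phase if select else lower_phase for select in pattern)
    return Fraction(total, len(pattern))


# Three cycles on the lower phase, one on the upper: a quarter step.
pattern = averaging_pattern(1)
quarter = average_phase(pattern, lower_phase=0, upper_phase=4)
```

With the two phases four units apart, the repeating three-plus-one sequence settles one unit above the lower phase, which is the quarter-step interpolation described above.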
The architecture in Fig. 1(c) lends itself naturally to combining both averaging phase interpolation and standard current-mode interpolation. A test chip was built in a 0.25-μm, 2.5-V digital CMOS process to evaluate the jitter performance of the phase selection architecture. A block diagram of the implemented VCO and phase control circuitry is shown in Fig. 2. The phase select control code at the input consists of seven bits, of which two are directly fed to a finite state machine (FSM) that generates control signals for realizing the averaging interpolation. The remaining five bits of the control code represent […], from which the code for […] is generated by adding one. The FSM controls Mux1 to select one of the codes representing […] and […] in a four clock period repetitive cycle, as described above. The five bits at the output of Mux1 are split into three bits coarse select and two bits fine select. The three coarse bits select two neighboring phases from a four-stage differential VCO having eight evenly spaced output phases and send these two phases to a current-mode interpolator. Mux2/Mux3 in Fig. 2 receive one coarse bit each, and the third coarse bit is used to conditionally invert the output signals. The interpolator is similar to the Type-I circuit in [9] and is controlled by a four-bit temperature code derived from the two fine select bits. Both the current-mode interpolation and the averaging phase interpolation are programmable in the test chip and can be disabled. The two complementary multiplexers at the output
Fig. 5. Bias generation and one VCO delay stage of (a) replica bias scheme and (b) diode clamping.
Fig. 6. Example VCO waveforms to estimate lower limit on Vdd.
Good 1/f noise performance has been shown, but their power-supply noise rejection is inferior to that of the standard analog differential pair since they lack a high-impedance source, making the delay depend on […]. Therefore, the analog-style differential pair is preferable in applications where power-supply noise is the main source of oscillator jitter. When a differential pair with resistor loads is used as a delay cell in a VCO, the frequency is regulated by changing the tail current, as implemented by the […] control voltage in Fig. 5(a). To achieve a large frequency tuning range, it is desirable that the output swing and common mode do not change significantly with frequency. Often the replica-bias scheme in Fig. 5(a) is employed, which relies on good matching between a replica of the delay stage (devices Mr1–Mr3) and the VCO delay stages to set the VCO output swing from […] to […], giving a known common mode and swing independent of the speed-regulating current. A disadvantage of this technique is that the PMOS load (Mr3) will operate as a current source at low frequencies, introducing high gain in the replica feedback loop. To prevent instability, a large compensation capacitor is required, which introduces another pole in the PLL, leading to more intricate design. Furthermore, the amplifier in the replica bias loop requires additional headroom, thereby prohibiting low-Vdd operation.
Fig. 5(b) shows a structure that achieves the good power-supply noise rejection of the analog differential pair, at the same time enabling low-Vdd operation. The PMOS diodes are used for clamping the output voltage to a minimum level of […], giving a fixed common mode and swing without the need for a replica bias circuit. This makes the VCO suitable for a wide range of operating frequencies and supply voltages. To guarantee clamping action, the NMOS tail current […] must be larger than the current through the controlled PMOS load […]. Furthermore, a proposed design goal for low oscillator noise suggests that the rise and fall times of the output nodes should be made equal [16]. This is achieved by reflecting half of […] to each of the controlled PMOS loads by the current mirror formed by devices Md1–Md3. Assuming that Md4 recently turned on, node a will be discharged by a current of […]. At the same time, the complementary output node is pulled to […] by a current equal to […], indicating equal rise and fall times. A disadvantage of this oscillator is the additional parasitic capacitance of the diodes, which makes the maximum operating frequency lower than that of the replica bias structure. The additional gate capacitance of the diode loads can be eliminated by using NMOS diodes [17].
The minimum supply voltage for the VCO is […], which has been verified by measurements down to […] V. However, at this value of Vdd the VCO is no longer differential. An estimate of the minimum Vdd for differential operation can be derived from the simulated VCO waveforms in Fig. 6. The VCO output swings from […] down to approximately […]. For differential operation, it is required that both NMOS devices in the differential pair (Md4, Md5) are turned ON at the crossover point of the waveforms. Assuming a […] drop over the current source device Md6 leads to a minimum input voltage of […]. At the lowest limit of Vdd, this input voltage is […] generated by the previous stage in the oscillator, indicating a minimum Vdd of […]. Measurements determined […] and […] to be 0.53 and 0.85 V, respectively, indicating a minimum Vdd of about 1.1 V assuming a […] of 0.1 V. Note that this is a theoretical number, since the differential operation of the VCO has zero tuning range at this value of Vdd. Good power-supply rejection can also be achieved by the regulated-supply structure in [18]. However, the requirement of a large decoupling capacitor generates contradicting design goals on PLL bandwidth.
V. CHARGE PUMP
A. Bandwidth and Peaking Compensation
To reduce peak-to-peak jitter due to VCO noise, it is
advantageous to keep as high a PLL bandwidth as possible.
Traditional worst case design would keep the PLL bandwidth
and damping factor sufficiently far away from stability limits
under all variations of the input reference frequency, the
manufacturing process, and the division ratio in the feedback path. The concept of self-biasing introduced in [19] simplifies the design by eliminating process variations and the input reference frequency from the stability constraints. However, the PLL bandwidth is still a function of the division ratio, so that maximum noise suppression can only be achieved for a fixed division ratio. In programmable applications, the division ratio can vary by more than an order of magnitude, indicating that the variation in stability constraint can be dominated by the division ratio instead of process variations, as shown by the stability limit of a charge-pump PLL [20]
[…] (1)
where […] is the input reference frequency (or effectively the sampling rate of the phase detector), […] is the VCO gain, […] is the charge-pump current, and […] is the loop filter resistance. Other PLL design parameters, such as bandwidth and damping factor, also change with the division ratio. Compensating loop parameters for changes in the division ratio guarantees that the PLL is always operating with maximum bandwidth and fixed damping factor without endangering stability. This can be done by setting the charge pump current to
[…] (2)
where […] is a fixed reference current. This is realized by the current multiplier in Fig. 7, which generates the charge-pump current by letting the individual bits of the division ratio control binary weighted current sources.
Fig. 7. Current multiplier generating charge-pump biasing voltages Vqbn and Vqpb.
Fig. 8. Jitter transfer functions for different division ratios. (a) Simulated standard PLL. (b) Measured characteristics of Loop A with intentionally low damping.
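The effect of the current multiplier can be sketched numerically. The model below assumes that the charge-pump current is the division ratio times the reference current (the natural reading of binary weighted sources controlled by the bits of the division ratio), and it uses the textbook second-order charge-pump PLL expressions ω_n = sqrt(I_cp·K_vco/(2π·N·C)) and ζ = (R·C/2)·ω_n. These formulas, all symbol names, the 8-bit width and the component values are assumptions for illustration, not values from the paper.

```python
import math


def charge_pump_current(n, i_ref, bits=8):
    """Sum the binary weighted sources enabled by the bits of n."""
    if not 0 < n < 2 ** bits:
        raise ValueError("division ratio does not fit the assumed bit width")
    return sum(i_ref * (1 << k) for k in range(bits) if (n >> k) & 1)


def loop_parameters(n, i_cp, k_vco, r, c):
    """Natural frequency (rad/s) and damping factor of the assumed loop."""
    wn = math.sqrt(i_cp * k_vco / (2 * math.pi * n * c))
    zeta = 0.5 * r * c * wn
    return wn, zeta


I_REF = 1e-6                     # hypothetical reference current, A
K_VCO = 2 * math.pi * 100e6      # hypothetical VCO gain, rad/s/V
R, C = 10e3, 100e-12             # hypothetical loop filter values

compensated = [
    loop_parameters(n, charge_pump_current(n, I_REF), K_VCO, R, C)
    for n in (4, 16, 64)
]
uncompensated = [loop_parameters(n, I_REF, K_VCO, R, C) for n in (4, 16, 64)]
```

In this model the ratio of charge-pump current to division ratio stays constant, so the compensated natural frequency and damping do not move with the division ratio, whereas with a fixed current both shrink as the division ratio grows.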
The simulated jitter transfer function of a standard PLL in Fig. 8(a) demonstrates the change of loop parameters as the division ratio is altered. The damping factor is intentionally set low to show its dependence on the division ratio. The measured jitter transfer function of Loop A in Fig. 8(b) shows the desired independence of the division ratio. The slight deviation of the curves is caused by transistor mismatch in the current multiplier.
B. Charge Sharing
A common problem of many charge pumps is charge sharing. For the charge pump in Fig. 9(a) (Type A), charge sharing is caused by the parasitic capacitance in nodes pcs and ncs [21]. When […] is active, node pcs is charged to […]. When deactivating […], some of the charge stored in node pcs will leak through the current source device. Since the parasitics of nodes ncs and pcs can never be matched, this will lead to a static phase offset, as shown in Fig. 10(a). This is the transfer function of a phase-frequency detector followed by a Type A charge pump. The two transistors Mp and Mn in the Type B charge pump in Fig. 9(b) will remove the charge from the nodes pcs and ncs when Up and Down are deactivated [22]. This leads to a large reduction in the phase offset, as shown in Fig. 10(a).
Fig. 9. (a) Charge pump suffering from charge sharing (Type A). (b) Charge removal transistors eliminate charge sharing (Type B).
For this application, static phase offset in Loop A is not critical. However, when analyzing the cause of phase offset, a source of increased jitter is revealed. Fig. 10(b) indicates that the leakage from node pcs is larger than that from ncs. When the PLL is locked, the leakage mismatch is compensated for by activating […] earlier than […], giving a phase offset. Since the compensation charge is applied in the early portion of the charge-pump activation time, it will cause voltage ripple on the loop filter, leading to phase jitter at the VCO output. The charge removal transistors Mn and Mp in Fig. 9(b) eliminate the current tails, resulting in a well-balanced Up and Down activation time such that IUp and IDown cancel each other, reducing the loop filter ripple. A further advantage of the Type B charge pump is reduced 1/f noise in the current source transistors, achieved by periodically resetting their […] to below 0 V [23], [24].
Fig. 10. Characteristics of the Type A and B charge pumps. (a) Transfer function of PFD followed by charge pump. (b) Simulated IUp and IDown when net output charge is zero.
A limitation of the Type B charge pump is a reduced dynamic range of the VCO control voltage ([…]). If […] is less than […], there will be a current flowing through the Mp device to the output when the Dwn control is inactive. When NMOS devices are used for speed-regulating the VCO, […] will never drop below […], constraining […] to be less than […], which can easily be fulfilled. However, the charge pump works only up to an output voltage of […], limiting the upper tuning range of the VCO. However, the charge pump in Fig. 9(a) has the same upper voltage limit. Mismatch in […] and […] is a similar source of jitter as charge sharing described above. For low jitter, it is essential to have good matching, implying that the devices controlled by […] should be saturated. Again, this requires […].
Charge removal can also be done by ac coupling [18], but this requires careful timing of the control signals in the charge pump. The solution to charge sharing in [21] is less suitable for low-Vdd applications due to the common-mode restrictions on the differential amplifier.
Fig. 11. Evolution of loop filter. (a) Ideal model, (b) MOS-only implementation, (c) improved resistor linearity for low-Vdd operation, (d) improved capacitor linearity, and (e) final model where C3 models the well-to-substrate capacitance.
Fig. 12. […]
Phase detectors may exhibit a dead zone, resulting in enlarged jitter. A common design technique to avoid a dead zone
is to make sure that both Up and Down output signals are fully
activated before shutting them both off. This is implemented
by generating a reset signal with an AND operation of Up
and Down output and introducing a delay before feeding back
this signal to reset the phase detector. It is this reset delay
that causes the simultaneous IUp and IDown in Fig. 10(b).
If the charge sharing in the charge pump is not perfectly
cancelled or if there is a mismatch of […] and […], there will
always be some current compensation, leading to phase offset
and loop filter ripple, as discussed in Section V. A longer
reset delay results in a longer period during which the VCO
is running at a different frequency due to the compensation
current. Therefore, the reset delay should be minimized under
the constraint that it has to be longer than the response time
of the PFD with some additional design margin to avoid a
dead zone.
A PFD with low logic depth is shown in Fig. 13, including details of the Up section. Its operation is easiest to analyze by assuming an initial state of […]. This implies that […] and that […] is precharged high. A rising edge on […] discharges […] and sets […] without changing the state of the RS flip-flop. The internal weak feedback in the […] path will assure that […] is kept active even if […] falls. At the next rising edge on V, […] is activated, which sets […] high, which shuts off […]. This triggers the RS flip-flop to precharge […]; and, at the same time, […] is deactivated in a similar way. In summary, a positive edge on […] sets […], which is reset by the next positive edge on […]. This behavior is identical to the two classical PFDs implemented by either four RS flip-flops or two resettable D flip-flops. The precharged gate and the shorter logic depth of this implementation make the delay shorter than for the standard PFDs. This allows a smaller delay in the reset path for eliminating the dead zone, such that loop filter ripple will be reduced and generate less noise. An additional benefit of low logic depth is a reduction in phase detector jitter caused by power-supply-dependent delays and device noise.
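The summarized behavior matches the classical PFD built from two resettable flip-flops, which can be modeled as a small state machine. The sketch below is a behavioral illustration only: the edge label "R" for the reference input is an assumed name (the source names only the V input), the function name is hypothetical, and the reset delay is idealized to zero, so the brief interval in which both outputs are active is not modeled.

```python
def pfd(edges):
    """Return the (up, down) state after each rising edge.

    edges is a time-ordered sequence of "R" (reference) and "V" (VCO
    feedback) rising edges.
    """
    up = down = False
    states = []
    for edge in edges:
        if edge == "R":
            up = True
        elif edge == "V":
            down = True
        else:
            raise ValueError("edge must be 'R' or 'V'")
        if up and down:
            # AND of Up and Down resets both outputs (zero-delay reset).
            up = down = False
        states.append((up, down))
    return states


locked = pfd(["R", "V", "R", "V"])
vco_fast = pfd(["V", "V", "R", "V"])
```

When the V edges arrive more often than the R edges, Down stays active most of the time, which is the frequency-sensitive behavior that distinguishes a PFD from a pure phase detector.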
The reset delay of this PFD can be further reduced by letting the signal […] directly reset the precharged gate simultaneously as the RS flip-flop is reset. This technique was not adopted in order to keep a conservative design, guaranteeing operation with no dead zone. Similar precharged gates have previously been used in PFD designs [27]–[29].
VIII. FREQUENCY DIVIDER
To enable high flexibility, the frequency divider in Fig. 2 is
a fully programmable divider. The structure in
[30] based on a clock-gated dual-modulus prescaler followed
by a counter was chosen to achieve high speed at low supply
voltages. The divider was realized in standard static CMOS
logic, reaching a maximum operating frequency of 800 MHz in
simulations of worst case slow process variation at
V and
C This exceeded the simulated speed limit of
the VCO. The potential startup deadlock in [30] was eliminated
by logic that prohibits two consecutive clock pulse removals.
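The bookkeeping of a programmable divider built from a dual-modulus prescaler and a counter can be sketched as follows. This is a generic textbook pulse-swallow model, not necessarily the clock-gated structure of [30]; the names P, M, and A (prescaler modulus, program count, and swallow count) are assumptions introduced only for illustration.

```python
def division_ratio(P, M, A):
    """Overall ratio of a P/(P+1) pulse-swallow divider (assumed generic model)."""
    return P * M + A


def pulse_swallow_divider(n_input_cycles, P, M, A):
    """Count output pulses produced from n_input_cycles input clock cycles.

    The prescaler divides by P+1 for the first A prescaler cycles of every
    output period and by P for the remaining M - A cycles, so one output
    pulse appears every P*M + A input cycles.
    """
    if not 0 <= A <= M:
        raise ValueError("swallow count A must satisfy 0 <= A <= M")
    outputs = 0
    prescaler_count = 0  # input cycles seen in the current prescaler cycle
    program_count = 0    # prescaler cycles seen in the current output period
    for _ in range(n_input_cycles):
        modulus = P + 1 if program_count < A else P
        prescaler_count += 1
        if prescaler_count == modulus:
            prescaler_count = 0
            program_count += 1
            if program_count == M:
                program_count = 0
                outputs += 1
    return outputs
```

Changing A by one changes the overall ratio by one, which is what makes such a divider fully programmable with a fast, fixed prescaler at the front.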
IX. PLL OPERATING RANGE AND JITTER
Fig. 14.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 34, NO. 12, DECEMBER 1999
(a)
(5)
where the quantities in (5) are the PLL bandwidth, the measured long-term
self-referenced tracking jitter, and the tracking
jitter with respect to an ideal reference clock. Table I lists
the measured and derived values as a function of operating frequency.
The jitter reported here is lower than that in [32] due to an
improved measurement setup and more accurate measurement
equipment. The of this oscillator is better than that reported
and a
for bipolar implementations in [31]
of
(as derived from the
CMOS VCO with a
for a complete
Slide Supplement of [33]). A
reported for a stand-alone
PLL is similar to
VCO in [34], suggesting that noise contributions from the
other PLL components (charge pump, PFD, frequency divider)
are much smaller than the VCO noise. The tracking jitter
compares favorably with several other CMOS oscillators, e.g.,
[8], [33], and [35]–[37]. Performance degrades significantly at 1600-MHz
operation, indicating the speed limit of the PLL. The
measurements are also used for estimating the VCO period
due to device noise, based on the relation [31]
jitter
(6)
(b)
Fig. 15. A 1.2-GHz tracking jitter histogram. (a) Including power supply
and board noise. (b) Supply and board noise cancelled.
where
is the VCO period. The derived period jitter
agrees to within 10% of the phase noise estimation technique
in [38].
The PLL tracking jitter is plotted as a function of frequency
for various power-supply voltages in Fig. 16. These measurements
were taken with a fixed PLL bandwidth.
Since the loop filter resistance changes, the charge-pump
reference current in Fig. 7 was adjusted until a 3-dB
PLL bandwidth of 2 MHz was measured. A bandwidth of 2
TABLE I
PLL JITTER FOR VDD = 2.5 V MEASURED WITH A PLL BANDWIDTH OF 2 MHz
TABLE II
PLL CHARACTERISTICS
REFERENCES
[1] K. Bult, "Analog broadband communication circuits in pure digital deep sub-micron CMOS," in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 76–77.
[2] H. Neuteboom, B. M. J. Kup, and M. Janssens, "A DSP-based hearing instrument IC," IEEE J. Solid-State Circuits, vol. 32, pp. 1790–1806, Nov. 1997.
[3] W. Lee, P. E. Landman, B. Barton, S. Abiko, H. Takahashi, H. Mizuno, S. Muramatsu, K. Tashiro, M. Fusumada, L. Pham, F. Boutaud, E. Ego, G. Gallo, H. Tran, C. Lemonds, A. Shih, M. Nandakumar, R. H. Eklund, and I. C. Chen, "A 1-V programmable DSP for wireless communication," IEEE J. Solid-State Circuits, vol. 32, pp. 1766–1776, Nov. 1997.
[4] F. M. Gardner, Phase-Lock Techniques. New York: Wiley, 1979.
[5] R. J. Baumert, P. C. Metz, M. E. Pedersen, R. L. Pritchett, and J. A. Young, "A monolithic 50–200 MHz CMOS clock recovery and retiming circuit," in Proc. IEEE Custom Integrated Circuits Conf., 1989, pp. 14.5.1–4.
[6] K. M. Ware and C. G. Sodini, "A 200-MHz CMOS phase-locked loop with dual phase detectors," IEEE J. Solid-State Circuits, vol. 24, pp. 1560–1568, Dec. 1989.
[7] J. Sonntag and R. Leonowich, "A monolithic CMOS 10 MHz DPLL for burst-mode data retiming," in Proc. IEEE Int. Solid-State Circuits Conf., 1990, pp. 194–195.
[8] M. Horowitz, A. Chan, J. Cobrunson, J. Gasbarro, T. Lee, W. Leung, W. Richardson, T. Thrush, and Y. Fujii, "PLL design for a 500 MB/s interface," in Proc. IEEE Int. Solid-State Circuits Conf., 1993, pp. 160–161.
[9] S. Sidiropoulos and M. A. Horowitz, "A semidigital dual delay-locked loop," IEEE J. Solid-State Circuits, vol. 32, pp. 1683–1692, Nov. 1997.
[10] S. Kasturia, "A novel fractional divider for improving the switching speed of phase-locked frequency synthesizers," Bell Labs Tech. Memo., May 1995.
[11] J. G. Maneatis, personal communication, Feb. 1999.
[12] T. H. Lee, K. S. Donnelly, J. T. C. Ho, J. Zerbe, M. G. Johnson, and T. Ishikawa, "A 2.5 V CMOS delay-locked loop for an 18 Mbit, 500 megabytes/s DRAM," IEEE J. Solid-State Circuits, vol. 29, pp. 1491–1496, Dec. 1994.
[13] T. H. Hu and P. R. Gray, "A monolithic 480 Mb/s parallel AGC/decision/clock-recovery circuit in 1.2 µm CMOS," IEEE J. Solid-State Circuits, vol. 28, pp. 1314–1320, 1993.
[14] C.-H. Park and B. Kim, "A low-noise 900 MHz VCO in 0.6 µm CMOS," IEEE J. Solid-State Circuits, vol. 34, pp. 1586–1591, May 1999.
[15] J. Lee and B. Kim, "A 250 MHz low jitter adaptive bandwidth PLL," in Proc. IEEE Int. Solid-State Circuits Conf., 1999, pp. 346–347.
[16] A. Hajimiri and T. H. Lee, "A general theory of phase noise in electrical oscillators," IEEE J. Solid-State Circuits, vol. 33, pp. 179–195, Feb. 1998.
[17] K. Iravani, F. Saleh, D. Lee, P. Fung, P. Ta, and G. Miller, "Clock and data recovery for 1.25 Gb/s Ethernet transceiver in 0.35 µm CMOS," in Proc. Custom Integrated Circuits Conf., 1999, pp. 261–264.
[18] V. von Kaenel, D. Aebisher, C. Piguet, and E. Dijkstra, "A 320 MHz, 1.5 mW at 1.35 V CMOS PLL for microprocessor clock generation," in Proc. IEEE Int. Solid-State Circuits Conf., 1996, pp. 132–133.
[19] J. G. Maneatis, "Low-jitter process-independent DLL and PLL based on self-biased techniques," IEEE J. Solid-State Circuits, vol. 31, pp. 1723–1732, Nov. 1996.
[20] F. M. Gardner, "Charge-pump phase-lock loops," IEEE Trans. Communications, vol. COM-28, pp. 1849–1858, Nov. 1980.
[21] M. Johnson and E. Hudson, "A variable delay line PLL for CPU-coprocessor synchronization," IEEE J. Solid-State Circuits, vol. SC-23, pp. 1218–1223, Oct. 1988.
[22] P. Larsson and J.-Y. Lee, "A 400 mW 50–380 MHz CMOS programmable clock recovery circuit," in Proc. IEEE ASIC Conf. Exhibit, 1995, pp. 271–274.
[23] I. Bloom and Y. Nemirovsky, "1/f noise reduction of metal-oxide-semiconductor transistors by cycling from inversion to accumulation," Appl. Phys. Lett., vol. 58, no. 15, pp. 1664–1666, Apr. 1991.
I. INTRODUCTION
noise generated in the CDR [Fig. 1(b)]. In Fig. 1(b), two cases
[noise forward and present for the voltage-controlled oscillator
(VCO)] have been shown to be equivalent in terms of VCO phase fluctuation [4]. In addition, because the phase drift of the
VCO output due to the random input data is random, Fig. 1(b)
gives a rough approximation of the noise due to the input data
pattern, with no jitter applied.
To suppress the jitter caused by additive noise, the CDR
should be designed so that the noise bandwidth of the PLL is
minimized. The output jitter (the phase deviation [in rad]) of
the CDR using the PLL technique is expressed as [4]
(1)
where the quantities in (1) are the power spectral density of the noise, the input
signal amplitude, the natural angular frequency, the
damping factor, and the loop gain. When the loop gain
becomes larger, the jitter
becomes larger, as shown in the
Appendix. This means that smaller loop gain causes narrow
noise bandwidth, thereby suppressing jitter. It should be noted
that smaller loop gain leads to a smaller cutoff frequency of
the jitter transfer function of a PLL. On the other hand, in
order to suppress the jitter caused by noise generated in the
CDR circuit, the operation of the CDR circuit needs to be
made stable. This stability can be obtained by reducing the
signal fluctuation in the CDR circuit caused by the input of
consecutive data bits, device noise, and so on. This is the so-called suppression of jitter generation, which is specified for
SDH in ITU-T Recommendation G.958. The output jitter (in
rad) in this case (jitter generation) is expressed as [4]
(2)
where the additional symbol is the power spectral density of noise. This equation is
derived assuming that the instantaneous frequency deviation of
the VCO output is caused by disturbance due to random phase
noise. In this equation, when the loop gain becomes larger,
the jitter becomes smaller, as shown in the Appendix. In
other words, the jitter increases as the loop gain decreases.
Larger loop gain can reduce the jitter caused by the noise in
(3)
This function is plotted in Fig. 3. It is a curve when the time
constants of the lag-lead filter described in the
Appendix are set to values such that the natural angular
frequency becomes nearly 2 MHz, and the damping factor
is larger than the value where the jitter gain peaking is less
than 0.1 dB. Fig. 3, in which it is stipulated that the loop gain
is lower than 3.2 10 (1/s), indicates that the curve of the
smaller loop gain meets the ITU-T (STM-16) specification.
b) Jitter generation: As described in Section II-A, as the
loop gain becomes small, it becomes more difficult to suppress
the jitter generation because of the tradeoff between reducing
jitter generation and reducing the cutoff frequency of the jitter
transfer function. To solve this problem, we introduce the SF
PLL technique. Fig. 4 shows the SF CDR configuration, which
we originally proposed as a way to maintain a precise clock
signal, thereby achieving tolerance to the input of consecutive
data bits. (Our previous work, 156-Mb/s SF CDR [5], has no
GCA in Fig. 4.) The main features of SF circuit operation
are that the PC output can be transferred to the low-pass
filter (LPF) only when data transitions occur (sample mode)
and the LPF output can be constant during consecutive data
inputs (hold mode). These features prevent the phase drift of
VCO output during the input of consecutive data bits. We
thought that the equivalent high-Q operation of the SF circuit
could be utilized to reduce the jitter generation. Fig. 5 shows
simulation results of the change in differential output voltage
of the loop filter when the input signal changes from a 1/0-repeated bitstream to consecutive data bits (in this case, 0) at
805 ns. Fig. 5 shows that the filter output of the SF CDR levels out,
while that of the CDR without the SF circuit begins to degrade
at 805 ns. This means that the operation of the SF circuit is
equal to that of the larger RC time constant of the loop filter,
and the jitter generation due to the input signal pattern would
be more suppressed than that of the CDR without an SF circuit.
In other words, the SF circuit would provide equivalent high-Q operation and achieve low-jitter operation.
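The sample and hold modes described above can be illustrated with a toy discrete-time model. This is only a sketch under stated assumptions, not the SPICE setup behind Fig. 5: the function name, the per-bit filter coefficient `alpha`, and the normalized phase-comparator level `pc_out` are hypothetical, and the plain filter is assumed to relax toward zero when no transition information arrives.

```python
def loop_filter_trace(bits, pc_out=1.0, alpha=0.05, sample_and_hold=True):
    """Return the loop-filter output after each bit (toy first-order model)."""
    v = 0.0
    prev = bits[0]
    trace = []
    for b in bits:
        if b != prev:
            # Sample mode: a data transition passes the PC output to the filter.
            v += alpha * (pc_out - v)
        elif not sample_and_hold:
            # Plain filter: with no transition the output relaxes toward zero.
            v += alpha * (0.0 - v)
        # With sample_and_hold=True and no transition the value is held.
        trace.append(v)
        prev = b
    return trace


# A 1/0-repeated pattern followed by a run of consecutive zeros.
bits = [i % 2 for i in range(200)] + [0] * 100
with_sf = loop_filter_trace(bits, sample_and_hold=True)
without_sf = loop_filter_trace(bits, sample_and_hold=False)
```

In this model the held output stays flat through the run of zeros while the plain filter output decays, which is the qualitative behavior the text attributes to the SF circuit.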
We also thought this advantage of the SF circuit could
be used to solve the tradeoff problem. Fig. 6 shows the
SPICE simulation results of the cutoff-frequency (of the jitter
transfer curve) dependence of the jitter generation of the 2.5-Gb/s CDRs both with the SF circuit and without it. Both
curves indicate that the jitter generation decreases as the
cutoff frequency increases. Furthermore, the jitter generation
of the SF CDR is 70% lower than that of the CDR without
the SF circuit. It is noteworthy that the suppression of jitter
generation by the SF circuit is marked. In addition, the jitter
characteristics of the SF CDR meet the STM-16 specifications
(rms jitter generation that is lower than 4 ps, and equivalent
jitter transfer specification in which the cutoff frequency at 3-dB jitter gain is lower than about 2.8 MHz), while the CDR
circuit without the SF circuit fails to meet the specifications.
In the SPICE simulation, the source of jitter is the instability
of the circuit operation of the CDR, with no jitter applied at
random input. In the experimental results, the device noise,
the fluctuations of supply voltage, and so on are also noise
sources. The jitter generation in experiments is therefore larger
than that in the simulation results. The simulation results do,
however, show the characteristics of jitter generation versus
jitter transfer functions.
Given our findings, we conclude that in the design of the
CDR IC using the PLL technique, an IC used in backbone
networks or WANs, the loop gain must be large enough so that
the jitter generation meets the ITU-T specs, yet small enough
so that suitable jitter transfer characteristics are obtained.
III. CIRCUIT DESIGN
A block diagram of the CDR, including a GCA circuit
between the SF and VCO, is shown in Fig. 4 [6]. This CDR
can be used in both short- and long-distance transmission
systems by adjusting the loop gain through the GCA from
outside the chip. The main features of the CDR are 1) an SF
circuit for equivalent high-Q operation [5], 2) optimum timing
adjustment between extracted clock and input data, and 3) loop
gain control for optimization.
A. VCO
The circuit configuration of the VCO is shown in Fig. 7. The
oscillation frequency is controlled by the voltage swing of
VC1 and VC2, which is determined by the feedback signal
from the GCA. The free-running frequency is determined by the currents IF1 and IF2, which can be controlled from
outside the chip. The free-running frequency can be adjusted
from 2.2 to 2.8 GHz. It covers the free-running frequency
deviation caused by fluctuations in device performance. Fig. 8
shows the tuning-voltage (feedback-voltage) dependence of
the oscillation frequency. The simulation results are in good
agreement with the experimental results. The VCO modulation
frequency sensitivity is designed to be about 1 GHz/V. The
oscillation frequency range covered by the feedback signal is 200 MHz.
The tuning-range diagram is shown in Fig. 9. The total tunable
range is sufficiently wide, from 2.0 to 3.0 GHz.
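The tuning figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the 200-MHz feedback range is applied symmetrically around each free-running setting; this reading is an assumption (it is the one that reproduces the stated 2.0 to 3.0 GHz total), and the function names are hypothetical.

```python
def total_tuning_range_mhz(free_lo_mhz, free_hi_mhz, feedback_span_mhz):
    """Widen the free-running range by the feedback span on both sides."""
    return free_lo_mhz - feedback_span_mhz, free_hi_mhz + feedback_span_mhz


def feedback_voltage_swing_v(feedback_span_mhz, sensitivity_mhz_per_v):
    """Control-voltage excursion needed to cover the feedback span."""
    return feedback_span_mhz / sensitivity_mhz_per_v


# Free-running range 2.2 to 2.8 GHz, feedback span 200 MHz, about 1 GHz/V.
total_range = total_tuning_range_mhz(2200, 2800, 200)
swing = feedback_voltage_swing_v(200, 1000)
```

Under these assumptions the feedback path needs roughly 0.2 V of control-voltage excursion on either side of the free-running point.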
B. GCA
A current-bypass GCA circuit is used in the CDR (see
Fig. 10). The gain of the GCA, which can be controlled from
outside the chip, can be varied from −40 to 0 dB. To lower
the jitter generation of the CDR, the gain should be higher. On
the other hand, to achieve the lower cutoff frequency of the
jitter transfer curve, it must be lower. Therefore, gain should
be adjusted according to the jitter specification in each case.
C. Delay Circuit
To reduce jitter generation, the edge-inclined circuit in the
90°-delay block shown in Fig. 4, which includes a capacitor
for delay control [5], is replaced by a chain of emitter-coupled
logic (ECL) buffer circuits without capacitors, the delay of
which can be adjusted from about 100 to 300 ps from outside
the chip. The edge-inclined circuit was employed to make
the 156-Mbit/s CDR smaller. But the delay needed for the
2.5-Gbit/s CDR is only 200 ps, which is much smaller than
that needed for the 156-Mbit/s CDR. Therefore, only a small
number of ECL circuits are needed for the 200-ps delay, and
809
the extracted clock and the input data. This timing adjustment
is attained by allowing the clock to trigger the center of the
data period by means of the phase-comparator (PC) output
Fig. 12. Output waveforms of the CDR for LAN. (a) Data output. (b) Clock output.
Fig. 13. Output waveforms of the CDR for WAN. (a) Data output. (b) Clock output.
Gb/s are shown in Fig. 12. The eye opening of the output data
was sufficiently wide, and clock extraction was very precise.
The rms jitter generation is 1.2 ps and the capture range is
over 150 MHz.
B. SF CDR for WAN
To lower the cutoff frequency of the jitter transfer curve, an
external 0.1-µF capacitor is added to the loop filter. The
loop gain is set to about 2 10 (1/s), which is the loop gain
when the jitter transfer curve meets the ITU-T jitter transfer
specification in Fig. 3.
The output waveforms when the loop gain is set to the
point above are shown in Fig. 13. Again, the eye opening of
the output data was sufficiently wide, and clock extraction
was very precise. The rms jitter generation is 3.6 ps, which is
larger than that of the CDR when its loop gain is adjusted for
Fig. 14.
the I/O circuit) in both cases is less than 35% of that in the
2.5-Gb/s PLLs reported previously [1], [2].
V. CONCLUSION
The design method of the CDR for both LANs and WANs
is presented. A new 2.5-Gb/s SF monolithic CDR IC using
the 0.5-µm Si bipolar process has been developed. The CDR
IC can be used in the transmission receivers for both LANs
and WANs. The rms jitter generation of the CDR adjusted
for LANs is 1.2 ps. Furthermore, the jitter characteristics
of the CDR for backbone networks or WANs meet the
specifications for STM-16 given in ITU-T Recommendation
G.958. In addition, the power consumption of the CDR is
only 0.4 W.
APPENDIX
A. Loop-Gain Dependence of the Jitter Shown in Fig. 1
The jitter due to the noise in the input signal to the PLL is
expressed as (1). When the loop filter is a lag-lead type (with
series and shunt resistors and a shunt capacitance), the natural angular frequency
and the damping factor are expressed as
(A1.1)
The corresponding term in (1) therefore becomes
(A1.2)
This also shows that the term increases. Therefore, the
jitter in Fig. 1 increases as the loop gain increases.
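The trend stated in this appendix can be checked numerically with the standard second-order results for a passive lag-lead filter. The sketch below is not a recovery of (A1.1) or (A1.2), which did not survive in this copy; it uses the textbook loop model with filter F(s) = (1 + s*tau2) / (1 + s*(tau1 + tau2)), and the names K, tau1, and tau2, as well as the numeric values, are assumptions made for illustration.

```python
import math


def lag_lead_loop_params(K, tau1, tau2):
    """Natural angular frequency and damping factor (textbook lag-lead loop)."""
    T = tau1 + tau2
    wn = math.sqrt(K / T)
    zeta = 0.5 * wn * (tau2 + 1.0 / K)
    return wn, zeta


def noise_bandwidth_hz(K, tau1, tau2):
    """One-sided noise bandwidth of the closed loop, in hertz."""
    T = tau1 + tau2
    return K * (K * tau2 ** 2 + T) / (4.0 * T * (1.0 + K * tau2))


# Hypothetical time constants; the loop gain K (1/s) is swept upward.
tau1, tau2 = 1e-6, 1e-7
bandwidths = [noise_bandwidth_hz(K, tau1, tau2) for K in (1e6, 1e7, 1e8)]
```

With the time constants held fixed, the noise bandwidth grows monotonically with the loop gain, so the jitter caused by noise on the input signal grows as well, consistent with the conclusion above.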
B. Loop-Gain Dependence of the Jitter Shown in Fig. 2
The jitter due to the noise in the PLL can be expressed as
(2). In (2), the relevant term is
(A2.1)
Haruhiko Ichino (M'89) was born in Yamaguchi, Japan, on January 26, 1957.
He received the B.S., M.S., and Ph.D. degrees in applied physics from Osaka
University, Osaka, Japan, in 1979, 1981, and 1994, respectively.
In 1981, he joined the Electrical Communication Laboratories, Nippon
Telegraph and Telephone Corp. (NTT), Tokyo, Japan. He has been engaged
in research and development of Gbit/s SSI-MSIs using bipolar transistors
(Si bipolar transistor and AlGaAs/GaAs HBT), with application to Gbit/s
optical communication systems and high-frequency satellite communication
systems. His work also includes low-power Gbit/s LSIs for SDH networks
and future ATM switching systems; and O-E, E-O converter modules. During
this research and development, he also worked on the modeling of a high-speed bipolar transistor, analyzing ECL gate delay and maximum operating
speed of GHz flip-flop, and high-speed design methodology based on gate-array and standard-cell approaches. His interests include high-speed packaging
and measurement systems. Since 1997, he has worked on research and
development of Gbit/s-interface hardware design of photonic transport network systems. Currently, he is a Senior Research Engineer, Supervisor, and
Photonic Network Systems Research Group Leader of the Photonic Network
Laboratory, NTT Network Innovation Laboratories, Kanagawa, Japan. He was
a Visiting Lecturer at Osaka University during 1995–1996.
Dr. Ichino is a member of the Institute of Electronics, Information and
Communication Engineers (IEICE) of Japan. He was a Secretary of IEICE's
Technical Group on Integrated Circuits and Devices.
IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 12, DECEMBER 2002
I. INTRODUCTION
THE VOLUME of data transported over the telecommunications network increased at a compounded annual rate of
100% from 1995 to 2001 in the U.S. (and since 1997 in Europe), mainly due to increased internet traffic [1]. Contrast this
with the historic demand for bandwidth, which grew at an annual rate of between 6% and 10% before the mid-1990s. The
call for technologies, such as interface electronics, which expand the capacity of fiber-based transport links to 10 Gb/s (and
beyond) has risen in response to this explosion in data traffic.
Systems at 10 Gb/s per channel are currently implemented in
either OC-192 or STM-64 formats using the synchronous optical network (SONET OC-192) or European synchronous digital
hierarchy (SDH STM-64), respectively.
A typical Gb/s receiver is shown in Fig. 1. It uses a photodiode to convert the incoming nonreturn-to-zero (NRZ) optical pulses from a single 10-Gb/s fiber channel to a current.
These current pulses are then converted by the transimpedance
(TZ) amplifier into a voltage. The pulses are low-pass filtered
to remove out-of-band noise, thereby improving the received
signal-to-noise ratio (SNR). An automatic gain control amplifier
(AGC) provides additional amplification while compensating
for variations in the received signal power. The clock required to
retime the incoming synchronous data stream is recovered from
Manuscript received April 8, 2002; revised June 25, 2002. This work was
supported by Micronet, NSERC, and the Nortel Institute at the University of
Toronto.
J. E. Rogers was with the RF/MMIC Group, Department of Electrical and
Computer Engineering, University of Toronto, Toronto, ON, Canada. He
is now with Inphi Corporation, Westlake Village, CA 91361 USA (e-mail:
jrogers@inphi-corp.com).
J. R. Long was with the RF/MMIC Group, Department of Electrical and Computer Engineering University of Toronto, Toronto, ON, Canada. He is now with
the Electronics Research Laboratory, Delft University of Technology, 2628CD
Delft, The Netherlands (e-mail: j.r.long@its.tudelft.nl).
Digital Object Identifier 10.1109/JSSC.2002.804337
variations as well as noise. In addition, an early/late PD possesses intrinsic matching between the sampling and retiming
phases, allowing operation at speeds where it is difficult to
match the delay of a conventional analog phase detector to that
of the retiming latch. It is this combination of robustness and
speed that makes the bang-bang PD an excellent choice for an
IC-based implementation.
Fig. 2 shows a block diagram of the CDR-PLL implemented
in this work. Data is compared to the voltage-controlled oscillator (VCO) clock at the loop input. If the falling clock edge
occurs before the data transition (early), the early/late phase detector
outputs a voltage of one polarity. If the clock edge falls after the
data transition (late), then the phase detector outputs a voltage
of the opposite polarity. In the case where there is no data transition during the clock
period, the phase detector outputs a 0 voltage. The PD output
pulses are then filtered by integral and differential loop filter
branches. An important parameter of the early/late phase detector based CDR is the bang-bang frequency step of the VCO,
$f_{bb}$, or
$$f_{bb} = V_{\phi}\,\beta\,K_{VCO} \qquad (1)$$
where $V_{\phi}$ is the output voltage of the phase detector,
$\beta$ is the gain of the proportional branch of the loop, and
$K_{VCO}$ is the tuning gain of the VCO in Hz/V.
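The ternary phase-detector decision and the resulting frequency step can be sketched in a few lines. This is an illustrative model only: the function names are hypothetical, the sign convention (early gives a positive output) is an assumption, and the numeric values are chosen merely so that the product of detector voltage, proportional gain, and VCO gain comes out near the 3-MHz step discussed in this paper.

```python
import math


def early_late_pd(clock_edge_t, data_transition_t, v_pd=1.0):
    """Ternary early/late decision: +v_pd, -v_pd, or 0 when no transition."""
    if data_transition_t is None:
        return 0.0  # no data transition during this clock period
    # Assumed convention: clock edge before the data transition counts as early.
    return v_pd if clock_edge_t < data_transition_t else -v_pd


def bang_bang_step_hz(v_pd, beta, k_vco_hz_per_v):
    """Frequency step: detector voltage x proportional gain x VCO gain."""
    return v_pd * beta * k_vco_hz_per_v


# Hypothetical values: 0.3 V detector output, gain 0.1, 100 MHz/V VCO gain.
step = bang_bang_step_hz(0.3, 0.1, 100e6)
```

Because the detector output carries only the sign of the phase error, the VCO frequency can only move by plus or minus this step (or not at all) in each update, which is why the step size sets jitter tolerance, jitter transfer bandwidth, and jitter generation together.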
A. System-Level Design
Characterization of early/late (bang-bang) PLL behavior
is described by Walker [2] and analysis of its application to
SONET compatible CDR systems is found in the publication
by Greshishchev [3].
At the system level, design of the bang-bang CDR is reduced
to the selection of two main system parameters. The first is
the bang-bang frequency step, which (somewhat inconveniently)
simultaneously governs jitter tolerance performance,
the related jitter transfer bandwidth, and jitter generation. Optimal jitter generation is realized by setting the bang-bang frequency step to the lowest value which still meets the jitter tolerance
specification. A bang-bang frequency step of 3 MHz was selected, which includes
some margin so that the CDR exceeds the jitter tolerance
specification for OC-192.
The second system parameter is the loop stability factor ,
defined by Walker [2] as
(2)
ROGERS AND LONG: 10-Gb/s CDR/DEMUX WITH LC DELAY LINE VCO IN 0.18-µm CMOS
Fig. 3.
tages over conventional CMOS and other all-NMOS implementations. First, it has a relatively low output impedance making
it suitable for high-speed operation. MCML logic also benefits
from reduced logic voltage swing as well as from the elimination of lower mobility PMOS transistors compared to CMOS
logic. The MOS equivalent of a bipolar ECL gate is not practical, especially from a 1.8-V supply due to attenuation of the
signal by source followers and lack of headroom.
Another benefit of using MCML is reduced switching-related supply noise, due to the relatively constant current drawn
from the power supply. For improved supply rejection, the gain
stages and output buffers of the ring oscillator are implemented
as MCML inverter/buffers. An additional benefit of this simple
design is that the clocked phase detector elements can interface
with the ring oscillator without level shifting or swing adjustment.
The first goal of this design is to create a buffer which has the
widest possible bandwidth, while still having enough gain. A
minimum value of approximately 2 for the small-signal gain was
chosen, otherwise the gate noise margin becomes unacceptable.
Biasing of the circuit so that the large-signal switching speed
approaches maximum performance is now considered.
First, an appropriate voltage swing is selected. The voltage swing is made as large as possible, without
forcing the switching transistors into the triode region at any
time during the cycle. Larger drain-to-gate capacitance of the
MOSFET in the triode region limits the switching speed. Note
that gain is (first-order) dependent on the voltage swing, but
Fig. 5. MCML logic gates. (a) Inverter/buffer. (b) Cascode logic gate (AND).
The latch design (refer to Fig. 6) is critical to the jitter generation performance of the CDR system. Two important nonidealities,
metastability and hysteresis, directly translate into degraded jitter generation performance. This degradation occurs
above and beyond the jitter generation, which is inherent in the
system properties of the early/late phase detector-based PLL.
Metastability occurs when positive feedback during the latch
phase cannot ramp the output from the initial tracked voltage
to the voltage required for reliable recognition of the digital
state. An approximate value for the minimum time
required to latch an initial voltage is [7]
(4)
It should be noted that the relevant term in (4) is not much greater than
one for a typical MCML latch with resistive loading in a typical
0.18-µm CMOS technology.
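The dependence of latch time on loop gain can be illustrated with a first-order regeneration model. This is a textbook sketch and not necessarily the exact form of (4), which did not survive in this copy: it assumes a cross-coupled pair with resistive load R, load capacitance C, and small-signal loop gain A, so that the differential voltage grows exponentially with time constant R*C/(A - 1). All names and values are hypothetical.

```python
import math


def regeneration_time_constant(r_load, c_load, loop_gain):
    """Time constant of exponential regeneration in a resistively loaded latch."""
    if loop_gain <= 1.0:
        raise ValueError("latch does not regenerate unless loop gain exceeds 1")
    return r_load * c_load / (loop_gain - 1.0)


def latch_time(r_load, c_load, loop_gain, v_initial, v_logic):
    """Time to grow an initial voltage v_initial up to the logic level v_logic."""
    tau = regeneration_time_constant(r_load, c_load, loop_gain)
    return tau * math.log(v_logic / v_initial)


# Hypothetical example: 1 kOhm load, 10 fF, loop gain of 2, 10 mV to 400 mV.
t_latch = latch_time(1e3, 10e-15, 2.0, 0.01, 0.4)
```

In this model a loop gain only slightly above one makes the time constant large, which is why a low-gain MCML latch is more exposed to metastability.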
Maximizing the tracking bandwidth reduces the pattern-dependent jitter (hysteresis) caused by the phase detector latches.
Increasing bandwidth by reducing the gain creates a tradeoff between resistance to metastability and the tracking bandwidth.
Tracking bandwidth is improved without increasing the likelihood of metastability by making device sizes in the input latches
(e.g., the master latch) significantly larger than the slave. This
reduces the relative loading of the master by the slave latch. This
technique was used in the latches which sample the incoming
full-rate data in the 1 : 2 demux and on the quadrature clock
Fig. 9. NMOS varactor. (a) Varactor structure. (b) Varactor C–V curve.
TABLE I
SIMULATED VCO SENSITIVITIES
TABLE II
MEASURED VCO PERFORMANCE
open-loop and phase-locked and the phase noise of the reference source are all plotted in Fig. 10. At a 1-MHz offset from
the carrier, the free-running VCO phase noise is −103 dBc/Hz,
which falls to −127 dBc/Hz when the CDR is locked to a
sinusoidal data input at 2.5 GHz. The reference source phase
noise is also shown (−135 dBc/Hz). The difference is primarily
due to frequency multiplication between the reference source
(2.5 GHz) and locked VCO (5 GHz), which adds a minimum
of 6 dB to the phase noise.
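The 6-dB figure follows from the usual rule that ideal frequency multiplication by N raises phase noise by 20 log10(N) dB. The short sketch below applies that rule to the numbers quoted above; the function names are hypothetical, and the measured values are taken from the text with their signs restored.

```python
import math


def multiplication_penalty_db(f_out_hz, f_ref_hz):
    """Phase-noise increase caused by ideal frequency multiplication."""
    return 20.0 * math.log10(f_out_hz / f_ref_hz)


def ideal_locked_noise_dbc_hz(ref_noise_dbc_hz, f_out_hz, f_ref_hz):
    """Best-case locked phase noise: reference noise plus the penalty."""
    return ref_noise_dbc_hz + multiplication_penalty_db(f_out_hz, f_ref_hz)


penalty = multiplication_penalty_db(5e9, 2.5e9)        # about 6.02 dB
floor = ideal_locked_noise_dbc_hz(-135.0, 5e9, 2.5e9)  # about -129 dBc/Hz
excess = -127.0 - floor                                # about 2 dB above the floor
```

On these numbers the measured locked phase noise sits roughly 2 dB above the multiplied reference, which is consistent with the statement that multiplication accounts for most of the difference.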
The fabricated oscillator has a measured center frequency of
4.45 GHz (10% slower than predicted by simulations). Subsequent measurements of individual component test structures revealed that the frequency shift is caused by unanticipated loss
and delay between the oscillator gain stages. Excessive resistive
losses in the top metal, inaccuracy in the modeling of parasitic
capacitances, and stray inductance between the delay line and
gain stages (which is not extracted from the physical layout for
simulation) all contribute to this error. The measured capacitance per unit length and resistance per unit length of the delay
line are 45% and 28% higher, respectively, than those derived
from the same simulation. This result exposes a sensitivity of
the circuit to the absolute loss and capacitance in the delay line
and, more importantly, sensitivity to variations in these parameters over process. Work is ongoing to analyze the architecture
for variations in the properties of the backend metal/dielectric
stack and the actual magnitude of these variations. This characterization work is aimed at improving the accuracy of the CAD
models thereby allowing better correlation between the simulation and measurement. Nevertheless, it is important to note that
oscillators from two separate fabrication runs showed only 0.2%
variation in VCO center frequency.
The measured external tuning range and bang-bang frequency step (varied using the BB input) are 125 and 2.55 MHz,
respectively. Characterization of this varactor using a separate
test structure showed 40% less capacitance variation than
expected, thus explaining the smaller tuning ranges observed.
A VCO center frequency close to 4.98 GHz is needed in order
to conduct full-speed testing of the CDR including bit error rate
testing (BERT) at the SONET OC-192 rate (9.953 Gb/s). A noninvasive technique for adjusting the oscillation frequency was
Fig. 14.
ACKNOWLEDGMENT
Circuit fabrication was facilitated by the Canadian Microelectronics Corporation. The authors thank Dr. Y. Greshishchev and
Dr. P. Schvan for providing access to test facilities at Nortel Networks Ottawa Laboratories.
REFERENCES
[1] M. J. Reizenman, "Optical nets brace for even heavier traffic," IEEE Spectr., pp. 44–45, Jan. 2001.
[2] R. C. Walker, C. L. Stout, J.-T. Wu, B. Lai, C.-S. Yen, T. Hornak, and P. T. Petruno, "A two-chip 1.5-GBd serial link interface," IEEE J. Solid-State Circuits, vol. 27, pp. 1805–1811, Dec. 1992.
[3] Y. M. Greshishchev, P. Schvan, M. Xu, J. Showell, J. Ohja, and J. E. Rogers, "A fully integrated SiGe receiver IC for 10-Gb/s data rate," IEEE J. Solid-State Circuits, vol. 35, pp. 1949–1957, Dec. 2000.
[4] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with frequency detection," in Proc. ISSCC, San Francisco, CA, Feb. 2001, pp. 78–79.
[5] J. E. Rogers and J. R. Long, "A 10-Gb/s CDR/demux with LC delay line VCO in 0.18-µm CMOS," in Proc. ISSCC, San Francisco, CA, Feb. 2002, pp. 254–255.
[6] J. Hauenschild, C. Dorschky, T. Winkler von Mohrenfels, and R. Seitz, "A plastic packaged 10 Gb/s BiCMOS clock and data recovering 1 : 4-demultiplexer with external VCO," IEEE J. Solid-State Circuits, vol. 31, pp. 2056–2059, Dec. 1996.
[7] D. A. Johns and K. Martin, Analog Integrated Circuit Design, 1st ed. New York: Wiley, 1997.
[8] J. R. Long and M. A. Copeland, "The modeling, characterization, and design of monolithic inductors for silicon RFICs," IEEE J. Solid-State Circuits, vol. 32, pp. 357–367, Mar. 1997.
[9] P. Andreani and S. Mattisson, "On the use of MOS varactors in RF VCOs," IEEE J. Solid-State Circuits, vol. 35, pp. 905–910, June 2000.
I. INTRODUCTION
II. ARCHITECTURE
A 0.5-µm CMOS technology is not fast enough to directly
generate and receive a 4-Gbit/s stream (since the maximum
ring oscillator frequency is <2 GHz). Instead, we use parallelism to reduce the performance requirements of each circuit.
The transmitter generates the bit stream with an 8:1 multiplexer
that multiplexes current pulses directly onto the output channel
(Fig. 1). The receiver (Fig. 2) performs 1:8 demultiplexing
by sampling with a bank of input samplers. As in the
transmitter, each sampler is triggered by an individual clock
phase. Furthermore, clock/data recovery is achieved by 3×
oversampling of each data bit. Thus, the receiver requires a total of 24 clock phases to support both the oversampling and the
1:8 demultiplexing. Various techniques exist for generating
multiple clock phases [2], [3]. The receive side uses a six-stage ring oscillator (RX-PLL) followed by phase interpolators
to generate intermediate phases (ick[23:0]) between the ring
oscillator edges (ck[11:0]) [1]. Similar to the RX-PLL, eight
different clock phases tapped from a four-stage ring oscillator
(TX-PLL) control the transmitter multiplexing.
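As a rough illustration of the receive-side phase count, the following Python sketch places 12 ring-oscillator edges evenly across one clock period and inserts one interpolated phase between each adjacent pair. It assumes ideal, evenly spaced edges and exact midpoint interpolation, and it takes the clock period as eight bit times (2 ns at 4 Gbit/s). The function names are hypothetical and do not describe the actual interpolator circuit.

```python
from fractions import Fraction

def ring_edges(num_edges, period_ps):
    """Ideal, evenly spaced edge times (in ps) within one clock period."""
    step = Fraction(period_ps) / num_edges
    return [i * step for i in range(num_edges)]

def interpolate_midpoints(edges, period_ps):
    """Insert one phase halfway between each pair of adjacent edges.

    The last edge is paired with the first edge of the next period.
    """
    phases = []
    n = len(edges)
    for i, t in enumerate(edges):
        nxt = edges[(i + 1) % n] + (period_ps if i == n - 1 else 0)
        phases.append(t)
        phases.append((t + nxt) / 2)
    return phases

BIT_RATE_GBPS = 4     # from the text
DEMUX = 8             # 1:8 demultiplexing
OVERSAMPLING = 3      # 3x oversampling

bit_time_ps = Fraction(1000, BIT_RATE_GBPS)    # 250 ps
clock_period_ps = DEMUX * bit_time_ps          # 2000 ps (assumed clock period)

ck = ring_edges(12, clock_period_ps)               # stands in for ck[11:0]
ick = interpolate_midpoints(ck, clock_period_ps)   # stands in for ick[23:0]

spacings = {ick[i + 1] - ick[i] for i in range(len(ick) - 1)}
print(len(ick), float(ick[1] - ick[0]))
```

Under these idealized assumptions the 24 phases are spaced one third of a bit time apart, which is what 3× oversampling of eight demultiplexed bits requires.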
A timing recovery circuit extracts the clock from the multiple samples per bit by finding the positions of the data
transitions. Once the transitions are determined, a decision
logic selects the samples furthest from data transitions (phase
picking) as the received data byte. This approach is similar to
Fig. 7. Clock recovery architectures: (a) phase picking block diagram and
(b) data/clock recovery architectures.
The constants that determine the loop bandwidth in (3)
are depicted in Fig. 7(a): the gain of the filter (V/rad),
the stabilizing zero in the filter, and the gain of the VCO
(rad-hertz/V). The remaining two terms are the noise induced
onto the VCO and the sensitivity of the VCO to this noise. Thus,
the total amount of effective jitter depends on the tracking
bandwidth of the loop, the amount of supply and substrate
noise, and the sensitivity of the loop elements to the noise.
Because the feedback loop has a loop delay of at least one
clock cycle, the bandwidth of the loop is often chosen to be
<1/10 of the oscillation frequency for sufficient phase margin
and stability. The delay makes tracking high-frequency phase
noise ineffective because, if the phase error from one transition
is independent of the phase of the next transition, correction
2 This causes additional difficulties because such phase detectors can only
determine if transitions are early or late. The control loop uses bang-bang
control instead of linear control, which is less stable, has inherent dithering,
and requires an additional frequency acquisition aid. Although a DLL (delay-line-based
PLL) [8] can be used to eliminate the stability and frequency acquisition
problems, the phase spacing, when tapping phases from the buffer stages, is
sensitive to the input clock's duty cycle and amplitude.
Fig. 8. Effect of tracking bandwidth on jitter.
less than half the bit time, even if the peak-to-peak jitter can
be much larger than a bit time. Changes greater than half the bit time are
indistinguishable from a phase shift in the opposite direction.
Choosing between the two clock recovery systems depends
on the system requirements and noise behavior. We chose a
phase-picking architecture to explore the usefulness of the
higher phase-tracking capability. In such VLSI implementations, supply noise can be significant enough for the peak-to-peak jitter to occupy a large fraction of the bit time, especially
since a PLL accumulates jitter. For the 4-Gbit/s link, we
chose a low oversampling ratio of 3 to maintain high input
bandwidth and to keep the number of clock phases manageable
(1:8 demultiplexing and 3× oversampling yield 24 phases).
With a bit time of 250 ps, the phase-picking scheme⁴ can track
the noise of the on-chip multiphase generator (PLL) from both
the transmit and receive sides to keep the total effective jitter
below the 83-ps quantization spacing. One limitation of the
phase-picker tracking is that the maximum rate of the tracking
depends on the data transition density. Since the PRBS signal
guarantees one transition per byte, the maximum tracking rate
of one sample spacing every transition is fast (83 ps/2 ns).
Although the tracking rate is high, the maximum static phase
error from the quantization is 41 ps (2% of the clock period,
8× the bit time), causing an SNR penalty (Fig. 5). Whether or
not a 3× oversampled phase-picking approach with higher
tracking bandwidth than a PLL can achieve better performance
with the larger static phase error depends on the amount of
jitter induced by on-chip noise sources. If the lower SNR
penalty from the lower jitter compensates for the higher SNR
penalty of the larger static phase error, phase picking would be
the better choice.
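The timing figures quoted above follow from the bit rate, the oversampling ratio, and the demultiplexing ratio. The short Python sketch below reproduces the arithmetic. The function name and dictionary keys are illustrative only, and the static error is taken as half the sample spacing, which is an assumption consistent with the 41-ps figure in the text.

```python
def phase_picking_budget(bit_rate_hz=4e9, oversampling=3, demux=8):
    """Return the basic timing quantities of an oversampling phase picker (in ps)."""
    bit_time_ps = 1e12 / bit_rate_hz
    sample_spacing_ps = bit_time_ps / oversampling      # quantization spacing
    max_static_error_ps = sample_spacing_ps / 2         # assumed: half a spacing
    clock_period_ps = demux * bit_time_ps               # one byte time
    return {
        "bit_time_ps": bit_time_ps,
        "sample_spacing_ps": sample_spacing_ps,
        "max_static_error_ps": max_static_error_ps,
        "clock_period_ps": clock_period_ps,
        # static error as a percentage of the clock period
        "static_error_pct": 100 * max_static_error_ps / clock_period_ps,
        # worst case with one transition per byte: one spacing per clock period
        "tracking_ps_per_ns": sample_spacing_ps / (clock_period_ps / 1000),
    }

budget = phase_picking_budget()
for name, value in budget.items():
    print(f"{name}: {value:.2f}")
```

The computed spacing is about 83.3 ps and the half-spacing about 41.7 ps, roughly 2.1% of the 2-ns clock period. The text quotes these as 83 ps, 41 ps, and 2%.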
IV. PHASE-PICKING ALGORITHM AND IMPLEMENTATION
The details of the phase-picking algorithm are illustrated in
Fig. 9. Picking the center sample requires finding and tracking
the bit boundaries. The decision logic first detects transitions
by an XOR of adjacent samples, indicating the bit boundary to
be in one of three possible positions. Fig. 10 shows an example
of the boundary detection with a portion of a sampled stream.
To find which of the three transition positions is the most
likely bit boundary, transitions corresponding to the same bit
boundary position are tallied. The position with the largest
total determines the bit boundaries.
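The boundary-detection steps above can be sketched in a few lines of Python. This is a software illustration under stated assumptions, not the chip's decision logic: samples are given as a flat list of 0/1 values with three samples per bit, ties in the tally are broken toward the lowest position, and the names `tally_transitions`, `pick_phase`, and `recover_bits` are hypothetical.

```python
def tally_transitions(samples, oversampling=3):
    """XOR adjacent samples and count transitions per boundary position.

    A transition between samples[i] and samples[i + 1] is credited to
    position i % oversampling.
    """
    tally = [0] * oversampling
    for i in range(len(samples) - 1):
        if samples[i] ^ samples[i + 1]:
            tally[i % oversampling] += 1
    return tally

def pick_phase(samples, oversampling=3):
    """Return (boundary position, center sample position, tally)."""
    tally = tally_transitions(samples, oversampling)
    # The position with the largest total is taken as the bit boundary.
    boundary = max(range(oversampling), key=lambda p: tally[p])
    # A bit occupies the samples after the boundary; its middle sample is
    # the one furthest from the transitions on either side.
    center = (boundary + 1 + oversampling // 2) % oversampling
    return boundary, center, tally

def recover_bits(samples, center, oversampling=3):
    """Keep one sample per bit, taken at the chosen center position."""
    return samples[center::oversampling]

# A 3x-oversampled stream in which one edge arrives a sample late (index 5).
samples = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
boundary, center, tally = pick_phase(samples)
bits = recover_bits(samples, center)
print(tally, boundary, center, bits)
```

In this example three of the four transitions fall at the same position, so the single displaced edge does not move the chosen boundary, and the center samples still recover the bit sequence.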
The decision logic makes a new decision per byte of data. In
contrast to a higher order oversampling phase picker, the 3×
oversampling limits the change of the selected sample position
to one sample position per byte. To guarantee sufficient
transitions for averaging any bit-to-bit variations of high-frequency noise (near the bit rate), the tally is across a sliding
window of 3 bytes. The transitions are accumulated from the
current byte, the previous byte, and the next byte (delaying the
data allows the noncausal information) so that the decision is
applied to the byte at the middle of the window. As a result
4 In
ACKNOWLEDGMENT
The authors would like to thank S. Sidiropoulos, B. Amrutur, K. Falakshahi, Vitesse Semiconductor, Prof. T. Lee,
Prof. L. Kazovsky, and their research groups for invaluable
discussions and assistance.
REFERENCES
[1] C.-K. Yang and M. Horowitz, "A 0.8-µm CMOS 2.5 Gbps oversampling
receiver and transmitter for serial links," IEEE J. Solid-State Circuits,
vol. 31, Dec. 1996.
[2] C. Gray et al., "A sampling technique and its CMOS implementation
with 1-Gb/s bandwidth and 25 ps resolution," IEEE J. Solid-State
Circuits, vol. 29, Mar. 1994.
[3] J. Maneatis and M. Horowitz, "Precise delay generation using coupled
oscillators," IEEE J. Solid-State Circuits, vol. 28, pp. 1273–1282, Dec.
1993.
[4] K. Lee et al., "A CMOS serial link for fully duplex data communications," IEEE J. Solid-State Circuits, vol. 30, pp. 353–364, Apr.
1995.
[5] A. Fiedler et al., "A 1.0625Gb/s transceiver with 2×-oversampling and
transmit signal pre-emphasis," in ISSCC'97 Dig. Tech. Papers, Feb.
1997, pp. 238–239.
[6] A. Widmer et al., "Single-chip 4 × 500 Mbaud CMOS transceiver,"
IEEE J. Solid-State Circuits, vol. 31, pp. 2004–2014, Dec. 1996.
[7] F. M. Gardner, Phaselock Techniques, 2nd ed. New York: Wiley, 1979.
[8] W. Dally and J. Poulton, "A tracking clock recovery receiver for 4-Gb/s
signaling," in Hot Interconnect'97 Proc., Aug. 1997, p. 157.
[9] S. Sidiropoulos and M. Horowitz, "A semi-digital DLL with unlimited
phase shift capability and 0.08–400 MHz operating range," in ISSCC'95
Dig. Tech. Papers, Feb. 1995, pp. 332–333.
[10] J. E. McNamara, Technical Aspects of Data Communication, 2nd ed.
Bedford, MA: Digital, 1982.
[11] S. Kim et al., "An 800Mbps multi-channel CMOS serial link with 3×
oversampling," in IEEE 1995 CICC Proc., Feb. 1995, p. 451.
[12] M. J. Pelgrom, "Matching properties of MOS transistors," IEEE J.
Solid-State Circuits, vol. 24, p. 1433, Dec. 1989.
[13] J. A. Crawford, Frequency Synthesizer Design Handbook. Boston,
MA: Artech House, 1994.
[14] J. Proakis, Communication Systems Engineering. Englewood Cliffs,
NJ: Prentice-Hall, 1994.