Abstract – Power reduction has become the primary consideration in VLSI design, and low power is one of the major concerns in deeply scaled CMOS technologies. A wide range of methods has been proposed to achieve this objective. The register-transfer level (RTL) has become the most effective stage for low-power VLSI design, because power optimization has a significant impact there and power estimation is accurate. In this paper, some representative low-power design techniques at RTL are re-investigated in TSMC 45 nm CMOS technology. Clock gating (CG) is one of the most widely used and effective techniques in RTL low-power design. For circuits without an enable signal, bus-specific clock gating (BSC) and threshold-based clock gating (TCG) are considered, together with an improved activity-driven optimized bus-specific clock gating (OBSC) proposed in our laboratory. For circuits with an enable signal, this paper explains local-explicit clock gating (LECG), enhanced clock gating (ECG), waste-toggle-rate-based (WTR) clock gating and single comparator-based clock gating (SCCG). Operand isolation is another useful design technique, which reduces power consumption by blocking redundant operations. Memory splitting is an effective low-power design solution as well. These techniques have been implemented with the TSMC 45 nm technology library and evaluated at gate level using logic synthesis results.

Keywords – Low power design, RTL, Clock gating, Operand isolation, Memory splitting

I. INTRODUCTION

Following Moore’s Law and the trend of industrial technology, integrated circuit densities and operating speeds have continued to rise over the past decades, and this trend will continue unabated [5]. This rapid progress brings larger chips, more complex designs and faster operation, and results in tremendously increased power consumption. Under these circumstances, power reduction has become the primary consideration in VLSI design. Because electrical energy is a limited resource, reducing power is highly valued as a trade-off between more complex designs and lower power consumption within an acceptable operation time. Many techniques, such as multiple-voltage design [9], pre-computation, clock gating, operand isolation [7] and memory splitting [8], have been proposed at levels ranging from the system level to the layout level to achieve power reduction. The register-transfer level (RTL) is the most suitable stage, because power optimization has a significant impact there and power estimation is accurate [1].

Fig. 1. Non-CG circuit: (a) Without enable (b) With enable [4]

This paper explains some representative low-power techniques at RTL; for other techniques, the reader is referred to the references.

At the register-transfer level, digital circuits always contain some redundant computations; in other words, power can be reduced by eliminating these idle circuit operations [5]. Clock gating (CG) is the most widely used and effective technique at RTL. Two typical non-CG circuits from [4] are shown in Fig. 1: Fig. 1(a) is the circuit without an enable signal, and Fig. 1(b) is the circuit with an enable signal. For circuits without an enable signal, this paper explains the bus-specific clock gating (BSC) [2], threshold-based clock gating (TCG) [3] and optimized bus-specific clock gating (OBSC) [1] techniques. These techniques reduce power consumption by taking the switching activity of signals into account. For circuits with an enable signal, as shown in Fig. 1(b), this paper covers the local-explicit clock gating (LECG) [4], enhanced clock gating (ECG) [4], waste-toggle-rate-based (WTR) clock gating [4] and single comparator-based clock gating (SCCG) [6] techniques. In this kind of circuit, wasted clock toggles must be identified by considering not only all bits of the data values in two consecutive clock periods but also the enable signal. The operand isolation technique reduces power by blocking the propagation of switching activity through the circuit [7]. Memory splitting saves power because an access to a half-size memory consumes much less power than an access to the full-size memory.

The low-power techniques mentioned above are explained in Section II. Section III presents experimental results, followed by the conclusion in Section IV.
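The notion of a wasted clock toggle described above can be illustrated with a minimal Python sketch. It assumes a single register that resets to 0 and a cycle-by-cycle trace of its data input (and, optionally, its enable input). The function name `wasted_clock_fraction` and the example traces are hypothetical. This is not the wasting-toggle-rate analysis of [4]; it only shows why both the data values in consecutive clock periods and the enable signal matter.

```python
from typing import Optional, Sequence


def wasted_clock_fraction(
    data: Sequence[int],
    enable: Optional[Sequence[int]] = None,
    reset_value: int = 0,
) -> float:
    """Return the fraction of clock cycles that leave the register unchanged.

    A cycle counts as wasted when the register would hold the same value
    after the clock edge as before it, either because the enable is low or
    because the incoming data equals the stored data.
    """
    if enable is not None and len(enable) != len(data):
        raise ValueError("data and enable traces must have the same length")
    if not data:
        return 0.0

    stored = reset_value
    wasted = 0
    for cycle, value in enumerate(data):
        # Without an enable input (Fig. 1(a) style) the register loads every cycle.
        loads = True if enable is None else bool(enable[cycle])
        next_value = value if loads else stored
        if next_value == stored:
            wasted += 1  # the clock edge did no useful work
        stored = next_value
    return wasted / len(data)


# Illustrative traces (assumed, not measured data).
data_trace = [5, 5, 7, 7, 7, 2]
enable_trace = [1, 1, 0, 1, 1, 0]

no_enable = wasted_clock_fraction(data_trace)
with_enable = wasted_clock_fraction(data_trace, enable_trace)
```

A high wasted fraction suggests that gating the clock of that register could pay off, provided the power saved exceeds the overhead of the added gating logic.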
important constraint. If there is a constant input from outside of the combinational logic circuit, the pipeline works properly; however, if an input from outside varies while a data transfer is in progress, the pipeline cannot work properly, because such an input may break data consistency in the data path [6].

G. Waste-toggle-rate-based (WTR) clock gating
An RT-level power reduction scheme is proposed in [4]. It can be applied to any design that faces power problems under the traditional design flow. A novel wasting-toggle-rate-based clock power reduction technique is introduced and verified alongside the traditional design flow. The technique selectively chooses the optimal clock-gating style to minimize power, based on a wasting-toggle-rate analysis at the RT level, and the optimization relies on power equations rather than gate-level simulation of the design [4].

H. Operand isolation
Designs which do not fully utilize their arithmetic datapath components typically exhibit a significant overhead in power consumption. Whenever a module performs an operation whose result is not used in the downstream circuit, power is consumed for an otherwise redundant computation. Operand isolation is a technique to minimize the power overhead incurred by redundant operations by selectively blocking the propagation of switching activity through the circuit [7].
Reference [7] discusses how redundant operations can be identified concurrently with normal circuit operation, and presents a model to estimate the power savings that can be obtained by isolating selected modules at the register-transfer (RT) level, as shown in Fig. 4. Based on this model, an algorithm is presented to iteratively isolate modules while minimizing the cost incurred by RTL operand isolation.

Fig. 4. Design with operand isolation

I. Memory splitting
Fig. 5 shows a memory with a large number of words split into two smaller memories, each with half the number of words. A read or write to a half-size memory consumes less power than a read or write to the full-size memory. Even though the same number of reads and writes occur in the new architecture, the power is reduced because each read or write accesses only one of the smaller memories. To implement this change, replace the single memory with two memories, each having half the number of words. Select a bit of the address, usually the MSB or LSB, to operate as a “bank select”. This bit should be used as the chip enable to one memory and, inverted, as the chip enable to the other memory. This bit should also be used as the select input to a new multiplexer on the memory outputs. This mux selects the appropriate memory bank.

Fig. 5. A memory with a large number of words split into two smaller memories

III. EXPERIMENTAL RESULTS

The BSC, TCG and OBSC techniques have been evaluated on all the ISCAS’89 benchmark circuits using the TSMC 45 nm technology library and Synopsys Power Compiler. The circuits are simulated for 10000 clock cycles (at a clock frequency of 250 MHz) with random inputs. Table I reports the area, delay and power of the non-CG circuit, the BSC circuit, the OBSC circuit and the TCG circuit for each ISCAS’89 circuit. Table II gives the comparative results of the three CG circuits versus the non-CG circuit.
As shown in Table II, the traditional BSC circuit increases power for many circuits, and the average power increases by 208.96%. In contrast, compared to the non-CG circuit, the OBSC circuit reduces power by 26.95% on average over all circuits, which is 16% more than the TCG circuit. The area and delay of the OBSC circuit increase by 14.44% and 5.77%, respectively. The impact of improvements in the synthesis process should also be kept in mind.
For operand isolation, a simple benchmark is shown in Fig. 4, and we use the TSMC 45 nm technology library and Synopsys Power Compiler. The circuits are simulated for 10000 clock cycles (at a clock frequency of 250 MHz). As shown in Table II, the AND isolation style gives the maximum power reduction, 17.67%, with only a 3.17% delay increase, when the isolated candidate is a1.
For memory splitting, a simple benchmark is shown in Fig. 5, and we use the TSMC 45 nm technology library and Synopsys Power
Compiler. Splitting the memory into two half-size memories gives a 12.72% power reduction; however, the delay increases by 18.18% (the results table is not shown in the paper).

TABLE II
POWER CONSUMPTION AND REDUCTION VS. AREA AND DELAY

Circuit     Power [μW]   Power [%]   Delay [ns]   Delay [%]   Area [μm²]   Area [%]
non_iso     63.54        n/a         1.26         n/a         596.95       n/a
AND_a0      58.76        -7.52       1.51         19.84       641.53       7.47
AND_a1      52.31        -17.67      1.30         3.17        573.02       -4.01
OR_a0       59.86        -5.79       1.45         15.08       640.13       7.23
OR_a1       56.22        -11.52      1.23         -2.38       621.35       4.09
LATCH_a0    60.87        -4.20       1.27         0.79        686.59       15.02
LATCH_a1    57.92        -8.84       0.68         -46.03      642.47       7.63

IV. CONCLUSION

Because power optimization is highly effective and power estimation is accurate at this level, RTL is the preferred stage for low-power VLSI design. This paper re-investigated some representative RTL techniques for low-power VLSI design, including BSC, TCG, OBSC, LECG, SCCG, WTR-based CG, operand isolation and memory splitting, in TSMC 45 nm technology. We found that if the modules or data paths are not on the critical path, OBSC, operand isolation and memory splitting are applicable techniques for low-power VLSI design.

REFERENCES
[1] L. Li and K. Choi, “Activity-driven optimized bus-specific-clock-gating for ultra-low-power smart space applications,” IET Commun., 2011, Vol. 5, Iss. 17, pp. 2501–2508.
[2] T. Lang, E. Musoll, and J. Cortadella, “Individual flip-flops with gated clocks for low power datapaths,” IEEE TCAS-II: Analog and Digital Signal Processing, 44(6), 1997.
[3] A. Bonanno, A. Bocca, A. Macii, E. Macii, and M. Poncino, “Data-driven clock gating for digital filters,” Integrated Circuit and System Design. Power and Timing Modeling, Optimization and Simulation.
[4] L. Li and K. Choi, “Selective clock gating by using wasting toggle rate,” IEEE International Conference on Electro/Information Technology (EIT '09), 2009, pp. 399–404.
[5] M. Pedram and A. Abdollahi, “Low-power RT-level synthesis techniques: a tutorial,” IEE Proceedings, Computers and Digital Techniques, Vol. 152, Iss. 3, pp. 333–343, 6 May 2005.
[6] W. Wang, Y. C. Tsao, K. Choi, S. Park, and M. K. Chung, “Pipeline power reduction through single comparator-based clock gating,” International SoC Design Conference (ISOCC), 2009, pp. 480–483.
[7] M. Munch, B. Wurth, R. Mehra, J. Sproch, and N. Wehn, “Automating RT-Level Operand Isolation to Minimize Power Consumption in Datapaths,” Design, Automation and Test in Europe Conference and Exhibition 2000 Proceedings, pp. 624–631, March 2000.
[8] Sequence Design, Inc., “PowerTheater User Guide,” 2007.
[9] S. Raje and M. Sarrafzadeh, “Variable voltage scheduling,” in Proc. Int’l. Workshop Low Power Design, Aug. 1995, pp. 9–14.