
Power Grid Analysis in VLSI Designs

A Thesis Submitted for the Degree of Master of Science (Engineering) In the Faculty of Engineering

By

Kalpesh Shah

Super Computer Education and Research Centre
Indian Institute of Science
Bangalore 560012
March 2007

Acknowledgements

My sincere gratitude to both my guides, Prof. S K Nandy and Dr. Vish Visvanathan. Prof. Nandy, thank you for your guidance from the start of the MS curriculum to the end. I would not have dreamt of the final chapters had it not been for your timely guidance. Vish, thank you for bearing with me and guiding me from beginning to end despite your busy schedule at the office. You are the one who encouraged me all the way, from enrolling in this program to completing it. Thank you for your valuable inputs and comments on the material. My sincere thanks to IISc, and specifically to the SERC staff who helped me with various administrative work.

To my colleagues and managers at Texas Instruments, thank you for your cooperation; you are a team I am proud of. Thanks for your support and the camaraderie. A special thanks to Harinath for approving my MS program, and to Venugopal Puvvada, my manager when most of this work happened. Discussions with him made this work relevant to multi-million gate designs and helped it find real application.

Thanks to the many friends with whom I discussed topics related to my research throughout this period: Ananth, Gokul, Mallik, Suravi, Saby, Bram, Ashish, Aishwarya and Sumedha. A special thanks to Anjana Ghose for all that you did for me while I was not in Bangalore.

Thanks to my family for having stood behind me like a rock. To my parents, thanks for your support and affection; your unrelenting persistence helped me complete the last step. To Pratiksha, thank you for being my invisible strength. Your constant, reassuring presence and confidence in me carried me to this point in the journey. To Bhavesh and Deepti, thank you for being my saviors when the load at home was heavy. Without you folks, this thesis would not have materialized. And finally, thanks to little Harsh, who came into this world halfway through my MS, and to Darsh, who saw my MS from the age of one: you kept giving me much-needed breaks without my asking and made everything so lively.

Table of Contents
Acknowledgements
Abstract
1 Introduction
  1.1 Motivation
    1.1.1 Power Estimation
    1.1.2 Power Supply Noise
    1.1.3 MTCMOS Analysis
  1.2 Terms
  1.3 Thesis outline and Contribution
2 Toggle Activity Estimation
  2.1 Overview
  2.2 Toggle Activity Estimation
  2.3 Multi-million gate solution
    2.3.1 Deriving automatic toggle frequency values
    2.3.2 Hierarchical Modeling
  2.4 Validation and Results
  2.5 Summary
3 Power Estimation
  3.1 Overview
  3.2 Current approaches to Power Analysis
  3.3 Power analysis Tools
    3.3.1 Power Compiler [67]
    3.3.2 Power Mill (or Nano Sim) [4][68]
    3.3.3 Prime Power [66]
    3.3.4 Other Tools
  3.4 Validation Flow
    3.4.1 Netlist Setup
    3.4.2 Vector Generation
    3.4.3 Interconnect setup
  3.5 Validation and Results
  3.6 Power estimation applications
    3.6.1 Average power/ground bus currents
    3.6.2 Average power dissipation
    3.6.3 Electro migration failures
    3.6.4 Power Routing
    3.6.5 Gate Oxide Integrity Analysis
  3.7 Summary
4 Power Supply Noise Analysis
  4.1 Overview
  4.2 Cell Characterization
    4.2.1 Current Characterization Methodology
    4.2.2 Current Characterization Flow
  4.3 Power Grid network modeling
    4.3.1 Power Grid Current Waveform Modeling
  4.4 Complete Flow
    4.4.1 Timing Information Generation
    4.4.2 Power Grid Generator
    4.4.3 SPICE Simulation
  4.5 Validation and Results
    4.5.1 Peak Power Results
    4.5.2 Peak Dynamic IR Drop Results
  4.6 Summary
5 Power Up Analysis
  5.1 Switched PG Networks
  5.2 Switch Network Analysis
    5.2.1 Switch Characterization
    5.2.2 Current or Switch Prediction
  5.3 Results and Analysis
  5.4 Summary
6 Conclusion
  6.1 Summary
  6.2 Scope of Future Work
References
Appendix A Sample SDC file
Appendix B Sample SPEF Format
Appendix C Power Waveforms Analysis
Appendix D Current Characterization sample spice deck
Appendix E Waveform transformation example

Table of Figures
Figure 1.1 Power Dissipation in CMOS designs
Figure 1.2 Power Density trend in CMOS designs
Figure 1.3 Leakage and Dynamic Power Dissipation [2]
Figure 1.4 Schematic of Power Grid in CMOS designs
Figure 1.5 Normalized delay and normalized delay to voltage ratio
Figure 1.6 Total power break up into leakage and active
Figure 2.1 Schematic of logic circuit 1
Figure 2.2 Schematic of Logic Circuit 2
Figure 2.3 Gated clock example
Figure 2.4 Gate Level Netlist for 'simple' design
Figure 2.5 Timing Arcs in extracted model of 'simple' design
Figure 3.1 Venn diagram of Power Components
Figure 3.2 Power Estimation in Design Stages
Figure 3.3 Power Estimation Validation Flow
Figure 3.4 Legends for Validation Flow
Figure 4.1 Voltage over time representation at an internal design node
Figure 4.2 Schematic circuit for instantaneous voltage drop analysis
Figure 4.3 Inverter waveforms measured at different nodes
Figure 4.4 Transition time vs. peak power for Inverter
Figure 4.5 Transition time vs. peak power for NAND gate
Figure 4.6 Load vs. peak power for AND gate
Figure 4.7 Load vs. peak power for OR gate
Figure 4.8 State Dependency on cell switching
Figure 4.9 Cell Characterization Flow
Figure 4.10 Power Grid Modeling
Figure 4.11 Peak IR drop Computation Flow
Figure 4.12 Prime Time flow for arrival time computation
Figure 4.13 Power Grid Generation Flow
Figure 4.14 PSN waveform of Proposed Method
Figure 4.15 PSN Reference Waveform
Figure 5.1 Gated Power Supply ([74])
Figure 5.2 Layout of 1M gate with switch network
Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output
Figure 5.4 Typical PG network with Power Switches
Figure 5.5 Schematic Switch network Analysis Flow
Figure 5.6 Analysis model of Virtual Power Network
Figure 5.7 Infinitesimal Time Division for Current Prediction
Figure 5.8 Reduced Switch Network for validation
Figure 5.9 Voltage Ramp up over Time for various nodes
Figure 5.10 Current comparison over time
Figure 1 1MHz, Peak: 838.9 uW
Figure 2 100MHz, Peak: 840.7 uW
Figure 3 1GHz, Peak: 838.2 uW
Figure 4 1MHz base Waveform, 830.4 uW
Figure 5 100MHz Transformation, 830.4 uW
Figure 6 1GHz Transformation for 1MHz, 830.4 uW

List of Tables
Table 1.1 Consolidation of ITRS2003 Predictions
Table 1.2 Generic Term Definitions
Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation
Table 3.1 Power Modeling for CMOS gates
Table 3.2 ISCAS89 circuit description
Table 3.3 Runtime comparison between vector less and SPICE
Table 3.4 Clock Power vs. Total Power
Table 3.5 Power Estimation across various tools
Table 4.1 Comparison of Peak power Dissipation
Table 4.2 Comparison of percentage peak instantaneous IR drop
Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
Table 5.1 Switch Prediction by proposed algorithm
Table 5.2 Voltage Prediction
Table 5.3 Power Up analysis - Runtime Comparison


Abstract
Power has become an important design closure parameter in today's ultra-deep submicron digital designs. The impact of rising power spans many disciplines: power supply design, power converter and voltage regulator design, system, board and package thermal analysis, power grid design, signal integrity analysis, and power minimization itself. This work focuses on the challenges that rising power poses for power grid design and analysis. The challenges arising from smaller geometries and higher power are well researched, yet considerable scope for further work remains.

Traditionally, designs go through average IR drop analysis, which depends heavily on the estimate of current dissipation. This work proposes a vectorless, probabilistic toggle estimation method that extends one of the approaches proposed in the literature. We have used the toggles computed with this approach to estimate the power of the ISCAS89 benchmark circuits, which provides insight into the quality of the generated toggles. The power estimation work is further extended to compare various state-of-the-art methodologies, namely SPICE-based power estimation, logic-simulation-based power estimation, and commercially available tools. We finally arrive at an optimum flow recommendation that can be applied according to design needs and schedule.

Today's design complexity (high frequencies, high logic densities, and multi-level clock and power gating) has forced the design community to look beyond average IR drop. High switching activity induces power supply fluctuations at the cells of a design, known as instantaneous IR drop, and there is no good methodology in place to analyze this phenomenon. Ad hoc decoupling capacitance planning and on-chip intrinsic decoupling capacitance help to contain this noise, but offer no guarantee. This work also applies the average toggle computation approach to instantaneous IR drop analysis. Instantaneous IR drop is also known as dynamic IR drop or power supply noise. We propose a cell characterization methodology for standard cells; the characterization data is used to build a power grid model of the design, and the power network is finally solved to compute the instantaneous IR drop.

Leakage power minimization has forced design teams to adopt complex power gating, with multi-level MTCMOS usage in the power grid. This poses additional analysis challenges for the power grid in terms of ON/OFF sequencing and the noise it injects. This work explains the state of the art here and highlights some of the issues and trade-offs of using MTCMOS logic. It further suggests a simple approach to quickly assess the impact of MTCMOS gates on the power grid in terms of peak currents and IR drop. The same approach also helps in MTCMOS gate optimization, and it can be used to compute the overhead of early leakage optimization.


1 Introduction
1.1 Motivation
The VLSI industry is facing one of the biggest challenges in its evolution: power integrity closure, the next challenge after the crosstalk-induced signal integrity issues of the previous decade. Power dissipation has increased phenomenally over the years, as shown in Figure 1.1, giving rise to this challenge. Figure 1.2 shows the resulting increase in power density, as aggressive scaling packs ever more components into a unit area.

Figure 1.1 Power Dissipation in CMOS designs (plot of power in watts, log scale, versus year for processors from the 4004 to the Pentium)


Figure 1.2 Power Density trend in CMOS designs (plot of power density in W/cm2, log scale, versus year for processors from the 4004 to the P6/Pentium, with hot plate, nuclear reactor and rocket nozzle reference levels)

Table 1.1 below consolidates the ITRS 2003 [1] predictions for power, together with their impact on operating voltages and on the design (number of power/ground pads).

Year                         2003     2004     2005     2006     2007     2008     2009     2010     2012
Technology node                       90 nm                      65 nm                      45 nm
Vdd, high performance (V)    1.2      1.2      1.1      1.1      1.1      1.0      1.0      1.0      0.9
Vdd, low power (V)           0.9      0.9      0.9      0.8      0.8      0.8      0.7      0.7      -
High-performance power (W)   149      158      167      180      189      200      210      218      240
Battery-operated power (W)   2.1      2.2      2.3      2.4      2.5      2.6      2.7      2.8      -
Power/ground pads            1700     1800     2000     2100     2200     2300     2400     2400     2600

Table 1.1 Consolidation of ITRS2003 Predictions
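Table 1.1 hides a trend that matters directly for power grid design. Power rises while Vdd falls, so the total supply current I = P / Vdd grows faster than the power does. The short Python sketch below works this out from the high-performance rows of the table. Two simplifications are made only for this illustration and are not from the ITRS data: half of the PG pads are assumed to be Vdd pads, and the current is assumed to divide evenly among them.

```python
# High-performance rows of Table 1.1.
ITRS_2003 = {
    # year: (Vdd in volts, power in watts, number of power/ground pads)
    2003: (1.2, 149, 1700),
    2004: (1.2, 158, 1800),
    2005: (1.1, 167, 2000),
    2006: (1.1, 180, 2100),
    2007: (1.1, 189, 2200),
    2008: (1.0, 200, 2300),
    2009: (1.0, 210, 2400),
    2010: (1.0, 218, 2400),
    2012: (0.9, 240, 2600),
}


def supply_current(power_w: float, vdd_v: float) -> float:
    """Total supply current I = P / Vdd, in amperes."""
    return power_w / vdd_v


def current_per_vdd_pad(power_w: float, vdd_v: float, pg_pads: int) -> float:
    """Current per Vdd pad in amperes, ASSUMING half of the PG pads are Vdd
    pads and the current divides evenly among them (illustrative only)."""
    return supply_current(power_w, vdd_v) / (pg_pads / 2)


if __name__ == "__main__":
    for year, (vdd, power, pads) in sorted(ITRS_2003.items()):
        i_total = supply_current(power, vdd)
        i_pad = current_per_vdd_pad(power, vdd, pads)
        print(f"{year}: I = {i_total:6.1f} A, per Vdd pad = {i_pad * 1e3:5.0f} mA")
```

Under these assumptions, power grows by about 61% from 2003 to 2012, but the total current more than doubles, from about 124 A to about 267 A. Even with the additional pads, the current per Vdd pad rises from roughly 146 mA to 205 mA.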


Further, Figure 1.3 shows that both the leakage and the dynamic components of power are continuously increasing, and that leakage dominates dynamic power in newer technology nodes [2]. The following sections describe how these trends create challenges for power grid analysis and motivate the work done in this thesis.

Figure 1.3 Leakage and Dynamic Power Dissipation [2]


1.1.1 Power Estimation
One of the challenges in power integrity analysis is to predict a design's power dissipation accurately, both average and peak. Power estimation is required for package thermal analysis, power minimization, and power grid design. The earliest techniques proposed for estimating power dissipation were based on circuit simulation with SPICE or fast SPICE simulators [3-6], and were strongly pattern-dependent. These techniques are also too slow to be used on modern very large-scale integrated (VLSI) circuits, for which high power dissipation is a major problem. To improve computational efficiency, other simulation-based techniques were proposed using various kinds of timing, switch-level, and logic simulation [7-9]. In these approaches, lookup tables are obtained by electrical simulation of the basic library elements, and the collected data are then used during gate-level simulation. These techniques generally assume that the power supply and ground voltages are fixed, and they estimate only the supply current waveform. They are more efficient than traditional circuit simulation, at the cost of some loss in accuracy. However, they remain strongly pattern-dependent and are still too slow for modern multi-million gate designs, where the whole chip cannot be simulated at once.

To overcome the shortcomings of simulation-based techniques, research has focused on probabilistic and statistical techniques for toggle estimation. The use of probabilities to estimate power was first proposed in [11]. That work used a zero-delay model so that transition probabilities could be estimated from signal probabilities. A few papers proposed a probabilistic power estimation approach called probabilistic simulation, which computes toggle power without making the zero-delay or temporal independence assumptions. In this technique, the use of probabilities was expanded to allow the specification of probability waveforms. The approach assumed spatial independence and was not restricted to synchronous circuits. In another probabilistic approach, Farid N. Najm [12] introduced the transition density measure of circuit activity, together with an algorithm for propagating transition densities through the circuit. This approach does not make a zero-delay assumption; it makes only the spatial independence assumption. As a result of this independence assumption, the computed density values are insensitive to internal circuit delays. Yet another probabilistic approach was presented by A. Ghosh et al. [13]. It used Binary Decision Diagrams (BDDs) to account for internal node correlations and toggle power, at the cost of increased computation, and it can become computationally expensive. More recent literature describes more accurate toggle estimation methods based on Bayesian networks [14-16], but these are limited in their ability to handle high gate count designs. All of the above probabilistic and statistical techniques apply only to combinational circuits and require the user to specify the activity at the latch outputs. This work addresses the toggle computation (pattern dependence) problem for multi-million gate designs by extending Najm's approach [12]. Using this approach, average power estimation has been performed at various stages of the design.

1.1.2 Power Supply Noise
As the switching speed of VLSI circuits rises phenomenally, so does the probability that a large number of cells switch within a short period of time. Such simultaneous switching can cause considerable noise in the power supply network of a circuit. Power supply noise refers to the decrease in the voltage seen across a cell's power and ground nodes. A schematic of the power network grid is shown in Figure 1.4. The parasitic resistance R of the power distribution network accounts for the resistive noise, which is the IR voltage drop in the power/ground (PG) network. Apart from R, on-chip decoupling capacitance also plays a big role. The switching noise in the power distribution network must be contained to a tolerable level to ensure the reliability and performance of a circuit.
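This noise is driven by switching activity, so it helps to make concrete the transition density propagation of [12], which this work extends (Section 1.1.1). The Python sketch below is a minimal illustration, not the thesis implementation. It assumes spatially independent signals and handles only AND, OR and NOT gates. It uses the Boolean-difference rule D(y) = sum over i of P(dy/dx_i) * D(x_i), together with the average power expression P = 0.5 * Vdd^2 * sum over i of C_i * D(x_i). The netlist, capacitances and input statistics are hypothetical.

```python
from math import prod


def and_gate(ps):
    """AND: output probability, and for each input i the probability of the
    Boolean difference dy/dx_i (all other inputs must be 1)."""
    p_out = prod(ps)
    sens = [prod(p for j, p in enumerate(ps) if j != i) for i in range(len(ps))]
    return p_out, sens


def or_gate(ps):
    """OR: the output follows x_i only when all other inputs are 0."""
    p_out = 1.0 - prod(1.0 - p for p in ps)
    sens = [prod(1.0 - p for j, p in enumerate(ps) if j != i) for i in range(len(ps))]
    return p_out, sens


def not_gate(ps):
    """Inverter: every input transition propagates to the output."""
    return 1.0 - ps[0], [1.0]


GATES = {"AND": and_gate, "OR": or_gate, "NOT": not_gate}


def propagate(netlist, inputs):
    """netlist: (output, gate_type, [input nets]) tuples in topological order.
    inputs: {net: (signal probability, transition density in 1/s)}.
    Returns {net: (P, D)} under the spatial-independence assumption."""
    stats = dict(inputs)
    for out, gtype, ins in netlist:
        p_out, sens = GATES[gtype]([stats[n][0] for n in ins])
        d_out = sum(s * stats[n][1] for s, n in zip(sens, ins))
        stats[out] = (p_out, d_out)
    return stats


def average_power(stats, caps, vdd):
    """P = 0.5 * Vdd^2 * sum(C_i * D_i) over nets listed in caps (watts)."""
    return 0.5 * vdd ** 2 * sum(caps[n] * stats[n][1] for n in caps)


if __name__ == "__main__":
    # Hypothetical circuit: n1 = a AND b; y = n1 OR c.
    netlist = [("n1", "AND", ["a", "b"]), ("y", "OR", ["n1", "c"])]
    inputs = {x: (0.5, 1e6) for x in "abc"}  # P = 0.5, D = 1e6 transitions/s
    stats = propagate(netlist, inputs)
    caps = {n: 10e-15 for n in ["a", "b", "c", "n1", "y"]}  # 10 fF per net (assumed)
    # Expected: P(n1) = 0.25, D(n1) = 1e6; P(y) = 0.625, D(y) = 1.25e6
    print(stats["n1"], stats["y"])
    print(average_power(stats, caps, vdd=1.0))
```

Because each gate needs only its input statistics, one pass over a topologically sorted netlist suffices. This linear-time property makes the approach attractive for large designs.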

Figure 1.4 Schematic of Power Grid in CMOS designs (Vdd, Vss and IO pads placed around the die periphery)
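The resistive component of this noise can be illustrated with a deliberately simplified static model. The Python sketch below assumes a single one-dimensional power rail made of identical segment resistances R, fed by ideal Vdd pads at both ends, with each internal node sinking a fixed DC current. It ignores decoupling capacitance, package inductance and the two-dimensional mesh of Figure 1.4. Applying Kirchhoff's current law at each node gives a tridiagonal system for the drop d_k = Vdd - V_k:

2 d_k - d_(k-1) - d_(k+1) = I_k R

The Thomas algorithm solves this system in linear time. For uniform currents the solution is d_k = (I R / 2) k (n + 1 - k), so the worst drop sits midway between the pads.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm for a tridiagonal linear system.
    sub[0] and sup[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def rail_ir_drop(currents, r_seg):
    """Static IR drop (volts) at each internal node of a rail with ideal pads
    at both ends. currents[k] is the DC current (A) drawn at node k + 1, and
    r_seg is the resistance (ohm) of every rail segment."""
    n = len(currents)
    return solve_tridiagonal([-1.0] * n, [2.0] * n, [-1.0] * n,
                             [i * r_seg for i in currents])


if __name__ == "__main__":
    # Hypothetical rail: 5 cells drawing 1 mA each, 0.1 ohm per segment.
    drops = rail_ir_drop([1e-3] * 5, r_seg=0.1)
    print([f"{d * 1e3:.3f} mV" for d in drops])  # worst drop at the middle node
```

Real PG grids lead to a sparse two-dimensional system of the same form, G v = i. Adding decoupling capacitance and inductance turns the problem into the transient analysis discussed in the following chapters.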

Excessive voltage drops manifest themselves as glitches on the PG buses and cause erroneous logic signals, degradation in switching speeds, and reduction in the noise margin and driving capability of the gates.

According to a study on the Pentium 4 [26], power supply noise can reduce the clock frequency by 6.5% at the 130 nm node and by 8% at the 90 nm node. These effects are currently handled through various margins in the design flow, because no efficient solutions are available to address the dynamic voltage drop problem. Some work has been done on estimating peak power and decoupling capacitance in this regard. In [27], a pattern-independent, linear-time algorithm is described that estimates the maximum current waveforms at various contact points in the circuit. The algorithm is first demonstrated for simple gate delay and current models. The expressions for modeling the delays and current waveforms of a general gate are then derived, and the authors describe how to extend the algorithm to more general models. The authors improved this work in [28]. In [29], measures of peak power are proposed in the context of sequential circuits. A procedure is presented to obtain lower bounds on these measures, along with the actual input vectors that attain these bounds, which yields automatic generation of a functional vector loop for near-worst-case power consumption. Paper [30] presents a statistical method for estimating the peak power dissipation in VLSI circuits. The method applies the theory of extreme order statistics to the probability distribution of the cycle-by-cycle power consumption, using maximum-likelihood estimation and Monte Carlo simulation. It can predict the maximum power of a VLSI circuit over a set of constrained input vector pairs as well as over the complete set of all possible input vector pairs. Because it is simulation-based, the method avoids the limitations of a gate-level delay model and a gate-level circuit structure. It also produces maximum power estimates that satisfy user-specified error and confidence levels. Experimental results show that it typically produces maximum power estimates within 5% of the actual value with 90% confidence while simulating fewer than 2500 input vectors. Another technique, described in [31], computes the peak power of a design while maintaining current waveform accuracy. It models a logic gate by breaking it into various nodes and expresses the currents in terms of these nodes, which are evaluated quickly during logic simulation to measure power. However, because it relies on logic simulation, it is extremely difficult to scale. Chen and Ling [36] proposed an approach to estimate power supply noise based on an integrated package-level and chip-level power bus model. Chang, Gupta, and Breuer [37] proposed an analytical model to estimate the ground bounce caused by switching in the internal circuitry of sub-micron VLSI circuits. Jiang, Cheng, and Deng [38] proposed a genetic-algorithm-based approach that considers the dependence of switching noise on input patterns under a distributed RC model of the PG network. Zhao, Roy, and Koh proposed an event-driven, simulation-based approach to calculate the worst-case power supply noise under a distributed RLC model [39].

Several challenges remain in this area, where very little work has been done. First, analyzing power/ground (PG) noise requires worst-case vectors with which to simulate the parasitic network of the chip. Not only does the whole approach need a lot of data and memory, but today's SPICE simulators cannot handle such complexity in terms of runtime and capacity. Moreover, determining the worst-case vectors is rarely, if ever, straightforward.
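The statistical route of [30] sidesteps the search for an exact worst-case vector. It collects cycle-by-cycle power samples, takes the maximum of each block of samples, fits a limiting extreme-value distribution to those block maxima, and reads off a high quantile as the peak power estimate. The Python sketch below simplifies this under stated assumptions. It fits a Gumbel distribution by the method of moments instead of the maximum-likelihood fit used in [30], and it omits the error and confidence bounds. It also replaces circuit simulation with a hypothetical Gaussian generator of per-cycle power.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(maxima):
    """Method-of-moments Gumbel fit: beta = s*sqrt(6)/pi, mu = mean - gamma*beta."""
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_quantile(mu, beta, q):
    """Value that a block maximum does not exceed with probability q."""
    return mu - beta * math.log(-math.log(q))


def estimate_peak_power(cycle_power, block_size, q=0.99):
    """Split per-cycle power samples into blocks, fit a Gumbel distribution
    to the block maxima and return its q-quantile."""
    maxima = [max(cycle_power[i:i + block_size])
              for i in range(0, len(cycle_power) - block_size + 1, block_size)]
    mu, beta = fit_gumbel_moments(maxima)
    return gumbel_quantile(mu, beta, q)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for cycle-by-cycle simulation: mean 1.0 mW, sigma 0.1 mW.
    samples = [rng.gauss(1.0, 0.1) for _ in range(2000)]
    print(f"estimated peak power: {estimate_peak_power(samples, 50):.3f} mW")
    print(f"largest simulated cycle: {max(samples):.3f} mW")
```

The result is a high quantile of the block maximum, not a hard upper bound. The confidence that [30] attaches to its estimates comes from the additional statistical machinery that this sketch leaves out.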


Second, today's designs have huge PG networks, and the voltages seen at various nodes of such a network vary. The resulting voltage across the power-ground bus of a macro affects its delay, as shown in Figure 1.5. Note that delay is non-linear at low voltages. Further, the change in delay per unit change in voltage is even more non-linear than the delay itself. This is very important to designers, as it can cause timing issues or design failures. Because of this strong dependence of delay on voltage, dynamic V-drop in the PG network is fast becoming a critical concern for chip designers [41][59-60].

[Plot: normalized rise delay, fall delay, rise-delay-to-voltage-change and fall-delay-to-voltage-change ratios (0.8 to 1.2) versus voltage]

Figure 1.5 Normalized delay and normalized delay to voltage ratio

The third aspect of the PG noise problem is that it is an iterative phenomenon [41]. When the voltage across a cell decreases due to a sudden rise in switching activity, the cell delays change, and hence so does the simultaneous switching. This in turn can reduce or increase the dynamic noise: reduce, because simultaneous switching may decrease altogether, or increase, because the switching can move one hot spot of the design to another location. Handling this is not a trivial task from an analysis perspective.


Fourth, design methodologies today require analysis to meet predefined PG noise targets. In reality, any voltage drop is acceptable as long as the required timing goals are met. However, this is not done for lack of analysis data. Fifth, devices have often been found to fail on testers due to excessive simultaneous switching during scan testing. This creates serious testability issues, so dynamic V-drop must be analyzed not only for the functional mode but also for other modes such as test. This work addresses the dynamic PG noise problem, which some literature also describes as the dynamic V-drop problem. Based on the issues above, the goal is to address the dynamic V-drop problem with a runtime efficient enough for today's multi-million gate designs, and to evaluate the impact of dynamic V-drop on timing.

1.1.3 MTCMOS Analysis

Leakage power accounts for more than half of the total power in today's ultra sub-micron designs. See Figure 1.6 below.


Figure 1.6 Total power break up into leakage and active

Leakage power control and power network integrity have become key areas of interest for today's power-sensitive designs. In comments on the power consumption problem at the 2002 International Electron Devices Meeting, Intel chairman Andrew Grove cited off-state current leakage in particular as a limiting factor in future microprocessor integration [72]. Designers have come up with innovative ways to reduce leakage power, including reducing the device power supply and frequency of operation [73], using multi-Vt transistors [74-79], controlling input states [74], reducing memory leakage [75], using reverse body bias [76], and using transistor stacks [77]. A detailed study of the sources of leakage power and of reduction techniques can be found in [82]. Of the several techniques available to reduce leakage, gating the power supply with power switches is one of the most promising. Power switches consist of several PMOS transistors and control signals, and are used to dynamically switch the power supply to a specific region of the chip on or off. This work studies the challenges associated with using power switches and proposes a fast analysis technique to estimate peak currents during power ramp-up of the logic.

1.2 Terms
Generic terms used in this report are described below.

ASIC

Acronym for Application Specific Integrated Circuit. A custom or semi-custom integrated circuit, such as a cell or gate array, created for a specific application. The complexity of ASICs typically requires significant use of CAD techniques.

Block

Also known as functional block or module. Any block within the design hierarchy instantiated one or more times that will be laid out separately is referred to as a block module. Block modules are defined divisions of a chip based on functionality and can be worked on independently of other functional blocks.

Netlist

A description of the circuit. The description can be at the gate level or at the Register Transfer Level (RTL). It can also be written in different languages, such as Verilog, VHDL or SPICE.

Physical Design

A portion of a chip or circuit corresponding to a block module that is laid out separately using a Physical Design tool. It is also referred to as a physical block, layout region, or layout block.

RTL

Acronym for Register Transfer Level.

Characterization

Electrical analysis performed for the purpose of determining typical device performance characteristics and/or parametric limits.


CMOS

Acronym for Complementary Metal Oxide Semiconductor. An MOS technology in which both P-channel and N-channel devices are fabricated on the same die.

Die

A single square or rectangular piece of silicon into which a specific semiconductor circuit has been diffused.

Electromigration

Particle migration in aluminum or copper thin-film or polysilicon conductors at grain boundaries as a result of high current densities. Electromigration can lead to either an open circuit condition in a conductor or a short between adjacent conductors.

Interconnect

The metallization connecting two or more active elements on the surface of a die; also, the wires connecting the die to the package leads.

Timing Window

The timing window specifies, for each circuit node, the interval in which transition activity is anticipated. For a single clock domain, the time interval can lie within a clock period. There can be more than one interval, or overlapping intervals, depending on the complexity of the paths converging on the node.
Table 1.2 Generic Term Definitions

1.3 Thesis outline and Contribution


This work addresses three distinct problems. The first is average power estimation using probabilistic toggle estimation for multi-million gate designs. Unless they are specified by the user, the approach calculates switching probabilities as well as switching rates at different nodes in the circuit, including primary inputs. We studied switching activity calculation methods, for which a large body of literature already exists, and enhanced one of the techniques to meet the needs of multimillion-gate designs. This work helps in average dynamic power estimation and also addresses the challenges of toggle estimation, which has varied applications such as peak power estimation, power supply noise analysis and reliability analysis. The second is dynamic power supply noise estimation. Here, a prototype flow is developed in conjunction with the Prime Time STA flow and SPICE to measure power supply noise. The work describes a gate characterization methodology that involves a one-time SPICE simulation, and how the PG network is modeled using the characterized data. The third problem addressed is power grid analysis where MTCMOS gates are inserted. The work focuses on the challenges of MTCMOS analysis and the key factors to consider when a group of logic turns ON from the OFF state. Here, a flow is developed to estimate peak currents or to optimize MTCMOS resistance and switches. We restrict our scope to CMOS circuits mapped onto a predefined cell library, and we follow the two-step paradigm of library modeling followed by analysis of the design using the modeled information. Library modeling involves describing the cells' functional, structural or electrical behavior as needed for block or design analysis, and is done once. Electrical behavior is modeled through characterization using a circuit simulator (e.g. SPICE [3]). The document is organized as follows. The toggle estimation problem is addressed in Chapter 2. Chapter 3 describes the various power estimation techniques and tools available in industry and compares their power numbers with those of the toggle estimation method. Chapter 4 describes power supply noise estimation and Chapter 5 describes MTCMOS power-up analysis. Finally, an extensive list of publications is provided at the end for further reference.


2 Toggle Activity Estimation


2.1 Overview
In CMOS technologies, if we ignore the small leakage current, the chip components draw power supply current only during a logic transition. The current is also proportional to the supply voltage seen by the cell or macro. While this is considered an attractive low-power feature of these technologies, it makes power estimation and voltage drop highly dependent on the switching activity inside these circuits [11][97]. That is, a more active circuit consumes more current and hence contributes a higher voltage drop. The activity of a circuit is obtained by running simulation patterns and analyzing the data. This pattern-dependence problem is serious. Often, the power of a functional block needs to be estimated when the rest of the chip has not yet been designed, or even completely specified. In such a case, very little may be known about the inputs to this functional block, and complete and specific information about them would be impossible to obtain. This motivates the pattern-independent toggle activity estimation problem, often referred to as the vectorless approach. Since the vectorless approach does not require patterns, it is also called static, whereas the vector-based approach is called dynamic. Table 2.1 compares these two approaches.

| STATIC | DYNAMIC |
|---|---|
| Uses a probabilistic approach as described in [12], or a zero-delay simulation-based approach. | Uses logic simulation to generate switching activity, or SPICE simulation to calculate power. |
| Vector-less approach. | Vector-based approach, hence its quality is only as good as the input vectors. Imagine the number of patterns possible for a block with 100 inputs. |
| Often gives an upper bound. | Gives accurate results. |
| Modeling of certain elements (hard macros/complex blocks) is difficult. | Since it is vector based, functional models can be used during simulation. |
| Very fast (a few minutes to hours). | Very slow (a few days to weeks). |
| A lot of research has gone into products for average power estimation. Synopsys has Power Compiler. | Can give instantaneous power. Synopsys has Power Mill (Nano Sim). |

Table 2.1 Comparison of Static vs Dynamic approaches for Power Estimation

This chapter describes the approach used for toggle frequency estimation and its limitations. It then proposes solutions to these limitations, which make the approach usable for large designs. The following terms are used in the discussion.

Transition Density: If a logic signal x(t) makes n(T) transitions in a time interval of length T, then the transition density of x(t) is defined as

$$D(x) = \lim_{T \to \infty} \frac{n(T)}{T}$$

For large T, D(x) becomes time invariant, so there is no need to account for temporal correlation.

Toggle Frequency: If a node x toggles n(T) times over a time interval of length T, then the toggle frequency F(x) is defined as

$$F(x) = \lim_{T \to \infty} \frac{n(T)}{2T}$$

For example, if a node is switching at 20 MHz, it is expected to make 2 transitions in 50 ns. As can be seen, the toggle frequency converts to transition density (also called switching activity, i.e. the number of transitions per unit time) as D(x) = 2F(x). The three terms are used interchangeably in this document. Note that the toggle frequency of a node has no direct relation to the clock domain(s) in which the node (or logic) exists; we use the clock domain frequency only to upper-bound the toggle frequency calculated by our approach.

Signal Probability: The signal probability P(x) at a node x is defined as the average fraction of a clock period in which the steady-state value of x is logic high.
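To make these definitions concrete, the following Python sketch measures n(T) on a finite, uniformly sampled 0/1 waveform (a finite T stands in for the limit) and checks the 20 MHz example. The helper names and the sampled-waveform representation are illustrative assumptions, not part of the Toggle Frequency Calculator.

```python
"""Sketch: transition density, toggle frequency and signal probability of a sampled waveform."""


def square_wave(period_steps, n_periods):
    """50%-duty square wave with one sample per time step, starting high."""
    half = period_steps // 2
    return [1 if (k % period_steps) < half else 0
            for k in range(period_steps * n_periods + 1)]


def transition_density(samples, dt):
    """D(x) = n(T) / T: transitions per second over the observed window."""
    n = sum(1 for a, b in zip(samples, samples[1:]) if a != b)
    window = dt * (len(samples) - 1)
    return n / window


def toggle_frequency(samples, dt):
    """F(x) = n(T) / (2T), i.e. half the transition density."""
    return transition_density(samples, dt) / 2.0


def signal_probability(samples):
    """Fraction of samples in which the node is logic high."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    dt = 1e-9                        # 1 ns sampling step
    wave = square_wave(50, 20)       # 50 ns period (20 MHz), observed for 1000 ns
    d = transition_density(wave, dt)
    print(f"D = {d:.3e} transitions/s, F = {toggle_frequency(wave, dt):.3e} Hz")
    print(f"transitions in 50 ns = {d * 50e-9:.1f}, P = {signal_probability(wave):.3f}")
```

A 20 MHz toggle frequency corresponds to a transition density of 4e7 transitions per second, i.e. 2 transitions in every 50 ns window, matching the example in the text.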

2.2 Toggle Activity Estimation


This section gives an overview of Farid Najm's work. The Boolean difference of the output is computed with respect to each input pin. For an output function y and an input x, the Boolean difference is defined as:


$$\frac{\partial y}{\partial x} = y\big|_{x=1} \oplus y\big|_{x=0} \qquad (1)$$

It was shown in [5] that, if the inputs x_i to a Boolean function are (spatially) independent, then the density of its output y is given by:

$$D(y) = \sum_{i=1}^{n} P\left(\frac{\partial y}{\partial x_i}\right) D(x_i) \qquad (2)$$

In (2), all inputs are assumed to be independent. This can lead to inaccuracy where signals from the primary inputs diverge and then reconverge on their way to the primary outputs, since such signals are not really spatially independent. However, at the block level, the primary inputs can be considered largely independent, so the approach can be modeled more accurately if the Boolean difference of the whole block is computed. Given the signal probability and toggle density values at the primary inputs of a logic circuit, a single pass over the circuit using (2) gives the density at every node. Note that apart from estimating toggle densities at an output node, we also need to calculate the output signal probability in order to estimate toggle densities in the subsequent logic. This is simple for a two-input AND gate,

$$P(Y) = P(A)\,P(B)$$

or, for a NAND gate,

$$P(Y) = 1 - P(A)\,P(B)$$
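A minimal Python sketch of this single-pass propagation, assuming spatially independent inputs as in (2). It evaluates P(y) and P(∂y/∂x_i) by brute-force enumeration of each gate's truth table, which is exponential in fan-in and only suitable for small library cells; the gate names and the netlist tuple format are hypothetical.

```python
"""Sketch of Najm-style probability/density propagation, assuming spatially independent inputs."""
from itertools import product


def _weight(probs, bits):
    """Probability of one input combination under the independence assumption."""
    w = 1.0
    for p, b in zip(probs, bits):
        w *= p if b else 1.0 - p
    return w


def output_probability(func, probs):
    """P(y): sum of probabilities of input combinations for which y = 1."""
    return sum(_weight(probs, bits)
               for bits in product((0, 1), repeat=len(probs)) if func(*bits))


def boolean_difference_prob(func, probs, i):
    """P(dy/dx_i) = P(y|x_i=1 XOR y|x_i=0), enumerating all other inputs."""
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        if bits[i]:
            continue  # visit each assignment of the other inputs exactly once
        hi = bits[:i] + (1,) + bits[i + 1:]
        if func(*bits) != func(*hi):
            total += _weight(probs[:i] + probs[i + 1:], bits[:i] + bits[i + 1:])
    return total


def output_density(func, probs, densities):
    """Equation (2): D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    return sum(boolean_difference_prob(func, probs, i) * d
               for i, d in enumerate(densities))


GATES = {  # hypothetical cell names
    "AND2": lambda a, b: a & b,
    "NAND2": lambda a, b: 1 - (a & b),
    "XOR2": lambda a, b: a ^ b,
}


def propagate(netlist, primary_inputs):
    """Single pass over (output, gate, inputs) tuples given in topological order.

    primary_inputs maps net -> (signal probability, transition density)."""
    stats = dict(primary_inputs)
    for out, gate, ins in netlist:
        func = GATES[gate]
        probs = [stats[n][0] for n in ins]
        dens = [stats[n][1] for n in ins]
        stats[out] = (output_probability(func, probs), output_density(func, probs, dens))
    return stats


if __name__ == "__main__":
    pis = {"a": (0.5, 2e6), "b": (0.5, 2e6), "c": (0.2, 1e6)}
    net = [("n1", "AND2", ["a", "b"]), ("y", "NAND2", ["n1", "c"])]
    for node, (p, d) in propagate(net, pis).items():
        print(f"{node}: P = {p:.3f}, D = {d:.3e}")
```

For the AND gate, ∂y/∂A = B, so D(n1) = P(B)D(A) + P(A)D(B), and P(n1) = P(A)P(B) as stated above. Note that an XOR2 whose inputs each have density 2e6 yields 4e6, which is the effect behind the clipping rule for XOR/XNOR cells in Section 2.3.1.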

2.3 Multi-million gate solution


When applied as is, the above approach gives good results for designs that are small, can be analyzed flat, and are dominated by combinational logic. Besides, it is not always possible to run flat because of logistical concerns: blocks may be designed first, the rest of the design may be done hierarchically, or the design may contain reusable IPs that have no netlist. The approach described in the previous section was extended to handle such requirements. We also came across several issues while applying this approach to some large designs (>5M gates) and implementing the tool Toggle Frequency Calculator (TFC). In this section, we discuss solutions that address each of these problems in detail.

2.3.1 Deriving automatic toggle frequency values

1. Primary Input Handling

The toggle rate at a primary input is not known. Since primary inputs are driven externally, there is no easy way to predict their toggle rate, and the same is true for their signal probability. Consider Figure 2.1 and Figure 2.2.

Figure 2.1 Schematic of logic circuit 1


Figure 2.2 Schematic of Logic Circuit 2

In the circuits above, the Clk or D inputs of the block can be primary inputs. Unless the user gives their toggle rates, these are very difficult to compute. We used static timing analysis [24][25] specifications to derive them. These are:

Input Delay Specification: A constraint that specifies the minimum or maximum amount of delay from a clock edge to the arrival of a signal at a specified input port. The input delay specification is with respect to the clock that triggers events on that signal.

Clock Specification: Specifies the characteristics of a clock, including the clock name, source, period and waveform.

Mode Specification: Specifies the constant values applied on certain ports or pins to drive timing analysis in a specific mode. This means that these pins or ports do not toggle during the analysis. It also specifies the constant value to which the port or pin is tied.

For clock inputs, we used the toggle rate given by the clock specification. For non-clock inputs, we used the clock referenced in the input delay specification. For constant ports, we used a toggle rate of 0 and a static probability based on the tied constant value, i.e. 0 if the port is tied to constant 0, else 1.
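These rules can be sketched in Python as follows. The `PortSpec` structure stands in for values already parsed from the SDC (clock, input delay and mode/case-analysis commands) and is hypothetical, as are two choices labeled in the comments: a 50% duty cycle, and a clock/2 toggle frequency for data inputs, since the text only states which clock is used.

```python
"""Sketch: primary-input activity derived from SDC-style constraints (hypothetical data model)."""
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PortSpec:
    name: str
    clock_period_ns: Optional[float] = None              # port is a clock (clock specification)
    input_delay_clock_period_ns: Optional[float] = None  # period of the clock in its input delay spec
    constant: Optional[int] = None                       # tied value from the mode specification


def primary_input_activity(port: PortSpec) -> Optional[Tuple[float, float]]:
    """Return (toggle frequency in Hz, signal probability), or None if unconstrained."""
    if port.constant is not None:
        # Constant port: never toggles; probability equals the tied value.
        return 0.0, float(port.constant)
    if port.clock_period_ns is not None:
        # A clock makes two transitions per period, so F = n/(2T) = 1/period.
        # ASSUMPTION: 50% duty cycle, hence signal probability 0.5.
        return 1e9 / port.clock_period_ns, 0.5
    if port.input_delay_clock_period_ns is not None:
        # ASSUMPTION: data launched by this clock changes at most once per cycle,
        # i.e. F = clock/2, with signal probability 0.5.
        return 0.5e9 / port.input_delay_clock_period_ns, 0.5
    return None  # no constraint: the user must supply the activity


if __name__ == "__main__":
    for spec in (PortSpec("clk", clock_period_ns=10.0),
                 PortSpec("din", input_delay_clock_period_ns=10.0),
                 PortSpec("test_mode", constant=1),
                 PortSpec("floating")):
        print(spec.name, primary_input_activity(spec))
```

Returning `None` for an unconstrained port keeps the "unless the user gives the toggle rate" case explicit instead of silently defaulting it.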


A sample SDC file with the above commands is shown in Appendix A. Note that an SDC file is a collection of commands in Tcl format, so only the commands primarily required are shown.

2. Sequential element modeling (e.g. flip-flops, latches)

Sequential elements do not switch arbitrarily when their input switches, so equations (1) and (2) cannot be applied to them. We used the following formula to compute the toggle frequency at the output of sequential cells. Here, sequential cells means latches and basic flip-flops, not complex macros, which are dealt with separately.

Qout = min(DataInput, clock/2)

The upper bound of clock/2 is required because we identified certain cases where the data input toggles faster than clock/2; these are explained below. When the data input does not toggle faster than clock/2, the output cannot toggle more often than the data input. The above equation takes care of both facts.

3. Boolean gates that do not reflect realistic scenarios: XOR/XNOR gates and muxes

Equations (1) and (2) can compute a toggle rate higher than the clock toggle rate, and the rate can grow even higher if there are more such gates in the transitive fanout. We found that this is not the case in actual designs, and in many cases it was not the intended behavior. We therefore identified such cells as exceptions and clipped their toggle rate to half of the clock toggle rate. Similarly, we identified mux cells as exceptions and set their output toggle rate to the maximum toggle rate of all their inputs.
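A short Python sketch of these three rules. It reads "clipped" as an upper bound (a `min`) rather than an unconditional assignment, which is our interpretation; the function names and the 100 MHz clock are illustrative only.

```python
"""Sketch of the exception rules for sequential cells, XOR/XNOR cells and muxes."""
from typing import Sequence


def flop_output_toggle(data_toggle_hz: float, clock_toggle_hz: float) -> float:
    """Qout = min(DataInput, clock/2) for latches and basic flip-flops."""
    return min(data_toggle_hz, clock_toggle_hz / 2.0)


def xor_output_toggle(computed_toggle_hz: float, clock_toggle_hz: float) -> float:
    """Clip the equation (2) value for XOR/XNOR cells to half the clock toggle rate."""
    return min(computed_toggle_hz, clock_toggle_hz / 2.0)


def mux_output_toggle(input_toggles_hz: Sequence[float]) -> float:
    """Mux output takes the maximum toggle rate over all of its inputs."""
    return max(input_toggles_hz)


if __name__ == "__main__":
    clk = 100e6  # hypothetical 100 MHz clock toggle rate
    print(flop_output_toggle(80e6, clk), flop_output_toggle(20e6, clk))
    print(xor_output_toggle(140e6, clk), mux_output_toggle([10e6, 30e6, 5e6]))
```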


4. Complex loop handling

These were handled by breaking the loops. We broke each loop at the first point where we found the loop forming.
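One way to realize this, sketched in Python under our own assumptions: a depth-first search over the fanout graph treats the first back edge it meets as the point where the loop forms, cuts that edge, and returns a topological order for the single propagation pass. The adjacency-dict format is hypothetical, and the recursive search is kept for clarity; a multi-million gate netlist would need an explicit stack to avoid Python's recursion limit.

```python
"""Sketch: break loops at the first back edge found by a depth-first search."""
from typing import Dict, List, Tuple


def break_loops(fanout: Dict[str, List[str]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (topological order of the loop-free graph, list of edges that were cut)."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current DFS path / finished
    color = {n: WHITE for n in fanout}
    for succs in fanout.values():
        for s in succs:
            color.setdefault(s, WHITE)
    post_order: List[str] = []
    cut: List[Tuple[str, str]] = []

    def visit(u: str) -> None:
        color[u] = GRAY
        for v in fanout.get(u, []):
            if color[v] == GRAY:          # v is on the current path: edge u->v closes a loop
                cut.append((u, v))        # break the loop here and ignore this edge
            elif color[v] == WHITE:
                visit(v)
        color[u] = BLACK
        post_order.append(u)

    for node in list(color):
        if color[node] == WHITE:
            visit(node)
    return post_order[::-1], cut


if __name__ == "__main__":
    order, cut = break_loops({"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []})
    print(order, cut)
```

Reversed DFS post-order of the remaining edges is a valid topological order, so the densities can then be computed in one pass as described in Section 2.2.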

5. Unconnected inputs going into logic

This was handled by tracking the transitive fanout of the unconnected inputs to the first sequential cell encountered. This gives the clock controlling the toggle rate further down the line. If the unconnected inputs are clocks, we assigned the worst-case toggle rate of the block itself.
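A Python sketch of this search, assuming that "first" means nearest in logic depth, which a breadth-first walk of the fanout finds. The graph and clock-map formats are hypothetical; the returned clock would then set the toggle rate of the unconnected input.

```python
"""Sketch: find the clock controlling an unconnected input via its transitive fanout."""
from collections import deque
from typing import Dict, List, Optional


def controlling_clock(start: str,
                      fanout: Dict[str, List[str]],
                      seq_clock: Dict[str, str]) -> Optional[str]:
    """Breadth-first walk from `start`; return the clock of the first sequential cell
    reached (seq_clock maps sequential cell -> clock name), or None if there is none."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node != start and node in seq_clock:
            return seq_clock[node]
        for nxt in fanout.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    fo = {"u1": ["g1"], "g1": ["g2", "ff_a"], "g2": ["ff_b"]}
    print(controlling_clock("u1", fo, {"ff_a": "clk_a", "ff_b": "clk_b"}))
```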

6. Gated clocks or generated clocks

A gated clock is a clock signal that can be modified by logic within the design, such as a clock that can be turned off to save power. A schematic of a gated clock is shown in Figure 2.3.

Figure 2.3 Gated clock example

We made the gating elements transparent for toggle propagation: a clock gating cell is handled like a buffer.

7. Design constraints guidelines for realistic, usable toggle activity estimation


Some care still needs to be taken despite all the above solutions. For example, toggle estimation must be done based on the targeted application, which drives certain inputs used in items 1-6 above. In the implementation, we kept certain hooks to give control to the user.

2.3.2 Hierarchical Modeling

Three issues motivate hierarchical modeling:
1. Memories occupy a large portion of the design, but calculating the switching activity at memory outputs is not straightforward.
2. Complex functionality is implemented in hard macros.
3. Multi-million gate designs cannot afford flat analysis, due to cycle time and the inherent limitations of probabilistic approaches.

We needed to devise a method for hierarchical analysis by modeling sub-blocks and using them as black boxes. We used the timing modeling approach to handle (1), (2) and (3). All standard library components are presently modeled in Liberty files [69]. Static timing analysis tools can generate a similar Liberty file for a block after completing the analysis [25]. This file has the following information:
1. Input-pin-to-output-pin timing arcs
2. Setup and hold constraints for the data input and clock input
3. Output timing with respect to either an input pin or the related clock

We derive output toggle frequency f(out) as below.

In the case of an input-to-output timing arc:

f(out) = maximum(toggle rates of all controlling inputs)

In the case of a clock-to-output timing arc:

f(out) = average switching activity of the clock domain

Figure 2.4 shows the gate-level netlist of a design called simple. Figure 2.5 shows the timing arcs extracted by Prime Time, a leading industry timing analysis tool [25]. The timing arc information is used to compute the output toggle rates, as explained below.

Figure 2.4 Gate Level Netlist for 'simple' design


Figure 2.5 Timing Arcs in extracted model of 'simple' design

There are combinational arcs from i3 to out2 and from i1 to out2. Hence, the toggle rate at out2 is controlled by the same clock(s) as i3 and i1, and we assign the maximum of the i3 and i1 toggle rates to the output pin. The other timing arc is clk2->out1; in this case, out1 is assigned the average switching activity of clk2. Thus, using timing model information, we generate output toggle rates for memories, complex hard macros and blocks.
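The two rules, applied to the arcs of the simple design, can be sketched in Python as below. The arc tuple format and the toggle numbers for i1, i3 and the clk2 domain are made up for illustration; they are not values from Figure 2.5. How to combine several arcs ending at the same output is our assumption (the largest rate wins).

```python
"""Sketch: derive output toggle rates of a black-box block from its extracted timing arcs."""
from typing import Dict, List, Tuple


def model_output_toggles(arcs: List[Tuple[str, str, str]],
                         input_toggle: Dict[str, float],
                         clock_activity: Dict[str, float]) -> Dict[str, float]:
    """arcs holds (from_pin, to_pin, kind), with kind 'comb' (input-to-output arc)
    or 'clock' (clock-to-output arc). Rates are toggle frequencies in Hz."""
    out: Dict[str, float] = {}
    for src, dst, kind in arcs:
        if kind == "comb":
            rate = input_toggle[src]          # f(out) = max over controlling inputs
        elif kind == "clock":
            rate = clock_activity[src]        # average switching activity of the clock domain
        else:
            raise ValueError(f"unknown arc kind: {kind}")
        # ASSUMPTION: an output reached by several arcs keeps the largest rate.
        out[dst] = max(out.get(dst, 0.0), rate)
    return out


if __name__ == "__main__":
    simple_arcs = [("i1", "out2", "comb"), ("i3", "out2", "comb"), ("clk2", "out1", "clock")]
    print(model_output_toggles(simple_arcs, {"i1": 20e6, "i3": 35e6}, {"clk2": 12e6}))
```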

2.4 Validation and Results


The above changes were incorporated into executable code and applied to the ISCAS89 circuits. The results were validated through power estimation, as discussed in the next chapter.


2.5 Summary
In this work, we addressed real issues faced by large designs. Automatic toggle generation improves usability as well as accuracy. Hierarchical analysis supports hierarchical design, which is a common methodology for handling design complexity.


3 Power Estimation
3.1 Overview
Accurate power estimates are necessary at various stages of the design in order to make correct architectural, implementation and cost tradeoffs [61]. Architectural-level tradeoffs are made at a higher level and involve software or instruction-level power modeling, or high-level activity numbers for different blocks, to make implementation tradeoffs. Weighted averages are often used to identify the best cost options [62-65]. Once the design is converted to a structural netlist and physical design starts, power estimation mainly drives package design, PG network design and lower-level power minimization. In this case, power dissipation is described as follows.

$$P = A\,C\,V^2 f + \tau\,A\,V\,I_{short}\,f + V\,I_{leak}$$

where
A = activity factor: the amount of switching at the various internal nodes of the design, i.e. how much a node toggles per clock cycle. The activity factor can be derived from simulation patterns of the logic.
C = capacitance: interconnect load capacitance or wire capacitance.
V = voltage at which the logic operates.
f = clock frequency at which the logic operates; it is readily available for most designs.
I_short = short-circuit current during switching. During a transition in CMOS logic, both the NMOS and PMOS transistors are momentarily ON, and current finds a direct path from the power supply to ground. This is called short-circuit current, and it depends on the input transition time of the CMOS gate.
τ = duration of the short-circuit current.
I_leak = leakage current [72-80][32].
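A small Python sketch that evaluates the three terms, useful for seeing their relative sizes. All numerical values are hypothetical, and the short-circuit term includes the factor f so that, like the other two terms, it has units of power.

```python
"""Sketch: evaluate the three terms of the power equation for illustrative numbers."""


def power_terms(A, C, V, f, tau, I_short, I_leak):
    """Return (switching, short-circuit, leakage) power in watts.

    A: activity factor (transitions per clock cycle), C: switched capacitance (F),
    V: supply voltage (V), f: clock frequency (Hz), tau: short-circuit duration (s),
    I_short: short-circuit current (A), I_leak: leakage current (A)."""
    switching = A * C * V ** 2 * f
    short_circuit = tau * A * V * I_short * f
    leakage = V * I_leak
    return switching, short_circuit, leakage


if __name__ == "__main__":
    # Hypothetical block: 1 nF total switched capacitance at 1.2 V and 200 MHz.
    terms = power_terms(A=0.1, C=1e-9, V=1.2, f=200e6, tau=50e-12, I_short=1e-3, I_leak=2e-3)
    for name, p in zip(("switching", "short-circuit", "leakage"), terms):
        print(f"{name:>13}: {p * 1e3:.3f} mW")
    print(f"{'total':>13}: {sum(terms) * 1e3:.3f} mW")
```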

Figure 3.1 shows the various components of power and their relationship to, or contribution to, the total power.

[Diagram: Dynamic power consists of switching power and short-circuit power. Internal power: cell internal switching power (can vary with macro size) plus short-circuit power, the power dissipated by a momentary short circuit between the P and N transistors of a gate during switching. Switching power (70-80%): power dissipated by the charging and discharging of the load capacitance, (VDD^2) × Σ_Cell (Cload(i) × TR(i)). Static (leakage) power (5%): power dissipated by a gate when it is not switching, Σ_Cell(i) PCellLeakage(i). The ASIC flow characterizes libraries for average and leakage power.]

Figure 3.1 Venn diagram of Power Components


In this work, the above power components and their computation are studied extensively. To address the problem in a systematic manner, power estimation has been simplified as follows. These assumptions are acceptable given the global analysis that we are considering.
1. Power supply and ground voltage levels throughout the chip are fixed, so that power can be computed simply by estimating the current drawn by every sub-circuit at a given fixed supply voltage. Note that this does not mean that different blocks cannot be at different voltage levels. This allows library components to be pre-characterized at the required voltage points.
2. The circuit is built of logic gates and latches or reusable IPs, and follows the popular and well-structured design style of a synchronous sequential circuit. In other words, it consists of flops driven by a common clock and combinational logic blocks whose inputs (outputs) are derived from flop outputs (inputs).
3. The flops are edge-triggered and, with the use of CMOS design technology, the circuit draws no steady-state supply current.

This allows the average power dissipation of the circuit to be broken down into two components:
1. The power consumed by the flops.
2. The power consumed by the combinational logic blocks.

This chapter is organized as follows. The next section further explains cell-based power analysis. The section after that briefly introduces the tools used to compare power estimates against those obtained with the toggle computation described in the previous chapter. Validation and results are described last.


3.2 Current approaches to Power Analysis


Cell-based power estimation consists of cell characterization and logic simulation or activity estimation. The characterization phase entails a set of electrical simulations of each library cell for all possible input transitions and for a wide range of fanin and fanout conditions. Timing and power information obtained in this way is used to construct lookup tables for the basic library elements [46][69]. The total leakage power of a circuit is derived by summing the leakage power of the design's constituent library cells:

$$P_{leakageTotal} = \sum_{Cell(i)} P_{CellLeakage}(i) \qquad (3)$$

where P_CellLeakage(i) is the leakage power dissipation of each cell. Technology library developers annotate the library cells with the approximate total leakage power dissipated by each cell. There is usually a single static power number per library cell, but sometimes leakage power depends on the logical state of the cell; in this case, the library cell is annotated with a state-dependent static power. A cell's internal power is the sum of the internal power of all of the cell's inputs and outputs, as modeled in the technology library:

$$P_{Internal} = \sum_{Pin(i)} E_i \cdot A(i) \cdot f(i) \qquad (4)$$

where E_i is the internal energy of each pin. In practice, the internal energy of a pin is characterized in the technology library and can be accessed by simple table look-up. Depending on the required accuracy, library designers can provide different look-up tables, as explained in Table 3.1.

| Lookup table | Pin direction | Indices |
|---|---|---|
| One-dimensional | Input/Output | Input transition OR output load capacitance |
| Two-dimensional | Output | Input transition and output load capacitance |
| Three-dimensional | Output | Input transition and output load capacitances of the two outputs that have equal or opposite logic values |
Table 3.1 Power Modeling for CMOS gates
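For the two-dimensional case, a look-up amounts to interpolating between characterized points. The Python sketch below uses bilinear interpolation, with linear extrapolation outside the table, as one common choice; the interpolation actually used by a given power tool is not specified here, and the axis values and energies in the example are hypothetical.

```python
"""Sketch: two-dimensional Liberty-style table look-up with bilinear interpolation."""
from bisect import bisect_right
from typing import Sequence


def _segment(axis: Sequence[float], x: float) -> int:
    """Index i of the axis segment [axis[i], axis[i+1]] used for x (edge segments are
    reused outside the table, giving linear extrapolation)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def lookup_2d(index_1: Sequence[float], index_2: Sequence[float],
              values: Sequence[Sequence[float]], x1: float, x2: float) -> float:
    """values[i][j] is characterized at (index_1[i], index_2[j]); each axis needs >= 2 points."""
    i, j = _segment(index_1, x1), _segment(index_2, x2)
    t = (x1 - index_1[i]) / (index_1[i + 1] - index_1[i])
    u = (x2 - index_2[j]) / (index_2[j + 1] - index_2[j])
    return ((1 - t) * (1 - u) * values[i][j] + (1 - t) * u * values[i][j + 1]
            + t * (1 - u) * values[i + 1][j] + t * u * values[i + 1][j + 1])


if __name__ == "__main__":
    transition_ns = [0.1, 0.5, 1.0]          # index_1: input transition (hypothetical)
    load_pf = [0.01, 0.1]                    # index_2: output load capacitance (hypothetical)
    energy_pj = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
    print(lookup_2d(transition_ns, load_pf, energy_pj, 0.3, 0.055))
```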

The switching power is calculated as follows:

$$P_{switching} = V_{DD}^2 \sum_{Cell(i)} C_{load}(i) \cdot A(i) \cdot f(i) \qquad (5)$$

where Cload(i) is the capacitive load of net i. Without any physical information, the load capacitance Cload(i) is calculated from the wire load model of the net and the fanout of the driving pin. This approach usually achieves relative rather than absolute accuracy. Apart from the approaches mentioned above, the following factors are also important for accurate power estimation.
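A minimal Python sketch of the three summations (3), (4) and (5), assuming the per-cell leakage, per-pin internal energy and per-net load have already been obtained from the library (and the wire load model), and that A(i) and f(i) come from the toggle estimation step. The dictionary keys and all numbers are hypothetical.

```python
"""Sketch: total leakage, internal and switching power per equations (3), (4) and (5)."""
from typing import Dict, List


def leakage_power(cells: List[Dict[str, float]]) -> float:
    """Equation (3): sum of per-cell leakage power (W)."""
    return sum(c["leakage_w"] for c in cells)


def internal_power(pins: List[Dict[str, float]]) -> float:
    """Equation (4): sum over pins of E_i * A(i) * f(i) (J x 1 x Hz = W)."""
    return sum(p["energy_j"] * p["activity"] * p["freq_hz"] for p in pins)


def switching_power(vdd: float, nets: List[Dict[str, float]]) -> float:
    """Equation (5): VDD^2 * sum of Cload(i) * A(i) * f(i)."""
    return vdd ** 2 * sum(n["cload_f"] * n["activity"] * n["freq_hz"] for n in nets)


if __name__ == "__main__":
    # Hypothetical two-cell example; all numbers are illustrative only.
    cells = [{"leakage_w": 5e-9}, {"leakage_w": 8e-9}]
    pins = [{"energy_j": 2e-15, "activity": 0.2, "freq_hz": 100e6},
            {"energy_j": 3e-15, "activity": 0.1, "freq_hz": 100e6}]
    nets = [{"cload_f": 10e-15, "activity": 0.2, "freq_hz": 100e6},
            {"cload_f": 25e-15, "activity": 0.1, "freq_hz": 100e6}]
    total = leakage_power(cells) + internal_power(pins) + switching_power(1.2, nets)
    print(f"total = {total * 1e6:.4f} uW")
```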


1. Temperature dependency of power. Power consumption in CMOS depends on mobility, threshold voltage and doping concentrations. These factors are temperature dependent, hence power also varies with temperature.
2. Voltage dependency of power. The voltage dependency of power is well known (P = C*V*V*f), and it holds for CMOS technology as well: if the CMOS component is modeled as a capacitor, it is clear that power varies with the supply voltage.
3. Frequency of operation. Power increases with the frequency of operation. In fact, many designs nowadays have different modes of operation: a high-frequency mode when the device is operational and a low-frequency mode when the device is in standby. The impact of frequency on power estimation was discussed in the previous section.
4. Clock network power. Most designs nowadays have a significant share of flops or registers; by one estimate, around 40-50% of the logic in a design consists of flops. If all the flops are clocked throughout operation, the clock network consumes almost 50% of the total power, so it is sometimes helpful to analyze the power consumed by the clock network. This work analyzes the clock power contribution to total power.
5. Process corner. The process corner also affects currents and power consumption, especially leakage power. A typical VLSI process shows a leakage power variation of the order of 4-6x from the worst process corner to the best.


Based on the power sensitivity and tool study analysis in this section, we propose a power estimation flow for a typical design cycle, as shown in Figure 3.2 below. Note that the power analysis differs between the RTL design, the pre-layout netlist and the post-layout netlist.

[Flow diagram: design stages Architecture, RTL, Unplaced Netlist, Placed Netlist and Detailed Route Over, mapped to power estimation methods: power estimation (spreadsheet); forward SAIF* or frequency constraints; Toggle Frequency Calculator; logic simulation; PIF file generation; power estimation in Power Compiler (wire load, global SPEF, detailed SPEF); PrimePower; RC SPICE netlist with NanoSim. Methods are marked from Recommended to Least Preferred.]

* SAIF: Switching Activity Interchange Format (switching activity file based approach)


Figure 3.2 Power Estimation in Design Stages

3.3 Power analysis Tools


3.3.1 Power Compiler [67]

Formerly known as Design Power, Power Compiler is currently the most widely used Synopsys power tool. Typically used during synthesis, Power Compiler performs power optimization as well as power estimation. It has static algorithms for calculating switching activity at various circuit nodes and propagating it. It is known that Power Compiler cannot estimate switching activity well for sequential cells. It should also be noted that most ASIC vendors model cell power using Synopsys Liberty syntax, so it is important for single-cell power estimates to be close to Power Compiler numbers. The Synopsys reference manual on Power Compiler [18] gives the basic power calculation theory and describes the terms used in its tools. We used Power Compiler in two modes. In the first mode, Power Compiler was used as a complete power estimation solution: we generated input switching activity from our vectors and specified it to Power Compiler, which propagated the switching activity based on switching probability and then calculated power. In this mode, it used its own assignment method for sequential cells; we accepted this because our aim was to verify the default switching activity propagation algorithm of Power Compiler. In the second mode, Power Compiler was used only as a power calculation engine: we generated switching activity at all the nodes using the methodology defined in Chapter 2 and then used the power calculation engine. As mentioned earlier, the power calculation engine is quite accurate, so our aim was to use its power estimates to evaluate the switching activity accuracy of the other methods.

3.3.2 Power Mill (or Nano Sim) [4][68]

Power Mill (currently known as Nano Sim) is a Synopsys tool with a fast SPICE engine at its core. It was found to correlate well with SPICE for two of the single-cell circuits and one small design. Power Mill is a dynamic simulation-based tool, and hence it requires patterns for simulation. We used Power Mill to calculate average and peak power, mainly because of its runtime advantage over SPICE. Note that Power Mill can take a SPICE netlist as input, so switching between Power Mill and SPICE is transparent when needed.

3.3.3 Prime Power [66]

Prime Power is another offering in the Synopsys power portfolio. It is a dynamic, vector-based solution. The key difference from Power Mill is that Power Mill is a SPICE-based tool, whereas Prime Power is a logic simulation-based tool. In other words, Power Mill is tuned for accuracy and for analog-style designs, whereas Prime Power is tuned for digital, and specifically ASIC, designs with reasonably good accuracy. Prime Power has a PLI interface to leading industry simulators, e.g. VCS, ModelSim and Verilog. During logic verification with these simulators, instantiating a single call/command makes the PLI dump binary files, which Prime Power then uses for power estimation. Prime Power can also perform peak power analysis. We used Prime Power for both average and peak power analysis, with VCS as the simulator.

3.3.4 Other Tools

This project used VTRAN to convert vectors to SPICE stimulus. VTRAN is one of the offerings as part of Synopsys and is a generic translator of vectors from one format to another. It supports all major industry formats as well as the internal formats of many prominent ASIC/EDA vendors. VCS was used for logic simulation; there is no specific reason for using this simulator other than that, being a Synopsys offering, it works with Prime Power without major hurdles. A few TI internal programs were used to set up an automated flow:
1. genFuncTDL: an internal utility to generate random vectors with a specified clock rate.
2. SimOut: a test constraint validation environment.
3. SDFAligner: translates SDF from one simulator's format to another simulator-compatible format.
4. SigProbGen: converts vectors to input switching activity and probability.
5. DREPGEN: generates data compatible with TFC.
6. A translator from ASCII benchmark data to Verilog and SPICE netlists.

3.4 Validation Flow


The validation flow diagram is shown in Figure 3.3, and its data-management and color conventions are shown in Figure 3.4. Some of the key steps are described below.


Figure 3.3 Power Estimation Validation Flow (flow elements: ISCAS89 circuits, Verilog and SPICE netlist translators, GENFUNC TDL random vectors, DREPGEN, SIGPROBGEN, TFC, DC scripts for power estimation, VTRAN, SMOUT, test bench, VCS_PIF, PowerMill, PrimePower, comparison and report)

White: third-party tools
Green: automatically generated data or written translator
Grey: TI tools
Default: standard inputs/outputs
Blue: final output
Ellipse: data file(s)
Rhombus: process block(s)

Figure 3.4 Legends for Validation Flow


3.4.1 Netlist Setup
The standard industry benchmark circuits ISCAS89 are used for validation. Circuit complexity ranges from 14 gates to 22000 gates, and detailed statistics of the circuits are given in Table 3.2 [71]. To make the validation complete, two single-cell circuits are added for micro-level validation. The ISCAS89 benchmark circuits were mapped to 130nm technology for analysis. No optimization or synthesis was used while mapping the circuits to 130nm technology. Instead, a predetermined set of cells was used: 2-, 3- and 4-input AND/NAND gates; 2-, 3- and 4-input OR/NOR gates; buffers and inverters; 2- and 3-input XOR/XNOR gates; and flops.

3.4.2 Vector Generation
Random vectors were generated for all the ISCAS89 circuits. The number of vectors was chosen according to circuit complexity (number of gates) and ranges from approximately 4 to 38000. The same set of vectors is used for logic simulation and SPICE simulation, as well as for deriving the switching activity and static probabilities of the input pins.
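As a rough illustration of what the SigProbGen step computes, the Python sketch below derives a static probability and a toggle density for each input pin from a list of vectors. It assumes that one vector is applied per clock cycle at a fixed rate (`clock_hz`). The function name and the vector encoding are hypothetical and are not taken from the TI utility.

```python
def pin_statistics(vectors, clock_hz):
    """Return {pin_index: (static_probability, toggle_density_hz)}.

    vectors  : list of equal-length '0'/'1' strings, one per clock cycle (assumed)
    clock_hz : rate at which the vectors are applied (hypothetical parameter)
    """
    n_cycles = len(vectors)
    duration_s = n_cycles / clock_hz
    stats = {}
    for pin in range(len(vectors[0])):
        values = [int(v[pin]) for v in vectors]
        p_one = sum(values) / n_cycles                      # fraction of cycles at logic 1
        toggles = sum(a != b for a, b in zip(values, values[1:]))
        stats[pin] = (p_one, toggles / duration_s)          # transitions per second
    return stats


for pin, (p1, dens) in pin_statistics(["01", "11", "10", "00"], clock_hz=100e6).items():
    print(f"pin {pin}: P(1) = {p1:.2f}, toggle density = {dens / 1e6:.1f} MHz")
```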


3.4.3 Interconnect setup
All the circuits were available only as synthesized Verilog netlists, so parasitic information was not available. To keep the comparison consistent, no-load models were used in both Power Compiler and the SPICE simulation. The logic simulation was based on SDF generated by Synopsys tools.

3.5 Validation and Results


The complete data from the different tools are shown in Table 3.5. Table 3.2 describes the circuits used for benchmarking. Table 3.3 compares the run time of the dynamic method (PowerMill) with that of the modified toggle computation method for some of the larger design blocks. Table 3.4 shows clock network power against total power. All power data are dynamic power in uW. The power numbers mainly reflect cell internal power and the switching power due only to gate input capacitances, since no interconnect was assumed. All experiments were done at the nominal operating point, i.e. nominal process, 25 C and 1.2 V (nominal voltage). Clock network power is around 50% of the total dynamic power for many circuits, but not in all cases: in Table 3.4 it ranges from about 8% (s1488, s1494) to 83% (s13207). The static approach reduces run time by more than 1000 times. Prime Power reports optimistic (lower) power than PowerMill in many cases. This was not expected and is being investigated. TFC is within 30% of the power reported by PowerMill. The exceptions are cases where it reports more than 30% optimistic power or more than 50% pessimistic power. Power Compiler is more than 50% pessimistic in most cases.


Design Name s111

IN

OUT

Flops

Boolean (gates+inv) 0 8

s1196

14

14

18

388+141

s1238

14

14

18

428+80

s13207

31

121

669

2573+5378

s13207_1

62

152

638

2573+5378

s1423

17

74

490+167

s1488

19

550+103

s1494

19

558+89

s15850

14

87

597

3448+6324

s15850_1

77

150

534

3448+6324

s208_1

10

66+38

s27

8+2

s298

14

75+44

s344

11

15

101+59

s349

11

15

104+57


Design Name s35932

IN

OUT

Flops

Boolean (gates+inv) 12204+3861

35

320

1728

s382

21

99+59

s38417

28

106

1636

8709+13470

s38584

12

278

1452

11448+7805

s38584_1

38

304

1426

11448+7805

s386

118+41

s4

s400

21

106+58

s420_1

18

16

140+78

s444

21

119+62

s5

1+0

s510

19

179+32

s526

21

141+52

s526n

21

140+54

s5378

35

49

179

1004+1775

s641

35

24

19

107+272


Design Name s713

IN

OUT

Flops

Boolean (gates+inv) 19 139+254

35

23

s820

18

19

256+33

s832

18

19

262+25

s838_1

34

32

288+158

s9234

19

22

228

2027+3570

s9234_1

36

39

211

2027+3570

s953

16

23

29

311+84

Table 3.2 ISCAS89 circuit description

Design | TFC + Power Compiler runtime (min) | PowerMill runtime (CPU hr)
S13207 | 3 | 23
S13207_1 | | 24
S15850 | | 25
S15850_1 | | 26
S35932 | | 250
S38417 | 6 | 189
S38584 | | 205
S38584_1 | | 212

Table 3.3 Runtime comparison between vectorless and SPICE

Design Name | CLK Power (uW) | Total Power (uW) | %CLK/Total
s4 | 2.13 | 3.35 | 63.6
s27 | 6.39 | 10.91 | 58.61
s208_1 | 17.05 | 30.43 | 56.04
s298 | 29.84 | 54.12 | 55.14
s344 | 31.97 | 61.11 | 52.32
s349 | 31.97 | 61.14 | 52.29
s382 | 47.04 | 91.73 | 51.28
s386 | 12.79 | 32.28 | 39.62
s400 | 47.04 | 94.51 | 49.77
s420_1 | 34.1 | 53.75 | 63.46
s444 | 44.76 | 84.83 | 52.77
s510 | 12.79 | 29.43 | 43.46
s526n | 44.76 | 85.94 | 52.08
s526 | 44.76 | 85.89 | 52.11
s641 | 40.5 | 117.38 | 34.5
s713 | 40.5 | 123.07 | 32.91
s820 | 10.66 | 72.29 | 14.74
s832 | 10.66 | 72.5 | 14.7
s838_1 | 68.21 | 99.96 | 68.24
s953 | 61.81 | 102.37 | 60.38
s1494 | 12.79 | 158.7 | 8.06
s1488 | 12.79 | 158.24 | 8.08
s1423 | 157.73 | 356.1 | 44.29
s1238 | 38.37 | 150.51 | 25.49
s1196 | 38.37 | 151.17 | 25.38
s5378 | 381.55 | 751.75 | 50.75
s9234_1 | 449.75 | 891.59 | 50.44
s9234 | 485.99 | 632.35 | 76.85
s13207_1 | 1359.9 | 1908.3 | 71.26
s13207 | 1426 | 1718 | 83
s15850 | 1272.5 | 1971.3 | 64.55
s15850_1 | 1138.2 | 2630.3 | 43.27
s38417 | 3289.1 | 4659.3 | 70.59
s35932 | 3450.5 | 9654 | 35.74
s38584_1 | 2920.7 | 8339.6 | 35.02
s38584 | 2966.3 | 8057.2 | 36.82

Table 3.4 Clock Power vs. Total Power

Design Name | Power Compiler | Proposed Approach | Prime Power | Power Mill | %new power/power compiler | %power compiler/PowerMill | %new approach/PowerMill | %prime power/PowerMill
s111 | 5.5 | 2.23 | 0 | 2.87 | -59.42 | 91.62 | -22.24 | -100
s4 | 3.72 | 3.35 | 2.93 | 2.79 | -9.95 | 33.43 | 20.16 | 4.95
s5 | 2.49 | 1.34 | 0.47 | 1.72 | -46.12 | 44.66 | -22.05 | -72.61
s27 | 12.69 | 10.91 | 10.03 | 9.36 | -14.01 | 35.54 | 16.55 | 7.14
s208_1 | 44.91 | 30.43 | 22.4 | 29.03 | -32.25 | 54.7 | 4.81 | -22.84
s298 | 67.33 | 54.12 | 40.05 | 41.42 | -19.62 | 62.57 | 30.67 | -3.31
s344 | 85.24 | 61.11 | 56.55 | 65.7 | -28.31 | 29.74 | -6.99 | -13.93
s349 | 86.48 | 61.14 | 56.66 | 65.86 | -29.3 | 31.31 | -7.16 | -13.97
s382 | 83.57 | 91.73 | 52.75 | 53.15 | 9.76 | 57.25 | 72.6 | -0.75
s386 | 75.15 | 32.28 | 42.78 | 48.46 | -57.05 | 55.07 | -33.4 | -11.73
s400 | 83.96 | 94.51 | 52.77 | 53.3 | 12.58 | 57.51 | 77.32 | -1
s420_1 | 70.19 | 53.75 | 45.6 | 44.12 | -23.43 | 59.11 | 21.83 | 3.37
s444 | 83.79 | 84.83 | 52.9 | 53.64 | 1.24 | 56.22 | 58.15 | -1.38
s510 | 64.68 | 29.43 | 18.23 | 47.43 | -54.51 | 36.36 | -37.96 | -61.57
s526n | 85.2 | 85.94 | 53.54 | 53.89 | 0.87 | 58.1 | 59.48 | -0.65
s526 | 85.41 | 85.89 | 53.67 | 54.08 | 0.57 | 57.93 | 58.83 | -0.75
s641 | 159.77 | 117.38 | 72.37 | 93.34 | -26.53 | 71.17 | 25.76 | -22.46
s713 | 162.62 | 123.07 | 74.51 | 96.57 | -24.32 | 68.41 | 27.44 | -22.84
s820 | 119.02 | 72.29 | 47.96 | 73 | -39.27 | 63.04 | -0.98 | -34.3
s832 | 119.18 | 72.5 | 48.03 | 73.34 | -39.17 | 62.51 | -1.14 | -34.51
s838_1 | 126.27 | 99.96 | 93.41 | 75.78 | -20.84 | 66.63 | 31.91 | 23.27
s953 | 159.75 | 102.37 | 85.98 | 88.5 | -35.92 | 80.51 | 15.67 | -2.85
s1494 | 187.71 | 158.7 | 98.28 | 136.47 | -15.45 | 37.54 | 16.29 | -27.99
s1488 | 203.99 | 158.24 | 98.16 | 135.83 | -22.42 | 50.18 | 16.5 | -27.73
s1423 | 406.56 | 356.1 | 244.9 | 278.03 | -12.41 | 46.23 | 28.08 | -11.92
s1238 | 302.45 | 150.51 | 128.2 | 151.55 | -50.24 | 99.57 | -0.69 | -15.41
s1196 | 296.7 | 151.17 | 126.5 | 151.13 | -49.05 | 96.33 | 0.03 | -16.3
s5378 | 1041.2 | 751.75 | 584.3 | 688.62 | -27.8 | 51.2 | 9.17 | -15.15
s9234_1 | 1480.6 | 891.59 | 704.7 | 812.36 | -39.78 | 82.26 | 9.75 | -13.25
s9234 | 1300.4 | 632.35 | 508.2 | 472.82 | -51.37 | 175.03 | 33.74 | 7.48
s13207_1 | 2853 | 1908.3 | 1533 | 1677.46 | -33.11 | 70.08 | 13.76 | -8.61
s13207 | 2572 | 1718 | 1436 | 1418.89 | -33.2 | 81.27 | 21.08 | 1.21
s15850 | 2640.3 | 1971.3 | 1400 | 1361.52 | -25.34 | 93.92 | 44.79 | 2.83
s15850_1 | 3272.6 | 2630.3 | 1539 | 1945.25 | -19.63 | 68.24 | 35.22 | -20.88
s38417 | 7654.6 | 4659.3 | 4352 | 4688.74 | -39.13 | 63.26 | -0.63 | -7.18
s35932 | 17606 | 9654 | 6789 | 8513.75 | -45.17 | 106.79 | 13.39 | -20.26
s38584_1 | 12031.7 | 8339.6 | 5630 | 6738.36 | -30.69 | 78.56 | 23.76 | -16.45
s38584 | 10951.4 | 8057.2 | 4261 | 6235.13 | -26.43 | 75.64 | 29.22 | -31.66

Table 3.5 Power Estimation across various tools (power in uW)
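The percentage columns of Table 3.5 can be read as signed relative differences, (value - reference) / reference x 100. The last column of Table 3.4 is a simple share. The sketch below applies these formulas to the s13207 rows and reproduces the tabulated values to two decimals. For a few small circuits, the tabulated percentages differ slightly from values recomputed from the rounded powers, presumably because they were computed from unrounded data. The function names are illustrative only.

```python
def percent_diff(value, reference):
    """Signed deviation of `value` from `reference`, in percent."""
    return (value - reference) / reference * 100.0


def percent_share(part, total):
    """`part` as a percentage of `total` (e.g. clock power / total power)."""
    return part / total * 100.0


# s13207 rows of Tables 3.5 and 3.4 (power in uW)
pc, proposed, prime, mill = 2572.0, 1718.0, 1436.0, 1418.89
print(f"%new power/power compiler : {percent_diff(proposed, pc):7.2f}")    # -33.20
print(f"%power compiler/PowerMill : {percent_diff(pc, mill):7.2f}")        #  81.27
print(f"%new approach/PowerMill   : {percent_diff(proposed, mill):7.2f}")  #  21.08
print(f"%prime power/PowerMill    : {percent_diff(prime, mill):7.2f}")     #   1.21
print(f"%CLK/Total                : {percent_share(1426.0, 1718.0):7.2f}")  #  83.00
```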

3.6 Power estimation applications


Once power estimation has been done, the data can be post-processed to investigate various circuit properties. Note that some of these are applications of the average toggle calculation method described above.

3.6.1 Average power/ground bus currents
Consider the problem of computing the average current in the power or ground bus branches. This can be solved using the toggle densities and the average power consumption of each library cell.


The average power of each cell can be approximated from its toggle density, and the power or ground network can be approximated as a distributed or lumped RC network. By simulating this power network in SPICE, one can estimate the average power/ground bus currents [31].

3.6.2 Average power dissipation
As a direct consequence of the power estimation described above, the analysis gives the overall average power dissipation, summed over all circuit nodes.

3.6.3 Electromigration failures
Electromigration [93][94] is a major reliability problem caused by the transport of atoms in a metal line due to electron flow. Under persistent current stress, it can deform the metal, leading to either short or open circuits. Electromigration failure depends on the average and root-mean-square (RMS) current densities in the metal leads. The average current in each metal lead can be estimated by the method described in this chapter, so potential electromigration problems can be addressed in either the power network or the signal leads.

3.6.4 Power Routing
Inaccurate power estimation is often the root cause of an over-designed power network. With accurate power numbers, a dense power grid can be used on one block and a lighter grid on another, which also reduces the overall IR drop problem.
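The quantities used in Sections 3.6.1 and 3.6.3 reduce to simple arithmetic once average cell powers and current waveforms are available. The sketch below uses hypothetical helper names and assumes an ideal, constant Vdd. It converts average cell power into average supply current, computes the average and RMS of a uniformly sampled current waveform, and converts a current into a current density for a rectangular metal lead.

```python
import math


def average_current(cell_powers_w, vdd_v):
    """Average supply current drawn by a set of cells: I_avg = sum(P_avg) / Vdd."""
    return sum(cell_powers_w) / vdd_v


def average_and_rms(samples_a):
    """Average and RMS of a uniformly sampled current waveform (A)."""
    n = len(samples_a)
    avg = sum(samples_a) / n
    rms = math.sqrt(sum(i * i for i in samples_a) / n)
    return avg, rms


def current_density_a_per_cm2(current_a, width_um, thickness_um):
    """Current density for a rectangular metal cross-section (1 um = 1e-4 cm)."""
    return current_a / ((width_um * 1e-4) * (thickness_um * 1e-4))


avg, rms = average_and_rms([0.0, 2e-3, 0.0, 0.0])        # hypothetical pulse train
print(f"I_avg = {avg * 1e3:.2f} mA, I_rms = {rms * 1e3:.2f} mA")
print(f"J_rms = {current_density_a_per_cm2(rms, 1.0, 0.5):.3g} A/cm^2")
```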


3.6.5 Gate Oxide Integrity Analysis
The reduction in gate oxide thickness in submicron technologies has increased the electric field across the gate oxide. An excessive electric field (> 5 MV/cm) can damage the gate oxide and also degrade its time-dependent dielectric breakdown (TDDB) strength. Such excessive fields are caused by undershoot and overshoot at the gate terminal, and a high duty cycle of overshoot/undershoot will result in permanent failure of the transistors. The Failure in Time (FIT) rate (the number of failures per 10^9 device-hours) is used to express the probability of device failure over 10 years of operation. For this analysis, the duty cycle at the signal input pins is estimated from the toggle density.
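Below is a minimal sketch of the two checks mentioned above. It assumes the oxide field is simply E = V / t_ox. It also assumes that every transition at a pin produces one overshoot of fixed duration, so the overshoot duty cycle is the toggle density times that duration. The oxide thickness, voltage levels and overshoot duration in the example are hypothetical, not values from this work.

```python
def oxide_field_mv_per_cm(gate_voltage_v, tox_nm):
    """Electric field across the gate oxide, E = V / t_ox, in MV/cm (1 nm = 1e-7 cm)."""
    return gate_voltage_v / (tox_nm * 1e-7) / 1e6


def overshoot_duty_cycle(toggle_density_hz, overshoot_duration_s):
    """Fraction of time in overshoot, assuming one fixed-length event per toggle."""
    return min(1.0, toggle_density_hz * overshoot_duration_s)


LIMIT_MV_PER_CM = 5.0
for volts in (1.2, 1.65):                        # hypothetical nominal and overshoot levels
    field = oxide_field_mv_per_cm(volts, tox_nm=3.0)   # hypothetical oxide thickness
    print(f"{volts:.2f} V -> {field:.2f} MV/cm, exceeds limit: {field > LIMIT_MV_PER_CM}")
print(f"duty cycle: {overshoot_duty_cycle(50e6, 100e-12):.4f}")
```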

3.7 Summary
Based on our validation flow and analysis of the results, a good power estimate can be obtained with minimal run time, as shown in Table 3.3. However, the toggle frequency calculation method has certain limitations: it is based on probabilistic algorithms, has no timing information and does not perform any logic simulation. Some power designers may prefer better accuracy at the cost of run time. The proposed power estimation flow caters to the needs of such power users as well as normal users.


4 Power Supply Noise Analysis


4.1 Overview
Figure 4.1 gives a representative voltage waveform at an internal node of a digital design during operation. The fluctuations arise from switching CMOS logic and from the inductances in the power supply, package and interconnect.

Figure 4.1 Voltage over time representation at an internal design node (voltage vs. time, annotated with the maximum voltage, the time-average IR drop, and the minimum voltage, which increases propagation delay)

The voltage dips are caused by sudden changes in current during logic switching, because the inductance adds an L(di/dt) noise component. In addition, CMOS currents during logic switching are higher than the average currents used for average IR drop analysis. This causes an additional i(t)*R drop, where R is the resistance of the power grid. The total drop seen at the current sink is:
deltaV = L(di/dt) + i(t)*R
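To make the two terms of this expression concrete, the following Python sketch evaluates deltaV = L(di/dt) + i(t)*R on a sampled current waveform. It approximates di/dt by a forward difference. The inductance, resistance and triangular current pulse are hypothetical values chosen for illustration.

```python
def supply_noise(times_s, currents_a, inductance_h, resistance_ohm):
    """deltaV = L*di/dt + i(t)*R on each interval of a sampled current waveform.

    di/dt is a forward difference; i(t) is taken at the start of each interval.
    """
    drops = []
    for k in range(len(times_s) - 1):
        di_dt = (currents_a[k + 1] - currents_a[k]) / (times_s[k + 1] - times_s[k])
        drops.append(inductance_h * di_dt + currents_a[k] * resistance_ohm)
    return drops


# Hypothetical triangular pulse: 0 -> 10 mA in 50 ps, back to 0 in the next 50 ps
print(supply_noise([0.0, 50e-12, 100e-12], [0.0, 10e-3, 0.0],
                   inductance_h=1e-9, resistance_ohm=0.5))
```

For this pulse the result is about [0.2, -0.195] V. The inductive term dominates, and on the falling edge it changes sign, so the local supply briefly rises above nominal.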


The most popular technique to control this drop is to insert decoupling capacitors in the design. Figure 4.2 shows an electrical representation of the inductance and of the dynamic switching of a cell that causes power supply noise. It also shows the decoupling capacitors that help meet this instantaneous current demand.

Figure 4.2 Schematic circuit for instantaneous voltage drop analysis (elements: Vdd pin with Lpd, Rpd, Cpd; Vss pin with Lps, Rps, Cps; Vdd net with Rnd, Cnd; Vss net with Rns, Cns; switching cell drawing Idd and Iss; decoupling capacitor Cdecap)

This work focuses on computing the instantaneous IR drop (deltaV), or the actual voltage (Vdd - deltaV), at the cells' power/ground ports. Vdd is an ideal voltage source here and is constant over time. As before, the approach targets cell-based designs. The next section explains the cell characterization and modeling needed for block-level analysis. Using this characterization, we build a power grid network that can be simulated; this is discussed in Section 4.3. Section 4.4 explains the prototype flow we developed, and the chapter ends with validation results and conclusions.

4.2 Cell Characterization


Definition: Cell characterization is the process through which data is prepared for every cell for use in the design. The process involves SPICE characterization as well as post-processing of the data. The characterization must be in complete alignment with the way the data is used.

4.2.1 Current Characterization Methodology
For instantaneous power grid analysis, we analyzed cell peak current waveforms. Figure 4.3 shows the transient waveforms of an inverter cell simulated at 250MHz (VDD is the power pin and VSS is the ground pin). It shows the voltage waveforms of the primary input and output of the inverter (VA, VY). It also shows the current waveforms at the VDD and VSS ports (IRVDD, IRVSS) and the voltage waveforms at the VDD and VSS ports (VVDD_INV1, VVSS_INV1). Note that the current waveforms at VDD and VSS are similar and differ only in transition direction. The current waveform at VDD when the output is charging is the same as the current waveform at VSS when the output is discharging, and vice versa. This holds for this inverter but can vary if a cell is not properly balanced. In any case, the amount of charge supplied/discharged is constant, since it is governed by the load connected at the output.


Figure 4.3 Inverter waveforms measured at different nodes (annotations: when the output rises, the rise/fall currents show notable symmetry, which allows only one current to be characterized for the power/ground network analysis; the alignment of the waveforms is preserved during current waveform generation, and the same holds when the output falls)


In this work, we have maintained the temporal relationship between the power and ground current waveforms while decoupling the simulations, i.e. they are simulated separately and the IR drop results are merged. From our simulations we arrived at the following conclusions. The shape of the current waveform remains the same across different frequencies if the same patterns are used. Note that the overall simulation time decreases as frequency increases for the same set of patterns. This is not a surprise, since the load charged and discharged during each transition is the same for the same slew and the same set of patterns. For a CMOS gate, the shape of the current waveform remains the same up to very high frequencies (period ~= 3 times the 0-100% slew) (Appendix C). The slew, or transition time (the terms are used interchangeably), plays a big role in determining the peak power of cells: as the slew decreases, the current spike becomes narrower and its peak increases. Figure 4.4 and Figure 4.5 show the peak power variation for different input transition times. Note the variation of ~2x for the inverter and ~1.5x for the 2-input NAND gate.
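The slew observation follows from charge conservation. The charge per transition is fixed by the load (Q = C*V), so if the current spike is approximated by a triangle, halving its base width doubles its peak. The sketch below assumes an ideal triangular pulse and ignores short-circuit current. The load and supply values are hypothetical.

```python
def triangular_peak_current(charge_c, base_width_s):
    """Peak of a triangular pulse carrying `charge_c`: Q = 0.5 * base * peak."""
    return 2.0 * charge_c / base_width_s


C_LOAD_F, VDD_V = 10e-15, 1.2        # hypothetical load capacitance and supply
q = C_LOAD_F * VDD_V                 # charge per output transition, Q = C * V
for width_ps in (200.0, 100.0):
    peak = triangular_peak_current(q, width_ps * 1e-12)
    print(f"width {width_ps:.0f} ps -> peak {peak * 1e3:.3f} mA")
```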


Figure 4.4 Transition time vs. peak power for inverter

Figure 4.5 Transition time vs. peak power for NAND gate

Peak power varies only slightly as the output load changes. This is expected: the load capacitance together with the MOS on-resistance produces an exponential voltage ramp, and the peak depends largely on the MOS on-resistance and the initial voltage. Figure 4.6 and Figure 4.7 plot this variation for the AND and OR gates. Note that the variation is only ~1-3% across a wide range of load.
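The following first-order sketch shows why the peak barely depends on load. It assumes an ideal step input and a constant on-resistance. The charging current is then i(t) = (Vdd/R_on) exp(-t/(R_on C)). Its peak, Vdd/R_on, does not depend on C, while the time constant and the delivered charge C*Vdd grow with the load. With a finite input slew, as in real cells, some load dependence remains, which is consistent with the small variation reported above. The values below are hypothetical.

```python
import math


def rc_charging_current(t_s, vdd_v, r_on_ohm, c_load_f):
    """Current charging C through R after an ideal step: (Vdd/R) * exp(-t/(R*C))."""
    return (vdd_v / r_on_ohm) * math.exp(-t_s / (r_on_ohm * c_load_f))


def delivered_charge(vdd_v, c_load_f):
    """Total charge delivered once the load is fully charged: Q = C * Vdd."""
    return c_load_f * vdd_v


VDD_V, R_ON_OHM = 1.2, 2e3           # hypothetical supply and on-resistance
for c in (10e-15, 40e-15):           # hypothetical light and heavy loads
    peak = rc_charging_current(0.0, VDD_V, R_ON_OHM, c)
    print(f"C = {c * 1e15:.0f} fF: peak {peak * 1e3:.2f} mA, "
          f"tau {R_ON_OHM * c * 1e12:.0f} ps, Q {delivered_charge(VDD_V, c) * 1e15:.1f} fC")
```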


Figure 4.6 Load vs. peak power for AND gate

Figure 4.7 Load vs. Peak power for OR gate

For cell characterization, pattern dependency is not critical. This is expected, since most cells have only 1-2 levels of logic, where each pattern activates/deactivates most of the transistors. However, as cells become larger, some logic may not be activated during switching. In that case it is important to choose useful patterns for cell current characterization.

For cell characterization, the transition direction matters for a given power supply. This means that the output rise and fall transitions must both be captured during characterization and used appropriately (Figure 4.3). In our case, we capture the rise and fall transitions together and use them together in the analysis. This makes the proposed approach independent of transition direction.

Figure 4.8 State Dependency on cell switching

We also established a few corollaries that will be used later in the discussion.
1. Slew impacts the short-circuit current of the device. For a multi-stage block, slew impacts the first stage the most, and the overall current waveform is largely unaffected by this change. The impact grows from low to high as the number of stages decreases.
2. Glitches, or hazardous transitions, can contribute to the peak current demand of the circuit. Modeling glitches in non-SPICE analysis is not trivial, and it is desirable to reduce glitches through robust design practices. In this work, it is assumed that there are no glitches in the design.
3. The temporal correlation between different inputs strongly influences the characterization data because of simultaneous switching. In our analysis we used a single combination, zero skew between multiple inputs, which is also the worst case (Figure 4.8).

4.2.2 Current Characterization Flow
Current source generation involves determining a time-variant current waveform for each cell. This is the current waveform seen at the VDD pin of the cell when the cell output is rising or falling. The flow is shown in Figure 4.9, and a sample SPICE deck is shown in Appendix D. The Perl program that takes its input from the SPICE simulation has the following options. In our case, we used the last option with a 75ps sampling interval.
1. full: the whole current data available in the punch file is output in two-column format. The first column gives the simulation time and the second gives the current value at each simulation time instance.
2. fixed: the total simulation time is divided into 8192 points. The current value at each of these 8192 time values is obtained directly if available, or by interpolation.
3. Interval filtered: an interval in picoseconds is specified, and the program derives the time values at which data is required accordingly. Again, the current at each of these time values is obtained directly if available, or by interpolation.
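The three options of the Perl post-processing program can be sketched in Python as follows. This is not the original program. The function names are hypothetical, the input is assumed to be two lists (time in seconds, current) already parsed from the punch file, and interpolation is assumed to be linear.

```python
import bisect


def interpolate(times, values, t):
    """Value at time t: the stored sample if present, else linear interpolation."""
    k = bisect.bisect_left(times, t)
    if k < len(times) and times[k] == t:
        return values[k]                              # data available directly
    if k == 0 or k == len(times):
        raise ValueError("t lies outside the simulated time range")
    t0, t1 = times[k - 1], times[k]
    return values[k - 1] + (values[k] - values[k - 1]) * (t - t0) / (t1 - t0)


def resample(times, values, mode="interval", interval_s=75e-12, points=8192):
    """Return (time, current) pairs following one of the three options."""
    if mode == "full":                                # option 1: all punched data
        return list(zip(times, values))
    t_start, t_end = times[0], times[-1]
    if mode == "fixed":                               # option 2: fixed point count
        step = (t_end - t_start) / (points - 1)
        grid = [t_start + n * step for n in range(points)]
    elif mode == "interval":                          # option 3: fixed interval
        count = int((t_end - t_start) / interval_s) + 1
        grid = [t_start + n * interval_s for n in range(count)]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # clamp against floating-point overshoot of the last grid point
    return [(min(t, t_end), interpolate(times, values, min(t, t_end))) for t in grid]


print(resample([0.0, 100e-12, 200e-12], [0.0, 1.0, 0.0]))   # 75 ps interval
```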


Figure 4.9 Cell Characterization Flow (cell SPICE deck -> SPICE simulation at 10 MHz -> Perl processing to sample VDD currents)

Using the above methodology, we characterized all the cells instantiated in the ISCAS89 circuits.

4.3 Power Grid network modeling


This section describes how the power grid network is built using the cell characterization data. The power grid presents resistance, capacitance and inductance to the switching logic. Figure 4.10 shows the schematic of a typical power grid [45]. The power and ground supply pins are modeled as ideal voltage sources. Methodologies vary widely, however, in how current sources are modeled and how capacitance is estimated [50 51 52 53]. This work also focuses on current source modeling, which is described in the next subsection.


Figure 4.10 Power Grid Modeling (each arm of the mesh represents a resistance)

Once the power grid has been determined along with its capacitance and current source distribution, it can be represented as a matrix data structure. It can then be solved to compute voltages at the desired nodes, specifically the nodes where cells are connected:

$$\mathbf{Y}\,\mathbf{V} = \mathbf{I}$$

where V is the vector of node voltages, Y is the nodal admittance matrix built from the PG segments, and I is the vector of characterized cell currents. Equivalently, in the time and frequency domains,

$$v(t) = z(t) \ast i(t), \qquad V(\omega) = Z(\omega)\, I(\omega)$$

where $Z(\omega)$ is the impedance of the power network, e.g. $Z(\omega) = R + j\omega L$ for a series R-L segment.
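For a purely resistive (DC) network, the nodal equation above can be assembled and solved directly. The Python sketch below does this for a hypothetical one-dimensional resistor ladder fed from an ideal Vdd at one end, using Gaussian elimination. Transient analysis with capacitance and inductance, as performed with SPICE in this work, would additionally require time stepping.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                         # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ladder_voltages(vdd, r_segment, sink_currents):
    """Node voltages of a resistor ladder fed by an ideal Vdd at one end.

    Node k connects to node k-1 (node 0 connects to Vdd) through r_segment and
    sinks sink_currents[k]. Builds Y V = I and solves it.
    """
    n = len(sink_currents)
    g = 1.0 / r_segment
    y = [[0.0] * n for _ in range(n)]
    i = [-s for s in sink_currents]        # currents drawn out of each node
    for k in range(n):
        y[k][k] += g                       # branch from node k towards the supply
        if k == 0:
            i[0] += g * vdd                # ideal source folded into the RHS
        else:
            y[k - 1][k - 1] += g
            y[k][k - 1] -= g
            y[k - 1][k] -= g
    return solve_linear(y, i)


print(ladder_voltages(1.2, 1.0, [0.1, 0.1]))   # approximately [1.0, 0.9]
```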


In our work, we computed the resistances and capacitances from technology data for the 130nm node. A sample program was written to realize the mesh structure shown in Figure 4.10 for the VDD network, and VSS was taken as an ideal ground. This is not an issue, since all the VSS network elements can be lumped into the VDD network. After determining the power grid current waveforms, we solved the network through SPICE simulation.

4.3.1 Power Grid Current Waveform Modeling
Power grid current waveform modeling involves the following steps:
1. Compute the toggle frequency of each instance in the design, as proposed in Chapter 2.
2. Using the characterized current data of the cell, transform the current data to the toggle frequency computed above.
3. Compute the input arrival time of each instance in the design using static timing analysis, and compute the shift required in the current waveform with reference to the clock edge. For simplicity, we have assumed zero skew for the clock network.
4. Hook up the current sources and solve the PG network.
5. Determine the PG model simulation time.
These steps are explained further below.

1 Read the characterized data.

The characterized data was transformed from the time domain to the frequency domain. Sampling is done at a fixed rate much higher than common design frequencies, 1000/75 ~ 13.33 GHz (a 75 ps interval), and the [t, i(t)] pairs are stored. The sampled current is

$$i_s(t) = \sum_{n=0}^{N-1} i(nT_s)\, d(t - nT_s)$$

where T_s = 75 ps is the sampling interval (sampling frequency 1/T_s ~ 13.33 GHz), i(t) is the current at time t, and d(t - nT_s) equals 1 when t = nT_s and 0 otherwise. For computational efficiency, N may be chosen as a power of two, N = 2^m for an integer m. The discrete Fourier transform of the samples is then computed:

$$I[k] = \sum_{n=0}^{N-1} i[n]\, e^{-j 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1$$

2 Model the current waveform of each Boolean gate at the computed toggle frequency.
A compression factor M is defined to reach the target frequency of the cell under consideration:

M = target frequency / cell characterization frequency (10MHz in this work)

The frequency domain transformation preserves the base of the current transients while the frequency is scaled. This would not be possible if the frequency were scaled in the time domain, hence the need for the frequency domain transformation. Appendix E shows the waveform generated after transformation from a 1 MHz waveform. As can be seen, the 1GHz waveform is not as expected. This is not an issue, since no cells other than clock cells are expected to switch at a 1 GHz average toggle frequency. Besides, it can be handled by characterizing clock cells at a higher frequency. The current data is compressed by the compression factor. When the data was transformed to the frequency domain, the spectrum contained a large share of low-frequency components, reflecting the approximately triangular pulses of the SPICE waveform. Most of the medium- to high-frequency components were zero, reflecting the zero or low-leakage portion of the power waveform.

3 Attach the current waveform at the PG node where the cell's power or ground pin is connected.

4 Compute the total simulation time.
If all instances in the design are driven with their respective waveforms, the matrix solver gives the peak voltage drop from 0 to the least common multiple (LCM) of the periods of all gates. Computing the LCM is computationally intensive for most designs. Even when it is computed, the resulting simulation time is prohibitively long and the memory requirement becomes high. In practice we use a smaller number, which keeps the simulation time low and gives more realistic data. We computed the simulation time as:

Tstop = f(minimum toggle frequency, max delay) = time period of the minimum-frequency cell + maximum delay of all cell outputs = 2000 ns (for a minimum frequency of 1 MHz and a worst delay of 1000 ns)

5 Establishing temporal relationship


Timing analysis is performed, and the current waveforms are shifted along the time axis according to the input arrival times. The purpose of timing analysis is to establish the temporal correlation between the various nodes of the design. Even if two or more nodes have the same toggle frequency, this ensures that instances in the design do not all switch simultaneously unless they should. In this work, we have chosen to work with toggle frequency and delay instead of timing windows [28][45], for the following reasons. Not all circuit nodes switch in every clock cycle, and the average activity computation establishes the relative amount of switching among the various nodes. This is possible because activity estimation techniques consider circuit functionality. The average switching activity of most nodes is believed to be about 20% of the controlling clock frequency. In certain solutions, the average switching activity of non-clock signals is assumed to be only 10%. The timing window method, by contrast, uses classical path sensitization to identify the interval of switching. STA inherently assumes that all activity on a path finishes within one clock period (unless a multi-cycle path is specified explicitly). As a result, the timing intervals of all nodes lie within a clock period, which makes the whole pseudo-dynamic simulation approach pessimistic (see results).

During timing analysis, we collected two sets of data: the sensitization edge of each node, i.e. whether the node is rising or falling at that time, and the delay of the node from a reference node.

Definition: Reference nodes are nodes that can be considered zero-delay nodes. All flip-flop outputs are considered reference nodes in our analysis. When the input clock to a flip-flop has some propagation delay associated with it, the reference node will also have that delay associated with it.

Any frequency higher than 1 MHz will have at least some repetition in its current signature. For example, a node switching at 50 MHz (20ns) will have 50 repetitions of its current signature in a 1000 ns simulation. Changing the minimum frequency changes the simulation time considerably. For example, if the minimum frequency is raised to 50 MHz, current sources below 50 MHz do not contribute (or contribute only an average current) to the dynamic voltage drop analysis. In that case the maximum simulation time can drop to only 20 ns. In all our analysis we have assumed 1 MHz as the minimum frequency. The number of points in the piecewise linear current waveform is set by the sampling resolution chosen in the first step, when the characterized data is read. Increasing or decreasing this sampling frequency trades accuracy against run time. In our analysis we have assumed a 75 ps sampling interval. The clock network toggles all the time, and many designs aim for small insertion delays and near-zero skew. This makes the clock network one of the largest contributors to both total current and peak current.
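The simulation-time arithmetic above can be checked with a few lines of Python. The sketch uses hypothetical helper names, and `math.lcm` requires Python 3.9 or later. It computes Tstop from the minimum toggle frequency and worst delay, counts signature repetitions, and shows how quickly the exact LCM of two nearly equal periods grows.

```python
import math


def stop_time_ns(min_toggle_freq_hz, max_delay_ns):
    """Tstop = period of the slowest-toggling cell + maximum delay of all cell outputs."""
    return 1e9 / min_toggle_freq_hz + max_delay_ns


def repetitions(toggle_freq_hz, window_ns):
    """Number of times a node's current signature repeats in a simulation window."""
    return window_ns * toggle_freq_hz / 1e9


def lcm_window_ns(periods_ps):
    """Exact common repeat time of all signatures: LCM of integer periods in ps."""
    return math.lcm(*periods_ps) / 1000.0


print(stop_time_ns(1e6, 1000.0))       # 2000.0 ns, as in the text
print(repetitions(50e6, 1000.0))       # 50.0 repetitions of a 20 ns signature
print(lcm_window_ns([20000, 30001]))   # 600020.0 ns for 20 ns and 30.001 ns periods
```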

4.4 Complete Flow


The complete flow, combining cell characterization and PG network modeling, is shown in Figure 4.11. We take the Verilog netlist as input and calculate the average toggle frequency of each circuit node using a simulation-less approach. The frequency constraints are user conditions that drive the frequency calculation of any node. Alternatively, they can be generated from logic simulation or functional patterns. The SDC file contains the timing constraints of the design and is used in both the toggle activity calculation and the timing analysis. The timing information consists of the max delay of the paths converging at each node and the sensitization edge along that path. Current signatures for each block (library macros as well as hierarchical blocks) are generated from the current models, timing information and activity estimates. The processing steps (toggle calculation, timing measurement, current signature generation and block modeling) are explained in detail. Once the current signatures are hooked to the parasitic PG network, a transient simulation is performed. It measures the voltage drop at each macro node and generates the dynamic transient current waveforms at the power-ground pins. The voltage drop data is then fed to the timing analysis engine to analyze its impact on timing.

Figure 4.11 Peak IR drop Computation Flow (inputs: netlist, frequency constraints and SDC; steps: Toggle Frequency Calculator, timing analysis, current characterization, PWL generator, RLC netlist with current sources, SPICE simulation; output: peak dynamic power/supply noise)


The next sections explain timing information generation, the Power Grid Generator and the SPICE simulation details.

4.4.1 Timing Information Generation
Timing information was generated using PrimeTime, which requires the Verilog netlist, SDC and SPEF (Standard Parasitic Exchange Format) files as input. We also wrote a Tcl script (PrimeTime supports the Tcl command language) to obtain the arrival time of every node in the circuit. The PrimeTime flow is shown in Figure 4.12. The sample SDC [24][25] and SPEF files used are shown in Appendices A and B.

Figure 4.12 Prime Time flow for arrival time computation (inputs: SDC file, Verilog netlist and SPEF; PrimeTime arrival time computation produces a timing report)

4.4.2 Power Grid Generator
The Power Grid Generator flow is expanded in Figure 4.13.


Figure 4.13 Power Grid Generation Flow (cell flow: cell characterization at a fixed frequency, 10MHz in our work; analysis flow: Toggle Frequency Calculator output and timing report (delay information) processed by Perl code, then a MATLAB program that computes the compression factor M and performs M-based compression in the frequency domain, then Perl code for PG mesh generation and current PWL hookup, producing the PG network)

The Perl program combines, for all nodes, the toggle frequency values obtained using TFC with the delay values of the corresponding nodes. The output file containing this information for all cells is given to the MATLAB program. The MATLAB program takes two inputs: the current data at the prototype (characterization) frequency for all the gates, and a file containing the delay and average activity information for all the cells of the circuit. Depending on the activity, the prototype current data is compressed, and the result is shifted by an amount equal to the delay at that node. The same procedure is repeated for all cells, and the resulting current data for all cells is stored in a file.
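The internals of the MATLAB program are not given here, so the following Python sketch shows one way an M-based frequency-domain compression and a delay shift could be realized. It is an assumption, not the thesis implementation. Consider a window of N samples containing one characterized pulse. Keeping every M-th DFT bin (scaled by M) and zeroing the rest yields, after the inverse DFT, the same pulse repeated M times with a period of N/M samples. The pulse shape is therefore preserved while its repetition rate is multiplied by M. The delay from timing analysis is then applied as a circular shift. A plain O(N^2) DFT keeps the sketch self-contained, and only integer M is handled.

```python
import cmath


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part (the input waveform is real)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n).real
            for m in range(n)]


def compression_factor(target_hz, characterized_hz=10e6):
    """M = target frequency / characterization frequency (integer M assumed here)."""
    m = round(target_hz / characterized_hz)
    if m < 1 or abs(target_hz / characterized_hz - m) > 1e-9:
        raise ValueError("this sketch only handles integer compression factors")
    return m


def compress(samples, m):
    """Repeat the characterized pulse m times per window by keeping every m-th bin."""
    if len(samples) % m:
        raise ValueError("window length must be a multiple of M")
    spectrum = dft(samples)
    kept = [m * s if k % m == 0 else 0j for k, s in enumerate(spectrum)]
    return idft(kept)


def shift(samples, delay_samples):
    """Circularly delay a periodic waveform by the STA arrival time (in samples)."""
    d = delay_samples % len(samples)
    return samples[-d:] + samples[:-d] if d else list(samples)


pulse = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]    # hypothetical 8-sample window
m = compression_factor(20e6)                       # 20 MHz target -> M = 2
shaped = shift(compress(pulse, m), 1)              # then delay by one sample
print([0.0 if abs(v) < 1e-9 else round(v, 6) for v in shaped])
# [1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0]
```

In this scheme, if the pulse is wider than N/M samples the repeated pulses overlap. This is one way a waveform at a very high target frequency could depart from expectation.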


Based on the generated current signatures, a new PG network is created, and all macro instances are replaced with their corresponding current signatures. In our analysis, we used a uniform power grid with an ideal GND. We did not do any actual power routing; instead, the current sources were attached at random locations. The result is compared with SPICE simulation of the actual circuits of all macros in the same PG network at the same locations.

4.4.3 SPICE Simulation
Each cell is replaced by a current source driven by its corresponding PWL data, and the package R, L and C are attached to the top-level power pins. SPICE simulation is then performed, and the voltage at each node of the power mesh is recorded. The IR drop for each cell is calculated using CODAC (Characterization & Optimization of Digital & Analog Circuits), a TI-internal program, which subtracts the minimum voltage obtained at each node from the power supply voltage to give the Peak Dynamic IR Drop at that node. This is done for all nodes of the circuit. The same CODAC program can be used to calculate the Average Dynamic IR Drop at each node.
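Because CODAC is TI-internal, the post-processing step can only be sketched here. The hypothetical Python function below assumes the node voltages are available as a (nodes × time) array from the SPICE output. It computes the peak dynamic IR drop as VDD minus each node's minimum voltage and the average dynamic IR drop as VDD minus the node's time-averaged voltage. Trapezoidal weighting is used because SPICE time steps are usually non-uniform.

```python
import numpy as np


def dynamic_ir_drop(vdd, times, node_voltages):
    """Hypothetical sketch of per-node dynamic IR drop post-processing.

    vdd           : nominal supply voltage (V)
    times         : 1-D array of simulation time points (s), increasing
    node_voltages : 2-D array, one row of voltage samples (V) per mesh node
    Returns (peak_drop, avg_drop), one value per node.
    """
    t = np.asarray(times, dtype=float)
    v = np.atleast_2d(np.asarray(node_voltages, dtype=float))
    # Peak dynamic IR drop: supply minus the minimum voltage seen at the node.
    peak_drop = vdd - v.min(axis=1)
    # Time-weighted average voltage via the trapezoidal rule (uneven steps).
    area = np.sum(0.5 * (v[:, 1:] + v[:, :-1]) * np.diff(t), axis=1)
    avg_drop = vdd - area / (t[-1] - t[0])
    return peak_drop, avg_drop


# Hypothetical waveforms for two mesh nodes.
peak, avg = dynamic_ir_drop(1.2, [0, 1e-9, 2e-9, 3e-9],
                            [[1.2, 1.1, 1.15, 1.2],
                             [1.2, 1.18, 1.19, 1.2]])
```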

4.5 Validation and Results


In this work, we made the following simplifications: the power grid was modeled as an n×m mesh; the resistance of each mesh arm was derived from the per-unit-length resistance (Ω/µm); two such arms were assumed in parallel to account for the multi-layer chip scenario; and no matrix solver was developed as part of this work, the available SPICE simulators being used instead.
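To illustrate the simplified mesh model, the hypothetical Python generator below writes a SPICE netlist for an n×m mesh. Each arm's resistance is the per-micrometre resistance times an assumed arm length, halved for the two parallel arms. Each pad is fed from an ideal source through an assumed series R-L package model with a shunt C. PWL current sources are attached at randomly chosen nodes toward an ideal ground. All names, the package topology and the numeric values are assumptions for illustration only.

```python
import random


def mesh_netlist(n, m, arm_um, ohm_per_um, vdd, pads, loads,
                 pkg=(0.1, 1e-9, 1e-12)):
    """Hypothetical SPICE netlist writer for an n x m power mesh.

    pads  : list of (row, col) mesh nodes connected to the supply
    loads : list of ((row, col), [(t, i), ...]) PWL current sources to ideal GND
    pkg   : assumed (R ohm, L henry, C farad) package model per pad
    """
    r_arm = ohm_per_um * arm_um / 2.0   # two identical arms in parallel
    lines = [f"* {n}x{m} power mesh, {r_arm:g} ohm per arm"]
    k = 0
    for r in range(n):
        for c in range(m):
            if c + 1 < m:               # horizontal arm
                k += 1
                lines.append(f"R{k} n_{r}_{c} n_{r}_{c + 1} {r_arm:g}")
            if r + 1 < n:               # vertical arm
                k += 1
                lines.append(f"R{k} n_{r}_{c} n_{r + 1}_{c} {r_arm:g}")
    r_pkg, l_pkg, c_pkg = pkg
    for j, (r, c) in enumerate(pads, 1):
        lines.append(f"V{j} src_{j} 0 DC {vdd:g}")
        lines.append(f"RPKG{j} src_{j} pkg_{j} {r_pkg:g}")
        lines.append(f"LPKG{j} pkg_{j} n_{r}_{c} {l_pkg:g}")
        lines.append(f"CPKG{j} n_{r}_{c} 0 {c_pkg:g}")
    for j, ((r, c), pwl) in enumerate(loads, 1):
        points = " ".join(f"{t:g} {i:g}" for t, i in pwl)
        # Current flows from the mesh node through the source into ideal ground.
        lines.append(f"I{j} n_{r}_{c} 0 PWL({points})")
    return "\n".join(lines)


def random_nodes(count, n, m, seed=0):
    """Pick `count` random mesh nodes (repeats allowed) for random cell placement."""
    rng = random.Random(seed)
    return [(rng.randrange(n), rng.randrange(m)) for _ in range(count)]


nodes = random_nodes(2, 3, 4)
net = mesh_netlist(3, 4, 10, 0.1, 1.2, pads=[(0, 0)],
                   loads=[(nd, [(0, 0), (1e-9, 1e-3), (2e-9, 0)]) for nd in nodes])
print(net)
```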


We executed the flow explained in the previous section. Instead of 1 MHz, we used 10 MHz for characterization to reduce the amount of data, while still sampling the cell data at 13.33 GHz: at that rate, one 100 ns period at 10 MHz is about 1,333 samples per cell, compared with about 13,330 samples for a 1 µs period at 1 MHz.

4.5.1 Peak Power Results
Three small circuits were studied to stabilize the above approach:
TWOAND: two AND gates, one after the other.
ANDOR: one AND gate followed by one OR gate.
2AND-1OR: two AND gates at the first level whose outputs drive an OR gate that produces the final output.
The peak power was obtained for these three circuits using the approach described in this chapter and using SPICE simulation. The results from the average switching activity approach and from SPICE, for 100 MHz and 500 MHz input frequencies, are given in Table 4.1.

Peak power (W)

Frequency | TWOAND Our Approach | TWOAND SPICE | AND-OR Our Approach | AND-OR SPICE | 2AND-1OR Our Approach | 2AND-1OR SPICE
100 MHz | 0.0016817 | 0.0016 | 0.0009409 | 0.0008421 | 0.0019253 | 0.0019
500 MHz | 0.00168113 | 0.0016 | 0.0009410 | 0.00086539 | 0.00192531 | 0.0018

Table 4.1 Comparison of Peak Power Dissipation

4.5.2 Peak Dynamic IR Drop Results
Initially, three circuits were used to determine the peak dynamic IR drop:
100 Inverter Chain: a chain of 100 inverters, with the output of each inverter driving the input of the next. The delay of the chain is longer than the clock period.
32 Bit Shift Register: a 32-bit series/parallel shift register. Depending on the input and the selection signals, the data is shifted serially or in parallel.
16 Bit Adder: a 16-bit binary adder that uses carry-forward logic.

The following points were taken into account while generating the netlists for these circuits: a package RLC is added to each power pad; an ideal voltage source is attached to each power pad; a uniform mesh structure is used, and all leaf cells are placed on it at random; a reduced interconnect network, obtained by driving-point admittance estimation, is used for power as well as signal lines; and no existing decoupling capacitance was estimated.

The peak dynamic IR drop was obtained using the Average Activity approach, the Timing Window approach and SPICE simulation. The results are shown in Table 4.2.


Circuit | %Drop in Average Activity | %Drop in Timing Window Approach | SPICE %Drop
100 Inverter Chain | 1.65 | 6 | 1
32 Bit Shift Register | 17.5 | 40 | 12
16 Bit Adder | 31 | NA | 19.16

Table 4.2 Comparison of percentage peak instantaneous IR drop

Table 4.2 shows that the Average Activity method is more accurate than the Timing Window method. To check the performance of this approach further, the Average Activity method was applied to a few industry-standard circuits. Table 4.3 compares the maximum dynamic IR drop obtained using average switching activity with that obtained from Power Mill, a SPICE-based transient analysis tool offered by Synopsys, now called Nano Sim.

Circuit | %V Drop using Avg Activity | %V Drop in Power Mill | %Error
s27 | 4.5 | 5.8 | -22.4138
s344 | 6.3 | 6.6 | -4.54545
s349 | 6.2 | 7.5 | -17.3333
s444 | 8.6 | 13.3 | -35.3383
s1238 | 13.4 | 13.3 | 0.75188
s298 | 12.5 | 15 | -16.6667

Table 4.3 Comparison of percentage peak IR drop on ISCAS89 circuits
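The %Error column of Table 4.3 is consistent with the signed relative error (estimate - reference) / reference × 100, with Power Mill as the reference. The short Python check below recomputes the column from the two measured columns under that reading; the helper name `percent_error` is ours.

```python
def percent_error(estimate, reference):
    """Signed percentage error of an estimate relative to a reference value."""
    return (estimate - reference) / reference * 100.0


# (% drop using average activity, % drop in Power Mill) from Table 4.3
TABLE_4_3 = {
    "s27": (4.5, 5.8), "s344": (6.3, 6.6), "s349": (6.2, 7.5),
    "s444": (8.6, 13.3), "s1238": (13.4, 13.3), "s298": (12.5, 15.0),
}

for name, (avg_activity, power_mill) in TABLE_4_3.items():
    print(f"{name:>6}: {percent_error(avg_activity, power_mill):8.4f} %")
```

A negative value means the average activity estimate is lower than the Power Mill result, which is the case for all circuits except s1238.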


The power supply noise waveform of the average activity approach and the reference waveform from SPICE simulation of the actual logic are shown in Figure 4.14 and Figure 4.15, respectively.

Figure 4.14 PSN waveform of Proposed Method

Figure 4.15 PSN Reference Waveform


4.6 Summary
We proposed a novel PG network modeling technique. The approach involves average switching activity calculation, transient current characterization of the basic Boolean gates of the library, derivation of a PG network model, and transient simulation of the PG model using a vectorless approach; the desired results are derived from this simulation. Further, our global average switching activity calculation method ensures that the global timing impact of global voltage drop can be considered without extra runtime, which reduces the need for local maximum voltage drop analysis on timing [26]. Our approach also provides detailed voltage drop data across the chip or block, and based on this profile, decoupling capacitors can be placed at the required locations. The approach was validated against a dynamic fast SPICE simulator (Nano Sim), showing that the average switching rate calculation gives results close to those of dynamic vector-based analysis. A further advantage is that average switching activity also gives an accurate analysis of the average voltage drop, so the proposed approach yields both average and dynamic PG noise results simultaneously. The approach is scalable to multimillion-gate designs using the technique proposed by Blaauw et al. [55]. The work can be extended further to study decap sensitivity, and to skew the analysis toward specific end targets, e.g. PG grid robustness, or toward Monte Carlo based analysis for higher accuracy and coverage.


5 Power Up Analysis
One popular technique for reducing leakage is a gated power supply [74, 79, 80]. Shekhar [74] describes the sleep transistor technique and the challenges associated with it. The technique gates the power supply through a high-threshold transistor when a block is not required, as shown in Figure 5.1. The sleep transistor, also known as a power switch, turns off the power supply when a portion of the chip is idle, thus saving leakage current. Apart from design challenges, the technique raises the following design analysis challenges.

Figure 5.1 Gated Power Supply ([74])

1. When the power supply turns on from the off state, a huge capacitive load is charged, causing a large current surge and hence power supply noise (PSN). This noise can couple into signal lines, causing a state change or a delay change. It can also remain within the supply network, causing a large dynamic IR drop that in turn affects circuit performance. The goal is to predict the surge and control it.
2. The transistor in series with the supply acts as a large resistor in the normal mode of operation, causing additional IR drop that degrades performance. The IR drop across the transistor can be as high as 5-20 mV. The goal is to perform an average IR drop analysis to assess the impact of the switch.
3. Optimization of the switches to get the best leakage improvement. The optimization has area penalty, IR drop or power supply noise as cost parameters. For example, a low number of switches gives good leakage improvement but high IR drop and power supply noise.
4. When the power supply goes down, all sequential logic in the virtual power domain loses its state, which puts an extra constraint on overall system behavior. There is also a technique in which the state is preserved through retention flops [2, 81]. This technique needs extra power routing to save the state, as well as control logic, and the timing analysis needs to capture the mode switching.
5. Placement and routing of extra signals, special cells (such as retention flops) and the virtual power network.
6. The trade-off between leakage and the number of power switches.
7. Power routing is closed immediately after floorplanning, so the switches need to be placed by then. It is therefore important to have an early power-up analysis flow to compute the optimal number of switches that meets the peak current surge, IR drop and leakage requirements.

Often, PSN is a non-negotiable parameter, and the design planning goal is to identify the total number of switches that limits PSN to a user-defined level. This chapter describes an analytical method to determine the optimum number of power switches and the power-up glitch. Section 5.1 elaborates on switched PG networks and the PSN problem, Section 5.2 outlines the approach to analyze such networks, and Section 5.3 correlates our results with SPICE and reports the efficiency of the algorithm.

5.1 Switched PG Networks


Power supply noise is a widely acknowledged research domain in today's high-performance designs, and various analysis techniques have been proposed in the literature [26-31]. However, there is little awareness of the power supply noise caused by turning on power domains when a gated power supply is used. Figure 5.2 shows the switch network for a 1M-gate design, and Figure 5.3 shows a current glitch and voltage ramp at an arbitrary switch output. Note that the current surge can persist for a considerable amount of time, impacting the performance of blocks that are already on.


Figure 5.2 Layout of 1M gate with switch network (power switches marked)

Figure 5.3 Current Glitch and Voltage Ramp at arbitrary switch output

A typical PG network with power switches can be represented as shown in Figure 5.4. Some of the characteristics of this network are [87]:
1. There are two kinds of domains: a golden domain with a non-gated power supply, and multiple virtual domains with switched power supplies.
2. The virtual domains are not connected to one another; each is connected to the golden domain through a switch network.
3. The switch network of a given domain consists of one or more kinds of switches.
4. Switch networks are not shared across virtual power domains.
5. Random logic is connected to the golden domain as well as to all virtual domains.
6. Control logic can turn any one or more virtual domains on or off at any time.
7. Any switch network is a parallel network, a sequential network, or a combination of both. A parallel configuration turns all switches on simultaneously, whereas a sequential configuration turns the switches on one by one, with a delay between consecutive switches.
Figure 5.4 Typical PG network with Power Switches (an off-chip power supply feeds the non-gated VDD power network; switch control logic drives the switch network (SW) that connects VDD to the virtual power network; logic networks hang off both networks; the zoomed view shows N switches SW1, SW2, SW3, ... in a parallel configuration and in a sequential configuration with delay D1 between consecutive switches)

When the power supply is off and the virtual network is disconnected, the only current that flows is leakage current. Leakage is reduced when the leakage current of the virtual logic is significantly higher than that of the switch network. When the switches are turned on, i.e. when the power supply is connected to the virtual power network, the loads in the virtual power network start charging. These loads include interconnect capacitance, gate capacitance and the diffusion/diode capacitance of the circuits. The current drawn by these capacitances depends on the ability of the switch network to supply charge in a given time. Because the virtual power domain demands current quickly, L·di/dt noise is injected into the circuit that can disturb the normal operation of the golden power domain. Note that although the load is predominantly capacitive, the peak current is still limited by the saturation current of the switches, which produces the current profile shown in Figure 5.3.

5.2 Switch Network Analysis


Switch Network Analysis (SNA) early in design planning includes deciding the switch network topology, identifying the switches to be used, the total system timing for turning power domains on and off, and the total power supply noise contributed by a switch network. A sequential configuration allows the delay to be configured so that the peak current at any point in time is controlled to meet the system noise specification; hence there is a trade-off between the total time the system requires to turn the virtual network on or off and the noise criteria. This information should be passed to the placement and routing tools for physical design. Further, the switch network's contribution comes from the maximum current surge it causes, and the optimization variables are the total number of switches of each type in the network and the delay. The following assumptions keep the analysis simple, although the solution can be extended to handle them:
1. The delay between any two consecutive switches is the same.
2. Two types of switches exist in the network.
3. At any time instant during power-on, the voltage is the same at every node of the virtual power network, i.e. there is zero static IR drop within it.
4. The switch network is sequential. A parallel configuration is essentially one big switch: all its transistors together form a single switch whose characteristic is lumped into a single MOS device.

The high-level flow for the analysis is shown in the block diagram of Figure 5.5.

Figure 5.5 Schematic Switch network Analysis Flow (switch IV characterization, followed by prediction of the current that charges the capacitive load, followed by determination of the required parameters)

5.2.1 Switch Characterization
Switch IV characterization records the current sourced through the switch for different voltages between its golden and virtual power ports. This is done using transient SPICE simulation of the switch, and the data is stored as (voltage, current) value pairs for further processing. Switch characterization also involves measuring the switch ON resistance, i.e. the resistance that the switches present during normal operation, when they are turned ON and the virtual power network is connected to the golden power network. It is measured by placing a 10 mV battery across the switch and measuring the current. This resistance value is later used for average IR drop analysis across the switch.
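A minimal Python sketch of how this characterization data might be stored and used, assuming the IV table is a list of (voltage across switch, current) pairs sorted by voltage. The current at an intermediate voltage is obtained by linear interpolation. The ON resistance follows from the 10 mV measurement as R_ON = 10 mV / I, and the non-linear resistance at each table point is V/I. The class `SwitchModel`, the helper names and the sample values are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class SwitchModel:
    """Hypothetical container for one switch type's characterization data."""
    iv_table: list   # [(volts across switch, amps sourced)], ascending volts
    r_on: float      # ON resistance in ohms, from the 10 mV measurement

    def current(self, v_across):
        """Linearly interpolate the IV table; clamp outside the table range."""
        volts = [v for v, _ in self.iv_table]
        if v_across <= volts[0]:
            return self.iv_table[0][1]
        if v_across >= volts[-1]:
            return self.iv_table[-1][1]
        k = bisect_left(volts, v_across)
        (v0, i0), (v1, i1) = self.iv_table[k - 1], self.iv_table[k]
        return i0 + (i1 - i0) * (v_across - v0) / (v1 - v0)

    def nonlinear_resistance(self):
        """V/I at each table point (non-linear resistance characterization)."""
        return [(v, v / i) for v, i in self.iv_table if i != 0]


def r_on_from_measurement(i_measured, v_applied=0.010):
    """ON resistance from the current drawn with a 10 mV source across the switch."""
    return v_applied / i_measured


sw = SwitchModel(iv_table=[(0.0, 0.0), (0.5, 1e-3), (1.0, 1.5e-3)],
                 r_on=r_on_from_measurement(1e-4))
```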


Note that the first characterization, the IV characterization, is also a resistance characterization. Because this resistance varies with the voltage across the switch, it is also called non-linear resistance characterization.

5.2.2 Current or Switch Prediction
Current prediction is based on a simplified extracted model of the block under consideration, shown in Figure 5.6. The switch network is modeled with its detailed connectivity and timing, whereas the logic connected to the virtual domain is modeled as a capacitive load. The current through the switches is predicted over infinitesimally small time steps, applying the charge-voltage relation of the capacitor:

$$I = \frac{dq}{dt} \quad \Longrightarrow \quad dq = I\,dt \qquad (1)$$

$$dq = C\,dv \qquad (2)$$

Equating (1) and (2) gives the voltage rise over one time step:

$$dv = \frac{I\,dt}{C} \qquad (3)$$

Figure 5.6 Analysis model of Virtual Power Network (VDD connected through the switch network to the output node Vout, which carries the extracted total load capacitance Cload)

Equation (3) forms the basis of Algorithm 1, described in Section 5.2.2.1. The delay between two consecutive switches is used to predict the charge supplied by the switches to the virtual power network domain. The IV table of the switch is used to predict the current by further dividing this delay into infinitesimally small time steps, as shown in Figure 5.7. From the initial voltage and the charge supplied, the voltage at the instant the next switch starts turning on is derived. This process continues until either all switches are turned on or the specified voltage level is reached. If all switches are turned on but the voltage is still below the ideal value (golden VDD), the same method continues in order to predict the maximum current surge. The predicted number of switches is used to predict the static IR drop across the switch network, as explained in Algorithm 2; this is another important parameter that is not discussed further in this chapter.

Figure 5.7 Infinitesimal Time Division for Current Prediction

Parameters that can be analyzed with this setup include:
1. The total number of switches required to reach a required voltage or, alternatively, the voltage that can be reached with a given number of switches.
2. The maximum current surge for a given number of switches.
3. The delay impact of consecutive switches as they turn on.
4. The IR drop across the switch network.

5.2.2.1 Algorithm for Power Switch Network Analysis
Initialize the load voltage to zero and the charging current to zero.
{
  For each infinitesimally small time period:
    Predict the current from the IV table of the switch type, based on the voltage at the lumped load.
    Determine the actual current from the number of switches turned on at that instant.
    Track the current at VDD: if the new current is greater than the maximum surge current so far, set the maximum surge current to the new current.
    Calculate the rise in voltage over the time period using Equation (3).
  Continue until either all the switches are turned on or the desired voltage level is reached.
}
Print the maximum surge current and the voltage level reached after turning on the number of switches specified by the user.
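A minimal Python sketch of Algorithm 1, assuming a single switch type whose IV table gives the current per switch as a function of the voltage across it, a uniform delay between consecutive switches, and the lumped load of Figure 5.6. Time advances in fixed steps `dt` using Equation (3). The function `power_up`, its parameters and the optional `t_stop` limit are illustrative assumptions. Setting `t_stop = n_switches * delay` reports the voltage reached by the time all N switches have had their turn-on slot, i.e. the voltage for a given number of switches.

```python
import numpy as np


def power_up(iv_v, iv_i, vdd, c_load, n_switches, delay, dt, v_target, t_stop=None):
    """Hypothetical sketch of Algorithm 1 (sequential switch network, one switch type).

    iv_v, iv_i : IV table of one switch; iv_i[k] is the current (A) sourced when
                 the voltage across the switch is iv_v[k] (V), iv_v ascending
    c_load     : lumped capacitance of the virtual power network (F)
    delay      : turn-on delay between consecutive switches (s)
    dt         : time step (s), much smaller than `delay`
    v_target   : stop once the virtual supply reaches this voltage (V)
    t_stop     : optional time limit (s)
    """
    if t_stop is None and v_target >= vdd:
        raise ValueError("v_target must be below vdd unless t_stop is given")
    v, t = 0.0, 0.0                    # load voltage and time start at zero
    i_peak, switch_at_peak = 0.0, 0
    while v < v_target and (t_stop is None or t < t_stop):
        n_on = min(n_switches, int(t // delay) + 1)     # switches on at time t
        i_one = float(np.interp(vdd - v, iv_v, iv_i))   # per-switch current
        i_total = n_on * i_one                          # current drawn from VDD
        if i_total > i_peak:                            # track maximum surge
            i_peak, switch_at_peak = i_total, n_on
        v += i_total * dt / c_load                      # Equation (3): dv = I dt / C
        t += dt
    return {"voltage": v, "time": t,
            "switches_on": min(n_switches, int(t // delay) + 1),
            "peak_current": i_peak, "peak_at_switch": switch_at_peak}


# Hypothetical linear switch: 10 mA at 1 V across it, 1 nF virtual-domain load.
result = power_up([0.0, 1.0], [0.0, 0.01], vdd=1.0, c_load=1e-9,
                  n_switches=2, delay=1e-8, dt=1e-11, v_target=0.5)
```

With a linear IV table and a single switch, the loop reduces to forward-Euler integration of an RC charge, which gives a simple sanity check against the analytic time constant.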


The above algorithm is developed for the case where the delay between two consecutive switches in a sequential switch network is the same. It can, however, be extended to the case of different delays, in which timing information from static timing analysis or simulation is needed.

5.2.2.2 Algorithm for Static IR Drop Analysis across Power Switches
{
  Read the switch characterization data for static IR drop, i.e. the ON channel resistance (RON).
  Determine the total number of switches N required to reach the desired voltage level (specified by the user) using the Algorithm for Power Switch Network Analysis.
  The effective resistance of the N predicted switches is RON/N.
  Compute the power consumption Pavg of the switched-off (virtual) power network using any of the methods described in this work (methods outside this work can also be used).
  Compute the average current consumption of the virtual power network: Iavg = Pavg/VDD.
  The static IR drop across the switch network is Iavg*RON/N.
}
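Algorithm 2 reduces to a few lines of arithmetic. The Python sketch below assumes N identical switches in parallel, each with ON resistance RON, carrying the block's average current; the function name and the example numbers are hypothetical.

```python
def static_ir_drop(r_on, n_switches, p_avg, vdd):
    """Hypothetical sketch of Algorithm 2: static IR drop across N parallel switches.

    r_on       : ON channel resistance of one switch (ohm)
    n_switches : number of switches N predicted by Algorithm 1
    p_avg      : average power of the virtual power network (W)
    vdd        : supply voltage (V)
    """
    r_eff = r_on / n_switches      # N identical switches in parallel: RON/N
    i_avg = p_avg / vdd            # Iavg = Pavg / VDD
    return i_avg * r_eff           # Iavg * RON / N


# Hypothetical numbers: 1000 switches of 100 ohm each, 120 mW block at 1.2 V.
drop = static_ir_drop(100.0, 1000, 0.12, 1.2)   # 0.1 A * 0.1 ohm = 0.01 V
```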

5.3 Results and Analysis


The traditional approach to study the above would be a full-fledged SPICE simulation of the virtual power network and the switch network, with each switch turned on after some delay. Note that this involves thousands of switches in the switch network and a million or more gates in the virtual network. Such a simulation would take weeks even with the fast SPICE simulators available on the market, and it comes very late in the design cycle. Alternatively, the virtual power network can be reduced by modeling the interconnect load and gate capacitance as a large distributed capacitance, and the ON-channel transistor resistance as an effective resistance in series with each distributed capacitor. This reduces the number of active elements, and the reduced power network can then be simulated with SPICE (Figure 5.8). This approach gives orders-of-magnitude improvement in simulation time, but the runtime is still days. It can be done during design planning or after detailed design is over.

Figure 5.8 Reduced Switch Network for validation

The technique presented in the previous section is static in nature, reduces the runtime to a few minutes and correlates very well with the techniques described above. The algorithms were analyzed with switches designed in TI's 90 nm node, and all results below are for a 1M-equivalent-gate block. Since the 1M gates could not be simulated in SPICE together with the switches, the simplified model described in the previous paragraph was employed to obtain SPICE-accuracy data while keeping the switch network intact. We employed a switch network with two kinds of switches for this analysis [87]: switches of the first kind took the virtual domain up to a specific voltage level, and switches of the second kind, with high capacity, were then turned on sequentially to measure the current surge. Table 5.1 shows the predicted number of switches for a given voltage. When the number of switches is large, the algorithm's results are within 1% of SPICE-based simulation, whereas when the number of switches is small, the inaccuracy is within 10%. In other words, the predicted number is close to the actual number, within 1-10%. The table also shows the predicted current surge and the switch whose turn-on causes the maximum peak. Along with the surge, we therefore predict the switch at which the maximum surge occurs, which helps to further optimize the network of second-type switches. Table 5.2 shows the voltage predicted for a given number of switches. The advantage of the whole solution comes from the superlative runtime improvement, which enables early analysis and trade-offs in the design (Table 5.3). The runtime gain clearly outweighs the small inaccuracy in switch or voltage prediction. Note that the runtime does not include the switch IV characterization time, since that is a one-time effort. Static analysis can also quickly dump much more information, as needed, to understand specific behavior for trade-off analysis. The time-domain behavior of voltage and current can also be predicted with the approach described in this work: Figure 5.9 compares the predicted voltage over time with SPICE waveforms at a few arbitrary nodes, and Figure 5.10 compares the predicted current over time with the current measured at VDD. The agreement is good, considering that the analysis is targeted at early trade-off analysis.


Vdesired (mV) | Actual #Switches | #Switches by Algorithm | Current Surge (mA) | Current Surge after Switch #
20 | 380 | 403 | 950 | 123
69 | 760 | 771 | 881 | 114
271 | 1560 | 1554 | 749 | 100
583 | 2340 | 2328 | 467 | 97
869 | 2964 | 2971 | 266 | 81
1170 | 4368 | 4308 | 24 | 43

Table 5.1 Switch Prediction by proposed algorithm

# Switches | Simulated Voltage (mV) | Voltage by Algorithm (mV) | Surge Current (mA) | Surge after Switch # | %Error in Voltages
780 | 63 | 70.54 | 892 | 101 | 11
1560 | 280 | 273.53 | 784 | 94 | -0.2
2340 | 587 | 589.26 | 546 | 78 | 0.38
3120 | 926 | 927.7 | 263 | 64 | 0.18

Table 5.2 Voltage Prediction


No. of Switches | Simulation Time (days) | Algorithm Runtime (min)
780 | ~1.5 | <1
1560 | ~4 | <1
2340 | ~5 | <1
2940 | ~6 | <1

Table 5.3 Power Up analysis - Runtime Comparison

Figure 5.9 Voltage Ramp up over Time for various nodes (voltage in mV, 0 to 1400, versus time; curves: Predicted, SPICE@node1, SPICE@node2)

Figure 5.10 Current comparison over time (current in mA, 0 to 1000, versus time; curves: Predicted, SPICE)


5.4 Summary
There are various techniques to reduce the leakage power of a design, and a gated power supply (also called a sleep transistor or switched power network) is one of the most efficient. The analysis techniques described in this work provide quick data for architecture-level decisions when the switched network technique is used. The runtime is a few seconds, so the design team can iterate many times to find the optimum number of switches. The analytical method for calculating the total number of switches is fast because it requires only a one-time SPICE simulation (the IV characterization of the switch); the rest of the analysis is static. We have also used the method to analyze the power-on glitch of the design, which contributes to power supply noise during power-up. All results closely match SPICE simulation.


6 Conclusion
6.1 Summary
This thesis discusses the power grid analysis challenges faced in CMOS technology. For a robust power grid, designs need to go through the following analyses:
1. Accurate power estimation
2. Instantaneous IR drop analysis and decap planning
3. Power-up analysis for designs that use MTCMOS for leakage reduction

The key results of this work can be summarized as follows:
1. Successfully implemented a hierarchical probabilistic toggle computation approach that is applicable to multi-million gate designs while maintaining the desired accuracy.
2. Discussed power dissipation in cell-based CMOS design and proposed a flow for power estimation at various design stages that can improve the accuracy of estimation. The flow also helps the user make runtime and accuracy trade-offs.
3. Proposed a cell characterization methodology for instantaneous IR drop analysis as well as power-up analysis for MTCMOS.
4. Discussed a prototype flow developed for instantaneous IR drop estimation based on the average toggle rate computed by the toggle methodology proposed in this work. This flow estimates instantaneous as well as average IR drop numbers in the same simulation.
5. Power-up analysis for MTCMOS-based digital designs. The methodology is validated using a prototype flow and gives a superlative runtime improvement compared to SPICE. The methodology also helps in MTCMOS gate optimization.

6.2 Scope of Future Work


The analysis approaches proposed in this work help in robust power grid analysis, and several extensions are possible to further help designs. First, the power estimation proposed in this work relies on a gate-level netlist. An RTL-level power estimation would help block designers trade off power early in the design, for example MTCMOS usage or multi-Vt usage as proposed in [17]. Second, it is possible to improve the correlation between pre-layout and post-layout power numbers. One reason they differ is clock tree expansion and buffer insertion during placement and routing to meet timing constraints. Early estimation techniques can be developed to estimate the additional cell count and better correlate the power numbers at various stages. Third, the amount of characterization data stored for each cell is very large, and a typical ASIC technology contains 2000-4000 cells. This data can be reduced by storing only the current signatures during transitions and using them to model the current sources in block-level analysis, which would also eliminate the need for the frequency domain transform performed here. Techniques used in some commercial tools, in conjunction with the analysis approach presented in this work, can help improve data reduction. Fourth, we have not gone into the details of decoupling capacitance for instantaneous IR drop analysis in this work. The work can be extended to study extensively the various decoupling capacitances: those intrinsic to NWELL, non-switching gates and RAMs, as well as intentional ones distributed by the user. Decoupling capacitor estimation, characterization and what-if impact analysis on instantaneous IR drop is an important area for further research. Fifth, the MTCMOS analysis approach proposed in this work is useful early in design planning to make efficient trade-offs between MTCMOS switches and noise tolerance levels in the design. In this work, we have modeled the switched power network with a lumped capacitance, which does not model the time-domain behavior of the PG network due to PG resistance. A more accurate approach can be developed that models distributed RC for the PG network once placement and power routing are done. It is our belief that this will give a quick and accurate analysis of the actual network compared to SPICE-like simulations.


7 References
1. Semiconductor Industry Assoc., International Technology Roadmap for Semiconductors, 2003 Update, http://public.itrs.net/Files/2003ITRS/Home2003.htm
2. Nam Sung Kim, David Blaauw et al., "Leakage Current: Moore's Law Meets Static Power", IEEE Computer, Dec 2003.
3. The SPICE Home Page, http://bwrc.eecs.berkeley.edu/Classes/IcBook/SPICE/
4. Rabe, D.; Jochens, G.; Kruse, L.; Nebel, W., "Power-simulation of cell based ASICs: accuracy- and performance trade-offs", Proceedings of Design Automation and Test in Europe, Feb 1998.
5. F. Najm, "A survey of power estimation techniques in VLSI circuits", IEEE Trans. VLSI Systems, vol. 2, pp. 446-455, Dec. 1994.
6. C. Y. Tsui, M. Pedram, and A. Despain, "Efficient estimation of dynamic power dissipation under a real delay model", in Proc. IEEE Int. Conf. Computer-Aided Design, 1993, pp. 224-228.
7. B. J. George et al., "Power analysis and characterization for semi-custom design", in Proc. Int. Workshop Low Power Design, 1994, pp. 215-218.
8. J.-Y. Lin et al., "A cell-based power estimation in CMOS combinational circuits", in Proc. IEEE Int. Conf. Computer-Aided Design, 1994, pp. 304-309.
9. H. Sarin and A. McNelly, "A power modeling and characterization method for logic simulation", in Proc. IEEE Custom Integrated Circuits Conf., 1995, pp. 363-366.
10. Synopsys Design Power, http://www.synopsys.com/products/power/power.html
11. N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, VLSI Systems Series, Addison-Wesley, 1985.
12. Najm, F.N., "Transition Density, a Stochastic Measure of Activity in Digital Circuits", DAC, pp. 644-649, June 1991.
13. Ghosh, A.; Devadas, S.; Keutzer, K.; White, J., "Estimation of average switching activity in combinational and sequential circuits", DAC, pp. 253-259, June 1992.
14. S. Bhanja, N. Ranganathan, "Dependency Preserving Probabilistic Modeling of Switching Activity using Bayesian Networks", 38th Design Automation Conference, pp. 209-214, 2001.
15. HUGIN API reference manual, Version 5.3, http://www.hugin.com
16. David Heckerman, "A tutorial on learning with Bayesian Networks", ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf, March 1995.
17. Agarwal, A.; Mukhopadhyay, S.; Raychowdhury, A.; Roy, K.; Kim, C.H., "Leakage power analysis and reduction in nanoscale circuits", IEEE Micro, Volume 26, Issue 2, pp. 68-80, March 2006.
18. Keshavarzi, A.; Tschanz, J.W.; Narendra, S.; De, V.; Daasch, W.R.; Roy, K.; Sachdev, M.; Hawkins, C.F., "Leakage and process variation effects in current testing on future CMOS circuits", IEEE Design & Test of Computers, Volume 9, Issue 5, pp. 36-43, Sept 2002.
19. Dresig, F.; Lanches, P.; Rettig, O., et al., "Simulation and reduction of CMOS power dissipation at logic level", Design Automation, 1993, with the European Event in ASIC Design. Proceedings, pp. 341-246, Feb 1993.
20. An-Chang Deng; Yan-Chyuan Shiau; Loh, K.-H., "Time domain current waveform simulation of CMOS circuits", IEEE International Conference on Computer-Aided Design 1988, pp. 208-211, Nov 1988.


21. F.N. Najm, R. Burch, P. Yang, and I.N. Hajj, "Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits", IEEE Transactions on CAD, 9(4):439-450, April 1990.
22. Randal L. Schwartz, Tom Phoenix and brian d foy, Learning Perl, 4th Edition, O'Reilly & Associates, ISBN 0596101058.
23. Matlab Tutorial, http://www.math.ufl.edu/help/matlab-tutorial/
24. Synopsys, Inc., "Using the Synopsys Design Constraints Format", Application Note, Sept 2005.
25. Himanshu Bhatnagar, Advanced ASIC Chip Synthesis: Using Synopsys Design Compiler, Physical Compiler and PrimeTime, 2nd Edition, Kluwer Academic Publishers, ISBN 0792376447.
26. Martin Saint-Laurent, Swaminathan, "Impact of Power Supply Noise on Timing in High Frequency Microprocessors", IEEE Trans on Advanced Packaging, pp. 135-144, Feb 2004.
27. Kriplani, H.; Najm, F.; Hajj, I., "Improved Delay and Current Models for Estimating Maximum Currents in CMOS VLSI Circuits", ISCAS 94, pp. 435-438, June 1994.
28. Kriplani, H.; Najm, F.N.; Hajj, I.N., "Pattern Independent Maximum Current Estimation in Power and Ground Buses of CMOS VLSI Circuits: Algorithms, Signal Correlations, and Their Resolution", IEEE Trans on CAD of Integrated Circuits and Systems, pp. 998-1012, Aug 1995.
29. Hsiao, M.S.; Rudnick, E.M.; Patel, J.H., "Peak Power Estimation of VLSI Circuits: New Peak Power Measures", IEEE Trans on VLSI Systems, pp. 435-439, Aug 2000.
30. Qing Wu; Qinru Qiu; Pedram, M., "Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme Order Statistics", IEEE Trans on CAD of Integrated Circuits and Systems, pp. 942-956, Aug 2001.
31. Bogliolo, A.; Benini, L.; De Micheli, G.; Ricco, B., "Gate-level power and current simulation of CMOS integrated circuits", IEEE Trans on Very Large Scale Integration (VLSI) Systems, pp. 473-488, Dec 1997.
32. Anantha Chandrakasan's Home Page, http://www-mtl.mit.edu/~anantha/publications.html, http://www.fetchbook.info/search_Anantha_Chandrakasan/searchBy_Author.html
33. FFT Tutorial, http://www.ele.uri.edu/~hansenj/projects/ele436/fft.pdf
34. Jeff Tranter and Paul Raines, Tcl/Tk in a Nutshell, O'Reilly & Associates, ISBN 1565924339.
35. Alan V. Oppenheim, Ronald W. Schafer, John R. Buck, Discrete-Time Signal Processing, 2nd Edition, Prentice Hall, ISBN 0137549202.
36. Chen, H.H.; Ling, D.D., "Power Supply Analysis Methodology for Deep-Submicron VLSI Chip Design", DAC, pp. 638-643, June 1997.
37. Yi-Shing Chang; Gupta, S.K.; Breuer, M.A., "Analysis of Ground Bounce in Deep-Submicron Circuits", VLSI Test Symposium, pp. 110-116, May 1997.
38. Yi-Min Jiang; Kwang-Ting Cheng; An-Chang Deng, "Estimation of Maximum Power Supply Noise for Deep Sub-Micron Designs", International Symposium on Low Power Electronics and Design, pp. 233-238, Aug 1998.
39. Zhao, S.; Roy, K.; Koh, C.-K., "Estimation of Inductive and Resistive Switching Noise on Power Supply Network in Deep Sub-Micron CMOS Circuits", International Conference on Computer Design, pp. 65-72, Sept 2000.
40. S. Bobba, I.N. Hajj, "Maximum voltage variation in the power distribution network of VLSI circuits with RLC models", Proc. of ISLPED, Aug 2001.


41. Bai, G.; Bobba, S.; Hajj, I.N., "Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits", DAC, pp. 295-300, 2001.
42. G. Steele, et al., Full-Chip Verification Methods for DSM Power Distribution Systems, Proc. of DAC, pp. 744-749, 1998.
43. R. Chaudhry, D. Blaauw, R. Panda and T. Edwards, Current Signature Compression for IR-Drop Analysis, Proc. Design Automation Conference, pp. 162-167, 2000.
44. S. Bobba and I. N. Hajj, Estimation of maximum current envelope for power bus analysis and design, Proc. of ISPD, pp. 141-146, Apr 1998.
45. Rishi Bhooshan (TI) et al., A Unique Method for Dynamic Voltage Drop Analysis and Decoupling Capacitance Estimation, VDAT 2003.
46. Cirit, M.A., Characterizing a VLSI standard cell library, Digital Object Identifier 10.1109/CICC, pp. 25.7.2-25.7.4, May 1991.
47. Debnath, S.P.; Sukumar, J.; Udaykumar, H., A methodology for fast vector based power supply and substrate noise analyses, International Conference on VLSI Design, pp. 808-811, Jan 2005.
48. Dalal, A.; Lev, L.; Mitra, S., Design of an efficient power distribution network for the UltraSPARC-I microprocessor, IEEE International Conference on Computer Design: VLSI in Computers and Processors, pp. 118-123, Oct 1995.
49. Chen, H.H.; Schuster, S.E., On-chip decoupling capacitor optimization for high-performance VLSI design, VLSI Technology, Systems and Applications, pp. 99-103, June 1995.
50. Larsson, P., Power supply noise in future ICs: a crystal ball reading, Custom Integrated Circuits, pp. 467-474, May 1999.
51. Sotman, M.; Popovich, M.; Kolodny, A.; Friedman, E., Leveraging symbiotic on-die decoupling capacitance, Electrical Performance of Electronic Packaging, pp. 111-114, Oct 2005.
52. Larsson, P., Resonance and damping in CMOS circuits with on-chip decoupling capacitance, IEEE Transactions on Circuits and Systems-I, vol. 45, pp. 849-858, Aug 1998.
53. Larsson, P., Parasitic Resistance in an MOS Transistor Used as On-Chip Decoupling Capacitance, IEEE Journal of Solid-State Circuits, vol. 32, pp. 574-576, Apr 1997.
54. Chaudhry, R.; Panda, R.; Edwards, T.; Blaauw, D., Design and analysis of power distribution networks with accurate RLC models, International Conference on VLSI Design, pp. 151-155, Jan 2000.
55. Min Zhao; Panda, R.V.; Sapatnekar, S.S.; Edwards, T.; Chaudhry, R.; Blaauw, D., Hierarchical analysis of power distribution networks, DAC, pp. 150-155, June 2000.
56. IBM Methodology for Power Supply Noise, http://www.research.ibm.com/da/nova.html
57. R. Heald et al., Implementation of a 3rd Generation SPARC V9 64b Microprocessor, Proc. IEEE ISSCC, pp. 412-413, 2000.
58. Yi-Min Jiang; Kwang-Ting Cheng, Analysis of Performance Impact Caused by Power Supply Noise in Deep Submicron Devices, DAC, June 1999.
59. Apache Design Solutions, Reshaping Nanometer Flows with Physical Power Integrity, http://www.apache-da.com, White Paper, May 2003.
60. Anthony Ralston, Philip Rabinowitz, A First Course in Numerical Analysis, 2nd Edition, Dover Publications, ISBN 048641454X.
61. Kalpesh Shah, SNUG 2006 Panel Discussion.


62. H. Mehta, R.M. Owens, M.J. Irwin, Energy Characterization Based on Clustering, 33rd Design Automation Conference, June 1996.
63. D. Brooks, V. Tiwari, and M. Martonosi, Wattch: A Framework for Architectural-Level Power Analysis and Optimizations, Proc. of International Symposium on Computer Architecture, pp. 83-94, June 2000.
64. V. Tiwari, S. Malik, and A. Wolfe, Power Analysis of Embedded Software: A First Step toward Software Power Minimization, IEEE Transactions on VLSI Systems, vol. 2, no. 4, pp. 437-445, 1994.
65. E. Macii, M. Pedram and F. Somenzi, High Level Power Modeling and Estimation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, November 1998.
66. Synopsys PrimePower, http://www.synopsys.com/products/power/primepower_ds.pdf
67. Synopsys Power Compiler, http://www.synopsys.com/products/power/power_ds.pdf
68. Synopsys NanoSim, http://www.synopsys.com/products/mixedsignal/nanosim/nanosim.html
69. Synopsys Liberty Format, http://www.synopsys.com/partners/tapin/lib_info.html
70. M. Horowitz and R. Gonzalez, Energy dissipation in general purpose microprocessors, IEEE Journal of Solid-State Circuits, vol. 31, Sept 1996.
71. Brglez, F.; Bryan, D.; Kozminski, K., Combinational profiles of sequential benchmark circuits, ISCAS, vol. 3, pp. 1929-1934, May 1989.
72. R. Wilson and D. Lammers, Grove Calls Leakage Chip Designers' Top Problem, EE Times, 13 Dec 2002; www.eetimes.com/story/OEG20021213S0040.
73. Intel SpeedStep technology, http://www.intel.com
74. Y. Ye, S. Borkar, V. De, A New Technique for Standby Leakage Reduction in High-Performance Circuits, 1998 Symposium on VLSI Circuits, June 1998.
75. M. Powell et al., Reducing Leakage in a High-Performance Deep-Submicron Instruction Cache, IEEE Transactions on VLSI Systems, pp. 77-89, Feb 2001.
76. Ali K., Charles H. et al., Effect of reverse body bias for low power CMOS circuits.
77. Kaushik R., Mark C.J., Dinesh S., Leakage control with efficient use of transistor stacks in single threshold CMOS.
78. Shekhar Borkar, Low Power Design Challenges for the Decade, 2001.
79. Kumagai, K.; Iwaki, H.; Yoshida, H.; Suzuki, H.; Yamada, T.; Kurosawa, S., A Novel Powering-Down Scheme for Low Vt CMOS Circuits, 1998 Symposium on VLSI Circuits, pp. 44-45, 11-13 June 1998.
80. Mutoh, S.; Douseki, T.; Matsuya, Y.; Aoki, T.; Yamada, J., 1V high-speed digital circuit technology with 0.5µm multi-threshold CMOS, IEEE ASIC Conference, 1993.
81. Akamatsu, H.; Iwata, T.; Yamamoto, H.; Hirata, T.; Yamauchi, H.; Kotani, H.; Matsuzawa, A., A low power data holding circuit with an intermittent power supply scheme for sub-1V MT-CMOS LSIs, 1996 Symposium on VLSI Circuits, Digest of Technical Papers, pp. 14-15, 13-15 June 1996.
82. Ye, Y.; Borkar, S.; De, V., A new technique for standby leakage reduction in high-performance circuits, Symposium on VLSI Circuits, pp. 40-41, June 1998.
83. Das, K.K.; Joshi, R.V.; Chuang, C.T.; Cook, P.W.; Brown, R.B., New digital circuit techniques for total standby leakage reduction in nano-scale SOI technology, ISSCC, pp. 309-312, Sept 2003.
84. Wenxin Wang; Anis, M.; Areibi, S., Fast techniques for standby leakage reduction in MTCMOS circuits, ISOCC, pp. 21-24, Sept 2004.


85. Fei Li; Lei He; Saluja, K.K., Estimation of maximum power-up current, DAC, pp. 51-56, Jan 2002.
86. Calhoun, B.H.; Honore, F.A.; Chandrakasan, A.P., A leakage reduction methodology for distributed MTCMOS, JSSC, pp. 818-826, May 2004.
87. Royannez, P.; Mair, H.; Dahan, F.; Wagner, M.; Streeter, M.; Bouetel, L.; Blasquez, J.; Clasen, H.; Semino, G.; Dong, J.; Scott, D.; Pitts, B.; Raibaut, C.; Uming Ko, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC, pp. 138-139, Feb 2005.
88. R. Heald, et al., Implementation of a 3rd Generation SPARC V9 64b Microprocessor, Proc. IEEE ISSCC, pp. 412-413, 2000.
89. P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, High Performance Microprocessor Design, IEEE Journal of Solid-State Circuits, vol. 33, no. 5, pp. 676-686, Apr 1998.
90. J. Darnauer, D. Chengson, B. Schmidt, and E. Priest, Electrical Evaluation of Flip-Chip Package Alternatives for Next Generation Microprocessors, Electronic Components and Technology Conference, pp. 666-673, 1998.
91. S. Borkar, Low Power Design Challenges for the Decade, Proc. of ISLPED, 2000.
92. V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel and F. Baez, Reducing Power in High-Performance Microprocessors, Proc. of Design Automation Conference, 1997.
93. Wachnik, R.A.; Filippi, R.G.; Shaw, T.M.; Lin, P.C., Practical benefits of the electromigration short-length effect, including a new design rule methodology and an electromigration resistant power grid with enhanced wireability, Symposium on VLSI Technology, pp. 220-221, June 2000.
94. J. Kitchin, Statistical Electromigration Budgeting for Reliable Design and Verification in a 300-MHz Microprocessor, Symposium on VLSI Circuits Digest, pp. 115-116, 1995.
95. T. H. Cormen, C. E. Leiserson, R. L. Rivest, Introduction to Algorithms, PHI.
96. Chapra, S.C.; Canale, R.P., Numerical Methods for Engineers, 3rd Edition, McGraw-Hill, 1998.
97. Rabaey, Digital Integrated Circuits: A Design Perspective, Pearson Education, Second Edition, 2003.


Appendix A Sample SDC file


create_clock -period <value> [get_ports clk]
set_input_delay <value> -clock clk [get_ports IN*]
set_case_analysis 0 [get_ports {*reset* *scan_mode*}]
report_timing > <file name>
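The `<value>` placeholders above depend on the target frequency. Purely as an illustration, and not as part of the thesis flow, the Python sketch below fills them in for a given clock frequency. It assumes the library time unit is 1 ns, which matches `*T_UNIT 1 NS` in the SPEF sample. `make_sdc` is a hypothetical helper name.

```python
def make_sdc(freq_hz: float, input_delay_ns: float) -> str:
    """Return SDC text for a single clock on port clk (hypothetical helper).

    Assumes the library time unit is 1 ns, so period = 1e9 / freq_hz.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    period_ns = 1e9 / freq_hz
    lines = [
        # The clock takes its name (clk) from the source port.
        f"create_clock -period {period_ns:g} [get_ports clk]",
        # Input delay is referenced to that same clock.
        f"set_input_delay {input_delay_ns:g} -clock clk [get_ports IN*]",
        # Tie reset and scan-mode ports to 0 for functional-mode timing.
        "set_case_analysis 0 [get_ports {*reset* *scan_mode*}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 100MHz clock gives a 10 ns period; the 2 ns input delay is an arbitrary example value.
    print(make_sdc(100e6, 2.0), end="")
```

Generating the constraints from a single frequency parameter keeps the period and any derived delays consistent when the same block is analysed at several operating points.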


Appendix B Sample SPEF Format


*SPEF "IEEE 1481-1997"
*DESIGN "s27"
*DATE "Mon Dec 13 10:05:00 1999"
*VENDOR "TI"
*PROGRAM "vlog2spef"
*VERSION "1.0"
*DESIGN_FLOW "Dummy From Verilog"
*DIVIDER /
*DELIMITER :
*BUS_DELIMITER []
*T_UNIT 1 NS
*C_UNIT 1 PF
*R_UNIT 1 KOHM
*L_UNIT 1e-3 UH

*PORTS
G17 O *L 0.1
G3 I *S 0.1 0.1
G2 I *S 0.1 0.1
G1 I *S 0.1 0.1
G0 I *S 0.1 0.1
PREZ I *S 0.1 0.1
CLK I *S 0.1 0.1

*D_NET G2 0.1
*CONN
*I NO210_3:A I *L 0.1 *D NO210
*P G2 I *L 0.1
*CAP
0 G2 0.1
1 NO210_3:A 0.1
2 G2:0 0.1
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END

*D_NET G1 0.1
*CONN
*I NO210_2:A I *L 0.1 *D NO210
*P G1 I *L 0.1
*CAP
0 G1 0.1
1 NO210_2:A 0.1
2 G1:0 0.1
*RES
0 G1 G1:0 0.1
1 NO210_2:A G1:0 0.1
*END

*D_NET G17 0.1
*CONN
*I IV110_1:Y O *L 0.1 *D IV110
*P G17 O *L 0.1
*CAP
0 G17 0.1
1 IV110_1:Y 0.1
2 G17:0 0.1
*RES
0 G17 G17:0 0.1
1 IV110_1:Y G17:0 0.1
*END

*D_NET G0 0.1
*CONN
*I IV110_0:A I *L 0.1 *D IV110
*P G0 I *L 0.1
*CAP
0 G0 0.1
1 IV110_0:A 0.1
2 G0:0 0.1
*RES
0 G0 G0:0 0.1
1 IV110_0:A G0:0 0.1
*END

*D_NET G3 0.1
*CONN
*I OR210_1:A I *L 0.1 *D OR210
*P G3 I *L 0.1
*CAP
0 G3 0.1
1 OR210_1:A 0.1
2 G3:0 0.1
*RES
0 G3 G3:0 0.1
1 OR210_1:A G3:0 0.1
*END

*D_NET PREZ 0.1
*CONN
*I DTP10J_0:PREZ I *L 0.1 *D DTP10J
*I DTP10J_1:PREZ I *L 0.1 *D DTP10J
*I DTP10J_2:PREZ I *L 0.1 *D DTP10J
*P PREZ I *L 0.1
*CAP
0 PREZ 0.1
1 DTP10J_0:PREZ 0.1
2 DTP10J_1:PREZ 0.1
3 DTP10J_2:PREZ 0.1
4 PREZ:0 0.1
*RES
0 PREZ PREZ:0 0.1
1 DTP10J_0:PREZ PREZ:0 0.1
2 DTP10J_1:PREZ PREZ:0 0.1
3 DTP10J_2:PREZ PREZ:0 0.1
*END

*D_NET G5 0.1
*CONN
*I DTP10J_0:Q O *L 0.1 *D DTP10J
*I NO210_1:A I *L 0.1 *D NO210
*CAP
0 DTP10J_0:Q 0.1
1 NO210_1:A 0.1
2 G5:0 0.1
*RES
0 DTP10J_0:Q G5:0 0.1
1 NO210_1:A G5:0 0.1
*END

*D_NET CLK 0.1
*CONN
*I DTP10J_0:CLK I *L 0.1 *D DTP10J
*I DTP10J_1:CLK I *L 0.1 *D DTP10J
*I DTP10J_2:CLK I *L 0.1 *D DTP10J
*P CLK I *L 0.1
*CAP
0 CLK 0.1
1 DTP10J_0:CLK 0.1
2 DTP10J_1:CLK 0.1
3 DTP10J_2:CLK 0.1
4 CLK:0 0.1
*RES
0 CLK CLK:0 0.1
1 DTP10J_0:CLK CLK:0 0.1
2 DTP10J_1:CLK CLK:0 0.1
3 DTP10J_2:CLK CLK:0 0.1
*END

*D_NET G6 0.1
*CONN
*I DTP10J_1:Q O *L 0.1 *D DTP10J
*I AN210_0:B I *L 0.1 *D AN210
*CAP
0 DTP10J_1:Q 0.1
1 AN210_0:B 0.1
2 G6:0 0.1
*RES
0 DTP10J_1:Q G6:0 0.1
1 AN210_0:B G6:0 0.1
*END

*D_NET G10 0.1
*CONN
*I DTP10J_0:D I *L 0.1 *D DTP10J
*I NO210_0:Y O *L 0.1 *D NO210
*CAP
0 DTP10J_0:D 0.1
1 NO210_0:Y 0.1
2 G10:0 0.1
*RES
0 DTP10J_0:D G10:0 0.1
1 NO210_0:Y G10:0 0.1
*END
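Each *D_NET record in the sample groups a net's connections (*CONN), grounded capacitances (*CAP, in pF given *C_UNIT 1 PF) and resistances (*RES, in kOhm given *R_UNIT 1 KOHM). The minimal Python reader below illustrates that organisation. It is a sketch under several assumptions:

- The file has one record per line. A real SPEF writer is free to wrap records differently.
- Only 3-field (grounded) *CAP entries are kept; coupling capacitances are ignored.
- Header and *PORTS lines are skipped.

`parse_dnets` and `Net` are hypothetical names, not the API of an existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Net:
    """One *D_NET record (hypothetical container, not a library type)."""
    name: str
    total_cap: float
    pins: list = field(default_factory=list)  # (pin name, direction) from *CONN
    caps: dict = field(default_factory=dict)  # node -> grounded capacitance
    res: list = field(default_factory=list)   # (node1, node2, resistance)


def parse_dnets(text: str) -> dict:
    """Collect *D_NET blocks from SPEF text, assuming one record per line."""
    nets = {}
    net = None
    section = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        key = tokens[0]
        if key == "*D_NET":
            net = Net(tokens[1], float(tokens[2]))
            nets[net.name] = net
            section = None
        elif net is None:
            continue  # header, *PORTS and anything outside a net
        elif key in ("*CONN", "*CAP", "*RES"):
            section = key
        elif key == "*END":
            net, section = None, None
        elif section == "*CONN" and key in ("*I", "*P"):
            net.pins.append((tokens[1], tokens[2]))
        elif section == "*CAP" and len(tokens) == 3:
            net.caps[tokens[1]] = float(tokens[2])  # grounded cap only
        elif section == "*RES" and len(tokens) == 4:
            net.res.append((tokens[1], tokens[2], float(tokens[3])))
    return nets


if __name__ == "__main__":
    sample = """*D_NET G2 0.1
*CONN
*I NO210_3:A I *L 0.1 *D NO210
*P G2 I *L 0.1
*CAP
0 G2 0.1
1 NO210_3:A 0.1
2 G2:0 0.1
*RES
0 G2 G2:0 0.1
1 NO210_3:A G2:0 0.1
*END
"""
    for name, net in parse_dnets(sample).items():
        print(name, net.pins, sum(net.caps.values()), len(net.res))
```

A per-net structure like this is what a power grid or IR-drop flow would consume when it attaches extracted parasitics to each switching net.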


Appendix C Power Waveforms Analysis


AND gate power waveforms at different frequency points. Note that the waveform shape and the peak values match across the frequency range.

Figure 1 1MHz, Peak: 838.9 uW

Figure 2 100MHz, Peak: 840.7 uW

Figure 3 1GHz, Peak: 838.2 uW
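The statement that the peaks match can be checked directly against the three captions. The highest peak is 840.7 uW at 100MHz and the lowest is 838.2 uW at 1GHz, a spread of about 2.5 uW, or roughly 0.3% of the peak. The small Python check below uses only the numbers from the captions.

```python
# Peak power of the AND gate at each frequency point, copied from the captions (uW).
peaks_uw = {"1MHz": 838.9, "100MHz": 840.7, "1GHz": 838.2}

highest = max(peaks_uw.values())
lowest = min(peaks_uw.values())
spread = highest - lowest   # absolute spread in uW
relative = spread / lowest  # spread as a fraction of the smallest peak

print(f"spread = {spread:.1f} uW ({100 * relative:.2f}% of the smallest peak)")
```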


Appendix D Current Characterization Sample SPICE Deck


*
*epic tech="voltage 1.2v"
*epic "vdd 0 1.2 0.01"
*epic "vss 0 0 0.01"
*epic "invoke spice3 %input %output"

* spice options
.inc /user/kalpu/cloc/autochar/userware/spice_options noprint
* temperature = 25
.temp 25
.inc ../user_data/models_strong noprint
*.inc /db/pdk/1233c035a/current/models/current/tis/model.paths.strong noprint
.inc /user/kalpu/cloc/autochar/subckt/sr40/an210h noprint
PVDD 1.2
vvdd vdd 0 PVDD
RVDD VDD VDD_inv1 1000
RVSS VSS_inv1 0 1000
xinv1 A B Y VSS_inv1 vdd_inv1 an210h
*10 MHz
VA A 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
*Vb B 0 PULSE 0 PVDD 1n pslew pslew 50n 100n
Vb B 0 PVDD
Pslew 0.01n
pload 50ff
CY Y 0 pload
.tran 0.01ns 250ns
.MEASURE TR AVGPWR AVG P(Vvdd) FROM=20ns TO=60ns
.punch tr V(Vdd_inv1 vss_inv1)
.punch tr I(VVDD)
.punch tr I(rvdd)
.punch tr V(A B Y)
*.punch tr I(rvdd rvss)
.end
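The deck drives input A with a PULSE source of 50 ns width and 100 ns period (10 MHz, as its comment says) and reports average power over a 20-60 ns window with `.MEASURE ... AVG`. The Python sketch below mimics both pieces offline so their effect is easy to see:

- `spice_pulse` evaluates a SPICE3-style `PULSE(v1 v2 td tr tf pw per)` source.
- `window_average` computes a trapezoidal time average over a window, i.e. the integral divided by the window length. This is the usual definition of an AVG measurement.

Both function names are hypothetical, times are in ns, and the sketch does not reproduce any simulator's sign convention for P(Vvdd).

```python
def spice_pulse(t, v1, v2, td, tr, tf, pw, per):
    """Value of a SPICE3-style PULSE(v1 v2 td tr tf pw per) source at time t."""
    if t < td:
        return v1
    tp = (t - td) % per  # time within the current period
    if tp < tr:
        return v1 + (v2 - v1) * tp / tr  # rising edge
    if tp < tr + pw:
        return v2  # high plateau
    if tp < tr + pw + tf:
        return v2 - (v2 - v1) * (tp - tr - pw) / tf  # falling edge
    return v1


def window_average(times, values, t_from, t_to):
    """Trapezoidal time average of a sampled waveform over [t_from, t_to]."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(zip(times, values), zip(times[1:], values[1:])):
        a, b = max(t0, t_from), min(t1, t_to)
        if b <= a:
            continue  # segment lies outside the window
        # Linearly interpolate the clipped segment end points.
        ya = y0 + (y1 - y0) * (a - t0) / (t1 - t0)
        yb = y0 + (y1 - y0) * (b - t0) / (t1 - t0)
        area += 0.5 * (ya + yb) * (b - a)
    return area / (t_to - t_from)


if __name__ == "__main__":
    # Deck values: PVDD = 1.2 V, pslew = 0.01 ns, 1 ns delay, 50 ns width, 100 ns period.
    times = [i * 0.01 for i in range(25001)]  # 0 to 250 ns in 0.01 ns steps, like .tran
    va = [spice_pulse(t, 0.0, 1.2, 1.0, 0.01, 0.01, 50.0, 100.0) for t in times]
    print(f"average V(A) over 20-60 ns: {window_average(times, va, 20.0, 60.0):.4f} V")
```

With these values, A stays high for about 31 of the 40 ns in the window, so the average comes out close to 0.93 V. Applying the same `window_average` to a sampled supply-power waveform gives the quantity that the `AVGPWR` measurement reports.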


Appendix E Waveform transformation example

Figure 4 1MHz Base Waveform, 830.4 uW

Figure 5 100MHz Transformation, 830.4 uW


Figure 6 1GHz Transformation of the 1MHz Base Waveform, 830.4 uW
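The figures show a waveform characterized at 1MHz being reused at 100MHz and 1GHz with the same 830.4 uW value. This reuse relies on the observation in Appendix C that the shape and peak of the gate's power waveform barely change with frequency. The exact transformation used in this work is not reproduced here. The Python sketch below is only one plausible reading, under these assumptions:

- The base waveform is one period, sampled uniformly.
- The switching event is the span of samples above a user-chosen threshold.
- That event is copied unchanged, while the idle time around it is shrunk to the new period.

`transform_waveform`, the threshold and the synthetic waveform in the example are all hypothetical.

```python
def transform_waveform(base, dt, new_period, threshold):
    """Re-time one period of a slow-clock power waveform to a faster clock.

    base: power samples (uW) over one base period, spaced dt apart.
    The switching event (first to last sample above threshold) is kept
    as-is; only the idle time around it is shortened to new_period.
    """
    active = [i for i, p in enumerate(base) if p > threshold]
    if not active:
        raise ValueError("no switching activity above threshold")
    first, last = active[0], active[-1]
    event = base[first:last + 1]
    idle = [p for i, p in enumerate(base) if i < first or i > last]
    idle_level = sum(idle) / len(idle) if idle else 0.0  # mean idle power
    n = round(new_period / dt)  # samples in one new period
    if n < len(event):
        raise ValueError("new period is shorter than the switching event")
    # Keep the event at its original offset if it still fits, else start at 0.
    offset = first if first + len(event) <= n else 0
    out = [idle_level] * n
    out[offset:offset + len(event)] = event
    return out


if __name__ == "__main__":
    dt = 0.01                # ns per sample (assumed)
    base = [5.0] * 100_000   # 1000 ns period at 1MHz, made-up 5 uW idle level
    for k in range(21):      # synthetic triangular event peaking at 830.4 uW
        base[20 + k] = 5.0 + 825.4 * (1 - abs(k - 10) / 10)
    for f_mhz in (100, 1000):
        out = transform_waveform(base, dt, 1000.0 / f_mhz, threshold=10.0)
        print(f"{f_mhz} MHz: {len(out)} samples, peak {max(out):.1f} uW")
```

Because the event samples are copied verbatim, the peak is identical at every target frequency, consistent with the unchanged 830.4 uW in the captions. The energy per cycle is also unchanged, so the average power grows with frequency.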
