

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 7, JULY 2008

Designing a 3-D FPGA: Switch Box Architecture and Thermal Issues
Aman Gayasen, Vijaykrishnan Narayanan, Mahmut Kandemir, Member, IEEE, and Arifur Rahman

Abstract: Three-dimensional (3-D) integration is an attractive technology to reduce wirelengths in a field-programmable gate array (FPGA). However, it suffers from two problems: first, the inter-layer vias are limited in number, and second, the increased power density leads to high junction temperatures. In this paper, we tackle the first problem by designing switch boxes that maximize the use of the vias. Compared to the previously used subset switch box, our best switch box reduces the number of vias by about 49% and the area-delay product by about 9%. For the second problem, we exploit the difference in power densities between CLBs and some of the hard blocks in modern FPGAs to distribute the power more uniformly across the FPGA. With this change, the peak temperature in a two-layer FPGA drops by about 16 °C.
Index Terms: Field-programmable gate arrays (FPGAs), switch box, thermal issues, three-dimensional (3-D) integration.

I. INTRODUCTION

FIELD-PROGRAMMABLE gate arrays (FPGAs) are consistently improving in capacity and performance, and are now among the most popular devices in the market. With their regular structure, they also scale easily to future technologies. However, the large overheads of their programmable interconnect are severely limiting their growth. In an SRAM-based FPGA, the programmable interconnect resources take almost 70% of the die area and consume most of the FPGA power. Furthermore, for most designs, they also constitute more than 50% of the critical path delay. Therefore, a reduction in the interconnect resources will greatly benefit FPGAs.
Three-dimensional integration is a promising technique for
reducing wire-lengths. It involves the stacking of multiple silicon wafers interconnected with vias. If every layer in a 3-D
chip implements a normal (2-D) FPGA, stacking reduces the
average Manhattan distance between logic blocks, which leads
to fewer interconnect resources. Consequently, 3-D integration
of FPGAs (which we refer to as 3-D FPGA) is an attractive technique to improve the performance of FPGAs. Other gains, such
as reduced design footprint and the ability to integrate different
technologies, further favor 3-D FPGAs.
Manuscript received November 10, 2006; revised March 26, 2007. This work
was supported in part by the National Science Foundation under NSF CAREER
0093085, NSF CCF 0702617, and by a grant from MARCO/GSRC.
A. Gayasen is with the R&D Department, Synopsys, Sunnyvale, CA 94043 USA
(e-mail: gayasen@cse.psu.edu).
N. Vijaykrishnan is with the Departments of Computer Science and Engineering and Electrical Engineering, Pennsylvania State University, University
Park, PA 16802 USA.
M. Kandemir is with the Computer Science and Engineering Department,
Pennsylvania State University, University Park, PA 16802 USA.
A. Rahman is with Xilinx Research Laboratories, San Jose, CA 95124 USA.
Digital Object Identifier 10.1109/TVLSI.2008.2000456

Three-dimensional technology, however, suffers from two problems: 1) the inter-layer vias are limited in number and 2) stacking increases the power density inside the package, which leads to high junction temperatures. These two issues are the focus of this paper. We present architecture-level solutions for increasing via utilization and for reducing the junction temperatures.
The inter-layer vias are limited because they are large compared to the minimum feature size on the die. The finest vias currently available are about 1 µm × 1 µm with a pitch of about 3 µm [1]. By contrast, the global wiring pitch within a die is about 290 nm for 65-nm technology [2]. Although fabrication engineers are trying to reduce the via dimensions, the minimum feature size on the die is also shrinking. Therefore, the inter-layer vias are expected to remain larger than the wire dimensions in the metal layers within a die. In this paper, we tackle this problem by designing switch boxes that maximize the use of the vias. We design six types of switch boxes, each providing a different degree of flexibility for inter-layer connectivity. The architectures are modeled in VPR, which we extended for 3-D. Empirical evaluation using MCNC benchmarks compares our best switch box with the subset switch box used in previous studies [3]. Our best switch box reduces the number of vias by about 49% and the area-delay product by about 9%.
Junction temperature is a growing concern in FPGAs. Recent articles on thermal management from leading FPGA manufacturers [4], [5] clearly indicate the growing importance of thermal issues in FPGA designs. Improvements in fabrication technology, circuit design, architecture, and tools have all contributed toward an increase in FPGA logic density as well as clock frequency. Increased logic density and performance have in turn led to higher power densities, which manifest themselves as high temperatures. Since 3-D integration stacks multiple silicon layers, it increases the effective power density, which makes 3-D integrated circuits (ICs) suffer from severe thermal issues. In this paper, we exploit the difference in power densities between CLBs and some of the hard blocks in modern FPGAs to distribute the power more uniformly across the FPGA. Experiments with a fabric resembling the Virtex-4 FPGA [6] show that our change reduces the peak temperature of a two-layer FPGA by about 16 °C.
The remainder of this paper is organized as follows. Section II discusses related work. Section III gives a brief overview of 3-D technology and 2-D switch boxes. In Section IV, we explore six 3-D switch box (SB) topologies for the case when the vias are fewer than the horizontal wires.¹ The switch box topologies explored in this study are described in Section IV-A. Section IV-B explains the experimentation methodology, and Section IV-C analyzes the exploration results. Section V analyzes the thermal profiles in FPGAs (Section V-A) and then proposes an alternate organization of a two-layer FPGA to reduce the peak die temperature (Section V-B). Finally, Section VI summarizes the contributions of this paper.

¹A preliminary version of this work was presented at FCCM-06 [7].

1063-8210/$25.00 © 2008 IEEE

GAYASEN et al.: DESIGNING A 3-D FPGA: SWITCH BOX ARCHITECTURE AND THERMAL ISSUES
II. RELATED WORK
A. 3-D FPGAs
The advantages of 3-D FPGAs have evoked significant interest, and several studies have looked at them in the past. More
than a decade ago, Alexander et al. [8] presented a 3-D FPGA
that used package-level integration to stack multiple 2-D FPGAs
interconnected using solder bumps. The minimum pitch of these
vertical interconnects was 100 µm. Campenhout et al. [9] proposed opto-electronic FPGAs, in which the inter-chip communication used optical links. The optical links provide a large vertical channel density. The Rothko 3-D FPGA [10] was a 3-D extension of the Triptych sea-of-gates architecture [11], consisting
of routing and logic blocks. The 3-D integration was done at the
wafer-level and inter-layer communication used metal vias. A
dynamically reconfigurable 3-D FPGA was presented in [12],
which consisted of three physical layers: routing and logic block
layer, routing layer, and memory layer. Recently, Lin et al. [13]
analyzed the performance benefits of a monolithically stacked
3-D FPGA. Their 3-D integration technology provided very fine
vias, which allowed them to stack the configuration memory on
top of the rest of the FPGA (logic blocks and interconnects).
Researchers have also looked at theoretical models for 3-D
FPGAs. Rahman et al. [14] presented an analytical model for
predicting interconnect requirements in 3-D FPGAs, and estimated over 50% reduction in channel width, interconnect delay,
and power dissipation, when compared to 2-D FPGAs. Kwon
et al. [15] recently extended this model to incorporate clustered
logic blocks (similar to Virtex-2 [6]).
On the computer-aided design (CAD) front, Ababei et al.
[3], [16] recently presented a partitioning-based placement algorithm for 3-D FPGAs, which primarily focused on reducing
the inter-layer vias. However, their router was not timing-driven.
Although several researchers have proposed 3-D FPGAs, the detailed routing architecture of a 3-D FPGA remains unexplored. Ababei et al. [3] assumed a subset switch block (see the definition in Section III-B). Although Wu et al. [17] designed universal 3-D switch blocks, they used track count as the sole metric of quality. Furthermore, they assumed that the number of inter-layer vias is the same as the horizontal channel width. In today's technology, the vias are much thicker than the horizontal wires (1 µm versus 0.1 µm), especially if we stack more than two layers. This makes the assumption impractical.
B. Thermal Issues
Package designers have been considering thermal issues for a long time. Heat sinks, spreaders, and fans are the most common examples of package-level techniques. Instead of considering variations in the temperatures on the die, they design the package to support the worst-case specifications of the design. They typically provide the user with the thermal resistance $\theta_{JA}$ of the package, which is used to estimate the junction temperature $T_J$ using

$$T_J = T_A + \theta_{JA} \cdot P \qquad (1)$$

where $T_A$ is the ambient temperature, and $P$ refers to the total power consumed by the chip.
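Equation (1) is simple enough to evaluate directly. The Python sketch below is illustrative only. The function name and the package numbers in the usage lines are hypothetical and are not values from this paper.

```python
def junction_temperature(t_ambient_c: float, theta_ja_c_per_w: float, power_w: float) -> float:
    """Estimate the junction temperature T_J = T_A + theta_JA * P, as in (1).

    t_ambient_c      -- ambient temperature T_A, in degrees Celsius
    theta_ja_c_per_w -- package thermal resistance theta_JA, in degrees C per watt
    power_w          -- total power P dissipated by the chip, in watts
    """
    return t_ambient_c + theta_ja_c_per_w * power_w


# Hypothetical package: T_A = 25 C, theta_JA = 2 C/W.
print(junction_temperature(25.0, 2.0, 10.0))
print(junction_temperature(25.0, 2.0, 20.0))
```

Equation (1) is linear in $P$. Doubling the power dissipated through the same package therefore doubles the rise above ambient, assuming $\theta_{JA}$ is unchanged. Because (1) yields a single number for the whole chip, it cannot capture the on-die temperature variations discussed later.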
As designing the package for the worst case junction temperature started becoming too expensive, researchers started
looking at design level solutions to reduce the temperature. A
common example is dynamic thermal management (DTM),
where the design is run at a reduced power (and performance)
if the chip temperature increases beyond a previously set
threshold. Thermal sensors measure the temperature, and power is reduced by lowering the clock frequency or the supply voltage, or by clock gating [18].
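A minimal sketch of the DTM control loop described above, under several assumptions. It assumes one sensor reading per control interval, with frequency scaling as the only knob. The function name, the halve/double policy, and the thresholds are hypothetical and are not taken from [18].

```python
def dtm_next_frequency(temp_c: float, freq_mhz: float, threshold_c: float,
                       f_min_mhz: float, f_max_mhz: float) -> float:
    """One step of a hypothetical threshold-based DTM policy.

    Above the threshold the clock is halved, but never below f_min_mhz.
    Otherwise it is doubled back toward f_max_mhz.
    """
    if temp_c > threshold_c:
        return max(f_min_mhz, freq_mhz / 2)
    return min(f_max_mhz, freq_mhz * 2)


# Hypothetical sensor trace: the clock drops while the die is hot and recovers after.
freq = 400.0
for temp in (80.0, 90.0, 92.0, 70.0):
    freq = dtm_next_frequency(temp, freq, threshold_c=85.0, f_min_mhz=100.0, f_max_mhz=400.0)
    print(temp, freq)
```

The performance loss is visible in the trace: the clock runs at reduced frequency for as long as the sensor reads above the threshold. This cost is why the thermal floorplanning and placement techniques discussed next are attractive.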
Design level techniques can also aid in removing the heat generated by the design. For example, thermal-aware floorplanning
tries to reduce the hotspots on the die by distributing the temperature uniformly [19], [20]. In these works, researchers have mostly focused on microprocessors. Thermal placement is a similar technique applied at the placement stage. Chen and Sapatnekar [21] proposed a partition-driven algorithm for standard
cell thermal placement. Thermal floorplanning and placement
are particularly attractive because they impact the performance
less than DTM.
On the modeling front, several researchers have developed
tools for estimating the die temperature. Among them, HotSpot
[22] is an architecture-level thermal simulator, which can perform transient as well as steady-state temperature estimation.
HS3d [23] is another architecture-level tool that performs only
steady-state temperature estimation, but is orders of magnitude faster than HotSpot. Both HS3d and HotSpot provide the flexibility to set several package and die parameters, such as the spreader thickness, the package-to-air thermal resistance, and the substrate thickness. Since, in this work, we look at only steady-state temperatures, we use HS3d.
Recently, some researchers have proposed solutions for
thermal issues in 3-D ICs too. Cong et al. [24] suggested a thermal-driven floorplanning algorithm for 3-D ICs. Goplen and Sapatnekar
[25] also proposed a temperature-driven placement algorithm
for 3-D standard cell application-specific integrated circuits
(ASICs). Studies have also indicated that careful insertion of
thermal vias can reduce the peak temperature [26], [27].
Thermal issues in FPGAs are relatively unexplored. Some
researchers have proposed the use of distributed sensors for
monitoring temperatures in FPGAs [28], [29]. They, however,
considered only configurable logic blocks (CLBs) in the fabric,
and consequently, observed very little temperature variation
across the die. In contrast, we focus on platform FPGAs,
containing embedded circuit blocks including high-speed transceivers, multipliers, delay-locked loops (DLLs), and memories
[6], [30]. Here, we first characterize the temperature distribution in a modern 2-D FPGA, and then observe how it changes
when we stack multiple such layers. Next, we propose changes
in the placement of hard blocks in the 3-D FPGA to reduce the
die temperature.



TABLE I
VIA PROPERTIES

Fig. 1. Two kinds of stacking. (a) F2f. (b) F2b.

III. BACKGROUND

Fig. 2. 2-D switch boxes. The X and Y labels mark their sides. (a) Subset. (b) Universal.

A. 3-D Technology Overview


3-D chip design is a promising methodology to alleviate many interconnect problems. Current state-of-the-art chips are 2-D, which means that they have only one active layer that contains all the devices. No transistor (device) is stacked on top of another. However, the metal wires interconnecting these devices typically span multiple layers, with the higher layers occupied by global wires. 3-D ICs extend this concept to the devices by stacking multiple device layers in the vertical dimension. In this paper, we use "face" to refer to the side of the wafer with the top-most metal layer and "back" to refer to the opposite side.
Several technologies enable the vertical integration of multiple device layers [31], such as beam recrystallization, silicon epitaxial growth, processed wafer bonding, and solid phase crystallization. Among these technologies, wafer bonding is particularly promising. It involves the bonding of two fully processed wafers, on which the devices and interconnects have already been fabricated. Since the individual wafers are fabricated separately, it is possible to integrate completely different technologies and to have a very large number of layers. The inter-layer vias in this technology can be as fine as 1 µm × 1 µm at a 3-µm pitch [1].

The wafers can be bonded in two ways: face-to-face (f2f) or face-to-back (f2b). In the former, a wafer is inverted to bond with another wafer [see Fig. 1(a)]. This reduces the area overhead of the inter-layer vias because they do not need to pass through the silicon substrate. However, this limits the number of layers to only two. The second way, face-to-back, does not invert the wafer [see Fig. 1(b)]. Consequently, it can integrate more than two layers of silicon. However, since the inter-layer vias now need to pass through the Si layer, they take up die space.

F2f alone is not compatible with flip-chip packaging. Flip-chip bumping requires an exposed face side, and f2f bonding leaves none exposed. F2f with through-die vias (TDVs) may be compatible with flip-chip; in this case, the TDVs bring signals out of the stacked die for bonding. Compared to f2f, f2b is more scalable, because more layers can be added using the same process. These two techniques can also be combined in various mixes of f2f and f2b layers, together with back-to-back (b2b) stacking. In this study, we evaluate the f2f and f2b wafer-bonding techniques for 3-D FPGA integration.

Since the wafer-bonding 3-D technology is still being perfected, several methods are being explored. These methods result in different via dimensions and wafer thicknesses. For this study, we explore three different methods, which result in the via dimensions shown in Table I. Via 1 reflects the process from Tezzaron [1], which uses a wafer thickness of 10 µm. Depending on the process steps, handle wafers may be needed to support the thin wafers. For a two-layer f2f stack, we may be able to avoid the handle wafers if we bond first and then thin the wafers. At the other extreme is via 3, which uses 50-µm wafers and reflects the process in [32]. A larger wafer thickness imparts mechanical strength to the wafers and eliminates the need for handle wafers. Via 2 reflects an intermediate process that we use to illustrate the trends due to via dimensions. Note that via length is important only for f2b integration technology. An integration technology from MIT uses silicon-on-insulator (SOI) wafers to reduce the device layer thickness to less than a micrometer [33]. We do not model this technology in this work.
B. 2-D Switch Boxes
Our study will focus on island-style SRAM-based FPGAs.
FPGAs from Xilinx and Altera belong to this category. The
CLB consists of lookup tables (LUTs) and flip-flops (FFs).
Routing wires (tracks) and programmable switches constitute
the routing channel. Channel width refers to the number of
tracks in a channel. The CLBs connect to the channel through
connection boxes. The routing wires connect among themselves
through switch boxes.
Switch box topology refers to the connectivity provided by
the switch box. Researchers have explored several topologies
[34]–[38] (see Fig. 2). The subset (also called disjoint) topology, used in Xilinx XC4000 FPGAs, connects tracks of the same number in all four directions. This divides the channel into disjoint sets of tracks, and a net uses the same track number for its entire route. The universal topology provides more flexibility than the disjoint topology: it supports connectivity for all possible global routes of two-terminal nets.
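The subset topology is easy to state as a connection list. The sketch below is a hypothetical illustration. The function names and the (side, track) encoding are introduced only for this example and are not VPR's internal representation. The sketch builds a subset switch box and checks two properties stated above. First, every incoming wire reaches exactly three others, one per remaining side. Second, no switch ever joins two different track numbers.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")  # the four sides of a 2-D switch box


def subset_switch_box(width: int) -> set[frozenset[tuple[str, int]]]:
    """Switches of a 2-D subset (disjoint) switch box with `width` tracks per side.

    Each switch joins two terminals, written as (side, track). The subset
    topology only ever joins terminals that carry the same track number.
    """
    switches = set()
    for track in range(width):
        for a, b in combinations(SIDES, 2):
            switches.add(frozenset({(a, track), (b, track)}))
    return switches


def flexibility(switches, terminal) -> int:
    """Number of switches attached to `terminal`, i.e., wires it can reach."""
    return sum(1 for s in switches if terminal in s)


sb = subset_switch_box(4)
print(len(sb))                    # 6 switches per track
print(flexibility(sb, ("N", 0)))
```

Because every switch keeps the track number, the routing fabric splits into `width` disjoint planes. A universal box breaks this invariant to gain flexibility.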
Research has shown that the universal switch box results in fewer tracks in the channel [39]. Hyper-universal switch boxes provide even greater flexibility, and facilitate the connectivity for all possible global routes of k-terminal nets [40]. However, they use more switches than universal switch boxes.

Fig. 3. 3-D FPGA.
IV. 3-D DETAILED ROUTING ARCHITECTURE
We extend the island-style architecture of 2-D FPGAs to 3-D (see Fig. 3). The CLB consists of LUTs and FFs. The switch box is modified to connect the inter-layer vias (ILVs) to the horizontal wires (CHANX and CHANY), and also to other ILVs. The ILVs form channels in the vertical direction (CHANZ). The architecture is symmetric in the x and y directions, i.e., CHANX and CHANY contain the same number of tracks. CHANZ, however, differs from CHANX and CHANY in its width, which is influenced by the via density provided by the 3-D technology. We use $V$ to refer to the number of vias (i.e., the vertical channel width) and $H$ for the horizontal channel width. Fig. 3 illustrates the resulting architecture.
CHANZ differs from CHANX and CHANY in another respect too. The length of these vias depends on the wafer thickness, which is typically much smaller than the average 2-D wirelength. For example, the wafer thickness is about 10 µm for Tezzaron's process [1], whereas a wire spanning 4 CLBs in a 65-nm process is about 150 µm long. These differences between vertical and horizontal channels must be accounted for to design a good 3-D FPGA. Next, we describe the various 3-D architectures we explored. Where appropriate, we also discuss how technology parameters influence our design.
A. Switch Box Topology
The flexibility $F_s$ of a switch box (SB) refers to the number of wires to which each incoming wire can connect. Previous studies have shown that for a 2-D FPGA, an $F_s$ of 3 provides good routability [34]. In such SBs, a track connects to one track on each of the other three sides of the SB. Subset and universal topologies are examples of such SBs (see Fig. 2).
These 2-D SBs are extended to 3-D by adding two more faces, which contain terminals for vertical wires: one for going up and another for going down. Since the vias are fewer than the horizontal wires, the two vertical faces contain fewer terminals than the other four.


Fig. 4 shows the SBs we created for this study for $H = 4$ and $V = 2$. Normally, the 3-D SB is visualized as a cube, where each face of the cube represents one of the directions. However, for ease of illustration, we have flattened the SB and shown it as a hexagon. Each side of the hexagon represents a direction: North (N), East (E), South (S), West (W), top (T), or bottom (B). Furthermore, we show only the connections to the vertical faces (T and B). For all SBs, the horizontal wires (CHANX and CHANY) use either the subset or universal connections among themselves. These connections were described in Section III-B and illustrated in Fig. 2. For clarity, we do not show the horizontal connections in Fig. 4. The first four SBs use subset connections among the horizontal wires, and the last two use universal. Fig. 4 also tabulates the connections from the vertical faces, where $X_i$ refers to the $i$th terminal on the $X$ face of the SB. The Appendix formally describes the six SBs.
The first SB [subset, see Fig. 4(a)] is an extension of the 2-D subset SB. This SB connects the same track number on all sides. Consequently, the entire routing fabric gets divided into disjoint subsets, and a net uses the same track number for its entire route. Note that only the first $V$ of the $H$ horizontal wires connect to the vias. These wires have a flexibility of 5: three connections to the other horizontal directions and two to the vertical ones. The other $H - V$ wires connect to only horizontal tracks (flexibility 3). Apart from decreasing the routing flexibility, this results in a difference in the capacitive loads of the horizontal wires: large for the first $V$ wires and small for the rest.
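The subset SB described above can be sketched by adding the T and B faces to the 2-D connection list. The sketch rests on one assumption. Following "connects the same track number on all sides," it also places a T–B switch for each via index, so a via can continue straight through the layer. The names and encoding are hypothetical and introduced only for illustration.

```python
from itertools import combinations

HORIZONTAL = ("N", "E", "S", "W")
VERTICAL = ("T", "B")  # top (going up) and bottom (going down)


def subset_3d(h: int, v: int) -> set[frozenset[tuple[str, int]]]:
    """Hypothetical 3-D subset switch box with h horizontal tracks and v vias per vertical face.

    Track t is joined to the same track number on every face that has it;
    only track numbers 0..v-1 exist on the vertical faces.
    """
    if v > h:
        raise ValueError("this sketch assumes at most as many vias as tracks")
    switches = set()
    for t in range(h):
        faces = HORIZONTAL + (VERTICAL if t < v else ())
        for a, b in combinations(faces, 2):
            switches.add(frozenset({(a, t), (b, t)}))
    return switches


def flexibility(switches, terminal) -> int:
    """Number of switches attached to `terminal`."""
    return sum(1 for s in switches if terminal in s)


sb = subset_3d(h=4, v=2)
print(flexibility(sb, ("N", 0)))  # a track that reaches the vias
print(flexibility(sb, ("N", 3)))  # a track that does not
```

The per-terminal switch count doubles as a rough proxy for the load imbalance noted above. In this sketch, horizontal tracks 0 and 1 carry five switches each, and tracks 2 and 3 carry three.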
The second SB [subset-split, see Fig. 4(b)] modifies the subset SB by allowing the first $V$ horizontal tracks to connect to the vias going above, and the last $V$ to those going below. As a result, twice as many horizontal wires connect to the vertical wires. Therefore, if nets do not fan out at the SB, then this SB provides greater flexibility in the vertical directions. A limitation, however, is that the first $V$ tracks can only go above, and the last $V$ only below. Consequently, if a net needs to fan out to both top and bottom, then it needs to use two horizontal tracks (compared to one for subset). This SB distributes the capacitive loads on the horizontal tracks more evenly than the subset SB.
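The difference between subset and subset-split lies only in which horizontal tracks reach a via. The sketch below lists only the vertical-to-horizontal switches, as Fig. 4 does. It omits via-to-via and horizontal-to-horizontal switches. The function names are hypothetical.

```python
HORIZONTAL = ("N", "E", "S", "W")


def split_vertical_links(h: int, v: int) -> set[tuple[str, int, str, int]]:
    """Vertical-to-horizontal switches of a hypothetical subset-split box.

    Returns (vertical_face, via, horizontal_side, track) tuples. The first v
    tracks reach the vias going up (T); the last v reach the vias going down (B).
    """
    if 2 * v > h:
        raise ValueError("this sketch assumes the two groups do not overlap (2v <= h)")
    links = set()
    for i in range(v):
        for side in HORIZONTAL:
            links.add(("T", i, side, i))          # first v tracks go up
            links.add(("B", i, side, h - v + i))  # last v tracks go down
    return links


def tracks_reaching_vias(links) -> set[int]:
    """Horizontal track numbers that have at least one via connection."""
    return {track for _, _, _, track in links}


print(sorted(tracks_reaching_vias(split_vertical_links(4, 2))))
print(sorted(tracks_reaching_vias(split_vertical_links(8, 2))))
```

For $H = 4$ and $V = 2$, every horizontal track now reaches a via, whereas in the subset SB only tracks 0 and 1 do. The number of vertical switches stays the same, so the extra vertical flexibility comes at no extra switch area.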
The subset-split SB, although more flexible than subset, suffers from the disjoint property of subset SBs: the entire routing fabric is divided into disjoint subsets, and a net can use only one of those subsets. Each disjoint subset consists of the $i$th vertical track and the $i$th and $(H - V + i)$th horizontal tracks (where $0 \le i < V$). To improve upon this, we modified the connections to the vertical faces as shown in Fig. 4(d). Now, a vertical terminal connects to track 1 on one horizontal side but to track 0 on another. This allows a net to switch tracks at the SBs. We call this SB subset-twist.
The main objective of the subset-twist SB is to improve the flexibility in the vertical direction. Another way to achieve this is to add more switches to the vertical faces. The next SB, subset-more [see Fig. 4(c)], takes this approach. Here, vertical terminal $i$ connects to both the $i$th and the $(i + V)$th terminals on the horizontal faces (where $0 \le i < V$). The extra switches have a twofold effect. On the one hand, they improve the flexibility in the vertical direction, and on the other, they increase the area of the SB and the capacitive loads on the wires.
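The subset-more connection rule can be sketched as follows: terminal $i$ on each vertical face reaches tracks $i$ and $i + V$ on each horizontal side. The sketch counts only vertical-to-horizontal switches. It assumes $2V \le H$ so that track $i + V$ exists. The names are hypothetical.

```python
HORIZONTAL = ("N", "E", "S", "W")
VERTICAL = ("T", "B")


def more_vertical_links(h: int, v: int) -> set[tuple[str, int, str, int]]:
    """Vertical-to-horizontal switches of a hypothetical subset-more box.

    Vertical terminal i on each vertical face connects to tracks i and i + v
    on every horizontal side (0 <= i < v), twice the vertical switches of
    the plain subset box.
    """
    if 2 * v > h:
        raise ValueError("this sketch assumes 2v <= h so that track i + v exists")
    links = set()
    for face in VERTICAL:
        for i in range(v):
            for side in HORIZONTAL:
                links.add((face, i, side, i))
                links.add((face, i, side, i + v))
    return links


links = more_vertical_links(4, 2)
print(len(links))  # total vertical-to-horizontal switches
print(sum(1 for f, i, _, _ in links if (f, i) == ("T", 0)))  # fanout of one via terminal
```

The sketch shows the tradeoff described above. Each via terminal now reaches eight horizontal terminals instead of four, but the switch count, and with it SB area and wire loading, doubles.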


Fig. 4. 3-D switch boxes for $H = 4$, $V = 2$. (a) Subset. (b) Subset-split. (c) Subset-more. (d) Subset-twist. (e) Universal-twist. (f) Universal-more.

The next two switch boxes use universal connections among the horizontal wires. The vertical connections in the universal-twist SB are identical to those in the subset-twist SB [see Fig. 4(e)]. However, due to the universal connections among the horizontal wires, it provides greater flexibility. The last SB, universal-more, further increases the flexibility by adding more switches to the vertical faces. For example, in Fig. 4(f), track 0 on a vertical face connects to both tracks 1 and 3 on a horizontal side. These extra switches improve the flexibility in the vertical direction, but also increase the area of the SB and the capacitive loads on the wires.
B. Experimentation
We modified VPR [41], an FPGA place-and-route tool available from the University of Toronto, to model our 3-D FPGA architectures. We refer to this tool as 3-D VPR. It uses simulated annealing to place the logic blocks and then routes the nets using a modified PathFinder algorithm. Both placement and routing are timing-driven, i.e., they try to reduce the delays of critical paths.
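The 2-D placement algorithm of VPR optimized a cost function with two components. A timing component is built from the estimated source-to-sink delays. A congestion-aware bounding-box wiring component is

$$\text{WiringCost} = \sum_{i=1}^{N_{\text{nets}}} q(i)\left[\frac{bb_x(i)}{C_{av,x}(i)^{\beta}} + \frac{bb_y(i)}{C_{av,y}(i)^{\beta}}\right]$$

where $N_{\text{nets}}$ is the number of nets in the design. The timing component uses $k_i$, the number of sink pins of net $i$, and $\text{Delay}(i,j)$, the estimated delay from the source of net $i$ to sink number $j$ ($1 \le j \le k_i$). For each net $i$, $bb_x(i)$ and $bb_y(i)$ denote the x and y spans of its bounding box, respectively. The bounding-box wirelength model underestimates the wiring needed to connect nets with more than three terminals. The $q(i)$ factor compensates for this, and its value depends on the number of terminals of net $i$. $C_{av,x}(i)$ and $C_{av,y}(i)$ are the average channel capacities in the x- and y-directions, respectively, over the bounding box of net $i$. The value of $\beta$ adjusts the weight given to congestion in the cost function. The larger the value of $\beta$, the more wiring in narrow channels is penalized relative to wiring in wider channels. A value of 1 has been previously found to work best and is used in this work.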
To the 2-D cost function, we add a term to reduce the vertical span of the nets. This is similar to what was proposed in [3]. However, like the congestion cost terms for the x- and y-directions, our term also incorporates congestion in the z-direction.


Fig. 5. Experimentation flow.

By varying the values of three cost-function parameters for two of the benchmark designs, we identified the setting that gives the smallest critical path delay. Hence, we use these values in all our experiments.
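The congestion-weighted bounding-box wiring term can be sketched as follows, extended with a vertical term. The z term below simply mirrors the x and y terms. This is an assumption, since the exact form of the vertical term is not reproduced here. The $q(i)$ values are supplied by the caller rather than computed, and the timing component of the cost is omitted. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NetBox:
    """Bounding-box data for one net (all fields are caller-supplied inputs)."""
    bb_x: float   # x span of the bounding box, in CLBs
    bb_y: float   # y span of the bounding box, in CLBs
    bb_z: float   # vertical span, in layers
    q: float      # fanout correction q(i)
    cav_x: float  # average x-channel capacity over the bounding box
    cav_y: float  # average y-channel capacity over the bounding box
    cav_z: float  # average vertical (via) channel capacity over the bounding box


def wiring_cost(nets: list[NetBox], beta: float = 1.0, include_z: bool = True) -> float:
    """Sum over nets of q(i) * (bb_x/C_x^beta + bb_y/C_y^beta [+ bb_z/C_z^beta])."""
    total = 0.0
    for n in nets:
        term = n.bb_x / n.cav_x ** beta + n.bb_y / n.cav_y ** beta
        if include_z:
            # Assumed vertical term, mirroring the x and y congestion terms.
            term += n.bb_z / n.cav_z ** beta
        total += n.q * term
    return total


net = NetBox(bb_x=2, bb_y=3, bb_z=1, q=1.0, cav_x=10, cav_y=10, cav_z=5)
print(wiring_cost([net], include_z=False))  # 2-D wiring term only
print(wiring_cost([net]))                   # with the assumed z term
```

With this form, a narrower vertical channel (smaller `cav_z`) makes each layer crossing more expensive. That pushes the annealer to keep nets within fewer layers.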
1) Architecture and Technology Parameters: The logic
blocks in our experiments consist of four four-input LUTs and
four FFs, with ten inputs and four outputs. All the inputs are
equivalent, and so are the outputs; that is, every input pin can
internally drive any LUT input. The pins are uniformly distributed around the sides of the CLB. Each output pin connects
to 25% of the tracks in the adjacent channel, and every input pin
connects to 60% of the adjacent tracks. All horizontal segments
(CHANX and CHANY) in the routing fabric span 1 CLB, and
are driven by tri-state buffers.
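The logic-block and connectivity parameters above can be collected in one configuration object. This is a hypothetical sketch, not VPR's architecture-file format. Rounding the per-pin track count to the nearest integer is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbArch:
    """Logic-block and connectivity parameters listed in the text."""
    luts_per_clb: int = 4
    lut_inputs: int = 4
    ffs_per_clb: int = 4
    clb_inputs: int = 10
    clb_outputs: int = 4
    fc_out: float = 0.25     # fraction of adjacent-channel tracks each output pin drives
    fc_in: float = 0.60      # fraction of adjacent-channel tracks each input pin connects to
    segment_length: int = 1  # CHANX/CHANY segments span one CLB

    def tracks_per_pin(self, channel_width: int) -> tuple[int, int]:
        """(output, input) track counts per pin; rounding to nearest is an assumption."""
        return round(self.fc_out * channel_width), round(self.fc_in * channel_width)


arch = ClbArch()
print(arch.tracks_per_pin(40))  # for a hypothetical 40-track channel
```

Because the per-pin connection counts scale with channel width, the connection-box area grows with the channel width that the flow selects.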
The vertical channel (CHANZ) has vias that span only a single layer. When these vias are very short (10 µm), we use minimum-size pass-transistor switches to drive them. However, when they are 50 µm high, we use a 5X tri-state buffer switch to drive them. In contrast, the buffers driving the CHANX and CHANY segments are always 5X the minimum size and consist of two stages.
We calculated the resistance and capacitance values for the vias and horizontal wires using the Predictive Technology Model (PTM) [42]. The vias and wires are made of copper in our target technology. Timing parameters for the switches were derived from SPICE simulations using the 65-nm BPTM.
We explored a spectrum of 3-D technologies: the via properties shown in Table I, numbers of layers varying from 2 to 5, and either f2f or f2b bonding technology. The finest vias, of 1-µm thickness, are in line with Tezzaron's process [1], while the coarsest ones (of 5-µm thickness) reflect the process from [32]. Perfect alignment between layers is assumed.
2) Experimentation Flow: Fig. 5 shows the experimentation flow. A design in BLIF format is packed into clusters (CLBs) of 4-LUTs using T-VPack. On the basis of the number of CLBs in the design, 3-D VPR creates the smallest FPGA fabric that would contain the design. It takes the number of layers as an input and finds the minimum square size of one layer, assuming that all layers contain the same number of CLBs. The packed netlist is then placed and routed using 3-D VPR to find the minimum number of vias for a large horizontal channel width. The router performs a binary search over the number of vias to find the minimum value. Fixing the number of vias to 130% of the minimum value, we reroute the design to find the minimum possible channel width. Thus, this flow gives priority to reducing the number of vias instead of the channel width, which makes sense because the vias take more area than the horizontal wires. However, most FPGAs provide more than the minimum number of tracks to ensure good performance in the worst case too. Along similar lines, we add 30% to the minimum via and channel-width numbers when evaluating the FPGA. Using these values (which may be different for every design), we reroute the design to obtain the critical path delay of the routed design. This flow is repeated for every switch-box type for all 20 MCNC benchmark designs.
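The sizing and search steps of this flow can be sketched as follows. `routes` is a hypothetical stand-in for a full 3-D VPR routing attempt at a given via count; the real router is not shown. The binary search is valid only under the assumption that routability is monotone in the via count. The 30% margin is applied with integer arithmetic, rounding up, to avoid floating-point surprises such as `1.3 * 10` exceeding 13.

```python
from typing import Callable


def layer_side(n_clbs: int, n_layers: int) -> int:
    """Smallest square layer side that holds n_clbs spread evenly over n_layers."""
    per_layer = -(-n_clbs // n_layers)  # ceiling division
    side = 0
    while side * side < per_layer:
        side += 1
    return side


def min_routable(lo: int, hi: int, routes: Callable[[int], bool]) -> int:
    """Binary search for the smallest value in [lo, hi] for which routes() succeeds.

    Assumes routability is monotone (more vias never hurt) and routes(hi) is True.
    """
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def with_margin(minimum: int, percent: int = 30) -> int:
    """Add a percentage margin and round up, using integer arithmetic only."""
    return -(-minimum * (100 + percent) // 100)


print(layer_side(1000, 5))                          # per-layer array side
min_vias = min_routable(1, 64, lambda v: v >= 23)  # toy stand-in for the router
print(min_vias, with_margin(min_vias))
```

Each probe of `routes` is a full routing run, so the binary search keeps the number of expensive runs logarithmic in the search range.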
3) Area Model: VPR estimates area by counting the number of transistors in the fabric. This works because the 2-D FPGA area is transistor-dominated. In the case of 3-D, however, we must add the via areas to the transistor areas. The two types of 3-D integration technologies discussed in Section III-A need different area models. In the case of f2f bonding, the ILVs do not pass through the silicon (see Fig. 1). Consequently, they do not take any die area. In contrast, f2b bonding requires vias to pass through the silicon (through-vias). In this case, every via consumes some Si area. We incorporate the area overhead of these through-vias in our area estimates as follows:

$$A_{\text{total}} = A_{\text{transistors}} + A_{\text{vias}}$$

The $A_{\text{transistors}}$ and $A_{\text{vias}}$ numbers are estimated by VPR modified for 3-D. The via area is calculated by counting the number of vias and multiplying it by the area taken by a via. While comparing the area of two architectures, we estimate the total FPGA area and divide it by the number of CLBs in the fabric to estimate the area per CLB. Thus, the area numbers in Section IV-C include the area for one logic block (CLB) and the routing resources (horizontal wires, switches, and vias) associated with it.
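The per-CLB area model can be sketched in a few lines. The transistor area and via count would come from 3-D VPR. Here they are hypothetical inputs in arbitrary but consistent area units. Via area counts only for f2b, as described above.

```python
def area_per_clb(transistor_area: float, n_vias: int, via_area: float,
                 n_clbs: int, bonding: str) -> float:
    """Total fabric area divided by the number of CLBs.

    f2b: through-silicon vias consume die area, so their area is added.
    f2f: vias do not pass through the silicon, so they add no die area.
    """
    if bonding not in ("f2f", "f2b"):
        raise ValueError("bonding must be 'f2f' or 'f2b'")
    via_overhead = n_vias * via_area if bonding == "f2b" else 0.0
    return (transistor_area + via_overhead) / n_clbs


print(area_per_clb(1000.0, 50, 2.0, 10, "f2b"))
print(area_per_clb(1000.0, 50, 2.0, 10, "f2f"))
```

Under this model, the f2f area does not depend on via pitch at all, which is consistent with the constant two-layer f2f area reported in the results.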
C. Results and Analysis
Here, we show the results for two extremes of 3-D integration: first, a simple stack of two layers and, second, a more aggressive stack of five layers. Together they capture the trends seen by varying the number of layers in a 3-D FPGA. While the two-layer FPGA can be fabricated using f2f or f2b wafer bonding, the five-layer FPGA must be fabricated using f2b. For all these technology points, we evaluate the effects of the different via dimensions shown in Table I. The metric we primarily look at to evaluate an architecture is the area-delay product (ADP), because it is inversely proportional to the throughput of the device [43]. In all the figures in this section, we plot the geometric means over 20 MCNC benchmarks.

TABLE II
EXPERIMENTAL RESULTS FOR UNIVERSAL-TWIST (65 nm, 3-µm PITCH VIAS)

Fig. 6. Comparing 2-D and 3-D FPGAs.
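The averaging used for the plotted metrics can be sketched as follows. The per-benchmark inputs are hypothetical placeholders. The text states only that geometric means over the 20 MCNC benchmarks are plotted.

```python
import math


def geomean(values: list[float]) -> float:
    """Geometric mean of strictly positive values."""
    if not values or any(x <= 0 for x in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(x) for x in values) / len(values))


def mean_adp(areas: list[float], delays: list[float]) -> float:
    """Geometric mean, over benchmarks, of the per-benchmark area-delay products."""
    if len(areas) != len(delays):
        raise ValueError("need one area and one delay per benchmark")
    return geomean([a * d for a, d in zip(areas, delays)])


areas = [2.0, 8.0]    # hypothetical per-benchmark areas
delays = [3.0, 12.0]  # hypothetical per-benchmark delays
print(mean_adp(areas, delays), geomean(areas) * geomean(delays))
```

A useful property of the geometric mean is that the mean of the products equals the product of the means. The averaged ADP is therefore exactly the averaged area times the averaged delay, which an arithmetic mean would not guarantee.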
The first step towards evaluating 3-D FPGAs is comparing them with 2-D FPGAs. Fig. 6 shows the average area (per CLB), delay, and ADP for 1, 2, and 5 layers in 65-nm technology. For both 2 and 5 layers, it shows the results for the three via technologies of Table I. The key 2-layers-f2f-3µm in Fig. 6 refers to the use of two device layers, stacked using f2f bonding with vias at 3-µm pitch (via 1 in Table I). Fig. 6 uses the same switch box (universal-twist) for all cases.
The area is estimated as explained in Section IV-B3. Note that area reduces as we increase the number of layers or reduce the pitch of the vias. The smallest area is obtained when five layers are used with 3-µm-pitch vias, in which case the CLB area is only 84% of the single-layer case. Furthermore, we observe that the area of the two-layer FPGA using f2f bonding remains constant with increasing via pitch. This happens because the vias in this case are accommodated within the transistors' footprint, and the CLB area is determined by the transistors.
The critical path delay also reduces with increasing number of layers (second set of bars in Fig. 6). The five-layer FPGA with 5-µm-pitch vias (best case) reduces the delay by 24.7% compared with the single-layer case and by 14% compared with the two-layer case. This happens because interconnect lengths (and hence delays) reduce as we increase the number of layers. F2f and f2b technologies do not have any significant impact on the delay.
The reductions in area and delay in 3-D combine to significantly reduce the area-delay product of the FPGA (third set of bars in Fig. 6). Compared to a single-layer FPGA, the five-layer FPGA reduces the area-delay product by 36% (for 3-μm-pitch vias), while a two-layer FPGA does so by about 20%. These results justify the interest in 3-D FPGAs and demonstrate that we can obtain significant improvements even by the relatively simple integration of two FPGA layers. The results also indicate that even with the moderately aggressive 5-μm-pitch vias, we can significantly improve upon 2-D FPGAs. Table II tabulates some of the results for the five-layer FPGA using the universal-twist switch box.
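Since ADP is a product, its relative change factors into the relative changes of area and delay, and because the geometric mean is multiplicative this also holds for the averaged values. As a rough consistency check that takes the rounded figures quoted above at face value (five layers with 3-μm-pitch vias: 84% of the single-layer area and a 36% lower ADP), and assumes the 84% CLB-area figure is the area term of the ADP, the implied delay ratio is:

```latex
\frac{\mathrm{ADP}_{5L}}{\mathrm{ADP}_{1L}}
  = \frac{A_{5L}}{A_{1L}} \cdot \frac{D_{5L}}{D_{1L}}
  \quad\Longrightarrow\quad
\frac{D_{5L}}{D_{1L}} \approx \frac{1 - 0.36}{0.84} = \frac{0.64}{0.84} \approx 0.76
```

This corresponds to a delay reduction of roughly 24%, close to the 24.7% quoted above for the five-layer FPGA with 5-μm-pitch vias.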
Now, we explore the different switch boxes to find which one gives the best values for area, delay, and area-delay product. Fig. 7 shows the results for five layers, using the 65-nm process and 3-μm-pitch vias (via 1 in Table I). The results for two layers follow a similar trend. The first set of bars in Fig. 7 compares the flexibility of the various SBs in the vertical direction by looking at the minimum number of vias they need for the designs to route. Observe that the universal-more SB provides the greatest vertical flexibility, that is, it needs the fewest vias. In fact, it uses only 49% of the vias needed by the subset SB. It also results in the minimum channel width among all the SBs. However, the total area is determined by both the vias and the number of transistors in the fabric. Since universal-more uses extra switches to increase flexibility, the total area of the FPGA using the universal-more SB is larger than that of the one using the universal-twist SB. This indicates that the universal-twist SB provides greater flexibility per switch than the universal-more SB.

Fig. 7. Comparing the switch boxes for five-layer FPGA.
While the area falls to 88% of its subset-SB value when the universal-twist SB is used instead of the subset SB, the critical path delay does not show such a strong variation. This happens because the timing-driven router of 3-D VPR gives less weight to congestion for timing-critical nets, which implies that they almost always take the shortest possible route. The smallest delay is obtained for the subset-split case. Note that adding more switches to the SB increases the delay, which is explained by the larger parasitic capacitances of these switches. Because the variation in delay is small, the trend for the area-delay product is similar to that for area. The universal-twist SB offers the lowest area-delay product, 91% of that for the subset SB.

Fig. 8. Comparing the switch boxes for different via technologies for five-layer FPGA.

Fig. 9. Comparing the switch boxes for different process nodes for five-layer FPGA.

Fig. 10. Thermal profile of 4VFX100.
Next, we explore how the via properties affect the choice of SB for the five-layer FPGA. Fig. 8 compares the area-delay product for different SBs for the three via technologies of Table I.
Intuitively, as the vias become larger, we would prefer the SB that requires the fewest vias. Fig. 8 demonstrates this trend. As vias become larger, the difference between the area-delay products for universal-twist and universal-more (which requires the fewest vias) shrinks. This happens because, as vias become larger, the area taken by the vias starts to dominate the total area. However, even for 10-μm-pitch vias (the largest case), the universal-twist SB continues to provide the smallest area-delay product.
We also look at the effect of technology scaling on the performance of our SBs in a five-layer FPGA (see Fig. 9). The vias are assumed to remain at 3-μm pitch while the CMOS technology scales from 65 to 45 and 32 nm. Again, the universal-twist SB remains the best for all process nodes. Since the via dimensions remain constant across the process nodes, the area penalty due to through-vias increases as transistor dimensions shrink. Consequently, the universal-more SB (which requires the fewest vias) improves relative to the other SBs as the process scales. However, even for the 32-nm node, the universal-twist SB remains the best from an area-delay product perspective.
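The trend in the two paragraphs above can be illustrated with a simple linear model in which an SB's area is its switch area plus its via area. This is a hypothetical sketch: the switch and via counts are invented for illustration and are not measured values for universal-twist or universal-more. It shows the area gap between an SB with fewer switches but more vias and one with more switches but fewer vias narrowing as via area becomes relatively more important, either because the via pitch grows or because switch area shrinks with process scaling while the via pitch stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchBoxCost:
    """Hypothetical per-SB resource counts for a linear area model."""
    name: str
    switches: int  # programmable switches per SB
    vias: int      # inter-layer vias per SB

    def area(self, switch_area_um2: float, via_pitch_um: float) -> float:
        # Each switch and each via (pitch squared) occupies a fixed area.
        return self.switches * switch_area_um2 + self.vias * via_pitch_um ** 2


# Invented counts: one SB spends fewer switches, the other trades extra
# switches for fewer vias (loosely mimicking universal-twist vs. universal-more).
FEWER_SWITCHES = SwitchBoxCost("fewer_switches", switches=100, vias=20)
FEWER_VIAS = SwitchBoxCost("fewer_vias", switches=140, vias=10)


def area_gap(switch_area_um2: float, via_pitch_um: float) -> float:
    """Area of FEWER_VIAS minus FEWER_SWITCHES; positive means FEWER_SWITCHES is smaller."""
    return (FEWER_VIAS.area(switch_area_um2, via_pitch_um)
            - FEWER_SWITCHES.area(switch_area_um2, via_pitch_um))


# Larger vias at a fixed (hypothetical) switch area of 50 um^2.
for pitch in (3.0, 5.0, 10.0):
    print(f"via pitch {pitch:4.1f} um: gap {area_gap(50.0, pitch):7.1f} um^2")

# Smaller switches (process scaling) at a fixed 3-um via pitch.
for switch_area in (50.0, 25.0, 12.5):
    print(f"switch area {switch_area:5.1f} um^2: gap {area_gap(switch_area, 3.0):7.1f} um^2")
```

In this toy model the gap shrinks in both sweeps but stays positive, mirroring the reported trend; with other invented counts it could flip sign, so the model illustrates the mechanism rather than predicting where a crossover would occur.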
V. THERMAL ISSUES IN 3-D FPGAS
Die temperature must be controlled because it impacts the timing, leakage power, package design, and lifetime of the device. Circuits run slower when they are hot, and their lifetime decreases exponentially with increasing temperature. Besides, plastic packages can only withstand relatively low temperatures. Furthermore, leakage power increases exponentially with temperature, which can cause thermal runaway: the extra leakage heats the die further, which in turn raises the leakage.
All these factors have forced chip manufacturers to employ
techniques to control the die temperature. Section I described
some of these techniques.
Thermal issues in FPGAs are relatively unexplored. Some researchers have proposed the use of distributed sensors for monitoring temperatures in FPGAs [28], [29]. They, however, considered only CLBs in the fabric and, consequently, observed very little temperature variation across the die. In contrast, we focus on platform FPGAs, which contain embedded circuit blocks including high-speed transceivers, multipliers, DLLs, and memories [6], [30]. Here, we first characterize the temperature distribution in a modern 2-D FPGA and then observe how it changes when we stack multiple such layers. Next, we propose changes in the placement of hard blocks in the 3-D FPGA to reduce the die temperature.
A. Thermal Characterization of FPGAs: 2-D to 3-D
Most modern FPGAs incorporate hard blocks in the fabric. These blocks exhibit different power characteristics, leading to variations in power density within the chip. We calculated the power numbers for blocks in a Virtex-4 FX100 device by using Xilinx power spreadsheets and observed that the power densities vary by a factor of almost 15, from 0.78 for the DSP blocks to 11.46 for the DCMs (see Table III). Such a vast range results in large temperature variations within the FPGA die (see Fig. 10). The hotspots, which occur near the MGTs and DCMs, are about 14 °C hotter than the coolest portions.
Table IV shows the temperatures for 3-D FPGAs consisting of identical 4VFX100 layers. The temperatures were estimated using HS3d [23] with the parameters listed in Table V. The package thermal resistance of 0.5 °C/W reflects a high-end package with a moderate heat sink. Note that, for both 2-D and 3-D FPGAs, we used the same power numbers for individual blocks (listed in Table III). This doubles the total FPGA power for a two-layer FPGA compared to a single-layer FPGA. This is a pessimistic estimate, because power consumption in the routing fabric is expected to decrease when we stack multiple layers (because of the reduction in the number of switches and their load capacitances). We estimated temperatures for two extremes of 3-D technologies: one with very thin layers and fine vias (Tezzaron's process, Via 1 of Table I), and another with 5-μm vias and 50-μm layers (Via 3 of Table I). For both of these technology points, we also varied the number of inter-layer thermal vias from no thermal vias to the maximum possible number. Table IV shows the temperatures for these two corners along with a more realistic number based on the via pitches in Table I.

As expected, the peak temperature increases with the number of layers, from 89.4 °C for a 2-D FPGA to 220.7 °C for a four-layer FPGA using Tezzaron's process. The intra-package temperature variation also increases with the number of layers, from 14.4 °C for a 2-D FPGA to 55.0 °C for a four-layer FPGA. This large variation in temperature indicates that the peak temperature could be reduced by distributing the hot blocks more evenly across the fabric. Interestingly, 3-D technology parameters change the temperatures only minutely. For a four-layer FPGA, layer thickness changes the peak temperature by up to 4.4 °C, while thermal vias could decrease the peak temperature by up to 3.4 °C. Fig. 11 shows the effect of stacking on temperature, as well as the possible variations because of 3-D technology parameters.

TABLE III
POWER DENSITIES IN 4VFX100 (FREQ: 500 MHz)

TABLE IV
EFFECT OF STACKING ON TEMPERATURE

Fig. 11. Effect of stacking on peak temperature.

TABLE V
PARAMETERS FOR TEMPERATURE ESTIMATION IN HS3D

B. Thermal-Aware 3-D FPGA Organization
Recently, a study proposed alternate organizations for a 2-D FPGA to reduce the intra-die temperature variations [44]. Using a fully utilized Virtex-4 FX100 FPGA as an example, it demonstrated a reduction in peak die temperature of about 6 °C. Since temperature variation is larger in a 3-D FPGA, we would expect thermal organization to have a greater impact. To demonstrate this, we design a thermal-aware two-layer FPGA. For ease of experimentation, we consider only four types of blocks in the FPGA, namely, CLB, BRAM, DSP, and MGT. These blocks consume the majority of the area in 4VFX100. The peak temperature for a 2-D FPGA containing these blocks is 86.9 °C.
In the first case, we stack two identical such layers to form a two-layer stacked FPGA [see Fig. 12(a)]. The peak temperature for this FPGA is 128.5 °C. Note that stacking the hot blocks significantly increases the power density and, therefore, the temperature. Hence, next, we keep all the MGTs, DSPs, and BRAMs on a single layer. The second layer now consists only of CLBs [see Fig. 12(b)]. This change in floorplan can be implemented easily with the column-based modular architecture of Virtex-4 (ASMBL) [6]. This reduces the peak temperature to 112.9 °C (two-layer thermal in Table VI), a reduction of about 16 °C. The temperature variation also drops from 25.7 °C for the stacked design to only 2.6 °C for the thermal-aware design.

Fig. 12. 3-D FPGA organizations. (a) Two-layer stacked. (b) Two-layer thermal.

TABLE VI
THERMAL-AWARE 3-D FPGA DESIGN
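The intuition behind the thermal-aware organization can be sketched by summing power densities vertically, column by column, in a column-based floorplan. This is a crude, hypothetical proxy: the per-block densities below are invented (only their ordering matters, and they are not the Table III values), the floorplan is a toy, and lateral heat spreading, which HS3d does model, is ignored. It shows that stacking identical layers places a hot block directly on top of another hot block, whereas moving all hard blocks to one layer places each of them over a cool CLB column.

```python
# Hypothetical per-block power densities (arbitrary units). Only their ordering
# matters for this illustration; they are NOT the values of Table III.
DENSITY = {"CLB": 1.0, "BRAM": 1.5, "DSP": 0.8, "MGT": 6.0}


def peak_column_density(layers):
    """Largest vertically summed power density over all columns of a stack.

    `layers` holds one list of block names per layer; column c of every layer
    is assumed to sit directly above column c of the layer below.
    """
    ncols = len(layers[0])
    if any(len(lyr) != ncols for lyr in layers):
        raise ValueError("all layers must have the same number of columns")
    return max(sum(DENSITY[lyr[c]] for lyr in layers) for c in range(ncols))


# A toy column-based (ASMBL-like) floorplan for one 2-D layer.
base_layer = ["CLB", "BRAM", "CLB", "DSP", "CLB", "MGT"]

# (a) Two-layer stacked: identical layers, so hard blocks sit on hard blocks.
stacked = [base_layer, base_layer]

# (b) Two-layer thermal: all hard-block columns on one layer, CLBs on the other.
hard_layer = ["BRAM", "BRAM", "DSP", "DSP", "MGT", "MGT"]
clb_layer = ["CLB"] * 6
thermal = [hard_layer, clb_layer]

# Both organizations contain exactly the same blocks.
assert sorted(base_layer + base_layer) == sorted(hard_layer + clb_layer)

print("stacked peak:", peak_column_density(stacked))
print("thermal peak:", peak_column_density(thermal))
```

The proxy drops as long as a CLB column is cooler than the hottest hard block; the total power is unchanged, only its distribution differs.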
In the previous experiments, the heat sink was attached to the layer consuming the most power. Previous studies have suggested that this arrangement is preferable. In fact, researchers have proposed thermal-aware 3-D floorplanning that tries to place the hot blocks closer to the sink [24]. To see the effect of sink placement, we attached the sink to the layer containing only CLBs in the two-layer thermal organization. Table VI also shows the temperature for this case (two-layer thermal inverted). We observe that the temperature increases only very slightly because of this change. This happens because the vertical distances are small compared to the horizontal dimensions of the FPGA.
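Why sink placement matters so little can be illustrated with a one-dimensional series thermal-resistance model of the stack. In this hypothetical sketch, each layer is a single node, all heat leaves through the sink side, the package resistance is the 0.5 °C/W value quoted earlier, and the inter-layer resistance, ambient temperature, and layer powers are invented placeholders. Lateral spreading, which HS3d models, is ignored. When the inter-layer resistance is small compared with the package resistance, swapping which layer faces the sink changes the peak temperature only slightly.

```python
def layer_temperatures(powers_w, r_package, r_interlayer, t_ambient):
    """Steady-state temperature of each layer in a 1-D stack cooled from one side.

    powers_w[0] is the layer attached to the heat sink. All heat flows toward
    the sink, so the package resistance carries the total power and the
    resistance between layers k-1 and k carries the power of layers k and up.
    """
    temps = []
    t = t_ambient + r_package * sum(powers_w)
    temps.append(t)
    for k in range(1, len(powers_w)):
        t += r_interlayer * sum(powers_w[k:])
        temps.append(t)
    return temps


R_PACKAGE = 0.5      # degC/W, the package value quoted in the text
R_INTERLAYER = 0.02  # degC/W, hypothetical: thin layers conduct heat well
T_AMBIENT = 40.0     # degC, hypothetical

# Hypothetical powers: 60 W for the hard-block layer, 20 W for the CLB layer.
hot_near_sink = layer_temperatures([60.0, 20.0], R_PACKAGE, R_INTERLAYER, T_AMBIENT)
inverted = layer_temperatures([20.0, 60.0], R_PACKAGE, R_INTERLAYER, T_AMBIENT)
print("hot layer next to sink, peak:", max(hot_near_sink))
print("CLB layer next to sink, peak:", max(inverted))
```

Most of the temperature rise is set by the total power through the package resistance, which is the same in both cases; only the small inter-layer term depends on the ordering.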
VI. CONCLUSION
We demonstrated that 3-D FPGAs can provide significant advantages over 2-D by reducing the interconnect area and the
total area-delay product. The 3-D FPGA with five layers and
3-μm-pitch vias reduces the area-delay product of a 2-D FPGA
by 36%. This number may increase even further with improvements in 3-D technology.
We designed and evaluated several switch boxes for 3-D
FPGAs and showed that the area-delay product depends
heavily on the switch box topology. In 65-nm technology, the
area-delay product for our universal-twist switch box is 15%
lower than that of the subset switch box for 5-μm-pitch vias.
We further showed that the universal switch boxes become even
better with scaling process technology, as well as with larger
vias. However, adding more switches to the universal SB does
not provide any benefit.
Three-dimensional integration, however, increases the die
temperature. Our experiments indicate that the peak temperature for a four-layer FPGA could be 2.4 times that of a
single-layer FPGA. However, the large variation in temperature
within the 3-D package allows us to reorganize the 3-D FPGA
to reduce the peak temperature. For a two-layer FPGA, the peak
temperature dropped by 16 °C when the design was altered to
create a more uniform temperature profile.

In this work, we used single-length segments. Most modern FPGAs use a mixture of segments of different lengths. Incorporating this into a 3-D FPGA forms part of future work. The performance of a 3-D FPGA could be further improved by using direct vertical connections among neighboring CLBs. Furthermore, asymmetric 3-D interfaces could be used to mix f2f, f2b, and b2b stacking.
APPENDIX
FORMAL DESCRIPTION OF 3-D SWITCH BOXES
For brevity, we show only the connections to the inter-layer vias. The connections among horizontal wires are either subset (disjoint) or universal (see Fig. 2). Four of the sides refer to the vertical faces of the 3-D SB, and the remaining two refer to its horizontal faces. The horizontal channel width is referred to as nodes_per_chan, and the number of inter-layer vias as vias_per_chan.
SUBSET:
SUBSET_SPLIT:
SUBSET_TWIST:
UNIVERSAL_TWIST:
SUBSET_MORE:
//Works because it's assumed that
UNIVERSAL_MORE:
//Works because it's assumed that
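The horizontal-wire part of these switch boxes is either subset (disjoint) or universal. As a hedged illustration, the sketch below encodes the two 2-D patterns as they are commonly implemented in VPR [41], following the universal switch block of Chang et al. [39]: in the universal pattern, the LEFT/TOP and RIGHT/BOTTOM side pairs connect track i to track nodes_per_chan-1-i, and all other pairs keep track i. We have not verified that the 3-D tool used here applies exactly this mapping, the helper name `to_track` is our own, and the via connections that distinguish the 3-D SBs above are not reproduced.

```python
from enum import Enum


class Side(Enum):
    LEFT = 0
    RIGHT = 1
    TOP = 2
    BOTTOM = 3


# Side pairs whose connections are "twisted" (track i -> W-1-i) in the
# universal pattern; every other pair keeps the same track index.
TWISTED_PAIRS = {
    frozenset({Side.LEFT, Side.TOP}),
    frozenset({Side.RIGHT, Side.BOTTOM}),
}


def to_track(from_side, to_side, from_track, sb_type, nodes_per_chan):
    """Track reached on `to_side` from track `from_track` on `from_side`.

    Covers only the 2-D (horizontal-wire) part of the switch box.
    """
    if from_side == to_side:
        raise ValueError("a switch box does not connect a side to itself")
    if not 0 <= from_track < nodes_per_chan:
        raise ValueError("track index out of range")
    if sb_type == "subset":
        # Disjoint: track i only ever connects to track i.
        return from_track
    if sb_type == "universal":
        if frozenset({from_side, to_side}) in TWISTED_PAIRS:
            return nodes_per_chan - 1 - from_track
        return from_track
    raise ValueError(f"unsupported switch box type: {sb_type!r}")


# Connections leaving track 0 on the LEFT side of a 4-track universal SB.
for side in (Side.RIGHT, Side.TOP, Side.BOTTOM):
    print(side.name, to_track(Side.LEFT, side, 0, "universal", 4))
```

Both mappings are permutations of the track indices, so the two patterns use the same number of switches per side pair and differ only in which tracks meet.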
REFERENCES

[1] S. Gupta, M. Hilbert, S. Hong, and R. Patti, "Techniques for producing 3D ICs with high-density interconnect," Tezzaron Semiconductor, Naperville, IL, 2005.
[2] ITRS, "International technology roadmap for semiconductors," 2003 [Online]. Available: http://public.itrs.net
[3] C. Ababei, H. Mogal, and K. Bazargan, "Three-dimensional place and route for FPGAs," in Proc. Asia South-Pacific Des. Autom. Conf., Shanghai, China, 2005.
[4] A. Telikepalli, "Designing for power budgets and effective thermal management," Xcell J., vol. 56, no. 56, pp. 24–27, 2006.
[5] Altera Corporation, San Jose, CA, "Thermal management for 90-nm FPGAs," Appl. Note 358, 2004.
[6] Xilinx, San Jose, CA, "Xilinx products documentation," 2006 [Online]. Available: http://www.xilinx.com/literature
[7] A. Gayasen, N. Vijaykrishnan, M. Kandemir, and A. Rahman, "Switch box architectures for three-dimensional FPGAs," presented at the Field-Program. Custom Comput. Mach. (FCCM), Napa Valley, CA, Apr. 2006.

[8] A. J. Alexander, J. P. Cohoon, J. L. Colflesh, J. Karro, and G. Robins, "Three-dimensional field-programmable gate arrays," in Proc. Int. ASIC Conf., 1995, pp. 253–256.
[9] J. V. Campenhout, H. V. Marck, J. Depreitere, and J. Dambre, "Optoelectronic FPGAs," IEEE J. Sel. Topics Quantum Electron., vol. 5, no. 2, pp. 306–315, Mar./Apr. 1999.
[10] M. Leeser, W. M. Meleis, M. M. Vai, S. Chiricescu, W. Xu, and P. M. Zavracky, "Rothko: A three-dimensional FPGA," IEEE Des. Test Comput., vol. 15, no. 1, pp. 16–23, Jan./Mar. 1998.
[11] G. Borriello, C. Ebeling, S. A. Hauck, and S. Burns, "The triptych FPGA architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 4, pp. 491–501, Dec. 1995.
[12] S. Chiricescu, M. Leeser, and M. M. Vai, "Design and analysis of a dynamically reconfigurable three-dimensional FPGA," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 9, no. 1, pp. 186–196, Feb. 2001.
[13] M. Lin, A. E. Gamal, Y. Lu, and S. Wong, "Performance benefits of monolithically stacked 3D-FPGA," presented at the Int. Symp. Field Program. Gate Arrays, Monterey, CA, 2006.
[14] A. Rahman, S. Das, A. Chandrakasan, and R. Reif, "Wiring requirement and three-dimensional integration of field-programmable gate arrays," in Proc. Int. Workshop Syst.-Level Interconnect Prediction, 2001.
[15] Y.-S. Kwon, P. Lajevardi, A. Chandrakasan, F. Honore, and D. E. Troxel, "A 3-D FPGA wire resource prediction model validated using a 3-D placement and routing tool," presented at the Int. Workshop Syst.-Level Interconnect Prediction, San Francisco, CA, 2005.
[16] C. Ababei, Y. Feng, B. Goplen, H. Mogal, T. Zhang, K. Bazargan, and S. Sapatnekar, "Placement and routing in 3D integrated circuits," IEEE Design Test, vol. 22, no. 6, pp. 520–531, Nov./Dec. 2005.
[17] G.-M. Wu, M. Shyu, and Y.-W. Chang, "Universal switch blocks for three-dimensional FPGA design," in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, 1999.
[18] D. Brooks and M. Martonosi, "Dynamic thermal management for high-performance microprocessors," presented at the 7th Int. Symp. High-Perf. Comput. Arch., Nuevo Leone, Mexico, 2001.
[19] K. Sankaranarayanan, S. Velusamy, M. Stan, and K. Skadron, "A case for thermal-aware floorplanning at the microarchitectural level," J. Instruction-Level Parallelism, vol. 7, Oct. 2005 [Online]. Available: http://www.jilp.org/vol7
[20] Y. Han, I. Koren, and C. A. Moritz, "Temperature aware floorplanning," presented at the 2nd Workshop Temperature-Aware Comput. Syst. (TACS-2), Madison, WI, Jun. 2005.
[21] G. Chen and S. Sapatnekar, "Partition-driven standard cell thermal placement," presented at the Int. Symp. Phys. Des., Monterey, CA, 2003.
[22] K. Skadron et al., "Temperature-aware microarchitecture," presented at the 30th Int. Symp. Comput. Arch. (ISCA), San Diego, CA, 2003.
[23] G. M. Link and N. Vijaykrishnan, "Thermal trends in emerging technologies," presented at the Int. Symp. Quality Electron. Des. (ISQED), San Jose, CA, 2006.
[24] J. Cong, J. Wei, and Y. Zhang, "A thermal-driven floorplanning algorithm for 3D ICs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2004.
[25] B. Goplen and S. S. Sapatnekar, "Efficient thermal placement of standard cells in 3D ICs using a force directed approach," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, 2003.
[26] J. Cong and Y. Zhang, "Thermal via planning for 3-D ICs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2005.
[27] B. Goplen and S. S. Sapatnekar, "Thermal via placement in 3D ICs," presented at the ACM Int. Symp. Phys. Des., San Francisco, CA, 2005.
[28] S. Lopez-Buedo, J. Garrido, and E. Boemo, "Dynamically inserting, operating, and eliminating thermal sensors of FPGA-based systems," IEEE Trans. Components Packag. Technol. (CPM), vol. 25, no. 4, pp. 561–566, Dec. 2002.
[29] S. Velusamy et al., "Monitoring temperature in FPGA based SoCs," presented at the Int. Conf. Comput. Des. (ICCD), San Jose, CA, 2005.
[30] Altera Corporation, San Jose, CA, "Altera product datasheets," 2006 [Online]. Available: http://www.altera.com/literature
[31] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3-D ICs: A novel chip design for improving deep submicron interconnect performance and systems-on-chip integration," Proc. IEEE, vol. 89, no. 5, pp. 602–633, May 2001.
[32] Y. Yamaji et al., "Thermal characterization of bare-die stacked modules with Cu through-vias," presented at the Electron. Components Technol. Conf., Orlando, FL, 2001.
[33] C. S. Tan and R. Reif, "Multi-layer silicon layer stacking based on copper wafer bonding," Electrochem. Solid-State Lett., vol. 8, no. 6, pp. G147–G149, 2005.
[34] J. Rose and S. Brown, "Flexibility of interconnection structures for field-programmable gate arrays," IEEE J. Solid-State Circuits, vol. 26, no. 3, pp. 277–282, Mar. 1991.
[35] G. Lemieux, S. Brown, and D. Vranesic, "On two-step routing for FPGAs," in Proc. Int. Symp. Phys. Des., Napa Valley, CA, 1997.


[36] S. Wilton, "Architecture and algorithms for field-programmable gate arrays with embedded memory," Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, ON, Canada, 1997.
[37] M. I. Masud and S. Wilton, "A new switch block for segmented FPGAs," presented at the Int. Workshop Field Program. Logic Appl., Glasgow, U.K., 1999.
[38] P. Hallschmid and S. Wilton, "Detailed routing architectures for embedded programmable logic IP cores," presented at the ACM/SIGDA Int. Symp. Field-Program. Gate Arrays, Monterey, CA, 2001.
[39] Y.-W. Chang, D. F. Wong, and C. K. Wong, "Universal switch blocks for FPGA design," ACM Trans. Des. Autom. Electron. Syst., vol. 1, no. 1, pp. 80–101, Jan. 1996.
[40] H. Fan, J. Liu, Y.-L. Wu, and C.-C. Cheung, "On optimum switch box designs for 2-D FPGAs," presented at the 38th ACM/SIGDA Des. Autom. Conf. (DAC), Las Vegas, NV, 2001.
[41] V. Betz and J. Rose, "VPR: A new packing, placement and routing tool for FPGA research," presented at the Int. Workshop Field-Program. Logic Appl., London, U.K., 1997.
[42] Arizona State University, Tempe, "Predictive technology model," [Online]. Available: http://www.eas.asu.edu/~ptm
[43] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA: Kluwer, 1999.
[44] P. Sundararajan, A. Gayasen, N. Vijaykrishnan, and T. Tuan, "Thermal characterization and optimization in platform FPGAs," presented at the Int. Conf. Comput.-Aided Des., San Jose, CA, Nov. 2006.

Aman Gayasen received the B.Tech. degree in electrical engineering from the Indian Institute of Technology, Delhi, India, in 2001, and the Ph.D. degree in computer engineering from Pennsylvania State University, University Park, in 2006. His Ph.D. dissertation was on the implications of future technologies on the design of FPGAs.
He is a Senior R&D Engineer with Synopsys, Sunnyvale, CA. In the past, he has worked with Ikos Systems on behavioral synthesis for emulators. His research interests include reconfigurable devices and systems, nanotechnology, and all aspects of EDA.


Vijaykrishnan Narayanan is a Professor with the Department of Computer Science and Engineering and Electrical Engineering, Pennsylvania State University, University Park. His research interests include the areas of energy-aware reliable systems, embedded systems, on-chip networks, system design using emerging technologies (such as 3-D and nanotechnology), and computer architecture. For more information, visit http://www.cse.psu.edu/~vijay.

Mahmut Kandemir (M’03) received the Ph.D. degree in electrical engineering and computer science from Syracuse University, Syracuse, NY, in 1999.
He is an Associate Professor with the Computer Science and Engineering Department, Pennsylvania State University, University Park. His main research interests include optimizing compilers, I/O-intensive applications, and power-aware computing.
Dr. Kandemir is a member of the ACM. His research is supported by NSF, DARPA, SRC, and Microsoft.

Arifur Rahman received the B.S. degree from Polytechnic University, Brooklyn, NY, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology (MIT), Cambridge, all in electrical engineering. His Ph.D. dissertation was on performance modeling of 3-D integrated circuits.
He is a Senior Member of Technical Staff at Xilinx Research Laboratories, San Jose, CA. He has published more than 25 papers on this subject as well as on field-programmable gate arrays and sensors. He is an inventor on 8 granted patents and more than 25 pending patents.
