ECE Dept., Univ. of Minnesota, Minneapolis, MN 55455
satish,ababei,kia@ece.umn.edu
Dept. of ECE, Univ. of California, Santa Barbara, Santa Barbara, CA 93106
wanggang,kastner@ece.ucsb.edu
ABSTRACT
Figure 1: SRAM-based Switch Box.
General Terms
Design, Performance, Experimentation
1. INTRODUCTION
Prohibitive ASIC mask costs and stringent time-to-market
windows have made FPGAs an attractive implementation platform in recent years. However, circuits implemented on FPGAs are typically slower, occupy more area, and consume more
power than ASIC circuits [25]. The FPGA routing architecture
is the main culprit in making FPGAs worse than ASIC chips
in area, delay and power; a typical FPGA routing architecture
uses about 70-90% of the total transistors on the die [6].
A significant body of work from the past two decades has focused
on switch box design and segmented routing architectures. The
basic idea is to use highly flexible switches where horizontal
and vertical tracks meet, to facilitate all possible connections
between the adjacent tracks. A sketch of the disjoint switch
box is shown in Figure 1.
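As an illustration of the disjoint topology, the following minimal sketch enumerates the programmable connections of one switch box, assuming the commonly used definition in which track t on one side can connect only to track t on the other three sides. The side names and the function `disjoint_switch_box` are hypothetical and are not taken from the authors' tool.

```python
from itertools import combinations

# Hypothetical side labels for the four edges of a switch box.
SIDES = ("N", "E", "S", "W")


def disjoint_switch_box(num_tracks):
    """Return the programmable connections of a disjoint switch box.

    Each connection is a pair of (side, track) endpoints. Assumption:
    track t on one side connects only to track t on the other sides.
    """
    connections = []
    for track in range(num_tracks):
        # Every unordered pair of sides gets one switch per track.
        for side_a, side_b in combinations(SIDES, 2):
            connections.append(((side_a, track), (side_b, track)))
    return connections


switches = disjoint_switch_box(4)
print(len(switches))  # 24: six side pairs for each of the 4 tracks
```

Under this assumption the switch count grows linearly with the channel width (six switches per track), which is the programmable overhead that later sections aim to reduce.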
Assuming all tracks have unit length, the disjoint switch box
(see Figure 1) can route a large subset of possible routing trees
to connect the terminals of a net. As a result, the overall
channel width will not be high. However, flexibility in routing
1 This work was supported in part by a grant from NSF under
contract CAREER CCF-0347891.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
FPGA'05, February 20-22, 2005, Monterey, California, USA.
Copyright 2005 ACM 1-59593-029-9/05/0002 ...$5.00.
[Figure: design flow. Boxes: Traditional architecture definition; MCNC circuits; VPR place and route; Statistical Pattern Finder (VPF); Joint pattern usage analysis (Sections 3.1, 3.2); HARP architecture design (Section 3.3); (Section 4); Compare delay, area, power (Section 5).]
2. PRELIMINARIES
3.
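[Figure: routing resource naming, with CLB (i,j), CLB (i,j+1), CLB (i+1,j), CLB (i+1,j+1), CHANX(i,j,t), CHANX(i+1,j,t), CHANY(i,j,t), CHANY(i,j+1,t) and switch box (i,j,t); and an example routed net with a source, sinks sink1, sink2 and sink3, and switch boxes sb A, sb B and sb C, with segment lengths marked.]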
[Figure: pattern usage distributions. Title: compare pattern distributions for different routing considerations on the same Virtex-style architecture. Series: A: routability-driven without bend cost; B: routability-driven with bend cost; C: timing-driven. A further comparison plots C: multi-length against D: unit-length. Y-axis: percentage (%). X-axis patterns: 1111 0101 1001 0110 1010 0011 1100 1110 1011 0111 1101.]
the source or a sink; (iii) Report this distance as the result of the
current direction; (iv) Take the minimum among all directions
as the pattern length of the current switch box. Following the
same example shown in Figure 3(b) and assuming the numbers
by the segments indicate the segment lengths, switch boxes sbA,
sbB and sbC have pattern lengths 6, 2 and 5 respectively.
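Steps (iii) and (iv) above, together with the percentage distributions plotted for pattern lengths, can be sketched as follows. This is a minimal illustration, not the authors' Statistical Pattern Finder: the per-direction distances are assumed to have been measured already, and the function names and the sample distances are hypothetical.

```python
from collections import Counter


def pattern_length(distance_by_direction):
    """Pattern length of one switch box: the minimum per-direction distance.

    distance_by_direction maps each used direction of the switch box to the
    distance reported for that direction (assumed to be measured elsewhere).
    """
    if not distance_by_direction:
        raise ValueError("switch box has no used directions")
    return min(distance_by_direction.values())


def length_distribution(pattern_lengths):
    """Percentage of switch boxes observed at each pattern length."""
    counts = Counter(pattern_lengths)
    total = len(pattern_lengths)
    return {length: 100.0 * n / total for length, n in sorted(counts.items())}


# Hypothetical distances for one switch box; the minimum is its pattern length.
print(pattern_length({"N": 6, "E": 9, "W": 7}))

# The three pattern lengths quoted in the text for sbA, sbB and sbC.
print(length_distribution([6, 2, 5]))
```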
[Figure: pattern length distributions, in several panels. X-axis: pattern length; y-axis: percentage (%).]
[Figure: Disjoint, Universal and Wilton switch box topologies, each drawn with tracks 0 to 3 on every side.]
[Figure: a 2-LUT with inputs in 1 and in 2 and output out, connected from a Source through Wire 1, Wire 2, Wire 3 and Wire 4.]
the previous section, we design a new switch box that includes hard-wired routing patterns. That is, we remove
a certain number of programmable switches (derived from statistical analysis of the routing profiles of various circuits) from
the switch boxes and replace them with wires. The composition of these hard-wired patterns is chosen after careful analysis of the routing profiles, so the effect on the routability
of circuits is minimized. We have experimentally verified that
HARP switches have a minimal effect on the routing of circuits
(see Section 5).
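One simple way to picture how a routing profile could drive the choice of hard-wired patterns is to rank the observed patterns by usage. The sketch below does only that. Ranking purely by frequency is an assumption, since the selection criteria are not spelled out in this passage, and the function name and the sample profile are hypothetical. Patterns are treated as opaque labels such as the 4-bit codes used on the distribution plots.

```python
from collections import Counter


def rank_patterns(observed_patterns, num_patterns):
    """Return the most used patterns with their usage percentages.

    observed_patterns holds one pattern label per used switch box.
    Assumption: candidates are ranked by usage frequency alone.
    """
    usage = Counter(observed_patterns)
    total = sum(usage.values())
    return [(pattern, 100.0 * count / total)
            for pattern, count in usage.most_common(num_patterns)]


# Hypothetical routing profile, not measured data.
profile = ["0101"] * 5 + ["1010"] * 3 + ["1110"] * 2
print(rank_patterns(profile, 2))  # [('0101', 50.0), ('1010', 30.0)]
```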
Figure 7 shows some of the possible HARPs that can be
present inside the switch boxes. The hard-wired patterns are
shown using solid lines indicating that they are wires and not
programmable switches. The next section describes how the
routing tool is made aware of these patterns and how they are
exploited to reduce the delay, area and power dissipation.
[Figure: example hard-wired patterns "H", "L", "T" and "+" joining Wire A, Wire B, Wire C and Wire D at switch boxes (SB).]
[Figure: hard-wired patterns between switch boxes (SB). Labels: Unused; Dangling branch; Hard-wired T pattern; Hard-wired L pattern.]
Note that p and y can be combined into one switch configuration which makes two disjoint connections (one between
right and bottom segments, and the other between top and left
segments). The same is true with q and x. See Figure 13 for
We use the delay and area models in VPR and the power
model developed by [16] to estimate the circuit delay, total
area of the chip and the total power dissipation after inserting
the hard-wired switches. VPR uses an Elmore delay model to
estimate the delay of every net. In this model, pass transistors are represented as resistors and diffusion capacitances to
3 For example, the T shapes of Figure 13 eliminate one edge
from the side that T does not connect to.
[Figure: RC delay model. Labels: Control; If "off"; Cdiff; Pass transistor; Rwire; wire; Cwire/2.]
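The Elmore delay model mentioned above can be illustrated on a simple RC chain, where the delay to the last node is the sum, over every resistor, of its resistance times all the capacitance downstream of it. The sketch below is a generic textbook formulation, not VPR's implementation. The component values are hypothetical, and the wire is assumed to follow a pi model (Rwire with Cwire/2 at each end) fed through one pass transistor.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay from the driver to the last node of an RC chain.

    resistances[k] is the series resistance of stage k (ohms) and
    capacitances[k] is the capacitance at the node after it (farads).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream_cap = 0.0
    # Walk from the far end toward the driver, accumulating the
    # capacitance that each resistor has to charge.
    for r, c in zip(reversed(resistances), reversed(capacitances)):
        downstream_cap += c
        delay += r * downstream_cap
    return delay


# Hypothetical values: one pass transistor followed by one wire segment.
r_pass, r_wire = 1000.0, 100.0      # ohms
c_diff, c_wire = 1e-15, 10e-15      # farads

# Node after the pass transistor: its diffusion cap plus half the wire cap.
# Node at the far end of the wire: the other half plus a diffusion cap.
stage_caps = [c_diff + c_wire / 2, c_wire / 2 + c_diff]
print(elmore_delay([r_pass, r_wire], stage_caps))
```

Replacing a programmable switch with a wire removes a series resistance term and some of the diffusion capacitance from this sum, which is the intuition behind the delay reduction sought with hard-wired patterns.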
5. EXPERIMENTAL RESULTS
We inserted the hard-wired patterns in the switch boxes
and used a multi-segment routing architecture with routing segments of lengths 1, 2, 6, and long lines (similar to the Virtex architecture) for all simulation experiments. HARPs were
not inserted on long lines, though. We placed and routed 10
MCNC circuit benchmarks of the VPR package on HARP architectures and report the results of circuit delay, area, leakage
power, total power dissipation and channel width.
We updated the delay look up tables used by the placement
tool of VPR to reflect delays of HARP connections. However,
this had only a marginal impact on the placement quality. The
reason is that these delay lookup tables are built assuming no
congestion is present, and hence the best routing resources for
4 Results for only 10 of the benchmarks are reported for want
of space. The others yield similar results.
           Delay (x10^8)    Area (x10^6)   Channel Width   Leakage power     Total power  Energy (x10^8)
Circuit      Vtx    HARP     Vtx    HARP     Vtx    HARP     Vtx    HARP     Vtx    HARP     Vtx    HARP
misex3      6.31    5.33    2.88    2.71      20      23   0.119   0.094   0.223   0.222   1.407   1.183
alu4        7.12    5.97    3.11    2.74      19      21   0.129   0.096   0.235   0.225   1.673   1.343
apex4       7.17    5.98    2.83    2.58      22      24   0.117   0.092   0.183   0.166   1.312   0.993
ex5p        6.40    5.50    2.36    2.23      22      25   0.098   0.080   0.173   0.164   1.107   0.902
des         7.89    6.00    6.02    5.75      16      18   0.251   0.209   0.415   0.408   3.274   2.448
seq         6.39    5.31    3.58    3.46      20      24   0.149   0.121   0.271   0.270   1.732   1.433
apex2       7.45    5.92    4.01    3.80      21      24   0.168   0.134   0.281   0.281   2.093   1.663
spla       12.40    8.93    9.90    9.70      28      33   0.428   0.342   0.525   0.479    6.51   4.278
pdc        14.20   10.70   14.80   13.70      33      39   0.637   0.505   0.750   0.652   10.65   6.976
ex1010     15.50   11.50    8.95    8.66      19      23   0.378   0.314   0.477   0.448   7.393   5.152
Avg        9.083   7.114   5.844   5.533      22    25.4  0.2474  0.1987  0.3533  0.3315   3.715   2.637
Ratio              78.32%          94.68%         115.45%          80.32%          93.83%          70.98%
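The Avg and Ratio entries reported above can be reproduced from the per-circuit values. The short check below does this for the delay columns, taking the ratio as the HARP average divided by the Vtx average. That reading of the Ratio row is inferred from the numbers and is not stated explicitly in the text. The helper name `average` is illustrative.

```python
def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Delay columns as tabulated (same scaling as in the table).
vtx_delay = [6.31, 7.12, 7.17, 6.40, 7.89, 6.39, 7.45, 12.40, 14.20, 15.50]
harp_delay = [5.33, 5.97, 5.98, 5.50, 6.00, 5.31, 5.92, 8.93, 10.70, 11.50]

# Assumed definition: ratio of the HARP average to the Vtx average.
ratio = average(harp_delay) / average(vtx_delay)
print(f"{average(vtx_delay):.3f} {average(harp_delay):.3f} {100 * ratio:.2f}%")
```

The same computation applied to the other metric columns gives the remaining Ratio entries, including the channel-width ratio above 100%, which reflects the wider channels needed by the HARP architecture.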
[Figure: switch boxes SB1 and SB2 with HARP and SW connections; one case is labeled HARP overlap.]
7.
6. RELATED WORK
There has been a lot of work on programmable architectures
to improve the performance of FPGAs. Modern FPGAs utilize
multi-length horizontal and vertical segments. Recently, there
has been a flurry of research in structured ASIC solutions [24],
which aim to provide a middle ground between ASICs and
FPGAs. Tong et al. [18], Jayakumar and Khatri [11], and
Circuit    Delay (x10^8)    Area (x10^6)   Channel Width   Leakage power     Total power  Energy (x10^8)
misex3              4.98            2.68              24           0.093           0.226           1.125
alu4                5.67            2.73              22          0.0101           0.228           1.292
apex4               5.74            2.59              26           0.093           0.168           0.964
ex5p                 5.6            2.19              26           0.062           0.157           0.879
des                 6.36            5.66              18           0.188           0.388           2.467
seq                 5.30            3.45              26           0.121           0.268           1.420
apex2               6.10            3.74              25           0.114           0.270           1.647
spla                8.31            9.38              33           0.314           0.465           3.864
pdc                 9.42            13.5              40           0.490           0.645           6.075
ex1010              11.4            8.45              24           0.299           0.429           4.891
Average             6.88           5.437              26           0.188           0.324           2.463
Ratio              0.758           0.930            1.20           0.760           0.917           0.663
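In the table above, each Energy entry appears to equal the product of the Total power and Delay entries of the same row, at the tabulated scaling. The sketch below checks that reading against the reported column. The relationship is inferred from the numbers rather than stated in the text, and the function name `energy` is illustrative.

```python
def energy(total_power, delay):
    """Energy as total power times delay (units as tabulated)."""
    return total_power * delay


# Columns as reported in the table above.
delay = [4.98, 5.67, 5.74, 5.6, 6.36, 5.30, 6.10, 8.31, 9.42, 11.4]
total_power = [0.226, 0.228, 0.168, 0.157, 0.388,
               0.268, 0.270, 0.465, 0.645, 0.429]
reported_energy = [1.125, 1.292, 0.964, 0.879, 2.467,
                   1.420, 1.647, 3.864, 6.075, 4.891]

products = [energy(p, d) for p, d in zip(total_power, delay)]
# Largest difference between the computed product and the reported value.
worst_gap = max(abs(calc - rep) for calc, rep in zip(products, reported_energy))
print(f"largest gap from the reported column: {worst_gap:.4f}")
```

Because energy combines both metrics, an architecture that lowers delay and power together shows a larger relative gain in the Energy column than in either column alone.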
8. REFERENCES
[1] J. H. Anderson, F. Najm, and T. Tuan. Active Leakage
Power Optimization for FPGAs. In Proc. of
ACM/SIGDA International Symposium on Field
programmable gate arrays, 2004.
[2] V. Betz and J. Rose. VPR: A New Packing, Placement
and Routing Tool for FPGA Research. In International
Workshop on Field-programmable Logic and Applications,
1997.
[3] V. Betz, J. Rose, A. Marquardt. Architecture and CAD
for Deep-Submicron FPGAs. Kluwer Academic
Publishers, 1999.
[4] Y.-W. Chang, Y.-T. Chang, An Architecture-Driven
Metric for Simultaneous Placement and Global Routing
for FPGAs, In Proc. of ACM/IEEE Design Automation
Conference, 2000, pp. 567-572.
[5] Y.-W. Chang, D. F. Wong, Universal Switch Modules
for FPGA Design, ACM Trans. Design Automation of
Electronic Systems, Vol. 1, No. 1, Jan. 1996, pp. 80-101.
[6] A. DeHon, Reconfigurable architectures for
general-purpose computing, Ph.D. dissertation,
Massachusetts Institute of Technology, 1996.
[7] A. Gayasen, K. Lee, N. Vijaykrishnan, M. Kandemir,
M.J. Irwin, and T. Tuan. A Dual-Vdd Low Power FPGA
Architecture. In Proceedings of International Conference
on Field Programmable Logic and Applications, 2004.
[8] HARP Architecture Generator and P&R Tool URL,
http://www.ece.umn.edu/users/kia/ (click on the
Download link)
[9] The International Technology Road Map for
Semiconductors, 2003 Edition.
[10] M. Imran Masud. FPGA Routing Structures: A Novel
Switch Block and Depopulated interconnect Matrix
Architecture. M.A.Sc. Thesis, University of
British-Columbia, 1999.
[11] N. Jayakumar and S. P. Khatri, A METAL and VIA
Maskset Programmable VLSI Design Methodology using
PLAs, In Proceedings of International Conference on
Computer Aided Design, 2004.