TABLE I
STATE TABLE OF A DYNAMIC TERNARY CONTENT ADDRESSABLE MEMORY (*: SAME BITLINES AND SEARCHLINES)
Fig. 6. Port charge variation before, during, and after the refresh duration.
and process variation are similar to those of DRAM. Variable retention time does not incur a cell failure as the shortest retention time is still larger. The charge stored in a cell is vulnerable to coupling through data pattern dependency (DPD), which is significant due to the column read [column access strobe (CAS)] through SAs. In contrast, a DTCAM matches across a word (row), and the stored data are never sensed directly. The charge sensing during refresh is performed through decoupled read transistors as shown in Fig. 4. Frequent searches in a DTCAM affect the retention time due to the associated global SL capacitance. A 10% safety margin is considered in the search duration to maintain reliable data retention.
B. Refresh Module
The timing of the refresh duration and the associated signal variations are shown in Fig. 6. A readline precharge is executed in every dynamic memory at the start of each refresh duration. The improvement in refresh time is achieved by performing read and write in alternate cycles during the refresh. The refresh scheme is elaborated based on the structure shown in Fig. 7 and the refresh timing diagram (Fig. 6). The structure primarily contains three modules: 1) refresh driver; 2) write/refresh control; 3) write driver, which is also shared during the regular write operation. The refresh is discussed in terms of these modules for an 8-entry DTCAM.
1) Refresh Driver: An active-low read precharge signal (R_PRE) pulls up both readlines (Q and $\overline{Q}$) to VDD as shown in Fig. 6. Depending upon the storage pattern, either RC1 or RC2 discharges one of the readlines. The nMOS transistors connected to nets N1 and N2 are scaled to have a lower threshold voltage, and the charge variations are sensed by the column sense amplifier [31] present in the refresh driver. The strengthened sensed-out values (SO and $\overline{SO}$) are then sent to the write/refresh control. The values SO and $\overline{SO}$ can be visualized as the data (D and $\overline{D}$) values during the regular write phase. The refresh driver is precharged in pipeline for the subsequent reads during writing of the present row.
2) Write/Refresh Control: The write/refresh control is used for switching between regular write and sensed-out data write during refresh. The active-high refresh signal (REF) activates the upper TG to pass the SO and $\overline{SO}$ values [through (A) and (B) respectively]. The register data for regular data write is denoted as “Reg”.
3) Write Driver: The driver used for writing the refreshed data is the same as the dataline-maskline driver shown in Fig. 3. The outputs of the write/refresh control for the SO/Reg values are coupled together and provided as the register value to the write driver. The write driver is only partially illustrated in Fig. 7, as it was discussed earlier in Section II. It can be observed from Fig. 6 that the write control (W) signal is enabled during the whole refresh duration, which writes the SO/$\overline{SO}$ values into the DTCAM cells.
Search drivers are disabled during the refresh and all matchlines are set at mismatch to avoid any false match. Use of the column sense amplifier leads to a small area overhead but the use is essential to provide a low search overhead. Sensed-out values (SO and $\overline{SO}$) are discharged after the refresh. Readlines (Q and $\overline{Q}$) are also released to avoid any false coupling through the refresh transistors (RC1 or RC2) during non-refresh states. The write signal is deselected at the end of the refresh duration, which allows the regular CAM search.
C. Estimation of Area and Refresh Overhead
Estimation of the area and refresh overhead is carried out for the DTCAM refresh module. The estimation takes the module layout size and average interconnection routing space into consideration. The data provided here are normalized and can be taken as a standard for all DTCAMs of any matchline size at any technology node.
The area overhead is the relative percentage of extra space acquired by the refresh modules and associated routing space. Most of the interconnections required by the refresh structure are shared by the DTCAM bank drivers and hence not considered in the estimation. The primary modules in the architecture
MISHRA et al.: LOW-OVERHEAD DYNAMIC TCAM WITH PIPELINED READ-RESTORE REFRESH SCHEME 1595
shown in Fig. 7 are: 1) a DTCAM cell; 2) write and search drivers; 3) row decoder; 4) matchline sense amplifier; 5) refresh driver and controller. The size dependency for an M-word × N-bit DTCAM can be represented as:
DTCAM bank → M × N × 1-bit DTCAM cell (A)
Search driver → N × 1-bit searchline driver (B)
Write driver → N × 1-bit dataline–maskline driver (C)
Row decoder → $\sum_{Z=4}^{M}\left(\frac{M}{Z}\right)$ × M × 2-input NAND (D)
Matchline S/A → N × 1-entry MLSA (E)
Refresh driver → N × 1-bit refresh driver (F)
Write/refresh control → N × 2 × 1-bit controller (G)
Most of these are dependent on the matchline size. The row decoder is reliant on the number of words/entries and the DTCAM bank is array size dependent. The relative design space difference between the refresh module and the other modules can be written as:
$$
\begin{aligned}
\mathrm{OH}_A &= 1 - \frac{\left[MNA + NB + NC + \sum_{Z=4}^{M}\left(\frac{M}{Z}\right)MD + NE\right] - (NF + 2NG)}{MNA + NB + NC + \sum_{Z=4}^{M}\left(\frac{M}{Z}\right)MD + NE} \\
&= 1 - \frac{N\left[(MA + B + C + E) - (F + 2G)\right] + \sum_{Z=4}^{M}\left(\frac{M}{Z}\right)MD}{N\left[(MA + B + C + E)\right] + \sum_{Z=4}^{M}\left(\frac{M}{Z}\right)MD}
\end{aligned}
\tag{1}
$$
The overhead is mostly ML size dependent as concluded from Equation (1), which allows the designer to increase the entry size to the maximum. The time required for refresh is word (M) dependent. The delay through the refresh driver and write/refresh control decides the entry refresh time. The refresh overhead is estimated based on the worst-case entry refresh for NAND and NOR MLs. The overhead can be calculated as:
$$
\mathrm{OH}_R = \frac{(\text{1-entry refresh time}) \times \text{Entries}}{\text{Refresh interval}} \times 100\,\% \tag{2}
$$
The presented DTCAM structure has been designed using the predictive 45-nm CMOS technology. Replacing all the 1-bit/1-entry module areas in Equation (1) with those designed in 45-nm, the area overhead can be restructured as
$$
\mathrm{OH}_A\,(45\text{-nm}) = 1 - \frac{[15N(M + 5) + 22M] - 40N}{[15N(M + 5) + 22M]} \tag{3}
$$
The area overhead is almost independent of TCAM array size irrespective of NAND or NOR-type matchlines. Refresh overhead is dependent on it but has been minimized to a great extent through the proposed refresh scheme.
IV. RESULTS AND PERFORMANCE COMPARISON
The proposed structure has been implemented using the generic process design kit (GPDK) 45-nm CMOS technology. An extensive performance comparison with relevant dynamic CAMs (decoupled 4T [18] and 6T DCAM [21]) has been carried out with a reorganization for both NOR and NAND-type ML sensing to prove the efficacy of the proposed design. Arrays of 128×32-bit DCAM structures have been designed using the same technology for comparison in the same environment (PVT). Transistors with standard thresholds (0.36 and –0.4 V) and sizes (120/45 nm), except in Section IV-C, have been used in all the compared designs for a legitimate analysis. The structures are scaled from 8-bit to 64-bit for testing their stability and integrity. Energy-delay analyses are made on
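As a rough numerical check of Equations (2) and (3), the following Python sketch evaluates both overhead expressions. The function names and the timing values passed to the refresh-overhead function are illustrative assumptions, not figures reported in this paper; only the coefficients of Equation (3) and the 128×32 array size come from the text.

```python
def area_overhead_45nm(m_words: int, n_bits: int) -> float:
    """Fractional area overhead OH_A following Equation (3) (45-nm coefficients)."""
    # Bracketed term of Eq. (3): area of the array and its regular peripherals.
    base_area = 15 * n_bits * (m_words + 5) + 22 * m_words
    # The 40N term of Eq. (3), i.e. the N(F + 2G) refresh contribution of Eq. (1).
    refresh_area = 40 * n_bits
    return 1 - (base_area - refresh_area) / base_area


def refresh_overhead_percent(entry_refresh_time_s: float,
                             entries: int,
                             refresh_interval_s: float) -> float:
    """Refresh overhead OH_R in percent, following Equation (2)."""
    return entry_refresh_time_s * entries / refresh_interval_s * 100


# 128-word x 32-bit array, the size used for comparison in Section IV.
oh_a = area_overhead_45nm(m_words=128, n_bits=32)
print(f"OH_A (45-nm): {100 * oh_a:.2f} %")

# Hypothetical timing values, chosen only to exercise Equation (2).
oh_r = refresh_overhead_percent(entry_refresh_time_s=1e-9,
                                entries=128,
                                refresh_interval_s=100e-6)
print(f"OH_R: {oh_r:.3f} %")
```

For the 128×32 case, Equation (3) evaluates to an area overhead of roughly 1.9 %. The refresh-overhead figure depends entirely on the assumed timing inputs and should not be read as a measured result.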
1596 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS–I: REGULAR PAPERS, VOL. 65, NO. 5, MAY 2018
Fig. 8. DTCAM core cell layouts, refresh performance and NAND versus
NOR-ML scheme performance comparison.
Fig. 10. Power dissipation dependency over temperature variation.
Fig. 11. Power dissipation distribution comparison between (a) Static TCAM.
(b) Proposed dynamic TCAM.
Fig. 12. Power distribution and stability analysis. (a) Energy dissipation comparison over temperature variation. (b) Operational phase dependent power
distribution comparison. (c) Peak power analysis over process corner variation.
Fig. 14. Design sensitivity to supply voltage scaling from 1.2 to 0.6 V. (a) Peak power variation. (b) Energy dissipation. (c) Energy delay product variation.
Fig. 15. Variation of the matchline voltage and evaluation current over 1000 runs of MC sampling method.
Fig. 16. Scattergram of the energy dissipation versus matchline delay on 1000 runs of MC sampling method.
TABLE III
TCAM CORE CELL PERFORMANCE OF COMPARED DYNAMIC CAMs
TABLE IV
FEATURE AND PERFORMANCE COMPARISON SUMMARY OF REFERRED DESIGNS
metric where the proposed DTCAM has the least EfS among all the referred designs. A matchline delay of 0.65 ns is acceptable considering the lower EDP metric.
The following conversion estimations, inspired by [33], are used to provide normalized energy dissipation and ML delay. They render a fair comparison among the referred architectures designed with any technology and tested at any supply voltage.
$$
\text{Normalized EfS} = \text{EfS} \times \frac{45}{\text{ref. tech.}} \times \left(\frac{1}{V_{DD}}\right)^{2} \tag{6}
$$
Similarly, the normalized matchline delay can be estimated as
$$
\text{Normalized MLD} = \text{MLD} \times \frac{45}{\text{ref. tech.}} \times \frac{V_{DD}}{1} \tag{7}
$$
Equation (7) is valid for both binary and ternary CAMs with NAND-type structure. But when a NOR-type ML scheme is used for TCAM [8], the ON-state pull-down resistance ($R_{ML}$) increases. Therefore, Equation (7) for ternary CAMs with NOR matchline structure can be estimated as
$$
\text{Nor. MLD}_{\text{TNOR}} = 2 \times \text{MLD} \times \frac{45}{\text{ref. tech.}} \times \frac{V_{DD}}{1} \tag{8}
$$
The design presented in [30] has an AND-type ML structure but with a binary CAM cell, which results in the lower EDP. The referred structures presented in [4] and [14] are proficient for high density requirements, but the proposed design excels them considering the low energy-delay.
V. CONCLUSION
A dynamic ternary CAM with a low overhead refresh scheme is presented in this paper. As CMOS size shrinkage is inevitable in the modern information age, leakage becomes more pronounced and lessens the scope of high density TCAM implementation. DCAMs certainly are the best alternatives other than complex non-CMOS technologies to fill this gap but have issues of periodical cell refreshing. A pipelined read-restore refresh scheme is proposed which reduces both area and refresh overheads. It accommodates a larger number of searches per second while reducing the refresh energy dissipation by avoiding the extra equalization phase from conventional refreshing. The proposed structure is compared with other dynamic CAMs to prove the efficacy of its performance and it stands out to be the best performer with both NAND and NOR-type core cells. Stability analyses were performed with all environment variations, transistor sizes and TCAM macros. Low average change in the energy-delay for matchline size variation (8-bit to 64-bit) secures the cascadability of the proposed structure. With a low energy dissipation of 0.583 fJ/bit/search, the DTCAM is competent enough to perform in the low power TCAM search engine class.
REFERENCES
[1] V. Gaudet, “A survey and tutorial on contemporary aspects of multiple-valued logic and its application to microelectronic circuits,” IEEE Trans. Emerg. Sel. Topics Circuits Syst., vol. 6, no. 1, pp. 5–12, Mar. 2016.
[2] R. Karam, R. Puri, S. Ghosh, and S. Bhunia, “Emerging trends in design and applications of memory-based computing and content-addressable memories,” Proc. IEEE, vol. 103, no. 8, pp. 1311–1330, Aug. 2015.
[3] Z. Ullah, K. Ilgon, and S. Baeg, “Hybrid partitioned SRAM-based ternary content addressable memory,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2969–2979, Dec. 2012.
[4] C.-C. Wang, C.-H. Hsu, C.-C. Huang, and J.-H. Wu, “A self-disabled sensing technique for content-addressable memories,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 1, pp. 31–35, Jan. 2010.
[5] A.-T. Do, S. Chen, Z.-H. Kong, and K. S. Yeo, “A high speed low power CAM with a parity bit and power-gated ML sensing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 151–156, Jan. 2013.
[6] S. K. Maurya and L. T. Clark, “A dynamic longest prefix matching content addressable memory for IP routing,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 6, pp. 963–972, Jun. 2011.
[7] S. Matsunaga et al., “Standby-power-free compact ternary content-addressable memory cell chip using magnetic tunnel junction devices,” Appl. Phys. Exp., vol. 2, no. 2, p. 023004, Feb. 2009.
[8] Y.-J. Chang, K.-L. Tsai, and H.-J. Tsai, “Low leakage TCAM for IP lookup using two-side self-gating,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[9] H. Jarollahi et al., “A nonvolatile associative memory-based context-driven search engine using 90 nm CMOS/MTJ-hybrid logic-in-memory architecture,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 4, no. 4, pp. 460–474, Dec. 2014.
[10] Y.-J. Chang and T.-C. Wu, “Master–Slave match line design for low-power content-addressable memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, pp. 1740–1749, Sep. 2015.
[11] Y.-J. Chang, “Using the dynamic power source technique to reduce TCAM leakage power,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 11, pp. 888–892, Nov. 2010.
[12] K. Noda, K. Matsui, K. Takeda, and N. Nakamura, “A loadless CMOS four-transistor SRAM cell in a 0.18-μm logic technology,” IEEE Trans. Electron Devices, vol. 48, no. 12, pp. 2851–2855, Dec. 2001.
[13] I. Arsovski, T. Chandler, and A. Sheikholeslami, “A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme,” IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[14] S. Mishra, T. V. Mahendra, and A. Dandapat, “A 9-T 833-MHz 1.72-fJ/bit/search quasi-static ternary fully associative cache tag with selective matchline evaluation for wire speed applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 11, pp. 1910–1920, Nov. 2016.
[15] L. Frontini, S. Shojaii, A. Stabile, and V. Liberali, “A new XOR-based content addressable memory architecture,” in Proc. 19th IEEE Int. Conf. Electron., Circuits Syst. (ICECS), Dec. 2012, pp. 701–704.
[16] J. P. Wade and C. G. Sodini, “Dynamic cross-coupled bit-line content addressable memory cell for high-density arrays,” IEEE J. Solid-State Circuits, vol. SCC-22, no. 1, pp. 119–121, Feb. 1987.
[17] H. Noda et al., “A 143 MHz 1.1 W 4.5 Mb dynamic TCAM with hierarchical searching and shift redundancy architecture,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2004, pp. 208–523.
[18] M. Chae, J.-W. Lee, and S. H. Hong, “Decoupled 4T dynamic CAM suitable for high density storage,” Electron. Lett., vol. 47, no. 7, pp. 434–436, Mar. 2011.
[19] V. Vinogradov, J. Ha, C. Lee, A. Molnar, and S. H. Hong, “Dynamic ternary CAM for hardware search engine,” Electron. Lett., vol. 50, no. 4, pp. 256–258, Feb. 2014.
[20] J. G. Delgado-Frias, J. Nyathi, and T. Sb, “Decoupled dynamic ternary content addressable memories,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 10, pp. 2139–2147, Oct. 2005.
[21] S. Hanzawa, T. Sakata, K. Kajigaya, R. Takemura, and T. Kawahara, “A large-scale and low-power CAM architecture featuring a one-hot-spot block code for IP-address lookup in a network router,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 853–861, Apr. 2005.
[22] V. Lines et al., “66 MHz 2.3 M ternary dynamic content addressable memory,” in Proc. Rec. IEEE Int. Workshop Memory Technol., Des. Test., Aug. 2000, pp. 101–105.
[23] H. Noda et al., “A cost-efficient high-performance dynamic TCAM with pipelined hierarchical searching and shift redundancy architecture,” IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 245–253, Jan. 2005.
[24] Y. Riho and K. Nakazato, “Partial access mode: New method for reducing power consumption of dynamic random access memory,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 7, pp. 1461–1469, Jul. 2014.
[25] I. Bhati, M.-T. Chang, Z. Chishti, S. L. Lu, and B. Jacob, “DRAM refresh mechanisms, penalties, and trade-offs,” IEEE Trans. Comput., vol. 65, no. 1, pp. 108–121, Jan. 2016.
[26] Y.-H. Gong and S. Chung, “Exploiting refresh effect of DRAM read operations: A practical approach to low-power refresh,” IEEE Trans. Comput., vol. 65, no. 5, pp. 1507–1517, May 2016.
[27] A. Teman, P. Meinerzhagen, R. Giterman, A. Fish, and A. Burg, “Replica technique for adaptive refresh timing of gain-cell-embedded DRAM,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 4, pp. 259–263, Apr. 2014.
[28] D. Hellkamp and K. Nepal, “Metallic tube-tolerant ternary dynamic content-addressable memory based on carbon nanotube transistors,” IET Micro Nano Lett., vol. 10, no. 4, pp. 209–212, Mar. 2015.
[29] S. Jeloka, N. B. Akesh, D. Sylvester, and D. Blaauw, “A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T bit cell enabling logic-in-memory,” IEEE J. Solid-State Circuits, vol. 51, no. 4, pp. 1009–1021, Apr. 2016.
[30] A. Agarwal et al., “A 128 × 128 b high-speed wide-AND match-line content addressable memory in 32 nm CMOS,” in Proc. ESSCIRC, Sep. 2011, pp. 83–86.
[31] B. Wicht, T. Nirschl, and D. Schmitt-Landsiedel, “Yield and speed optimization of a latch-type voltage sense amplifier,” IEEE J. Solid-State Circuits, vol. 39, no. 7, pp. 1148–1158, Jul. 2004.
[32] N. Mohan, “Low-power high-performance ternary content addressable memory circuits,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Waterloo, Waterloo, ON, Canada, 2006.
[33] T.-S. Chen, D.-Y. Lee, T.-T. Liu, and A.-Y. Wu, “Dynamic reconfigurable ternary content addressable memory for openflow-compliant low-power packet processing,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 10, pp. 1661–1672, Oct. 2016.
Sandeep Mishra (M’14) received the B.Tech. and M.Tech. degrees in electronics and communication engineering from the Biju Patnaik University of Technology, Rourkela, India, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, National Institute of Technology at Meghalaya, Shillong, India.
His research interests include low-power memory design, high-speed sense amplifier, and intelligent transportation system.
Telajala Venkata Mahendra (M’16) received the B.Tech. degree in electronics and communication engineering from JNTU, Kakinada, India, in 2013, and the M.Tech. degree in VLSI design from the National Institute of Technology at Meghalaya, Shillong, India, in 2016, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering.
His research interests include the design of low-power VLSI circuits, content addressable memories, volatile memories, and digital circuits.
Jyotishman Saikia received the B.Tech. degree in electronics and telecommunication engineering from KIIT University, Bhubaneswar, India, in 2016. He is currently an Assistant Project Engineer with the Department of Electronics and Electrical Engineering, IIT Guwahati, Amingaon, India.
His research interests include the design of memory systems, computer architecture, SoC/NoC, secure hardware, and reconfigurable computing.
Anup Dandapat (M’10–SM’15) received the Ph.D. degree in digital VLSI design from Jadavpur University, Kolkata, India, in 2008.
He is currently an Associate Professor with the Department of Electronics and Communication Engineering, National Institute of Technology at Meghalaya, Shillong, India. He has authored or co-authored over 50 national and international journal papers. His current research interests include low-power VLSI design, low-power memory design, and low-power digital design.