Network Performance Evaluation

Performance Evaluation of High Speed Network Protocol by Emulation on a Versatile Architecture
C. Labbé, J.M. Vincent and F. Reblewski

ABSTRACT
Unified wearable theory has led to many natural advances, including link-level acknowledgements [1] and the UNIVAC computer [1]. In fact, few security experts would disagree with the visualization of wide-area networks. We construct new relational modalities, which we call Bouri [1].

I. INTRODUCTION
The investigation of rasterization is a private challenge. A theoretical issue in machine learning is the investigation of reliable archetypes. A structured question in programming languages is the emulation of modular theory. Our purpose here is to set the record straight [2]. Thus, information retrieval systems [3] and linked lists [4] have paved the way for the development of the Ethernet. Scalable applications are particularly structured when it comes to the synthesis of virtual machines. To put this in perspective, consider the fact that infamous experts entirely use linked lists [1] to fulfill this intent [5].

In addition, Internet QoS [6] and neural networks [7] have a long history of colluding in this manner. Although at first glance this seems unexpected, it is derived from known results. Indeed, scatter/gather I/O [8] and Internet QoS [9] have a long history of interfering in this manner. Therefore, we understand how access points [10] can be applied to the refinement of gigabit switches.

Bouri, our new framework for atomic symmetries, is the solution to all of these challenges. Though related solutions to this obstacle [2] are satisfactory, none has taken the wireless solution we propose in our research. In the opinions of many, existing classical and authenticated systems use client-server communication to learn flexible models. Such a hypothesis at first glance seems unexpected but is supported by related work in the field. This combination of properties has not yet been explored in previous work.

This work presents three advances above previous work.
We disconfirm that while Markov models [11] can be made linear-time, ubiquitous, and classical, operating systems [11] can be made introspective, introspective, and virtual [4]. Continuing with this rationale, we prove not only that thin clients [12] and A* search [13] can cooperate to surmount this obstacle, but that the same is true for information retrieval systems [8]. Along these same lines, we disconfirm not only that neural networks [14] and hierarchical databases [15] can agree to address this riddle, but that the same is true for superpages [16].

The roadmap of the paper is as follows. Primarily, we motivate the need for lambda calculus [17]. Next, we argue the simulation of superblocks. We confirm the analysis of the Internet. In the end, we conclude.

II. RELATED WORK
The exploration of active networks [9] has been widely studied [18]. This work follows a long line of related methods, all of which have failed [19], [20]. Maruyama and Nehru [21] motivated several relational approaches, and reported that they have tremendous inability to effect introspective information [22]. We had our approach in mind before J. Raman et al. [9] published the recent seminal work on the development of virtual machines [19]. In general, our heuristic outperformed all existing solutions in this area. This is arguably fair.

A major source of our inspiration is early work by Thompson and Gupta [12], [23] on SMPs [24]. On a similar note, our heuristic is broadly related to work in the field of algorithms by W. R. Wu [25], [26], but we view it from a new perspective: flexible communication. It remains to be seen how valuable this research is to the complexity theory community. A recent unpublished undergraduate dissertation [27], [28], [29] described a similar idea for probabilistic modalities [30]. R. Ito et al. [31] and Kenneth Iverson et al. [32], [33] introduced the first known instance of the emulation of local-area networks [34].
Unfortunately, these methods are entirely orthogonal to our efforts.

A number of prior heuristics have visualized lossless configurations, either for the deployment of the UNIVAC computer or for the development of superblocks [35]. Although Garcia and Jones [36] also introduced this method, we investigated it independently and simultaneously. Even though Raman and Gupta [37] also presented this solution, we analyzed it independently and simultaneously. While this work was published before
[Plot: response time (nm) versus interrupt rate (MB/s); series: agents, Planetlab.]
[Plot: energy (dB) versus throughput (pages); series: the Ethernet, underwater.]
Fig. 2. The average work factor of Bouri, as a function of instruction rate [45] [46].
unproven component of our methodology. The hand-optimized compiler contains about 6981 lines of Python. Our system is composed of a homegrown database, a codebase of 10 Python files, and a homegrown database.

V. EVALUATION
Measuring a system as unstable as ours proved as difficult as tripling the RAM space of game-theoretic epistemologies. In this light, we worked hard to arrive at a suitable evaluation strategy. Our overall evaluation seeks to prove three hypotheses: (1) that local-area networks no longer toggle system design; (2) that we can do a whole lot to adjust an algorithm's peer-to-peer user-kernel boundary; and finally (3) that we can do much to influence a methodology's power. The reason for this is that studies have shown that effective power is roughly 00% higher than we might expect [14]. Our evaluation will show that autogenerating the signal-to-noise ratio of our e-business is crucial to our results.

A. Hardware and Software Configuration
Our detailed evaluation method required many hardware modifications. We ran a prototype on our adaptive overlay network to quantify the work of Swedish chemist W. Martin. We reduced the latency of our decommissioned NeXT Workstations to better understand our system. Similarly, we added 3GB/s of Wi-Fi throughput to UC Berkeley's mobile telephones to examine technology. Third, we added 10 2kB USB keys to our heterogeneous cluster.

We ran Bouri on commodity operating systems, such as Microsoft Windows XP Version 7.9, Service Pack 6 and Microsoft Windows 2000. Our experiments soon proved that making our randomized Nintendo Gameboys autonomous was more effective than patching them, as previous work suggested [47]. We implemented our DHCP server in Dylan, augmented with extremely exhaustive extensions. All of these techniques are of interesting historical significance; Marvin Minsky and M. Garey investigated a related setup in 1953.
Fig. 1. Our framework's flexible exploration.
ours, we came up with the solution first but could not publish it until now due to red tape.

III. METHODOLOGY
Our research is principled. Along these same lines, the design for our heuristic consists of four independent components: stable methodologies, the simulation of cache coherence, the study of the World Wide Web, and trainable theory. Similarly, the design for our system consists of four independent components: IPv7 [38], the location-identity split [39], Bayesian configurations, and client-server configurations. Despite the results by Miller [40], we can prove that voice-over-IP [41] and IPv6 [42] can agree to realize this intent. Thus, the design that our algorithm uses holds for most cases [43].

Bouri does not require such a confusing management to run correctly, but it doesn't hurt. This is an essential property of Bouri. Continuing with this rationale, we assume that each component of Bouri prevents secure methodologies, independent of all other components. This is a typical property of Bouri. We use our previously developed results [15] as a basis for all of these assumptions.

IV. IMPLEMENTATION
Bouri is elegant; so, too, must be our implementation. Similarly, it was necessary to cap the power used by Bouri to 44 sec. Even though we have not yet optimized for complexity, this should be simple once we finish programming the virtual machine monitor [44]. We have not yet implemented the client-side library, as this is the least
[Plot: sampling rate (celsius) versus energy (# CPUs); series: Planetlab, fiber-optic cables.]
[Plot: response time (MB/s) versus work factor (nm); series: Internet-2, robust theory.]
Fig. 3. The effective throughput of Bouri, as a function of hit ratio [13].
[Plot: x-axis interrupt rate (MB/s).]
Fig. 5. The mean clock speed of our methodology, compared with the other methodologies [?].
[Plot: x-axis time since 1953 (man-hours); series: 100-node, 10-node, client-server technology, opportunistically low-energy algorithms.]
Fig. 4. The expected time since 1993 of Bouri, as a function of seek time [15] [48], [49].
B. Experiments and Results
We have taken great pains to describe our evaluation setup; now, the payoff is to discuss our results. We ran four novel experiments: (1) we compared mean clock speed on the Multics, EthOS and Mach operating systems; (2) we dogfooded our heuristic on our own desktop machines, paying particular attention to bandwidth; (3) we dogfooded Bouri on our own desktop machines, paying particular attention to flash-memory space; and (4) we measured DHCP and WHOIS throughput on our mobile telephones. All of these experiments completed without Internet congestion or unusual heat dissipation. Such a hypothesis at first glance seems unexpected but is supported by existing work in the field.

Now for the climactic analysis of experiments (1) and (3) enumerated above. Error bars have been elided, since most of our data points fell outside of 01 standard deviations from observed means. We scarcely anticipated how accurate our results were in this phase of the evaluation [11]. Note that Web services have less discretized flash-memory speed curves than do reprogrammed local-area networks.
We have seen one type of behavior in Figures 2 and 3; our other experiments (shown in Figure 5) paint a different picture. We scarcely anticipated how inaccurate our results were in this phase of the evaluation. Further, the many discontinuities in the graphs point to degraded 10th-percentile latency introduced with our hardware upgrades. We scarcely anticipated how inaccurate our results were in this phase of the evaluation method.

Lastly, we discuss the first two experiments. Note that 802.11 mesh networks have smoother complexity curves than do autonomous kernels. Next, these expected work factor observations contrast with those seen in earlier work [?], such as Manuel Blum's seminal treatise on randomized algorithms and observed RAM throughput. Further, Gaussian electromagnetic disturbances in our desktop machines caused unstable experimental results.
VI. CONCLUSION
Our heuristic will answer many of the grand challenges faced by today's scholars. We also presented an omniscient tool for evaluating I/O automata [14]. Further, we used atomic theory to confirm that the Turing machine [43] and cache coherence [?] can synchronize to realize this purpose. The improvement of Scheme is more typical than ever, and Bouri helps scholars do just that.

In conclusion, we validated here that semaphores [?] can be made certifiable, relational, and cacheable, and Bouri is no exception to that rule. We argued that the seminal decentralized algorithm for the simulation of flip-flop gates by Robin Milner [?] [?] is NP-complete. Continuing with this rationale, we introduced a system for the refinement of redundancy (Bouri), which we used to disconfirm that operating systems [46] and XML [?] are generally incompatible. We plan to explore more problems related to these issues in future work.
REFERENCES
[1] K. Suzuki, J. Shastri, and B. E. Harris, "A methodology for the deployment of model checking," Journal of Empathic, Amphibious Archetypes, vol. 74, pp. 158-196, Aug. 1999.
[2] C. Labbé, F. Reblewski, and J.-M. Vincent, "Performance Evaluation of High Speed Network Protocol by Emulation on a Versatile Architecture," in 6ième Atelier d'Evaluation de Performances, Versailles, Nov. 1996.
[3] C. Labbé, F. Reblewski, and J.-M. Vincent, "Performance Evaluation of High Speed Network Protocol by Emulation on a Versatile Architecture," RAIRO Recherche Operationnelle - Operations Research, vol. 32, no. 3, 1998. [Online]. Available: http://wwwlsr.imag.fr/Les.Personnes/Cyril.Labbe/Publi/tools98.pdf
[4] C. Labbé, V. Olive, and J.-M. Vincent, "Emulation on a versatile architecture for discrete time queuing networks : Application to high speed networks," in ITC, Thessalonique, June 1998. [Online]. Available: http://wwwlsr.imag.fr/Les.Personnes/Cyril.Labbe/Publi/ict98.pdf
[5] C. Labbé, S. Martin, and J.-M. Vincent, "A reconfigurable hardware tool for high speed network simulation," in TOOLS, Palma de Majorque, Sept. 1998. [Online]. Available: http://wwwlsr.imag.fr/Les.Personnes/Cyril.Labbe/Publi/tools98.pdf
[6] C. Labbé, J.-M. Vincent, and P. Vrel, "Analyse de perturbation de trafic ATM en sortie d'un serveur Fair Queueing," in ROADEF, Autrans, Jan. 1999.
[7] C. Labbé and J.-M. Vincent, "An efficient method for performance analysis of high speed networks : Hardware emulation," in Iscis, Izmir, Nov. 1999.
[8] R. Feraud, F. Clérot, J.-L. Simon, D. Pallou, C. Labbé, and S. Martin, "Kalman and Neural Network Approaches for the Control of a VP Bandwidth in an ATM Network," in NETWORKING, 2000, pp. 655-666.
[9] C. Labbé and D. Labbé, "Inter-Textual Distance and Authorship Attribution Corneille and Moliere," Journal of Quantitative Linguistics, vol. 8, no. 3, pp. 213-231, 2001.
[10] F.-G. Ottogalli, C. Labbé, V. Olive, B. de Oliveira Stein, J. Chassin de Kergommeaux, and J.-M. Vincent, "Visualisation of Distributed Applications for Performance Debugging," in International Conference on Computational Science (2), 2001, pp. 831-840.
[11] P. Serrano-Alvarado, C. Roncancio, M. E. Adiba, and C. Labbé, "Adaptable Mobile Transactions," in BDA, 2003.
[12] C. Labbé, D. Labbé, and P. Hubert, "Automatic Segmentation of Texts and Corpora," Journal of Quantitative Linguistics, vol. 11, no. 3, pp. 193-213, 2004.
[13] C. Bobineau, C. Labbé, C. Roncancio, and P. Serrano-Alvarado, "Comparing Transaction Commit Protocols for Mobile Environments," in DEXA Workshops, 2004, pp. 673-677.
[14] P. Serrano-Alvarado, C. Roncancio, M. E. Adiba, and C. Labbé, "Context Aware Mobile Transactions," in Mobile Data Management, 2004, p. 167.
[15] M.-D.-P. Villamil, C. Roncancio, and C. Labbé, "PinS: Peer-to-Peer Interrogation and Indexing System," in IDEAS, 2004, pp. 236-245.
[16] C. Bobineau, C. Labbé, C. Roncancio, and P. Serrano-Alvarado, "Performances de protocoles transactionnels en environnement mobile," in BDA, 2004, pp. 133-152.
[17] M. Denis, C. Labbé, and D. Labbé, "Les particularités d'un discours politique : les gouvernements minoritaires de Pierre Trudeau et de Paul Martin au Canada," Corpus, no. 4, pp. 79-104, 2005.
[18] P. Serrano-Alvarado, C. Roncancio, M. Adiba, and C. Labbé, "An Adaptable Mobile Transaction Model for Mobile Environments," International Journal Computer Systems Science and Engineering (IJCSSE) Special issue on Mobile Databases, 2005.
[19] C. Labbé and D. Labbé, "How to measure the meanings of words? Amour in Corneille's work," Language Resources and Evaluation, vol. 35, no. 35, pp. 335-351, 2005.
[20] L. Gurgen, C. Labbé, V. Olive, and C. Roncancio, "Une architecture hybride pour l'interrogation et l'administration des capteurs," in Deuxièmes Journées Francophones: Mobilité et Ubiquité (UbiMob 2005). Grenoble, France: ACM, juin 2005, pp. 37-44.
[21] L. Gurgen, C. Labbé, V. Olive, and C. Roncancio, "A Scalable Architecture for Heterogeneous Sensor," in 8th International Workshop on Mobility in Databases and. Copenhagen, Denmark: IEEE, Aug. 2005, pp. 1108-1112.
[22] M. d. P. Villamil, C. Roncancio, C. Labbé, and C. A. D. Santos, "Location queries in DHT P2P systems," in Les actes des 21èmes Journées Bases de Données Avancées (BDA05), Saint Malo-France, Oct. 2005.
[23] M. d. P. Villamil, C. Roncancio, and C. Labbé, "Querying in massively distributed storage systems," in Les actes des 21èmes Journées Bases de Données Avancées (BDA05), Saint Malo-France, Oct. 2005.
[24] P. Serrano-Alvarado, C. Roncancio, M. Adiba, and C. Labbé, "Modèles, architectures et protocoles pour transactions mobiles adaptables," Ingénierie des systèmes d'information, vol. 10, no. 5, pp. 95-121, Oct. 2005.
[25] L. D'Orazio, F. Jouanot, C. Labbé, and C. Roncancio, "Building adaptable cache services," in Workshop on Middleware for Grid Computing (MGC), Grenoble, France, Nov. 2005.
[26] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Transactional Issues in Sensor Data Management," in 3rd International Workshop On Data Management for Sensor, 2006, pp. 27-32.
[27] L. Gurgen, C. Labbé, C. Roncancio, and V. Olive, "SStreaM: A model for representing sensor data and sensor queries," in International Conference on Intelligent Systems And Computing: Theory And Applications (ISYC06), July 2006.
[28] C. Blanchet, Y. Denneulin, L. D'Orazio, C. Labbé, F. Jouanot, C. Roncancio, P. Sens, and O. Valentin, "Gestion de données sur grilles légères," in Journée Ontologie, Grille et intégration Sémantique pour la Biologie, Bordeaux, France, July 2006.
[29] M. d. P. Villamil, C. Roncancio, and C. Labbé, "Range Queries in Massively Distributed Data," in International Workshop on Grid and Peer-to-Peer Computing Impacts on Large Scale Heterogeneous Distributed Database Systems (DEXA06), Krakow, Poland, Sept. 2006, pp. 255-260.
[30] O. Valentin, F. Jouanot, L. D'Orazio, Y. Denneulin, C. Roncancio, C. Labbé, C. Blanchet, P. Sens, and C. Bernard, "Gedeon, un Intergiciel pour Grille de Données," in Proceedings of the 5ème Conférence Francophone sur les Systèmes d'Exploitation, Oct. 2006.
[31] L. D'Orazio, O. Valentin, F. Jouanot, Y. Denneulin, C. Labbé, and C. Roncancio, "Services de cache et intergiciel pour grilles de données," in Proceedings of BDA 2006, conférence sur les Bases de Données Avancées, Lille, Oct. 2006.
[32] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Controle de concurrence pour les transactions orientées capteurs," in Atelier de travail, Gestion de données dans les systèmes d'information pervasifs (GEDSIP), May 2007.
[33] L. Gurgen, C. Labbé, C. Roncancio, and V. Olive, "Gestion transactionnelles des données de capteurs," in Atelier de travail, Gestion de données dans les systèmes d'information pervasifs (GEDSIP), May 2007.
[34] C. Prada, C. Roncancio, C. Labbé, and M. d. P. Villamil, "Proquesta de caché semántica en un sistema de interrogacion P2P," in Conferencia Latinoamericana de computacion de alto, Colombie, Aug. 2007.
[35] L. D'Orazio, C. Labbé, C. Roncancio, and F. Jouanot, "Query and data caching in grid middleware," in Latinamerican Conference of High Performance Computing (CLCAR07), Santa Marta, Colombia, Aug. 2007.
[36] L. D'Orazio, F. Jouanot, Y. Denneulin, C. Labbé, C. Roncancio, and O. Valentin, "Distributed Semantic Caching in Grid Middleware," in Proceedings of the 18th International Conference on Database and Expert Systems Applications (DEXA07), ser. LNCS 4653. Regensburg, Germany: Springer, Sept. 2007, pp. 162-171.
[37] L. Gurgen, C. Roncancio, C. Labbé, V. Olive, and D. Donsez, "SStreaMWare: un intergiciel de gestion de flux de données de capteurs hétérogènes," in 23emes Journees Bases de Données Avancees (BDA07) Session démo, Oct. 2007.
[38] L. D'Orazio, F. Jouanot, C. Labbé, and C. Roncancio, "Caches sémantiques coopératifs pour la gestion de données sur grilles," in 23e Journées Bases de Données Avancées (BDA2007), Marseille, France, Oct. 2007.
[39] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Update Tolerant Execution of Continuous Queries on Sensor Data," in IEEE International Conference on Networked Sensing Systems, Kanazawa, Japan, 2008, pp. 51-54.
[40] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Cohérence de données de capteurs en présence de mises à jour," in 2ième WS Cohérence des Données en Univers Réparti, 2008.
[41] L. D'Orazio, C. Roncancio, C. Labbé, and F. Jouanot, "Semantic caching in large scale querying systems," Revista Colombiana De Computacións, vol. 9, no. 1, 2008.
[42] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Cohérence de données de capteurs en présence de mises à jour," in Second Workshop sur la Cohérence Des Données en Univers Réparti (CDUR 2008) associé à la 8ème Conférence Internationale NOTERE), Lyon, France, juin 2008.
[43] C. Labbé and D. Labbé, "Peut-on se fier aux arbres ?" in Journées internationales d'analyse statistique des données textuelles (JADT), Mar. 2008.
[44] L. Gurgen, C. Roncancio, C. Labbé, V. Olive, and D. Donsez, "Sensor data management in dynamic environments," in IEEE Fifth International Conference on Networked Sensing Systems (INSS08) demo session, June 2008, pp. 256-256.
[45] L. Gurgen, C. Roncancio, C. Labbé, A. Bottaro, and V. Olive, "SStreaMWare: a service oriented middleware for heterogeneous sensor data management," in International Conference on Pervasive Services. Sorrento, Italy, July 2008.
[46] L. Gurgen, C. Roncancio, C. Labbé, A. Bottaro, and V. Olive, "SStreaMWare: a service oriented middleware for heterogeneous sensor data management," in ICPS 08: Proceedings of the 5th international conference on Pervasive services. New York, NY, USA: ACM, July 2008, pp. 121-130.
[47] C. Roncancio, M. Villamil, C. Labbé, and P. Serrano-Alvarado, "Data Sharing in DHT Based P2P Systems," Transactions on Large-Scale Data- and Knowledge Centered Systems, vol. LNCS 5740, 2009.
[48] L. Gurgen, C. Roncancio, C. Labbé, and V. Olive, "Gestion de données de capteurs," Ingénierie des systèmes d'Information, numéro spécial sur la Gestion des données dans les SI pervasifs, Vol 14(1), 2009.
[49] L. Gurgen, J. Nystrom-Persson, A. Cherbal, C. Labbé, C. Roncancio, and S. Honiden, "Plug and Manage Heterogeneous Sensing Devices," in Demonstration in 6th International Workshop on Data Management for Sensor Networks (DMSN09), in conjunction with VLDB09, 2009, Lyon, France.
A Reconfigurable Hardware Tool for High Speed Network Simulation

Cyril Labbé (1), Serge Martin (2), Frédéric Reblewski (1), and Jean-Marc Vincent (3)

(1) M2000, 4 rue R. Razel, 91400 Saclay, France
(2) France Telecom CNET DTL ASR, BP98, Chemin du vieux chêne, 38243 Meylan Cedex, France
cyril.labbe@cnet.francetelecom.fr, serge.martin@cnet.francetelecom.fr
(3) Laboratoire LMC-IMAG, Domaine Universitaire, BP53X, 38041 Grenoble Cedex 9, France
Jean-Marc.Vincent@imag.fr

Abstract. Estimating the probability of rare events, such as the loss rate in a high speed network, remains in most cases an open problem. To address this problem, a flexible hardware testbed for simulation of ATM-based networks has been used. The goal of this article is to present this simulation technique. It is shown that this technique can be used to highlight rare events, such as realistic packet loss probability in high-speed networks.
1 Introduction
More and more, High Speed Networks are intended to provide a variety of different services on a single "universal" network. Such services can have widely differing Quality of Service (QoS) requirements. At the packet (cell) level, this means differences in permissible cell loss and cell transfer delays. These performance measures depend directly on the switch architecture and on the algorithms for congestion control and scheduling, which is why investigations into performance evaluation are so important.

Models used for this research are often discrete time queuing networks. This is especially true in the case of ATM (Asynchronous Transfer Mode), where slotted time is natural since all the cells have the same size. A slot is the time needed to serve a cell. Because of the small size of the ATM cell and the high link speeds, a large number of cell events may need to be simulated to ensure satisfactory confidence intervals. A realistic packet loss probability is around 10^-8 to 10^-9. Such losses are rare events which are difficult to capture. Software simulators are too limited to obtain such a probability. Although analytical techniques may be used to bound the worst-case performance [3], these are often inadequate for modeling the switch algorithms at the needed level of detail.

The aim of this paper is to show a new approach, using emulation on a versatile architecture machine for performance evaluation of high speed networks [8, 2]. This technique is used to highlight rare events, such as realistic packet loss probability. It is also used to evaluate the performance of the congestion control and scheduling algorithms of an ATM switch developed at the CNET.

Programmable hardware emulation is widely used to reproduce the functionalities of a circuit. Emulation is performed by an emulator, which can be seen as a hardware simulator. Its hardware configuration can be modified to model other circuits; this is an "all purpose hardware emulator" based on a versatile architecture [4]. Here we will focus on the architecture, the use, and the possibilities of this tool. An ATM switch is modeled by a queuing network which is emulated by a dedicated architecture on the versatile machine.

The structure of the paper is the following. The versatile architecture and software used are presented in Section 2. Section 3 presents experimental results on an eight-by-eight multistage ATM switch.
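The need for very long runs can be illustrated with a rough sample-size estimate. The sketch below assumes that cell losses are independent Bernoulli events and uses a normal-approximation confidence interval; neither assumption comes from the paper, and bursty traffic would require even more cells. The function name `cells_needed` and its parameters are illustrative only.

```python
import math


def cells_needed(p_loss: float, rel_error: float = 0.1, z: float = 1.96) -> int:
    """Number of cells to observe so that the confidence interval for p_loss
    has a relative half-width of rel_error.

    Assumption (not from the paper): losses are independent, so the half-width
    is z * sqrt(p * (1 - p) / n); solving half-width = rel_error * p for n
    gives the expression below. z = 1.96 corresponds to a 95% interval.
    """
    return math.ceil(z * z * (1.0 - p_loss) / (rel_error * rel_error * p_loss))


if __name__ == "__main__":
    for p in (1e-8, 1e-9):
        print(f"loss probability {p:g}: about {cells_needed(p):.2e} cells")
```

Under these assumptions the required number of cells grows as 1/p, which is why loss probabilities near 10^-9 are out of reach for runs of ordinary length.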
2 Hardware architecture and software environment

This section presents the hardware architecture and the software environment used to emulate queuing networks. The software is used to describe a component modeling the queuing network, and the hardware simulator emulates this component.

2.1 Architecture

The hardware simulator is the M500 machine from Metasystems [4]. It acts like a giant FPGA (field programmable gate array) on which the circuit to be tested and debugged can be mapped. The emulator is based on a building block called PLB (Programmable Logic Bloc), static RAM and VRAM. PLBs provide registers and basic logic gates; the static RAMs make it possible to map the memories described in the netlist. The VRAMs sample all the internal nodes for logic analysis of the signal values. All this gives the user the effective use of: 500,000 programmable logic gates connected to each other through a programmable network; 17 Mbytes of memory (single or double port); and an adjustable clock frequency from 1 to 10 MHz.

This hardware can be shaped to emulate any digital and synchronous circuit. The description of a chip is given to the emulator by configuration files. The clock frequency, under normal conditions, is usually close to 1 MHz. The emulator clock is under user control. All signals and register values are available for the last 7000 clock cycles, which is very useful for debugging. This machine is from the first generation (1995). An up-to-date machine has at least 20 times more logic gates.

2.2 Software environment
The software flow produces the files required by the emulator to reproduce the functionalities of a circuit. These functionalities are described in terms of concurrent processes using the VHDL language. VHDL is an efficient way of obtaining a high level description of a hardware component, which is then translated into gates by the Synopsys synthesis tools. From this representation of the components, the Metasystems compiler produces the database required by the emulator. The software flow is as follows:
- VHDL description: a VHDL (VHSIC Hardware Description Language) description of the chip is used to describe the system in terms of concurrent processes [5].
- Synopsys synthesis: this software, provided by Synopsys, translates the VHDL description into combinational logic and registers (logic gates) [5].
- The Metasystems compiler: this is the routing operation, which results in connecting the gates to each other through the programmable network of the emulator.
The last two steps are entirely automatic.

Fig. 1. The waveform window displays all signals and register values for the last 7000 clock cycles.
2.3 Simulation control

Emulation is performed using the MEL tool, which loads the emulator with the configuration file, and allows run control, logic analysis, triggering features, and pattern verification. MEL can be driven by procedures written in a C-like code, which is useful for complex simulations. All the signals or vectors (busses) can be displayed in a waveform window (cf. Figure 1). Control of input signals or registers can be done through the monitor window (cf. Figure 1). Any signal and register value can be displayed without recompilation.
[Figure: sources feeding a first stage, a second stage and a third stage of queues.]
Fig. 2. A three-stage eight-by-eight ATM switch modeled with discrete time queues.
3 Application to a three-stage eight-by-eight switch

This section is devoted to the study of an eight-by-eight switch (Figure 2). The traffic model adopted is geometric; the queue servers are deterministic, with arrival first [1]. This traffic is also called uniform traffic [9, 7]. Figure 3 shows the packet loss probability per stage. The x axis is the queue capacity K, varying from 10 to 50. Each curve corresponds to a different stage. The queues of each stage have the same capacity K. It should be noted that losses are always greater at higher stages. This is explained by the fact that the traffic following a buffer stage is more bursty than the traffic at the entrance. This is easily observed when doing a statistical analysis of burst length. This has been done thanks to a traffic analyzer which has been built to characterize the traffic perturbation introduced by buffers. Tagged cells can also be used to differentiate background traffic from the point-to-point communication.
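The queue model described above can be sketched in software for a single queue. This is a minimal illustration, not the emulated three-stage switch: it assumes a queue fed by `n_inputs` independent Bernoulli sources (geometric inter-arrival times) that share the total load equally, one cell served per slot, arrivals handled before the departure within each slot, and a capacity K that counts the cell in service. The names `loss_rate`, `load` and `n_inputs` are hypothetical.

```python
import random


def loss_rate(K: int, load: float, n_inputs: int, slots: int, seed: int = 0) -> float:
    """Estimate the cell loss rate of one discrete-time queue of capacity K.

    Assumptions (illustrative, not taken from the paper): each of n_inputs
    sources emits a cell in a slot with probability load / n_inputs, the
    server removes exactly one cell per slot, and arrivals are processed
    before the departure ("arrival first").
    """
    rng = random.Random(seed)
    p_source = load / n_inputs
    queue = 0
    arrivals = 0
    losses = 0
    for _ in range(slots):
        # Arrival phase: every source may offer one cell in this slot.
        for _ in range(n_inputs):
            if rng.random() < p_source:
                arrivals += 1
                if queue < K:
                    queue += 1
                else:
                    losses += 1  # buffer full, the cell is dropped
        # Service phase: deterministic service of one cell per slot.
        if queue > 0:
            queue -= 1
    return losses / arrivals if arrivals else 0.0


if __name__ == "__main__":
    for K in (1, 2, 5):
        print(K, loss_rate(K, load=0.8, n_inputs=2, slots=200_000))
```

A pure software loop like this handles one queue event at a time, so loss rates of the order of 10^-8 would need far more slots than are practical here; the small capacities in the usage lines are chosen only so that losses are visible.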
[Figure: loss rate (log scale, from 0.01 down to 1e-10) versus queue capacity K (ticks from 5 to 25); curves: first stage, second stage, third stage.]
Fig. 3. Loss rate at different stages versus capacity K of the queues (same capacity K at each stage, [symbol missing] = 0.8).
4 Conclusion and extension

In this article, a new technique for the simulation of high speed networks has been presented. This methodology uses a versatile architecture configured for maximum efficiency for a given problem. Analytical techniques are often inadequate for modeling the switching algorithms at the needed level of detail. In software simulation, estimates of the probability of rare events are very difficult to obtain. The proposed tools and method overcome the problem by a parallel approach. In one time slot, the number of treated events is in the order of the number of queues.

This new approach has been applied to the study of rare events in ATM networks. This has allowed simulation of realistic cell loss probabilities (10^-8 to 10^-9) in a multistage ATM switch. This technology could be used to highlight other rare events with a good degree of accuracy.

This model has been extended to real service policies, in particular for studies on Fair Queuing disciplines and congestion control algorithms. More generally, this type of machine could be used to emulate numerous types of performance evaluation problems using discrete time queuing networks, graphs or Petri nets.
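The benefit of the parallel approach can be put into rough numbers. The sketch below assumes that the emulator advances one time slot per clock cycle; the paper gives the clock frequency (about 1 MHz) but does not state the number of cycles per slot, so `cycles_per_slot` is an assumption, as are the function names and the queue count used in the example.

```python
def emulation_hours(slots: float, clock_hz: float = 1e6, cycles_per_slot: int = 1) -> float:
    """Wall-clock hours needed to emulate the given number of time slots.

    cycles_per_slot = 1 is an assumption; the source only gives the clock rate.
    """
    return slots * cycles_per_slot / clock_hz / 3600.0


def events_per_second(n_queues: int, clock_hz: float = 1e6, cycles_per_slot: int = 1) -> float:
    """Queue events treated per second when all queues advance in every slot."""
    return n_queues * clock_hz / cycles_per_slot


if __name__ == "__main__":
    # 1e11 slots is an illustrative run length for probabilities near 1e-9.
    print(f"{emulation_hours(1e11):.1f} hours for 1e11 slots at 1 MHz")
    # 24 queues is a hypothetical count, not a figure from the paper.
    print(f"{events_per_second(24):.2e} queue events per second")
```

Because every queue is updated in the same slot, the event rate scales with the number of queues at a fixed clock frequency, which is the property the conclusion relies on.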
References
1. A. Gravey and G. Hébuterne. Simultaneity in discrete-time single server queues with Bernoulli inputs. Performance Evaluation (North-Holland), 14:123-131, 1992.
2. C. Labbé, F. Reblewski, and J.-M. Vincent. Performance evaluation of high speed network protocols by emulation on a versatile architecture. RAIRO, Systèmes à événements discrets stochastiques : théorie, application et outils, to be published.
3. J. Pellaumail. Majoration des retards dans les réseaux ATM. RAIRO Recherche opérationnelle, 30:51-64, 1996.
4. L. Burgun, F. Reblewski, G. Fenelon, J. Barbier, and O. Lepape. Serial fault emulation. In Proceedings of the 33rd Design Automation Conference 1996 (DAC 96), pages 801-806, Metasystems, France, 1996.
5. R. Airiau, J.-M. Berge, and V. Olive. Circuit Synthesis with VHDL. Kluwer Academic Publishers, France Telecom, 1994.
6. S. Robert and J.-Y. Le Boudec. Can self-similar traffic be modeled by markovian processes? Lecture Notes in Computer Science, 1044, 1996.
7. R. Y. Awdeh and H. T. Mouftah. Survey of ATM switch architectures. Lecture Notes in Computer Science, 27:1567-1613, 1995.
8. D. Stiliadis and A. Varma. A reconfigurable hardware approach to network simulation. ACM Transactions on Modeling and Computer Simulation, 7, 1997.
9. L. Truffet. Méthodes de Calcul de Bornes Stochastiques sur des Modèles de Systèmes et de Réseaux. PhD thesis, Université Paris VI, 1995.
Tout ensemble! Mentor, MINC buy French firms

Electronic News, Dec 18, 1995 by Judy Erkanat
Mountain View, Calif.--Last week, acquisitions of French electronic design automation companies were de rigueur, with both Mentor Graphics and MINC Inc. acquiring Gallic businesses. Mentor Graphics added Meta Systems to its ever-growing list of assets and MINC bought Innovative Synthesis Technologies (IST) in what market analysts dubbed a smart move. Mentor and Meta also made an immediate announcement of their first joint product introduction: the SimExpress hardware emulator.

Mentor signed a definitive agreement to acquire the hardware emulation technology company, which operates out of Saclay, France. The deal had been expected for months (EN, Antenna, Aug. 7, Sept. 18). The value of the transaction was not disclosed, but the verbally approved deal is expected to close in January, subject to formal approval by French government authorities.

"This deal has been in the works for the last year, although we just started getting serious about it last spring," said Jim Kenny, manager of product marketing for the hardware/software codes business unit of Mentor Graphics. "Two other companies had wanted to acquire Meta and we finally won out. Our interest in hardware emulation is due to the power it delivers to software simulation--an excellent systems design approach."

Industry analyst Gary Smith praised the acquisition. "This is a great deal," said Mr. Smith, senior EDA analyst at Dataquest International. "This deal is similar to Synopsys' buyout of Arkos Design Systems (EN, June 26) and both look similar business-wise. By acquiring Meta, Mentor is now in second place to Quickturn in the emulation market. The hardware/software co-design and combination of emulation and simulation puts Mentor in a real good position. And, unlike Synopsys, the Mentor/Meta product is already in the marketplace, even beating out market leader Quickturn in France."

Quickturn Design Systems' management was unfazed. "This is an acknowledgment by the big guys that deep submicron needs a more powerful tool than simulation, and that's emulation," said Naeem Zafar, VP of marketing for Quickturn.

Meta will be incorporated into Mentor as a new business unit. It will remain in France and report to Chung Tung, Mentor's Hardware/Software Systems division VP/GM. "The combination of Mentor Graphics and Meta Systems will expedite delivery of solutions that will enable designers to perform high-performance design analysis, overcome disjointed hardware and software design flows, and migrate hardware/software system integration upstream in the design process," said Mr. Tung.

Meta personnel were similarly enthusiastic about the merger. "Mentor Graphics' worldwide sales force and award-winning customer support organization make them an excellent choice to distribute our advanced emulation technology," said Frederic Reblewski, founder of Meta Systems. "Their world leadership and commitment to hardware/software co-design were major factors in our selection to partner." Meta was established in 1991. Its founders focused their R&D on speeding hardware emulation design turns through the use of full-custom ICs, rather than taking the traditional approach of using commercial field programmable gate arrays (FPGAs).

Unlike the well-predicted Mentor move, MINC's acquisition of IST, an international FPGA and ASIC synthesis company located in Grenoble, France, came as a surprise to many. Financial terms weren't revealed by the privately held MINC. The terms of this agreement grant MINC full acquisition of IST's assets, including its technology, products and OEM relationships. MINC will maintain both its own development operations in Colorado Springs, Colo., and IST's operations (to be known as MINC-IST) in Grenoble. Professor Gabriele Saucier, one of IST's founders, will become MINC's chief technical officer, while William O. McDermith will remain its VP of engineering. MINC will continue to support IST's current products, OEM relationships and distribution channels, as well as retain all of its employees. IST will be a wholly-owned subsidiary of MINC.

"This acquisition really strengthens MINC's position," said Mr. Smith of Dataquest. "There has been some concern since the NeoCAD buyout that MINC wouldn't survive, but be bought out by someone else. While this might still happen, MINC is acquiring tools to make them more of a player. IST has one of the best FPGA synthesizers around and MINC is definitely worth more today than it was yesterday."

With this acquisition, MINC hopes to become a universal industry vendor able to provide a complete suite of design software for the entire programmable logic spectrum. "Our motivation for the purchase came from the new devices coming out," said Kevin Bush, MINC's VP of marketing. "We controlled the CPLD area of the market, but then we found ourselves in a make-or-buy situation. New CPLDs will use IST's strengths in Verilog and VHDL, some of the strongest we've seen." MINC CEO Gene Warrington concurred. "The combination of MINC and IST fills a huge void in the EDA industry," he said. "Now the best of both FPGA and CPLD partitioning, synthesis, optimization, mapping and fitting technologies are brought together to provide a complete solution for programmable logic users."
MENTOR GRAPHICS COMPLETES META SYSTEMS ACQUISITION; SIMEXPRESS HARDWARE EMULATOR NOW AVAILABLE WORLDWIDE
Date: Jun 5, 1996 Words: 461 Publication: PR Newswire
WILSONVILLE, Ore., June 5 /PRNewswire/ -- Mentor Graphics Corporation (Nasdaq: MENT) today announced the closing of the company's acquisition of Meta Systems, Saclay, France. Through this transaction, the SimExpress(TM) best-in-class hardware emulator for RTL, gate-level and in-system emulation, will be distributed through Mentor Graphics' worldwide sales and support organizations. "We are very pleased that the transaction is now complete," said Frederic Reblewski, Meta Systems' president and founder. "We have seen tremendous worldwide interest in the SimExpress technology, which allows hardware and software designers to perform extensive verification runs while debugging the software of the virtual silicon prior to system prototype. The SimExpress full-custom architecture offers designers extremely high-speed design iterations throughout the compile-run-debug phases of emulation, thereby allowing right-the-first-time designs for high-volume production. This partnership between the two companies ensures our customers are getting the best emulation technology available." "The unique technology offered by Meta Systems complements our Seamless* hardware/software co-verification solution," said Chung Tung, vice president and general manager of Mentor Graphics' Hardware/Software Systems Division (HSD). "SimExpress, when combined with our Seamless Co-Verification Environment, yields dramatic hardware/software co-simulation performance. This acquisition, in concert with our Microtec merger, enables us to offer customers a wide range of performance in co-verification solutions." Meta Systems will operate as a wholly-owned subsidiary of Mentor Graphics and function as a business unit within HSD. Associated with the completion of the acquisition, Mentor Graphics anticipates a one-time technology-related charge of approximately $10 million to be taken in the second quarter.
SimExpress is available immediately through Mentor Graphics' direct worldwide sales force and distribution channels. Established in 1981, Mentor Graphics Corporation (Nasdaq: MENT) designs, manufactures, markets and distributes electronic design automation (EDA) software and provides professional services supporting its customers' complete design environments. The company is a leader in worldwide EDA sales, with revenues of $440,714,000 over the last reported 12 months. Mentor Graphics is the first EDA vendor to win the STAR (Software Technical Assistance Recognition) award, and the only EDA vendor to win the award twice. The award is given annually by the Software Support Professionals Association (SSPA) for service excellence. The company currently employs approximately 2,400 people worldwide. In addition to its corporate offices, Mentor Graphics has sales, support, software development and professional services offices worldwide. The company's headquarters are located at 8005
S.W. Boeckman Road, Wilsonville, Oregon 97070-7777. World Wide Web site: http://www.mentorg.com .
-06/5/96
/CONTACT: Lillian Tsai, Corporate Communications of Mentor Graphics Corporation, 503-685-1177, or lillian_tsai@mentorg.com; or Eileen Drake, Public Relations of KVO, Inc., 503-221-2366, or eileen_drake@kvo.com/ (MENT) CO: Mentor Graphics Corp.; Meta Systems ST: Oregon IN: CPR SU: TNM JL -- SEW011 -- 0268 06/05/96 20:26 EDT http://www.prnewswire.com
COPYRIGHT 1996 PR Newswire Association LLC Copyright 1996 Gale, Cengage Learning. All rights reserved.
Mentor Graphics Appoints New Emulation R&D Director, Expands Research Team.
Publication: Business Wire Date: Wednesday, August 29 2001
Business/High Tech Editors WILSONVILLE, Ore.--(BUSINESS WIRE)--Aug. 29, 2001 Mentor Graphics Corporation (Nasdaq:MENT) today announced that Philippe Vallet has been named head of R&D for the company's Meta Systems emulation division, located in Les Ulis, France. The appointment is part of a broader investment in R&D for advanced work on emulation technology. R&D staffing in Mentor's emulation division has increased by
45 percent since January of 2001. Vallet, 54, has over 25 years of R&D experience, including several senior executive positions. Previous positions include head of the R&D center of the servers division of Bull. Most recently he was in charge of hardware development of the Bull open systems division, where he used Meta emulators to validate large ASIC designs with gate capacities ranging from 100k to 30 million. Vallet holds an engineering degree from L'Ecole des Mines in Nancy, France.
"I joined Mentor's emulation division because the cutting-edge technology that is being developed here represents an exciting opportunity," noted Vallet. "I've used the Meta emulators as a customer, so I know what they can do. I am really looking
forward to working on the next generation of this technology." The Meta Systems division continues to expand its international team of R&D engineers, several of whom hold Ph.Ds. The R&D group now totals over 50, with more still being recruited. Frederic Reblewski, Meta Systems founder who was instrumental in the development of Meta's industry-leading technology, continues to serve as chief scientist. "Emulation is important to Mentor's strategic vision, and we plan to continue investing to support advances in this technology, because we anticipate more and more customers turning to emulation to solve the verification bottleneck in large complex designs," said Walden C. Rhines, president and CEO of Mentor Graphics. Mentor Graphics Corporation (Nasdaq:MENT) is a world leader in electronic hardware and software design solutions, providing products, consulting services and award-winning support for the world's most successful electronics and semiconductor companies. Established in 1981, the company reported revenues over the last 12 months of more than $600 million and employs approximately 2,975 people worldwide. Corporate headquarters are located at 8005 S.W. Boeckman Road, Wilsonville, Oregon 97070-7777; Silicon Valley headquarters are located at 1001 Ridder Park Drive, San Jose, California 95131-2314. World Wide Web site:
www.mentor.com. Mentor Graphics is a registered trademark of Mentor Graphics Corporation.

COPYRIGHT 2001 Business Wire
Serial Fault Emulation

Luc Burgun, Frédéric Reblewski, Gérard Fenelon, Jean Barbier and Olivier Lepape
META SYSTEMS, 4, Rue René Razel, 91400 Saclay, France
ABSTRACT - A hardware emulator based approach has been developed to perform test evaluation on large sequential circuits (at least tens of thousands of gates). This approach relies both on the flexibility and on the reconfigurability of hardware emulators based on dedicated reprogrammable circuits. A Serial Fault Emulation (SFE) method, in which each faulty circuit is emulated separately, has been applied to gate-level circuits for Single Stuck Faults (SSFs). This approach has been implemented on the Meta Systems hardware emulator, which is capable of emulating circuits of 1,000,000 gates at rates varying from 500 kHz to several MHz. Experimental results are provided to demonstrate the efficiency of SFE. They indicate that SFE should be two orders of magnitude faster than software approaches for designs containing more than 100,000 gates.
META SYSTEMS is now part of MENTOR GRAPHICS CORPORATION
1 INTRODUCTION
Test evaluation consists in determining the effectiveness of a set of test patterns by computing the ratio between the number of faults detected by this set and the total number of possible faults with respect to a given fault model. The traditional approach to test evaluation relies on software programs simulating the effects of the faults on the behavior of the circuit. The simplest method, called serial fault simulation, simulates the faulty circuits one at a time. This method does not require a dedicated fault simulator (any logic simulator can be easily adapted) and can therefore, a priori, handle any type of fault [1]. However, due to the performance of software logic simulators, this type of fault simulation is completely impractical if a large number of faults has to be considered [1, 14]. In the last decades, more sophisticated general-purpose methods have been proposed, such as parallel [14], concurrent [15] or differential fault simulation [5]. These techniques differ from serial fault simulation because they aim at minimizing the number of simulation passes by processing faults simultaneously. The concurrent method is implemented in most commercial tools because of its generality and its efficiency [6]. Today, the fault simulation approach is becoming unrealistic for many designs, not only because the theoretical complexity of simulating one pattern appears to be between linear and quadratic in the number of gates [8], but also because the complexity of circuits increases faster than computing speed [2]. Recently, a new approach based on reprogrammable hardware has been proposed [9, 17, 4] to verify circuits before committing them to silicon. This approach, called logic or hardware emulation [4, 11], decreases the design time by allowing a "real-time" verification 10,000 to 1,000,000 times faster than software logic simulation [11].
In this paper, we propose a methodology to extend the utilization of hardware emulators for test evaluation by using a brute-force method, called serial fault emulation, in which the fault-free circuit and the faulty circuits are considered separately. Unlike hardware accelerators dedicated to logic simulation [16], a significant speed-up can be achieved in comparison with the state-of-the-art software fault simulation methods because of the performance of hardware emulators. The main advantage of SFE is that the run time is quasi-proportional to the number of faults, so that test evaluation can be performed for very large circuits with large test sets. A fast Computer-Aided Prototyping (CAP) software package combining netlist translation, synthesis, multi-chip partitioning and routing automatically produces a hardware prototype of the fault-free circuit. A partial reconfiguration of the hardware emulator is then computed for each fault of interest. The configuration file related to the hardware prototype is downloaded into the hardware emulator, and a first emulation pass allows the verification or the calculation of the expected values of the test set. The faults are then emulated one at a time by partially modifying the fault-free hardware prototype so that it models each faulty circuit.
This paper deals with sequential circuits described at the gate level (gate netlist). As mentioned for serial fault simulation [1], SFE may be used for other types of faults such as multiple faults or bridging faults, but we will concentrate on the SSF model.
This paper is organized as follows. Section 2 briefly describes the CAP software of the Meta Systems hardware emulator. Section 3 presents our approach for fault emulation and shows in particular how to compute each faulty circuit from the fault-free circuit and the fault to be inserted. Section 4 deals with the problem of minimizing the run time for SFE and explains the techniques for limiting the software tasks. Section 5 presents experimental results and Section 6 concludes this paper.
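As an illustration of test evaluation by serial fault simulation as described in the introduction, here is a minimal software sketch. The three-gate netlist, the signal names and the helper functions are hypothetical and chosen only for this example; faults are modeled as stuck values on whole signals rather than on individual gate pins, which is a simplification of the SSF model.

```python
from itertools import product

# Hypothetical netlist: signal -> (operation, input signals), in topological order.
NETLIST = {
    "x": ("and", ("a", "b")),
    "y": ("and", ("c", "d")),
    "z": ("or", ("x", "y")),
}
INPUTS = ("a", "b", "c", "d")
OUTPUTS = ("z",)


def simulate(pattern, fault=None):
    """Evaluate the netlist for one pattern; fault is (signal, stuck_value) or None."""
    values = {}

    def assign(sig, val):
        # A stuck signal ignores its computed value.
        values[sig] = fault[1] if fault and fault[0] == sig else val

    for sig, val in zip(INPUTS, pattern):
        assign(sig, val)
    for sig, (op, ins) in NETLIST.items():
        a, b = (values[i] for i in ins)
        assign(sig, a & b if op == "and" else a | b)
    return tuple(values[o] for o in OUTPUTS)


def fault_coverage(patterns):
    """Ratio of detected faults to all faults, one faulty circuit at a time."""
    faults = [(s, v) for s in (*INPUTS, *NETLIST) for v in (0, 1)]
    expected = [simulate(p) for p in patterns]  # fault-free pass
    detected = 0
    for fault in faults:
        for p, good in zip(patterns, expected):
            if simulate(p, fault) != good:
                detected += 1  # fault dropping: stop at the first mismatch
                break
    return detected / len(faults)


if __name__ == "__main__":
    exhaustive = list(product((0, 1), repeat=len(INPUTS)))
    print(fault_coverage(exhaustive))
    print(fault_coverage([(1, 1, 0, 0)]))
```

The inner loop is where the cost of the serial approach appears: every fault needs its own pass over the test set, which is the work the paper proposes to move onto the emulator.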
2 CAP FOR LOGIC EMULATION
This section briefly describes the CAP software used for implementing circuits on the Meta Systems hardware emulator. As shown on the right part of Figure 1, the first step consists in translating the design netlist into the Meta Systems internal format, namely ANF. This format supports hierarchical descriptions and allows the use of any 4-input function cell. Each cell of the design library is mapped onto the Meta Systems library (called metalib). The resulting library (the conversion library) is used for each design to express the ANF gate netlist in terms of cells of the metalib. The netlist is then flattened, optimized and targeted to the architecture of the reprogrammable circuits used in the hardware emulator, namely the Metas. The architecture of the Meta consists of a column of logic blocks (BLPs) and a global interconnection matrix (crossbar) which connects the I/Os and the BLPs. The crossbar improves the inter-chip communication by removing the constraints related to I/O placement (each BLP may be connected to any I/O without decreasing the percentage of BLP utilization). Each BLP consists of a 4-input SRAM and a reprogrammable sequential device which can emulate either a flip-flop or a latch. Like the Tabula Rasa chip [9], the Meta is targeted specifically for logic emulation. In contrast with the Xilinx LCA architecture [10], each BLP may be observed without adding routing constraints which may cause congested areas and consequently lead to routing failures. This important feature avoids the need to re-compile the design netlist when the user wishes to observe different signals from those initially declared as probes.
After targeting the netlist to our reprogrammable hardware architecture, the netlist is partitioned into two levels of hierarchy, namely Metas and boards. The partitioner implements efficient techniques such as logic replication for reducing the pin count and the partition size. Unlike Quickturn's RPM system [17], the partitioner does not make use of the designer's hierarchy. Finally, the system achieves the routing at three levels of interconnection corresponding to Metas, boards and backplane board. Each logic board consists of 3 processing columns separated by 2 routing columns. Each processing column contains 8 processing elements, each one consisting of a Meta, a 32K byte memory and a Video VRAM in which the values of all BLPs in the Meta for the last 7,200 emulation cycles are stored. The backplane board connects 23 logic boards and an interface board managing the communications between the hardware emulator and the workstation host. Several backplane boards (up to 6) may be linked to emulate very large designs. The routing step produces a configuration file which is downloaded into the hardware emulator before operating the hardware prototype.
The operating environment consists of a user-friendly interface in Motif and a C interpreter which allows the description of emulation experiments. An emulation experiment defines the conditions in which the hardware prototype operates:
- Maximum number of cycles
- Maximum operating speed
- Initial values for sequential devices (registers and memories)
- Stopping conditions
A stopping condition is defined by arming a hardware trigger which tests when the emulation brings one or more registers into a predefined state. Unlike Quickturn's system [7], not only can the triggers be changed without re-compiling the prototype, but every register can also be used in a stopping condition.

33rd Design Automation Conference. Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 96 - 06/96 Las Vegas, NV, USA. 1996 ACM, Inc. 0-89791-833-9/96/0006..$3.50

[Figure 1: flowchart. CAP software boxes: Initial Design (EDIF, VERILOG), Input Gate Netlist, Cell Library, Translation, Modelization, ANF Gate Netlist, Conversion Library, Metalib, Flattening, Optimization & Mapping, Flattened BLP netlist, Partitioning & Routing, FPGA Configuration. Reconfiguration generation boxes (dashed): Fault Specification, Fault Generation, Fault List, Fault Reconfiguration Generation, FPGA Reconfiguration List.]
Fig. 1: Flowchart of the Meta Systems CAP software and the FPGA reconfiguration generation software (denoted by the dashed box)

3 OVERVIEW OF THE FAULT EMULATION SYSTEM
Fault emulation involves calculating a reconfiguration of the hardware emulator for each fault of interest (for the sake of simplicity, we will use the term FPGA¹ reconfiguration). For this purpose, we have developed specific tools which operate in parallel with the CAP software.
3.1 Calculation of FPGA Reconfigurations
The left side of Figure 1 indicates how the FPGA reconfigurations are computed with respect to the CAP software. A fault generator constructs the collapsed fault list from the ANF gate netlist and from a fault specification file. This file specifies the blocks of the hierarchy in which the faults will be inserted and the faults excluded from fault emulation. A second step consists in calculating the FPGA reconfiguration associated with each fault of interest so that the modified hardware prototype behaves like the faulty circuit. To minimize the total run time for fault emulation, each FPGA reconfiguration has to affect as few BLPs as possible. Hence, in contrast with logic emulation, the CAP software has to be restricted so that it does not result in large modifications to the original netlist. This restriction excludes the use of re-synthesis techniques relying on logic-level optimization techniques such as extraction or substitution [3]. The optimization and mapping phase therefore consists only in collapsing the single-fanout gates into nodes which satisfy the 4-input constraint. Under these conditions, if a gate has a multiple fanout, the gate cannot be collapsed, so its output signal is kept in the BLP netlist. This comes down to mapping each fanout-free region separately. An FPGA reconfiguration corresponds to a list of BLPs to be reprogrammed in order to generate the faulty circuit from the fault-free circuit and vice versa. The cases where it is necessary to reconfigure more than one BLP are as follows:
- A primary input pin of the circuit has a multiple fanout
- An input pin of a gate of the design library has a multiple fanout in the equivalent cell of the conversion library
- A BLP is replicated during the partitioning phase
In the first two cases, the stuck fault on the pin results in all the input pins of the gates on the multiple fanout being stuck, so that the SSF has to be emulated by a multiple stuck-at fault. Each BLP of the reconfiguration is associated with a logical address in the emulator and two words encoding the functionality of the BLP for the fault-free circuit and the faulty circuit. The logical address is a 3-tuple denoting the board number in the machine, the Meta number in the board and the BLP number in the Meta.
3.2 Fault Insertion for Combinational Circuits
The FPGA reconfiguration for an SSF on a combinational gate affects only the 4-input function of the BLP. Consider the circuit in Figure 2 and suppose that the gates G0, G1 and G2 are gathered into the BLP (0,0,0) (denoting board 0, Meta 0 and BLP 0) and that the gate G3 corresponds to the BLP (0,0,1). The 4-input function of the BLP (0,0,0) is F = A.B + C.D. If the signal X is stuck at zero, this BLP has to be reconfigured so that it implements the function F = C.D.

¹ The term FPGA is used to refer to all types of field programmable logic, both LCAs such as Xilinx and also those more commonly referred to as PALs and PLDs.
[Figure 2: (a) initial circuit with inputs A, B, C, D, E, gates G0 to G3, internal signals X (stuck at 0) and Z, and output S; (b) BLP circuit: BLP @(0,0,0) implements A.B + C.D and produces Z, BLP @(0,0,1) implements Z.E; (c) BLP reconfiguration for X stuck at 0: @(0,0,0), fault-free A.B + C.D, faulty C.D.]
Fig. 2: An example of FPGA Reconfiguration
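The reconfiguration record described above (a logical address plus two words for the fault-free and faulty functions) can be sketched for the Figure 2 example as follows. The 16-bit truth-table packing, the bit order, and the names `lut_word`, `BlpReconfig` and `f_with_stuck_at_0` are assumptions made for illustration; the paper does not describe the actual word encoding used by the Meta.

```python
from dataclasses import dataclass
from itertools import product


def lut_word(fn):
    """Pack a 4-input Boolean function into a 16-bit truth-table word
    (bit order is an arbitrary choice for this sketch)."""
    word = 0
    for i, (a, b, c, d) in enumerate(product((0, 1), repeat=4)):
        word |= (fn(a, b, c, d) & 1) << i
    return word


@dataclass(frozen=True)
class BlpReconfig:
    address: tuple   # (board, Meta, BLP)
    free_word: int   # function of the BLP in the fault-free circuit
    fault_word: int  # function of the BLP in the faulty circuit


def f_free(a, b, c, d):
    """F = A.B + C.D, the function of BLP (0,0,0) in Figure 2."""
    return (a & b) | (c & d)


def f_with_stuck_at_0(signal):
    """Return F with signal 'A', 'B' or 'X' stuck at 0."""
    def fn(a, b, c, d):
        if signal == "A":
            a = 0
        if signal == "B":
            b = 0
        x = 0 if signal == "X" else a & b
        return x | (c & d)
    return fn


reconfigs = {
    s: BlpReconfig((0, 0, 0), lut_word(f_free), lut_word(f_with_stuck_at_0(s)))
    for s in "ABX"
}

if __name__ == "__main__":
    for signal, rec in reconfigs.items():
        print(signal, rec.address, f"{rec.free_word:016b}", f"{rec.fault_word:016b}")
```

In this sketch, faults that yield the same record need only one emulation run, so comparing records is a cheap way to group equivalent faults before emulation.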

Note that partial fault collapsing may be easily achieved by identifying the faults which produce identical FPGA reconfigurations. In the example, the signals A, B and X stuck-at-0 produce the same FPGA reconfiguration, so only one emulation run will be necessary to test these three faults.
3.3 Fault Insertion for Sequential Circuits
As mentioned in Section 2, each BLP consists of a 4-input SRAM and a sequential device. The latter may be configured as either an edge-triggered flip-flop or a latch, and it may have additional features such as load enable, asynchronous reset and set lines. The sequential devices are synchronized by a complex clock system ensuring that there are no hold time violations due to short paths between registers. The FPGA reconfiguration for an SSF on a sequential gate affects both the combinational section and the sequential section of the BLP. Table 1 shows how a BLP emulating a flip-flop with reset, set and enable lines is reconfigured according to the stuck faults on its pins (assume active high on all signals).
Fault none D Sat0 D Sat1 Q Sat0 Q Sat1 RST Sat0 RST Sat1 SET Sat0 SET Sat1 EN Sat0 EN Sat1 Type Seq Seq Seq Comb Comb Seq Comb Seq Comb Seq Seq
The rst mode is used to debug the fault-free circuit before running fault emulation. In certain cases, this mode may also be used to compute the expected values for the observed outputs. Faulty circuit emulation allows the effects of a SFF to be observed. The high observability of the Meta makes it possible to know where an undetected fault is blocked for a given pattern. This feature is useful for test coverage improvements or for analyzing undetectable faults. Serial fault emulation is the most important mode because it allows the calculation of the fault coverage and the construction of the fault dictionary. This mode runs through all faulty circuit emulations as fast as possible. A faulty circuit is processed following the ve basic steps : 1. 2. 3. 4. 5. Fault insertion by reconguring the emulator Register initialization Trigger setting for the fault dropping Faulty circuit emulation Fault deletion by reconguring the emulator
During step 4, if an output value differs from the expected value, the trigger (set in step 3) is turned on and it activates the fault dropping. If that does not happen, the faulty circuit emulation continues until the test is nished. In order to minimize the overhead for each fault processing, the FPGA reconguration has to be downloaded quickly. The problem of improving the fault emulation speed is addressed in the following section. 4 IMPROVING THE FAULT EMULATION SPEED The fault emulation speed is dened as the maximal number of faults processed per seconds. This speed basically depends on four factors :
D D 0 D RST D
0 1 0 1
RST SET EN CLK + + + + + + + + + + + + + + + + + + + + + + +
The average emulation runtime for processing a fault The time required to recongure the emulator The time required to initialize the sequential devices The time required to perform fault detection
Obviously, as the test set becomes larger, the emulation runtime becomes more signicant. Conversely the three last factors are crucial when the test set is not very large or when most of faults are detected during the rst cycles of the emulation. We will see in Section 4.4 and 4.5 that register initialization and fault detection may be performed by hardware so that the fault emulation speed depends only on the rst two factors. 4.1 Emulation Runtime The emulation runtime depends both on the number of cycles to be executed and on the maximal operating frequency of the prototype, namely the maximum clock speed (MCS ). This frequency is computed by a worst-case static timing analyzer from the fault-free circuit. The static analysis ensures that each faulty circuit runs properly even if an inserted fault causes new dynamic pathes to occur. Assume that P is the average number of patterns necessary to detect the faults. If we neglect the overhead for each fault processing, the serial fault emulation speed (SSFE ) is expressed as follows :
Table 1: FPGA reconfiguration for SSF on a register

For each SSF of a register, this table shows whether the BLP remains a sequential device (Seq) or becomes a combinational gate (Comb), the new function F, and whether the reset, set, enable and clock lines are used (+) or not (-). For example, when reset is stuck at 1, the BLP becomes combinational and implements the function F = 0.

3.4 Fault Emulation

Once both the basic configuration of the prototype and the fault reconfigurations have been generated, fault emulation can be performed. There are three emulation modes:
- Fault-free Circuit Emulation
- Faulty Circuit Emulation
- Serial Fault Emulation
Depending on the design, MCS typically varies from 500 kHz to 5 MHz. Assuming that a given circuit operates at 1 MHz and that the faults are detected on average at P = 10,000, the emulator will be able to process 100 faults per second.
$$S_{SFE} = \frac{MCS}{P}$$
4.2 Fast Reconfiguration

Let T_reconf be the time required to reconfigure the hardware prototype. The SFE speed is now defined as follows:
T_reconf depends mainly on the time needed to reconfigure BLPs. Unlike Xilinx, the Meta architecture provides the ability to read or modify only a portion of the chip at a time. This feature allows the reconfiguration time to be significantly reduced. A BLP can be reconfigured in 0.2 millisecond regardless of its location. On average, T_reconf is equal to 0.8 millisecond (4 BLPs to be reconfigured) and consequently about 1,200 faults can be processed every second if P is close to 0 (all the faults are detected in the first cycles). Assuming that a circuit operates at 1 MHz, the time required to reconfigure the prototype will be greater than the emulation runtime if P < 800. Conversely, the reconfiguration time will be negligible as soon as P > 10,000.
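The two speed expressions of Sections 4.1 and 4.2 can be checked numerically. The sketch below is a plain transcription of S_SFE = MCS / (P + T_reconf · MCS); the function and parameter names are chosen for this example and are not part of the emulator software.

```python
def sfe_speed(mcs_hz: float, avg_patterns: float, t_reconf_s: float = 0.0) -> float:
    """Faults processed per second: MCS / (P + T_reconf * MCS).

    With t_reconf_s = 0 this reduces to the Section 4.1 expression MCS / P.
    """
    cycles_per_fault = avg_patterns + t_reconf_s * mcs_hz
    if cycles_per_fault <= 0:
        raise ValueError("P and T_reconf cannot both be zero")
    return mcs_hz / cycles_per_fault


def break_even_patterns(mcs_hz: float, t_reconf_s: float) -> float:
    """Value of P at which emulation runtime equals reconfiguration time."""
    return t_reconf_s * mcs_hz


if __name__ == "__main__":
    mcs = 1e6          # 1 MHz, as in the text
    t_reconf = 0.8e-3  # 0.8 ms, i.e. 4 BLPs at 0.2 ms each
    print(sfe_speed(mcs, 10_000))             # runtime only
    print(sfe_speed(mcs, 0, t_reconf))        # reconfiguration-bound limit
    print(break_even_patterns(mcs, t_reconf)) # P where both terms are equal
```

With these inputs the reconfiguration-bound limit evaluates to 1/T_reconf = 1,250 faults per second, in line with the figure of about 1,200 quoted above, and the break-even point is P = 800.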
4.3 Theoretical Complexity

It is clear that S_SFE depends on the size of the circuits and that the run time cannot be considered as linear with the number of gates. On the other hand, there exists no explicit relation between the number of gates and the maximum clock frequency. Furthermore, experimental results show that some large circuits can operate at higher speed than small circuits. Hence, SFE can be considered as quasi-linear with the number of gates of the circuit.

4.4 Register Initialization

Test evaluation generally requires the capability of bringing the circuits into a given state without applying any initialization sequence. This may be used when the test sets are so large that they have to be cut into several test sequences. This may also be used in the following case. When hardware reset is not available on the registers, the test set consists of an initialization sequence, which brings the circuit into a given state, followed by an actual test sequence. In this case, register initialization may also be used to separately observe the effects of faults for the initializing sequence and the actual test sequence. In logic emulation, the time required to initialize the registers is not significant in comparison with the emulation runtime. As indicated in Section 2, the registers (and more generally the sequential devices) may be initialized by writing a C program in which specified values (0 or 1) are assigned to them. When the program is interpreted, the BLP corresponding to each register is forced to the specified value. Note that the registers which are not forced by the program remain in an unpredictable state (either 0 or 1). The main drawback of this "software" initialization technique is that the time required to force the BLPs can significantly increase the overhead between each fault processing.
$$S_{SFE} = \frac{MCS}{P + T_{reconf} \cdot MCS}$$
of forcing the value of the sequential device of each BLP with the asynchronous set and reset lines. In Figure 3, the registers R1 and R3 have to be forced to 1 whereas the register R2 has to be forced to 0. An additional register is connected to the set or reset pins of the registers to be initialized. If a pin is already connected to another gate, a 2-input OR gate is added to the design for ORing the initial signal and the forcing line (R3 for example). In this case, the set or reset pin stuck faults are inserted on the input pins of the OR gate. At the beginning of the faulty circuit emulation, the additional register is forced to 1 by the software initialization described above so that it brings the circuit into the given state. Note that if the user wants to bring the circuit into another initial state, the hardware prototype has to be re-compiled by the CAP software. Unlike software initialization, hardware initialization avoids spending time between each fault pass. Furthermore, this technique has a negligible impact on the initial netlist and consequently on MCS.

4.5 Fault Detection

The Meta Systems hardware emulator provides two techniques for injecting stimuli. The first consists in using the 24 hardware memories available on each logic board in a generator mode. In this mode, each memory is directly addressed by a non-reconfigurable hardware counter which has to be reset at the beginning of each emulation. The user can load up to 32K patterns regardless of the number of bits. It is possible to extend the pattern capacity by successively loading 32K pattern pages. However, this last possibility is not well-suited for fault emulation because of the loading time. The second technique consists in replacing one or more logic boards by specialized memory boards. Each memory board can also operate in a generator mode and can be loaded with up to 256K patterns of 384 bits. Memory boards may be combined to provide a very large memory.
To improve the fault detection speed, both the controlled input values and the expected output values are considered as stimuli. Hence, it is possible to perform a hardware comparison between the output values calculated by emulation and the expected values.
Fig. 4: Hardware Fault Detection

As shown in Figure 4, an equality comparator is inserted into the initial design so that only one signal has to be tested to verify whether the output values differ from the expected values. The trigger defined in Section 3.4 is set on this signal. A counter may also be inserted into the design in order to count the number of times a fault is detected. In this mode, the trigger is turned off to prevent the emulation of the faulty circuit from being stopped before the end of the test. The comparator may have an impact on MCS since an observed signal can be located on the critical path (calculated by the static timing analyzer). In this case, the propagation time through the comparator is added to the critical path so that MCS decreases. Each observed signal is propagated within the comparator through a 2-input NXOR and an N-input AND (where N is
Fig. 3: Hardware initialization of Registers

We propose a hardware technique for initializing the registers into a given state. This technique takes advantage of the capability
the number of observed signals). If a balanced technique is used for the mapping of the N-input AND, there are $1 + \log_4(N)$ BLPs between each observed signal and the output of the comparator on which the trigger condition is set. Furthermore, the comparator induces a partitioning and routing constraint since the observed signals have to be connected to the comparator through one or several Metas and/or boards.
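A small software model can make the comparator structure and its depth concrete. It assumes that the leading 1 in 1 + log4(N) is the NXOR level and that the N-input AND is built as a balanced tree of 4-input cells; both are readings of the text rather than documented details of the Meta mapping, and the function names are invented for this sketch.

```python
from typing import Sequence


def comparator_equal(observed: Sequence[int], expected: Sequence[int]) -> bool:
    """Equality comparator: one NXOR per observed signal, then an N-input AND."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same width")
    nxor_bits = [1 - (o ^ e) for o, e in zip(observed, expected)]
    return all(nxor_bits)


def and_tree_levels(n_signals: int, fan_in: int = 4) -> int:
    """Number of levels in a balanced AND tree made of fan_in-input cells."""
    if n_signals < 1:
        raise ValueError("at least one signal is required")
    levels = 0
    remaining = n_signals
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division
        levels += 1
    return levels


def comparator_depth(n_signals: int, fan_in: int = 4) -> int:
    """Cells between an observed signal and the trigger: NXOR level + AND tree."""
    return 1 + and_tree_levels(n_signals, fan_in)


if __name__ == "__main__":
    print(comparator_equal([1, 0, 1], [1, 0, 1]))
    print(comparator_equal([1, 0, 1], [1, 1, 1]))
    for n in (4, 16, 64, 320):
        print(n, comparator_depth(n))
```

For N equal to a power of 4 the depth matches 1 + log4(N) exactly; for other widths the ceiling division rounds the tree depth up to the next level. The trigger corresponds to the case where `comparator_equal` returns False.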
            CAP                          SFE
Circuit     #G      T_CAP [sec.]   #F       #R      T_SFE [sec.]
mul4x4      179     1.9            1000     424     0.3
mul8x8      699     4.6            3876     1628    1.3
mul16x16    2561    14.2           14620    6100    5.1
mul32x32    10203   137.3          57320    23300   19.6
mul64x64    39899   1950.2         218860   88962   77.4

Table 2: Fault emulation of multipliers
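The figures in Table 2 can be cross-checked against two statements made about this experiment: a run rate of around 1,200 runs per second and a runtime that grows linearly with the number of gates. The sketch below only recomputes ratios from the table; the dictionary layout and function names are choices made for this example.

```python
from typing import Dict, Tuple

# (#G, #R, T_SFE in seconds) for each multiplier, copied from Table 2.
TABLE_2: Dict[str, Tuple[int, int, float]] = {
    "mul4x4": (179, 424, 0.3),
    "mul8x8": (699, 1628, 1.3),
    "mul16x16": (2561, 6100, 5.1),
    "mul32x32": (10203, 23300, 19.6),
    "mul64x64": (39899, 88962, 77.4),
}


def runs_per_second(table: Dict[str, Tuple[int, int, float]]) -> Dict[str, float]:
    """Faulty-circuit runs completed per second of fault emulation."""
    return {name: runs / t_sfe for name, (_, runs, t_sfe) in table.items()}


def ms_per_gate(table: Dict[str, Tuple[int, int, float]]) -> Dict[str, float]:
    """Fault emulation runtime per gate, in milliseconds."""
    return {name: 1000.0 * t_sfe / gates for name, (gates, _, t_sfe) in table.items()}


if __name__ == "__main__":
    for name, rate in runs_per_second(TABLE_2).items():
        print(f"{name:10s} {rate:8.1f} runs/s  {ms_per_gate(TABLE_2)[name]:.2f} ms/gate")
```

The run rate stays between roughly 1,150 and 1,400 runs per second, and the runtime per gate stays close to 2 ms over more than two orders of magnitude in circuit size.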

from the standard set of the ISCAS89 benchmarks. We have considered the SSF model for all the gate outputs without collapsing the faults. A full scan path has been inserted into the original design so that the random test can bring the circuit into many distinct states to obtain a good fault coverage (obviously, our method does not intrinsically require full-scan circuits). Furthermore, all the flip-flops are set to 0 at the beginning of each faulty circuit emulation by using the technique explained in Section 4.4.
Fig. 5: Impact of the hardware comparison on MCS (y-axis: MCS in MHz; x-axis: multipliers mul4 to mul64)

Figure 5 shows the impact of the hardware comparison on MCS. We have selected several Booth multipliers varying from 4 to 64 inputs. Each multiplier is generated with the CAP software and MCS is calculated with the hardware comparator (denoted by the dashed line) and without the hardware comparator (denoted by the solid line). It can be seen that the propagation time through the comparator is added to the critical path of each multiplier since those circuits are combinational and all the outputs are observed (and compared). The two curves show that as the circuit size increases, the effect of the comparator decreases. Since our approach is targeted at large circuits, the effects of the hardware comparison can be disregarded. Note that for smaller circuits, the impact can nevertheless be minimized by using a pipeline architecture in which registers are inserted after the observed signals.
          Description                     CAP
Circuit   #G      #FF    #I   #O    T_CAP [sec.]   MCS [MHz]
s9234     5725    228    19   22    48.5           1.3
s13207    8620    669    31   121   91.1           1.2
s15850    10369   597    14   87    88.7           1.3
s35932    17793   1728   35   320   279.2          2.1
s38584    19705   1452   12   278   385.3          1.1
s38417    23715   1636   28   106   356.2          1.5

Table 3: CAP for ISCAS89 benchmarks

Table 3 gives the description of these circuits in terms of the number of gates (#G), the number of flip-flops (#FF), the number of inputs (#I) and the number of outputs (#O). The CAP results are reported in terms of the time required to compile the fault-free hardware prototype and to compute the FPGA reconfigurations (T_CAP) on a Sun workstation (Sparc 10 - 64 MBytes RAM) and the maximum clock speed (MCS). Table 4 reports the results obtained by SFE in terms of the number of faults (#F), the fault coverage (C) obtained with the random test set, the average number of patterns needed to detect a fault (T), the runtime for fault emulation (T_SFE) and the fault emulation speed (S_SFE).
5 EXPERIMENTAL RESULTS

In order to measure the efficiency of SFE, we have conducted two sets of experiments.

5.1 Evolution of performances for a fixed architecture

First, we have conducted experiments to demonstrate the range of performance according to the size of a fixed architecture, namely for several Booth multipliers made up of 2-input gates. We have generated 1K random test patterns (this size is sufficient to obtain a 95% coverage). Each I/O pin of the gates of the circuits is stuck at 0 and 1, but fault collapsing is performed to minimize the number of faulty circuit runs. Table 2 indicates the number of gates (#G), the time required to compile the fault-free hardware prototype and to compute the FPGA reconfigurations (T_CAP) on a Sun workstation (Sparc 10 - 64 MBytes RAM), the number of faults (#F), the number of runs needed to test all the faults (#R) and the runtime for fault emulation (T_SFE). Most of the faults (90%) are detected in the first ten patterns so that fault emulation runs at the maximal reconfiguration speed (around 1,200 runs per second). It is obvious that in these conditions T_SFE is linear with the number of gates #G. The time required for CAP is considerably greater than the runtime for fault emulation (this is especially true when the circuits are large). However, it is not necessary to re-compile the circuits for other test sets or other fault sets.

5.2 SFE on ISCAS89 Benchmarks

In the second experiment, we have performed test evaluation with 50K random test patterns on the largest sequential circuits
                              SFE
Circuit   #F      C [%]   T       T_SFE [sec.]   S_SFE [f./sec.]
s9234     13020   81.1    15432   148.1          87.9
s13207    21256   82.7    11501   196.4          108.2
s15850    24322   82.6    12611   231.3          105.1
s35932    45956   91.6    9781    169.7          270.8
s38584    50124   92.7    4946    225.9          221.8
s38417    57448   95.1    4531    170.5          336.9

Table 4: Results for 50K Pattern Fault Emulation
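Since the fault emulation speed is defined as the number of faults processed per second, the last column of Table 4 should equal #F divided by T_SFE. The short check below recomputes it from the tabulated values; the data layout and function names are choices made for this example.

```python
from typing import Dict, Tuple

# (#F, T_SFE in seconds, reported S_SFE in faults/sec), copied from Table 4.
TABLE_4: Dict[str, Tuple[int, float, float]] = {
    "s9234": (13020, 148.1, 87.9),
    "s13207": (21256, 196.4, 108.2),
    "s15850": (24322, 231.3, 105.1),
    "s35932": (45956, 169.7, 270.8),
    "s38584": (50124, 225.9, 221.8),
    "s38417": (57448, 170.5, 336.9),
}


def recomputed_speed(faults: int, t_sfe: float) -> float:
    """Faults processed per second of fault emulation."""
    return faults / t_sfe


def max_deviation(table: Dict[str, Tuple[int, float, float]]) -> float:
    """Largest gap between the recomputed and the reported speed."""
    return max(
        abs(recomputed_speed(faults, t_sfe) - reported)
        for faults, t_sfe, reported in table.values()
    )


if __name__ == "__main__":
    for name, (faults, t_sfe, reported) in TABLE_4.items():
        print(f"{name:8s} {recomputed_speed(faults, t_sfe):7.1f} vs {reported:7.1f}")
    print("max deviation:", round(max_deviation(TABLE_4), 3))
```

The recomputed values agree with the reported column to within about 0.1 fault per second, which is consistent with rounding of T_SFE to one decimal place.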
MCS does not decrease with the complexity of the circuits so that SFE is linear in time with the number of gates. S_SFE increases as the average number of patterns needed to detect a fault decreases. We have compared our results with those obtained by HOPE (version 1.1) [12], [13], a state-of-the-art fault simulator combining efficient techniques such as single fault propagation and parallel fault processing. HOPE has been tested under similar conditions using the same fault model and the same test set. Furthermore, the performance of HOPE is measured on the same machine (Sparc 10). To compare the evolution of performance with the number of gates, we have to normalize the run time according to the average number of patterns required to detect a fault. We have therefore normalized the results with respect to the first circuit (s9234).
[2] P. S. Bottorff, Test Generation and Fault Simulation, VLSI Testing, North Holland Ed., 1985, pp. 29-64
[3] R.K. Brayton, G.D. Hachtel and A.L. Sangiovanni-Vincentelli, Multilevel Logic Synthesis, Proc. of the IEEE, Vol. 78, No 2, Feb. 1990, pp. 264-300
[4] M. Butts, J. Bacheler and J. Varghese, An Efficient Logic Emulation System, Proc. ICCD, 1992, pp. 138-141
[5] W. T. Cheng and M.L. Yu, Differential Fault Simulation - A Fast Method Using Minimal Memory, Proc. 26th DAC, 1989, pp. 424-428
[6] S. Gai and P. L. Montessoro, Creator: New Advanced Concepts in Concurrent Simulation, IEEE Trans. on CAD, Vol 13, No 6, June 1994, pp. 786-795
[7] J. Gateley et al., UltraSPARC-I Emulation, Proc. 32nd DAC, 1995, pp. 535-540
[8] D. Harel and B. Krishnamurthy, Is There Hope for Linear Time Fault Simulation?, Fault Tolerant Computing Symposium, July 1987, pp. 28-33
[9] D.D. Hill and D.R. Cassiday, Preliminary Description of Tabula Rasa, an Electrically Reconfigurable Hardware Engine, Proc. ICCD, Sept. 1990, pp. 391-395
[10] H-C. Hsieh et al., A Second Generation User-Programmable Gate Array, Proc. Custom Integrated Circuit Conference, 1987, pp. 515-521
[11] U. R. Khan, H.L. Owen and J. L. A. Hughes, FPGA Architectures for ASIC Hardware Emulator, Proc. 6th IEEE ASIC Conference, 1993, pp. 336-340
[12] H.K. Lee and D.S. Ha, HOPE: An Efficient Parallel Fault Simulator for Synchronous Sequential Circuits, Proc. 29th DAC, 1992, pp. 336-340
[13] H.K. Lee and D.S. Ha, New Techniques for Improving Parallel Fault Simulation in Synchronous Sequential Circuits, Proc. ICCAD, 1993, pp. 10-17
[14] E. W. Thomson and S. A. Szygenda, Parallel Fault Simulation, Computer, Vol. 8, No 3, March 1975, pp. 177-188
[15] E. G. Ulrich and T. Baker, Concurrent Simulation of nearly Identical Digital Networks, Computer, Vol. 7, April 1974, pp. 204-209
[16] N. Van Brunt, The Zycad Logic Evaluator and its Application to Modern System Design, Proc. ICCD, 1983, pp. 232-233
[17] S. Walters, Computer-aided Prototyping for ASIC-based Systems, IEEE Design and Test, June 1991, pp. 4-10
Fig. 6: Normalized Performance Comparison (x-axis: number of gates, 5k to 25k; y-axis: normalized processing time in sec.; curves: HOPE and SFE)

Figure 6 shows the processing time for SFE and HOPE. The speedup of SFE over HOPE varies from 8 to 20. It is clear that SFE is especially advantageous for large circuits. For 100K-gate circuits, we can hope to reach a speedup of two orders of magnitude with respect to commercial tools. In contrast with HOPE, these tools are less efficient because they have to implement mechanisms to take into account user libraries, memories or complex synchronization schemes.

6 CONCLUSIONS

An approach to evaluate test sets for large sequential circuits has been presented. This approach relies on the use of a hardware emulator to observe the effects of the faults on the circuits. Serial fault emulation takes advantage not only of the reconfigurability of SRAM-based reprogrammable circuits but also of the reconfiguration speed of the Meta Systems hardware emulators. The main advantage of serial fault emulation is that, in contrast with software fault simulation, the computing time is quasi-linear with the number of gates. So for large designs our approach can drastically reduce the time taken in the analysis of fault coverage, aliasing probability and detectability. After logic verification and fast prototyping, test evaluation is a new application of hardware emulators that will encourage design teams to adopt a hardware-emulator-based methodology.

7 ACKNOWLEDGMENTS

The authors would like to thank Dong S. Ha of the University of Virginia for providing them with HOPE and B. Bailey of Mentor Graphics for his help in the preparation of this paper. They also thank E. Legai, J-S. Weil, F. Touzard and G. Morisset for their assistance with the Meta Systems CAP software.

References
[1] M. Abramovici, M. A. Breuer and A. D. Friedman, Digital Systems Testing and Testable Design, New York, W.H. Freeman and Company, 1990, p. 134
M2000 starts offering IP cores and tools that marry ASIC and FPGA design
EE Times : Latest News
After eight years, M2000 now open to public

Anthony Cataldo
(09/27/2004 9:00 AM EDT) URL: http://www.eetimes.com/showArticle.jhtml?articleID=47902997

San Jose, Calif. - More than a few chip companies have tried their hand at embedding blocks of FPGA logic into otherwise-hardwired ASIC devices. Startup M2000 says it wants to be the first to make a business out of it. In the coming months, the reclusive company will open its doors to customers outside the close network of partners with which it has been working for more than three years. By November, M2000 (Bievres, France) will start offering intellectual-property (IP) cores and tools that will marry ASIC and FPGA design, said chief executive officer Frederic Reblewski.

Though the company is small and relatively unknown, the 15-person team at M2000 is hardly new to FPGA technology. The company was started in 1996 by the founders of FPGA-based emulation vendor Meta Systems, which was bought by Mentor Graphics Corp. the same year. The founders hold many patents related to configurable logic and its use for electronic system testing.

It's been a long gestation period for M2000, but Reblewski said the company refrained from rushing to market in order to avoid the mistakes made by others. Embedding FPGA blocks into ASICs has been seen as a way to make ASICs more flexible and to reduce development costs, but most of the handful of chip makers and startups that have tried it have retreated because of cost and software issues. For that reason, M2000 said it has put most of its effort into ensuring that it has the right software tools and a high-density FPGA fabric.

The company's proposed design flow includes the use of commercial synthesis tools along with its own mapping, placement and routing, and configuration tools. After place and route, the tools automatically output an embedded FPGA macro that includes all the data for design-for-test, floor planning and physical verification. SDF files for static timing analysis and Verilog models for simulation are also automatically generated.
As for the FPGA hardware, the company says it is now finishing an eighth-generation device structure, targeting 90-nanometer designs. The cell structure is based on a basic four-input lookup table and SRAM technology. Where M2000 parts ways with other FPGA vendors is in the architecture. Rather than use uniform routing resources, the company has developed a compiler that tailors the routing to the design, giving it three times the logic density of standard FPGAs. "What others propose in 90 nm is what we propose in 0.15 micron in terms of density," Reblewski said. The company says it can port its FPGA architecture to any foundry within two months.

Another claimed benefit to this approach is that timing becomes more predictable. Reblewski said the company can achieve logic speed of 700 MHz using 0.13-micron design rules. "As soon as we know where to put each element, the timing is known," he said.

Assuming M2000 can deliver the technology as promised, the next question is whether it will fly as a business. Among those that have tried and failed are LSI Logic, Adaptive Silicon and Actel. Leopard Logic Inc., which started out offering FPGA IP cores, is now fielding semicustom chips that combine fixed logic functions with on-chip FPGA. The one heavyweight to watch in this area is IBM Corp., which has licensed FPGA technology from Xilinx Inc. that it can use in 90-nm ASICs.

M2000 has been working with six partners, in Europe, the United States and Japan, and one of them is said to be shipping chips with the embedded FPGA core for wireless-infrastructure systems. (The only partner that it has disclosed is STMicroelectronics, which worked with M2000 several years ago to design an image sensor that used embedded FPGA gates for reconfigurable logic.) Moreover, M2000 has been raising investment capital since early 2004. "There are some very big names looking at our technology," Reblewski said. "We are very confident."
A Reconfigurable System featuring Dynamically Extensible Embedded Microprocessor, FPGA and Customisable I/O
Michele Borgatti, Francesco Lertora, Benoit Fort and Lorenzo Cal
STMicroelectronics Innovative Systems Design, NVM-DP, Central R&D Agrate Brianza (MI), ITALY

Abstract

A system-chip targeting image and voice processing and recognition application domains is implemented as a representative of the potential of using programmable logic in system design. It features an embedded reconfigurable processor built by joining a configurable and extensible processor core and a SRAM-based embedded FPGA. Application-specific bus-mapped coprocessors and flexible I/O peripherals and interfaces can also be added and dynamically modified by reconfiguring the embedded FPGA. The architecture of the system is discussed as well as the design flows for pre- and post-silicon design and customisation. The silicon area required by the system is 20 mm² in a 0.18 µm CMOS technology. The embedded FPGA accounts for about 40% of the system area.

Introduction

These days we are witnessing two conflicting trends in the electronic industry. On one side, the economics of system integration pushes logic suppliers towards ever more complex system-chip devices. On the other side, the increasing complexity of design and the associated risks, the increase of non-recurrent engineering expenses, and shorter time-to-market and product life are causing OEMs to look for faster turnaround and lower risk design solutions and technology. The recent introduction of embedded programmable logic allows ASIC and ASSP vendors to broaden the appeal of their products. Also, hardware programmability can be exploited by system integrators for product customisation. In this paper we present a pragmatic approach to introduce flexibility in system-chip design and exploit embedded programmable silicon fabrics to enhance system performance. In particular, enabling application-specific configurations to adapt the underlying hardware architecture to time-varying application demands can improve execution speed and reduce power consumption compared to a general-purpose programmable solution.
In the proposed system the embedded programmable logic allows static or dynamic configuration of the instruction set of an embedded microprocessor, the creation of bus-mapped application-specific hardware coprocessors and accelerators, and the customisation of the system I/O. The latter feature allows the device to potentially connect to any external unit/sensor given that its communication protocol can be mapped to the on-chip programmable logic. Also, some computations can be performed on-the-fly when data is captured. The proposed system has been built using a set of state-of-the-art IP cores and system design methodology. In particular, a configurable and extensible processor (1) with associated tools, and an embedded FPGA (2) were used. The resulting system has been developed to target image and voice processing and recognition application domains. Design flows for system exploration and implementation are also introduced.

System Architecture

One of the main goals of this work was to build a flexible architecture, working at a reasonably high clock frequency, built around an embedded FPGA and an extensible 32-bit microprocessor. The base processor is a specific customisation of that described in (1). It comes with a complete set of tools for configuration and performance analysis. The main features of the processor core used in our system are: 5-stage pipeline, 8+8kB direct-mapped data/instruction caches, a 24 or 16 bit instruction format for improved code density, a 64 bit processor interface (PIF) with burst transfers for cache-page refill, and 13 interrupt lines organized in 4 priority interrupt levels. The system architecture is illustrated in Fig. 1. The PIF/AHB Bridge translates processor cycles to the AMBA AHB bus (3) with support for fast burst and locked transfers. An external memory interface (EMI) exploits the available peak throughput of the fastest commercial external non-volatile flash memories.
It allows a wide range of burst mode and page mode configurations under software control and supports low-voltage, low-swing operations. If required, an external RAM port allows the extension of the on-chip 48kB SRAM. The heart of the system is an embedded FPGA and its multiple interfaces to the main system units. In particular, the functional purposes of the e-FPGA programmable logic are:

- extension of the processor datapath, supporting a set of additional special-purpose instructions (TIE). This is done by connecting the processor datapath through a wide bus and a specific interface (TIE bus/interface in Fig. 1);
- bus-mapped coprocessor. Hardware units mapped into the e-FPGA can be interfaced to the system bus through an AHB bus master/slave;
- flexible I/O. The programmable general-purpose I/O pads interface is used to connect external units or sensors with their application-specific communication protocol.

All these possibilities may be mixed in a single configuration of the FPGA, which results in a highly configurable device. To accelerate communications between the configurable hardware and software tasks running on the processor, 4 interrupt channels can be driven by logic mapped into the e-FPGA. A two-way HW/SW communication can be implemented by the joint usage of these interrupt channels and dedicated AMBA APB registers.
runtime re-configuration of the instruction set. This implies that the number of user-defined instructions available at a given time is limited by the e-FPGA logic capacity and instruction logic complexity. However, a set of additional instructions can be defined to target specific application needs. If the logic size of the set of additional instructions exceeds the logic capacity of the e-FPGA, it might be split into a number of contexts fitting the size constraints of the eFPGA. These contexts might be used to dynamically reprogram the FPGA to support application needs. The flexibility advantage of this architecture implies a speed penalty for the part of logic mapped inside the e-FPGA. In particular, specific processor instructions mapped in the reconfigurable fabric may be 1x to 10x slower than their equivalent implementation in standard cells. Fig.2 details the processor-FPGA interface: a focus is given on how Instruction Extensions are mapped inside the FPGA and how synchronisation between the microprocessor and the e-FPGA is guaranteed.
Fig. 2 labels (type decoder and clock control): Type 0, default delay N=1; Types 1 to 3, delays N1, N2, N3; opcode groups TIE 0~4, TIE 5~9, TIE 10~21, TIE 22~31.
Fig. 1: System Architecture Block diagram

Download of the FPGA bitstream is performed by a flexible programming interface. To allow validation of the FPGA configuration, the bitstream may be read back by hardware support. Most audio or video applications require storage buffers to interface fast decoding hardware and slower software running on the processor. With this concept in mind, a 1kByte dual port buffer has been added and organised as 4x256 bytes rows. One port of this buffer is connected to the AHB bus while the second port is directly accessed by the FPGA dual port buffer interface. The AMBA APB Bus connects all the configuration/general purpose registers to the system. On the same bus, an I2C master interface has been added to connect external devices or sensors like LCD display, CMOS camera, etc. A programmable general-purpose I/O module features mono input/output and bi-directional pads under the control of both the e-FPGA and the microprocessor.

A. The Microprocessor-FPGA interface

The configurable processor allows adding user-defined instructions. In the proposed architecture, this capability was mapped exclusively into the e-FPGA, allowing
Fig. 2: Embedded FPGA Microprocessor Interface

As the additional instruction set is part of the processor pipeline (1), slowing down this logic results in a drastic reduction of the processor maximum speed, hence affecting processor performance when using the baseline general-purpose instruction set. A mechanism is introduced to allow the processor to be clocked at its maximum speed while executing standard instructions, whereas it is slowed down by a programmable, instruction-dependent number of cycles (1-16) when executing processor instructions mapped into the FPGA. A clock control system allows the processor to be synchronised with the e-FPGA for the number of cycles the instruction is executed. A dedicated module is able to identify instructions whose performance is not aligned with the processor. As each of these instructions needs to be associated with its execution time, the set was partitioned. A pre-defined map-table divides into 4 the whole set of opcodes reserved for user-defined instructions. For each set that belongs to a configuration, a number, mapped as a constant output of the FPGA, defines the number of times the clock needs to be stretched to properly synchronise the execution of the pipeline between the FPGA and the base processor. Thus, the system allows executing a set of TIEs among a panel of 4 user-defined speed penalties for any FPGA configuration. In this way, the processor CPU is tied to the FPGA speed for the strictly required number of cycles. The set of user instructions can be defined after tape-out thanks to the FPGA. Moreover, the system allows the execution time to be parameterised, to exploit the performance of both hard-wired and programmable logic.

B. Block Description of the e-FPGA

The architecture of the e-FPGA (2) is organised as a hierarchical multi-level interconnect network (see Fig.3)
level. The microprocessor core is abstracted in the coverification with its Instruction Set Simulator integrated into the simulation engine. Extensive simulations of the system with the usage of the profiler (memory accesses, cpu load, exceptions) help in finding the computational kernels of the software running on the core (performance analysis).
[Fig. 4 labels (system-to-RTL flow): functional model (untimed simulation); partitioning / interface synthesis / refinement; cycle accurate simulation; performance analysis; CoWare libraries (HW/SW platform); uP ISS; TIE Verilog code; VHDL (eFPGA); outputs: soft hardware for eFPGA mapping, HW (RTL) for uP, AHB/APB bus and peripherals, C code for applications; eFPGA hard macro; SoC integration.]
[Fig. 3 labels (e-FPGA block diagram): Cluster 1 to Cluster 24, each an array of MFCs; Multi Function logic Cell with LUT, FF / latch and local network; global network; 384 inputs (IPads); 384 outputs (OPads); configuration & test interface; 32-bit control bus.]
Fig. 4: System to RTL

At this point it is possible to group the most time-consuming code segments as new instructions of the extensible processor. Those extensions of the Instruction Set can be easily mapped on the e-FPGA, as can the VHDL code that results from the refinement process done during the partitioning phase. The system integration flow ends by producing:
Soft Hardware to be mapped on the eFPGA: HDL RTL code of instruction extensions, bus-mapped coprocessors and special purpose I/O peripherals.
Conventional fixed hardware: Microprocessor RTL code, AHB/APB bus and Peripherals.
Embedded Software (C code): Application software and low-level drivers for the hardware platform.
The C code generated by the flow described above becomes the final application while the RTL of the system with the eFPGA hard macro goes into the system integration flow.

B. The RTL-to-Layout design flow

Fig.5 shows both the silicon implementation flow and the e-FPGA configuration flow. These flows are run at different times. Once the silicon implementation flow has produced the routed database, it is possible to run the e-FPGA flow, which can be repeated for each different function built as a soft macro. The RTL code of the CPU core, IP blocks and Interface modules (system bus) is synthesized and integrated with RAM blocks and the FPGA hard macro in the floorplanning environment. To meet timing requirements at the boundary of the e-FPGA, special care was taken during synthesis for the logic cells that interface the e-FPGA with the rest of the system. A particular set of constraints was specified to reach minimum delay of the hardwired logic. After the place and
Fig. 3: Block diagram of the e-FPGA

An array of logic elements called Multi Function logic Cells (MFC) allows implementation of digital logic. The MFC is a 4 input / 1 output programmable structure associating a 4 input Look-Up Table and a storage element (dff, latch). There are 3k MFCs shared among 24 clusters. The Global Interconnect Network links the clusters together and to the IPad & OPad peripheral cells. At a lower level, a Local Interconnect Network links MFCs together and to the global network. The architecture allows defining up to 1 clock signal per cluster. The MFC clock is one of 3 global signals defined to be connected to any input of the cluster. This ensures a low skew between cluster clocks and full IO assignment flexibility. The input (respectively output) pin set counts 384 independent and fully equivalent inputs (respectively outputs).

Design Flow and System Integration

A. The System-to-RTL design flow

In Fig.4 the design flow used for system architecture exploration and integration is described. The starting point is an untimed model of the system written in C/C++ code describing the desired functionality; at this stage the verification is done with simulations in the CoWare N2C environment (4). This methodology allows designers to validate the system specifications and consequently, with a progressive refinement of the functional blocks into hardware and software (partitioning process) and the generation of the HW/SW interface (interface synthesis), the verification of the system at a cycle accurate abstraction
route stage, the final database is statically and dynamically verified against the RTL simulations in order to make verification at all levels of abstraction.
[Fig. 5 labels (inputs): verification; CPU core, IP; interface RTL code; RAM; eFPGA core]
recognition computing kernels. Additional 1.5x to 2x performance improvements are reported on specific I/O intensive tasks to interface an external CMOS camera and doing some image processing computations on-the-fly using the e-FPGA.
Acknowledgements: The authors thank Sara Bocchio, G. Repetto, C. Gazzina and L. Fumagalli for their valuable help and support. They also thank O. Lepape, J. Barbier and F. Reblewski at M2000, J. Massingham and B. Campbell at Tensilica, and K. Ahluwalia, D. Tilley, M. Woodward and P. Bingham at CoWare. Special thanks to Dr. A. Kramer for his support and encouragement.

TABLE I
DEVICE PERFORMANCES AND POWER CONSUMPTION
Processor maximum speed:          125MHz (WCMIL), 175MHz (TYP)
Reconfiguration speed:            ~500us @ 100MHz clock
Chip average power consumption:   ~300mW @ 100MHz, 1.8V
[Fig. 5 labels (flow): synthesis of TIE, coprocessor and I/O interface logic; floorplanning (full chip) / P&R; static timing analysis, verification with FPGA black box, dynamic verification; constraints file; synthesis (fpga lib) and mapping (fpga p&r); netlist + timing database; FPGA timing database; FPGA bitstream; silicon fab; final verification with FPGA timing model, static timing analysis.]
Fig. 5: RTL to Layout

The timed database used for the verification, built after parasitic extraction and delay calculation, gives the effective delays at the boundary of the e-FPGA hard macro (all e-FPGA I/O pins are characterized with the static timing analyzer in the worst case condition). This information is exported to the eFPGA flow as a constraint file and used during synthesis/mapping of the soft hardware by specific eFPGA tools. This is done to correctly constrain the logic mapped on the e-FPGA with the real timing budget. Finally, the bitstream and a timed view of the macro are generated and can be used for the final sign-off. Static timing analysis of the e-FPGA results in both a back-annotated netlist and a timing view for full chip static timing analysis.

System Implementation and Test

The full chip has been implemented in a standard CMOS 1.8V/3.3V, 0.18um technology featuring 6 metal layers. The layout of the system has been integrated using commercial place and route tools for digital ASIC. To avoid multiple external power supplies, an internal DC (3V to 1.8V) voltage regulator has been integrated. The chip is being tested and is fully functional at the clock rate of 175MHz. The processor system is able to reconfigure the e-FPGA at full speed. Reconfiguration takes about 500us at a clock rate of 100MHz. During reconfiguration the average throughput sustained by external memories, EMI and programming interface is 50MB/sec. Device performances and power consumption are summarized in Table I. Technology and device characteristics are summarized in Table II and a chip micrograph is shown in Fig.6 with a floorplan view of system components. The system is being tested using both a face recognition application and a speech recognition application. During architecture development we reported speedups of 4x to 8x using instruction extensions to accelerate face-
TABLE II
TECHNOLOGY AND DEVICE CHARACTERISTICS
Technology:         0.18um CMOS 6-ML
SRAM Memory:        Main: 48kB (64-bit wide)
                    I$: 8kB (64-bit wide)
                    D$: 8kB (64-bit wide)
                    Buffers: 4x256B (8-bit wide)
Chip size:          5.5x5.5 mm2 (pad limited)
Core size:          20 mm2
e-FPGA size:        8.2 mm2 (15k useable equivalent ASIC gates)
Customisable I/O:   24 general-purpose inputs
                    24 general-purpose outputs (tristate)
                    8 general-purpose bidirs
Power supply:       2.7-3.6V (external), 1.8V (core, internally generated / regulated)
References
(1) R.E. Gonzalez, "Xtensa: A Configurable and Extensible Processor", IEEE Micro, March-April 2000, pp. 60-70.
(2) M2000, Flexeos family technical manual, www.m2000.fr
(3) ARM Ltd., AMBA Specification Rev 2.0
(4) I. Bolsens, H. De Man, B. Lin, C. Van Rompaey, S. Vercauteren and D. Verkest, "Hardware/Software Co-Design of Digital Telecommunication Systems", Proceedings of the IEEE, Vol. 85, No. 3, March 1997, pp. 391-418.
Fig.6 Chip Micrograph
[Floorplan labels: DC regulator; cache tags; 8 + 8 KB instruction and data cache; 32b extensible core + 64b AHB bus + 64b APB bus + AHB & APB peripherals in standard cells (250k gates); embedded FPGA; 48 KB SRAM; 1KB dual port buffer.]
40.3
A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
M. Borgatti, L. Cal, G. De Sandre, B. Fort, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles and P.L. Rolandi
STMicroelectronics, Central R&D, Agrate Brianza, Italy
michele.borgatti@st.com

1. INTRODUCTION
Increasing complexity of system design and shorter time-to-market requirements are leading research towards the investigation of hybrid systems including processors enhanced by programmable logic [1][2]. Embedded programmable logic allows ASIC and ASSP vendors to broaden the appeal of their products. NRE reduction and shorter time-to-market are key to OEMs looking for faster turnaround and lower risk design solutions and technology. System integrators can also exploit hardware programmability for in-house product customization. In this paper we present a pragmatic approach to introduce flexibility in system-chip design and exploit embedded programmable silicon fabrics to enhance system performance. In particular, enabling application-specific configurations to adapt the underlying hardware architecture to time-varying application demands can improve execution speed and reduce power consumption compared to a general-purpose programmable solution. This paper describes a dynamically reconfigurable processing unit tightly connected to a Flash EEPROM memory subsystem. The reconfigurable processing unit targets image-voice processing and recognition application domains and is implemented by joining a configurable and extensible processor core and an SRAM-based embedded FPGA. Application-specific HW units are added and dynamically modified by embedded FPGA reconfiguration. By implementing application-specific vector processing instructions, the unit shows a peak computing power of 1GOPS. Efficient read-write-erase access to code, data and FPGA bitstreams is provided by a specific memory subsystem based on a modular 8Mb, 4-bank Flash memory. It features 3 content-specific I/O ports and delivers an aggregate peak read throughput of 1.2GB/s. The proposed system has been built using a set of state-of-the-art IP cores and system design methodology. Design flows for system exploration and implementation are also described.
ABSTRACT
A 1GOPS dynamically reconfigurable processing unit with embedded Flash memory and SRAM-based FPGA targets image-voice processing and recognition applications. Code, data and FPGA bitstreams are stored in the embedded Flash memory and are independently accessible through 3 content-specific, 64-bit I/O ports with a peak read rate of 1.2GB/s. The system is implemented in a 0.18um, 2PL-6ML CMOS Flash technology; chip area is 70mm2.
Categories and Subject Descriptors

B.7.1 [INTEGRATED CIRCUITS]: Types and Design Styles - Advanced technologies, Algorithms implemented in hardware, Gate arrays, Input/output circuits, Memory technologies, Microprocessors and microcomputers, VLSI.
C.1.3 [PROCESSOR ARCHITECTURES]: Other Architecture Styles - Adaptable architectures, Heterogeneous (hybrid) systems.
B.3.1 [MEMORY STRUCTURES]: Semiconductor Memories.
C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYSTEMS]: Signal processing systems.
General Terms
Design, Performance, Algorithms.
Keywords
Application-specific integrated circuits (ASICs), digital signal processors, field-programmable gate arrays (FPGAs), integrated circuit design, multimedia computing, reconfigurable architectures.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 2003, June 2-6, 2003, Anaheim, California, USA. Copyright 2003 ACM 1-58113-688-9/03/0006$5.00.
2. SYSTEM ARCHITECTURE
The system architecture is illustrated in Figure 1. The functional purposes of the embedded FPGA are: i) extension of the processor datapath supporting a set of additional special-purpose C-callable microprocessor instructions; ii) bus-mapped coprocessors (connected to the system bus through a master/slave
[Figure 1 block labels: 32-bit extensible microprocessor with 8KB I$ and 8KB D$; processor interface (PIF), 64-bit PIF bus and PIF/AHB bridge; 48 KB SRAM; external memory interface (32-bit external RAM port, 32-bit external ROM/Flash port); 64-bit AHB bus; interrupt manager; DMA; Flash memory with FP, CP and DP ports; AHB/APB bridge; FPGA programming interface; 1KB dual port buffer and dual port buffer interface; instruction extension bus; embedded FPGA (master/slave AHB interface, interrupt interface, instruction extension interface, general purpose I/O interface); 64-bit APB bus; PC parallel port interface; I2C master and I2C bus; general purpose registers; programmable general purpose I/O and I/O lines.]
Figure 1. System Architecture.

interface); iii) flexible I/O (to connect external units or sensors featuring application-specific communication protocols). Even though such different circuit purposes would require different kinds of programmable logic for best implementation of either arithmetic-dominated or control-dominated logic, we implemented a single programmable logic fabric to be shared among different purposes both in space (same configuration) and time (subsequent configurations). A single, high I/O count, fine-grain e-FPGA operates as a datapath for the microprocessor pipeline and as dedicated control logic for the bus coprocessor and I/O control interface. FPGA reconfiguration is concurrent with software execution. A local bus connects a dedicated 32-bit Flash memory port (FP) to the FPGA programming interface. A DMA channel handles the bitstream transfer while the microprocessor fetches instructions and data from different Flash memory ports: the 64-bit wide code port (CP) and data port (DP). To support streaming applications a 1kB dual-port buffer is used to interface fast decoding hardware and slower software running on the processor. The memory sub-system architecture is shown in Figure 2. The modular memory (dotted line) includes charge pumps (Power Block), testability circuits (DFT), a power management arbiter (PMA) and a customizable array of N independent 2Mb flash memory modules, depending on the storage requirements (N=4 in the current implementation). The modular memory features (N+2) 128-bit target ports and implements an N-bank uniform memory. An 8-bit microprocessor (uP) is devoted to handling complex file-system functions (defrag, compression, virtual erase, etc.) not natively supported by DP, and assists in built-in self test. A (N+2)x4 128-bit crossbar connects the modular memory with the four initiators (CP, DP, FP and uP) so that three banks can be read in parallel at full speed.
The memory space of the four modules is arranged in three programmable user-defined partitions, each one devoted to a port. Each 2Mb flash memory module has a 128-bit IO data bus with 40ns access time, resulting in 400Mbyte/s, and a program/erase control unit. Simultaneous memory operations use the power management arbiter (PMA) for optimal scheduling. Available power and user-defined priorities are considered to schedule conflicting resource requests in a single clock cycle. The memory system allows up to four simultaneous operations (with a limit of one both for write and erase).
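The scheduling constraints stated above (at most four simultaneous operations, at most one write and one erase, arbitration by available power and user-defined priority) can be illustrated with a greedy arbiter. This is a sketch only, not the PMA's actual algorithm: the `Request` fields, the power costs in arbitrary units, the convention that a lower priority value is served first, and the omission of bank conflicts are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_OPS = 4  # up to four simultaneous operations (from the text)


@dataclass(frozen=True)
class Request:
    port: str       # initiator: "CP", "DP", "FP" or "uP"
    kind: str       # "read", "write" or "erase"
    priority: int   # user-defined; lower value is served first (assumption)
    power: float    # hypothetical power cost, arbitrary units


def schedule(requests, power_budget):
    """Grant requests in priority order, subject to the power budget,
    at most MAX_OPS operations, and at most one write and one erase."""
    granted = []
    used = 0.0
    limited = {"write": 0, "erase": 0}
    for req in sorted(requests, key=lambda r: r.priority):
        if len(granted) == MAX_OPS:
            break
        if req.kind in limited and limited[req.kind] >= 1:
            continue  # only one write and one erase may be in flight
        if used + req.power > power_budget:
            continue  # not enough power left for this operation
        granted.append(req)
        used += req.power
        if req.kind in limited:
            limited[req.kind] += 1
    return granted


requests = [
    Request("CP", "read", 0, 1.0),
    Request("DP", "write", 1, 3.0),
    Request("FP", "read", 2, 1.0),
    Request("uP", "write", 3, 3.0),
    Request("uP", "erase", 4, 4.0),
]
granted = schedule(requests, power_budget=6.0)
print([(r.port, r.kind) for r in granted])
```

With this budget the second write is rejected by the one-write limit and the erase is deferred for lack of power, while both reads proceed alongside the first write.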
Figure 2. Flash Memory Architecture.
[Labels: ADC, DFT, PMA and Power Block; four 2Mb Flash modules (module 0 to module 3), each with a 128-bit port; 128-bit memory sub-system crossbar; 64-bit Data Port (DP); 64-bit Code Port (CP); 32-bit FPGA Port (FP); uP interface to the 8-bit uP.]
Figure 3 depicts the memory hierarchy and parallelism across the system. CP and DP are interfaced to the 64-bit, 800MB/s AHB system bus. At a system clock rate of 100MHz each I/O port can independently operate at maximum speed. So, an aggregate peak read rate of 1.2GB/s can be sustained as it is limited by memory access time. In the current implementation the e-FPGA reconfiguration takes 500us @ 100 MHz. 50MB/s average throughput out of the available 400MB/s are currently sustained by the e-FPGA configuration interface.
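The bandwidth figures quoted above follow from simple arithmetic on the bus widths, the Flash access time and the system clock. The short calculation below reproduces them. The last value, an implied bitstream size of about 25 kB, is an inference from the 50MB/s and 500us figures and is not stated in the text.

```python
# Back-of-the-envelope check of the quoted bandwidth figures.
# Integer arithmetic is used throughout to avoid rounding noise.

WORD_BITS = 128                 # Flash module data bus width
ACCESS_TIME_NS = 40             # Flash module access time
PARALLEL_READS = 3              # banks that can be read in parallel
AHB_BITS = 64                   # AHB system bus width
CLOCK_HZ = 100_000_000          # system clock rate
RECONFIG_US = 500               # e-FPGA reconfiguration time
CONFIG_RATE_BPS = 50_000_000    # sustained configuration throughput, bytes/s

# One 128-bit word every 40 ns per Flash module, in bytes per second.
module_rate = (WORD_BITS // 8) * 1_000_000_000 // ACCESS_TIME_NS

# Three banks read concurrently give the aggregate peak read rate.
aggregate_rate = PARALLEL_READS * module_rate

# One 64-bit transfer per clock on the AHB bus.
ahb_rate = (AHB_BITS // 8) * CLOCK_HZ

# Implied configuration payload (inference, not stated in the text).
bitstream_bytes = CONFIG_RATE_BPS * RECONFIG_US // 1_000_000

print(f"Flash module read rate : {module_rate // 1_000_000} MB/s")
print(f"Aggregate peak read    : {aggregate_rate / 1e9:.1f} GB/s")
print(f"AHB bus peak rate      : {ahb_rate // 1_000_000} MB/s")
print(f"Implied bitstream size : {bitstream_bytes} bytes")
```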
[Figure 3 labels: 32-bit microprocessor register file; external processor interface & AHB bridge; 64-bit AHB bus; 32-bit FPGA PI]
The overall performance improvements for the face recognition tasks are shown in Table 1. Execution time is compared for a 32-bit RISC with basic DSP extensions (MAC, zero-overhead loops, etc.) and the same processor enhanced with application-specific instructions. Measured speed-ups range from 1.8x to 10.6x (on the most demanding task), with an overall improvement of 8.5x. Notice that switching between algorithm stages requires only one reconfiguration of the e-FPGA. Reconfiguration time is negligible. The speed-up factors take into account the possible multi-cycle clock penalty due to processor-FPGA synchronization in case of instruction extensions slower than the processor clock. Energy efficiency figures are also reported in Table 1. Although the average power consumption of the system extended with the e-FPGA is slightly higher, the energy reduction for executing each of the tasks on its specific HW configuration (power-delay product improvement) results in an overall reduction of 6.7x. Only one task showed slightly worse total execution energy, though showing benefits on execution speed. The last column of Table 1 reports the energy-delay improvement of each specific HW configuration compared to the general-purpose counterpart. Energy required for e-FPGA reconfiguration is always negligible. Measurements show the best energy efficiency in the range of several MOPS/mW at 1.8V supply. It lies between conventional ASIP/DSP and dedicated configurable hardware implementations [2].

Table 1. Benchmarks at 100MHz.
Algorithm     RISC with    RISC with      Speed    Energy     Energy
Stage         basic DSP    uP extens.     Up       Reduct.    Effic. Gain
Bayer Filter  58ms         24.7ms         x 2.3    x 1.4      x 3.2
Edge Detect.  4.5ms        2.5ms          x 1.8    x 0.95     x 1.7
Face Detect.  1.5s         382ms          x 4      x 2.9      x 11.6
Face Recog.   9.15s        860ms          x 10.6   x 9        x 95.4
Totals        10.7s        1.26s          x 8.5    x 6.7
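The columns of Table 1 are related by simple ratios, which the following sketch recomputes. It reads the Bayer filter and edge detection times on the extended processor as milliseconds, which is an assumption, but it is the only reading consistent with the published speed-ups and with the 1.26s total. It also assumes that the energy efficiency gain is the product of the speed-up and the energy reduction (an energy-delay improvement), which matches every row that has a published value.

```python
# Per-stage data from Table 1, all times converted to milliseconds:
# (baseline time, extended time, published speed-up,
#  published energy reduction, published energy efficiency gain)
stages = {
    "Bayer filter":     (58.0,   24.7,  2.3,  1.4,  3.2),
    "Edge detection":   (4.5,    2.5,   1.8,  0.95, 1.7),
    "Face detection":   (1500.0, 382.0, 4.0,  2.9,  11.6),
    "Face recognition": (9150.0, 860.0, 10.6, 9.0,  95.4),
}

computed_speedup = {}
computed_gain = {}
for name, (base_ms, ext_ms, pub_speedup, energy_red, pub_gain) in stages.items():
    # Speed-up is the ratio of execution times.
    computed_speedup[name] = base_ms / ext_ms
    # Energy-delay gain = speed-up x energy reduction (assumed definition).
    computed_gain[name] = pub_speedup * energy_red
    print(f"{name:17s} speed-up {computed_speedup[name]:5.2f} "
          f"(published {pub_speedup}), "
          f"gain {computed_gain[name]:6.2f} (published {pub_gain})")

total_base_ms = sum(row[0] for row in stages.values())
total_ext_ms = sum(row[1] for row in stages.values())
overall_speedup = total_base_ms / total_ext_ms
print(f"Totals: {total_base_ms / 1000:.2f}s vs {total_ext_ms / 1000:.2f}s, "
      f"overall speed-up {overall_speedup:.2f}")
```

The recomputed values agree with the published ones to within rounding; the small differences in the totals row come from the rounded per-stage times.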
[Figure 3 labels, continued: 64-bit AHB port and 32-bit port; 64-bit AHB CP interface to 64-bit port CP; 64-bit AHB DP interface to 64-bit port DP with 512-byte page buffer; AHB DMA to 32-bit port FP; 2x64-bit + 1x32-bit Flash memory port interfaces; 6x4 128-bit Flash memory crossbar; Flash memory controller logic; 4x16384x128-bit memory modules.]
Figure 3. Memory Hierarchy.

System performance is evaluated for an image processing application (facial recognition) and a speech recognition application. More than 20 specific instructions were designed as C/assembly-callable functions, automatically translated to RTL, then synthesized and mapped to the e-FPGA. Figure 4 shows two examples of specific microprocessor extensions. On the right-hand side, an 8-issue, 8-bit, L2 calculation accounts for 23 8-bit arithmetic operations and 6 64-bit operations requiring about 10k ASIC equivalent gates. On the left-hand side, a datapath for an optimized fixed-point calculation of the square root accounts for 12 32-bit operations for about 2k ASIC equivalent gates.
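To make the two instruction extensions more concrete, the sketch below gives software reference models of what such datapaths typically compute. Neither function is taken from the authors' RTL. `l2_squared_packed` assumes eight unsigned 8-bit lanes packed into a 64-bit word; one plausible reading of the "23 8-bit arithmetic operations" is 8 subtractions, 8 multiplications and 7 additions, but that breakdown is an assumption. `isqrt32` is the classic digit-by-digit integer square root, which uses a root, a remainder and a shifting number register and consumes two input bits per step; it is named here as a stand-in for the optimized fixed-point datapath, whose exact structure is not given in the text.

```python
def l2_squared_packed(a: int, b: int) -> int:
    """Squared L2 distance between two vectors of eight unsigned bytes,
    each vector packed into one 64-bit word (lane 0 in the low byte)."""
    total = 0
    for lane in range(8):
        x = (a >> (8 * lane)) & 0xFF
        y = (b >> (8 * lane)) & 0xFF
        d = x - y            # 8 subtractions in total
        total += d * d       # 8 multiplications, 7 useful additions
    return total


def isqrt32(n: int) -> int:
    """Floor of the square root of a 32-bit unsigned integer, computed
    digit by digit (one result bit per iteration, 16 iterations)."""
    if not 0 <= n < 1 << 32:
        raise ValueError("n must fit in 32 unsigned bits")
    root = 0   # result built up one bit at a time
    rem = 0    # running remainder
    for _ in range(16):
        # Shift the next two most significant input bits into the remainder.
        rem = (rem << 2) | (n >> 30)
        n = (n << 2) & 0xFFFFFFFF
        root <<= 1
        trial = (root << 1) | 1   # value to subtract if the new bit is 1
        if rem >= trial:
            rem -= trial
            root |= 1
    return root


print(l2_squared_packed(0x0102030405060708, 0))
print(isqrt32(1 << 30))
```

The square root of the accumulated squared distance gives the L2 distance itself, which is presumably why the two extensions appear together in a recognition pipeline, although the text does not state how they are combined.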
[Figure 4 labels (square root datapath): 32-bit Root, Remainder and Number registers; shift operators (>> 1, << 2, >> 30, >> 2); increment, adder and comparator; 64-bit aligned address, 64-bit load from the processor load unit, split into 4 segments per operand.]
3. DESIGN FLOW AND SYSTEM INTEGRATION

3.1 The System-to-RTL Design Flow
In Figure 5 the design flow used for system architecture exploration and integration is described. The starting point is an untimed model of the system written in C/C++ code describing the desired functionality; at this stage the verification is done with simulations in CoWare N2C environment [3]. This methodology allows designers to validate the system specifications and consequently, with a progressive refinement of the functional blocks into hardware and software (partitioning process) and the generation of the HW/SW interface (interface synthesis), the verification of the system at a cycle accurate abstraction level. The microprocessor core is abstracted in the co-verification with its Instruction Set Simulator integrated into the simulation engine.
Figure 4. Added DSP instructions examples.
[Labels, continued: multiplier; 64-bit pipeline register; adders and shift operators (+2, << 1); 32-bit result registers.]
Extensive simulations of the system with the usage of the profiler (memory accesses, CPU load, exceptions) help in finding the computational kernels of the software running on the core (performance analysis).
[Figure 5 labels: functional model (untimed simulation); partitioning / interface synthesis / refinement; cycle accurate simulation; performance analysis; libraries (HW/SW platform); uP ISS]
particular set of constraints was specified to reach minimum delay of the hardwired logic. After the place and route stage, the final database is statically and dynamically verified against the RTL simulations in order to make verification at all levels of abstraction.
[Figure 5 labels, continued: VHDL (eFPGA); instruction extensions in Verilog HDL; outputs: soft hardware for eFPGA mapping, HW (RTL) for uP, AHB/APB bus and peripherals, C code for applications; eFPGA hard macro; SoC integration.]
[Figure 6 labels: verification; inputs: CPU core, IP, interface RTL code, Flash, RAM, eFPGA core; synthesis of instruction extensions, coprocessors and I/O interfaces; floorplanning (full chip) / P&R; static timing analysis, verification with FPGA black box, dynamic verification; constraints file; synthesis (fpga lib) and mapping (fpga p&r); netlist + timing database; FPGA timing database; FPGA bitstream; silicon fab; final verification with FPGA timing model, static timing analysis.]
Figure 5. System to RTL Flow.

At this point it is possible to group the most time-consuming code segments as new instructions of the extensible processor. Those extensions of the Instruction Set can be easily mapped on the e-FPGA together with the VHDL code that results from the refinement process done after the HW/SW partition phase. The system integration flow ends by producing:
1. Soft Hardware to be mapped on the e-FPGA: HDL RTL code of instruction extensions, bus-mapped coprocessors and special purpose I/O peripherals.
2. Conventional fixed hardware: Microprocessor RTL code, AHB/APB bus and Peripherals.
3. Embedded Software (C code): Application software and low-level drivers for the hardware platform.

Figure 6. RTL to Layout Flow.

The timed database used for the verification, built after parasitic extraction and delay calculation, gives the effective delays at the boundary of the e-FPGA hard macro (all e-FPGA I/O pins are characterized with the static timing analyzer in the worst case condition). This information is exported to the e-FPGA flow as a constraint file and used during synthesis/mapping of the soft hardware by specific e-FPGA tools. This is done to correctly constrain the logic mapped on the e-FPGA with the real timing budget. Finally, the bitstream and a timed view of the macro are generated and can be used for the final sign-off. Static timing analysis of the e-FPGA results in both a back-annotated netlist and a timing view for full chip static timing analysis.
4. SYSTEM IMPLEMENTATION AND TEST

The full-chip is implemented in a 0.18um, 2-poly, 6-metal, CMOS embedded Flash technology. The layout of the system has been integrated using commercial place and route tools for digital ASIC. The chip is being tested and is fully functional at the clock rate of 125MHz (worst-case conditions). The processor system is able to reconfigure the e-FPGA at full speed. Reconfiguration takes about 500us at a clock rate of 100MHz. Technology and device characteristics are summarized in Table 2 and a chip micrograph is shown in Figure 7 with a floorplan view of system components. The system is being tested using both a face recognition application and a speech recognition application. As discussed in Section 2 we reported speedups of up to 8x using instruction extensions to accelerate face-recognition computing kernels. Additional 1.5x to 2x performance improvements are reported on specific I/O intensive tasks to interface an external CMOS camera and doing some image processing computations on-the-fly using the e-FPGA.
The C code generated by the flow described above is the final application while the RTL of the system with the e-FPGA hard macro goes into the SoC integration flow (RTL to layout).
3.2 The RTL-to-Layout Design Flow

In Figure 6 both the silicon implementation flow and the e-FPGA configuration flow are shown. These flows are run at different times. Once the silicon implementation flow has produced the routed database, it is possible to run the e-FPGA flow, which can be repeated for each different function built as a soft macro. The RTL code of the CPU core, IP blocks and Interface modules (system bus) is synthesized and integrated with RAM blocks, Flash modules and the FPGA hard macro in the floorplanning environment. To meet timing requirements at the boundary of the e-FPGA, special care was taken during synthesis for the logic cells that interface the e-FPGA with the rest of the system. A
Table 2. Technology and chip characteristics.
Process:             0.18 um CMOS, 2-Poly, 6-Metal
                     Tunneling oxide: 10nm
                     Flash cell size: 0.35um2
Flash Memory (4x):   256Kbit x 9 Sectors
                     Word: 128 bit
                     Program Throughput: 1Mbyte/s
                     Typ Read Rate: 400 Mbyte/s
SRAM memory:         I$: 8kB (64-bit wide)
                     D$: 8kB (64-bit wide)
                     Buffers: 4x256B (8-bit wide)
                     Main: 48kB (64-bit wide)
Chip size:           8.4x8.4 mm2
e-FPGA size:         8.2 mm2
Customizable I/O:    24 general-purpose inputs
                     24 general-purpose outputs (tri-state)
                     8 general-purpose bidirs
Power supply:        2.7-3.6V (I/O), 1.6-2.0V (core)

Figure 7. Chip Micrograph.
[Floorplan labels: DFT; 1MB Flash memory; Flash ports; buffers; 48kB SRAM; 32-bit uP with AHB and APB; 8+8 kB I$+D$; FPGA.]
5. ACKNOWLEDGMENTS
The authors thank all the colleagues of NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.
6. REFERENCES
[1] Young-Don Bae et al., "A Single-Chip Programmable Platform Based on a Multithreaded Processor and Configurable Logic Clusters", ISSCC 2002 Digest of Technical Papers, pp 336-337, Feb. 2002.
[2] Zhang et al., "A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications", ISSCC 2000 Digest of Technical Papers, pp 68-69,488, Feb. 2000.
[3] I.Bolsens, H.De Man, B. Lin, C.Van Rompaey, S.Vercauteren and D.Verkest, "Hardware/Software Co-Design of Digital Telecommunication Systems", Proceedings of the IEEE, Vol. 85, No. 3, March 1997, pp 391-418.
New Products
Abound claims dense interconnect is key to Raptor FPGA

April 24, 2009

LONDON Abound Logic Inc. (Santa Clara, Calif.), an FPGA company formerly known as M2000, has started shipping its Raptor FPGAs and claims that improved interconnect is the key to achieving triple the logic density of equivalent FPGAs from established competitors.

As the company was going into its development phase for the Raptor architecture, its engineers realized that in FPGAs interconnect takes up more than 80 percent of the resources and also provides lots of room for improvement, according to Frederic Reblewski, CEO and founder of Abound Logic. The company has therefore stayed with a classical SRAM-based look-up table (LUT) architecture. In addition, the interconnect is transparent to the user, meaning that while density can be increased it need not have an impact on EDA software and how Raptor FPGAs are designed.

Reblewski claimed that Abound has been able to create an interconnect architecture with a larger effective fan-out and in which each wire adds an additional effective route. More effective routes for an EDA tool translate into denser logic functionality. Reblewski claimed the overall effect is that Raptor FPGAs are three times denser than competing architectures. "Memory, DSP, I/O are not denser but overall this still translates into 2X, or a process node advantage," he said.

The result is what Reblewski claims is the highest capacity FPGA in the 65-nm process node and one that can compete with devices from rivals made on 45-nm to 40-nm silicon. The Raptor has 750k LUTs, 38-Mbits of memory, 448 DSP ALUs capable of 24 x 24 bit multiplication, up to 1,200 I/Os and up to 32 SerDes lanes. The silicon consumes 2.5 watts of static power and can typically yield twice the performance at less power than competing products.

Abound Logic received first silicon from its foundry Taiwan Semiconductor Manufacturing Co. Ltd. in August 2008 and packaged devices in October. Reblewski said engineering samples are available now and volume shipments would begin in September. As to price, Reblewski said Raptor is "competitive."

www.aboundlogic.com
The Raptor Family of FPGAs

Product Brief (Advanced Data) PB001 (V 0.2) ~ February 2009
Highlights
- Unprecedented level of logic density:
    - 750k 4-input LUTs
    - 750k D-type flip-flops
    - 38.5 Mb of embedded RAM
    - Built on 65 nm TSMC CMOS process
- 448 high-performance DSP blocks:
    - 24 x 24 multiplier with a 64-bit multifunction arithmetic logic unit (ALU)
    - Dual-stage pipelines, and special routing for fast cascaded operation
    - Each DSP block can also be configured as two independent 12 x 12 multipliers and 32-bit ALUs
- Built-in high-speed interfaces:
    - 8 or 32 channels of high-speed SerDes running at up to 3.125 Gbps
    - Embedded 4- or 8-lane PCI Express controllers with up to 20 GB/s transfer rate, supporting both Endpoint and Root Complex configurations
    - Embedded support for XAUI, Gigabit Ethernet, Serial RapidIO, Serial ATA, and FibreChannel protocols
- 500 MHz hierarchical global clock tree:
    - 32 independent low-skew clocks available device wide
    - 80 I/O register clocks
    - Abundant local clock resources
    - 40 low-jitter PLLs that can be used independently or cascaded, with programmable frequency, phase, duty cycle, waveform and compensation mode
- High-performance, fully programmable I/O cells capable of 1.25 Gb/s performance in LVDS mode:
    - Support for a wide range of both single-ended and differential I/O standards
    - High-bandwidth memory interface support for DDR, DDR2, RLDRAM and QDR memories
    - Per I/O programmable delay and dynamic phase alignment
    - Programmable on-chip termination with built-in calibration
- Hierarchical interconnect architecture increases logic density by a factor of three and reduces power by up to 60% versus other 65 nm solutions
- Volatile and non-volatile 256-bit AES bitstream encryption for design security
- Integrated, programmable SEU detection and repair for increased reliability
- Programming support for low-cost flash memories via JTAG, and high-speed serial and parallel interfaces
- Industry-standard RTL flow with support from Mentor Graphics Precision synthesis
- Soft 32-bit RISC CPU with wishbone interface:
    - SDK, compiler, and debugger available
    - Support for uC/OS II
- Delivered in a 1935-ball organic FBGA:
    - 45 x 45 mm, 1-mm ball-pitch package
    - RoHS-Clause-5 compliant
    - Enhanced signal integrity (7:1:1 ratios for general-purpose I/O, and 2:1:1 ratios for SerDes I/O)
Table 1: Family Product Table
                             RAM: 576b        RAM: 18Kb   RAM: Total   DSP             PCIe                   User
Device   LUTs      DFFs      Register Files   Blocks      (Kb)         Blocks   PLLs   Controllers   SerDes   I/O
Rso750   752,640   752,640   5,880            1,960       38,588       448      40     1 (x4)        8        1,200
Rsx750   752,640   752,640   5,880            1,960       38,588       448      40     2 (x8)        32       1,020
PB001 (V 0.2) ~ February 2009
Abound Logic, Inc.
Introduction
Based on a dense, hierarchical routing structure, Raptor FPGAs deliver an unprecedented level of density, far beyond that of any other FPGA. Providing more than 750,000 LUTs and an equal number of flip-flops to the designer, Raptor FPGAs are an attractive alternative to ASICs in many applications. Built on an advanced 65 nm CMOS process, Raptor FPGAs include advanced features such as dedicated, dual-port SRAM blocks, fast adders, powerful DSP blocks, high-performance SerDes, and low-skew, programmable clocks. The high-density, low-power Raptor FPGAs from Abound Logic are ideal for demanding, high-density applications such as data communications, wireless infrastructure, high-performance computing, professional video processing, medical imaging, and ASIC prototyping, or even as ASIC/structured ASIC alternatives.
Groups
MFCs are organized into three types of groups: logic, memory and arithmetic. Each type of group contains 32 MFCs, sharing a common clock, reset and enable. Each flip-flop in the group can be configured individually in how it uses these signals. The simplest group is the logic group, composed solely of 32 logic MFCs and intended for general-purpose logic use. Memory groups are composed of 32 memory MFCs and an embedded 32 × 18 register file. Arithmetic groups consist of 16 arithmetic MFCs tied to four embedded 4-bit adders plus 16 logic MFCs. Each adder can be used independently, or can be cascaded using the dedicated carry chain to form a larger adder of up to 96 bits.
Clusters and Tiles

A cluster is composed of three logic, three memory and six arithmetic groups along with an embedded 18 Kb RAM. Thirty-five clusters are grouped together along with eight DSP blocks to form a tile. The 56-tile fabric, arranged in a 7 × 8 matrix, is surrounded by the I/O ring composed of general-purpose I/O, embedded PCIe controllers, and embedded SerDes (Figure 2).

Figure 2: Raptor FPGA Layout (Rso)
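The resource totals quoted in Table 1 can be reproduced from this hierarchy. The following is a minimal Python sketch of that arithmetic, not vendor tooling: all identifiers are illustrative, and it assumes that 1 Kb means 1,024 bits.

```python
# Hierarchy figures are taken from the text; every name here is illustrative.
MFCS_PER_GROUP = 32
GROUPS_PER_CLUSTER = {"logic": 3, "memory": 3, "arithmetic": 6}
CLUSTERS_PER_TILE = 35
DSP_PER_TILE = 8
TILES = 7 * 8  # 56-tile fabric

clusters = CLUSTERS_PER_TILE * TILES
groups_per_cluster = sum(GROUPS_PER_CLUSTER.values())

# Each MFC holds one LUT and one flip-flop, so LUT and DFF counts are equal.
luts = clusters * groups_per_cluster * MFCS_PER_GROUP
register_files = clusters * GROUPS_PER_CLUSTER["memory"]  # one 32 x 18 file per memory group
ram_blocks = clusters                                      # one 18 Kb RAM per cluster
dsp_blocks = DSP_PER_TILE * TILES

# Assumption: "18 Kb" is 18 * 1024 bits and the total is reported in units of 1024 bits.
total_ram_kb = (ram_blocks * 18 * 1024 + register_files * 32 * 18) / 1024

# Four 4-bit adders per arithmetic group, six arithmetic groups per cluster.
adder_bits_per_cluster = GROUPS_PER_CLUSTER["arithmetic"] * 4 * 4

print(luts, register_files, ram_blocks, dsp_blocks, total_ram_kb, adder_bits_per_cluster)
```

Under these assumptions the sketch gives 752,640 LUTs, 5,880 register files, 1,960 RAM blocks and 448 DSP blocks, and a RAM total of 38,587.5 Kb, which Table 1 lists as 38,588. The 96 adder bits per cluster coincide with the maximum cascaded adder width quoted for the carry chain, although the brief does not state that the limit is per cluster.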
Architectural Overview
Multifunction Cells
At the heart of the Raptor FPGA is the multifunction cell (MFC), composed of a 4-input LUT plus a D-type flip-flop, offering invertible clock polarity, programmable enable and asynchronous/synchronous reset (Figure 1). The Raptor architecture includes three types of MFCs to provide access to other resources: logic, memory and arithmetic. Logic MFCs contain a single LUT and D-type flip-flop; memory MFCs add access to an embedded 32 × 18 register file; arithmetic MFCs provide access to 4-bit adders.

Figure 1: Raptor Multifunction Cell
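The behavior of a logic MFC can be illustrated with a small software model. This is a hedged sketch only: the class name, the 16-bit `lut_mask` encoding and the method names are assumptions made for illustration, and clock polarity inversion is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LogicMFC:
    """Illustrative model of a logic MFC: a 4-input LUT feeding a D flip-flop."""
    lut_mask: int  # 16-bit truth table; bit i is the LUT output for input pattern i
    q: int = 0     # current flip-flop state

    def lut(self, i0: int, i1: int, i2: int, i3: int) -> int:
        """Look up the combinational output for the four inputs."""
        index = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3)
        return (self.lut_mask >> index) & 1

    def clock_edge(self, inputs, en: int = 1, rst: int = 0) -> int:
        """Model one active clock edge with synchronous reset and enable."""
        if rst:
            self.q = 0
        elif en:
            self.q = self.lut(*inputs)
        return self.q

    def async_reset(self) -> None:
        """Model an asynchronous reset, which clears the state without a clock edge."""
        self.q = 0


# A 4-input AND: only input pattern 15 (all ones) produces a 1.
and4 = LogicMFC(lut_mask=0x8000)
and4.clock_edge((1, 1, 1, 1))         # q becomes 1
and4.clock_edge((0, 0, 0, 0), en=0)   # enable low, so q holds its value
```

Representing the LUT as a bit mask mirrors how a lookup table is usually described: the four inputs form an index into 16 stored bits.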
Figure 3: Raptor Routing Structure
[Diagram: crossbar (X) levels for group, cluster, and tile, with fan-outs labeled 32, 12, and 35; mesh routing at the device level.]
Routing
The Raptor routing structure consists of four hierarchical levels. At the group, cluster, and tile level, fast local interconnect tied to crossbar switch matrices is used to connect elements within that level. At the device level, a mesh routing structure connects the tiles and I/O ring. This dense routing structure results in small device size, high utilization, high performance, and a 60% reduction in dynamic power compared to competing solutions at the same process node (Figure 3).
DSP Blocks
Each tile includes eight DSP blocks for a total of 448 per device. Each DSP block features dual pipeline stages, one multiplier block and one ALU, configurable as either:

- A 24 × 24 multiplier with a 64-bit multifunction ALU, or
- Two independent 12 × 12 multipliers and 32-bit ALUs

The highly configurable DSP block supports multiple operational modes:

- Two's complement multiplication (signed)
- Multiply and accumulation

The eight DSP blocks in a tile can be cascaded to support high-performance DSP implementations such as high-precision multiplication or butterfly computations. Each of the three cascade outputs of each DSP block drives the cascade inputs of up to three neighboring blocks. At the tile boundary, only one cascade path is provided in each direction.

Clock Network
Each Raptor FPGA has 32 global nets spanning the fabric, driving to the center of each tile to minimize skew (marked in blue in Figure 2). These low-skew nets can be driven by the outputs of a clock generator, clock-capable I/O pins (either single-ended or differential), or an internal net. At the tile level, eight of the 32 low-skew global nets can be selected and driven to each cluster and DSP block of that tile (marked in green in Figure 2). In turn, each cluster has access to all of the eight selected low-skew nets plus local signals from within that cluster. The cluster then provides these signals to the RAM blocks and MFC groups.

Embedded Memory
In addition to the abundant flip-flops found in the Raptor architecture, two additional memory resources are available: 576b register files and embedded 18 Kb RAM blocks. The 32 × 18 register file is a dual-port (asynchronous read, synchronous write) memory, with independent 5-bit address buses for each port. The register file operates from the same clock sources available to the group to which it is connected. The 18 Kb block RAM can be configured as either a 2048 × 9, 1024 × 18 or 512 × 36 memory. The block is a true, synchronous, dual-port memory supporting three modes of operation:

- True dual-port mode: both read/write ports can be configured independently as either 2048 × 9 or 1024 × 18.
- Single-port, bit-enable mode: one read port, and one read/write port with write bit enable. Either 2048 × 9 or 1024 × 18 configurations are supported.
- Single-port mode: one port is configured as read-only, and the second port is configured as write-only. Memory configuration is set to 512 × 36.
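The memory geometries above can be checked with a few lines of arithmetic. This Python sketch assumes that "18 Kb" means 18 × 1,024 bits and takes the 1,024-deep configuration as 1024 × 18, as in the mode list. The `address_bits` helper is illustrative, and the block RAM address widths it derives are not stated in the brief.

```python
BLOCK_RAM_BITS = 18 * 1024  # assumption: "18 Kb" means 18 x 1024 bits

BLOCK_RAM_CONFIGS = [(2048, 9), (1024, 18), (512, 36)]  # (depth, width)
REGISTER_FILE = (32, 18)


def address_bits(depth: int) -> int:
    """Number of address lines needed to select one of `depth` words."""
    return (depth - 1).bit_length()


for depth, width in BLOCK_RAM_CONFIGS:
    # Every configuration uses the full capacity of the block.
    assert depth * width == BLOCK_RAM_BITS
    print(f"{depth} x {width}: {address_bits(depth)} address bits")

rf_depth, rf_width = REGISTER_FILE
print(rf_depth * rf_width, "bits,", address_bits(rf_depth), "address bits per port")
```

The register file works out to 576 bits with a 5-bit address, matching the "576b" label and the 5-bit address buses described in the text.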
I/O Ring
Surrounding the core array of tiles is the I/O ring, composed of either 17 or 20 banks (Rsx and Rso, respectively). Each I/O bank consists of 30 I/O pairs and two clock generators. These embedded clock generators (and their PLLs) can drive the four I/O register clocks in each bank as well as the global network. The I/Os feature support for a wide range of I/O standards (LVCMOS, PCI, SSTL, HSTL, and LVDS), both single-ended and differential and from 1.2V to 3.3V. Included is support for dynamic termination, programmable delay, and dynamic phase alignment. In addition, support for automatic calibration to account for process, temperature, and voltage (PVT) variations is built in. Of the 30 I/O pairs in a bank, four pairs are clock capable and can feed adjacent clock generators, and six pairs have data strobe (DQS) support for DDR applications (the associated DLLs are embedded in the I/O bank).
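The user I/O counts in Table 1 follow from the bank structure described here. This is a short Python sketch of that arithmetic, assuming each I/O pair counts as two user I/Os; the function and key names are illustrative.

```python
# Bank counts and per-bank figures are taken from the text; names are illustrative.
BANKS = {"Rso": 20, "Rsx": 17}
IO_PAIRS_PER_BANK = 30
IO_REGISTER_CLOCKS_PER_BANK = 4
CLOCK_CAPABLE_PAIRS_PER_BANK = 4
DQS_PAIRS_PER_BANK = 6


def io_summary(family: str) -> dict:
    """Derive device-level I/O totals from the per-bank figures."""
    banks = BANKS[family]
    return {
        "user_io": banks * IO_PAIRS_PER_BANK * 2,  # assumption: two I/Os per pair
        "io_register_clocks": banks * IO_REGISTER_CLOCKS_PER_BANK,
        "clock_capable_pairs": banks * CLOCK_CAPABLE_PAIRS_PER_BANK,
        "dqs_pairs": banks * DQS_PAIRS_PER_BANK,
    }


for family in BANKS:
    print(family, io_summary(family))
```

With these assumptions the Rso device has 1,200 user I/Os and the Rsx device 1,020, in agreement with Table 1, and the Rso device has the 80 I/O register clocks quoted in the feature list.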
High-Speed Interfaces
Each Raptor device contains either 8 or 32 embedded multi-gigabit transceivers capable of operating at line rates of up to 3.75 Gbps. Raptor Gigabit transceivers are structured into four-channel groups (QUADs) and support a wide variety of standard and custom protocols such as XAUI, PCI Express, Gigabit Ethernet, Serial RapidIO, Serial ATA, and FibreChannel. To support PCI Express Endpoint and Root Complex configurations, each Raptor device contains one 4-lane (Rso) or two 8-lane (Rsx) PCI Express controllers.
Configuration
The Raptor architecture includes a flexible interface supporting a number of configuration modes: JTAG, master parallel, slave parallel, slave serial, and master SPI. Configuration can be accomplished via a microcontroller or directly from serial and parallel flash memory.
Ordering Information
Example part number: Rso750 F1935-1C LF

Family: Rso, Rsx
Device: 750
Package: F = FBGA
Speed Grade: 1 = Fast, 2 = Medium, 3 = Normal
Operating Conditions: C = Commercial, I = Industrial
Suffix: LF = Lead-free, ES = Engineering Samples
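The ordering code can be decoded mechanically. The Python sketch below is one possible parser for the example part number, not an official tool. The regular expression and the field names are assumptions, and so is the reading of the digits after the package letter as a ball count, which the legend does not label but which matches the 1935-ball FBGA.

```python
import re

# Legends copied from the ordering information; the pattern itself is an assumption.
PACKAGES = {"F": "FBGA"}
SPEED_GRADES = {"1": "Fast", "2": "Medium", "3": "Normal"}
CONDITIONS = {"C": "Commercial", "I": "Industrial"}
SUFFIXES = {"LF": "Lead-free", "ES": "Engineering Samples"}

PART_RE = re.compile(
    r"^(?P<family>Rso|Rsx)(?P<device>\d+)\s+"
    r"(?P<package>F)(?P<ball_count>\d+)-"
    r"(?P<speed>[123])(?P<conditions>[CI])\s+"
    r"(?P<suffix>LF|ES)$"
)


def decode_part_number(part: str) -> dict:
    """Split an ordering code such as 'Rso750 F1935-1C LF' into named fields."""
    match = PART_RE.match(part.strip())
    if match is None:
        raise ValueError(f"unrecognized part number: {part!r}")
    fields = match.groupdict()
    return {
        "family": fields["family"],
        "device": fields["device"],
        "package": PACKAGES[fields["package"]],
        "ball_count": int(fields["ball_count"]),  # assumed meaning of the digits
        "speed_grade": SPEED_GRADES[fields["speed"]],
        "operating_conditions": CONDITIONS[fields["conditions"]],
        "suffix": SUFFIXES[fields["suffix"]],
    }


print(decode_part_number("Rso750 F1935-1C LF"))
```

Anchoring the pattern at both ends means a malformed code raises an error instead of being partially decoded.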
Abound Logic, Inc.

3052 Bunker Hill Lane, Suite 200, Santa Clara, California 95054-1145 Email: inforeq@aboundlogic.com
Copyright 2009 Abound Logic, Inc. All rights reserved. Abound and Raptor are trademarks of Abound Logic, Inc. All other trademarks are the property of their respective owners. All specifications subject to change without notice. NOTICE of DISCLAIMER: Abound Logic reserves the right to revise this documentation and to make changes in content from time to time without obligation on the part of Abound Logic to provide notification of such revision or change. Abound Logic provides this documentation without warranty of any kind, either implied or expressed, including but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Abound Logic may make improvements or changes to the product(s) and/or the program(s) described in this document at any time.
Conference Highlights
SOCC 2005 - Washington, DC

Distinguished Speakers:
Keynote: Hans Stork, Senior VP & CTO, Texas Instruments, Inc. "Advanced CMOS Technology for Digital Communication Systems"
Plenary: Ivo Bolsens, Vice President and CTO, Xilinx. "FPGA, The Heart of Embedded Systems"
Plenary: Jacques Benkoski, Entrepreneur in Residence, US Venture Partners. "The Evolving Silicon Infrastructure: Issues and Opportunities"
Luncheon: Rajesh Galivanche, Principal Engineer, Intel. "Test Challenges for Nanometer Designs"
Panel Discussion:
Will the technical realities and economics presented by 65nm and 45nm silicon technologies drive system applications toward fully integrated SoC implementations or alternative, disintegrated solutions? For an answer to this question, visit the panel discussion on Tuesday afternoon.
Corporate Sponsors:
Corporate sponsors of our conference may be present with tabletop displays. For more information on corporate sponsorship, please contact the Conference Office.
Tutorial Workshops:
As in our previous conferences, there will be several half-day tutorials on Sunday.
Keynote Speaker
Dr. Johannes M.C. (Hans) Stork
Senior Vice President and Chief Technical Officer, Texas Instruments, Inc.
Advanced CMOS Technology for Digital Communication Systems

Dr. Stork is Senior Vice President and Chief Technology Officer of Texas Instruments. As Director of the Silicon Technology Development organization, his primary responsibilities are the development of advanced CMOS, packaging and mixed signal process technologies. He joined Texas Instruments in September 2001 as Vice President and Director of Silicon Technology Research.

Prior to joining Texas Instruments, Dr. Stork was Director of the Internet Systems and Storage Lab at HP Laboratories, Hewlett-Packard in Palo Alto, California from 1999 until 2001. The IS&S Lab focused on highly scalable, dynamic, federated computer and storage systems. After joining Hewlett-Packard in 1994, Dr. Stork held the position of Director of the ULSI Research Lab between 1995 and 1999. This laboratory was established in 1994 and closed in 1999 with the split between Agilent and Hewlett-Packard. The operational staff of the 40,000 square foot, Class 1, clean room facility improved the productivity per person hour to the best recorded in HP's facilities. During his leadership the researchers of the ULSI Lab developed a high performance 0.18 µm CMOS technology with Al/low-k interconnect, and developed and transferred the then world's lowest dark-current CMOS image sensors and technology to Agilent's image component division. The ULSI Research Lab demonstrated the world's smallest FRAM cell feasibility jointly with TI and Applied Materials in 1999.

Dr. Stork started his professional career in 1982 at IBM's T.J. Watson Research Center, researching advanced bipolar technology and circuits. In 1987, he established and managed an Exploratory Devices group. This group explored and demonstrated SiGe HBTs, resulting in new speed records at device and circuit level, and presented invited talks at all major conferences including six papers at a single IEDM conference. Within 10 years, this SiGe technology was transferred to manufacturing and established IBM's entry in high-speed communication technologies. In 1990, Hans became manager of the Bipolar Devices group, and led one of the task forces on high-end computing that resulted in IBM's change in mainframe strategy. From 1992 to 1994, he assumed responsibility for the Exploratory Device and Technology programs at IBM Research. His teams demonstrated CMOS process technologies at 0.1 µm channel lengths with world record speed performance, supported by an extensive E-beam lithography facility. They also published the first extensive simulations of double-gate devices as the best structure to address the ultimate scaling challenges of FETs.

Hans was awarded two Outstanding Technical Achievement Awards from IBM. He has written or co-authored over 90 cited papers and holds eleven US patents. He was elected IEEE Fellow in 1994 for his contributions to SiGe devices and technology. As a fellow member of the IEEE Electron Devices Society, Hans has served as a member of the 1988 BCTM program committee, was on the VLSI Technology Symposium program committee from 1986 to 1992, and was publications/publicity chairman for the 1990-1992 Technology and Circuit Symposia, publicity (vice) chairman for the (1991) 1992 IEDM, and technical program committee member of the 1994 IEDM. Hans was EDS editor of the Circuits and Devices magazine from 1993 to 1995, and was on the technical program committee of the Symposium on Low Power Electronics in 1995 and 1996.

Dr. Stork has served on the Board of Directors for International Sematech (ISMT) since 2002, and for the Semiconductor Research Corporation (SRC) since 1999. Prior to that he was a member of the Executive Advisory Boards for both Sematech and the SRC from 1997 to 1999. He has been a member of the SIA Technology Strategy Committee since 1999. In 2000-2001, he participated as a technical advisor to Government efforts on high performance computing benchmarks and the national security issues emerging from Internet computing.

Dr. Stork was born in Soest, The Netherlands, and received the Ingenieur degree in electrical engineering from Delft University of Technology, Delft, The Netherlands, and holds a PhD from Stanford University. His PhD and Ir. theses concerned the fabrication, modeling and measurement of Static Induction Transistors, non-volatile NMOS devices and junction FET CCDs.
Plenary Speakers
Dr. Ivo Bolsens
Vice President and Chief Technical Officer, Xilinx
FPGA, The Heart of Embedded Systems

Dr. Ivo Bolsens joined Xilinx in June 2001 as vice president and chief technology officer (CTO). He is responsible for identifying Xilinx technologies and talent as well as heading up the Xilinx Research Laboratories, which focus on advanced research in the area of programmable logic. Dr. Bolsens came to Xilinx from the Belgium-based research center IMEC, where he was vice president of information and communication systems. He began there in 1984, holding various positions of increasing responsibility. His research included the development of knowledge-based verification for VLSI circuits, design of digital signal processing applications, and wireless communication terminals. He also headed the research on design technology for high level synthesis of DSP hardware, HW/SW co-design and system-on-chip design. Bolsens earned his master's degree in electrical engineering and his Ph.D. in applied science from the Catholic University of Leuven in Belgium. He is author and co-author of more than 100 papers in the field of VLSI design, CAD, embedded system design, and wireless communication. He is also co-author of the book, "High Level Synthesis for Real Time Digital Signal Processing."

Dr. Jacques Benkoski
Entrepreneur in Residence, US Venture Partners
The Evolving Silicon Infrastructure: Issues and Opportunities

Dr. Benkoski joined US Venture Partners in 2005 as Entrepreneur in Residence following the acquisition of Monterey Design Systems by Synopsys. He had led Monterey as CEO and President since 1999, and during that tenure the company ramped up to 150 employees and had its products adopted by most semiconductor companies in North America, Europe and Japan. Previously he was Vice President of European Operations for EPIC Design Technology and has also held various research, marketing, sales and general management positions at Synopsys, STMicroelectronics, IMEC, and IBM. Dr. Benkoski has been a Director of the EDA Consortium since 2001 and is Chairman of the Board of Certess. He received his B.Sc. in computer engineering from the Technion, Israel Institute of Technology, and his M.Sc. and Ph.D. degrees in electrical and computer engineering from Carnegie Mellon University, and has written over 30 technical papers.
Luncheon Speaker
Rajesh Galivanche
Intel Corp.
Test Challenges for Nanometer Designs

Rajesh Galivanche is a Principal Engineer and Manager of the Advanced Test Technology development team in the Technology and Manufacturing Group at Intel. His group researches advanced test and CAD methods for testing, debug and diagnosis of semiconductor devices. Rajesh has been with Intel for the last 10 years; before that he worked at Motorola, LSI Logic Corporation, and Sunrise Test Systems (which was later acquired by Viewlogic/Synopsys).
Panel Discussion:
Will the technical realities and economics presented by 65nm and 45nm silicon technologies drive system applications toward fully integrated SoC implementations or alternative, dis-integrated solutions?

Moderator: Tom Bednar, Distinguished Engineer, ASIC Product Development, IBM

Panelists:
Tim Henricks, Vice President, Engineering Services, Cadence
Joachim Kunkel, Vice President, Engineering, Synopsys
Frederic Reblewski, Chairman and CEO, M2000
Peter Rickert, Fellow, Director, ASP program Mgmt, Texas Instruments
Mark Templeton, Chief Strategy Officer, ARM
Arnie Tran, Architecture Lead, SOC Design Center, IBM

Abstract: The scale of advanced silicon technologies brings the possibility of very high levels of integration into consideration. 100M circuits, or more, could theoretically be integrated on a manufacturable die size. Improvements in overall performance, power consumption, and space efficiency could follow from such a silicon integration. However, the raw potential capability does not necessarily make these kinds of solutions technically practical or economically viable. Will technical issues such as process complexity and variation, yield, and data volume prevent such large scale integration from being practical? Will economic factors such as mask costs, IP costs, time to market pressures, and functional flexibility requirements drive a solution? What emerging technologies, tools, or architectures will influence future integration strategies?
1 Jean Barbier, Olivier LePape, Frederic Reblewski: Emulation system having a scalable multi-level multistage programmable interconnect network. Mentor Graphics November 1996: US 5574388 (4 worldwide citations)
2 Jean Barbier, Olivier LePape, Frederic Reblewski: Method and apparatus for performing fully visible tracing of an emulation. Mentor Graphics May 1998: US 5754827 (3 worldwide citations)
3 Jean Barbier, Olivier LePape, Frederic Reblewski: Field programmable gate array with integrated debugging facilities. Mentor Graphics July 1998: US 5777489 (2 worldwide citations)
4 Frederic Reblewski, Olivier Lepape: Reconfigurable integrated circuit with a scalable architecture. M July 2003: US 6594810 (2 worldwide citations)
5 Jean Barbier, Olivier Lepape, Frederic Reblewski: Method and apparatus tracing any node of an emulation. Mentor Graphics December 1999: US 5999725 (1 worldwide citation)
6 Luc Burgun, Olivier LePape, Frederic Reblewski: Method and apparatus for removing timing hazards in a circuit design. Mentor Graphics September 1998: US 5801955 (1 worldwide citation)
7 Francois Douezy, Frederic Reblewski, Jean Barbier: Clock generation and distribution in an emulation system. Mentor Graphics August 2005: US 6934674 (1 worldwide citation)
8 Frederic Reblewski, Jean Barbier, Olivier Lepape: Emulation system scaling. November 2003: US 6647362 (1 worldwide citation)
9 Jean Barbier, Olivier LePape, Frederic Reblewski: Reconfigurable integrated circuit with integrated debugging facilities and scalable programmable interconnect. May 2002: US 6388465 (1 worldwide citation)
10 Frederic Reblewski, Olivier Lepape: Reconfigurable integrated circuit with integrated debugging facilities for use in an emulation system. July 2001: US 6265894 (1 worldwide citation)
11 Frederic Reblewski: Configurable circuits with microcontrollers. M June 2007: US 20070139074 (1 worldwide citation)
12 Jean Barbier, Olivier LePape, Frederic Reblewski: Reconfigurable integrated circuit with integrated debugging facilities and scalable programmable interconnect. September 2004: US 20040178820 (1 worldwide citation)
13 Jean Barbier, Oliver LePape, Frederic Reblewski: Reconfigurable integrated circuit with integrated debugging facilities and scalable programmable interconnect. July 2002: US 20020089349 (1 worldwide citation)
14 Frederic Reblewski: Logic Design Modeling and Interconnection. Mentor Graphics March 2010: US 20100057426 (1 worldwide citation)
15 Jean Barbier, Olivier Lepape, Frederic Reblewski: Emulation system having a scalable multi-level multistage hybrid programmable interconnect network. Mentor Graphics May 1999: US 5907697
16 Jean Barbier, Olivier Lepape, Frederic Reblewski: Method and apparatus for tracing any node of an emulation. Mentor Graphics August 1998: US 5790832
17 Luc Burgun, Olivier LePape, Frederic Reblewski: Method and apparatus for removing timing hazards in a circuit design. Mentor Graphics November 1998: US 5831866
18 Frederic Reblewski: Logic design modeling and interconnection. Mentor Graphics April 2010: US 7698118
19 Frederic Reblewski, Olivier Lepape, Jean Barbier: Regionally time multiplexed emulation system. Mentor Graphics September 2005: US 6947882
20 Frederic Reblewski, Olivier Lepape: Crossbar device with reduced parasitic capacitive loading and usage of crossbar devices in reconfigurable circuits. M March 2005: US 6874136
21 Frederic Reblewski: Method and apparatus for concurrent emulation of multiple circuit designs on an emulation system. Mentor Graphics April 2005: US 6876962
22 Jean Barbier, Olivier LePape, Frederic Reblewski: Reconfigurable integrated circuit with integrated debugging facilities and scalable programmable interconnect. April 2004: US 6717433
23 Frederic Reblewski: Method and apparatus for concurrent emulation of multiple circuit designs on an emulation system. October 2002: US 6473726
24 Carl Ebeling, Frederic Reblewski, Olivier V Lepape, Jean Barbier: Crossbar device constructed with mems switches. M May 2010: US 20100108479
25 Jean Barbier, Olivier LePape, Frederic Reblewski: Field programmable gate array with integrated debugging facilities. Mentor Graphics May 2000: US 6057706
26 Frederic Reblewski: Emulation components and system including distributed routing and configuration of emulation resources. Mentor Graphics April 2006: US 7035787
27 Frederic Reblewski, Olivier LePape, Jean Barbier: Regionally time multiplexed emulation system. Mentor Graphics August 2006: US 7098688
28 Frederic Reblewski: Emulation components and system including distributed event monitoring, and testing of an IC design under emulation. Mentor Graphics October 2006: US 7130788
29 Frederic Josso, Xavier Montagne, Frederic Reblewski: Distributed configuration of integrated circuits in an emulation system. Mentor Graphics December 2007: US 7305633
30 Philippe Diehl, Gilles Laurent, Frederic Reblewski: Emulation of circuits with in-circuit memory. Mentor Graphics October 2007: US 7286976
31 Frederic Reblewski: Logic design modeling and interconnection. October 2005: US 20050234692
32 Frederic Reblewski: Runtime reconfiguration of reconfigurable circuits. M December 2007: US 20070283190
33 Frederic Reblewski, Cesar Douady: Packet-oriented communication in reconfigurable circuit(s). M August 2007: US 20070194807
34 Frederic Reblewski, Olivier V Lepape: Reconfigurable system with corruption detection and recovery. M July 2007: US 20070168718
35 Frederic Reblewski: On circuit finalization of configuration data in a reconfigurable circuit. M July 2007: US 20070162247
36 Frederic Reblewski: Runtime reconfiguration of reconfigurable circuits. M May 2007: US 20070118783
37 Frederic Reblewski, Olivier Lepape: Configurable circuit with configuration data protection features. M May 2007: US 20070103193
38 Frederic Reblewski: Reconfigurable circuit with redundant reconfigurable cluster(S). M March 2007: US 20070057693
39 David C Scott, Charles W Selvidge, Joshua D Marantz, Frederic Reblewski: Software state replay. Mentor Graphics April 2006: US 20060074622
40 Philippe Diehl, Marc Vieillot, Cyril Quennesson, Gilles Laurent, Frederic Reblewski: Message-based low latency circuit emulation signal transfer. March 2005: US 20050068949
41 Frederic Reblewski, Gilles Laurent, Philippe Diehl: Data compaction and pin assignment. December 2004: US 20040267489
42 Frederic Josso, Xavier Montagne, Frederic Reblewski: Distributed configuration of integrated circuits in an emulation system. December 2004: US 20040260530
43 Philippe Diehl, Gilles Laurent, Frederic Reblewski: Emulation of circuits with in-circuit memory. Mentor Graphics December 2004: US 20040254780
44 Frederic Reblewski: Emulation components and system including distributed routing and configuration of emulation resources. April 2004: US 20040078187
45 Frederic Reblewski, Olivier LePape, Jean Barbier: Regionally time multiplexed emulation system. Mentor Graphics April 2004: US 20040075469
46 Frederic Reblewski: Emulation components and system including distributed event monitoring, and testing of an IC design under emulation. February 2004: US 20040034841
47 Frederic Reblewski, Olivier Lepape: Crossbar device with reduced parasitic capacitive loading and usage of crossbar devices in reconfigurable circuits. July 2003: US 20030131331
48 Frederic Reblewski: Method and apparatus for concurrent emulation of multiple circuit designs on an emulation system. March 2003: US 20030055622
49 Frederic Reblewski: Reconfigurable circuit with redundant reconfigurable cluster(s). M July 2009: US 20090177912
50 Frederic Reblewski, Olivier Lepape: A reconfigurable integrated circuit with integrated debugging facilities for use in an emulation system. September 2003: HK 1052386

Performance Evaluation of High Speed Network Protocol by Emulation on a Versatile Architecture

C. Labbé, J.M. Vincent and F. Reblewski

Fig. 3. The effective throughput of Bouri, as a function of hit ratio [13].
[Plots: axes labeled response time, interrupt rate (MB/s), energy (# CPUs), work factor (nm), and time since 1953 (man-hours).]

A Reconfigurable Hardware Tool for High Speed Network Simulation

M2000, 4 rue R. Razel, 91400 Saclay, France

Laboratoire LMC-IMAG, Domaine Universitaire, BP53X 38041 Grenoble Cedex 9, France

Abstract. Estimation of rare events probabilities such as loss rate in

2 Hardware architecture and software environment

2.2 Software environment

2.3 Simulation control

3 Application to a three stages eight-by-eight switch

4 Conclusion and extension

Tout ensemble! Mentor, MINC buy French firms


Serial Fault Emulation

CAP FOR LOGIC EMULATION

OVERVIEW OF THE FAULT EMULATION SYSTEM

ANF Gate Netlist

Flattening Optimization & Mapping

Flattened BLP netlist

Partitioning & Routing

Fault Reconfiguration Generation

FPGA Reconfiguration List

a) initial circuit @ (0,0,0) Free A.B + C.D Fault C.D

c) BLP reconfiguration for X stuck at 0

Fig. 2: An example of FPGA Reconfiguration

Table 1: FPGA reconfiguration for SSF on a register (columns: RST, SET, EN, CLK)

 Fault-free Circuit Emulation  Faulty Circuit Emulation  Serial Fault Emulation

Input Stimuli Test Patterns Expected Outputs

Faulty Circuit = Fault Counter

Trigger Memory Boards Logic Boards

Fig. 4: Hardware Fault Detection

Fig. 3: Hardware initialization of Registers

#F 1000 3876 14620 57320 218860
